Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Undecided | Unassigned |
Bionic | Fix Released | High | Unassigned |
Bug Description
[ Impact ]
The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS. The default kernel is bionic/
This problem only appears on metal instances, which use NVMe devices instead of XVDA devices.
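To confirm which kind of root device an instance is actually using, something like the following rough sketch may help. It parses `/proc/mounts`-style text and classifies the root device by name. The function names `root_device` and `device_kind` are hypothetical, and the naming patterns (`nvme<N>n<N>` for NVMe, `xvd<letter>` for Xen virtual disks) are assumptions based on the usual Linux device naming, not something taken from this report.

```python
import re


def root_device(mounts_text):
    """Return the /dev/... device mounted at '/' in /proc/mounts-style text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[1] == "/" and fields[0].startswith("/dev/"):
            return fields[0]
    return None


def device_kind(device):
    """Classify a device path as 'nvme', 'xen-vbd', or 'other' by its name (assumed patterns)."""
    name = device.rsplit("/", 1)[-1]
    if re.match(r"nvme\d+n\d+", name):
        return "nvme"
    if re.match(r"xvd[a-z]+", name):
        return "xen-vbd"
    return "other"


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        dev = root_device(f.read())
    print(dev, device_kind(dev) if dev else "unknown")
```

On a metal instance this would be expected to report an NVMe root device, but that has not been verified here.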
[ Fix ]
It was discovered that after reverting the following two commits from upstream stable the 4.15 kernels can be booted again on the affected AWS metal instance:
PCI/MSI: Enforce that MSI-X table entry is masked for update
PCI/MSI: Enforce MSI[X] entry updates to be visible
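When checking whether a given kernel tree still carries these two commits, a small helper along these lines could be used. This is only a sketch: the name `classify_suspects` is hypothetical, and it assumes that a revert commit contains git's default `Revert "<subject>"` wording, which may not hold for every tree.

```python
# Commit subjects named in this bug report.
SUSPECT_SUBJECTS = (
    "PCI/MSI: Enforce that MSI-X table entry is masked for update",
    "PCI/MSI: Enforce MSI[X] entry updates to be visible",
)


def classify_suspects(log_subjects):
    """Return {subject: 'absent' | 'applied' | 'reverted'} for each suspect commit.

    log_subjects: commit subject lines, newest first (the order `git log` prints).
    """
    status = {subject: "absent" for subject in SUSPECT_SUBJECTS}
    # Walk oldest to newest so the most recent commit decides the final state.
    for line in reversed(list(log_subjects)):
        for subject in SUSPECT_SUBJECTS:
            # Assumption: reverts use git's default 'Revert "<subject>"' wording.
            if f'Revert "{subject}"' in line:
                status[subject] = "reverted"
            elif subject in line:
                status[subject] = "applied"
    return status
```

The input could come from `git log --format=%s` run in the kernel tree, split into lines.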
[ Test Case ]
Deploy a r5.metal instance on AWS with a bionic image, which should boot initially with bionic/
[ Where problems could occur ]
These two commits are part of a larger patchset fixing PCI/MSI issues, which was backported to some upstream stable releases. By reverting only part of the set, we might end up with MSI issues that were not present before the whole set was applied. Regression potential can be minimized by testing kernels with these two patches reverted on all available platforms.
[ Original Description ]
When creating an r5.metal instance on AWS, the default kernel is bionic/
If I remove these patches, the instance correctly boots the 4.15 kernel:
https:/
That said, after successfully updating to a 4.15 kernel without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly.
This problem only appears on metal instances, which use NVMe devices instead of XVDA devices.
AWS instances also use the 'discard' mount option with ext4, so I thought there might be a race condition between ext4 discard and the journal flush. I removed 'discard' from the mount options and rebooted the 5.4 kernel before installing the 4.15 kernel, but the instance still would not boot after the 4.15 kernel was installed.
I have been unable to capture a stack trace using 'aws get-console-
affects: | ubuntu → linux-aws (Ubuntu) |
description: | updated |
Changed in linux-aws (Ubuntu Bionic): | |
importance: | Undecided → High |
status: | New → In Progress |
affects: | linux-aws (Ubuntu) → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
status: | New → Invalid |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
tags: | added: verification-done-bionic; removed: verification-needed-bionic |
Hey Ian, thanks for the bug report! I'm checking this on AWS.