kvm: properly tear down PV features on hibernate
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Undecided | Unassigned | |
Focal | Fix Released | Undecided | Unassigned | |
Groovy | Fix Released | Undecided | Unassigned | |
Hirsute | Fix Released | Undecided | Unassigned | |
linux-aws (Ubuntu) | Fix Released | Undecided | Unassigned | |
Focal | Fix Released | Undecided | Unassigned | |
Groovy | Fix Released | Undecided | Unassigned | |
Hirsute | Fix Released | Undecided | Unassigned | |
Bug Description
[Impact]
In LP: #1918694 we applied a fix and a workaround to solve the hibernation issues on c5.18xlarge. The workaround was in the form of a SAUCE patch:
"UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
It looks like we can replace this workaround with a proper fix, by applying this patch:
http://<email address hidden>/
This is required because various PV features (Async PF, PV EOI, steal time) work through memory shared with the hypervisor. When we restore from hibernation, we must properly tear down all of these features so that the hypervisor doesn't write to stale locations after we jump to the previously hibernated kernel.
For this reason it is also safe to apply this patch set to all the generic kernels, not just the AWS kernel.
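To illustrate why the teardown matters, here is a toy Python model of the failure mode. It is not kernel code: the `ToyHypervisor` class, the addresses and the data strings are all made up for illustration, and it only assumes that the host keeps writing to whatever guest address it was last given until the guest disables the feature.

```python
class ToyHypervisor:
    """Toy host: remembers, per vCPU, the guest address it was told to write to."""

    def __init__(self):
        self.shared_area = {}  # cpu -> guest address (a plain int in this model)

    def enable(self, cpu, addr):
        self.shared_area[cpu] = addr

    def disable(self, cpu):
        self.shared_area.pop(cpu, None)

    def update(self, memory):
        # The host writes PV data (e.g. steal time) wherever it was last told to.
        for cpu, addr in self.shared_area.items():
            memory[addr] = f"pv-data-cpu{cpu}"


def resume_from_hibernation(teardown):
    """Return guest memory after a simulated resume (hypothetical addresses)."""
    hv = ToyHypervisor()
    memory = {}

    # The restore kernel enables a PV feature on two CPUs.
    restore_kernel_areas = {0: 0x1000, 1: 0x2000}
    for cpu, addr in restore_kernel_areas.items():
        hv.enable(cpu, addr)

    if teardown:
        # Behaviour with the fix: disable the features before jumping.
        for cpu in restore_kernel_areas:
            hv.disable(cpu)

    # Jump: the hibernated image now uses those addresses for unrelated data.
    for addr in restore_kernel_areas.values():
        memory[addr] = "hibernated-kernel-data"

    # The host may update its shared areas before the resumed kernel
    # has registered its own locations.
    hv.update(memory)
    return memory


for teardown in (False, True):
    print(teardown, resume_from_hibernation(teardown))
```

Without the teardown, the host overwrites memory that now belongs to the hibernated kernel. With it, the data survives.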
[Test plan]
This can be easily tested on AWS (but it should be reproducible by hibernating any KVM instance with multiple CPUs). Create a c5.18xlarge instance, run the memory stress test script (the same script that we use to stress test hibernation), trigger the hibernate event, then trigger the resume event. Repeat a couple of times; the problem is very likely to occur.
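The repeat-until-failure loop above could be driven by a small harness like the following sketch. The `hibernate`, `resume` and `is_healthy` callables are hypothetical placeholders: how they trigger the events and check the instance depends on the test environment and is not specified in this report.

```python
def run_hibernation_cycles(hibernate, resume, is_healthy, cycles=5):
    """Run hibernate/resume cycles.

    Returns the 1-based index of the first cycle after which is_healthy()
    is false, or None if every cycle passes. All three callables are
    supplied by the caller (hypothetical hooks, not a real API).
    """
    for cycle in range(1, cycles + 1):
        hibernate()
        resume()
        if not is_healthy():
            return cycle
    return None


# Stand-in hooks so the sketch runs anywhere; replace with real ones.
log = []
result = run_hibernation_cycles(
    hibernate=lambda: log.append("hibernate"),
    resume=lambda: log.append("resume"),
    is_healthy=lambda: True,
    cycles=3,
)
print(result, len(log))
```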
[Fix]
On the AWS kernel, replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot" with:
http://<email address hidden>/
For the other kernels, simply apply this patch set.
The fix has been tested extensively in the AWS infrastructure with positive results.
[Regression potential]
The new code introduced by the fix can also be executed when a CPU is taken offline, so we may see regressions in KVM CPU hot-plugging.
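One way to exercise the CPU offline path is through the standard sysfs hotplug file, `/sys/devices/system/cpu/cpuN/online`. The following is a minimal sketch, assuming root privileges and a hot-pluggable CPU (CPU 0 often cannot be taken offline); the helper names are made up for illustration and are not part of the fix or of any test suite mentioned here.

```python
from pathlib import Path

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def set_cpu_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write "1" or "0" to cpu<N>/online under root and return the path."""
    path = Path(root) / f"cpu{cpu}" / "online"
    path.write_text("1" if online else "0")
    return path


def cycle_cpu(cpu, root=SYSFS_CPU_ROOT):
    """Take a CPU offline, then bring it back online."""
    set_cpu_online(cpu, False, root)
    return set_cpu_online(cpu, True, root)
```

For example, `cycle_cpu(1)` run as root in a KVM guest takes CPU 1 offline and brings it back. The `root` parameter exists only so the helpers can be tried against a scratch directory.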
description: | updated |
summary: |
- aws: proper fix for c5.18xlarge hibernation issues
+ properly tear down KVM PV features on hibernate
summary: |
- properly tear down KVM PV features on hibernate
+ kvm: properly tear down PV features on hibernate
description: | updated |
Changed in linux (Ubuntu Hirsute): | |
status: | Incomplete → In Progress |
Changed in linux (Ubuntu Groovy): | |
status: | Incomplete → In Progress |
Changed in linux (Ubuntu Focal): | |
status: | Incomplete → In Progress |
Changed in linux (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Groovy): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Hirsute): | |
status: | In Progress → Fix Committed |
Changed in linux-aws (Ubuntu Focal): | |
status: | New → Fix Committed |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1920944
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.