focal: backport kexec fallback patch

Bug #1969365 reported by Dan Watkins
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
systemd (Ubuntu)  Fix Released  Undecided   Unassigned
Focal             Triaged       Low         Unassigned

Bug Description

It would be great if focal's systemd could have https://github.com/systemd/systemd/commit/71180f8e57f8fbb55978b00a13990c79093ff7b3 backported to it.

[Impact]

We have observed that kexec'ing to another kernel fails because the drive containing the `kexec` binary has already been unmounted by the time systemd tries to execute it. The console shows:

         Starting Reboot via kexec...
[ 163.960938] shutdown[1]: (sd-kexec) failed with exit status 1.
[ 163.963463] reboot: Restarting system

[Test Plan]

1) Launch a 20.04 instance
2) `apt-get install kexec-tools`
3) In `/boot`, run the following, filling in whatever <cmdline> your environment needs:

kexec -l vmlinuz --initrd initrd.img --append '<cmdline>'

4) `reboot`

(I have reproduced this in a single-disk VM, so I assume it reproduces ~everywhere: if not, `apt-get remove kexec-tools` before the `reboot` could be used to emulate the unmounting.)
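
Before step 4 it can help to confirm that step 3 actually staged a kernel, since otherwise `reboot` performs a normal reboot for an unrelated reason. A minimal sketch, assuming the kernel exposes `/sys/kernel/kexec_loaded` (the helper name and its optional path argument are hypothetical, added only for illustration):

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a kexec kernel is currently staged.
# The optional first argument overrides the sysfs path; that parameter is an
# assumption added here so the check can be exercised against a plain file.
kexec_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_loaded}"
    # The kernel reports 1 here after a successful `kexec -l`, and 0 otherwise.
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}
```

With this defined, `kexec_kernel_loaded && reboot` only proceeds to step 4 once a kernel is loaded.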

[Where problems could occur]

Users could inadvertently be relying on the current behaviour: if they have configured their systems to kexec, they currently will be rebooting normally, and this patch would cause them to start actually kexec'ing.

[Other info]

We're currently maintaining a systemd tree with only this patch added to focal's tree: this patch has received a bunch of testing from us in focal.

This patch landed in v246, so it's already present in supported releases later than focal.

Nick Rosbrook (enr0n) wrote :

The patch for this is indeed present in Jammy and newer. I don't currently see a strong enough reason to SRU this to Focal, but if you or someone else thinks it's important, feel free to explain here.

Changed in systemd (Ubuntu):
status: New → Fix Released
Dan Watkins (oddbloke) wrote :

Thanks for the reply, Nick!

I think it's important enough to land because:

* you cannot execute `kexec` correctly on an Ubuntu 20.04 system without this patch (it will fall back to performing a full reboot),
* kexec can be used to reduce downtime for critical systems which take a long time to reboot (e.g. because they have a lot of hardware to initialise), and
* kexec-tools is in main (and has been since at least trusty), which indicates to me that kexec is expected to work on Ubuntu.

I'd also add that the patch is three lines in a code path which is only used by people opting into using `kexec`, so the potential downside is pretty minimal.

(I'll set the bug back to New for now, until you have a chance to respond.)

Changed in systemd (Ubuntu):
status: Fix Released → New
Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu):
status: New → Fix Released
Changed in systemd (Ubuntu Focal):
importance: Undecided → Low
Nick Rosbrook (enr0n) wrote :

Fair enough. Thanks for the justification and for filling out the SRU template already.

Changed in systemd (Ubuntu Focal):
status: New → Triaged
tags: added: systemd-sru-next
Dan Watkins (oddbloke) wrote :

Thanks Nick, much appreciated!

Nick Rosbrook (enr0n) wrote :

The test case with removing kexec-tools before rebooting works for me. But I can only reproduce the issue by doing that. Can you share more about your setup so we can understand why exactly you hit this?

I think that having this fallback makes sense, and is fine for an SRU, but it would be good to understand the root cause better.
