Cannot deploy to mirrored EFI partitions

Bug #1825011 reported by José Pekkarinen
This bug affects 1 person
Affects   Status          Importance   Assigned to   Milestone
MAAS      Invalid         Undecided    Unassigned
curtin    Fix Committed   Wishlist     Unassigned

Bug Description

Hi,

On an fce deployment, I need to deploy a CentOS node that uses
software RAID for the OS. The storage layout configuration from
fce is:

  centos-gpu:
    disks:
    - id: sda
      disk: 0
      type: disk
      ptable: gpt
    - id: sdb
      disk: 1
      type: disk
      ptable: gpt

      # Boot
    - id: sda-part1
      name: efi
      device: sda
      size: 100M
      type: partition
      number: 1
    - id: sdb-part1
      name: efi
      device: sdb
      size: 100M
      type: partition
      number: 2

    - id: sda-part2
      device: sda
      size: 398800M
      type: partition
      number: 3
    - id: sdb-part2
      device: sdb
      size: 398800M
      type: partition
      number: 4

    - id: md0
      name: md0
      raidlevel: 1
      type: raid
      spare_devices: []
      devices:
      - sda-part1
      - sdb-part1
    - id: md0-format
      volume: md0
      label: boot
      type: format
      fstype: vfat
    - id: md0-mount
      device: md0-format
      path: /boot/efi
      type: mount

    - id: md1
      name: md1
      raidlevel: 1
      type: raid
      spare_devices: []
      devices:
      - sda-part2
      - sdb-part2
    - id: md1-format
      volume: md1
      label: root
      type: format
      fstype: ext4
    - id: md1-mount
      device: md1-format
      path: /
      type: mount

When deploying the machine, everything looks correct, but when it reaches the
point of rebooting, only PXE interfaces are detected as boot devices; no boot
disk is detected. After booting the node in rescue mode and mounting the
filesystems under /mnt, the lsblk output looks like:

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 180.3M 0 loop /media/root-ro
sda 8:0 0 745.2G 0 disk
├─sda1 8:1 0 92M 0 part
│ └─md127 9:127 0 91M 0 raid1 /mnt/boot/efi
└─sda2 8:2 0 371.4G 0 part
  └─md126 9:126 0 371.3G 0 raid1 /mnt
sdb 8:16 0 745.2G 0 disk
├─sdb1 8:17 0 92M 0 part
│ └─md127 9:127 0 91M 0 raid1 /mnt/boot/efi
└─sdb2 8:18 0 371.4G 0 part
  └─md126 9:126 0 371.3G 0 raid1 /mnt

# ls /mnt/boot/grub2
fonts grub.cfg grubenv i386-pc locale
#

Curtin information will follow.

Thanks!

José.


Revision history for this message
José Pekkarinen (koalinux) wrote :
Ryan Harper (raharper) wrote :

Hi,

I suspect that you're missing 'grub_device: true' on your two disks in the raid1 array.

Adjusting your config like so should work:

    - id: sda
      disk: 0
      type: disk
      ptable: gpt
      grub_device: true
    - id: sdb
      disk: 1
      type: disk
      ptable: gpt
      grub_device: true

An existing storage config for this scenario (MirrorbootUEFI) is here:

https://git.launchpad.net/curtin/tree/examples/tests/mirrorboot-uefi.yaml

Ryan Harper (raharper) wrote :

This likely will be resolved with an updated config. Please let me know if those suggested changes resolve your boot issue.

Changed in curtin:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Please provide the data requested here:

https://discourse.maas.io/t/getting-curtin-debug-logs/169

And /etc/maas/preseeds/<whatever preseed is being changed>

Changed in maas:
status: New → Incomplete
José Pekkarinen (koalinux) wrote :

Hmm... in my case it still doesn't work. Can I ask why your example has
one disk using a dos partition table and one using gpt?

Ryan Harper (raharper) wrote :

That looks like a cut-n-paste error. I'll fix that, thanks.

W.r.t. your failure, as Andres requested, we definitely need the logs to determine what else might be failing.

The config you pasted looks odd, so I suspect something else is composing the actual storage config that gets sent.

José Pekkarinen (koalinux) wrote :
Ryan Harper (raharper) wrote :

I see now, you're attempting to mirror /boot/efi; that's not a supported configuration. The general issues are:

1) Firmware that handles EFI does not expect these partitions to be under raid.
2) The raid format may obfuscate the devices from the firmware (it won't see the devices).
3) Given that firmware doesn't know about raid, one needs two entries to handle the case that either device is bad.
4) There are sync issues: if the firmware updates the EFI partition on one disk, that data may be lost when Linux raid resyncs the array and overwrites it with the other disk's copy.

The efibootmgr output looks strange for the target devices:

......A.........................T.H.N.S.F.8.8.0.0.C.A.M.E.................................................................................. ......A.........................T.H.N.S.F.8.8.0.0.C.A.M.E...

This may be related to the raid on the devices, but could also just be something else between firmware and those disks.

summary: - centos 7 deployed machine installs grub i386
+ Cannot deploy to mirrored EFI partitions
Ryan Harper (raharper) wrote :

I'm marking this as triaged as I understand the scenario. I've outlined that there is work to do to make this work, and it needs validation to ensure things are reliable. Some elements are out of our control w.r.t. how firmware treats a raided partition.

Changed in curtin:
importance: Undecided → Wishlist
status: Incomplete → Triaged
José Pekkarinen (koalinux) wrote :

These are Dell PowerEdge C4130 machines, so perhaps we can get better
insight into the firmware support.

José Pekkarinen (koalinux) wrote :

Ok, to remove the firmware variable from the bug, I'm trying right
now this config:

  centos-gpu:
    disks:
    - id: sda
      disk: 0
      type: disk
      ptable: gpt
      grub_device: true
    - id: sdb
      disk: 1
      type: disk
      ptable: gpt
      grub_device: true

      # Boot
    - id: sda-part1
      name: efi
      device: sda
      size: 100M
      type: partition
      number: 1
    - id: sdb-part1
      name: efi
      device: sdb
      size: 100M
      type: partition
      number: 2

    - id: sda-part2
      device: sda
      size: 398800M
      type: partition
      number: 3
    - id: sdb-part2
      device: sdb
      size: 398800M
      type: partition
      number: 4

    - id: sda-part1-format
      volume: sda-part1
      label: boot
      type: format
      fstype: vfat
    - id: sda-part1-mount
      device: sda-part1-format
      path: /boot/efi
      type: mount

    - id: md1
      name: md1
      raidlevel: 1
      type: raid
      spare_devices: []
      devices:
      - sda-part2
      - sdb-part2
    - id: md1-format
      volume: md1
      label: root
      type: format
      fstype: ext4
    - id: md1-mount
      device: md1-format
      path: /
      type: mount

It still doesn't work; I'll upload the curtin info in a follow-up.

José Pekkarinen (koalinux) wrote :

False alarm: this latest config actually works. It seems there were some hiccups getting the node
to boot the ephemeral OS that runs curtin, but the node is now deployed with one raw FAT EFI
partition on the first disk, one empty FAT partition on the second disk, and a raid on top of the
second partition of each disk.

Björn Tillenius (bjornt) wrote :

Marking this as invalid, since EFI doesn't really support software raid, and there's not much MAAS can do about it.

Changed in maas:
status: Incomplete → Invalid
Ryan Harper (raharper) wrote :

We recently landed code to enable grub2 in Ubuntu to keep multiple ESPs in sync. With this in place, multiple ESPs are now supported, and an example config is here:

https://git.launchpad.net/curtin/tree/examples/tests/mirrorboot-uefi.yaml

Note: instead of creating /boot/efi on the raid mirror itself, one creates multiple ESP partitions
and mounts one at /boot/efi; grub2 will handle replicating the data whenever there's an
update to grub.
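
For readers of this thread, here is a rough sketch of what that layout could look like in the fce-style format used earlier in this report. It is adapted from the config José reported as working, not copied from the linked mirrorboot-uefi.yaml, which remains the reference. Assumptions that do not come from this thread: the `flag: boot` key on the ESP partitions, formatting the second ESP without mounting it, and numbering partitions per disk (1 and 2) rather than 1 to 4. The grub2 replication described above is for Ubuntu; whether a CentOS target behaves the same way is not established here.

```yaml
# Sketch only: one plain ESP per disk (no raid), raid1 for the root filesystem.
  centos-gpu:
    disks:
    - id: sda
      disk: 0
      type: disk
      ptable: gpt
      grub_device: true
    - id: sdb
      disk: 1
      type: disk
      ptable: gpt
      grub_device: true

    # One ESP on each disk, kept outside any raid array.
    - id: sda-part1
      name: efi
      device: sda
      size: 100M
      type: partition
      number: 1
      flag: boot        # assumption: marks the partition as an ESP
    - id: sdb-part1
      name: efi
      device: sdb
      size: 100M
      type: partition
      number: 1         # assumption: numbered per disk
      flag: boot        # assumption, as above

    - id: sda-part2
      device: sda
      size: 398800M
      type: partition
      number: 2
    - id: sdb-part2
      device: sdb
      size: 398800M
      type: partition
      number: 2

    # Both ESPs get a FAT filesystem; only the first is mounted.
    - id: sda-part1-format
      volume: sda-part1
      type: format
      fstype: vfat
    - id: sda-part1-mount
      device: sda-part1-format
      path: /boot/efi
      type: mount
    - id: sdb-part1-format
      volume: sdb-part1
      type: format
      fstype: vfat

    # Root filesystem on a raid1 built from the second partition of each disk.
    - id: md1
      name: md1
      raidlevel: 1
      type: raid
      spare_devices: []
      devices:
      - sda-part2
      - sdb-part2
    - id: md1-format
      volume: md1
      label: root
      type: format
      fstype: ext4
    - id: md1-mount
      device: md1-format
      path: /
      type: mount
```

The key difference from the original report is that no raid entry lists sda-part1 or sdb-part1, so the firmware sees two ordinary FAT partitions.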

Changed in curtin:
status: Triaged → Fix Committed