We (Cert) just updated MAAS from 3.3.x to 3.4.0-RC1. We have, in testflinger, default partition definitions that, because of how MAAS identifies partitions and disks, rely heavily on MAAS IDs for disk devices and partitions.
For example, prior to the move to 3.4.0, this was the definition for one server (these change and grow more or less complex depending on the number of disks in a machine):
default_disks:
- id: '216'
  name: nvme0n1
  parent_disk_blkid: '216'
  ptable: GPT
  type: disk
- device: '882'
  id: nvme0n1-part1
  number: '882'
  parent_disk: '216'
  parent_disk_blkid: '216'
  size: '536870912'
  type: partition
- fstype: fat32
  id: 882-format
  label: efi
  parent_disk: '216'
  parent_disk_blkid: '216'
  type: format
  volume: '882'
- device: 882-format
  id: 882-mount
  parent_disk: '216'
  parent_disk_blkid: '216'
  path: /boot/efi
  type: mount
- device: '883'
  id: nvme0n1-part2
  number: '883'
  parent_disk: '216'
  parent_disk_blkid: '216'
  size: '1599778848768'
  type: partition
- fstype: ext4
  id: 883-format
  label: root
  parent_disk: '216'
  parent_disk_blkid: '216'
  type: format
  volume: '883'
- device: 883-format
  id: 883-mount
  parent_disk: '216'
  parent_disk_blkid: '216'
  path: /
  type: mount
As you can see, this spells out partitions on the disk with ID 216, where partition IDs 882 and 883 define the /boot/efi filesystem and the root filesystem respectively. These IDs were pulled from MAAS and reflect what one would get from 'maas <name> partitions read <system_id> <disk_id>'. This gives users a way to define their own partition scheme (e.g. set up something ceph-like, or bcache or whatever) and then revert things to the default.
After the update, all testflinger deployments now fail, apparently because the partition IDs have changed. Looking at a dump of this machine via the MAAS CLI, the disk ID has remained the same but the partition IDs are now all in the 16,000s:
bladernr@weavile:~$ maas bladernr partitions read 8pk6f8 216
Success.
Machine-readable output follows:
[
    {
        "uuid": "b838b3db-3266-44da-bdbe-2a90b75df617",
        "size": 1599778848768,
        "bootable": false,
        "tags": [],
        "used_for": "ext4 formatted filesystem mounted at /",
        "type": "partition",
        "path": "/dev/disk/by-dname/nvme0n1-part2",
        "device_id": 216,
        "filesystem": {
            "fstype": "ext4",
            "label": "root",
            "uuid": "21aa8167-f0f7-4166-9e62-57e6504cac8d",
            "mount_point": "/",
            "mount_options": ""
        },
        "id": 16153,
        "system_id": "8pk6f8",
        "resource_uri": "/MAAS/api/2.0/nodes/8pk6f8/blockdevices/216/partition/16153"
    },
    {
        "uuid": "94256eca-c024-454b-b9f2-5c3b79b29611",
        "size": 536870912,
        "bootable": false,
        "tags": [],
        "used_for": "fat32 formatted filesystem mounted at /boot/efi",
        "type": "partition",
        "path": "/dev/disk/by-dname/nvme0n1-part1",
        "device_id": 216,
        "filesystem": {
            "fstype": "fat32",
            "label": "efi",
            "uuid": "1b93141c-af66-4594-a6d2-56e00f097108",
            "mount_point": "/boot/efi",
            "mount_options": ""
        },
        "id": 16152,
        "system_id": "8pk6f8",
        "resource_uri": "/MAAS/api/2.0/nodes/8pk6f8/blockdevices/216/partition/16152"
    }
]
I am pretty sure that testflinger is failing because it expects to see partition IDs 882 and 883 on disk 216, but those no longer exist.
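To make the suspected mismatch concrete, here is a minimal Python sketch of the check. It is not testflinger code: the function name `missing_partition_ids` is hypothetical, and the sample JSON is a trimmed copy of the 'partitions read' output for this machine, keeping only the fields the check needs.

```python
import json


def missing_partition_ids(expected_ids, maas_partitions_json):
    """Return the expected partition IDs that MAAS no longer reports.

    expected_ids: partition IDs stored in the default_disks definition,
        as strings (e.g. "882").
    maas_partitions_json: the machine-readable JSON list printed by
        'maas <name> partitions read <system_id> <disk_id>'.
    """
    partitions = json.loads(maas_partitions_json)
    # MAAS returns integer IDs; the stored definition quotes them as strings.
    current_ids = {str(p["id"]) for p in partitions}
    return sorted(set(expected_ids) - current_ids)


# Trimmed sample: only the fields this check uses.
sample = '[{"id": 16153, "device_id": 216}, {"id": 16152, "device_id": 216}]'

print(missing_partition_ids(["882", "883"], sample))  # ['882', '883']
```

Both stored IDs come back as missing, which matches the failure we are seeing.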
Should we expect the partition IDs to change every time MAAS is updated, or is this a weird bug this time around? (I don't think we've updated MAAS since we implemented the disk layout in testflinger, so it's possible this has always been the case and we just never had a problem with it before.)
Note: the only thing that has changed on our end is the MAAS snap update to 3.4.0. We did not update anything in the testflinger agents from yesterday to today, so I'm reasonably certain this is the root cause, at least from what I have seen over the last 30 minutes or so of poking at this.
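If the IDs do turn out to be unstable across upgrades, one possible workaround on our side would be to look up the current partition IDs by partition name instead of hardcoding them. The sketch below assumes that the last component of the "path" field (e.g. nvme0n1-part1) stays stable and matches the `id` we already use for partition entries in default_disks. I have not verified that assumption against MAAS, and `partition_ids_by_name` is a hypothetical helper, not something that exists in testflinger.

```python
import json
import posixpath


def partition_ids_by_name(maas_partitions_json):
    """Map partition names (e.g. 'nvme0n1-part1') to current MAAS IDs.

    Assumes each entry's "path" ends in the partition name, as in
    /dev/disk/by-dname/nvme0n1-part1.
    """
    partitions = json.loads(maas_partitions_json)
    return {
        posixpath.basename(p["path"]): str(p["id"])
        for p in partitions
    }


# Trimmed sample based on the 'partitions read' output for this machine.
sample = json.dumps([
    {"id": 16153, "path": "/dev/disk/by-dname/nvme0n1-part2"},
    {"id": 16152, "path": "/dev/disk/by-dname/nvme0n1-part1"},
])

ids = partition_ids_by_name(sample)
print(ids["nvme0n1-part1"], ids["nvme0n1-part2"])  # 16152 16153
```

The looked-up values could then replace the hardcoded '882' and '883' in the device, number and volume fields before the definition is applied.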