smartctl-validate is borked in a recent release
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Lee Trager |
2.7 | New | Undecided | Unassigned |
2.8 | Fix Released | Critical | Lee Trager |
lxd | Fix Released | Unknown | |
Bug Description
Bug (maybe?) details first, diatribe second.
Bug Summary: machines with multiple drives (multiple HDDs, a RAID controller exposing multiple devices, or something along those lines) can no longer be commissioned: 2.4.x worked fine, 2.7.0 does not.
Here is the script output of smartctl-validate:
-----
# /dev/sda (Model: PERC 6/i, Serial: 6842b2b0740e990
Unable to run 'smartctl-
This indicates the storage device has been removed or the OS is unable to find it due to a hardware failure. Please re-commission this node to re-discover the storage devices, or delete this device manually.
Given parameters:
{'storage': {'argument_format': '{path
}', 'type': 'storage', 'value': {'id_path': '/dev/disk/
}
}
}
Discovered storage devices: [
{'NAME': 'sda', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e99
},
{'NAME': 'sdb', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e99
},
{'NAME': 'sr0', 'MODEL': 'TEAC_DVD-
}
]
Discovered interfaces: {'xx: xx: xx: xx: xx: xx': 'eno1'
}
-----
-----
# /dev/sdb (Model: PERC 6/i, Serial: 6842b2b0740e990
Unable to run 'smartctl-
This indicates the storage device has been removed or the OS is unable to find it due to a hardware failure. Please re-commission this node to re-discover the storage devices, or delete this device manually.
Given parameters: {'storage': {'argument_format': '{path
}', 'type': 'storage', 'value': {'id_path': '/dev/disk/
}
}
}
Discovered storage devices: [
{'NAME': 'sda', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e99
},
{'NAME': 'sdb', 'MODEL': 'PERC_6/i', 'SERIAL': '6842b2b0740e99
},
{'NAME': 'sr0', 'MODEL': 'TEAC_DVD-
}
]
Discovered interfaces: {'xx: xx: xx: xx: xx: xx': 'eno1'
}
-----
You can see that it says the storage cannot be found and then immediately lists it as a discovered device. It does this for both tests (one for each drive) and on both servers.
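One thing that can be checked by hand from the output above: the header line reports the model as "PERC 6/i" while the discovered list reports "PERC_6/i". The sketch below is not how MAAS performs its lookup and is not a diagnosis; it is only a minimal illustration, assuming a list of dicts shaped like the "Discovered storage devices" output, of matching a device on model and serial while tolerating that space/underscore difference. The helper names `normalise` and `find_device` and the placeholder serials are hypothetical.

```python
def normalise(value):
    """Treat spaces and underscores alike, so 'PERC 6/i' equals 'PERC_6/i'."""
    return "_".join(value.replace("_", " ").split())


def find_device(discovered, model, serial):
    """Return the NAME of the first device matching model and serial, else None."""
    for dev in discovered:
        same_model = normalise(dev.get("MODEL", "")) == normalise(model)
        same_serial = dev.get("SERIAL") == serial
        if same_model and same_serial:
            return dev["NAME"]
    return None


# Hypothetical data shaped like the script output; the serials are placeholders,
# not the (truncated) serials from the report.
discovered = [
    {"NAME": "sda", "MODEL": "PERC_6/i", "SERIAL": "serial-a"},
    {"NAME": "sdb", "MODEL": "PERC_6/i", "SERIAL": "serial-b"},
]

match = find_device(discovered, "PERC 6/i", "serial-b")
```

If a check like this finds the device while the commissioning script does not, the mismatch is more likely in how the script resolves the device than in the hardware itself.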
Bug Details:
I ran MAAS 2.4.x for most of my journey (see the diatribe below) and never had any problems re-commissioning (or deleting and re-discovering over PXE boot) two of my servers (r610, r710).
r610 has an iPERC 6, four 10K X00GB drives configured in a RAID10, 1 virtual disk.
r710 has an iPERC 6, six 2TB drives configured in a RAID10, 2 virtual disks.
So, commission after commission while working through my journey, zero problems. After I finally got everything figured out on the juju, network/VLAN, and quad-NIC end, I went to re-commission and could not. smartctl-validate fails on both servers, over and over again. I even destroyed and re-created the RAID/VDs, and it still fails.
After spending so much time on it, I remembered that this was the first time I had tried to re-commission these two servers since upgrading from 2.4.x to 2.7 in an effort to use the updated KVM integration to add a couple more guests. Once I got everything figured out I went to re-commission everything and boom.
[Upgrade path notes]
In full disclosure, in case this matters: I was on an apt install of 2.4.x and tried snap for 2.7, except it didn't work. So I read up on how to install 2.7 via apt and did that, and I have not uninstalled the snap 2.7 yet. I wanted to migrate from apt to snap but do not know how to do so without losing all MAAS data, and I could not find docs on it, so that is a problem for another day. But in case that is part of the problem for some odd reason, I wanted to share.
[Diatribe]
My journey to get maas+juju+
I did want to say thanks to those who make and maintain MAAS. Despite the problems I somehow always run into, I have enjoyed figuring it out.
-Red
Related branches
- Lee Trager (community): Approve
Diff: 57 lines (+26/-3), 1 file modified: src/machine-resources/src/machine-resources/Gopkg.lock (+26/-3)
- Lee Trager (community): Approve
Diff: 57 lines (+3/-26), 1 file modified: src/machine-resources/src/machine-resources/Gopkg.lock (+3/-26)
- MAAS Lander: Approve
- Lee Trager (community): Approve
Diff: 22 lines (+2/-2), 1 file modified: src/machine-resources/src/machine-resources/Gopkg.lock (+2/-2)
- Lee Trager (community): Approve
Diff: 22 lines (+2/-2), 1 file modified: src/machine-resources/src/machine-resources/Gopkg.lock (+2/-2)
tags: | removed: champagne |
Changed in lxd: | |
status: | Unknown → Fix Released |
Changed in lxd: | |
status: | Fix Released → Unknown |
Changed in maas: | |
importance: | Undecided → High |
importance: | High → Critical |
milestone: | none → 2.9.0b1 |
assignee: | nobody → Lee Trager (ltrager) |
Changed in lxd: | |
status: | Unknown → New |
Changed in maas: | |
status: | Triaged → Fix Committed |
no longer affects: | util-linux (Ubuntu) |
Changed in lxd: | |
status: | New → Fix Released |
Changed in maas: | |
status: | Fix Committed → Fix Released |
Hi, can you please attach the output of the "50-maas-01-commissioning" commissioning script?