Commission fails testing on smartctl because drive serial numbers don't match

Bug #2011733 reported by Paul Jonason
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MAAS | Triaged | Medium | Unassigned |

Bug Description

I'm using the MAAS Debian package, version 1:3.1.1-10918-g.9cbd96fd2-0ubuntu1~20.04.1

Commissioning under 22.04 fails testing with the following information:

Unable to run 'smartctl-validate': Storage device 'MZ7L3480HCHQAD3' with serial '5002538f02142ce9' not found!

This indicates the storage device has been removed or the OS is unable to find it due to a hardware failure. Please re-commission this node to re-discover the storage devices, or delete this device manually.

Given parameters:
{'storage': {'argument_format': '{path}', 'type': 'storage', 'value': {'id': 77, 'id_path': '/dev/disk/by-id/wwn-0x5002538f02142ce9', 'model': 'MZ7L3480HCHQAD3', 'name': 'sda', 'serial': '5002538f02142ce9'}}}

Discovered storage devices:
[{'NAME': 'sda', 'MODEL': 'MZ7L3480HCHQAD3', 'SERIAL': 'S6NANE0T137235', 'MAJ:MIN': '8:0', 'MODEL_ENC': 'MZ7L3480HCHQAD3'}, {'NAME': 'nvme0n1', 'MODEL': 'Dell Ent NVMe CM6 MU 6.4TB', 'SERIAL': '22X0A0GATCA8', 'MAJ:MIN': '259:0'}]
Discovered interfaces:
{'b4:96:91:e6:31:98': 'eno12399'}
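
The failure above can be reproduced outside of MAAS with a minimal sketch. It assumes the test looks for a discovered device whose model and serial both equal the stored values; `find_device` is a hypothetical helper written for illustration, not the actual smartctl-validate code.

```python
def find_device(wanted, discovered):
    """Return the discovered entry whose MODEL and SERIAL both equal the stored values."""
    for dev in discovered:
        if dev.get("MODEL") == wanted["model"] and dev.get("SERIAL") == wanted["serial"]:
            return dev
    return None


# Values copied from the "Given parameters" and "Discovered storage devices" output.
wanted = {"name": "sda", "model": "MZ7L3480HCHQAD3", "serial": "5002538f02142ce9"}
discovered = [
    {"NAME": "sda", "MODEL": "MZ7L3480HCHQAD3", "SERIAL": "S6NANE0T137235", "MAJ:MIN": "8:0"},
    {"NAME": "nvme0n1", "MODEL": "Dell Ent NVMe CM6 MU 6.4TB", "SERIAL": "22X0A0GATCA8", "MAJ:MIN": "259:0"},
]

print(find_device(wanted, discovered))  # None
```

The model matches, but the stored serial is the hex part of the WWN rather than the serial that lsblk reports, so no device is found.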

Commissioning under 20.04 works fine, as smartctl does not seem to verify serial numbers:

INFO: Verifying SMART support for the following drive: /dev/sda
INFO: Running command: sudo -n smartctl --all /dev/sda
INFO: SMART support is available; continuing...
INFO: Verifying SMART data on /dev/sda
INFO: Running command: sudo -n smartctl --health /dev/sda
SUCCESS: SMART validation has PASSED for: /dev/sda
--------------------------------------------------------------------------------
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-126-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

This bug appears similar to https://bugs.launchpad.net/maas/+bug/1869116, which involves a difference in disk name returned from lsblk versus that returned by LXD, only this time it's the serial number.

lsblk appears to report the correct serial number, while LXD reports the WWN in its place. This behavior has been present for a while, but it did not affect commissioning until I tried 22.04 as the ephemeral commissioning OS.
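
To check other machines for the same discrepancy, something like the following sketch could compare the serials stored in MAAS with what lsblk reports. It assumes lsblk's JSON output with the NAME, MODEL, SERIAL and WWN columns. `compare_serials` and `strip_wwn_prefix` are hypothetical helpers, and the sample JSON is hand-written from the values in this report (the `0x` form of the WWN is an assumption), not captured from the machine.

```python
import json


def strip_wwn_prefix(value):
    """Lower-case a WWN and drop a leading 'wwn-0x' or '0x' if present."""
    value = (value or "").lower()
    for prefix in ("wwn-0x", "0x"):
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def compare_serials(stored_serials, lsblk_json):
    """Return (name, stored, lsblk_serial, stored_is_wwn) for every mismatching device."""
    report = []
    for dev in json.loads(lsblk_json)["blockdevices"]:
        stored = stored_serials.get(dev["name"])
        if stored is None or stored == dev.get("serial"):
            continue
        stored_is_wwn = stored.lower() == strip_wwn_prefix(dev.get("wwn"))
        report.append((dev["name"], stored, dev.get("serial"), stored_is_wwn))
    return report


# Hand-written sample shaped like lsblk's JSON output (assumption, not real output).
sample = json.dumps({"blockdevices": [
    {"name": "sda", "model": "MZ7L3480HCHQAD3",
     "serial": "S6NANE0T137235", "wwn": "0x5002538f02142ce9"},
    {"name": "nvme0n1", "model": "Dell Ent NVMe CM6 MU 6.4TB",
     "serial": "22X0A0GATCA8", "wwn": "eui.00000000000000008ce38ee20decd501"},
]})
# Serials as stored by MAAS, taken from the machine-resources output in this report.
stored = {"sda": "5002538f02142ce9", "nvme0n1": "22X0A0GATCA8"}

mismatches = compare_serials(stored, sample)
print(mismatches)
```

On a live system the JSON could come from `lsblk --json --nodeps --output NAME,MODEL,SERIAL,WWN` instead of the sample string.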

rsyslog snippet:

2023-03-14T14:59:05-05:00 3CKTQN3 cloud-init[3543]: Starting testing scripts...
2023-03-14T14:59:05-05:00 3CKTQN3 cloud-init[3543]: Installing apt packages for smartctl-validate (id: 3236, script_version_id: 70)
2023-03-14T14:59:05-05:00 3CKTQN3 cloud-init[3543]: Installing apt packages for smartctl-validate (id: 3237, script_version_id: 70)
2023-03-14T14:59:05-05:00 3CKTQN3 sudo: root : PWD=/ ; USER=root ; COMMAND=/usr/bin/apt-get -qy --no-install-recommends install smartmontools
2023-03-14T14:59:05-05:00 3CKTQN3 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
2023-03-14T14:59:10-05:00 3CKTQN3 dhclient[4132]: XMT: Solicit on eno12409, interval 62840ms.
2023-03-14T14:59:13-05:00 3CKTQN3 systemd[1]: Reloading.
2023-03-14T14:59:13-05:00 3CKTQN3 systemd[1]: message repeated 2 times: [ Reloading.]
2023-03-14T14:59:13-05:00 3CKTQN3 systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-56-generic] (local build)
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Opened configuration file /etc/smartd.conf
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda, type changed from 'scsi' to 'sat'
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda [SAT], opened
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda [SAT], MZ7L3480HCHQAD3, S/N:S6NANE0T137235, WWN:5-002538-f02142ce9, FW:HJ53, 480 GB
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda [SAT], not found in smartd database.
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/nvme0, opened
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/nvme0, Dell Ent NVMe CM6 MU 6.4TB, S/N:22X0A0GATCA8, FW:2.1.8, 6.40 TB
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.MZ7L3480HCHQAD3-S6NANE0T137235.ata.state
2023-03-14T14:59:13-05:00 3CKTQN3 smartd[4646]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Dell_Ent_NVMe_CM6_MU_6_4TB-22X0A0GATCA8.nvme.state
2023-03-14T14:59:13-05:00 3CKTQN3 systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: pam_unix(sudo:session): session closed for user root
2023-03-14T14:59:16-05:00 3CKTQN3 cloud-init[3543]: Starting smartctl-validate (id: 3236, script_version_id: 70)
2023-03-14T14:59:16-05:00 3CKTQN3 cloud-init[3543]: Starting smartctl-validate (id: 3237, script_version_id: 70)
2023-03-14T14:59:16-05:00 3CKTQN3 cloud-init[3543]: Failed to execute smartctl-validate (id: 3237, script_version_id: 70): 2
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: root : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl --all /dev/nvme0n1
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: pam_unix(sudo:session): session closed for user root
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: root : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl --health /dev/nvme0n1
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
2023-03-14T14:59:16-05:00 3CKTQN3 sudo: pam_unix(sudo:session): session closed for user root
2023-03-14T14:59:16-05:00 3CKTQN3 cloud-init[3543]: Finished smartctl-validate (id: 3236, script_version_id: 70): 0
2023-03-14T14:59:16-05:00 3CKTQN3 cloud-init[3543]: 1 test scripts failed to run

Paul Jonason (pjonason) wrote :

From maas-lshw:

          <node id="disk" claimed="true" class="disk" handle="GUID:77151294-29e9-4479-...">
           <description>ATA Disk</description>
           <product>MZ7L3480HCHQAD3</product>
           <physid>0.0.0</physid>
           <businfo>scsi@2:0.0.0</businfo>
           <logicalname>/dev/sda</logicalname>
           <dev>8:0</dev>
           <version>HJ53</version>
           <serial>S6NANE0T137235</serial>
           <size units="bytes">480103981056</size>

Paul Jonason (pjonason) wrote :

From 40-maas-01-machine-resources:

"storage": {
            "disks": [
                {
                    "id": "nvme0n1",
                    "device": "259:0",
                    "model": "Dell Ent NVMe CM6 MU 6.4TB",
                    "type": "nvme",
                    "read_only": false,
                    "size": 6401252745216,
                    "removable": false,
                    "wwn": "eui.00000000000000008ce38ee20decd501",
                    "numa_node": 0,
                    "device_path": "pci-0000:65:00.0-nvme-1",
                    "block_size": 4096,
                    "firmware_version": "2.1.8",
                    "rpm": 0,
                    "serial": "22X0A0GATCA8",
                    "device_id": "nvme-eui.00000000000000008ce38ee20decd501",
                    "partitions": []
                },
                {
                    "id": "sda",
                    "device": "8:0",
                    "model": "MZ7L3480HCHQAD3",
                    "type": "scsi",
                    "read_only": false,
                    "size": 480103981056,
                    "removable": false,
                    "numa_node": 0,
                    "device_path": "pci-0000:67:00.0-sas-phy15-lun-0",
                    "block_size": 4096,
                    "firmware_version": "HJ53",
                    "rpm": 0,
                    "serial": "5002538f02142ce9",
                    "device_id": "wwn-0x5002538f02142ce9",
                    "partitions": [
                    ]
                }
            ],
            "total": 4
        },
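
The sda entry above stores the hex part of its `device_id` as the serial. A small sketch like the following could flag such disks in a machine-resources dump. `disks_with_wwn_as_serial` is a hypothetical helper, and the `wwn-0x` prefix check is an assumption based on the one example in this report.

```python
def disks_with_wwn_as_serial(storage):
    """Return ids of disks whose serial is just the hex part of a 'wwn-0x...' device_id."""
    prefix = "wwn-0x"
    suspects = []
    for disk in storage.get("disks", []):
        device_id = disk.get("device_id", "")
        if device_id.startswith(prefix) and disk.get("serial") == device_id[len(prefix):]:
            suspects.append(disk["id"])
    return suspects


# Trimmed copy of the 40-maas-01-machine-resources output above.
storage = {
    "disks": [
        {"id": "nvme0n1", "serial": "22X0A0GATCA8",
         "device_id": "nvme-eui.00000000000000008ce38ee20decd501"},
        {"id": "sda", "serial": "5002538f02142ce9",
         "device_id": "wwn-0x5002538f02142ce9"},
    ]
}

print(disks_with_wwn_as_serial(storage))  # ['sda']
```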

Changed in maas:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → 3.4.0
Changed in maas:
status: Confirmed → Won't Fix
status: Won't Fix → Triaged
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x