Systems with multiple disk controllers may fail on boot

Bug #1849673 reported by Steven Parker
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
mdadm     New      Undecided    Unassigned

Bug Description

mdadm shouldn't probe disks until all controllers are online.

Notice how the last two drives (/dev/sdj2 and /dev/sdk2) are two events behind the first two (7745 vs. 7747).

 sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 4
    Persistence : Superblock is persistent

          State : inactive <<<<-----

           Name : hostname:1 (local to host hostname)
           UUID : a9a651a4:22a41611:d3e16f39:3eaf8095
         Events : 7747

    Number Major Minor RaidDevice

       - 8 98 - /dev/sdg2
       - 8 114 - /dev/sdh2
       - 8 146 - /dev/sdj2
       - 8 162 - /dev/sdk2

ubuntu@hostname:/var/log$ sudo mdadm --examine /dev/sdg2
/dev/sdg2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : a9a651a4:22a41611:d3e16f39:3eaf8095
           Name : hostname:1 (local to host hostname)
  Creation Time : Mon Dec 3 18:02:12 2018
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 7812235264 (3725.16 GiB 3999.86 GB)
     Array Size : 7812235264 (7450.33 GiB 7999.73 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 980085a5:43d71fe0:9d79efe6:61ffc1af

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jun 12 18:04:53 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ba15e561 - correct
         Events : 7747 <<<<-----

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)

ubuntu@hostname:/var/log$ sudo mdadm --examine /dev/sdh2
/dev/sdh2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : a9a651a4:22a41611:d3e16f39:3eaf8095
           Name : hostname:1 (local to host hostname)
  Creation Time : Mon Dec 3 18:02:12 2018
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 7812235264 (3725.16 GiB 3999.86 GB)
     Array Size : 7812235264 (7450.33 GiB 7999.73 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 40ea4305:c81c36e7:bba5dfed:58c9e85c

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jun 12 18:04:53 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : d5020aa3 - correct
         Events : 7747 <<<<------

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)

ubuntu@hostname:/var/log$ sudo mdadm --examine /dev/sdj2
/dev/sdj2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : a9a651a4:22a41611:d3e16f39:3eaf8095
           Name : hostname:1 (local to host hostname)
  Creation Time : Mon Dec 3 18:02:12 2018
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 7812235264 (3725.16 GiB 3999.86 GB)
     Array Size : 7812235264 (7450.33 GiB 7999.73 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : abfe40a8:7eab4965:a49e9cc5:2a544b85

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jun 12 18:04:48 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : f6373178 - correct
         Events : 7745 <<<<------

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

ubuntu@hostname:/var/log$ sudo mdadm --examine /dev/sdk2
/dev/sdk2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : a9a651a4:22a41611:d3e16f39:3eaf8095
           Name : hostname:1 (local to host hostname)
  Creation Time : Mon Dec 3 18:02:12 2018
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 7812235264 (3725.16 GiB 3999.86 GB)
     Array Size : 7812235264 (7450.33 GiB 7999.73 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : b26fe395:639080bf:6860768c:2a6eb759

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Jun 12 18:04:48 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : d9566329 - correct
         Events : 7745 <<<<-----

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

Steven Parker (sbparke) wrote :

Workaround

sudo mdadm --detail /dev/md1
sudo mdadm --examine /dev/sdh2
Compare the Events count on every member drive (run --examine on each one). If the counts differ by only 1-4 events, we should be OK.
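
The event comparison can be scripted. This is only a rough sketch and not part of the original report: the helper names events_from_examine and event_spread are made up for illustration, and it assumes the "Events : N" line format shown in the --examine output above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm).

# Read `mdadm --examine` output on stdin and print the event counter.
# Anything after the number on that line is ignored.
events_from_examine() {
    awk '$1 == "Events" && $2 == ":" { print $3 }'
}

# Read one event count per line on stdin and print max - min.
event_spread() {
    awk 'NR == 1 { min = max = $1 + 0 }
         { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
         END { if (NR > 0) print max - min }'
}

# Example (needs root; device names taken from this report):
#   for d in /dev/sdg2 /dev/sdh2 /dev/sdj2 /dev/sdk2; do
#       sudo mdadm --examine "$d" | events_from_examine
#   done | event_spread
```

For the four counts in this report (7747, 7747, 7745, 7745) the spread is 2.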

There are a few less aggressive ways to reassemble the array, but this is the only one
that actually seems to work:

sudo mdadm --create /dev/md1 -l 10 -n 4 /dev/sdg2 /dev/sdh2 /dev/sdj2 /dev/sdk2
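
For reference, one of the less aggressive options mentioned above is presumably a forced assemble, which asks mdadm to accept members whose metadata looks slightly out of date. The sketch below is an assumption about what such an attempt might look like, not something confirmed to work on this system; print_force_assemble is a hypothetical helper that only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for a forced
# re-assembly of an md array from the given member devices.
# Usage: print_force_assemble /dev/mdX member1 member2 ...
print_force_assemble() {
    md=$1
    shift
    # An inactive array has to be stopped before it can be assembled again.
    printf 'sudo mdadm --stop %s\n' "$md"
    # --force tells mdadm to assemble even if some members look out of date.
    printf 'sudo mdadm --assemble --force %s %s\n' "$md" "$*"
}

# Example (device names taken from this report):
#   print_force_assemble /dev/md1 /dev/sdg2 /dev/sdh2 /dev/sdj2 /dev/sdk2
```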

It is always a good idea to check the array status afterwards:

sudo mdadm --detail /dev/md1

If you see removed drives, add them back to the array.

To add a dropped drive:
sudo mdadm --manage /dev/md1 --add /dev/sdk2
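
The check for removed drives could be scripted along these lines. This is a minimal sketch: count_removed_slots is a made-up helper name, and it assumes that --detail marks an empty slot with a device-table row ending in the word "removed".

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail` output on stdin and print
# how many rows end in "removed" (assumed to mark empty slots).
count_removed_slots() {
    awk '$NF == "removed" { n++ } END { print n + 0 }'
}

# Example (needs root):
#   if [ "$(sudo mdadm --detail /dev/md1 | count_removed_slots)" -gt 0 ]; then
#       echo "md1 has removed slots; re-add the missing drive(s)"
#   fi
```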
