During an Autopilot deployment on gMAAS, Juju hung while running a mon-relation-changed hook:
$ ps afxwww | grep -A 4 [m]on-relation-changed
29118 ?  S  0:03  \_ /usr/bin/python /var/lib/juju/agents/unit-ceph-1/charm/hooks/mon-relation-changed
37996 ?  S  0:00      \_ /bin/sh /usr/sbin/ceph-disk-prepare --fs-type xfs --zap-disk /dev/sdb
37998 ?  S  0:00          \_ /usr/bin/python /usr/sbin/ceph-disk prepare --fs-type xfs --zap-disk /dev/sdb
38016 ?  D  0:00              \_ /sbin/sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb
The processes had been in this state for more than 10 minutes, with sgdisk stuck in uninterruptible sleep (state D). The logs[1] from the unit in question showed that something was wrong with the partition table on that disk.

I fixed this by hand using gdisk[2].
[1] https://pastebin.canonical.com/135426/
[2] http://paste.ubuntu.com/11887096/