Key repository setup fails after simultaneously removing several ceph-mon units

Bug #1921561 reported by Connor Chamberlain
This bug affects 1 person
Affects                    Status    Importance  Assigned to  Milestone
Ceph Monitor Charm         New       Undecided   Unassigned
OpenStack Keystone Charm   Invalid   Undecided   Unassigned

Bug Description

Ceph-mon will reach a blocked state with a status of "Unit not clustered (no quorum)" if several clustered ceph-mon units are simultaneously removed while leaving enough units for quorum. This seems to be caused by the key repository setup failing in keystone.

This problem involves bad keyring paths on the ceph-mon units. The expected keyring path is /var/lib/ceph/mon/ceph-<juju-machine>/keyring, but the units are using /var/lib/ceph/mon/ceph-/keyring, with the machine identifier missing. An abbreviated excerpt of the unit logs showing this is included below.

WARNING osd-relation-changed 2021-03-26T23:33:15.200+0000 7f07e7300700 -1 auth: unable to find a keyring on /var/lib/ceph/mon/ceph-/keyring: (2) No such file or directory
WARNING osd-relation-changed 2021-03-26T23:33:15.200+0000 7f07e7300700 -1 AuthRegistry(0x7f07e0058960) no keyring found at /var/lib/ceph/mon/ceph-/keyring, disabling cephx
WARNING osd-relation-changed exported keyring for client.rgw.juju-81dd13-bug-hunt2-6
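
The malformed path in the log is what results when the monitor ID is an empty string. The following is a minimal sketch of that effect, assuming the conventional Ceph mon data layout of /var/lib/ceph/mon/<cluster>-<id>. The function name mon_keyring_path and the ID "juju-machine-0" are hypothetical and are not taken from the charm code; whether an empty ID is the actual cause here is an assumption.

```python
def mon_keyring_path(mon_id: str, cluster: str = "ceph") -> str:
    """Build the keyring path for a monitor (hypothetical helper).

    Assumes the conventional layout /var/lib/ceph/mon/<cluster>-<id>/keyring.
    """
    return f"/var/lib/ceph/mon/{cluster}-{mon_id}/keyring"


# With a monitor ID present, the path has the expected shape.
# "juju-machine-0" is a made-up ID used only for illustration.
print(mon_keyring_path("juju-machine-0"))

# With an empty monitor ID, the path collapses to the form seen in the logs.
print(mon_keyring_path(""))
```
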

This bug was observed in a healthy test deployment that initially included keystone and three ceph-mon units, among other applications. To reproduce it, add four more ceph-mon units and, once the deployment has fully settled, remove them at the same time. The remaining ceph-mon units enter a blocked state after losing quorum.

On keystone, the following error is found:

INFO juju-log Checking no pids for apache2 exist
INFO juju-log Installing [] with options: ['--option=Dpkg::Options::=--force-confold']
ERROR juju-log Key repository setup failed, will retry in config-changed hook: Command '['sudo', '-u', 'keystone', 'keystone-manage', 'credential_migrate']' returned non-zero exit status 1.
WARNING config-changed ERROR no relation id specified

keystone version: 18.0.0
ceph-mon version: 15.2.8
Juju version: 2.8.10-focal-amd64
OpenStack version: openstack 5.2.0

Tags: scaleback
Connor Chamberlain (lcvcode) wrote :
Alex Kavanagh (ajkavanagh) wrote :

This does not affect the keystone charm.

Changed in charm-keystone:
status: New → Invalid
tags: added: scaleback
Billy Olsen (billy-olsen) wrote :

The mon units have lost quorum because, per the description, the cluster was expanded to 7 units. After scaling back, the remaining units can no longer form the necessary quorum, which is 4 units since the cluster still knows about 7 monitors. Therefore, this bug is a duplicate of bug #1833252.
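
The quorum arithmetic behind this can be sketched as follows, assuming quorum is a strict majority of the monitors the cluster knows about. The function names are illustrative only and do not come from Ceph or the charm.

```python
def quorum_size(known_monitors: int) -> int:
    """Smallest strict majority of the monitors the cluster knows about."""
    return known_monitors // 2 + 1


def has_quorum(known_monitors: int, live_monitors: int) -> bool:
    """True if the live monitors can still form a strict majority."""
    return live_monitors >= quorum_size(known_monitors)


known = 3 + 4        # three original units plus four added units
live = known - 4     # four units removed at once, cluster still knows 7

print(quorum_size(known))        # 4
print(has_quorum(known, live))   # False: 3 live monitors < 4 required
```

By the same arithmetic, a cluster that knows about only 3 monitors needs 2 for quorum, so the 3 remaining units would form quorum if the removed monitors were no longer counted as known.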
