Removing penultimate unit from percona-cluster service renders service unusable
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Percona Cluster Charm | Triaged | Medium | Unassigned | |
percona-cluster (Juju Charms Collection) | Invalid | Medium | Unassigned | |
Bug Description
When a unit is removed from the percona-cluster service, mysql/percona on the departing unit is not shut down. When this is the penultimate unit in the cluster, the remaining unit will lose quorum, switch to the 'Disconnected' state, and disallow any queries to the database.
Of course, adding new units will fail too.
The way to recover from this is to shut down mysql/percona on the remaining unit, and then bootstrap it, with:
# service mysql bootstrap-pxc
After the unit is up and running it will allow both read and write queries. From there one can 'juju add-unit' to add more units to the service.
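The recovery steps above can be sketched as a small script, to be run as root on the surviving unit. The `run` wrapper and the `DRY_RUN` variable are assumptions made for this example, not part of the charm or of juju; by default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the manual recovery on the last remaining unit.
# `run` and DRY_RUN are hypothetical helpers for this example only:
# with DRY_RUN=1 (the default) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_last_unit() {
    # Stop the node that lost quorum.
    run service mysql stop
    # Bootstrap it again as a new single-node cluster.
    run service mysql bootstrap-pxc
}
```

Calling `recover_last_unit` with `DRY_RUN=0` would execute the two commands for real; once the unit accepts queries again, 'juju add-unit' can be used as described above.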
This is unfixable at the moment as juju doesn't support hooks that are run before -departed hooks. (See bug https:/
The workaround is to stop mysql on the unit to be removed prior to running 'juju remove-unit'. That way the percona unit being stopped will signal to the rest of the cluster (actually to the only remaining node) that it is being shut down in a controlled manner, and the remaining unit will continue to operate normally.
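A minimal sketch of this workaround as run from the juju client, assuming `juju ssh` is used to reach the unit. The function name `safe_remove_unit`, the `run` wrapper, the `DRY_RUN` variable, and the example unit name are all illustrative assumptions; by default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch: stop mysql on a unit before removing it from the service.
# `run`, DRY_RUN and safe_remove_unit are hypothetical names for this example:
# with DRY_RUN=1 (the default) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

safe_remove_unit() {
    unit="$1"   # e.g. percona-cluster/1 (example name only)
    if [ -z "$unit" ]; then
        echo "usage: safe_remove_unit <unit>" >&2
        return 1
    fi
    # Controlled shutdown, so the remaining node sees a clean departure.
    run juju ssh "$unit" sudo service mysql stop
    # Only then remove the unit from the service.
    run juju remove-unit "$unit"
}
```

For example, `safe_remove_unit percona-cluster/1` prints the two commands in order; setting `DRY_RUN=0` would run them.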
Once an 'about-to-depart' (or similar) hook is implemented in juju, this bug will be fixed.
Changed in charm-percona-cluster:
importance: Undecided → Medium
status: New → Triaged
Changed in percona-cluster (Juju Charms Collection):
status: Triaged → Invalid
Mario
Looking at the date on this bug report, I think this was a trusty install; I tried to reproduce on xenial, and the last remaining unit went into state 'Initialized' not 'Disconnected'. When I then re-added another two units, they did correctly cluster with the remaining unit, and it did become the donor for the other two units.
I know there have been some improvements in this area between 5.5 and 5.6, so this might be a much better story on xenial now.
That said, we should probably shut down and purge pxc from any unit that is removed from a cluster; this is doable via the 'stop' hook, which is run on each unit as applications/services are destroyed.
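As a rough illustration of that idea, a 'stop' hook could look something like the sketch below. This is not the charm's actual hook, which may well be implemented differently; `stop_hook`, `run`, `DRY_RUN`, and `PXC_PACKAGES` are hypothetical names, and the package list is deliberately left for the operator to supply rather than guessed here.

```shell
#!/bin/sh
# Sketch of what a 'stop' hook might do: shut mysql down, then purge pxc.
# All names below are assumptions for this example, not charm code.
# With DRY_RUN=1 (the default) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_hook() {
    # Controlled shutdown of the local node.
    run service mysql stop
    # Purge only if a package list was provided (space-separated).
    if [ -n "${PXC_PACKAGES:-}" ]; then
        # Unquoted on purpose so each package becomes its own argument.
        run apt-get -y purge $PXC_PACKAGES
    fi
}
```

With `PXC_PACKAGES` unset, the sketch only stops mysql, which keeps the purge step opt-in.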