series-upgrade 'prepare' on non-leader units fails after juju config source=distro

Bug #1928023 reported by Drew Freiberger
This bug affects 2 people
Affects: OpenStack RabbitMQ Server Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

While following the Charm Guide playbook for series-upgrading a stateful clustered application from Xenial to Bionic on the latest 21.04 charms, the rabbitmq-server non-leader units refused to allow `upgrade-series prepare` because they were stuck in the config-changed hook.

Process is documented here:
https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/upgrade-series-openstack.html#stateful-applications

Traceback was:

2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/hooks/config-changed", line 843, in <module>
2021-05-10 22:52:15 WARNING config-changed hooks.execute(sys.argv)
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/charmhelpers/core/hookenv.py", line 956, in execute
2021-05-10 22:52:15 WARNING config-changed self._hooks[hook_name]()
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/charmhelpers/contrib/openstack/utils.py", line 1893, in wrapped_f
2021-05-10 22:52:15 WARNING config-changed return f(*args, **kwargs)
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/charmhelpers/contrib/hardening/harden.py", line 93, in _harden_inner2
2021-05-10 22:52:15 WARNING config-changed return f(*args, **kwargs)
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/hooks/config-changed", line 734, in config_changed
2021-05-10 22:52:15 WARNING config-changed update_nrpe_checks()
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/hooks/config-changed", line 589, in update_nrpe_checks
2021-05-10 22:52:15 WARNING config-changed hostname, unit, vhosts, user, password = rabbit.get_nrpe_credentials()
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/hooks/rabbit_utils.py", line 1295, in get_nrpe_credentials
2021-05-10 22:52:15 WARNING config-changed create_user(user, password, ['monitoring'])
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/hooks/rabbit_utils.py", line 286, in create_user
2021-05-10 22:52:15 WARNING config-changed exists = user_exists(user)
2021-05-10 22:52:15 WARNING config-changed File "/var/lib/juju/agents/unit-rabbitmq-server-5/charm/hooks/rabbit_utils.py", line 276, in user_exists

The workaround is to patch config_changed in rabbitmq_server_relations.py on the affected unit, per:
https://github.com/openstack/charm-rabbitmq-server/blob/stable/21.01/hooks/rabbitmq_server_relations.py#L700-L702
and run:
systemctl restart jujud-unit-rabbitmq-server-X
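For illustration, the patch amounts to an early exit from the hook while the unit is paused for the series upgrade, so it never reaches update_nrpe_checks() (which needs a running rabbitmq node to create the 'monitoring' user). This is a minimal sketch of that guard, not the charm's exact code; the helper and return values below are stand-ins:

```python
def is_unit_paused_set():
    # In charm-helpers this reads the unit's kv store to see whether the
    # unit was paused (e.g. by 'upgrade-series prepare'); stubbed here.
    return True


def update_nrpe_checks():
    # Stand-in for the real helper, which shells out to rabbitmqctl and
    # fails while the node is down for the series upgrade.
    raise RuntimeError("rabbitmqctl unreachable while node is down")


def config_changed():
    if is_unit_paused_set():
        # Skip the rest of the hook; it reruns after 'upgrade-series complete'.
        return "deferred"
    update_nrpe_checks()
    return "applied"
```

With the guard in place, config-changed returns cleanly on a paused unit instead of raising from user_exists(), which is what unwedges `upgrade-series prepare`.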

Revision history for this message
Drew Freiberger (afreiberger) wrote :

For clarity, I believe the cause of the deadlock in the cluster series upgrade process is step 5:

Step 5. Set the value of the source configuration option to ‘distro’:
juju config percona-cluster source=distro

happening before the other units are upgraded.

My CLI showed an error at the "prepare" step of the non-leader unit that it was not in an idle state.
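Per the observations above, the ordering that avoids the deadlock is to defer `juju config source=distro` until every unit has completed the series upgrade. A sketch of that ordering (machine numbers and the application name are illustrative; this is not a substitute for the deploy guide):

```shell
# For each machine in turn, leader first:
juju upgrade-series 0 prepare bionic
# ...run do-release-upgrade on the machine, reboot, then:
juju upgrade-series 0 complete
# Repeat prepare/upgrade/complete for the remaining machines.

# Only after ALL units have completed, switch the package source:
juju config rabbitmq-server source=distro
```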

summary: - do-release-upgrade on non-leader units fails after juju config
+ series-upgrade 'prepare' on non-leader units fails after juju config
source=distro
James Troup (elmo)
tags: added: openstack-upgrade
James Troup (elmo)
tags: added: series-upgrade
removed: openstack-upgrade
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Having just upgraded serverstack from bionic to focal, I would concur with Drew; I did the config change after `complete` was done on all units (i.e. not during the upgrade, but after it). So I think the documentation is wrong on this one.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Adding to my last comment: I'm not sure what we can do *in* the charm, unless we signal between the charms that one is doing the upgrade. I don't like that option (as it takes control away from the operator). I think what is debatable (and what I'm erring towards) is that OpenStack upgrades should be actions on units rather than "automatic" via config-changed. This would prevent this type of error from occurring. However, there would also be a need to indicate that the config didn't match the installed version. Hmm.

Revision history for this message
Adam Dyess (addyess) wrote :

The workaround in the description was SUPER effective

Workaround is to on-unit patch config_changed in rabbitmq_server_relations.py with:
https://github.com/openstack/charm-rabbitmq-server/blob/stable/21.01/hooks/rabbitmq_server_relations.py#L700-L702
and run:
systemctl restart jujud-unit-rabbitmq-server-X
