After action-managed-upgrade from queens to rocky with Neutron DVR enabled, neutron-dhcp-agent package is left removed

Bug #1919498 reported by Drew Freiberger
Affects: OpenStack Nova Compute Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

We're upgrading a Neutron DVR cloud from bionic-queens to bionic-rocky.

On the nova-compute service(s) where we performed action-managed-upgrade=true openstack-upgrade actions, we found that neutron-dhcp-agent was removed from each node as it was upgraded and was not re-installed afterwards.

The workaround is to re-install neutron-dhcp-agent on the openstack-upgraded nova-compute units, for example as shown below.
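
As a minimal sketch of the workaround, assuming Juju 2.x syntax and that the correct cloud archive is already configured on the units, something along these lines re-installs the agent on every nova-compute unit (exact invocation may vary):

juju run --application nova-compute 'apt-get install -y neutron-dhcp-agent'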

We are currently checking whether action-managed-upgrade=false also presents the same way.

From the logs, we see the following in preparation for the upgrade from py2 to py3:

2021-03-17 17:50:04 DEBUG openstack-upgrade The following packages will be REMOVED:
2021-03-17 17:50:04 DEBUG openstack-upgrade python-ceilometer* python-neutron* python-neutron-fwaas* python-nova*

As noted in lp#1828259, which covers neutron-l3-agent being removed on DVR clouds upgrading to rocky, the nova-compute charm is supposed to take the neutron subordinate's package needs into account during the package upgrade and re-install the neutron agents that were removed.

https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1828259/comments/17

We may need to add neutron-dhcp-agent to that list of package exceptions noted by Corey in the nova-compute charm.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Update: when performing a non-action-managed upgrade, this error state does not occur.

juju config nova-compute openstack-origin=cloud:bionic-rocky action-managed-upgrade=false

The user story for needing action-managed-upgrade=true is to be able to know which VMs will be affected, and when, by the openvswitch restarts that happen as part of the queens to rocky upgrade, rather than having a random 30-60 minute window in which OVS may drop out on any or all hypervisors. The per-unit flow is sketched below.
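
For reference, a rough sketch of that per-unit flow (unit names are illustrative; openstack-upgrade is the action the charm exposes when action-managed-upgrade=true):

juju config nova-compute openstack-origin=cloud:bionic-rocky action-managed-upgrade=true
juju run-action --wait nova-compute/0 openstack-upgrade
# verify OVS and the neutron agents on that hypervisor, then continue with nova-compute/1, and so on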

Considering that our managed-service data-plane uptimes presume workloads are spread across multiple availability zones, I wonder whether the nova-compute charm could be updated to lock upgrades to one AZ at a time when action-managed-upgrade is not in use.

Imagine an OpenStack upgrade of a cloud running a Kubernetes cluster on the overlay: we'd want the HA services like metallb, kubernetes-master, kubeapi-loadbalancer, etc., which are properly distributed across AZs, not to be taken offline network-wise at the same time, so there aren't split-brain issues in clustered applications.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

I think it makes sense to make the exception here as well, based on what Drew points out. Subordinate dependencies are a tricky area that could definitely use some better handling, and we've started that work. Some work has been done in https://bugs.launchpad.net/charm-keystone-ldap/+bug/1806111 to handle this for keystone scenarios, but it needs to be rolled through the other charms as well in order to cover the various subordinate-dependent packages that need to be monitored.

tags: added: openstack-upgrade
Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Re: #2, note that some work around managing subordinates [1] has been done in the meantime, which may help with fixing this bug.

[1]: https://review.opendev.org/c/openstack/charm-nova-compute/+/811139

Changed in charm-nova-compute:
status: New → Triaged
importance: Undecided → High