[upgrade] versions N and N+1 are not compatible

Bug #1825999 reported by Rodrigo Barbieri
This bug affects 2 people
Affects                                 Status         Importance   Assigned to         Milestone
OpenStack Nova Cloud Controller Charm   Fix Released   High         Chris MacNaughton
OpenStack Nova Compute Charm            Fix Released   High         Chris MacNaughton

Bug Description

According to the upgrade guide [1], when OpenStack services are upgraded, a service at a given version N should be compatible with its N+1 version. This matters when the cloud cannot be upgraded all at once, or when several nodes run the same services.

I performed some testing upgrading from Ocata to Pike, and Pike to Queens, following the upgrade order in [2], and I encountered the following issues:

Ocata => Pike
=============

After the neutron-gateway upgrade, nova-api-metadata could not talk to the nova-cloud-controller services. It continuously printed the error message below, and new instances could not obtain metadata:

2019-04-11 19:04:39.560 7011 ERROR oslo_service.service RemoteError: Remote error: IncompatibleObjectVersion Version 1.22 of Service is not supported

When running "curl 169.254.169.254" the error message was:

500 Internal Server Error

Remote metadata server experienced an internal server error.

After nova-cloud-controller was upgraded, the nova-api-metadata functionality was restored, but the errors below were observed, rendering the nova API mostly useless:

Remote error: UnsupportedVersion Endpoint does not support RPC version 4.4. Attempted method: select_destinations

+ openstack server add volume ins2_old v2_old
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.rpc.client.RemoteError'> (HTTP 500) (Request-ID: req-757f2fc1-a622-4501-9934-e7d66b2d84e9)

+ openstack server add volume ins1_old v1_old
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.rpc.client.RemoteError'> (HTTP 500) (Request-ID: req-91552a98-5f93-430a-92c5-60d1c95fe6e9)

$ openstack server list
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_versionedobjects.exception.IncompatibleObjectVersion'> (HTTP 500) (Request-ID: req-d48dc209-ccde-4d4e-b00f-1a50637db30a)

Upgrading nova-compute restored functionality.

Pike => Queens
==============

Upgrading neutron-gateway did not cause nova-api-metadata problems when upgrading to Queens, but upgrading nova-cloud-controller did:

+ openstack server migrate ins2_old --live juju-335fd8-upgrade-27
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.rpc.client.RemoteError'> (HTTP 500) (Request-ID: req-998cbe26-af0b-40d5-b8fe-58a9a5dfdcb7)

+ openstack server add volume ins2_old v2_old
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.rpc.client.RemoteError'> (HTTP 500) (Request-ID: req-a49ef959-38da-4359-aef8-5fa0872a662c)

++ openstack server show ins1_new -c status -f value
Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_versionedobjects.exception.IncompatibleObjectVersion'> (HTTP 500) (Request-ID: req-03f78531-1f6d-4ac5-b06b-c588fb15b037)

Remote error: UnsupportedVersion Endpoint does not support RPC version 5.0. Attempted method: check_can_live_migrate_destination

2019-04-16 20:09:15.485 30853 ERROR nova.api.openstack.wsgi [req-1c96d08b-3182-4c89-b757-8304884ac772 946bbb70210e42d1b08798e006adef1b f2bf2b35d763401099510ee4b5f07ea4 - default default] Unexpected exception in API method: RemoteError: Remote error: UnsupportedVersion Endpoint does not support RPC version 5.0. Attempted method: reserve_block_device_name

Again, upgrading nova-compute restored cloud functionality.

=================================

Most of these problems seem to stem from the fact that the current charm implementations do not allow locking down the RPC version during upgrades, as suggested by [1].

[1] https://docs.openstack.org/operations-guide/ops-upgrades.html

[2] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html#ha-with-pause-resume
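
For readers unfamiliar with the "UnsupportedVersion" errors quoted above, the following is a minimal sketch of why capping the RPC version helps. It assumes the usual major/minor rule (same major version, requested minor not above what the endpoint implements). The function names and the 4.17 figure are hypothetical and used only for illustration; this is not the nova or oslo.messaging code.

```python
def parse_version(text):
    """Split a 'major.minor' RPC version string into a tuple of ints."""
    major, minor = text.split(".")
    return int(major), int(minor)


def endpoint_supports(endpoint_version, requested_version):
    """Return True if an endpoint at endpoint_version can serve the request.

    Assumed rule: the major versions must match and the requested minor
    version must not exceed the endpoint's minor version.
    """
    ep_major, ep_minor = parse_version(endpoint_version)
    req_major, req_minor = parse_version(requested_version)
    return ep_major == req_major and req_minor <= ep_minor


def pick_send_version(client_max, cap=None):
    """Version a client sends: its own maximum, or the cap if that is lower."""
    if cap is None:
        return client_max
    return min(client_max, cap, key=parse_version)


# Hypothetical numbers: an upgraded controller that can speak 5.0 and a
# not-yet-upgraded endpoint that only implements 4.17.
old_endpoint = "4.17"
unpinned = pick_send_version("5.0")
pinned = pick_send_version("5.0", cap="4.17")
print(endpoint_supports(old_endpoint, unpinned))  # False: request is rejected
print(endpoint_supports(old_endpoint, pinned))    # True: capped request is accepted
```

Without a cap, the upgraded side sends its newest version and the older endpoint refuses it; with a cap at the older level, the same call goes through.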

tags: added: canonical-bootstack
Drew Freiberger (afreiberger) wrote :

Marking field-high.

This bug causes every OpenStack cloud upgrade to incur an L2-severity outage of at least 2 hours on any production cloud we upgrade: the API services remain available, but the backend services fail to notify the API layer that changes have succeeded. This breaks any automation, CI/CD, or orchestration running against the cloud and forces operational escalation of what should be routine upgrades.

A workaround is noted in https://bugs.launchpad.net/nova/+bug/1799186, which is to set
[upgrade_levels]
compute=auto

in nova.conf on the nova-cloud-controller units. This effectively resolves the issues for nova-conductor and nova-api-os-compute.
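
As an illustration of the workaround, here is a minimal sketch that adds the setting to INI-style configuration text using Python's standard configparser. The helper name and the sample input are hypothetical. Note that configparser drops comments, lowercases option names, and is not the parser nova itself uses, so treat this as a demonstration of the resulting section rather than a tool to run against a production nova.conf.

```python
import configparser
import io


def set_compute_upgrade_level(conf_text, level="auto"):
    """Return conf_text with [upgrade_levels] compute set to the given level."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Create the section if the file does not have it yet.
    if not parser.has_section("upgrade_levels"):
        parser.add_section("upgrade_levels")
    parser.set("upgrade_levels", "compute", level)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical minimal input standing in for a real nova.conf.
sample = "[DEFAULT]\ndebug=false\n"
print(set_compute_upgrade_level(sample))
```

The printed text contains the original [DEFAULT] section followed by an [upgrade_levels] section with compute=auto.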

Changed in charm-nova-cloud-controller:
status: New → Triaged
Changed in charm-nova-compute:
status: New → Triaged
Changed in charm-nova-cloud-controller:
importance: Undecided → High
Changed in charm-nova-compute:
importance: Undecided → High
Chris MacNaughton (chris.macnaughton) wrote :

https://review.opendev.org/659002 and https://review.opendev.org/659004 have been proposed to nova-compute and nova-cloud-controller, respectively, to resolve this.

Changed in charm-nova-cloud-controller:
status: Triaged → In Progress
Changed in charm-nova-compute:
status: Triaged → In Progress
Changed in charm-nova-cloud-controller:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Changed in charm-nova-compute:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Changed in charm-nova-cloud-controller:
milestone: none → 19.07
Changed in charm-nova-compute:
milestone: none → 19.07
David Ames (thedac)
Changed in charm-nova-cloud-controller:
milestone: 19.07 → 19.10
Changed in charm-nova-compute:
milestone: 19.07 → 19.10
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-nova-cloud-controller (master)

Change abandoned by Chris MacNaughton (icey) (<email address hidden>) on branch: master
Review: https://review.opendev.org/659004

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-nova-compute (master)

Change abandoned by Chris MacNaughton (icey) (<email address hidden>) on branch: master
Review: https://review.opendev.org/659002

David Ames (thedac)
Changed in charm-nova-cloud-controller:
milestone: 19.10 → 20.01
Changed in charm-nova-compute:
milestone: 19.10 → 20.01
Ryan Beisner (1chb1n)
tags: added: openstack-upgrade
Ryan Beisner (1chb1n) wrote :

We have scenario tests for OpenStack upgrades; they're just not in the third-party CI gates. I have re-opened and rebased these reviews. We need to discuss the testing approach and continue to drive the bug forward.

https://review.opendev.org/#/q/topic:bug/1825999+(status:open+OR+status:merged)

Changed in charm-nova-cloud-controller:
milestone: 20.01 → 20.02
Changed in charm-nova-compute:
milestone: 20.01 → 20.02
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (master)

Reviewed: https://review.opendev.org/659004
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=bfea6cc43f6d75598666bce3252fbcbb262a8775
Submitter: Zuul
Branch: master

commit bfea6cc43f6d75598666bce3252fbcbb262a8775
Author: Chris MacNaughton <email address hidden>
Date: Tue May 14 11:46:10 2019 +0200

    Ensure we set compute upgrade_level

    With this change, we are enabling OpenStack compute services
    to automatically determine the highest available RPC level
    to use based on the service versions in the deployment.

    Change-Id: I4e08de92ab8d0641398f3b54d7ea87d83c3b050a
    Closes-Bug: #1825999
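
The idea in the commit message above, choosing the highest RPC level that every service in the deployment can handle, can be sketched roughly as follows. This is only an illustration under stated assumptions: the lookup table, the service version numbers, and the function name are all hypothetical and are not taken from nova.

```python
# Hypothetical table: minimum service version -> highest RPC version that
# services at that version are assumed to understand.
HYPOTHETICAL_HISTORY = {10: "4.13", 20: "4.17", 30: "5.0"}


def auto_rpc_cap(service_versions, history=HYPOTHETICAL_HISTORY):
    """Pick the RPC cap implied by the oldest service in the deployment."""
    oldest = min(service_versions)
    # Table entries that the oldest running service already satisfies.
    eligible = [version for version in history if version <= oldest]
    if not eligible:
        raise ValueError("no known RPC version for service version %d" % oldest)
    return history[max(eligible)]


# Mixed deployment: one service still reports the older version 20.
print(auto_rpc_cap([30, 20, 30]))  # 4.17
# Once every service reports version 30, the cap rises on its own.
print(auto_rpc_cap([30, 30]))      # 5.0
```

The point of the "auto" setting is the second case: no manual unpinning step is needed after the last service has been upgraded.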

Changed in charm-nova-cloud-controller:
status: In Progress → Fix Committed
Changed in charm-nova-compute:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/659002
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=fa79fbe252210f966d8d7a085b002ba497620500
Submitter: Zuul
Branch: master

commit fa79fbe252210f966d8d7a085b002ba497620500
Author: Chris MacNaughton <email address hidden>
Date: Tue May 14 11:44:28 2019 +0200

    Ensure we set compute upgrade_level

    With this change, we are enabling OpenStack compute services
    to automatically determine the highest available RPC level
    to use based on the service versions in the deployment.

    Change-Id: Ia5daf035cae1e844dec33fad07aa5c38e86a5f7b
    Closes-Bug: #1825999

Liam Young (gnuoy)
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released
Changed in charm-nova-compute:
status: Fix Committed → Fix Released