Removing nova-compute unit with scheduled but stopped VM breaks hypervisor-list api call

Bug #1739253 reported by Drew Freiberger
This bug affects 2 people
Affects                                 Status         Importance   Assigned to   Milestone
OpenStack Nova Cloud Controller Charm   Invalid        Undecided    Unassigned
OpenStack Nova Compute Charm            Invalid        Undecided    Unassigned
Ubuntu Cloud Archive                    Fix Released   Medium       Unassigned
  Mitaka                                Triaged        Medium       Unassigned
nova (Ubuntu)                           Fix Released   Medium       Unassigned
  Xenial                                Triaged        Medium       Unassigned

Bug Description

After removing the nova-compute unit on node mycloud-cs-003, nova hypervisor-list stopped working. It now returns an error to the user and logs the traceback below to nova-api-os-compute.log on the nova-api server.

While this may be an upstream issue, I believe the charm should probably handle this edge case.

When querying the database, I find that the service entry and the compute_node entry for the host are both in deleted status, but there is still a scheduled VM on node mycloud-cs-003. I ran nova delete <instanceid> on the instance that was scheduled on that node, and that succeeded, but the "running_vms" total in the compute_nodes table did not decrease. I then updated that row to running_vms = 0, and I am still seeing the traceback below in nova-api-os-compute.log.

2017-12-19 17:59:35.733 218705 DEBUG nova.api.openstack.wsgi [req-a597c96f-a372-4a0c-9c79-d2859f1612db 2ac8326863b64ea3ba9ba96a7ab70214 51419d6b9c8f475db199c24b0e50a99d - - -] Calling method '<bound method HypervisorsController.index of <nova.api.openstack.compute.hypervisors.HypervisorsController object at 0x7fc679ba5d10>>' _process_stack /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:699
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions [req-a597c96f-a372-4a0c-9c79-d2859f1612db 2ac8326863b64ea3ba9ba96a7ab70214 51419d6b9c8f475db199c24b0e50a99d - - -] Unexpected exception in API method
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/hypervisors.py", line 88, in index
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     for hyp in compute_nodes])
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 3743, in service_get_by_compute_host
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     return objects.Service.get_by_compute_host(context, host_name)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 181, in wrapper
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     result = fn(cls, context, *args, **kwargs)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/objects/service.py", line 243, in get_by_compute_host
--
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/objects/service.py", line 238, in _db_service_get_by_compute_host
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     return db.service_get_by_compute_host(context, host)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 163, in service_get_by_compute_host
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     return IMPL.service_get_by_compute_host(context, host)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 330, in wrapped
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     return f(context, *args, **kwargs)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 585, in service_get_by_compute_host
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions     raise exception.ComputeHostNotFound(host=host)
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions ComputeHostNotFound: Compute host mycloud-cs-003 could not be found.
2017-12-19 17:59:36.044 218705 ERROR nova.api.openstack.extensions
2017-12-19 17:59:36.046 218705 INFO nova.api.openstack.wsgi [req-a597c96f-a372-4a0c-9c79-d2859f1612db 2ac8326863b64ea3ba9ba96a7ab70214 51419d6b9c8f475db199c24b0e50a99d - - -] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'nova.exception.ComputeHostNotFound'>
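
For anyone trying to confirm the same database state, the check described above (an instance still scheduled on a host whose service row is marked deleted) can be sketched roughly as follows. This is only an illustration against a toy in-memory SQLite database: the table layout is a simplified assumption modeled on the description, not the real Nova schema, and find_stranded_instances is a hypothetical helper name. It treats any non-zero "deleted" value as deleted, matching the incrementing values noted further down.

```python
import sqlite3


def find_stranded_instances(conn):
    """Return (uuid, host) for live instances whose host has no live
    nova-compute service row. Hypothetical helper, simplified schema."""
    query = """
        SELECT i.uuid, i.host
        FROM instances AS i
        WHERE i.deleted = 0
          AND NOT EXISTS (
              SELECT 1
              FROM services AS s
              WHERE s.host = i.host
                AND s."binary" = 'nova-compute'
                AND s.deleted = 0
          )
        ORDER BY i.uuid
    """
    return conn.execute(query).fetchall()


# Toy data (assumption): one healthy host, and one host whose service row
# was soft-deleted (deleted set to a non-zero value) while a VM remains.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT,
                           "binary" TEXT, deleted INTEGER);
    CREATE TABLE instances (uuid TEXT, host TEXT, deleted INTEGER);
    INSERT INTO services VALUES (1, 'mycloud-cs-001', 'nova-compute', 0);
    INSERT INTO services VALUES (7, 'mycloud-cs-003', 'nova-compute', 7);
    INSERT INTO instances VALUES ('vm-a', 'mycloud-cs-001', 0);
    INSERT INTO instances VALUES ('vm-b', 'mycloud-cs-003', 0);
""")

stranded = find_stranded_instances(conn)
print(stranded)
```

With the toy rows above, only the instance on mycloud-cs-003 is reported. A real deployment would need the equivalent query adapted to the actual nova database.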

Steps to recreate:

1. deploy nova-cloud-controller and nova-compute with proper relations to keystone/mysql/etc
2. deploy a vm to the nova-compute environment
3. stop the instance
4. juju remove-unit <nova-compute/X> for the unit that the VM was scheduled on
5. run nova hypervisor-list; it should fail with the error shown above.

Please let me know if this does not work.

Notes: this environment was previously upgraded from either icehouse or liberty to mitaka. I am guessing liberty, since the "deleted" columns of the service and compute_node rows hold ordered, incrementing numbers rather than just 0 or 1.

Running openstack 17.02 charms, I believe on a trusty/mitaka cloud.

Peter Sabaini (peter-sabaini) wrote :

FWIW the cloud mentioned was indeed upgraded from Icehouse to Mitaka, and is running trusty

James Page (james-page)
Changed in charm-nova-compute:
status: New → Invalid
Changed in charm-nova-cloud-controller:
status: New → Invalid
Corey Bryant (corey.bryant) wrote :

This appears to be fixed upstream as of Newton via bug https://bugs.launchpad.net/nova/+bug/1646255 for which the following patch was provided:

commit f0d44c5b09f3f3c84038d40b621bb629a1f8110e
Author: Matt Riedemann <email address hidden>
Date: Sun Dec 4 15:08:04 2016 -0500

    Handle ComputeHostNotFound when listing hypervisors

    Compute node resources must currently be deleted manually
    in the database, and as such they can reference service
    records which have been deleted via the services delete API.
    Because of this when listing hypervisors (compute nodes), we
    may get a ComputeHostNotFound error when trying to lookup a
    service record for a compute node where the service was
    deleted. This causes the API to fail with a 500 since it's not
    handled.

    This change handles the ComputeHostNotFound when looping over
    compute nodes in the hypervisors index and detail methods and
    simply ignores them.

    Change-Id: I2717274bb1bd370870acbf58c03dc59cee30cc5e
    Closes-Bug: #1646255
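
The approach the commit message describes (catch ComputeHostNotFound while looping over compute nodes and skip the affected node) can be illustrated with a small standalone sketch. This is not the upstream patch: the exception class is a local stand-in for nova.exception.ComputeHostNotFound, and SERVICES, COMPUTE_NODES, service_get_by_compute_host and list_hypervisors are hypothetical in-memory substitutes for the real API and database layers.

```python
class ComputeHostNotFound(Exception):
    """Local stand-in for nova.exception.ComputeHostNotFound."""

    def __init__(self, host):
        super().__init__("Compute host %s could not be found." % host)
        self.host = host


# Hypothetical data: mycloud-cs-003 still has a compute node record,
# but its service record is gone.
SERVICES = {"mycloud-cs-001": {"host": "mycloud-cs-001", "status": "enabled"}}
COMPUTE_NODES = [
    {"id": 1, "hypervisor_hostname": "mycloud-cs-001", "host": "mycloud-cs-001"},
    {"id": 3, "hypervisor_hostname": "mycloud-cs-003", "host": "mycloud-cs-003"},
]


def service_get_by_compute_host(host):
    """Look up the service for a host, raising if there is none."""
    try:
        return SERVICES[host]
    except KeyError:
        raise ComputeHostNotFound(host)


def list_hypervisors(compute_nodes):
    """Build the hypervisor list, ignoring nodes with no service record."""
    hypervisors = []
    for node in compute_nodes:
        try:
            service = service_get_by_compute_host(node["host"])
        except ComputeHostNotFound:
            # Orphaned compute node: skip it instead of failing the
            # whole listing with an unhandled error.
            continue
        hypervisors.append({
            "id": node["id"],
            "hypervisor_hostname": node["hypervisor_hostname"],
            "status": service["status"],
        })
    return hypervisors


print(list_hypervisors(COMPUTE_NODES))
```

Without the try/except, the single orphaned node would make the whole listing raise, which is the failure mode shown in the traceback in the bug description.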

Changed in nova (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Corey Bryant (corey.bryant) wrote :

I took a look at the code for Mitaka. While the patch won't cherry-pick cleanly, a backport should be doable with some adjustments.

Changed in nova (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in nova (Ubuntu):
status: Triaged → Fix Released
Changed in cloud-archive:
status: New → Fix Released
importance: Undecided → Medium