VMware: nova nodes crash on 503 errors, control of vSphere infrastructure completely lost

Bug #1292583 reported by Shawn Hartsock
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | Shawn Hartsock |
VMwareAPI-Team | In Progress | Critical | Shawn Hartsock |

Bug Description

Related to https://bugs.launchpad.net/nova/+bug/1262288. When the Nova node shuts down, it leaves its vSphere SOAP session open. The session remains open, blocking connections by administrators and by OpenStack.

As a result, administrative control over the entire vSphere, vCenter, and ESX infrastructure is temporarily lost.

Python CLI users experience:
pyVmomi.VmomiSupport.HostConnectFault: (vim.fault.HostConnectFault) {
   dynamicType = <unset>,
   dynamicProperty = (vmodl.DynamicProperty) [],
   msg = '503 Service Unavailable',
   faultCause = <unset>,
   faultMessage = (vmodl.LocalizableMessage) []
}

VpXClient users experience:

Call "ServiceInstance.RetrieveContent" for object "ServiceInstance" on Server "<server_name>" failed.

OpenStack users experience:
2014-03-14 08:55:06.579 ERROR suds.client [-] <?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:ns0="urn:vim25" xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
   <ns1:Body>
      <ns0:RetrieveServiceContent>
         <ns0:_this type="ServiceInstance">ServiceInstance</ns0:_this>
      </ns0:RetrieveServiceContent>
   </ns1:Body>
</SOAP-ENV:Envelope>
2014-03-14 08:55:06.581 CRITICAL nova.virt.vmwareapi.driver [-] Unable to connect to server at 192.168.2.36, sleeping for 2 seconds
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver Traceback (most recent call last):
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 753, in _create_session
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver self.vim = self._get_vim_object()
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 742, in _get_vim_object
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver return vim.Vim(protocol=self._scheme, host=self._host_ip)
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver File "/opt/stack/nova/nova/virt/vmwareapi/vim.py", line 117, in __init__
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver self._service_content = self.retrieve_service_content()
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver File "/opt/stack/nova/nova/virt/vmwareapi/vim.py", line 120, in retrieve_service_content
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver return self.RetrieveServiceContent("ServiceInstance")
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver File "/opt/stack/nova/nova/virt/vmwareapi/vim.py", line 223, in vim_request_handler
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver _("Exception in %s ") % (attr_name), excep)
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver VimException: Exception in RetrieveServiceContent : (503, u'Service Unavailable')
2014-03-14 08:55:06.581 TRACE nova.virt.vmwareapi.driver

All administrative control of the cloud is temporarily lost for a timespan of up to 30 minutes.
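
The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it assumes the server has a fixed pool of session slots and expires idle sessions after a timeout (set to 30 minutes to match the outage window reported here). `FakeVCenter`, `restart_without_logout`, and the slot count are hypothetical and are not part of vSphere or nova.

```python
class FakeVCenter:
    """Hypothetical stand-in for a server with a fixed pool of session slots."""

    def __init__(self, max_sessions, idle_timeout):
        self.max_sessions = max_sessions  # assumed limit, not from the report
        self.idle_timeout = idle_timeout  # seconds a session may sit idle
        self._sessions = {}               # session id -> last-activity time
        self._next_id = 0

    def _expire(self, now):
        # Drop sessions that have been idle for the full timeout.
        self._sessions = {sid: ts for sid, ts in self._sessions.items()
                          if now - ts < self.idle_timeout}

    def login(self, now):
        self._expire(now)
        if len(self._sessions) >= self.max_sessions:
            # Every slot is held by a session nobody will ever close.
            raise ConnectionError("503 Service Unavailable")
        self._next_id += 1
        self._sessions[self._next_id] = now
        return self._next_id

    def logout(self, session_id):
        # Releasing the slot immediately is what a clean shutdown would do.
        self._sessions.pop(session_id, None)


def restart_without_logout(server, restarts, now=0):
    """Simulate a service that logs in on every start and never logs out."""
    for _ in range(restarts):
        server.login(now)
```

In this model, three restarts against a three-slot server leave every slot occupied, so any further login (by an administrator or by OpenStack) fails until the idle timeout elapses. A logout at shutdown would free the slot at once.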

Tags: vmware
Changed in nova:
status: New → In Progress
importance: Undecided → High
milestone: none → icehouse-rc1
Changed in openstack-vmwareapi-team:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Shawn Hartsock (hartsock)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/75262
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5fc7ee5f62903082bb86a4206ae6efce0e28a425
Submitter: Jenkins
Branch: master

commit 5fc7ee5f62903082bb86a4206ae6efce0e28a425
Author: Shawn Hartsock <email address hidden>
Date: Thu Feb 20 19:34:41 2014 -0500

    add support for host driver cleanup during shutdown

    Provides a place for any drivers to potentially
    setup graceful-shutdown code.

    VMware driver's proper driver-lifecycle code included.
    This is critical in environments where the vmware
    driver is setup and torn down at high frequency.

    Prevents run-away vSphere by closing stateless
    HTTP management sessions gracefully.

    Change-Id: I67a91613643540243ab1210b333ed8e121f05802
    related to bug: 1262288
    Closes-bug: 1292583
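
The idea in the commit message, giving a driver a place to run graceful-shutdown code, can be sketched roughly as follows. This is a minimal illustration and not the merged nova code: `FakeSession`, `SketchDriver`, `run_service`, and the method names are hypothetical and chosen only for this example.

```python
class FakeSession:
    """Hypothetical management session that stays open until logged out."""

    def __init__(self):
        self.active = True

    def logout(self):
        self.active = False


class SketchDriver:
    """Hypothetical driver with paired setup and teardown hooks."""

    def __init__(self, session_factory):
        self._session_factory = session_factory
        self._session = None

    def init_host(self):
        # Open the management session when the service starts.
        self._session = self._session_factory()

    def cleanup_host(self):
        # Safe to call more than once: the session reference is cleared
        # before logout, so a second call finds nothing to close.
        session, self._session = self._session, None
        if session is not None:
            session.logout()


def run_service(driver, work):
    """Run `work` between the driver's setup and teardown hooks."""
    driver.init_host()
    try:
        work()
    finally:
        # Runs on normal exit and when `work` raises, so the session
        # is released either way.
        driver.cleanup_host()
```

Putting the teardown call in a `finally` block is one way to make sure the session is closed on every exit path that Python can intercept. A process killed without a chance to run cleanup would still leave its session behind.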

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/82269

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-rc1 → 2014.1