2013-09-25 03:35:13 |
Ryan Hsu |
bug |
|
|
added bug |
2013-09-25 03:47:22 |
Ryan Hsu |
description |
BUG-DESCRIPTION:
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of machines that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. Either of the following two error messages can be seen in the logs when an instance fails to build.
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1408, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 609, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 440, in spawn
vmdk_file_size_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 795, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: File [ryan-nfs] vmware_base/e8c42ed8-05e7-45bc-90c3-49a34e5a37c6.vmdk was not found
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1228, in _allocate_network_async
dhcp_options=dhcp_options)
File "/opt/stack/nova/nova/network/api.py", line 93, in wrapped
return func(self, context, *args, **kwargs)
File "/opt/stack/nova/nova/network/api.py", line 49, in wrapper
res = f(self, context, *args, **kwargs)
File "/opt/stack/nova/nova/network/api.py", line 300, in allocate_for_instance
nw_info = self.network_rpcapi.allocate_for_instance(context, **args)
File "/opt/stack/nova/nova/network/rpcapi.py", line 184, in allocate_for_instance
macs=jsonutils.to_primitive(macs))
File "/opt/stack/nova/nova/rpcclient.py", line 85, in call
return self._invoke(self.proxy.call, ctxt, method, **kwargs)
File "/opt/stack/nova/nova/rpcclient.py", line 63, in _invoke
return cast_or_call(ctxt, msg, **self.kwargs)
File "/opt/stack/nova/nova/openstack/common/rpc/proxy.py", line 130, in call
exc.info, real_topic, msg.get('method'))
Here is information from the two environments where the issue was observed:
Environment 1:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47410/
Environment 2:
- 1 datacenter, 1 cluster, 2 hosts
- iSCSI shared datastore
- was able to spawn ~30 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47467/ |
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of machines that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. Either of the following two error messages can be seen in the logs when an instance fails to build.
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1408, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 609, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 440, in spawn
vmdk_file_size_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 795, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: File [ryan-nfs] vmware_base/e8c42ed8-05e7-45bc-90c3-49a34e5a37c6.vmdk was not found
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1228, in _allocate_network_async
dhcp_options=dhcp_options)
File "/opt/stack/nova/nova/network/api.py", line 93, in wrapped
return func(self, context, *args, **kwargs)
File "/opt/stack/nova/nova/network/api.py", line 49, in wrapper
res = f(self, context, *args, **kwargs)
File "/opt/stack/nova/nova/network/api.py", line 300, in allocate_for_instance
nw_info = self.network_rpcapi.allocate_for_instance(context, **args)
File "/opt/stack/nova/nova/network/rpcapi.py", line 184, in allocate_for_instance
macs=jsonutils.to_primitive(macs))
File "/opt/stack/nova/nova/rpcclient.py", line 85, in call
return self._invoke(self.proxy.call, ctxt, method, **kwargs)
File "/opt/stack/nova/nova/rpcclient.py", line 63, in _invoke
return cast_or_call(ctxt, msg, **self.kwargs)
File "/opt/stack/nova/nova/openstack/common/rpc/proxy.py", line 130, in call
exc.info, real_topic, msg.get('method'))
Here is information from the two environments where the issue was observed:
Environment 1:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47410/
Environment 2:
- 1 datacenter, 1 cluster, 2 hosts
- iSCSI shared datastore
- was able to spawn ~30 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47467/ |
|
2013-09-25 23:05:36 |
Ryan Hsu |
description |
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of machines that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. Either of the following two error messages can be seen in the logs when an instance fails to build.
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1408, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 609, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 440, in spawn
vmdk_file_size_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 795, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: File [ryan-nfs] vmware_base/e8c42ed8-05e7-45bc-90c3-49a34e5a37c6.vmdk was not found
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1228, in _allocate_network_async
dhcp_options=dhcp_options)
File "/opt/stack/nova/nova/network/api.py", line 93, in wrapped
return func(self, context, *args, **kwargs)
File "/opt/stack/nova/nova/network/api.py", line 49, in wrapper
res = f(self, context, *args, **kwargs)
File "/opt/stack/nova/nova/network/api.py", line 300, in allocate_for_instance
nw_info = self.network_rpcapi.allocate_for_instance(context, **args)
File "/opt/stack/nova/nova/network/rpcapi.py", line 184, in allocate_for_instance
macs=jsonutils.to_primitive(macs))
File "/opt/stack/nova/nova/rpcclient.py", line 85, in call
return self._invoke(self.proxy.call, ctxt, method, **kwargs)
File "/opt/stack/nova/nova/rpcclient.py", line 63, in _invoke
return cast_or_call(ctxt, msg, **self.kwargs)
File "/opt/stack/nova/nova/openstack/common/rpc/proxy.py", line 130, in call
exc.info, real_topic, msg.get('method'))
Here is information from the two environments where the issue was observed:
Environment 1:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47410/
Environment 2:
- 1 datacenter, 1 cluster, 2 hosts
- iSCSI shared datastore
- was able to spawn ~30 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47467/ |
UPDATE: Removed the information related to the iSCSI environment, as the problem there was caused by testing with an OpenStack server that had very little CPU and memory. The issue remains in the NFS environment.
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of machines that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. The following error message can be seen in the logs when an instance fails to build.
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1408, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 609, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 440, in spawn
vmdk_file_size_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 795, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: File [ryan-nfs] vmware_base/e8c42ed8-05e7-45bc-90c3-49a34e5a37c6.vmdk was not found
Environment information:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47410/ |
|
2013-09-25 23:06:07 |
Ryan Hsu |
summary |
VMware: errors spawning large amounts of VMs |
VMware: spawning large amounts of VMs sometimes causes errors |
|
2013-10-06 02:56:14 |
Ryan Hsu |
description |
UPDATE: Removed the information related to the iSCSI environment, as the problem there was caused by testing with an OpenStack server that had very little CPU and memory. The issue remains in the NFS environment.
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of machines that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. The following error message can be seen in the logs when an instance fails to build.
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1408, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 609, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 440, in spawn
vmdk_file_size_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 795, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: File [ryan-nfs] vmware_base/e8c42ed8-05e7-45bc-90c3-49a34e5a37c6.vmdk was not found
Environment information:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47410/ |
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of machines that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. Two errors seen in the logs account for the instance spawn failures. The first is the ESX host not finding the image in the NFS datastore (even though it is there; otherwise no other instances could be spawned from it). The second is the ESX host being unable to access the VMDK image because it is locked.
Image not found error:
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1408, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 609, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 440, in spawn
vmdk_file_size_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 795, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: File [ryan-nfs] vmware_base/e8c42ed8-05e7-45bc-90c3-49a34e5a37c6.vmdk was not found
Image locked error:
Traceback (most recent call last):
File "/opt/stack/nova/nova/compute/manager.py", line 1407, in _spawn
block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 623, in spawn
admin_password, network_info, block_device_info)
File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 504, in spawn
root_gb_in_kb, linked_clone)
File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 71, in attach_disk_to_vm
self._session._wait_for_task(instance_uuid, reconfig_task)
File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 900, in _wait_for_task
ret_val = done.wait()
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
return hubs.get_hub().switch()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
return self.greenlet.switch()
NovaException: Unable to access file [ryan-nfs] vmware_base/f110bb94-2170-4a3a-ae0d-760f95eb8b47.0.
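For triage, the two failure modes can be told apart by the final NovaException line alone. The following is a minimal sketch, not part of nova, that sorts log lines into the two categories. It assumes the message formats are exactly those quoted in the tracebacks above; the function names and the "not_found" / "locked" labels are hypothetical, and "locked" reflects this report's reading of the "Unable to access file" message.

```python
import re
from collections import Counter

# Patterns copied from the two NovaException messages quoted above.
# Assumption: other nova versions may word these messages differently.
NOT_FOUND = re.compile(
    r"NovaException: File \[(?P<ds>[^\]]+)\] (?P<path>\S+) was not found")
LOCKED = re.compile(
    r"NovaException: Unable to access file \[(?P<ds>[^\]]+)\] (?P<path>\S+)")


def classify(line):
    """Return (kind, datastore, path) for a matching line, else None."""
    match = NOT_FOUND.search(line)
    if match:
        return ("not_found", match.group("ds"), match.group("path"))
    match = LOCKED.search(line)
    if match:
        # The path is kept exactly as logged, including any trailing dot.
        return ("locked", match.group("ds"), match.group("path"))
    return None


def summarize(lines):
    """Count how many lines fall into each failure category."""
    results = (classify(line) for line in lines)
    return Counter(result[0] for result in results if result is not None)
```

Feeding the lines of a saved nova-compute screen log to summarize() gives a per-category count, which shows whether a given run failed mostly on missing images or on inaccessible ones.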
Environment information:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://paste.openstack.org/show/47410/ |
|
2013-10-09 17:33:28 |
Vui Lam |
nova: status |
New |
Confirmed |
|
2013-10-09 17:34:23 |
Vui Lam |
nova: assignee |
|
Vui Lam (vui) |
|
2013-10-09 17:41:25 |
Tracy Jones |
nova: importance |
Undecided |
High |
|
2013-11-19 18:37:19 |
Tracy Jones |
tags |
vmware |
havana-backport-potential vmware |
|
2013-11-19 18:37:39 |
Shawn Hartsock |
bug task added |
|
openstack-vmwareapi-team |
|
2013-11-19 18:37:46 |
Shawn Hartsock |
openstack-vmwareapi-team: status |
New |
Confirmed |
|
2013-11-19 18:37:49 |
Shawn Hartsock |
openstack-vmwareapi-team: importance |
Undecided |
High |
|
2013-11-19 18:37:59 |
Shawn Hartsock |
openstack-vmwareapi-team: assignee |
|
Vui Lam (vui) |
|
2013-11-26 22:54:17 |
Shawn Hartsock |
summary |
VMware: spawning large amounts of VMs sometimes causes errors |
VMware: spawning large amounts of VMs concurrently sometimes causes errors |
|
2013-12-01 07:31:33 |
Gary Kotton |
nova: assignee |
Vui Lam (vui) |
Gary Kotton (garyk) |
|
2013-12-01 07:31:38 |
Gary Kotton |
openstack-vmwareapi-team: assignee |
Vui Lam (vui) |
Gary Kotton (garyk) |
|
2013-12-01 07:31:39 |
Gary Kotton |
nova: milestone |
|
icehouse-1 |
|
2013-12-02 20:44:44 |
dan wendlandt |
summary |
VMware: spawning large amounts of VMs concurrently sometimes causes errors |
VMware: spawning large amounts of VMs concurrently sometimes causes "VMDK lock" error |
|
2013-12-03 22:56:11 |
Russell Bryant |
nova: milestone |
icehouse-1 |
icehouse-2 |
|
2013-12-05 09:53:47 |
OpenStack Infra |
nova: status |
Confirmed |
In Progress |
|
2014-01-22 20:23:59 |
Thierry Carrez |
nova: milestone |
icehouse-2 |
icehouse-3 |
|
2014-03-05 12:34:36 |
Thierry Carrez |
nova: milestone |
icehouse-3 |
icehouse-rc1 |
|
2014-03-06 13:50:08 |
OpenStack Infra |
nova: status |
In Progress |
Fix Committed |
|
2014-03-31 19:02:56 |
Thierry Carrez |
nova: status |
Fix Committed |
Fix Released |
|
2014-04-17 09:12:55 |
Thierry Carrez |
nova: milestone |
icehouse-rc1 |
2014.1 |
|