Instance boot fails if Nova exceeds a quota in Cinder leaving created volumes in attached state

Bug #1668865 reported by Anatolii Neliubin
Affects             Status      Importance  Assigned to  Milestone
Mirantis OpenStack  (status tracked in 10.0.x)
  10.0.x            Confirmed   Medium      MOS Nova
  9.x               Confirmed   Medium      MOS Nova

Bug Description

Detailed bug description:
  Before starting the boot process, Nova does not check whether it has enough resources to finish booting the new instance. For example, with a Cinder quota limit in place, booting a new instance with several volumes can push Nova past the allowed number of Cinder volumes. After the failed instance is deleted, the newly created volumes cannot be deleted properly: on the Ceph backend the new volume gets stuck in the "attached" status even though the failed instance is already gone, and on the EMC VNX backend the volumes cannot be deleted either.
Steps to reproduce:
Set Cinder quotas so that only one more volume can be created. Here 9 of the 10 allowed volumes already exist:
root@node-4:~# cinder quota-usage 58fbda8b4a9448fab839788a01de1d4d
+------------------------+--------+----------+-------+
| Type | In_use | Reserved | Limit |
+------------------------+--------+----------+-------+
| backup_gigabytes | 0 | 0 | 1000 |
| backups | 0 | 0 | 10 |
| gigabytes | 9 | 0 | 1000 |
| gigabytes_volumes_ceph | 0 | 0 | -1 |
| per_volume_gigabytes | 0 | 0 | -1 |
| snapshots | 0 | 0 | 10 |
| snapshots_volumes_ceph | 0 | 0 | -1 |
| volumes | 9 | 0 | 10 |
| volumes_volumes_ceph | 0 | 0 | -1 |
+------------------------+--------+----------+-------+
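For reference, a tenant can be brought into this state with the stock CLI. A sketch, assuming the tenant ID shown above and nine 1 GB filler volumes:
root@node-4:~# cinder quota-update --volumes 10 58fbda8b4a9448fab839788a01de1d4d
root@node-4:~# for i in $(seq 1 9); do cinder create 1; done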

Boot a new instance that creates two additional Cinder volumes through its block-device mappings:
root@node-4:~# nova boot --flavor m1.micro --block-device source=image,id=b088d5eb-43b8-44aa-8f53-65e84d0be8a3,dest=volume,size=1,shutdown=remove,bootindex=0 --nic net-id=4c2fbf9e-598c-47af-9268-c7a51765ad92 --block-device source=image,id=b088d5eb-43b8-44aa-8f53-65e84d0be8a3,dest=volume,size=2,shutdown=remove cirros-from-volume-1059
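The same request, wrapped for readability (functionally identical). Each --block-device with dest=volume asks Cinder for a brand-new volume, so this single boot needs two volume-quota slots while only one is free:
nova boot --flavor m1.micro \
  --nic net-id=4c2fbf9e-598c-47af-9268-c7a51765ad92 \
  --block-device source=image,id=b088d5eb-43b8-44aa-8f53-65e84d0be8a3,dest=volume,size=1,shutdown=remove,bootindex=0 \
  --block-device source=image,id=b088d5eb-43b8-44aa-8f53-65e84d0be8a3,dest=volume,size=2,shutdown=remove \
  cirros-from-volume-1059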

I omit the output of the nova boot command since it is not very informative; the output of the following commands is more interesting:
root@node-4:~# nova show b02062dd-675a-495d-b9d8-d2cda6f82dd9
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | cirros-from-volume-1059 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-0000000f |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-nqrbhaj9 |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2017-03-01T05:59:45Z |
| description | - |
| fault | {"message": "Build of instance b02062dd-675a-495d-b9d8-d2cda6f82dd9 aborted: Volume resource quota exceeded", "code": 500, "details": " File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 1926, in _do_build_and_run_instance |
| | filter_properties) |
| | File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 2083, in _build_and_run_instance |
| | 'create.error', fault=e) |
| | File \"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py\", line 220, in __exit__ |
| | self.force_reraise() |
| | File \"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py\", line 196, in force_reraise |
| | six.reraise(self.type_, self.value, self.tb) |
| | File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 2048, in _build_and_run_instance |
| | block_device_mapping) as resources: |
| | File \"/usr/lib/python2.7/contextlib.py\", line 17, in __enter__ |
| | return self.gen.next() |
| | File \"/usr/lib/python2.7/dist-packages/nova/compute/manager.py\", line 2206, in _build_resources |
| | reason=e.format_message()) |
| | ", "created": "2017-03-01T06:00:01Z"} |
| flavor | m1.micro (723364db-c01f-427b-82d2-8766ea47b277) |
| hostId | |
| host_status | |
| id | b02062dd-675a-495d-b9d8-d2cda6f82dd9 |
| image | Attempt to boot from volume - no image supplied |
| key_name | - |
| locked | False |
| metadata | {} |
| name | cirros-from-volume-1059 |
| os-extended-volumes:volumes_attached | [{"id": "bf7b83ee-413e-4cc6-a8c1-c8a1befa1811", "delete_on_termination": true}] |
| status | ERROR |
| tenant_id | 58fbda8b4a9448fab839788a01de1d4d |
| updated | 2017-03-01T06:00:01Z |
| user_id | 6df7e27730934dc587a0b322a2bafa7b |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
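The failure should also be recorded in the instance's action log; a sketch (output omitted here, as above):
root@node-4:~# nova instance-action-list b02062dd-675a-495d-b9d8-d2cda6f82dd9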
root@node-4:~# cinder quota-usage 58fbda8b4a9448fab839788a01de1d4d
+------------------------+--------+----------+-------+
| Type | In_use | Reserved | Limit |
+------------------------+--------+----------+-------+
| backup_gigabytes | 0 | 0 | 1000 |
| backups | 0 | 0 | 10 |
| gigabytes | 10 | 0 | 1000 |
| gigabytes_volumes_ceph | 0 | 0 | -1 |
| per_volume_gigabytes | 0 | 0 | -1 |
| snapshots | 0 | 0 | 10 |
| snapshots_volumes_ceph | 0 | 0 | -1 |
| volumes | 10 | 0 | 10 |
| volumes_volumes_ceph | 0 | 0 | -1 |
+------------------------+--------+----------+-------+
root@node-4:~# cinder list
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
| 138096e2-c842-4699-8907-be2cb6611a3c | available | - | 1 | - | false | |
| 403aeeda-29f1-4908-a01a-501398610e11 | available | - | 1 | - | false | |
| 817846aa-6ae8-426a-a5be-f4afd87e4535 | available | - | 1 | - | false | |
| 81d0b0c2-e9ed-4ad4-8a38-5db3809f9de4 | available | - | 1 | - | false | |
| 97933759-d1cf-483c-8248-e8abe4584e5f | available | - | 1 | - | false | |
| bf7b83ee-413e-4cc6-a8c1-c8a1befa1811 | in-use | | 1 | - | true | b02062dd-675a-495d-b9d8-d2cda6f82dd9 |
| c16dddf0-e6ff-48ef-a8fe-dd59ea9572c1 | available | - | 1 | - | false | |
| d57b916a-0c4e-4ec2-a1a8-cee78339d38e | available | - | 1 | - | false | |
| dc14a001-4a99-4748-816f-d88e85d5272f | available | - | 1 | - | false | |
| f8022272-9188-47ba-9d7a-01d43fd036fc | available | - | 1 | - | false | |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
As you can see, the instance is in the "error" state, although one of the volumes was created successfully and is marked as attached (note that In_use for volumes went from 9 to 10: the first block-device mapping consumed the last quota slot before the second one failed). If I delete the problematic instance and then try to delete the volume, I get an error saying the volume cannot be deleted because it is in an "attached" state.
root@node-4:~# nova delete b02062dd-675a-495d-b9d8-d2cda6f82dd9
Request to delete server b02062dd-675a-495d-b9d8-d2cda6f82dd9 has been accepted.
root@node-4:~# cinder list
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
| 138096e2-c842-4699-8907-be2cb6611a3c | available | - | 1 | - | false | |
| 403aeeda-29f1-4908-a01a-501398610e11 | available | - | 1 | - | false | |
| 817846aa-6ae8-426a-a5be-f4afd87e4535 | available | - | 1 | - | false | |
| 81d0b0c2-e9ed-4ad4-8a38-5db3809f9de4 | available | - | 1 | - | false | |
| 97933759-d1cf-483c-8248-e8abe4584e5f | available | - | 1 | - | false | |
| bf7b83ee-413e-4cc6-a8c1-c8a1befa1811 | in-use | | 1 | - | true | b02062dd-675a-495d-b9d8-d2cda6f82dd9 |
| c16dddf0-e6ff-48ef-a8fe-dd59ea9572c1 | available | - | 1 | - | false | |
| d57b916a-0c4e-4ec2-a1a8-cee78339d38e | available | - | 1 | - | false | |
| dc14a001-4a99-4748-816f-d88e85d5272f | available | - | 1 | - | false | |
| f8022272-9188-47ba-9d7a-01d43fd036fc | available | - | 1 | - | false | |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
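The instance is gone, yet the volume still holds its "in-use" state. In a busier tenant such leftovers can be listed directly with the client's status filter; a sketch:
root@node-4:~# cinder list --status in-use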
root@node-4:~# cinder delete bf7b83ee-413e-4cc6-a8c1-c8a1befa1811
Delete for volume bf7b83ee-413e-4cc6-a8c1-c8a1befa1811 failed: Invalid volume: Volume status must be available or error or error_restoring or error_extending and must not be migrating, attached, belong to a consistency group or have snapshots. (HTTP 400) (Request-ID: req-1d4e7702-14d6-492f-92d2-92746cd15759)
ERROR: Unable to delete any of the specified volumes.
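The stale attachment can be confirmed on the volume itself; a sketch (exact field names vary between releases):
root@node-4:~# cinder show bf7b83ee-413e-4cc6-a8c1-c8a1befa1811 | egrep 'attach|status'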

The only way to delete this volume is to reset its state:
root@node-4:~# cinder reset-state --attach-status detached bf7b83ee-413e-4cc6-a8c1-c8a1befa1811
root@node-4:~# cinder delete bf7b83ee-413e-4cc6-a8c1-c8a1befa1811
Request to delete volume bf7b83ee-413e-4cc6-a8c1-c8a1befa1811 has been accepted.
root@node-4:~# cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 138096e2-c842-4699-8907-be2cb6611a3c | available | - | 1 | - | false | |
| 403aeeda-29f1-4908-a01a-501398610e11 | available | - | 1 | - | false | |
| 817846aa-6ae8-426a-a5be-f4afd87e4535 | available | - | 1 | - | false | |
| 81d0b0c2-e9ed-4ad4-8a38-5db3809f9de4 | available | - | 1 | - | false | |
| 97933759-d1cf-483c-8248-e8abe4584e5f | available | - | 1 | - | false | |
| c16dddf0-e6ff-48ef-a8fe-dd59ea9572c1 | available | - | 1 | - | false | |
| d57b916a-0c4e-4ec2-a1a8-cee78339d38e | available | - | 1 | - | false | |
| dc14a001-4a99-4748-816f-d88e85d5272f | available | - | 1 | - | false | |
| f8022272-9188-47ba-9d7a-01d43fd036fc | available | - | 1 | - | false | |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
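If several volumes leak this way, the same two steps can be looped. A sketch that parses the table above with awk ($2 is the ID column, $4 the status); only safe when every "in-use" volume in the tenant is known to be leaked:
root@node-4:~# for vol in $(cinder list | awk '$4 == "in-use" {print $2}'); do
>   cinder reset-state --attach-status detached "$vol" && cinder delete "$vol"
> done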

Expected results:
I think Nova should not start booting a new instance if it cannot obtain all the resources needed to finish the boot process.
Workaround:
Remove the problematic resources (volumes) manually, as shown above.
Description of the environment:
MOS 9.2
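The exact package versions can be attached for triage; a sketch (MOS nodes are Ubuntu-based, so dpkg applies):
root@node-4:~# dpkg -l | grep -E 'nova|cinder'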

Tags: area-nova
Changed in mos:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → MOS Nova (mos-nova)
tags: added: area-nova
summary: - During the boot process Nova might overuse resources limited by quotas
+ Instance boot fails if Nova exceeds a quota in Cinder leaving created
+ volumes in attached state