In Puppet OpenStack project, we have attempted to switch glance-api deployment from standalone to httpd+mod_wsgi.
However this has been blocked by the following failure in tempest execution.
example:
https://zuul.opendev.org/t/openstack/build/b0712432dc80455db6234b4ae201dda8
```
tearDownClass (tempest.api.image.v2.test_images.BasicOperationsImagesTest)
--------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/tempest/test.py", line 267, in tearDownClass
raise value.with_traceback(trace)
File "/usr/lib/python3.9/site-packages/tempest/test.py", line 231, in tearDownClass
teardown()
File "/usr/lib/python3.9/site-packages/tempest/test.py", line 649, in resource_cleanup
raise testtools.MultipleExceptions(*cleanup_errors)
testtools.runtest.MultipleExceptions: ((<class 'tempest.lib.exceptions.ServerFault'>, Got server fault
Details: The server has either erred or is incapable of performing the requested operation.<br /><br />
, <traceback object at 0x7ff72b9d3580>), (<class 'tempest.lib.exceptions.TimeoutException'>, Request timed out
Details: (BasicOperationsImagesTest:tearDownClass) Failed to delete image 57b250ab-4f80-4063-957e-47b14610b7ff within the required time (300 s). Timer started at 1691251732. Timer ended at 1691252032. Waited for 300 s., <traceback object at 0x7ff72ba333c0>))
```
Looking into glance-api log. it seems glance attempted to delete the volume which is not in a deletable state.
```
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi [None req-c745bf78-4274-40a3-9547-2bba02e629a7 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Caught error: Failed to delete volume 7136a2b1-9c7d-4084-a63e-4d8b91ec2340: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-79684329-8a56-4648-bd3c-bf09bef88e56): glance_store.exceptions.BackendException: Failed to delete volume 7136a2b1-9c7d-4084-a63e-4d8b91ec2340: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-79684329-8a56-4648-bd3c-bf09bef88e56)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi Traceback (most recent call last):
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/glance_store/_drivers/cinder/store.py", line 1107, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi self.volume_api.delete(client, loc.volume_id)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/glance_store/common/cinder_utils.py", line 71, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi client.volumes.delete(volume_id)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/v3/volumes_base.py", line 235, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi return self._delete(loc)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/base.py", line 313, in _delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi resp, body = self.api.client.delete(url)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/client.py", line 229, in delete
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi return self._cs_request(url, 'DELETE', **kwargs)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/client.py", line 211, in _cs_request
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi return self.request(url, method, **kwargs)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi File "/usr/lib/python3.9/site-packages/cinderclient/client.py", line 197, in request
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi raise exceptions.from_response(resp, body)
2023-08-05 16:08:51.757 71470 ERROR glance.common.wsgi cinderclient.exceptions.BadRequest: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-79684329-8a56-4648-bd3c-bf09bef88e56)
```
Further investigation revealed that the volume was attachend when tempest downloaded image data, and was not attached before the subsequent delete calls were executed.
```
2023-08-05 16:08:43.290 71469 DEBUG glance.api.middleware.version_negotiation [None req-03ab3cae-b314-4640-8a8c-53214ab5419b 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: GET /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff/file Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
2023-08-05 16:08:43.302 71469 DEBUG glance_store._drivers.cinder.store [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Cinderclient connection created for user glance using URL: https://[::1]:5000/v3. get_cinderclient /usr/lib/python3.9/site-packages/glance_store/_drivers/cinder/store.py:632
2023-08-05 16:08:43.337 71469 DEBUG os_brick.utils [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] ==> get_connector_properties: call "{'root_helper': 'sudo glance-rootwrap /etc/glance/rootwrap.conf', 'my_ip': '2001:4800:7819:104:be76:4eff:fe03:bfe3', 'multipath': False, 'enforce_multipath': False, 'host': 'np0034868732', 'execute': None}" trace_logging_wrapper /usr/lib/python3.9/site-packages/os_brick/utils.py:175
...
2023-08-05 16:08:50.422 71469 DEBUG os_brick.initiator.connectors.base [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.connect_volume" :: held 1.135s inner /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/base.py:87
2023-08-05 16:08:50.423 71469 DEBUG os_brick.initiator.connectors.iscsi [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] <== connect_volume: return (1135ms) {'type': 'block', 'scsi_wwn': '360014053afd57155dd443b6aba7dcf1a', 'path': '/dev/sda'} trace_logging_wrapper /usr/lib/python3.9/site-packages/os_brick/utils.py:202
...
2023-08-05 16:08:51.656 71470 DEBUG glance.api.middleware.version_negotiation [None req-7e33cebd-3d0b-40f7-9f1b-6c90d7ec3b16 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
2023-08-05 16:08:51.823 71470 DEBUG glance.api.middleware.version_negotiation [None req-c745bf78-4274-40a3-9547-2bba02e629a7 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
2023-08-05 16:08:51.968 71470 DEBUG glance.api.middleware.version_negotiation [None req-4e120ce0-3581-4d01-9f31-f1555f8224b1 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Determining version of request: DELETE /v2/images/57b250ab-4f80-4063-957e-47b14610b7ff Accept: application/json process_request /usr/lib/python3.9/site-packages/glance/api/middleware/version_negotiation.py:44
...
2023-08-05 16:08:52.292 71469 DEBUG os_brick.initiator.connectors.base [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 0.285s inner /usr/lib/python3.9/site-packages/os_brick/initiator/connectors/base.py:87
2023-08-05 16:08:52.292 71469 DEBUG os_brick.initiator.connectors.iscsi [None req-3d502f08-a8c0-42da-93e7-51ebf87913c6 1c38ce29bb9c4914acfaaa6ad6e0e152 eab4de521fa9477fa424a39edd1a3cec - - default default] <== disconnect_volume: return (287ms) None trace_logging_wrapper /usr/lib/python3.9/site-packages/os_brick/utils.py:202
```
However the API request of image downloaded completed in ~8 seconds and I don't see it's stuck. From tempest's PoV the download completed so it's reasonable it attempted to delete the image.
```
2023-08-05 16:08:51.234 75640 INFO tempest.lib.common.rest_client [req-3d502f08-a8c0-42da-93e7-51ebf87913c6 req-3d502f08-a8c0-42da-93e7-51ebf87913c6 ] Request (BasicOperationsImagesTest:test_register_upload_get_image_file): 200 GET https://[::1]:9292/v2/images/57b250ab-4f80-4063-957e-47b14610b7ff/file 7.993s
2023-08-05 16:08:51.234 75640 DEBUG tempest.lib.common.rest_client [req-3d502f08-a8c0-42da-93e7-51ebf87913c6 req-3d502f08-a8c0-42da-93e7-51ebf87913c6 ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
Body: None
Response - Headers: {'date': 'Sat, 05 Aug 2023 16:08:43 GMT', 'server': 'Apache', 'content-length': '1024', 'content-md5': 'eeb8857baba3a1928b130eb022bcaa55', 'x-openstack-request-id': 'req-3d502f08-a8c0-42da-93e7-51ebf87913c6', 'connection': 'close', 'content-type': 'application/octet-stream', 'status': '200', 'content-location': 'https://[::1]:9292/v2/images/57b250ab-4f80-4063-957e-47b14610b7ff/file'}
Body: b'V\x89\x85...' _log_request_full /usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py:464
```
We haven't seen this issue when glance-api is deployed as a standalone service.
I'm wondering if we can improve some handlings to avoid this timing problem.
It seems tempest attempted to delete the image 3 times and glance returned 500. I don't see further attempts made by tempest since then.
~~~ 2023:16: 08:51 +0000] "DELETE /v2/images/ 57b250ab- 4f80-4063- 957e-47b14610b7 ff HTTP/1.1" 500 186 "-" "python- urllib3/ 1.26.5" 2023:16: 08:51 +0000] "DELETE /v2/images/ 57b250ab- 4f80-4063- 957e-47b14610b7 ff HTTP/1.1" 500 186 "-" "python- urllib3/ 1.26.5" 2023:16: 08:51 +0000] "DELETE /v2/images/ 57b250ab- 4f80-4063- 957e-47b14610b7 ff HTTP/1.1" 500 186 "-" "python- urllib3/ 1.26.5"
::1 - - [05/Aug/
::1 - - [05/Aug/
::1 - - [05/Aug/
~~~
We might have to fix/tune tempest to improve this behavior.