Comment 10 for bug 1798688

Revision history for this message
Matt Riedemann (mriedem) wrote :

Well, https://review.openstack.org/#/c/583667/ looks to have changed the logic in processing the error response so that we aren't properly retrying. This is the error from placement:

Nov 06 19:48:37.013780 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: WARNING nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] Failed to save allocation for 6665f00a-dcf1-4286-b075-d7dcd7c37487. Got HTTP 409: {"errors": [{"status": 409, "request_id": "req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 1 ", "title": "Conflict"}]}

From the placement logs, I see three different times that allocations are PUT for the consumer:

Nov 06 19:47:57.568237 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.requestlog [req-18ee3cd2-b2a1-4cf4-965f-10030c8e5f6c req-014d6d95-06da-489f-a2fe-8e546567974d service placement] Starting request: 198.72.124.206 "PUT /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" {{(pid=7281) __call__ /opt/stack/nova/nova/api/openstack/placement/requestlog.py:38}}
Nov 06 19:47:57.617423 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.handlers.allocation [req-18ee3cd2-b2a1-4cf4-965f-10030c8e5f6c req-014d6d95-06da-489f-a2fe-8e546567974d service placement] Successfully wrote allocations AllocationList[Allocation(consumer=Consumer(6665f00a-dcf1-4286-b075-d7dcd7c37487),created_at=<?>,id=150,resource_class='VCPU',resource_provider=ResourceProvider(3ceb7eab-549c-40ba-a70c-320822c310ab),updated_at=<?>,used=1), Allocation(consumer=Consumer(6665f00a-dcf1-4286-b075-d7dcd7c37487),created_at=<?>,id=151,resource_class='MEMORY_MB',resource_provider=ResourceProvider(3ceb7eab-549c-40ba-a70c-320822c310ab),updated_at=<?>,used=64)] {{(pid=7281) _create_allocations /opt/stack/nova/nova/api/openstack/placement/handlers/allocation.py:441}}
Nov 06 19:47:57.617981 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: INFO nova.api.openstack.placement.requestlog [req-18ee3cd2-b2a1-4cf4-965f-10030c8e5f6c req-014d6d95-06da-489f-a2fe-8e546567974d service placement] 198.72.124.206 "PUT /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" status: 204 len: 0 microversion: 1.29

The first one has to be when initially scheduling the instance.

Then we should shelve offload the instance and delete it's allocations, but I'm not seeing a DELETE allocations request for consumer 6665f00a-dcf1-4286-b075-d7dcd7c37487 in the placement logs.

I then see a second PUT allocations request:

Nov 06 19:48:36.931411 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.requestlog [req-ef6a4e27-111b-4f9c-864d-5ab710954ab5 req-e32df7fa-ef0c-4e31-9428-d5bcd0b3c9be service placement] Starting request: 198.72.124.206 "PUT /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" {{(pid=7281) __call__ /opt/stack/nova/nova/api/openstack/placement/requestlog.py:38}}
Nov 06 19:48:36.956513 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.requestlog [req-f266a0ff-2840-413d-9877-4500e61512f5 req-7461e49d-5b5f-45b7-8ae2-84704675e9fb service placement] Starting request: 198.72.124.206 "GET /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" {{(pid=7282) __call__ /opt/stack/nova/nova/api/openstack/placement/requestlog.py:38}}
Nov 06 19:48:36.966890 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.handlers.allocation [req-ef6a4e27-111b-4f9c-864d-5ab710954ab5 req-e32df7fa-ef0c-4e31-9428-d5bcd0b3c9be service placement] Successfully wrote allocations AllocationList[Allocation(consumer=Consumer(6665f00a-dcf1-4286-b075-d7dcd7c37487),created_at=<?>,id=150,resource_class='VCPU',resource_provider=ResourceProvider(3ceb7eab-549c-40ba-a70c-320822c310ab),updated_at=<?>,used=0), Allocation(consumer=Consumer(6665f00a-dcf1-4286-b075-d7dcd7c37487),created_at=<?>,id=151,resource_class='MEMORY_MB',resource_provider=ResourceProvider(3ceb7eab-549c-40ba-a70c-320822c310ab),updated_at=<?>,used=0)] {{(pid=7281) _create_allocations /opt/stack/nova/nova/api/openstack/placement/handlers/allocation.py:441}}
Nov 06 19:48:36.967449 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: INFO nova.api.openstack.placement.requestlog [req-ef6a4e27-111b-4f9c-864d-5ab710954ab5 req-e32df7fa-ef0c-4e31-9428-d5bcd0b3c9be service placement] 198.72.124.206 "PUT /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" status: 204 len: 0 microversion: 1.28

Which is successful. And then a third which fails:

Nov 06 19:48:36.994159 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.requestlog [req-f266a0ff-2840-413d-9877-4500e61512f5 req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2 service placement] Starting request: 198.72.124.206 "PUT /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" {{(pid=7281) __call__ /opt/stack/nova/nova/api/openstack/placement/requestlog.py:38}}
Nov 06 19:48:37.012282 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: DEBUG nova.api.openstack.placement.wsgi_wrapper [req-f266a0ff-2840-413d-9877-4500e61512f5 req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2 service placement] Placement API returning an error response: consumer generation conflict - expected null but got 1 {{(pid=7281) call_func /opt/stack/nova/nova/api/openstack/placement/wsgi_wrapper.py:31}}
Nov 06 19:48:37.012622 ubuntu-xenial-inap-mtl01-0000379614 <email address hidden>[7279]: INFO nova.api.openstack.placement.requestlog [req-f266a0ff-2840-413d-9877-4500e61512f5 req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2 service placement] 198.72.124.206 "PUT /placement/allocations/6665f00a-dcf1-4286-b075-d7dcd7c37487" status: 409 len: 280 microversion: 1.29

That third and failed PUT allocations request is from the scheduler, but I'm not sure where the 2nd PUT request is coming from. Looking at the request ID from the 2nd request, it looks like that's coming from the compute, probably to remove the allocations:

http://logs.openstack.org/26/615126/5/check/tempest-full/69d913a/controller/logs/screen-n-cpu.txt.gz#_Nov_06_19_48_28_559759

And that explains why I don't see the DELETE allocations, I see a PUT with {} because of this change:

https://review.openstack.org/#/c/591597/

So the consumer persists in placement across shelve/unshelve, and the consumer has a generation we're not accounting for during unshelve.