More discussion notes:

(2:58:16 PM) dansmith: mriedem: so you found a test that confirmed the behavior of that thing?
(2:58:23 PM) dansmith: mriedem: that deletes the consumer?
(3:02:49 PM) mriedem: DeleteConsumerIfNoAllocsTestCase is the functional test that covers that case,
(3:02:54 PM) mriedem: and it looks like a correct test to me,
(3:03:05 PM) mriedem: creates 2 consumers each with 2 allocations on different resource classes,
(3:03:13 PM) mriedem: clears the allocations for one of them and asserts the consumer is gone
(3:03:33 PM) mriedem: i think we're just hitting a race with the shelve offloaded status change before we cleanup the allocations
(3:03:43 PM) mriedem: but i've posted a couple of patches to add debug logs to help determine if that's the case
(3:03:55 PM) mriedem: https://review.openstack.org/617016
(3:04:59 PM) dansmith: okay I'm not sure how we could race and see no allocations but a consumer and get that generation conflict
(3:05:17 PM) dansmith: it'd be one thing if we thought the consumer was there and then disappeared out from under us
(3:17:15 PM) mriedem: during unshelve the scheduler does see allocations
(3:17:35 PM) mriedem: and it thinks we're doing a move
(3:18:00 PM) dansmith: okay I thought you pasted a line showing that there was only one allocation going back to placement
(3:18:11 PM) mriedem: there are 3 PUTs for allocations
(3:18:15 PM) mriedem: 1. create the server - initial
(3:18:27 PM) mriedem: 2. shelve offload - wipe the allocations to {} - which should delete the consumer
(3:18:37 PM) mriedem: 3. unshelve - scheduler claims resources with the wrong consumer generation
(3:18:49 PM) mriedem: and when 3 happens, the scheduler gets allocations for hte consumer and they are there,
(3:18:51 PM) dansmith: ...right
(3:18:59 PM) mriedem: so it uses the consumer generation (1) from those allocations
(3:19:07 PM) mriedem: then i think what happens is,
(3:19:09 PM) dansmith: oh, so it passes generation=1 instead of generation=0, meaning new consumer?
(3:19:15 PM) mriedem: placement recreates the consumer which will have generation null
(3:19:19 PM) mriedem: yes
(3:19:26 PM) dansmith: okay I see
(3:19:50 PM) dansmith: I thought you were seeing consumer generation was null or zero or whatever in the third put, but still getting a conflict
(3:19:53 PM) dansmith: but that makes sense now
(3:20:06 PM) mriedem: Nov 06 19:48:37.013780 ubuntu-xenial-inap-mtl01-0000379614 nova-scheduler[12154]: WARNING nova.scheduler.client.report [None req-f266a0ff-2840-413d-9877-4500e61512f5 tempest-ServersNegativeTestJSON-477704048 tempest-ServersNegativeTestJSON-477704048] Failed to save allocation for 6665f00a-dcf1-4286-b075-d7dcd7c37487. Got HTTP 409: {"errors": [{"status": 409, "request_id": "req-c9ba6cbd-3b6e-4e5d-b550-9588be8a49d2", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 1 ", "title": "Conflict"}]}
(3:20:15 PM) mriedem: consumer generation conflict - expected null but got 1
(3:20:24 PM) mriedem: yup - so new consumer but we're passing a generation of 1
(3:20:28 PM) mriedem: from the old, now deleted consumer

I'll push a patch for this.
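
To make the suspected ordering concrete, here is a rough sketch that replays the three PUTs from the discussion against a toy in-memory stand-in for placement. Everything in it is hypothetical (`ToyPlacement`, `GenerationConflict`, `replay_race`, the `VCPU` amounts); it is not nova or placement code. It also assumes that a new consumer is written with generation `None` and ends up at generation 1 after its first successful write, which is chosen only to match the "got 1" in the log.

```python
class GenerationConflict(Exception):
    """Stand-in for the HTTP 409 placement.concurrent_update response."""


class ToyPlacement:
    """Hypothetical in-memory model of consumer generations (not real placement)."""

    def __init__(self):
        # consumer uuid -> {"generation": int, "allocations": dict}
        self.consumers = {}

    def get_allocations(self, consumer):
        rec = self.consumers.get(consumer)
        if rec is None:
            # Unknown consumer: no allocations and no generation to report.
            return {"allocations": {}}
        return {
            "allocations": dict(rec["allocations"]),
            "consumer_generation": rec["generation"],
        }

    def put_allocations(self, consumer, allocations, consumer_generation):
        rec = self.consumers.get(consumer)
        # A consumer that does not exist must be written with generation None.
        expected = None if rec is None else rec["generation"]
        if consumer_generation != expected:
            raise GenerationConflict(
                "consumer generation conflict - expected %s but got %s"
                % ("null" if expected is None else expected, consumer_generation)
            )
        if not allocations:
            # Wiping allocations to {} deletes the consumer record entirely.
            self.consumers.pop(consumer, None)
            return
        # Assumption of this toy: every successful write bumps the generation,
        # so a brand-new consumer is at generation 1 after its first PUT.
        base = 0 if rec is None else rec["generation"]
        self.consumers[consumer] = {
            "generation": base + 1,
            "allocations": dict(allocations),
        }


def replay_race():
    """Replay the suspected ordering and return the conflict message."""
    placement = ToyPlacement()
    server = "6665f00a-dcf1-4286-b075-d7dcd7c37487"

    # PUT 1: create the server, new consumer.
    placement.put_allocations(server, {"VCPU": 1}, None)

    # Unshelve starts early: the scheduler still sees the old allocations
    # and remembers their consumer generation.
    stale_generation = placement.get_allocations(server)["consumer_generation"]

    # PUT 2: shelve offload cleanup wipes the allocations, deleting the consumer.
    placement.put_allocations(server, {}, stale_generation)

    # PUT 3: the scheduler claims with the generation of the deleted consumer.
    try:
        placement.put_allocations(server, {"VCPU": 1}, stale_generation)
    except GenerationConflict as exc:
        return str(exc)
    return None


print(replay_race())
```

In this toy model the third PUT only succeeds if the caller re-reads the allocations after the cleanup and passes `None` for the generation. Whether the real fix should re-read, retry, or avoid the race altogether is not settled by this sketch.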