Nova informs Placement too early upon Ironic instance deletion
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | In Progress | Low | Unassigned |
Bug Description
Description
===========
When an instance is deleted, Nova appears to call back into Placement immediately, so the corresponding resource provider becomes available again right away. For Ironic instances, however, deletion is not instantaneous: the node still has to clean and is not available at this point, so a new instance scheduled to it will fail. The resource tracker eventually corrects the information in Placement, but with hundreds of nodes and the way the resource tracker iterates over them, this leaves a window of several minutes.
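The race can be sketched with a minimal simulation. All names below are illustrative stand-ins, not Nova's actual classes: the allocation disappears from a placement-like store as soon as the delete is issued, while the node only becomes available after cleaning finishes, so a scheduler consulting placement in between sees a node that cannot actually be provisioned.

```python
# Minimal sketch of the race window (illustrative names, not Nova's code).

class FakePlacement:
    """Tracks which resource providers currently have an allocation."""
    def __init__(self):
        self.allocations = set()

    def delete_allocation(self, node_uuid):
        self.allocations.discard(node_uuid)

    def candidates(self, nodes):
        # A provider looks free as soon as its allocation is gone.
        return [n for n in nodes if n.uuid not in self.allocations]


class FakeNode:
    def __init__(self, uuid):
        self.uuid = uuid
        self.provision_state = "active"


placement = FakePlacement()
node = FakeNode("c55cb55d")
placement.allocations.add(node.uuid)

# Instance delete: Nova drops the allocation right away ...
placement.delete_allocation(node.uuid)
# ... but the Ironic node is still cleaning, not "available".
node.provision_state = "clean wait"

free = placement.candidates([node])
# Placement already offers the node although Ironic cannot provision it.
print(free[0].provision_state)  # -> clean wait
```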
Steps to reproduce
==================
Delete an instance and compare the state of the resource provider in placement with the provision state of the node:
# openstack baremetal node show --fit c55cb55d-
+-----------------+------------+
| Field           | Value      |
+-----------------+------------+
| provision_state | clean wait |
+-----------------+------------+

# OS_PLACEMENT_
+---+------------+-------------------+-------------------------+
| # | allocation | resource provider | inventory used/capacity |
+---+------------+-------------------+-------------------------+
| 1 | CUSTOM_
+---+------------+-------------------+-------------------------+
Expected result
===============
The resource provider should not become available in Placement before the Ironic node has moved to provision state available.
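The expected invariant can be expressed as a simple check (a hypothetical helper, not part of Nova):

```python
# Hypothetical consistency check: a resource provider should only be
# offered by Placement once its Ironic node is back in "available".

def provider_may_be_offered(provision_state, has_allocation):
    """Return True if it is safe for Placement to offer this provider."""
    if has_allocation:
        return False  # still consumed by an instance
    return provision_state == "available"

# During the buggy window: allocation gone but node still cleaning.
print(provider_may_be_offered("clean wait", False))  # -> False
print(provider_may_be_offered("available", False))   # -> True
```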
Actual result
=============
The resource provider is available in placement while the Ironic node is not in provision state available yet.
Environment
===========
Ironic on Train, Nova on Stein
tags: added: ironic placement resource-tracker
In theory, we first call _shutdown_instance() [1] before destroying the instance, which deallocates the resources [2].
When we call driver.destroy() in _shutdown_instance(), we asynchronously call the Ironic API to unprovision the node and wait until we are sure that the node is unprovisioned [3].
[1] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L2980
[2] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L3013
[3] https://github.com/openstack/nova/blob/90777d7/nova/virt/ironic/driver.py#L1317
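The hold-until-unprovisioned behaviour described above can be sketched as a polling loop. This is a simplified sketch, not Nova's actual driver code; the client object and method name are assumptions:

```python
import time

def wait_for_unprovision(client, node_uuid, interval=1.0, timeout=1800):
    """Poll Ironic until the node has finished unprovisioning.

    `client` is any object with a get_provision_state(node_uuid) method;
    this mirrors the idea of the driver's wait, not its implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = client.get_provision_state(node_uuid)
        if state == "available":
            return state
        if state == "error":
            raise RuntimeError("unprovisioning failed for %s" % node_uuid)
        time.sleep(interval)
    raise TimeoutError("node %s still not available" % node_uuid)


class FakeIronicClient:
    """Reports 'clean wait' twice, then 'available'."""
    def __init__(self):
        self._states = iter(["clean wait", "clean wait", "available"])

    def get_provision_state(self, node_uuid):
        return next(self._states)


print(wait_for_unprovision(FakeIronicClient(), "c55cb55d", interval=0.0))
# -> available
```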
Could you please check the compute logs and tell us whether the timings show that the instance was destroyed *before* the node was unprovisioned?