[ironic] list_instances/list_instance_uuid does not respect conductor_group/partition_key
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Undecided | Unassigned |
Bug Description
The Ironic driver methods list_instances and list_instance_uuids do not currently respect the conductor_group option: https:/
This leads to significant performance degradation, because the compute queries Ironic for all nodes (/v1/nodes) instead of only the nodes managed by that compute (/v1/nodes?
In addition, this can lead to unexpected behavior for operators, such as an action being taken by a compute serving conductor group "A" to resolve an issue that would normally be resolved by a compute serving conductor group "B".
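To make the scoping problem concrete, here is a minimal sketch using in-memory data. The `Node` record, the sample UUIDs, and both helper functions are hypothetical stand-ins for illustration; they are not nova or Ironic code and make no API calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical, trimmed-down view of an Ironic node."""
    uuid: str
    conductor_group: str
    instance_uuid: Optional[str]  # None when no instance is deployed


# Illustrative data: two conductor groups, one unprovisioned node.
NODES = [
    Node("node-1", "A", "inst-1"),
    Node("node-2", "A", None),
    Node("node-3", "B", "inst-3"),
]


def list_instance_uuids_unscoped(nodes: List[Node]) -> List[str]:
    """Behavior described in the bug: every node is considered."""
    return [n.instance_uuid for n in nodes if n.instance_uuid is not None]


def list_instance_uuids_scoped(nodes: List[Node], group: str) -> List[str]:
    """Expected behavior: only nodes in this compute's conductor group."""
    return [
        n.instance_uuid
        for n in nodes
        if n.conductor_group == group and n.instance_uuid is not None
    ]
```

With this data, a compute serving group "A" that uses the unscoped listing also sees `inst-3`, which belongs to group "B"; the scoped listing returns only `inst-1`.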
While troubleshooting this error, we dug deeply into what this data is used for; it's used for two things:
- Reconciling deleted instances as a periodic job
- Ensuring no instances exist on a newly-started compute host
These are tasks which either could use stale data or would not be impacted by using the Ironic driver's existing node cache. Therefore, a suggested fix is:
Revise list_instances and list_instance_uuids to reuse the node cache to reduce the overall API calls being made to Ironic, and ensure all /v1/nodes calls use the same codepath in the Ironic driver. It's the belief of JayF, TheJulia, and Johnthetubaguy (on a video call right now) that using stale data, without refreshing the cache, should be safe for these use cases. (Even if we decide to refresh the cache, we should use this code path anyway.)
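The suggested fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `FakeIronicClient`, `CachedDriver`, `refresh_cache`, and the dict-based node records are all hypothetical names, and the real driver's cache handling may differ.

```python
from typing import Dict, List


class FakeIronicClient:
    """Hypothetical stand-in for an Ironic client; counts list calls."""

    def __init__(self, nodes: List[dict]) -> None:
        self._nodes = nodes
        self.list_calls = 0

    def list_nodes(self, conductor_group: str) -> List[dict]:
        # Single code path for node listing, always scoped to one group.
        self.list_calls += 1
        return [n for n in self._nodes
                if n["conductor_group"] == conductor_group]


class CachedDriver:
    """Hypothetical driver that answers instance listings from its cache."""

    def __init__(self, client: FakeIronicClient, conductor_group: str) -> None:
        self.client = client
        self.conductor_group = conductor_group
        self.node_cache: Dict[str, dict] = {}

    def refresh_cache(self) -> None:
        # The only place that talks to the (fake) API.
        nodes = self.client.list_nodes(self.conductor_group)
        self.node_cache = {n["uuid"]: n for n in nodes}

    def list_instance_uuids(self) -> List[str]:
        # Reads possibly stale cached data; makes no API call.
        return [n["instance_uuid"] for n in self.node_cache.values()
                if n["instance_uuid"] is not None]
```

In this sketch, repeated calls to `list_instance_uuids` add no API traffic beyond the cache refresh, and the result is limited to the driver's own conductor group because the cache itself is filled through the scoped listing.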
Changed in ironic: | |
status: | New → Confirmed |
importance: | Undecided → Medium |
Changed in ironic: | |
assignee: | nobody → Jay Faulkner (jason-oldos) |
Changed in ironic: | |
status: | Confirmed → Triaged |
no longer affects: | ironic |
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/900831