Comment 1 for bug 1526735

Vladyslav Drok (vdrok) wrote :

Copy of whiteboard:

"the API won't allow clients to register a node with an invalid driver"
-- I tested this today, and the API still allowed it. So I have filed this review to fix it:
   https://review.openstack.org/68018

I see a problem with both your proposed solutions.
[1] Where does this periodic_task run? If it runs on all conductors, which one decides which nodes to mark offline?
[2] Again, which surviving conductor is responsible for marking the nodes-now-owned-by-no-one as dead?

Take the extreme case -- what if all conductors are offline? Then all nodes are unavailable, since the hash won't map any node anywhere (there will be no drivers in the ring, right?).
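
To make the extreme case concrete, here is a rough sketch of a consistent hash ring. It is a toy written for this comment, not Ironic's actual hash ring code: the HashRing class, get_conductor() and the replicas parameter are all hypothetical names. The point is only that an empty ring has nothing to map a node to.

```python
import bisect
import hashlib


def _hash(key):
    # Stable integer hash of a string key.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring (hypothetical, not Ironic's class)."""

    def __init__(self, conductors, replicas=16):
        # Place each conductor on the ring several times to even out load.
        self._ring = sorted(
            (_hash("%s-%d" % (conductor, i)), conductor)
            for conductor in conductors
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def get_conductor(self, node_uuid):
        # With no conductors registered there is nothing to map to.
        if not self._ring:
            return None
        # Pick the first ring position past the node's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]
```

With this sketch, HashRing([]).get_conductor(uuid) returns None for every node, which is the "all nodes are unavailable" situation described above.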

I do not think this should be "conductor marks a node inactive in the database". Instead, I think we need to:
1) ensure that the Nova driver only gets a list of actually-available nodes, and removes no-longer-available nodes from its list during each cycle in which it refreshes its view of available resources
2) gracefully handle requests to the API to manage nodes which no longer have any active conductor.
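
A minimal sketch of what (1) and (2) could look like, under loud assumptions: nodes are plain dicts with "uuid" and "driver" keys, and active_drivers is a dict mapping a driver name to the set of live conductor hostnames serving it. None of these names (available_nodes, require_conductor, NoActiveConductor) come from Ironic or Nova; they are made up for illustration.

```python
class NoActiveConductor(Exception):
    """Hypothetical error for a node whose driver has no live conductor."""


def available_nodes(nodes, active_drivers):
    # (1) Report only nodes whose driver is served by at least one
    # live conductor; everything else is left out of the listing.
    return [n for n in nodes if active_drivers.get(n["driver"])]


def require_conductor(node, active_drivers):
    # (2) Fail with a clear error instead of dispatching a request
    # for a node that no conductor can currently manage.
    conductors = active_drivers.get(node["driver"])
    if not conductors:
        raise NoActiveConductor(
            "no active conductor for node %s (driver %s)"
            % (node["uuid"], node["driver"])
        )
    # Deterministic pick for the sketch; a real service would use the ring.
    return sorted(conductors)[0]
```

The API layer could catch the hypothetical NoActiveConductor and turn it into a sensible error response, rather than having a conductor write an "inactive" flag to the database.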

I think that patch https://review.openstack.org/68018 goes some way toward handling (2), but it may need more work. I suspect we already have the means (or most of them) to do (1) as well, but I'm not sure whether that's in the Nova driver or not.

Just my thoughts,
Devananda 2014-01-20