Race condition in network/deallocate_for_instance() leads to security issue
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) |
Fix Released
|
High
|
Phil Day | ||
Essex |
Fix Released
|
High
|
Unassigned | ||
nova (Ubuntu) |
Fix Released
|
Undecided
|
Unassigned | ||
Precise |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
During compute/
However because original call to the network manager is a cast there is a chance that the compute manager will delete the instance record in the DB before the compute API (in the network manager) tries to retrieve the instance. At this point the security group refresh fails (leaving rules in place which are a security risk when the IP is reused), and potentially stopping other aspects of the network deallocation from completing.
This is one of the few places left in the network api where a cast is used, and it seems to me that an rpc.call would be better – the compute manager really needs to know that the network has been properly deallocated and cleaned up before it marks the VM as deleted.
In addition to this race condition, in general if there is an error during network clean up it is better that the VM remains in an error condition so that delete can be retried than for the VM to be deleted with the network left in some indeterminate state with no trigger available to retry the clean up. Hence some exception handling for the call to deallocate_
Note that aside from this specific use of a cast there are 4 other casts in the network API:
add_fixed_
remove_
add_network_
release_
Maybe these should all be considered for conversion to rpc.call to avoid race conditions and inconsistencies between the compute and network manager.
Related branches
- Chuck Short: Pending requested
-
Diff: 56 lines (+14/-4)3 files modifieddebian/changelog (+8/-0)
debian/control (+6/-3)
debian/nova-console.install (+0/-1)
Changed in nova: | |
assignee: | nobody → Phil Day (philip-day) |
status: | Triaged → In Progress |
Changed in nova: | |
milestone: | none → folsom-3 |
status: | Fix Committed → Fix Released |
Changed in nova (Ubuntu): | |
status: | New → Fix Released |
Changed in nova (Ubuntu Precise): | |
status: | New → Confirmed |
Changed in nova: | |
milestone: | folsom-3 → 2012.2 |
Agreed, lets convert everything to call. We may be sacrificing a little performance, but It will prevent a lot of annoying race conditions like this one.