Flip Instance to Failed if Security Group or Default Rule Creation Fail During 'Instance Create'

Bug #1183519 reported by Saurabh Surana
This bug affects 2 people
Affects                  | Status       | Importance | Assigned to | Milestone
OpenStack DBaaS (Trove)  | Fix Released | Medium     | Denis M.    |

Bug Description

When Trove fails to create a security group (or its default rule) during an instance create action, it does not perform a clean rollback, which would mean either flipping the instance to an error state or deleting the instance's entry from the instances table. As a side effect, an instance is added to the instances table in BUILD state; its state never changes, and the user cannot delete it.
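The "flip to error" rollback named in the title can be sketched as below. This is a minimal model, not Trove's code: `InstanceStatus`, `FakeNova`, and `create_instance` are hypothetical stand-ins, the instances table is a plain dict, and port 3306 is only an illustrative default rule.

```python
from enum import Enum


class InstanceStatus(Enum):
    BUILD = "BUILD"
    ERROR = "ERROR"


class SecurityGroupLimitExceeded(Exception):
    """Hypothetical stand-in for Nova's HTTP 413 quota error."""


class FakeNova:
    """Hypothetical in-memory Nova with a security group quota."""

    def __init__(self, quota):
        self.quota = quota
        self.groups = []
        self.rules = []

    def create_security_group(self, name):
        if len(self.groups) >= self.quota:
            raise SecurityGroupLimitExceeded(
                "Quota exceeded, too many security groups.")
        self.groups.append(name)
        return name

    def create_rule(self, group, port):
        self.rules.append((group, port))


def create_instance(instances, nova, instance_id):
    # The DB row is written first, as in the code path described above.
    instances[instance_id] = InstanceStatus.BUILD
    try:
        group = nova.create_security_group(f"SecGroup_{instance_id}")
        nova.create_rule(group, port=3306)  # illustrative default rule
    except Exception:
        # Flip the row out of BUILD so it shows as failed and can be
        # deleted, then propagate the original error to the API caller.
        instances[instance_id] = InstanceStatus.ERROR
        raise
    return instances[instance_id]
```

Re-raising after the status flip keeps the user-facing error unchanged while making sure no row is left in BUILD.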

Steps to reproduce:
1. Create as many security groups under the user's Nova account as the Nova security group quota allows.
2. Try to create a new instance through the reddwarf API.
3. Instance creation will fail and API will return an error:
{"badRequest": {"message": "SecurityGroupLimitExceeded: Quota exceeded, too many security groups. (HTTP 413)
4. Now try to list the instances. You will see a new instance added to your account which is in BUILD state.
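The reproduction steps can be modelled in a few lines of Python. This is a hypothetical in-memory sketch (`FakeNova`, `create_instance_without_rollback`, and `list_instances` are invented names), assuming only that the instance row is written before the Nova security group call.

```python
class SecurityGroupLimitExceeded(Exception):
    """Hypothetical stand-in for Nova's quota error."""


class FakeNova:
    """Hypothetical in-memory Nova with a security group quota."""

    def __init__(self, quota):
        self.quota = quota
        self.groups = []

    def create_security_group(self, name):
        if len(self.groups) >= self.quota:
            raise SecurityGroupLimitExceeded(
                "Quota exceeded, too many security groups. (HTTP 413)")
        self.groups.append(name)


def create_instance_without_rollback(instances, nova, instance_id):
    instances[instance_id] = "BUILD"  # row written before any Nova call
    nova.create_security_group(f"SecGroup_{instance_id}")  # may raise


def list_instances(instances):
    return sorted(instances.items())


nova = FakeNova(quota=3)
# Step 1: use up the security group quota outside Trove.
for i in range(nova.quota):
    nova.create_security_group(f"user-group-{i}")

instances = {}
# Steps 2-3: the create call fails with the quota error.
try:
    create_instance_without_rollback(instances, nova, "i-1")
except SecurityGroupLimitExceeded as exc:
    print("API error:", exc)

# Step 4: the orphaned row is still listed in BUILD.
print(list_instances(instances))  # [('i-1', 'BUILD')]
```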

The chance of anyone running into this issue is low, but it becomes more likely if, for some reason, security groups are not deleted on instance DELETE, or if the user uses the same account for both Trove instances and Nova instances.

Changed in reddwarf:
assignee: nobody → Nikhil Manchanda (slicknik)
Changed in trove:
assignee: Nikhil Manchanda (slicknik) → nobody
importance: Undecided → Medium
Changed in trove:
assignee: nobody → Auston McReynolds (amcreynolds)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/45141

Changed in trove:
status: New → In Progress
summary: - Complete Rollback is not performed on failure Security Group to create
- Security Group at the time of instance creation
+ Flip Instance to Failed if Security Group or Default Rule Creation Fail
+ During 'Instance Create'
description: updated
Denis M. (dmakogon) wrote :

I have some ideas how to avoid this bug.
In trove-taskmanager we have a method that waits until the guest agent (GA) reports any instance status. If a timeout happens, we could, in the exception handler:
1. Delete the security group.
2. Delete the floating IP.
3. Delete the server itself.
4. Delete the records from the database.
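A minimal Python sketch of this proposal, assuming a polling wait with a deadline and a resource client that exposes one delete call per artifact. `wait_for_guest_status`, `rollback`, `provision`, and `RecordingResources` are hypothetical names, not Trove's taskmanager code; a real client would call Nova and the Trove database.

```python
import time


class GuestTimeout(Exception):
    """Raised when the guest agent reports no status before the deadline."""


def wait_for_guest_status(get_status, timeout, interval=0.01):
    # Poll until the guest agent reports any status, or give up.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status is not None:
            return status
        time.sleep(interval)
    raise GuestTimeout(f"no guest status within {timeout}s")


def rollback(resources, instance_id, log):
    # Steps in the order listed in the comment; each is best-effort so one
    # failed delete does not leave the remaining artifacts behind.
    steps = [
        ("security group", resources.delete_security_group),
        ("floating IP", resources.delete_floating_ip),
        ("server", resources.delete_server),
        ("database records", resources.delete_db_records),
    ]
    for name, delete in steps:
        try:
            delete(instance_id)
            log.append(f"deleted {name}")
        except Exception as exc:
            log.append(f"could not delete {name}: {exc}")


def provision(resources, instance_id, get_status, timeout, log):
    try:
        return wait_for_guest_status(get_status, timeout)
    except GuestTimeout:
        rollback(resources, instance_id, log)
        raise


class RecordingResources:
    """Hypothetical client that records deletes; can simulate failures."""

    def __init__(self, fail_on=()):
        self.deleted = []
        self.fail_on = set(fail_on)

    def _delete(self, kind, instance_id):
        if kind in self.fail_on:
            raise RuntimeError(f"{kind} not found")
        self.deleted.append(kind)

    def delete_security_group(self, instance_id):
        self._delete("security group", instance_id)

    def delete_floating_ip(self, instance_id):
        self._delete("floating IP", instance_id)

    def delete_server(self, instance_id):
        self._delete("server", instance_id)

    def delete_db_records(self, instance_id):
        self._delete("database records", instance_id)
```

Wrapping each step in its own `try` matters because after a partial failure some artifacts may never have been created, and a missing floating IP should not stop the server and database records from being cleaned up.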

Denis M. (dmakogon) wrote :
Auston McReynolds (amcrn) wrote :

dmakogon: I agree that a more general approach needs to be implemented, one that considers the termination and/or disassociation of artifacts. The scope of what you're suggesting is much larger than this targeted fix, and it requires a well-documented bug or blueprint detailing what's eligible for deletion, etc. Can you please file a new bug or blueprint and detail your requirements? Thanks!

Denis M. (dmakogon) wrote :

OK, I'll do that tomorrow (6.09.2013).

Denis M. (dmakogon) wrote :

Your proposal only heals the symptom. The main issue is ghost instance components. So if you want this merged, I suggest raising an exception without writing any records to the database. The security group wouldn't be created if the quota is exceeded, and that is Nova-related, so catch the error there, not here; this is really significant.
So, when I finish implementing complete rollback, all security group/rule creation will be dropped from trove-api and moved to the taskmanager, because architecturally it belongs there. If you look at any of the API components, you will see that they only make RPC calls. Because of its immaturity, Trove has a lot of issues; this is one of them, and you are making it more complicated.
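The first suggestion in this comment, raising before anything is written to the database, might look roughly like the sketch below. It is a hypothetical in-memory model (`FakeNova` and `create_instance` are invented names) that only reorders the two steps; it does not show the move of security group creation into the taskmanager.

```python
class SecurityGroupLimitExceeded(Exception):
    """Hypothetical stand-in for Nova's quota error (HTTP 413)."""


class FakeNova:
    """Hypothetical in-memory Nova with a security group quota."""

    def __init__(self, quota):
        self.quota = quota
        self.groups = []

    def create_security_group(self, name):
        if len(self.groups) >= self.quota:
            raise SecurityGroupLimitExceeded("too many security groups")
        self.groups.append(name)
        return name


def create_instance(instances, nova, instance_id):
    # Nova-side resources first: a quota error propagates before any row
    # exists, so there is nothing to roll back in the database.
    group = nova.create_security_group(f"SecGroup_{instance_id}")
    instances[instance_id] = {"status": "BUILD", "security_group": group}
    return instances[instance_id]
```

The trade-off runs the other way: if the database write failed after the group was created, the group itself would become one of the ghost components mentioned above, which is why a complete rollback still needs to delete Nova-side artifacts.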

Denis M. (dmakogon) wrote :
Changed in trove:
status: In Progress → Incomplete
Denis M. (dmakogon) wrote :
Denis M. (dmakogon)
Changed in trove:
status: Incomplete → Fix Committed
assignee: Auston McReynolds (amcrn) → Denis M. (dmakogon)
Thierry Carrez (ttx)
Changed in trove:
milestone: none → icehouse-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in trove:
milestone: icehouse-1 → 2014.1