Instances stuck in DELETING state if delete fails

Bug #1543511 reported by Radomir Dopieralski
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

In Liberty, when a transient error (such as a lost connection to the database) occurs while the compute service is deleting an instance, the instance gets stuck in the DELETING state and can no longer be deleted. The only way out is to restart the compute service: during initialization all pending deletes are retried, and the delete finishes.

It would be better if the service restart wasn't required.

Revision history for this message
jichenjc (jichenjc) wrote :

You might try:

[root@lljcma mnadmin] # nova reset-state
usage: nova reset-state [--active] <server> [<server> ...]
error: too few arguments
Try 'nova help reset-state' for more information.

Also, can you provide the call back as a reference?

Changed in nova:
status: New → Incomplete
Revision history for this message
Radomir Dopieralski (deshipu) wrote :

Yes, I know you can recover such instances manually, but the whole point is that it would be nice if they recovered automatically after a certain timeout. I intend to work on this.

What do you mean by "call back"?

Revision history for this message
jichenjc (jichenjc) wrote :

I mean the error that led to the exception. Sorry, that was a typo; it should be "traceback".
Usually this kind of bug is handled by nova itself. In most cases the problem is that we didn't include the right exception
in the catch list, so the automatic revert didn't take effect.
If you provide the traceback of the exception, it will be much easier to see why the automatic revert didn't work. Thanks.
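
The "catch list" idea can be illustrated with a minimal Python sketch. This is not nova's actual code: `revert_task_state_on`, `Instance`, and `TransientError` are hypothetical names used only for illustration. The decorator resets the task state only for the exception types it was given, so an exception missing from the list leaves the instance in its intermediate state.

```python
import functools


class TransientError(Exception):
    """Hypothetical error type that is included in the catch list."""


class Instance:
    """Hypothetical stand-in for an instance record."""

    def __init__(self):
        self.task_state = None


def revert_task_state_on(*exc_types):
    """Reset task_state to None if the wrapped call raises a listed exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(instance, *args, **kwargs):
            try:
                return func(instance, *args, **kwargs)
            except exc_types:
                instance.task_state = None  # the "automatic revert"
                raise
        return wrapper
    return decorator


@revert_task_state_on(TransientError)
def delete_instance(instance, error):
    """Pretend delete that always fails with the given error."""
    instance.task_state = "deleting"
    raise error
```

Calling `delete_instance` with a `TransientError` leaves `task_state` reset, while any other exception type propagates with `task_state` still set to `"deleting"`.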

Revision history for this message
Radomir Dopieralski (deshipu) wrote :

The error is a lost connection to RabbitMQ, and no, you can't just catch it and recover from it, because, well, you have no connection to the conductor, and so no access to the database to update the state.
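
A small Python sketch of this point, under the assumption that every state update travels over the same channel: `FakeConductor` and `delete_instance` are hypothetical names, not nova APIs. Once the channel is down, the revert attempted in the exception handler fails in the same way as the original call, so the stored state stays at "deleting".

```python
class FakeConductor:
    """Hypothetical stand-in for the only path to the database."""

    def __init__(self):
        self.connected = True
        self.stored_task_state = None

    def update(self, task_state):
        # Every write needs the connection, including a revert.
        if not self.connected:
            raise ConnectionError("message bus unreachable")
        self.stored_task_state = task_state


def delete_instance(conductor, lose_connection):
    """Return 'deleted', 'reverted', or 'stuck' depending on what succeeded."""
    conductor.update("deleting")
    if lose_connection:
        conductor.connected = False  # simulate the transient failure
    try:
        conductor.update("deleted")  # final step of the delete
        return "deleted"
    except ConnectionError:
        try:
            conductor.update(None)  # attempted revert over the same channel
            return "reverted"
        except ConnectionError:
            return "stuck"
```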

Revision history for this message
Sean Dague (sdague) wrote :

I think it would be fine to have another periodic task to handle stuck deleting instances.
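
One pass of such a periodic task might look like the following Python sketch. It is only an illustration under stated assumptions: `Instance`, `find_stuck_deletes`, `retry_stuck_deletes`, the `updated_at` field, and the timeout value are all hypothetical, and how the task would be scheduled inside the compute service is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Instance:
    """Hypothetical, simplified instance record."""
    uuid: str
    task_state: Optional[str]
    updated_at: datetime


def find_stuck_deletes(instances: List[Instance], now: datetime,
                       timeout: timedelta) -> List[Instance]:
    """Return instances that have been in 'deleting' longer than the timeout."""
    return [
        inst for inst in instances
        if inst.task_state == "deleting" and now - inst.updated_at > timeout
    ]


def retry_stuck_deletes(instances: List[Instance], now: datetime,
                        timeout: timedelta,
                        delete: Callable[[Instance], None]) -> List[str]:
    """Retry each stuck delete once; return the uuids whose delete succeeded."""
    completed = []
    for inst in find_stuck_deletes(instances, now, timeout):
        try:
            delete(inst)
        except Exception:
            continue  # still failing; leave it for the next pass
        completed.append(inst.uuid)
    return completed
```

Skipping instances that fail again, rather than aborting the pass, means one unreachable instance does not block the retry of the others.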

Changed in nova:
status: Incomplete → Confirmed
importance: Undecided → Low
status: Confirmed → Triaged
Changed in nova:
assignee: nobody → Mohammed Ashraf (mohammed-asharaf)
Changed in nova:
status: Triaged → In Progress
Changed in nova:
assignee: Mohammed Ashraf (mohammed-asharaf) → nobody
Changed in nova:
status: In Progress → Confirmed
Rajesh Tailor (ratailor)
Changed in nova:
assignee: nobody → Rajesh Tailor (ratailor)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/294491

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/294491
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Rajesh Tailor (ratailor) → nobody
Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version liberty in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.liberty
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/598084

Changed in nova:
assignee: nobody → huanhongda (hongda)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/598084

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm not totally sure if this is still an issue given how old this bug is, so I'm going to mark it as Incomplete for now. I don't know whether restarting the compute service will clean up an instance stuck in DELETING status; it would need to be tested (a functional recreate test for something like this would probably be a good way to exhibit the bug).

Changed in nova:
assignee: huanhongda (hongda) → nobody
status: In Progress → New
status: New → Confirmed
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired