destroy-controller timeout

Bug #1784876 reported by Chris MacNaughton
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

When a controller has volumes and Juju cannot delete them, the destroy-controller command gets stuck trying to destroy the controller, apparently forever (at least for the several days I have left it running), rather than hitting a timeout and exiting non-zero. If Juju exited non-zero in this case, additional tooling could be used to remove whatever is left behind. Instead, Juju blocks indefinitely.

Chris MacNaughton (chris.macnaughton) wrote :

This is using 2.4.0-xenial-amd64

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.4.2
importance: Undecided → High
status: New → Triaged
Changed in juju:
milestone: 2.4.2 → 2.4.3
Changed in juju:
milestone: 2.4.3 → none
Changed in juju:
milestone: none → 2.4.4
Changed in juju:
milestone: 2.4.4 → none
Ian Booth (wallyworld)
tags: added: teardown
Anastasia (anastasia-macmood) wrote :

@Chris MacNaughton,

We have done a lot of interesting re-work around destruction in Juju 2.6.

Is it possible for you to re-try with the new version?

If you are still experiencing difficulties, could you please provide a reproducible scenario and controller logs?

I'll mark this report as Incomplete for now.

Changed in juju:
status: Triaged → Incomplete
Chris MacNaughton (chris.macnaughton) wrote :

@anastasia-macmood We will likely retarget our test automation to juju-2.6 soon, but it doesn't sound like Juju has added a timeout for controller destruction?

At the moment, we're just using `timeout juju destroy-controller ...` as a workaround but it would be nice to have a more official method to handle that.
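For automation that cannot rely on the `timeout` utility, the same workaround can be sketched in Python. This is a minimal illustration, not anything Juju provides: the helper name `run_with_deadline` is hypothetical, and returning 124 on expiry is an assumption chosen to mirror the exit status GNU `timeout` uses when a command runs out of time.

```python
import subprocess
import sys

# Assumption: reuse the exit status GNU `timeout` reports when time runs out.
TIMED_OUT = 124


def run_with_deadline(cmd, seconds):
    """Run cmd and return its exit status, or TIMED_OUT if it exceeds `seconds`."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child process before raising.
        return TIMED_OUT


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: <script> SECONDS COMMAND [ARGS...]
    sys.exit(run_with_deadline(sys.argv[2:], float(sys.argv[1])))
```

Saved under a hypothetical name such as `deadline.py`, it would be invoked as `python3 deadline.py 1800 juju destroy-controller ...`, and the calling tooling can branch on a non-zero exit status to start its own cleanup. Only the direct child is killed on expiry, so this is no more "official" than the `timeout` workaround itself.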

Anastasia (anastasia-macmood) wrote :

@Chris MacNaughton (chris.macnaughton),

It's true that the destruction of the controller itself was not touched. However, the destruction of individual entities that a controller would destroy anyway may have been. Consequently, it's possible that there is an improvement in your experience simply because of that work.

However, I will keep this report around and see if the timeout is something we can address.

Is it possible for you to give us a reproducible scenario with volumes as you've described?

Chris MacNaughton (chris.macnaughton) wrote :

The undeletable volumes are caused by OpenStack policies: when a volume gets stuck (usually in the creating state, weirdly), a normal tenant cannot delete their own stuck volume, and a cloud admin must delete it instead.

This is an entirely intermittent issue that we see in UOSCI (maybe 1 in 250 volumes get stuck this way) but, when there is a stuck volume, it is a manual process to un-stick things by deleting the volume as a cloud admin and then killing the controller outside of our automated processes.

The reason this bug was filed is that the `juju destroy-controller` command will never time out in this state and must be killed via an additional command like `timeout`. Because of the policy issue mentioned above, there is no way for the OpenStack tenant that the Juju controller is using to destroy the stuck volume.
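To put the "maybe 1 in 250 volumes" estimate in perspective, the chance that a test run hits at least one stuck volume can be roughed out as below. This is back-of-the-envelope arithmetic only: it assumes volumes get stuck independently and at exactly the estimated rate, and the volume counts are hypothetical rather than taken from UOSCI.

```python
def p_any_stuck(n_volumes, p_stuck=1 / 250):
    """Probability that at least one of n_volumes gets stuck.

    Assumes each volume gets stuck independently with probability p_stuck.
    """
    return 1 - (1 - p_stuck) ** n_volumes


# Hypothetical numbers of volumes created before a controller is torn down.
for n in (1, 10, 50, 250):
    print(f"{n:>4} volumes: {p_any_stuck(n):.1%}")
```

Under those assumptions, even a rare per-volume failure becomes a regular occurrence across many runs, which is why a hang with no exit status is costly for unattended automation.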

Changed in juju:
status: Incomplete → Triaged
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot