"Timed out trying to delete user" resolved by heat-engine restart

Bug #1257723 reported by Mark McLoughlin
This bug affects 2 people
 | Affects | Status | Importance | Assigned to | Milestone |
 | OpenStack Heat | Invalid | Undecided | Unassigned | |
 | OpenStack Identity (keystone) | Fix Released | Wishlist | Unassigned | |

Bug Description

I left a TripleO overcloud stack running for a couple of days, and when I went to delete it, it went into DELETE_FAILED:

 | notcomputeConfig | 29 | state changed | DELETE_COMPLETE | 2013-12-04T12:11:25Z |
 | CompletionCondition | 30 | state changed | DELETE_COMPLETE | 2013-12-04T12:11:25Z |
 | notcompute | 31 | state changed | DELETE_IN_PROGRESS | 2013-12-04T12:11:25Z |
 | CompletionHandle | 32 | state changed | DELETE_IN_PROGRESS | 2013-12-04T12:11:32Z |
 | CompletionHandle | 33 | Error: Timed out trying to delete user | DELETE_FAILED | 2013-12-04T12:11:42Z |
 | notcompute | 34 | Deletion aborted | DELETE_FAILED | 2013-12-04T12:11:42Z |

heat-engine log shows:

Traceback (most recent call last):
  File "/opt/stack/venvs/heat/lib/python2.7/site-packages/heat/engine/resource.py", line 575, in delete
    handle_data = self.handle_delete()
  File "/opt/stack/venvs/heat/lib/python2.7/site-packages/heat/engine/signal_responder.py", line 62, in handle_delete
    self.keystone().delete_stack_user(self.resource_id)
  File "/opt/stack/venvs/heat/lib/python2.7/site-packages/heat/common/heat_keystoneclient.py", line 302, in delete_stack_user
    raise exception.Error(reason)
Error: Timed out trying to delete user

Retrying stack-delete didn't help until I restarted heat-engine, after which it quickly completed.

I see we cache the keystone client instance with the stack - any obvious reason for the client getting foobared in a way that re-creating it would fix it?

versions:
  heat - daddc05
  keystone - f72f369
  keystoneclient - 7abf8d2

Mark McLoughlin (markmc) wrote :

Just hit this again, but the overcloud stack wasn't running nearly so long:

 | CompletionCondition | 80 | state changed | CREATE_COMPLETE | 2013-12-04T13:29:53Z |
 | CompletionHandle | 87 | state changed | DELETE_IN_PROGRESS | 2013-12-04T16:32:07Z |
 | CompletionHandle | 88 | Error: Timed out trying to delete user | DELETE_FAILED | 2013-12-04T16:32:18Z |

Steve Baker (steve-stevebaker) wrote :

I get timeouts on deleting users when I've left my devstack running for a period of time.

Running keystone-manage token_flush *and* restarting keystone seems to stop this from happening for a while.

Steve Baker (steve-stevebaker) wrote :

Mark, restarting heat-engine probably had no effect. Chances are the user was actually deleted during the first timed-out keystone operation, so the next stack delete completed without issue because the underlying keystone user was already gone.
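To illustrate that reasoning, here is a minimal sketch of a delete that treats "user not found" as success on retry. It is not Heat's or keystoneclient's code: `NotFound`, `DeleteTimeout`, `FakeBackend`, and `delete_user_idempotent` are all hypothetical names, and the fake backend simply assumes the server finishes the delete even though the caller sees a timeout.

```python
class NotFound(Exception):
    """Hypothetical stand-in for a client's 'user does not exist' error."""


class DeleteTimeout(Exception):
    """Hypothetical stand-in for a timed-out delete request."""


class FakeBackend:
    """Toy identity backend: the delete succeeds server-side but the caller times out."""

    def __init__(self, users):
        self.users = set(users)
        self.calls = 0

    def delete_user(self, user_id):
        self.calls += 1
        if user_id not in self.users:
            raise NotFound(user_id)
        # The user really is removed, yet the request is reported as timed out.
        self.users.discard(user_id)
        raise DeleteTimeout("Timed out trying to delete user")


def delete_user_idempotent(delete_user, user_id, attempts=2):
    """Call delete_user(user_id), treating 'not found' as success.

    Returns "deleted" if a call succeeded outright, or "already-gone" if the
    user no longer existed. Re-raises the last timeout if every attempt
    timed out.
    """
    last_error = None
    for _ in range(attempts):
        try:
            delete_user(user_id)
            return "deleted"
        except NotFound:
            return "already-gone"
        except DeleteTimeout as exc:
            last_error = exc
    raise last_error


backend = FakeBackend({"stack-user"})
result = delete_user_idempotent(backend.delete_user, "stack-user")
```

Under these assumptions the first attempt times out, the second finds the user already gone, and the delete as a whole is reported as successful without any restart.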

Clint Byrum (clint-fewbar) wrote : Re: [Bug 1257723] Re: "Timed out trying to delete user" resolved by heat-engine restart

Excerpts from Steve Baker's message of 2013-12-04 21:09:28 UTC:
> I get timeouts on deleting users when I've left my devstack running for
> a period of time.
>
> running keystone-manage token_flush *and* restarting keystone seems to
> stop this from happening for a while.
>

In TripleO we flush tokens every two hours.
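As a rough sketch of what a two-hourly flush could look like, the snippet below shells out to the `keystone-manage token_flush` command mentioned above. How TripleO actually schedules the flush is not stated in this thread, so the loop, the function names, and the injectable `run`/`sleep` parameters are assumptions made for illustration; a cron entry would be an equally plausible mechanism.

```python
import subprocess
import time

TWO_HOURS = 2 * 60 * 60  # interval quoted in the comment above, in seconds


def flush_tokens(run=subprocess.check_call):
    """Run `keystone-manage token_flush` once.

    `run` is injectable so the function can be exercised without keystone installed.
    """
    return run(["keystone-manage", "token_flush"])


def flush_periodically(interval_seconds=TWO_HOURS, iterations=None,
                       flush=flush_tokens, sleep=time.sleep):
    """Flush tokens, sleeping interval_seconds between flushes.

    Loops forever when iterations is None; otherwise performs exactly
    `iterations` flushes and returns that count.
    """
    done = 0
    while iterations is None or done < iterations:
        flush()
        done += 1
        if iterations is None or done < iterations:
            sleep(interval_seconds)
    return done
```

Calling `flush_periodically()` with no arguments would block and flush every two hours; passing `iterations` bounds the loop for testing.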

Dolph Mathews (dolph) wrote :

Our long term solution here is to implement this blueprint:

  https://blueprints.launchpad.net/keystone/+spec/revocation-events

Which means lightweight consequences to things like user deletion, and gets us much closer to completely ephemeral PKI tokens.

Steve Baker (steve-stevebaker) wrote :

So the issue here is that deleting a user results in all of the user's tokens being revoked, which results in a timed-out request when there are enough tokens?
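A toy model of the cost difference under discussion, assuming that reading is right: revoking per token does one write for every token the user holds, while a revocation event is a single write that is checked at validation time. This is not keystone's implementation; the data layout and the function names are hypothetical.

```python
def revoke_per_token(tokens, user_id):
    """Mark each of the user's stored tokens as revoked; return the number of writes."""
    writes = 0
    for token in tokens:
        if token["user_id"] == user_id and not token["revoked"]:
            token["revoked"] = True
            writes += 1
    return writes


def revoke_by_event(events, user_id):
    """Record one revocation event for the user; return the number of writes."""
    events.append({"user_id": user_id})
    return 1


def is_valid(token, events):
    """A token is valid if it is not revoked itself and no event covers its user."""
    if token["revoked"]:
        return False
    return all(event["user_id"] != token["user_id"] for event in events)


# 1000 tokens for the deleted user, one token for an unrelated user.
tokens = [{"user_id": "stack-user", "revoked": False} for _ in range(1000)]
tokens.append({"user_id": "other", "revoked": False})

per_token_writes = revoke_per_token(tokens, "stack-user")

events = []
event_writes = revoke_by_event(events, "stack-user")
```

In this model the per-token path grows linearly with the number of unflushed tokens, which would match the observation that flushing tokens makes the timeouts go away for a while.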

Clint Byrum (clint-fewbar) wrote :

This seems _entirely_ keystone's issue. And I agree that not doing many thousands of CRUD events is the right way to go. Closing the Heat task, and I suggest keystone mark it as a dupe or attach it to the revocation-events blueprint.

Changed in heat:
status: New → Invalid
Morgan Fainberg (mdrnstm) wrote :

If we can fix this without Revocation Events / Non-Persistent Tokens, I am all for accepting it. However, at this point there is no good solution short of the aforementioned BPs. I'll link it to the new non-persistent token BP as well.

Changed in keystone:
importance: Undecided → Wishlist
status: New → Confirmed
Steve Martinelli (stevemar) wrote :

Revocation events (https://blueprints.launchpad.net/keystone/+spec/revocation-events) have been out for a while now... marking this as Fix Released.

Changed in keystone:
status: Confirmed → Fix Released