early quorum + post_quorum_timeout miss = bogus logging

Bug #1274312 reported by Darrell Bishop
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

In the various early-quorum scenarios, the code using a GreenAsyncPile decides it has quorum and breaks out of its iteration of the GreenAsyncPile, then gives up on the laggards after waiting self.app.post_quorum_timeout seconds. The greenthreads already fired off inside the GreenAsyncPile's GreenPool will happily keep "running" after that and bang into any *Timeout exceptions inside the code they're running, REGARDLESS OF ACTUAL SUCCESS OR FAILURE.

I don't think we should cancel any relevant *Timeouts registered with eventlet, even if that were easy, because we would lose potentially useful information about the state of the cluster when the laggards *do* actually time out.

Instead, I think GreenAsyncPile should have a new instance method called "give_up" which, when called, will spawn a single greenthread to drain the Pile. It would probably look like just spawning a greenthread which runs self.waitall with no timeout.
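A rough sketch of the "give_up" idea follows. It is not Swift's code: MiniPile is a hypothetical, thread-based stand-in for GreenAsyncPile built on the standard library instead of eventlet, and the slow() helper, the delays, and the returned (drainer, drained) pair are assumptions added only so the example can be run and observed. The caller must stop iterating the pile once it has called give_up().

```python
import queue
import threading
import time


class MiniPile:
    """Thread-based stand-in for GreenAsyncPile; a sketch, not Swift's code."""

    def __init__(self):
        self._responses = queue.Queue()
        self._pending = 0  # results spawned but not yet consumed

    def spawn(self, func, *args):
        self._pending += 1
        threading.Thread(target=self._run, args=(func, args),
                         daemon=True).start()

    def _run(self, func, args):
        try:
            result = func(*args)
        except Exception as err:  # hand errors to the consumer
            result = err
        self._responses.put(result)

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending == 0:
            raise StopIteration
        result = self._responses.get()
        self._pending -= 1
        return result

    def give_up(self):
        """Hypothetical method: drain leftovers in one background thread."""
        drained = []

        def drain():
            # Plays the role of waitall() with no timeout.
            for result in self:
                drained.append(result)

        drainer = threading.Thread(target=drain, daemon=True)
        drainer.start()
        return drainer, drained


def slow(name, delay):
    """Hypothetical backend request that takes `delay` seconds."""
    time.sleep(delay)
    return name


pile = MiniPile()
for name, delay in [("a", 0.01), ("b", 0.02), ("c", 0.5)]:
    pile.spawn(slow, name, delay)

quorum = []
for result in pile:
    quorum.append(result)
    if len(quorum) == 2:  # early quorum: stop waiting for the laggard
        break

drainer, drained = pile.give_up()
drainer.join()  # only so the example can inspect `drained`
```

In a real caller nothing would join the drainer; it would simply run until the last laggard finishes or times out.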

Alternatively, there could be a method called "wait_some_then_give_up" which is just like waitall() currently is, except that instead of just passing when catching GreenAsyncPileWaitallTimeout, it spawns a drainer if self._inflight > 0. It then returns results as normal. With this solution no "give_up" method would be necessary. In fact, there may not even be any callers of GreenAsyncPile who do not want this behavior, in which case the "waitall" method could be deleted and all callers changed to call "wait_some_then_give_up".
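The alternative could look roughly like the sketch below. Again MiniPile is a hypothetical stand-in that uses standard-library threads rather than eventlet, so queue.Empty plays the part of GreenAsyncPileWaitallTimeout; the drainer attribute, the slow() helper, and the timings are assumptions made for illustration.

```python
import queue
import threading
import time


class MiniPile:
    """Thread-based stand-in for GreenAsyncPile; a sketch, not Swift's code."""

    def __init__(self):
        self._responses = queue.Queue()
        self._lock = threading.Lock()
        self._inflight = 0   # tasks that are still running
        self._pending = 0    # results not yet consumed
        self.drainer = None  # hypothetical attribute, kept for inspection

    def spawn(self, func, *args):
        with self._lock:
            self._inflight += 1
        self._pending += 1
        threading.Thread(target=self._run, args=(func, args),
                         daemon=True).start()

    def _run(self, func, args):
        try:
            result = func(*args)
        except Exception as err:  # hand errors to the consumer
            result = err
        self._responses.put(result)
        with self._lock:
            self._inflight -= 1

    def wait_some_then_give_up(self, timeout):
        """Hypothetical: like waitall(timeout), then drain any laggards."""
        results = []
        deadline = time.monotonic() + timeout
        while self._pending > 0:
            remaining = max(deadline - time.monotonic(), 0)
            try:
                results.append(self._responses.get(timeout=remaining))
            except queue.Empty:  # stands in for the waitall timeout
                break
            self._pending -= 1
        with self._lock:
            still_running = self._inflight > 0
        if still_running:
            self.drainer = threading.Thread(target=self._drain, daemon=True)
            self.drainer.start()
        return results

    def _drain(self):
        # Consume leftover results with no timeout.
        while self._pending > 0:
            self._responses.get()
            self._pending -= 1


def slow(name, delay):
    """Hypothetical backend request that takes `delay` seconds."""
    time.sleep(delay)
    return name


pile = MiniPile()
for name, delay in [("a", 0.01), ("b", 0.02), ("c", 0.6)]:
    pile.spawn(slow, name, delay)

results = pile.wait_some_then_give_up(timeout=0.2)
```

The caller gets back whatever arrived within the timeout and never touches the pile again; the drainer quietly consumes the rest.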

description: updated