Threadgroup: bug concerning the wait/stop methods

Bug #1196525 reported by Raphael.G
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
oslo-incubator
Fix Released
Undecided
Jason Dunsmore

Bug Description

The wait method does not always wait for all the (green)threads to be processed and the stop method does not always stop all threads.

Reason: in the method we iterate over the self.threads list. The problem is that this list can potentially be concurrently edited by the _thread_done callbacks, that remove some items in the list and thus can fool the iterator in the wait method.

1. Below a basic "test" to reproduce the problem (the reason why I have imbricated threadgroups is that it is the only way I managed to reproduce the issue. Statistically, I had to wait for around one minute to see the issue happen: END STUFF printed after the WAIT STUFF things).

2. Attached a correction proposal: one line in the ThreadGroup.wait/stop methods, shallow copy of the list before we iterate over it.
In a thread environment, I'm not even sure this would really fix the bug: indeed, if the shallow copy itself was not thread-safe, we would have to synchronize the _thread_done callbacks with the copy.
In a green thread environment, I may be wrong, but I think the shallow copy fixes the issue: we are saved by the fact that there won't be any greenthread switching during the shallow copy (no IO, nor explicit greenthread switching method called). The copy is thus greenthread safe

#######################################################################################
#TEST
#######################################################################################
#!/usr/bin/env python

import eventlet
eventlet.monkey_patch()
import random
from openstack.common import threadgroup

class A(object):
    def __init__(self):
        #self.thread_pool = eventlet.GreenPool()
        self.tg = threadgroup.ThreadGroup(100)

    def test1(self, i):
        eventlet.sleep(random.randint(1,3))
        print "END TEST %s" % i

    def run_test(self):

        while True:
            print "#"*100
# threads = []
            for i in range(0,100):
# threads.append(self.thread_pool.spawn(self.test1, i))
                self.tg.add_thread(self.test1, i)
            self.tg.wait()
# for thd in threads:
# thd.wait()
            print "WAIT SELF TG"
            eventlet.sleep(3)

if __name__ == '__main__':
    a = A()
    tg = threadgroup.ThreadGroup()
    tg.add_thread(a.run_test)
    tg.wait()

Raphael.G (raphael-g)
description: updated
description: updated
Raphael.G (raphael-g)
description: updated
Revision history for this message
Raphael.G (raphael-g) wrote :
summary: - Threadgroup: bug concerning the wait method
+ Threadgroup: bug concerning the wait/stop methods
description: updated
Ben Nemec (bnemec)
Changed in oslo:
status: New → Confirmed
Changed in oslo:
assignee: nobody → Bertrand Lallau (bertrand-lallau)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (master)

Fix proposed to branch: master
Review: https://review.openstack.org/60096

Changed in oslo:
assignee: Bertrand Lallau (bertrand-lallau) → Jason Dunsmore (jasondunsmore)
status: Confirmed → In Progress
Revision history for this message
Jason Dunsmore (jasondunsmore) wrote :

I reproduced this bug in a separate set of tests. See "Test 2":
http://paste.openstack.org/show/k8ZeXFPk9gIrhCHwkVvF/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (master)

Reviewed: https://review.openstack.org/60096
Committed: http://github.com/openstack/oslo-incubator/commit/25ff65e955b7ef1091722224f7c71f14fc971354
Submitter: Jenkins
Branch: master

commit 25ff65e955b7ef1091722224f7c71f14fc971354
Author: Jason Dunsmore <email address hidden>
Date: Wed Dec 4 16:13:57 2013 -0600

    Make wait & stop methods work on all threads

    Previously, ThreadGroup.wait and ThreadGroup.stop would iterate over
    the self.threads list and call the wait or stop method on each one.
    This list could potentially be edited concurrently by the _thread_done
    callbacks and remove some items in the list. As a result, the wait
    method would not actually wait for all threads in the thread group.

    Change-Id: Iad5d3e47c8235eadd52a8193657aebb92158568b
    Closes-Bug: #1196525

Changed in oslo:
status: In Progress → Fix Committed
Changed in oslo:
milestone: none → icehouse-2
Thierry Carrez (ttx)
Changed in oslo:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in oslo:
milestone: icehouse-2 → 2014.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.