testtools

test_spinner.TestRunInReactor.test_clean_running_threads fails with an extra thread

Bug #666345 reported by Martin Packman on 2010-10-25

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	testtools	Fix Released	Critical	Robert Collins	testtools 0.9.8

Bug Description

======================================================================
FAIL: testtools.tests.test_spinner.TestRunInReactor.test_clean_running_threads
----------------------------------------------------------------------
Text attachment: traceback
------------
Traceback (most recent call last):
  File "C:\bzr\testtools\testtools\runtest.py", line 143, in _run_user
    return fn(*args)
  File "C:\bzr\testtools\testtools\testcase.py", line 539, in _run_test_method
    return self._get_test_method()()
  File "C:\bzr\testtools\testtools\tests\test_spinner.py", line 241, in test_clean_running_threads
    self.assertThat(list(threading.enumerate()), Equals(current_threads))
  File "C:\bzr\testtools\testtools\testcase.py", line 384, in assertThat
    self.fail('Match failed. Matchee: "%s"\nMatcher: %s\nDifference: %s\n'
AssertionError: Match failed. Matchee: "[<_MainThread(MainThread, started)>, <Thread(PoolThread-twisted.internet.reactor-1, stopped)>]"
Matcher: Equals([<_MainThread(MainThread, started)>])
Difference: !=:
reference = [<_MainThread(MainThread, started)>]
actual = [<_MainThread(MainThread, started)>,
<Thread(PoolThread-twisted.internet.reactor-1, stopped)>]

This may be related to my OS, my Python version (2.4) or my twisted version (8.1.0).

Tags:

Related branches

lp://qastaging/~testtools-committers/testtools/trunk

Revision history for this message

Jonathan Lange (jml) wrote on 2010-10-31:

Thanks for the report. The test looks to be passing on Hudson, so I think we can rule out Python 2.4.

* Do you get the failure reliably, or does it only happen from time to time?
* Could you please try with latest trunk and see if you still get the failure?
* If trunk still fails, could you please try this patch:

=== modified file 'testtools/tests/test_spinner.py'
--- testtools/tests/test_spinner.py 2010-10-26 10:42:59 +0000
+++ testtools/tests/test_spinner.py 2010-10-31 17:32:33 +0000
@@ -237,7 +237,7 @@ class TestRunInReactor(TestCase):
         reactor = self.make_reactor()
         timeout = self.make_timeout()
         spinner = self.make_spinner(reactor)
- spinner.run(timeout, reactor.callInThread, time.sleep, timeout / 2.0)
+ spinner.run(timeout * 10, reactor.callInThread, time.sleep, timeout)
         self.assertThat(list(threading.enumerate()), Equals(current_threads))

def test_leftover_junk_available(self):

Changed in testtools:
importance:	Undecided → High
status:	New → Incomplete

Revision history for this message

Martin Packman (gz) wrote on 2010-11-02:

Failure is completely reliable here, and is unchanged on trunk. Nice idea on the timeout, and yeah make_timeout should probably be using multiples of 1.0/60 on windows. Still always hit the race though, which appears to be in the spinner cleanup code.

Not sure exactly what this test is trying to check, but as background threads being spawned during the test run would make it fail anyway, just asserting we have one live thread works:

=== modified file 'testtools/tests/test_spinner.py'
--- testtools/tests/test_spinner.py 2010-10-26 10:42:59 +0000
+++ testtools/tests/test_spinner.py 2010-11-02 17:13:00 +0000
@@ -233,12 +233,11 @@
     def test_clean_running_threads(self):
         import threading
         import time
- current_threads = list(threading.enumerate())
         reactor = self.make_reactor()
         timeout = self.make_timeout()
         spinner = self.make_spinner(reactor)
         spinner.run(timeout, reactor.callInThread, time.sleep, timeout / 2.0)
- self.assertThat(list(threading.enumerate()), Equals(current_threads))
+ self.assertThat(sum(t.isAlive() for t in threading.enumerate()), Equals(1))

Alternatively, can add a fudge to lose the race:

=== modified file 'testtools/tests/test_spinner.py'
--- testtools/tests/test_spinner.py 2010-10-26 10:42:59 +0000
+++ testtools/tests/test_spinner.py 2010-11-02 17:23:08 +0000
@@ -237,7 +240,8 @@
         reactor = self.make_reactor()
         timeout = self.make_timeout()
         spinner = self.make_spinner(reactor)
         spinner.run(timeout, reactor.callInThread, time.sleep, timeout / 2.0)
+ time.sleep(0)
         self.assertThat(list(threading.enumerate()), Equals(current_threads))

def test_leftover_junk_available(self):

Changed in testtools:
status:	Incomplete → Confirmed

Revision history for this message

Jonathan Lange (jml) wrote on 2010-11-02:

Thanks Martin. I'm targeting this to "next", because I consider it a release blocker.

Changed in testtools:
status:	Confirmed → Triaged
milestone:	none → next

Revision history for this message

Jonathan Lange (jml) wrote on 2010-12-10:

Without a Windows machine I'm stuck. We might have to release with this bug still open.

tags:

added: win32

Revision history for this message

Robert Collins (lifeless) wrote on 2010-12-10:

Well, Martin has supplied a couple of options; my guess is thread priority differences on windows, FWIW. Or perhaps there is a missing join() somewhere and we're relying on gc behaviour - actually this seems very plausible to me.

Changed in testtools:
importance:	High → Critical

Revision history for this message

Martin Packman (gz) wrote on 2010-12-11:

I'm happy to try running whatever branches you'd like Jonathan, I'm just not steeped in enough twisted lore to know how to debug an issue that goes away under pdb.

Revision history for this message

Robert Collins (lifeless) wrote on 2010-12-13:

This is a twisted bug in the suggestPoolSize code path - it does not synchronise thread removal from the thread pool.

in threadpool.py:
_worker will remove the thread from self.threads when it receives WorkerStop
stop will join() threads that *are* in self.threads when stop was called
threads that *are stopped due to pool size changes are never join()ed*.

So the race is this:
- _spinner calls suggestPoolSize(0) which asks threads to stop. These threads are not joined. ever.
- threads go to thread heaven by removing themselves from .threads
- _spinner triggers a stop()
- which sees an empty threads dict *IFF* the test worker had already seen WorkerStop
- if stop saw an empty threads dict, it doesn't call join() and the thread object is still around.

We need to work around this, and upstream twisted needs to actually join all threads, not just those that happened to be running at stop() being called.

To work around it, stop calling self._reactor.suggestThreadPoolSize(0)

Revision history for this message

Robert Collins (lifeless) wrote on 2010-12-13:

E.g.

=== modified file 'testtools/_spinner.py'
--- testtools/_spinner.py 2010-11-30 17:07:56 +0000
+++ testtools/_spinner.py 2010-12-13 20:13:47 +0000
@@ -232,7 +232,6 @@
             # we aren't going to bother.
             junk.append(selectable)
         if IReactorThreads.providedBy(self._reactor):
- self._reactor.suggestThreadPoolSize(0)
             if self._reactor.threadpool is not None:
                 self._reactor._stopThreadPool()
         self._junk.extend(junk)

should fix it.

Revision history for this message

Robert Collins (lifeless) wrote on 2010-12-13:

Oh, and its worth noting that this is a genuine bug: on windows, due to scheduler differences, we'd be leaking threads very quickly for twisted threadpool using test suites.

Revision history for this message

Robert Collins (lifeless) wrote on 2010-12-13:

#10

I've landed the proposed fix, waiting on confirmation from a windows user that it is actually fixed.

Changed in testtools:
assignee:	nobody → Robert Collins (lifeless)
status:	Triaged → Fix Committed

Robert Collins (lifeless) on 2010-12-13

Changed in testtools:
status:	Fix Committed → In Progress

Robert Collins (lifeless) on 2010-12-14

Changed in testtools:
status:	In Progress → Fix Committed

Robert Collins (lifeless) on 2010-12-18

Changed in testtools:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.