test_spinner.TestRunInReactor.test_clean_running_threads fails with an extra thread

Bug #666345 reported by Martin Packman
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
testtools
Fix Released
Critical
Robert Collins

Bug Description

======================================================================
FAIL: testtools.tests.test_spinner.TestRunInReactor.test_clean_running_threads
----------------------------------------------------------------------
Text attachment: traceback
------------
Traceback (most recent call last):
  File "C:\bzr\testtools\testtools\runtest.py", line 143, in _run_user
    return fn(*args)
  File "C:\bzr\testtools\testtools\testcase.py", line 539, in _run_test_method
    return self._get_test_method()()
  File "C:\bzr\testtools\testtools\tests\test_spinner.py", line 241, in test_clean_running_threads
    self.assertThat(list(threading.enumerate()), Equals(current_threads))
  File "C:\bzr\testtools\testtools\testcase.py", line 384, in assertThat
    self.fail('Match failed. Matchee: "%s"\nMatcher: %s\nDifference: %s\n'
AssertionError: Match failed. Matchee: "[<_MainThread(MainThread, started)>, <Thread(PoolThread-twisted.internet.reactor-1, stopped)>]"
Matcher: Equals([<_MainThread(MainThread, started)>])
Difference: !=:
reference = [<_MainThread(MainThread, started)>]
actual = [<_MainThread(MainThread, started)>,
 <Thread(PoolThread-twisted.internet.reactor-1, stopped)>]

This may be related to my OS, my Python version (2.4) or my twisted version (8.1.0).

Tags: win32
Revision history for this message
Jonathan Lange (jml) wrote :

Thanks for the report. The test looks to be passing on Hudson, so I think we can rule out Python 2.4.

 * Do you get the failure reliably, or does it only happen from time to time?
 * Could you please try with latest trunk and see if you still get the failure?
 * If trunk still fails, could you please try this patch:

=== modified file 'testtools/tests/test_spinner.py'
--- testtools/tests/test_spinner.py 2010-10-26 10:42:59 +0000
+++ testtools/tests/test_spinner.py 2010-10-31 17:32:33 +0000
@@ -237,7 +237,7 @@ class TestRunInReactor(TestCase):
         reactor = self.make_reactor()
         timeout = self.make_timeout()
         spinner = self.make_spinner(reactor)
- spinner.run(timeout, reactor.callInThread, time.sleep, timeout / 2.0)
+ spinner.run(timeout * 10, reactor.callInThread, time.sleep, timeout)
         self.assertThat(list(threading.enumerate()), Equals(current_threads))

     def test_leftover_junk_available(self):

Changed in testtools:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Martin Packman (gz) wrote :

Failure is completely reliable here, and is unchanged on trunk. Nice idea on the timeout, and yeah make_timeout should probably be using multiples of 1.0/60 on windows. Still always hit the race though, which appears to be in the spinner cleanup code.

Not sure exactly what this test is trying to check, but as background threads being spawned during the test run would make it fail anyway, just asserting we have one live thread works:

=== modified file 'testtools/tests/test_spinner.py'
--- testtools/tests/test_spinner.py 2010-10-26 10:42:59 +0000
+++ testtools/tests/test_spinner.py 2010-11-02 17:13:00 +0000
@@ -233,12 +233,11 @@
     def test_clean_running_threads(self):
         import threading
         import time
- current_threads = list(threading.enumerate())
         reactor = self.make_reactor()
         timeout = self.make_timeout()
         spinner = self.make_spinner(reactor)
         spinner.run(timeout, reactor.callInThread, time.sleep, timeout / 2.0)
- self.assertThat(list(threading.enumerate()), Equals(current_threads))
+ self.assertThat(sum(t.isAlive() for t in threading.enumerate()), Equals(1))

Alternatively, can add a fudge to lose the race:

=== modified file 'testtools/tests/test_spinner.py'
--- testtools/tests/test_spinner.py 2010-10-26 10:42:59 +0000
+++ testtools/tests/test_spinner.py 2010-11-02 17:23:08 +0000
@@ -237,7 +240,8 @@
         reactor = self.make_reactor()
         timeout = self.make_timeout()
         spinner = self.make_spinner(reactor)
         spinner.run(timeout, reactor.callInThread, time.sleep, timeout / 2.0)
+ time.sleep(0)
         self.assertThat(list(threading.enumerate()), Equals(current_threads))

     def test_leftover_junk_available(self):

Changed in testtools:
status: Incomplete → Confirmed
Revision history for this message
Jonathan Lange (jml) wrote :

Thanks Martin. I'm targeting this to "next", because I consider it a release blocker.

Changed in testtools:
status: Confirmed → Triaged
milestone: none → next
Revision history for this message
Jonathan Lange (jml) wrote :

Without a Windows machine I'm stuck. We might have to release with this bug still open.

tags: added: win32
Revision history for this message
Robert Collins (lifeless) wrote :

Well, Martin has supplied a couple of options; my guess is thread priority differences on windows, FWIW. Or perhaps there is a missing join() somewhere and we're relying on gc behaviour - actually this seems very plausible to me.

Changed in testtools:
importance: High → Critical
Revision history for this message
Martin Packman (gz) wrote :

I'm happy to try running whatever branches you'd like Jonathan, I'm just not steeped in enough twisted lore to know how to debug an issue that goes away under pdb.

Revision history for this message
Robert Collins (lifeless) wrote :

This is a twisted bug in the suggestPoolSize code path - it does not synchronise thread removal from the thread pool.

in threadpool.py:
_worker will remove the thread from self.threads when it receives WorkerStop
stop will join() threads that *are* in self.threads when stop was called
threads that *are stopped due to pool size changes are never join()ed*.

So the race is this:
 - _spinner calls suggestPoolSize(0) which asks threads to stop. These threads are not joined. ever.
 - threads go to thread heaven by removing themselves from .threads
 - _spinner triggers a stop()
 - which sees an empty threads dict *IFF* the test worker had already seen WorkerStop
 - if stop saw an empty threads dict, it doesn't call join() and the thread object is still around.

We need to work around this, and upstream twisted needs to actually join all threads, not just those that happened to be running at stop() being called.

To work around it, stop calling self._reactor.suggestThreadPoolSize(0)

Revision history for this message
Robert Collins (lifeless) wrote :

E.g.

=== modified file 'testtools/_spinner.py'
--- testtools/_spinner.py 2010-11-30 17:07:56 +0000
+++ testtools/_spinner.py 2010-12-13 20:13:47 +0000
@@ -232,7 +232,6 @@
             # we aren't going to bother.
             junk.append(selectable)
         if IReactorThreads.providedBy(self._reactor):
- self._reactor.suggestThreadPoolSize(0)
             if self._reactor.threadpool is not None:
                 self._reactor._stopThreadPool()
         self._junk.extend(junk)

should fix it.

Revision history for this message
Robert Collins (lifeless) wrote :

Oh, and its worth noting that this is a genuine bug: on windows, due to scheduler differences, we'd be leaking threads very quickly for twisted threadpool using test suites.

Revision history for this message
Robert Collins (lifeless) wrote :

I've landed the proposed fix, waiting on confirmation from a windows user that it is actually fixed.

Changed in testtools:
assignee: nobody → Robert Collins (lifeless)
status: Triaged → Fix Committed
Changed in testtools:
status: Fix Committed → In Progress
Changed in testtools:
status: In Progress → Fix Committed
Changed in testtools:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.