celery hung after long git repository scan
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Triaged | Critical | Colin Watson |
Bug Description
launchpad@ackee's celery hung today. Log excerpts:
[2018-07-24 06:39:12,597: INFO/Worker-2] Running <GitRefScanJob for ~documentfounda
[2018-07-24 06:39:12,603: DEBUG3/Worker-2] commit <storm.
[2018-07-24 06:39:12,604: DEBUG3/Worker-2] commit
[2018-07-24 06:39:12,604: DEBUG3/Worker-2] new transaction
[2018-07-24 06:39:12,628: INFO/Worker-2] Starting new HTTP connection (1): git.launchpad.net
[2018-07-24 06:39:18,995: DEBUG/Worker-2] "GET /repo/2115/refs HTTP/1.1" 200 14755548
[2018-07-24 06:40:07,538: DEBUG/MainProcess] pidbox received method ping() [reply_
[2018-07-24 06:40:44,064: INFO/Worker-2] Requesting commit details for [u'f5b8531acc1e
[2018-07-24 06:42:28,485: DEBUG/MainProcess] basic.qos: prefetch_count->7
[2018-07-24 06:43:02,514: DEBUG/MainProcess] basic.qos: prefetch_count->6
[2018-07-24 06:44:12,577: WARNING/
[2018-07-24 06:44:58,615: DEBUG/MainProcess] basic.qos: prefetch_count->5
[2018-07-24 06:45:06,699: DEBUG/MainProcess] pidbox received method ping() [reply_
[2018-07-24 06:46:12,681: DEBUG/MainProcess] basic.qos: prefetch_count->4
[2018-07-24 06:48:00,775: DEBUG/MainProcess] basic.qos: prefetch_count->3
[2018-07-24 06:50:06,999: DEBUG/MainProcess] pidbox received method ping() [reply_
[2018-07-24 06:55:06,498: DEBUG/MainProcess] pidbox received method ping() [reply_
... and then nothing else for several hours. The libreoffice repository has been fixed, but it's not obvious why the timeout (apparently) stalled celery.
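To see how the lone WARNING line relates to the start of the scan job, the timestamps in the excerpts can be compared directly. The following is a minimal sketch, assuming the bracketed `[timestamp: LEVEL/process]` prefix shown above; the helper name `parse_ts` is hypothetical and not part of Launchpad or celery.

```python
from datetime import datetime


def parse_ts(line):
    """Return the timestamp from the leading '[<timestamp>: LEVEL/...' prefix.

    Hypothetical helper: assumes the prefix format seen in the excerpts above.
    """
    stamp = line[1:line.index(": ")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


# Lines copied from the excerpts (truncated exactly as they appear there).
start = parse_ts("[2018-07-24 06:39:12,597: INFO/Worker-2] Running <GitRefScanJob for ~documentfounda")
warning = parse_ts("[2018-07-24 06:44:12,577: WARNING/")

# Seconds between the job starting and the WARNING being logged.
elapsed = (warning - start).total_seconds()
print(elapsed)
```

The WARNING comes almost exactly five minutes after the job started, which would be consistent with a five-minute timeout firing, although the configured limit is not stated in this report.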
Related branches
- William Grant: Approve (code)
- Diff: 47 lines (+20/-9), 1 file modified: lib/lp/services/timeout.py (+20/-9)
tags: added: git lp-code regression
Changed in launchpad:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Colin Watson (cjwatson)
tags: added: qa-ok; removed: qa-needstesting
Changed in launchpad:
status: Fix Committed → Triaged
tags: removed: qa-ok
This happened again the very next time there was a soft timeout on a repository scan, so there's obviously something repeatably wrong here.