BranchMergeProposal:+index timeout

Bug #615647 reported by Robert Collins
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Robert Collins

Bug Description

OOPS-1868C346 - https://code.launchpad.net/%7Ebryce/launchpad/lp-617698-forwarding/+merge/41003/+index
OOPS-1868C344
OOPS-1681EC3984
OOPS-1899K170

BranchMergeProposal:+index is timing out.

51. 1904 11559ms SQL-launchpad-main-master

SELECT "_13637".branch,
       "_13637".revision,
       "_13637".sequence
FROM BranchRevision AS "_13637"
LEFT JOIN BranchRevision AS "_13638" ON "_13638".branch = 24637
AND "_13638".revision = "_13637".revision
WHERE "_13637".branch = 446546
  AND "_13637".sequence IS NOT NULL
  AND "_13638".branch IS NULL
ORDER BY "_13637".sequence DESC LIMIT 10

 with source AS (SELECT branchrevision.branch,
       branchrevision.revision,
       branchrevision.sequence
 FROM BranchRevision
where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
order by branchrevision.branch desc, branchrevision.sequence desc limit 10)
SELECT source.branch, source.revision, source.sequence from source where
source.revision not in (select revision from BranchRevision AS target where target.branch = 24637 and source.revision=target.revision)
 ORDER BY source.branch desc, source.sequence DESC LIMIT 10;

has a substantially cheaper plan

Related branches

Revision history for this message
Stuart Bishop (stub) wrote :

I call contention. There are over half a billion rows in the table. It gets its new rows in large chunks, often 10k records at a time. Still, 14 seconds is silly.

Assuming contention the BranchRevision table isn't disappearing in the near future, I think this needs to be addressed in the branch scanner with more frequent (or even constant) commits.

There is no link to the actual OOPS here, so flagging incomplete.

Changed in launchpad-code:
status: Triaged → Incomplete
Revision history for this message
Robert Collins (lifeless) wrote :

OOPS-1681EC3984 - added to description too for easy access

Changed in launchpad-code:
status: Incomplete → Triaged
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

https://devpad.canonical.com/~stub/ppr/lpnet/latest-daily-timeout-candidates.html

From the PPR:
hits total-time 99% mean stddev variance median
335 1219.31 65.23 3.64 20.53 421.53 1.08

So it certainly does look spiky, and around 1% are going past our timeout cap.

Revision history for this message
Stuart Bishop (stub) wrote :

What happened to the branch that was dropping BranchRevision.id?

Revision history for this message
Stuart Bishop (stub) wrote :

One thing that will certainly help is using a slave DB. Given we already need to wait for the branch scanner and diff generator to process things, pulling things from the master is pointless.

Revision history for this message
Tim Penhey (thumper) wrote : Re: [Bug 615647] Re: BranchMergeProposal:+index timeout

On Tue, 24 Aug 2010 17:14:34 you wrote:
> What happened to the branch that was dropping BranchRevision.id?

wasn't me...

I want to drop branchrevision in its current incarnation.

Revision history for this message
Robert Collins (lifeless) wrote :

Another instance: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1722EB303

1 13012.0 1 SQL-launchpad-main-slave SELECT "_2".branch, "_2".id, "_2".revision, "_2".sequence FROM BranchRevision AS "_2" LEFT JOIN BranchRevision AS "_3" ON "_3".revision = "_2".revision AND "_3".branch = %s WHERE "_2".branch = %s AND "_2".sequence IS NOT NULL AND "_3".id IS NULL ORDER BY "_2".sequence DESC LIMIT 10

Branch: launchpad-rev-11571
Revno: 11571
SQL time: 14267 ms
Non-sql time: 1065 ms
Total time: 15332 ms
Statement Count: 97

MVCC isn't meant to get contention on reads though; and this from a slave so only slony doing writes...

Revision history for this message
Stuart Bishop (stub) wrote :

Under PG 8.4, the following simpler variant of the slow query is twice as fast.

SELECT *
FROM BranchRevision
WHERE
    branch = 374617
    AND sequence IS NOT NULL
    AND revision NOT IN (
        SELECT revision FROM BranchRevision WHERE branch=24637)
ORDER BY sequence DESC LIMIT 10;

I have been unable to run the slow variant in over 1 second though. We should collect an OOPS to confirm if we are still seeing the issue under 8.4 or if the problem has been solved by the upgrade.

tags: added: pg83
Changed in launchpad:
importance: High → Critical
Revision history for this message
Stuart Bishop (stub) wrote :

This looks fixed. I can't find any modern OOPSes.

tags: removed: dba
Revision history for this message
Jonathan Lange (jml) wrote :

I guess we can re-open if it appears again.

Changed in launchpad:
status: Triaged → Fix Released
status: Fix Released → Triaged
Revision history for this message
Jonathan Lange (jml) wrote :

Oops, spoke too soon. There are modern OOPSes in the duplicate bug:
  https://lp-oops.canonical.com/oops.py/?oopsid=1868C346
  https://lp-oops.canonical.com/oops.py/?oopsid=1868C344

description: updated
tags: added: dba
removed: pg83
description: updated
description: updated
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

explain analyze of stubs rephrased query:

                                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=143706.60..143850.01 rows=10 width=16) (actual time=34791.563..34791.563 rows=0 loops=1)
   -> Index Scan Backward using revisionnumber_branch_sequence_unique on branchrevision (cost=143706.60..171212.17 rows=1918 width=16) (actual time=34791.561..34791.561 rows=0 loops=1)
         Index Cond: (branch = 374617)
         Filter: ((sequence IS NOT NULL) AND (NOT (hashed SubPlan 1)))
         SubPlan 1
           -> Index Scan using revisionnumber_branch_id_unique on branchrevision (cost=0.00..143450.41 rows=102478 width=4) (actual time=0.043..34521.567 rows=97221 loops=1)
                 Index Cond: (branch = 24637)
 Total runtime: 34791.814 ms
(8 rows)

I think a with may be better, exploiting structural knowledge:

Revision history for this message
Robert Collins (lifeless) wrote :

 explain analyze with source AS (SELECT branchrevision.branch,
       branchrevision.revision,
       branchrevision.sequence
 FROM BranchRevision
where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
order by branchrevision.branch desc, branchrevision.sequence desc limit 10)
SELECT source.branch, source.revision, source.sequence from source where
not exists (select true from BranchRevision AS target where target.branch = 24637 and source.revision=target.revision)
 ORDER BY source.branch desc, source.sequence DESC LIMIT 10;
                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=807.19..807.19 rows=1 width=12) (actual time=39.953..39.953 rows=0 loops=1)
   CTE source
     -> Limit (cost=0.00..71.56 rows=10 width=12) (actual time=39.738..39.767 rows=10 loops=1)
           -> Index Scan Backward using revisionnumber_branch_sequence_unique on branchrevision (cost=0.00..27457.21 rows=3837 width=12) (actual time=39.736..39.760 rows=10 loops=1)
                 Index Cond: (branch = 374617)
                 Filter: (sequence IS NOT NULL)
   -> Sort (cost=735.63..735.64 rows=1 width=12) (actual time=39.951..39.951 rows=0 loops=1)
         Sort Key: source.branch, source.sequence
         Sort Method: quicksort Memory: 25kB
         -> Nested Loop Anti Join (cost=0.00..735.62 rows=1 width=12) (actual time=39.941..39.941 rows=0 loops=1)
               -> CTE Scan on source (cost=0.00..0.20 rows=10 width=12) (actual time=39.749..39.792 rows=10 loops=1)
               -> Index Scan using revision__revision__branch__key on branchrevision target (cost=0.00..73.53 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=10)
                     Index Cond: ((source.revision = target.revision) AND (target.branch = 24637))
 Total runtime: 40.043 ms
(14 rows)

Time: 42.532 ms

Revision history for this message
Robert Collins (lifeless) wrote :

two more possibilities:

with source AS (SELECT branchrevision.branch,
       branchrevision.revision,
       branchrevision.sequence
 FROM BranchRevision
where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
order by branchrevision.branch desc, branchrevision.sequence desc limit 10),
 target as (SELECT branchrevision.revision from branchrevision where branchrevision.branch=24637 and branchrevision.revision in (select revision from source))
SELECT source.branch, source.revision, source.sequence from source where source.revision not in (select revision from target)
 ORDER BY source.branch desc, source.sequence DESC LIMIT 10;

and

explain analyze with source AS (SELECT branchrevision.branch,
       branchrevision.revision,
       branchrevision.sequence
 FROM BranchRevision
where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
order by branchrevision.branch desc, branchrevision.sequence desc limit 10)
SELECT source.branch, source.revision, source.sequence from source where
source.revision not in (select revision from BranchRevision AS target where target.branch = 24637 and source.revision=target.revision)
 ORDER BY source.branch desc, source.sequence DESC LIMIT 10;

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (5.9 KiB)

Plans :
launchpad_prod_3=# with source AS (SELECT branchrevision.branch,
launchpad_prod_3(# branchrevision.revision,
launchpad_prod_3(# branchrevision.sequence
launchpad_prod_3(# FROM BranchRevision
launchpad_prod_3(# where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
launchpad_prod_3(# order by branchrevision.branch desc, branchrevision.sequence desc limit 10),
launchpad_prod_3-# target as (SELECT branchrevision.revision from branchrevision where branchrevision.branch=24637 and branchrevision.revision in (select revision from source))
launchpad_prod_3-# SELECT source.branch, source.revision, source.sequence from source where source.revision not in (select revision from target)
launchpad_prod_3-# ORDER BY source.branch desc, source.sequence DESC LIMIT 10;
 branch | revision | sequence
--------+----------+----------
(0 rows)

Time: 64.510 ms
launchpad_prod_3=# explain analyze with source AS (SELECT branchrevision.branch,
launchpad_prod_3(# branchrevision.revision,
launchpad_prod_3(# branchrevision.sequence
launchpad_prod_3(# FROM BranchRevision
launchpad_prod_3(# where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
launchpad_prod_3(# order by branchrevision.branch desc, branchrevision.sequence desc limit 10),
launchpad_prod_3-# target as (SELECT branchrevision.revision from branchrevision where branchrevision.branch=24637 and branchrevision.revision in (select revision from source))
launchpad_prod_3-# SELECT source.branch, source.revision, source.sequence from source where source.revision not in (select revision from target)
launchpad_prod_3-# ORDER BY source.branch desc, source.sequence DESC LIMIT 10;
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=555.33..555.34 rows=5 width=12) (actual time=48.053..48.053 rows=0 loops=1)
   CTE source
     -> Limit (cost=0.00..42.91 rows=10 width=12) (actual time=47.653..47.692 rows=10 loops=1)
           -> Index Scan Backward using revisionnumber_branch_sequence_unique on branchrevision (cost=0.00..92971.87 rows=21665 width=12) (actual time=47.651..47.684 rows=10 loops=1)
                 Index Cond: (branch = 374617)
                 Filter: (sequence IS NOT NULL)
   CTE target
     -> Nested Loop (cost=0.23..511.91 rows=10 width=4) (actual time=0.136..0.288 rows=10 loops=1)
           -> HashAggregate (cost=0.23..0.33 rows=10 width=4) (actual time=0.091..0.099 rows=10 loops=1)
                 -> CTE Scan on source (cost=0.00..0.20 rows=10 width=4) (actual time=0.002..0.060 rows=10 loops=1)
           -> Index Scan using revision__branch__revision__key on branchrevision (cost=0.00..51.15 rows=1 width=4) (actual time=0.015..0.016 rows=1 loops=10)
                 Index Cond: ((public.branchrevision.branch = 24637) AND (public.branchrevision.revision = source.revision))
   -> S...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

explain analyze with source AS (SELECT branchrevision.branch,
       branchrevision.revision,
       branchrevision.sequence
 FROM BranchRevision
where branchrevision.branch = 374617 and branchrevision.sequence is not NULL
order by branchrevision.branch desc, branchrevision.sequence desc limit 10)
SELECT source.branch, source.revision, source.sequence from source where
source.revision not in (select revision from BranchRevision AS target where target.branch = 24637 and source.revision=target.revision)
 ORDER BY source.branch desc, source.sequence DESC LIMIT 10;

appears to be the win

tags: removed: dba
description: updated
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Robert Collins (lifeless)
milestone: none → 11.04
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Robert Collins (lifeless) wrote :

All but one of the urls from the timeout report yesterday rendered flawlessly - some of them in as low as 3 seconds. However, one failed, but as this report is long and a clear improvement has been reached I'm going to close this bug for now, and open a new one for whatever shows up in tomorrows timeout report.

Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.