import_package seems slow to determine that there are no versions to import
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ubuntu Distributed Development | Confirmed | Low | Unassigned | |
bzr-builddeb | Confirmed | Low | Unassigned | |
Bug Description
The import_package script seems to be doing something inefficient when it comes to determining if there are versions that need importing. As an example:
4.158 Time (UTC): 2010-07-21 20:00:26.377955
4.192 creating repository in file://
4.196 finding all versions of libsmbios
17.307 found 33 versions: [PackageToImpor
19.245 ssh implementation is OpenSSH
166.631 These versions are new: []
168.910 Using fetch logic to copy between RemoteRepositor
168.910 fetch up to rev {<email address hidden>}
177.679 creating branch <bzrlib.
177.744 created new branch BzrBranch7(
177.751 trying to create missing lock '/srv/package-
177.751 opening working tree '/srv/package-
178.255 opening working tree '/srv/package-
178.283 Using fetch logic to copy between RemoteRepositor
178.283 fetch up to rev {<email address hidden>}
178.427 Base revid: '<email address hidden>'
The version listing is found by the 17s mark (about 13s after the search starts at 4s), which isn't terrible. It then takes 166-17 = 149s to determine that all of those revisions are, in fact, already present in the branch.
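To make the breakdown explicit, here is a small sketch that turns the cumulative timestamps into per-step durations. The log lines are copied verbatim from the excerpt above (truncation included); the `step_durations` helper is a name made up for this illustration and is not part of import_package.

```python
# Cumulative timestamps (seconds) and messages copied from the log excerpt.
LOG = """\
4.196 finding all versions of libsmbios
17.307 found 33 versions: [PackageToImpor
19.245 ssh implementation is OpenSSH
166.631 These versions are new: []
"""


def step_durations(log_text):
    """Return (seconds_since_previous_line, message) for each line after the first."""
    entries = []
    for line in log_text.splitlines():
        stamp, _, message = line.partition(" ")
        entries.append((float(stamp), message))
    return [
        (round(cur - prev, 3), msg)
        for (prev, _), (cur, msg) in zip(entries, entries[1:])
    ]


durations = step_durations(LOG)
for seconds, message in durations:
    print(f"{seconds:>8.3f}s  {message}")
```

Read this way, roughly 13s goes to listing versions, about 2s to setting up ssh, and about 147s to the step that ends with "These versions are new: []".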
Looking at the code, this might be checking solely on a remote Branch, and using a fresh 'Graph' object for every version that it wants to evaluate. (It does a graph check to make sure that every Version maps to a bzr tag, and that all those tags are present in the ancestry of the Branch tip.)
One possibility is to grab a KnownGraph object, and to cache that between calls.
I wasn't sure about KnownGraph as it grabs the whole ancestry, but if we are going to check all Versions anyway, then it is fine, because that already means checking really old revisions.
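A rough sketch of why caching the ancestry should help, assuming the slow path really does re-walk the graph for each version. Everything here is a plain-Python stand-in: `CountingParents`, `ancestry`, `missing_uncached`, `missing_cached` and the revision ids are hypothetical and are not bzrlib or bzr-builddeb API. The lookup counter stands in for round trips to the remote repository.

```python
# Hypothetical parent map: revision id -> tuple of parent revision ids.
PARENT_MAP = {
    "rev-4": ("rev-3",),
    "rev-3": ("rev-2", "rev-side"),
    "rev-side": ("rev-1",),
    "rev-2": ("rev-1",),
    "rev-1": (),
}


class CountingParents:
    """Stand-in for a remote repository; counts parent lookups."""

    def __init__(self, parent_map):
        self._parent_map = parent_map
        self.lookups = 0

    def get_parents(self, revid):
        self.lookups += 1
        return self._parent_map.get(revid, ())


def ancestry(provider, tip):
    """Walk back from tip and return the set of all ancestors, including tip."""
    seen = set()
    pending = [tip]
    while pending:
        revid = pending.pop()
        if revid in seen:
            continue
        seen.add(revid)
        pending.extend(provider.get_parents(revid))
    return seen


def missing_uncached(provider, tip, tagged_revids):
    """Re-walk the graph for every tagged revision (the suspected slow path)."""
    return [r for r in tagged_revids if r not in ancestry(provider, tip)]


def missing_cached(provider, tip, tagged_revids):
    """Walk the graph once and reuse the result for every tagged revision."""
    known = ancestry(provider, tip)
    return [r for r in tagged_revids if r not in known]


tags = ["rev-1", "rev-2", "rev-side", "rev-unmerged"]
slow = CountingParents(PARENT_MAP)
fast = CountingParents(PARENT_MAP)
slow_missing = missing_uncached(slow, "rev-4", tags)
fast_missing = missing_cached(fast, "rev-4", tags)
print(slow_missing, slow.lookups)
print(fast_missing, fast.lookups)
```

Both variants report the same missing revisions, but the uncached one repeats the full walk once per tag, so its lookup count grows with the number of versions. With 33 versions and a remote branch, that kind of repetition could plausibly account for the long pause, though I haven't confirmed that this is what the real code does.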
Another option is to just use a CachingParentsP
(This may actually be code in bzr-builddeb, I haven't traced it thoroughly.)
This only really matters if load gets higher than we can actually maintain. James mentioned that load only seems to really be a factor during the 6-month distribution rollouts, where we end up checking all packages.
However, some of that may change if we move off of the hardware we are currently on.