branches command slow

Bug #197597 reported by Robert Collins
This bug affects 9 people
Affects  Status     Importance  Assigned to  Milestone
Bazaar   Confirmed  Medium      Unassigned
Breezy   Triaged    Medium      Unassigned

Bug Description

The bzr branches command over bzr+ssh should use a smart server verb

 affects bzr
 status confirmed
 importance high

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Johan Walles (walles)
tags: added: performance
Stefan Monnier (monnier) wrote :

Actually, I don't really see why that would require a smart server verb.
As I imagine it, it should be plenty fast even over sftp.
What is it that makes it so slow?
I imagine it basically does something like this:
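find_branches() {
   local url="$1" entry
   for entry in $(ls "$url"); do
      if [ ! -d "$url/$entry" ]; then
         :                            # not a directory: skip it
      elif [ -d "$url/$entry/.bzr/branch" ]; then
         echo "$url/$entry"           # a branch: report it, don't descend
      else
         find_branches "$url/$entry"  # plain directory: recurse
      fi
   done
}
find_branches "$url"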
So unless you have a lot of (and/or deep) unrelated directories in that area, I can't imagine what would make it take so long. Admittedly, a smart server verb would avoid a lot of round-tripping, but all the -d tests are independent of each other, so they don't have to wait on one another and shouldn't suffer that much from round-trip delays.
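Because the per-directory checks in the loop above don't depend on one another, a client could issue a whole level of them at once, so the number of sequential waits tracks the depth of the tree rather than the number of directories. Below is a minimal Python sketch of that idea over a local filesystem. The function names are hypothetical (not bzrlib APIs), and over sftp each `os.listdir`/`os.path.isdir` call would instead be a remote request.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_subdirs(path):
    """One 'ls': the visible subdirectories of path."""
    return [os.path.join(path, name)
            for name in sorted(os.listdir(path))
            # Like plain `ls`, skip hidden entries such as .bzr itself.
            if not name.startswith('.')
            and os.path.isdir(os.path.join(path, name))]


def is_branch(path):
    """One independent probe: does path/.bzr/branch exist?"""
    return os.path.isdir(os.path.join(path, '.bzr', 'branch'))


def find_branches_by_level(root, max_workers=8):
    """Breadth-first version of the shell loop.

    All listings and probes for one level of the tree are submitted together,
    so sequential waiting grows with tree depth, not with directory count.
    """
    branches = []
    level = [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while level:
            # List every directory on this level concurrently.
            children = [child
                        for subdirs in pool.map(list_subdirs, level)
                        for child in subdirs]
            # The probes don't depend on each other, so run them concurrently.
            flags = list(pool.map(is_branch, children))
            branches.extend(c for c, f in zip(children, flags) if f)
            # As in the shell loop, recurse only into non-branch directories.
            level = [c for c, f in zip(children, flags) if not f]
    return sorted(branches)
```

On a local disk the thread pool buys little; the structure only pays off when each call carries network latency.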

Gareth White (gwhite-deactivatedaccount) wrote :

I ran "bzr branches" inside the shared repository "C:\code\bzr_repo2" and used filemon to watch what it was doing. This line highlights a couple of potential optimizations:

bzr.exe:4248 OPEN C:\code\bzr_repo2\.bzr\README\.bzr\branch-format PATH NOT FOUND Options: Open Access: All

It probably doesn't need to look for ".bzr" directories within other ".bzr" directories. Looking for a subdirectory of a file doesn't make much sense either.
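The two pruning rules suggested here can be expressed as a small filter. This is a minimal sketch assuming candidate paths are given relative to the repository root with '/' separators; `worth_probing` is a hypothetical helper, not a bzrlib function.

```python
from pathlib import PurePosixPath


def worth_probing(relpath, is_dir):
    """Return True if relpath could plausibly be a branch location.

    relpath is relative to the repository root and uses '/' separators.
    """
    if not is_dir:
        # A file can't have a .bzr/ subdirectory, so never probe below it.
        return False
    # Skip .bzr itself and anything inside a .bzr control directory.
    return '.bzr' not in PurePosixPath(relpath).parts
```

Under these rules the filemon example above (`.bzr\README\.bzr\branch-format`) is never attempted, since README is a file and it sits inside `.bzr`.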

Gareth White (gwhite-deactivatedaccount) wrote :

I did some (rough) benchmarking of "bzr branches" today and it took approx. 15 seconds to return the list of branches. This was for a (treeless) repository with about 100 branches located on a separate machine running the bzr smart server.

Getting a recursive directory listing of the repository took less than 1 second when run locally on that machine. So clearly "bzr branches" has a lot of overhead somewhere.

Also, the majority of files/directories in the repository could be skipped, as they are already within ".bzr" directories.

Another interesting observation is that when two "bzr branches" commands are run concurrently (from different client machines), each takes roughly twice as long to complete.

Gareth White (gwhite-deactivatedaccount) wrote :

I should add that the server machine was on the same LAN as the client machines so the network latency would have been minimal.

John A Meinel (jameinel) wrote : Re: [Bug 197597] Re: branches command slow


Gareth White wrote:
> I should add that the server machine was on the same LAN as the client
> machines so the network latency would have been minimal.
>

bzr branches is pretty 'stupid'. It does a listdir and then tries to
open a bzr branch for each entry it sees. I believe it even
screen-scrapes Apache index files and tries to do the same thing. This
is especially bad if you have working trees.

Also, all of the work is done on the client.

I wrote a script which implements a 'local_branches' operation, which is
significantly faster in practice. (Though where I'm using it also has
working trees, so it is extra significant.)

As for not needing to probe underneath .bzr: there are several
'colocated-branches' designs that use something like '.bzr/branches/*'
to hold extra branch definitions, so it isn't entirely true that we
don't need to probe underneath .bzr. We *could* stop probing underneath
.bzr/{checkout,branch,repository}.
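That narrower rule can be sketched as a path filter. This is a hypothetical helper, not part of bzrlib, and `.bzr/branches/<name>` is used only as an example of a colocated layout, since the designs mentioned here were not settled.

```python
from pathlib import PurePosixPath

# Control subdirectories that can never hold another branch; anything else
# under .bzr (e.g. a hypothetical .bzr/branches/<name>) is still walked.
NO_BRANCHES_BELOW = {'checkout', 'branch', 'repository'}


def should_descend(relpath):
    """Return False if relpath is at or below .bzr/{checkout,branch,repository}."""
    parts = PurePosixPath(relpath).parts
    for i, part in enumerate(parts[:-1]):
        if part == '.bzr' and parts[i + 1] in NO_BRANCHES_BELOW:
            return False
    return True
```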

Note that my custom function does skip all of .bzr because I know I'm
not using anything fancy. It also *only* probes directories that have a
'.bzr' directory, which means that it would give incorrect results if
you were using "bzr-svn" or "bzr-git" etc. (IIRC 'bzr branches' will
return svn and git branches if you have those plugins installed.)

John
=:->

from bzrlib import branch, errors, osutils

def find_local_bzr_branches(repo):
    """Walk the filesystem, and find bzr branches.

    This skips over the 'repo.find_branches()' api, because that is sort of
    a worst-case implementation. (It tries to open every object as a Branch:
    files, dirs, etc.)
    """
    all_branches = []
    root_path = repo.bzrdir.root_transport.local_abspath('.')
    for dir_info, files_info in osutils._walkdirs_utf8(root_path):
        utf8_relpath, dirpath = dir_info
        bzr_index = None
        for idx, (_, utf8_name, kind, _, _) in enumerate(files_info):
            if utf8_name == '.bzr' and kind == 'directory':
                bzr_index = idx
                break
        # For now, we don't recurse into .bzr directories. Note that this
        # behavior has to change based on how 'colocated' branches end up
        # getting implemented
        if bzr_index is not None:
            # Deleting the entry from files_info stops the walker from
            # descending into .bzr.
            del files_info[bzr_index]
            try:
                b = branch.Branch.open(dirpath)
            except errors.NotBranchError:
                # e.g. a shared repository or tree without a branch
                continue
            all_branches.append(b)
    return all_branches
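To make the difference John describes concrete, here is a rough model (not bzrlib's actual code) that counts "try to open a branch here" attempts on a local tree. With `gated=False` every file and directory gets a probe, including everything under .bzr, which is what the filemon trace suggested. With `gated=True` only directories that directly contain a .bzr directory are probed, and .bzr is never walked, as in `find_local_bzr_branches`. The function name is hypothetical.

```python
import os


def count_probes(root, gated):
    """Count branch-open attempts a walk of root would make (rough model)."""
    probes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if gated:
            if '.bzr' in dirnames:
                probes += 1               # one open attempt for this directory
                dirnames.remove('.bzr')   # prune: os.walk won't descend into .bzr
        else:
            # Naive strategy: one attempt per entry, files included.
            probes += len(dirnames) + len(filenames)
    return probes
```

For a tiny repository with one branch (`trunk`), a shared-repository `.bzr`, and a single source file, the naive count is 9 and the gated count is 2; the gap widens quickly as `.bzr/repository` fills up with pack files.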

Gareth White (gwhite-deactivatedaccount) wrote :

Thanks for the script. I may end up using something similar here to enumerate branches.

I hadn't considered the other cases like svn/git branches or colocated branches - they're probably not something I'll need to worry about for a while, but it's good to know anyway!

Alexander Belchenko (bialix) wrote :

Local performance could be improved as well if we didn't try to open every file as a bzrdir.
See some unscientific numbers here: https://lists.ubuntu.com/archives/bazaar/2011q2/072729.html

Jelmer Vernooij (jelmer)
tags: added: hpssnovfs
Changed in bzr:
importance: High → Medium
Jelmer Vernooij (jelmer)
tags: added: hpss-no-vfs
removed: hpssnovfs
Jelmer Vernooij (jelmer)
tags: added: branches
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
Changed in brz:
status: New → Triaged
importance: Undecided → Medium
