bzr gui tools are too slow with old/large repositories

Bug #197429 reported by Kostja Osipov
Affects                 Status      Importance  Assigned to     Milestone
Bazaar                  Incomplete  Critical    John A Meinel
Bazaar GTK+ Frontends   New         Undecided   Unassigned
MySQL Server            Invalid     Undecided   Unassigned

Bug Description

I have a repository with a large number of revisions and experience significant delays in common operations:

1) bzr branch on my box takes ~10 minutes. This is not a big deal, since I can work around it
by using cp -a.
2) bzr visualise, bzr gannotate, and bzr qblame take at least a few minutes to start and, once started,
are not very responsive.

Acceptable speed for these tools is very important, since we search our extensive history
every day to find out who wrote a line of code and why, to trace regressions, and to browse old changeset
comments.

I am using bzr 1.2.0 and the latest versions of GUI tools:
kostja@dipika:~$ bzr --version
Bazaar (bzr) 1.2.0
  Python interpreter: /usr/bin/python 2.5.1.final.0
  Python standard library: /usr/lib/python2.5
  bzrlib: /usr/lib/python2.5/site-packages/bzrlib
  Bazaar configuration: /home/kostja/.bazaar
  Bazaar log file: /home/kostja/.bzr.log

Copyright 2005, 2006, 2007, 2008 Canonical Ltd.
http://bazaar-vcs.org/

bzr comes with ABSOLUTELY NO WARRANTY. bzr is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.

kostja@dipika:~$ uname -a
Linux dipika 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

kostja@dipika:~$ cat /etc/issue
Ubuntu 7.10 \n \l

kostja@dipika:~/mysql-packs$ time bzr info --verbose
Repository branch (format: pack-0.92)
Location:
  shared repository: .
  repository branch: .

Related branches:
  parent branch: mysql-5.1

Format:
       control: Meta directory format 1
        branch: Branch format 6
    repository: Packs containing knits without subtree support

Branch history:
      2624 revisions
       871 committers
      2770 days old
   first revision: Mon 2000-07-31 21:10:05 +0200
  latest revision: Wed 2008-01-16 01:17:05 +0300

Repository:
     53168 revisions
    637344 KiB
bzr info --verbose 42.46s user 1.13s system 99% cpu 43.677 total

James Westby (james-w)
Changed in mysql:
status: New → Invalid
Martin Pool (mbp)
Changed in bzr:
importance: Undecided → High
Martin Pool (mbp)
Changed in bzr:
assignee: nobody → mbp
Revision history for this message
Elliot Murphy (statik) wrote :

Hi Kostja!

Are you using a shared repository when creating a new branch?

Revision history for this message
Kostja Osipov (kostja) wrote : Re: [Bug 197429] Re: bzr gui tools are too slow with old/large repositories

* Elliot Murphy <email address hidden> [08/03/04 19:10]:
> Are you using a shared repository when creating a new branch?

How do I find out?

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
John A Meinel (jameinel) wrote :

bzr info

in the target directory should include something like:
Repository branch (format: pack-0.92)
Location:
  shared repository: .
  repository branch: .

I'm a little surprised that these are "." as it means your shared repository is in the same location as your branch (and both are at ~/mysql-packs) rather than in subdirectories.

Revision history for this message
Kostja Osipov (kostja) wrote :

* Konstantin Osipov <email address hidden> [08/03/04 19:38]:
> * Elliot Murphy <email address hidden> [08/03/04 19:10]:
> > Are you using a shared repository when creating a new branch?

I'm using the mysql-packs.tar.gz that was distributed in Orlando.

I do bk branch <tree name> mysql-packs/mysql-5.1

Then I work with the branch.

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
Martin Pool (mbp) wrote :

> How do I find out?

Run "bzr info" in your branch directory. You should see something like this:

mbp@lithe% bzr info ~/bzr/trunk
Repository tree (format: pack-0.92)
Location:
  shared repository: /home/mbp/bzr
  repository branch: .

If you do not see the 'shared repository' line, then bzr is having to filter and copy all the content when you branch.
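
A minimal sketch of that check, assuming the `bzr info` output keeps the shape shown in this thread (a first line such as "Repository branch (format: pack-0.92)" followed by indented "key: value" lines). The function name `repository_layout` is hypothetical and is not part of bzr or bzrlib; it only parses text that has already been captured.

```python
def repository_layout(info_output):
    """Return (kind, shared_repository_path) parsed from `bzr info` text.

    kind is the text before " (format:" on the first line, for example
    "Standalone tree" or "Repository branch". The path is None when no
    'shared repository' line is present.
    """
    lines = info_output.splitlines()
    kind = lines[0].split(" (format:")[0].strip() if lines else ""
    shared = None
    for line in lines:
        key, sep, value = line.strip().partition(":")
        if sep and key == "shared repository":
            shared = value.strip()
    return kind, shared


# Sample outputs copied from the reports in this thread.
standalone = (
    "Standalone tree (format: pack-0.92)\n"
    "Location:\n"
    "  branch root: .\n"
)
in_shared_repo = (
    "Repository branch (format: pack-0.92)\n"
    "Location:\n"
    "  shared repository: /home/kostja/mysql-packs\n"
    "  repository branch: .\n"
)
```

With these samples, the standalone tree yields no shared repository path, while the branch inside ~/mysql-packs reports the repository it shares storage with.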

The workflow we suggest is eg

  bzr init ~/mysql
  bzr branch http://......./ ~/mysql/trunk
  cd ~/mysql
  bzr branch trunk my-feature

Then the storage is shared and branch should be much faster.

Changed in bzr:
importance: High → Critical
Revision history for this message
Martin Pool (mbp) wrote :

So, from your bzr info report, it looks like you do not have a shared repository. (The message is a bit unclear, see bug 198425).

> I do bk branch <tree name> mysql-packs/mysql-5.1

Can you run 'bzr info mysql-packs'? I suspect there is not actually a repository there, or for some reason it is not used.

Revision history for this message
Kostja Osipov (kostja) wrote :

* John A Meinel <email address hidden> [08/03/04 20:25]:
> bzr info
>
> in the target directory should include something like:
> Repository branch (format: pack-0.92)
> Location:
> shared repository: .
> repository branch: .
>
> I'm a little surprised that these are "." as it means your shared
> repository is in the same location as your branch (and both are at
> ~/mysql-packs) rather than in subdirectories.

kostja@dipika:~/mysql-5.1-bzr$ bzr info
Standalone tree (format: pack-0.92)
Location:
  branch root: .

Related branches:
  parent branch: /home/kostja/mysql-packs/mysql-5.1

kostja@dipika:~/mysql-packs$ bzr info
Repository branch (format: pack-0.92)
Location:
  shared repository: .
  repository branch: .

Related branches:
  parent branch: mysql-5.1

kostja@dipika:~/mysql-packs/mysql-5.1$ bzr info
Repository branch (format: pack-0.92)
Location:
  shared repository: /home/kostja/mysql-packs
  repository branch: .

Related branches:
  parent branch: /home/kostja/snapshot/mysql-5.1

kostja@dipika:~/mysql-packs/mysql-6.0$ bzr info
Repository branch (format: pack-0.92)
Location:
  shared repository: /home/kostja/mysql-packs
  repository branch: .

Related branches:
  parent branch: /home/kostja/snapshot/mysql-6.0

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
Kostja Osipov (kostja) wrote :

* Martin Pool <email address hidden> [08/03/04 20:25]:
> mbp@lithe% bzr info ~/bzr/trunk
> Repository tree (format: pack-0.92)
> Location:
> shared repository: /home/mbp/bzr
> repository branch: .
>
> If you do not see the 'shared repository' line, then bzr is having to
> filter and copy all the content when you branch.
>
> The workflow we suggest is eg

bzr branch is not really an issue -- branching can be done in the
background, a branch is established once a day.

What matters is history research speed, with bzr
visualise/gannotate, which are used more often.

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
James Westby (james-w) wrote :

On Tue, 2008-03-04 at 16:58 +0000, Martin Pool wrote:
> The workflow we suggest is eg
>
> bzr init ~/mysql
> bzr branch http://......./ ~/mysql/trunk
> cd ~/mysql
> bzr branch trunk my-feature
>
> Then the storage is shared and branch should be much faster.

I believe the first line should read

  bzr init-repo ~/mysql

Thanks,

James
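
For reference, the corrected workflow can be written down as a command list. This is only a sketch: the helper name `shared_repo_commands` and its parameters are hypothetical, the upstream URL is left as a parameter because it is elided in the thread, and nothing here runs bzr.

```python
import os


def shared_repo_commands(repo_dir, upstream_url, feature="my-feature"):
    """Build the argv lists for the shared-repository workflow.

    Uses `bzr init-repo` (per the correction above) to create the shared
    repository, then branches trunk and a feature branch inside it.
    """
    trunk = os.path.join(repo_dir, "trunk")
    return [
        ["bzr", "init-repo", repo_dir],
        ["bzr", "branch", upstream_url, trunk],
        ["bzr", "branch", trunk, os.path.join(repo_dir, feature)],
    ]
```

Each inner list could be passed to `subprocess.check_call` on a machine with bzr installed; the second branch command is the one that should be fast, because both branches live inside the same shared repository.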

Revision history for this message
Kostja Osipov (kostja) wrote :

* Martin Pool <email address hidden> [08/03/04 20:56]:
> > I do bk branch <tree name> mysql-packs/mysql-5.1

kostja@dipika:~$ bzr info mysql-packs
Repository branch (format: pack-0.92)
Location:
  shared repository: mysql-packs
  repository branch: mysql-packs

Related branches:
  parent branch: mysql-packs/mysql-5.1

You should have the same mysql-packs.tar.gz that I have, shouldn't you?

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
Elliot Murphy (statik) wrote :

Aha! Kostja, there is a big performance fix with annotate, which would also affect gannotate. This fix will be in the 1.3 release, and is in the dev tree already. I believe it dropped annotate time from 11 minutes to under a minute. Can you try annotate or gannotate using bzr.dev and let us know what kind of performance change you see?

Revision history for this message
Martin Pool (mbp) wrote :

I've branched from http://bazaar.launchpad.net/%7Estatik/%2Bjunk/mysql-5.0/ into a shared repository, and ran bzr viz, and gannotate on a few files including sql/mysqld.cc.

For me (1.4GHz Core2 Duo, bzr 1.2.0), the GUI commands generally come up in a few seconds. mysqld.cc, which I understand has been heavily edited, comes up in about 30s. (It would be good to be faster than that, but it's not as bad as the original report.)

So I think there is some confounding factor. Is the machine swapping perhaps?

Changed in bzr:
status: New → Incomplete
Revision history for this message
Kostja Osipov (kostja) wrote :

* Martin Pool <email address hidden> [08/03/05 11:09]:
> I've branched from
> http://bazaar.launchpad.net/%7Estatik/%2Bjunk/mysql-5.0/ into a shared
> repository, and ran bzr viz, and gannotate on a few files including
> sql/mysqld.cc.
> For me (1.4GHz Core2 Duo), bzr 1.2.0 the gui commands generally come up
> in a few seconds. mysqld.cc, which I understand has been heavily
> edited, comes up in about 30s. (It would be good to be faster than
> that, but it's not as bad as the original report.)
>
> So I think there is some confounding factor. Is the machine swapping
> perhaps?

No, the load is cpu-bound, uses one core only.

I used 5.1 tree, which has more changesets than 5.0 tree.

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
John A Meinel (jameinel) wrote :

With the 5.1 tree on my laptop sql/mysqld.cc takes 41s, and I have a patch which drops it down to 20s (which should end up in 1.3). There is also a small patch for gannotate which should be in the next release as well.

As a point of curiosity, are you using these over a remote X connection? I've heard statements that GTK over remote X is rather slow, even on the local network.

Revision history for this message
Kostja Osipov (kostja) wrote :

* John A Meinel <email address hidden> [08/03/06 11:11]:
> With the 5.1 tree on my laptop sql/mysqld.cc takes 41s, and I have a
> patch which drops it down to 20s (which should end up in 1.3). There is
> also a small patch for gannotate which should be in the next release as
> well.
>
> As a point of curiosity, are you using these over a remote X connection?
> I've heard statements that GTK over remote X is rather slow, even on the
> local network.

No, I'm not.

1.3 seconds is a very reasonable time.

However, taking into account that the number of changes to
mysqld.cc and parse.cc will only grow, the faster you can get it the
better. I.e. best if it can be made two more orders of magnitude
faster -- then our future with bzr is safe.

--
-- Konstantin Osipov Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com The best DATABASE COMPANY in the GALAXY

Revision history for this message
Kostja Osipov (kostja) wrote :

* Elliot Murphy <email address hidden> [08/03/05 11:09]:
> aha! Kostja, there is a big performance fix with annotate, which would
> also affect gannotate. This fix will be in the 1.3 release, and is in
> the dev tree already. I believe it dropped annotate time from 11 minutes
> to under a minute. Can you try annotate or gannotate using bzr.dev and
> let us know what kind of performance change you see?

OK, I got my hands on bzr 1.3 and indeed got a substantial speed
increase with bzr visualise.

bzr gannotate is still taking a lot of time, 380 seconds (98% cpu)
for sql_parse.cc, 34 seconds for sql_insert.cc (93% CPU).

The speed is still below the speed of BK for the same operations, so
further improvements are very much needed.

It would be nice to get some more training on bazaar, since I
don't yet see how I can get my daily tasks done with bzr,
partly because some tools are slow, partly because I don't know
how to use them, partly because some features may be missing.

kostja@dipika:~$ bzr13 --version
Bazaar (bzr) 1.3.0.dev.0
  from bzr checkout /home/kostja/bzr.dev
    revision: 3272
    revid: <email address hidden>
    branch nick: bzr.dev
  Python interpreter: /usr/bin/python 2.5.1.final.0
  Python standard library: /usr/lib/python2.5
  bzrlib: /home/kostja/bzr.dev/bzrlib
  Bazaar configuration: /home/kostja/.bazaar
  Bazaar log file: /home/kostja/.bzr.log

Copyright 2005, 2006, 2007, 2008 Canonical Ltd.
http://bazaar-vcs.org/

bzr comes with ABSOLUTELY NO WARRANTY. bzr is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.

Branched with:

kostja@dipika:~$ bzr branch http://bazaar-vcs.org/bzr/bzr.dev bzr.dev

--
Konstantin

Revision history for this message
Elliot Murphy (statik) wrote :

Thanks for the update! We're chatting on IRC in more detail about training options now.

Revision history for this message
John A Meinel (jameinel) wrote :

Just to post my numbers...

In the 5.1 branch I get:

bzr.dev
% time bzr annotate --show-ids sql/sql_parse.cc >/dev/null
34.89s user 0.32s system 97% cpu 36.283 total
% time bzr annotate --show-ids sql/sql_insert.cc >/dev/null
4.70s user 0.10s system 93% cpu 5.136 total

With the branch associated with this bug, I get:
% time ~/dev/bzr/1.3-dev/annotate_cleanup/bzr annotate --show-ids sql/sql_parse.cc >/dev/null
11.56s user 0.13s system 99% cpu 11.739 total
% time ~/dev/bzr/1.3-dev/annotate_cleanup/bzr annotate --show-ids sql/sql_insert.cc >/dev/null
2.12s user 0.07s system 85% cpu 2.577 total

Now, 'gannotate' is doing a little bit more work, but mostly I wanted to point out the relative improvements. Specifically, 35s => 11.5s, and 4.7s => 2.1s.
So somewhere from 2-3x faster.
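
The ratios can be checked directly from the user CPU times reported above. A small sketch (the `speedup` helper is just illustrative arithmetic):

```python
# User CPU seconds from the `time` output above: (bzr.dev, annotate_cleanup branch).
timings = {
    "sql/sql_parse.cc": (34.89, 11.56),
    "sql/sql_insert.cc": (4.70, 2.12),
}


def speedup(before, after):
    """How many times faster the second measurement is than the first."""
    return before / after


for path, (before, after) in timings.items():
    print("%s: %.1fx faster" % (path, speedup(before, after)))
```

This gives roughly 3.0x for sql_parse.cc and 2.2x for sql_insert.cc, consistent with the 2-3x estimate.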

I'm also working on a proposal to start building annotation caches, which could easily get these numbers to drop below 1s. (The current algorithm has to evaluate all of the revisions of a file, which obviously gets slower the more commits you make. With a cache at appropriate times, you can limit the amount of history you need to inspect.)
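
To illustrate why a cache limits the history that has to be inspected, here is a toy sketch. It is not bzr's annotate algorithm: it assumes a strictly linear history (no merges), uses Python's difflib to carry line attributions forward, and the names `step` and `annotate` are made up for this example.

```python
from difflib import SequenceMatcher


def step(prev_annotated, new_lines, revid):
    """Carry attributions across one revision; new or changed lines get revid."""
    prev_lines = [line for _, line in prev_annotated]
    matcher = SequenceMatcher(None, prev_lines, new_lines, autojunk=False)
    result = [(revid, line) for line in new_lines]
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            # Unchanged lines keep the revision that introduced them.
            result[b + k] = prev_annotated[a + k]
    return result


def annotate(history, cache=None):
    """Annotate the newest revision of a file.

    history: list of (revid, lines), oldest first.
    cache: optional dict mapping a history index to its annotated lines.
    Returns (annotated_lines, number_of_revisions_evaluated).
    """
    cache = {} if cache is None else cache
    start = max((i for i in cache if i < len(history)), default=None)
    if start is None:
        revid, lines = history[0]
        annotated = [(revid, line) for line in lines]
        start, steps = 0, 1
    else:
        # Resume from the newest cached checkpoint instead of revision 0.
        annotated, steps = cache[start], 0
    for i in range(start + 1, len(history)):
        revid, lines = history[i]
        annotated = step(annotated, lines, revid)
        steps += 1
    return annotated, steps


history = [
    ("r1", ["a", "b"]),
    ("r2", ["a", "b", "c"]),
    ("r3", ["a", "x", "c"]),
    ("r4", ["a", "x", "c", "d"]),
]
full, full_steps = annotate(history)
checkpoint, _ = annotate(history[:3])
cached, cached_steps = annotate(history, cache={2: checkpoint})
```

In this toy history the uncached walk evaluates all 4 revisions, while the walk that starts from the checkpoint at r3 evaluates only 1 and produces the same attributions. The cost without a cache grows with the number of commits to the file, which is the behaviour described above.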

Including a cache is going to take a bit more work. There is always a "quick-and-dirty" solution which would only work locally, but ideally you'd like to be able to transmit the cache so that not everyone has to build it, which starts to bring up efficiency issues during transmission.

I'll also be posting a couple of patches to bzr-gtk which will make jumping around in history faster. Specifically, it is much easier to grab an in-memory cache of the intermediate revisions.

Oh, I should also mention that the associated branch decreases memory consumption dramatically. On mysqld.cc it dropped it from 250MB => 50MB. I don't know how much RAM you have in your machine, but getting close to swap / running out of room for disk buffers would certainly impact performance dramatically.

Revision history for this message
Kostja Osipov (kostja) wrote :

* John A Meinel <email address hidden> [08/03/13 20:35]:
> Oh, I should also mention that the associated branch decreases memory
> consumption dramatically. On mysqld.cc it dropped it from 250MB => 50MB.
> I don't know how much RAM you have in your machine, but getting close to
> swap / running out of room for disk buffers would certainly impact
> performance dramatically.

I have 1G and have not seen the machine swapping when running bzr.

--
Konstantin

Revision history for this message
Martin Pool (mbp) wrote :

It looks like the remaining portion of this is now a duplicate of bug 153787.

Martin Pool (mbp)
Changed in bzr:
assignee: mbp → jameinel