bzr update exceeds pypy's maximum recursion depth in gigantic repositories such as MariaDB.

Bug #1232992 reported by David Yingling
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
Bazaar    New      Undecided    Unassigned

Bug Description

I'm using bzr to access MariaDB's source code. MariaDB has a gigantic repository holding many years of its own and MySQL's history, with tons of branches and tags. Using bzr with MariaDB's repository is extra slow because the repository is so large. If you don't believe me, just check out the repository:

cd $repo/maria # (ex: ~/repos/maria)
bzr branch lp:maria trunk

(from: https://mariadb.com/kb/en/getting-the-mariadb-source-code/ )

This will use an insane 1.2 GB or so of RAM, and take 30 minutes to an hour depending on how fast your computer is. That was before I switched to using PyPy instead of Python, so with PyPy it will probably run faster and use less memory.

After checking out MariaDB's repository, cd to it (cd maria/trunk), and then run the command:

bzr update -r tag:mariadb-10.0.4

This command should switch the working copy to the version tagged as MariaDB 10.0.4.

Instead it blows up because it exceeds Python's recursion limit (Perl only warns at 100 levels of recursion; CPython's default limit is 1000 frames).

Apparently, open() in the file branch.py calls open_branch() in the file bzrdir.py, which in turn calls open() in the file branch.py, which in turn calls open_branch() in the file bzrdir.py. At least this is what the backtrace seems to indicate.

This is unusual, and really cool. I've never seen recursion where two functions call each other repeatedly (mutual recursion). Normally recursion is one function calling itself. But both create loops that can blow up in your face if they never reach a stopping condition.
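As a minimal illustration of the pattern in the backtrace (a sketch only: `open_branch` and `open_` below are stand-ins written for this example, not bzrlib's actual code), two Python functions that call each other with no base case fail with the same error:

```python
def open_branch(depth=0):
    # Stand-in for bzrdir.open_branch(): hands off to open_().
    return open_(depth + 1)


def open_(depth):
    # Stand-in for branch.open(): hands straight back to open_branch().
    return open_branch(depth + 1)


def demo():
    """Run the mutual recursion and return the interpreter's error message."""
    try:
        open_branch()
    except RuntimeError as exc:
        # Python 2 raises RuntimeError; Python 3 raises RecursionError,
        # which is a subclass of RuntimeError, so this catches both.
        return str(exc)


print(demo())
```

Neither function has a stopping condition, so the input size is irrelevant here: the interpreter's recursion limit is the only thing that ends the loop.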

The back trace is pasted below:

$ bzr update -r tag:mariadb-10.0.4

bzr: ERROR: exceptions.RuntimeError: maximum recursion depth exceeded

Traceback (most recent call last):
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/commands.py", line 930, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/commands.py", line 1141, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/commands.py", line 697, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/builtins.py", line 1742, in run
    tree = WorkingTree.open_containing('.')[0]
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/workingtree.py", line 298, in open_containing
    return control.open_workingtree(), relpath
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1095, in open_workingtree
    return format.open(self, _found=True)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/workingtree_4.py", line 1595, in open
    wt = self._open(a_bzrdir, self._open_control_files(a_bzrdir))
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/workingtree_4.py", line 1605, in _open
    branch=a_bzrdir.open_branch(),
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1079, in open_branch
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/branch.py", line 2256, in open
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1079, in open_branch
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/branch.py", line 2256, in open
    possible_transports=possible_transports)

<snip> See .bzr.log attached for full back trace.

    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1079, in open_branch
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/branch.py", line 2256, in open
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1079, in open_branch
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/branch.py", line 2256, in open
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1079, in open_branch
    possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/branch.py", line 2254, in open
    location, possible_transports=possible_transports)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/controldir.py", line 689, in open
    _unsupported=_unsupported)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/controldir.py", line 718, in open_from_transport
    find_format, transport, redirected)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/transport/__init__.py", line 1719, in do_catching_redirections
    return action(transport)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/controldir.py", line 706, in find_format
    probers=probers)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/controldir.py", line 1151, in find_format
    return prober.probe_transport(transport)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/bzrdir.py", line 1275, in probe_transport
    raise errors.NotBranchError(path=transport.base)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/errors.py", line 658, in __init__
    path = urlutils.unescape_for_display(path, 'ascii')
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/urlutils.py", line 711, in unescape_for_display
    path = local_path_from_url(url)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/urlutils.py", line 259, in _posix_local_path_from_url
    url = split_segment_parameters_raw(url)[0]
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/urlutils.py", line 522, in split_segment_parameters_raw
    lurl = strip_trailing_slash(url)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/urlutils.py", line 624, in strip_trailing_slash
    scheme_loc, first_path_slash = _find_scheme_and_separator(url)
  File "/usr/lib64/pypy-1.9/site-packages/bzrlib/urlutils.py", line 167, in _find_scheme_and_separator
    m = _url_scheme_re.match(url)
RuntimeError: maximum recursion depth exceeded

bzr 2.6b2 on python 2.7.2.42 (Linux-2.6.37.6-x86_64-Intel-R-_Core-TM-
    2_Duo_CPU_____E8400__@_3.00GHz-with-slackware-13.37.0)
arguments: ['/usr/bin/bzr', 'update', '-r', 'tag:mariadb-10.0.4']
plugins: __builtins__[unknown], bash_completion[2.6b2],
    changelog_merge[2.6b2], launchpad[2.6b2], netrc_credential_store[2.6b2],
    news_merge[2.6b2], po_merge[2.6b2], weave_fmt[2.6b2]
encoding: 'ascii', fsenc: 'ANSI_X3.4-1968', lang: None

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.
bzr: warning: some compiled extensions could not be loaded; see <https://answers.launchpad.net/bzr/+faq/703>

I don't know if the fix is to remove the recursion, or just increase the max recursion depth threshold before throwing an exception?

Attached is my bzr log file.

Thanks,
Dave.

David Yingling (deeelwy) wrote :

Apparently this is a genuine infinite recursion, because raising the recursion limit by adding "sys.setrecursionlimit(10000)" to the bzr Python script still triggers this bug. My now much larger .bzr.log file is attached.
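A small sketch of why raising the limit cannot help when the recursion has no base case (the helper `depth_reached` is made up for this example and has nothing to do with bzr itself): the failure still happens, just deeper in the stack.

```python
import sys


def depth_reached(limit):
    """Run an endless ping/pong recursion under `limit`; return the calls made."""
    counter = [0]

    def ping():
        counter[0] += 1
        pong()

    def pong():
        counter[0] += 1
        ping()

    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        ping()
    except RuntimeError:
        pass  # always reached: the recursion never terminates on its own
    finally:
        sys.setrecursionlimit(old_limit)
    return counter[0]


print(depth_reached(1500), depth_reached(3000))
```

The second number is larger than the first, but both runs end in the same exception, which matches the larger log file and unchanged error reported above.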

Also, setting it much higher than 10000 results in a segmentation fault, presumably because the interpreter runs out of stack space.

My computer is too old to run the binary compiled for Ubuntu on my Slackware setup, so I'm stuck with CPython or PyPy 1.9. I don't have enough RAM to compile PyPy myself, because a 64-bit build needs 4 GB, and I only have 4 GB of total RAM. PyPy's build requirements are insane.

Thanks,
Dave.

John A Meinel (jameinel) wrote :

The actual traceback indicates you have a branch that is pointing at itself. You might try looking at ".bzr/branch/location" information for a location that ends up pointing in a loop.
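One rough way to check for the simplest case, sketched in Python under the assumption that the location file holds a single file:// URL as in this report (the helper name `location_points_at_itself` is hypothetical and is not a bzrlib function):

```python
import os
from urllib.parse import unquote, urlparse


def location_points_at_itself(tree_root):
    """Return True if .bzr/branch/location resolves to tree_root itself.

    Returns None when the file does not hold a file:// URL, since this
    sketch makes no claim about how bzr resolves other values.
    """
    location_file = os.path.join(tree_root, ".bzr", "branch", "location")
    with open(location_file) as handle:
        target = handle.read().strip()

    parsed = urlparse(target)
    if parsed.scheme != "file":
        return None

    # Compare canonical paths so trailing slashes and symlinks do not matter.
    target_path = os.path.realpath(unquote(parsed.path))
    return target_path == os.path.realpath(tree_root)
```

This only detects a direct self-reference; a longer chain of references that eventually loops back would need each hop to be followed in turn.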

David Yingling (deeelwy) wrote :

That's kinda funny, because it's MariaDB's open source repo. I have made no local commits or changes. The only bzr commands I've run are ones to list tags, and to try to change to a different tag.

The only thing in ".bzr/branch/location" is:

file:///home/dly/Desktop/Code/maria/trunk/

Which is the local path to the repository. Does having this there create a loop? I deleted the file, and that caused me to get an error about the file not existing. Then, I created an empty file with the correct name using touch. I no longer got the error message, but I still got the crazy back trace. So, if there is a loop, it can't be because of the contents of that file, because it crashes with a back trace even when that file is empty. A new .bzr.log file has been attached. The latest stuff will be the last few bzr tags and bzr update runs.

I also created an archive of the MariaDB repo's .bzr directory. It's only 5.3 megabytes. The .bzr/branch/location.old file is the original .bzr/branch/location file; you can rename it back if you want.

Thanks,
Dave.

John A Meinel (jameinel) wrote : Re: [Bug 1232992] Re: bzr update exceeds pypy's maximum recursion depth in gigantic repositories such as MariaDB.

On 2013-10-03 8:26, David Yingling wrote:
> That's kinda funny, because it's MariaDB's open source repo. I have
> made no local commits or changes. The only bzr commands I've run
> are ones to list tags, and to try to change to a different tag.
>
> The only thing in ".bzr/branch/location" is:
>
> file:///home/dly/Desktop/Code/maria/trunk/
>
> Which is the local path to the repository. Does having this there
> create a loop? I deleted the file, and that caused me to get an
> error about the file not existing. Then, I created an empty file
> with the correct name using touch. I no longer got the error
> message, but I still got the crazy back trace. So, if there is a
> loop, it can't be because of the contents of that file, because it
> crashes with a back trace even when that file is empty. A new
> .bzr.log file has been attached. The latest stuff will be the last
> few bzr tags and bzr update runs.

"The local path to the repository" means it is pointing at itself. If
you make it an empty file, I'm guessing we interpret "" using the
local directory, and again it points back at itself.

I don't know how you ended up checking out a branch to itself. (though
the freedom with which you edit files under .bzr/ is a bit concerning)
What I would have expected is to just have a regular branch there.

>
> I also created an archive of the MariaDB repo's .bzr directory.
> It's only 5.3 megabytes. The .bzr/branch/location.old file is the
> old .bzr/branch/location file you can rename it to that if you
> want.
>
> Thanks, Dave.
>

So having a .bzr/branch/location file means that you don't have a
local branch, but a local working tree pointing at "some other
branch". In this case, that somehow ended up pointed directly back at
itself. So you had a lightweight checkout pointing at a branch that
points at itself: we try to open it, see that it is a reference, open
that, and so on.

It needs to point "somewhere else". Probably the easiest way to do
that is with "bzr switch --force" which would be something like:

bzr branch $MARIADB /home/dly/Desktop/Code/maria/alt-trunk
cd /home/dly/Desktop/Code/maria/trunk
bzr switch --force /home/dly/Desktop/Code/maria/alt-trunk

John
=:->

David Yingling (deeelwy) wrote :

Wow, I can't believe it was a lightweight checkout. It really did take about 1.2 GB of RAM and around 45 minutes to download what is basically the same thing as a 45 MB MariaDB release plus 20 MB of state for the .bzr directory.

I tried your suggestion, but it failed, because the first bzr branch command also fails with a python "maximum recursion depth exceeded" error message. The back trace is included in the latest .bzr.log file I've attached.
Apparently bzr can't really do anything when its location file is messed up.

So, is the bug fix just an error message for when the location file is messed up? I think an error message would be better than a python exception from an infinite recursion loop.
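A minimal sketch of that idea, using a plain dict as a stand-in for branch references (`ReferenceLoopError` and `resolve_branch` are hypothetical names, not part of bzrlib): remember every location already visited and raise a readable error as soon as one repeats.

```python
class ReferenceLoopError(Exception):
    """Hypothetical error type for this sketch; not a bzrlib exception."""


def resolve_branch(location, references):
    """Follow branch references until reaching a location with no reference.

    `references` maps a location to the location it points at; a location
    that is absent from the dict is treated as a real branch.
    """
    seen = []
    while True:
        if location in seen:
            chain = " -> ".join(seen + [location])
            raise ReferenceLoopError("branch reference loop: " + chain)
        seen.append(location)
        target = references.get(location)
        if target is None:
            return location
        location = target


# A checkout that points at a real branch resolves normally.
print(resolve_branch("checkout", {"checkout": "trunk"}))

# A branch that points at itself is reported instead of recursing forever.
try:
    resolve_branch("trunk", {"trunk": "trunk"})
except ReferenceLoopError as exc:
    print(exc)
```

Using a loop with a visited list instead of recursion also means the check costs one entry per hop, so it cannot exhaust the stack even on a long chain of references.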

The other thing I tried was the MariaDB repo that is available for download. It's mentioned on the page https://mariadb.com/kb/en/getting-the-mariadb-source-code/ . You just download and unarchive it. However, it is just a bzr repo with no branches checked out, so you need to do a "bzr branch lp:maria" to get an actual bzr checkout, because most commands don't seem to work when nothing is checked out such as "bzr tags."

When running the bzr branch lp:maria command, it seems to connect to Launchpad and download some stuff, and then it crashes with a segmentation fault:

dly@betty mariadb2> bzr branch lp:maria/5.5
You have not informed bzr of your Launchpad ID, and you must do this to
write to Launchpad or access private data. See "bzr help launchpad-login".
Segmentation fault -

The "Segmentation fault - " replaces where the network activity status was. A back trace of this error should also be in the attached .bzr.log. So using that repo doesn't work for me either.

I tried reinstalling bzr a few times. Each time I get the "Cannot build extension "bzrlib._annotator_pyx"." error. So, I do the "build_ext --allow-python-fallback" trick to get it to build anyway. I'm starting to suspect this is not a real bug in Bazaar, but rather a problem with my install of bzr, which seems to be messed up. I'm running Slackware 13.37, which is a few years old and may have contributed to these issues.

So, could you please download the MariaDB repo as explained on the page https://mariadb.com/kb/en/getting-the-mariadb-source-code/ in the section "Source Tree Tarball"? It's really simple: just download the file. Remember to make a new directory and cd to it first, because unarchiving the archive creates a .bzr directory in the current directory, not in a containing directory. Then running the command "bzr branch lp:maria" should supposedly check out the latest version. I get a segmentation fault when I do this, but you shouldn't.

Anyway, after doing all that to set up a MariaDB repo, just type the command "bzr tags". It will list a whole bunch of tags. Pick one representing a newish MariaDB release. I forget their exact naming, something like "Maria-5.5" or "MariaDB-10.0", with a 5.5 or 10.0 version number, which are the latest releases. Then try to checkout that tag using s...

Jelmer Vernooij (jelmer) added tag: check-for-breezy