pull uses set_parent_trees

Bug #282941 reported by Robert Collins
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
High
Unassigned

Bug Description

dirstate has a fast api for updating a basis tree -
update_basis_by_delta. pull() should use this when possible, as it is
substantially faster on large trees. (It's only possible to use this for
pull when the tree has only one parent; the common case)

 affects bzr
 importance high
 tags performance
 status triaged
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Martin Pool (mbp)
Changed in bzr:
status: Triaged → Confirmed
Jelmer Vernooij (jelmer)
tags: added: performance pull
Revision history for this message
John A Meinel (jameinel) wrote :

I'm poking around on this at the moment with the idea that we can actually change "set_parent_trees([tree])" to call self.update_by_basis_delta instead. That way callers don't have to think too much about it, and it will benefit both uncommit (bug #780552) and pull .

Revision history for this message
John A Meinel (jameinel) wrote :

The change I'm doing does end up doing redundant work. Namely it computes the delta 2 times. The first time to figure out what to apply to the tree, and the second time to determine how to update the dirstate. So it should be changed to compute the delta, cache it, and then pass it to the WT.update_basis_by_delta().
So I'm going to leave this bug open for the short term.

Note also bugs like bug #780677 which make pull slower than it needs to be.

Revision history for this message
John A Meinel (jameinel) wrote :

So 'bzr pull' still uses set_parent_trees, only now that internally does "update_basis_by_delta" when there is only a single target tree.

I'm leaving this bug open, though, because there is still a little bit of double-handling. Specifically, Merge calls iter_changes and then processes it. And then we call WT.update_basis_by_delta(RevisionTree.inventory._make_delta(RevisionTree.inventory)), which is another iter_changes call.

Arguably what we would want is some way to get the iter_changes from the Merger object, and pass that through, rather than do 2 iter_changes.

The other bit is that we could at least re-use the basis_tree and revision_tree that "pull" is passing to Merger. In theory those trees have already loaded all the inventory pages that are interesting, rather than re-opening the repository.

However, in a tree with 70k entries and 10 changes in a given commit, by far the bulk of the time spent is re-writing the dirstate file. So until we make that faster (journal updates?), the overhead of doing 2 iter_changes calls doesn't really matter.

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
no longer affects: brz
tags: removed: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.