pack index has no topological locality of reference

Bug #165309 reported by Robert Collins
4
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Medium
Unassigned
Breezy
Triaged
Medium
Unassigned

Bug Description

Pack indices are currently lexographically sorted. They provide
reasonable partial access performance, but we would have many fewer
round trips if they were topologically sorted. A candidate design for
this is:
---
prelude (as current - key count, reference list length, plus the used
hash length etc)
compressed hash table
key data (as currently formatted, possibly with binary lengths rather
than ascii)
---

The compressed hash table will contain a list:
[hash location]*

Where:
hash is the first N bytes of the sha/md5 of the serialised key.
location is a binary location in the file of the key with that hash.
Collisions are handled by repeating the hash with different locations.
(That is, linear chaining).

The hash table is sorted by the hash.
The number of used bytes in the hash table is determined by generating
the entire table and doing a single pass across it to determine how many
bits are needed to balance collisions vs size. (The cost of a collision
is having to read two locations to determine which has the key we want).

the key data would be topologically sorted by one of the key reference
lists - probably the graph one, but possibly the compression one.

Because it's easier to read additional keys after the one we jump to
(because they are internally delimited) we'd probably want children of a
key K to come after K in the file.

This will make a local and remote performance difference.

 affects bzr
 tag packs

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Martin Pool (mbp)
Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
tags: added: performance
Changed in brz:
status: New → Triaged
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.