Comment 7 for bug 183156

John A Meinel (jameinel) wrote :

Note that if we change the loop to:
python -c "
from bzrlib import trace, branch
trace.enable_default_logging()
b = branch.Branch.open('launchpad-2a/devel')
b.lock_read()
maps = []
for vf_name in ['revisions', 'signatures', 'inventories', 'chk_bytes', 'texts']:
    vf = getattr(b.repository, vf_name)
    maps.append(vf.keys())
    trace.debug_memory('after %s' % (vf_name,))
b.unlock()
trace.debug_memory('after unlock')
del maps
trace.debug_memory('after del maps')
"
Internally, .keys() calls iter_all_entries(), which does not cache the keys in the btree caches. At that point we have:
after revisions
VmPeak: 25900 kB
VmSize: 25900 kB
VmRSS: 23012 kB
after signatures
VmPeak: 28464 kB
VmSize: 28464 kB
VmRSS: 25560 kB
after inventories
VmPeak: 32080 kB
VmSize: 32080 kB
VmRSS: 29008 kB
after chk_bytes
VmPeak: 101212 kB
VmSize: 95252 kB
VmRSS: 92396 kB
after texts
VmPeak: 113188 kB
VmSize: 109976 kB
VmRSS: 107108 kB
after unlock
VmPeak: 113188 kB
VmSize: 109976 kB
VmRSS: 107108 kB
after del maps
VmPeak: 113188 kB
VmSize: 94596 kB
VmRSS: 91728 kB
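
For reference, the numbers above are the Vm* fields that Linux exposes in /proc/<pid>/status, which is (as far as I know) where debug_memory() reads them from. A minimal, self-contained sketch of parsing those fields (parse_vm_status is a hypothetical helper, not bzrlib API):

```python
def parse_vm_status(text):
    """Extract Vm* fields (in kB) from /proc/<pid>/status content."""
    sizes = {}
    for line in text.splitlines():
        if line.startswith('Vm'):
            name, value = line.split(':', 1)
            # values look like "   23012 kB"
            sizes[name] = int(value.split()[0])
    return sizes

# Sample text in the /proc/self/status format (values from the dump above).
sample = "Name:\tpython\nVmPeak:\t   25900 kB\nVmSize:\t   25900 kB\nVmRSS:\t   23012 kB\n"
print(parse_vm_status(sample))
```

On Linux you would feed it open('/proc/self/status').read(); the Name line (and any other non-Vm field) is simply skipped.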

If I add this debug loop:
import gc
import pprint
from memory_dump import scanner, _scanner
scanner.dump_all_referenced(open(',,maps_refs.txt', 'wb'), maps)
size = dict.fromkeys('0123', 0)
size['0'] = _scanner.size_of(maps)
for x in maps:
  size['1'] += _scanner.size_of(x)
  for y in gc.get_referents(x):
    size['2'] += _scanner.size_of(y)
    for z in gc.get_referents(y):
      size['3'] += _scanner.size_of(z)
pprint.pprint(size)
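
For context, gc.get_referents() returns the objects an object directly references, which is what lets the loop above walk list -> key tuple -> string one level at a time. A standalone illustration with made-up key names (the real keys come from vf.keys()):

```python
import gc

# A list of key tuples, shaped like one vf.keys() result.
keys = [('rev-id-1',), ('rev-id-2',)]
maps = [keys]

# Level 0 is the outer list; its only referent is the per-vf list.
assert gc.get_referents(maps) == [keys]

# Level 2: the key tuples (sorted, since referent order is unspecified).
level2 = sorted(gc.get_referents(keys))
print(level2)

# Level 3: the strings inside the tuples.
level3 = sorted(s for t in level2 for s in gc.get_referents(t))
print(level3)
```

Note that CPython does not guarantee the order of the returned referents, hence the sorting before comparing.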

I get:

{'0': 64, '1': 15729200, '2': 26713160, '3': 92853917}
(total: 135,296,341 bytes)

If I did it correctly, maps is a list of lists of keys (tuples of strings).

So we have:
  64 bytes allocated to the overall list
  15.7M bytes allocated to the lists of keys
  26.7M bytes allocated to the key tuples
  92.9M bytes allocated to the strings

Now the strings are, in theory, deduplicated via intern(), but my size_of() loop does not account for that: a string shared between keys gets counted once per reference.
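
To illustrate that caveat: a naive recursive sizer counts a shared (interned) string once per referencing tuple, while an id-deduplicating sizer counts it once. A minimal sketch using sys.getsizeof in place of _scanner.size_of (naive_size and deduped_size are hypothetical helpers, not memory_dump API):

```python
import gc
import sys

def naive_size(obj, depth=2):
    """Sum sizes recursively, re-counting shared objects."""
    total = sys.getsizeof(obj)
    if depth:
        for ref in gc.get_referents(obj):
            total += naive_size(ref, depth - 1)
    return total

def deduped_size(obj, depth=2, seen=None):
    """Same walk, but count each object (by id) only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if depth:
        for ref in gc.get_referents(obj):
            total += deduped_size(ref, depth - 1, seen)
    return total

shared = sys.intern('file-id-shared-between-keys')
keys = [(shared, 'rev-1'), (shared, 'rev-2')]
# The naive walk counts `shared` twice; the deduped walk counts it once.
print(naive_size(keys) - deduped_size(keys), sys.getsizeof(shared))
```

The difference between the two totals is exactly one copy of the shared string, which is the kind of over-counting the {'3': 92853917} figure above may include.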

If I get rid of the intern() calls by editing _btree_serializer_pyx.pyx to call safe_string_from_size instead of safe_interned_string_from_size, I end up with:
VmPeak: 162088 kB
VmSize: 158876 kB
VmRSS: 155868 kB
{'0': 64, '1': 15729200, '2': 26713160, '3': 92853917}
135,296,341

However, you can see that VmPeak went from 113188 kB to 162088 kB, so the intern() does seem to be helping. (The size totals are unchanged because size_of() counts each object independently, whether or not it is shared.)
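
For what intern() buys us here: interning makes equal strings share a single object, so many key tuples referencing the same file id hold one string rather than one copy each. A quick illustration (sys.intern is the Python 3 spelling; the C extension presumably does the C-level equivalent):

```python
import sys

# Build the strings at runtime from a variable so the compiler
# cannot constant-fold them into a single shared constant.
suffix = 'abc123'
a = sys.intern('chk-node-' + suffix)
b = sys.intern('chk-node-' + suffix)
print(a is b)   # interned: both names point at one object

c = 'chk-node-' + suffix
d = 'chk-node-' + suffix
print(c == d, c is d)   # equal, but (in CPython) distinct objects
```

With millions of keys sharing file ids and revision ids, that identity sharing is where the ~49 MB VmPeak difference comes from.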

What concerns me most is that we don't seem to be reclaiming the memory when we are done, which is strange.