Previously bup would ignore MAX_PER_TREE whenever it hit a long run of
data that didn't produce a non-zero hashsplit "level" (see the
discussion of fanout in DESIGN). That can happen, for example, when
traversing a file containing large zero regions (sparse or not).
As a result, bup could produce an arbitrarily large number of blobs at
level 0 of the hashsplit tree, causing it to consume increasing
amounts of memory during split/save, and to behave badly during
join/restore.
To fix that, don't try to outsmart _squish() -- just call it every
time, and let it enforce MAX_PER_TREE when appropriate.
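For reference, a simplified rendition of _squish() (a sketch assuming
the stack-of-stacks layout above; the real function also handles tree
construction details) shows why the unconditional call is safe: the
MAX_PER_TREE check runs on every call, so calling it with level 0 is
a cheap no-op until a stack actually fills up.

    GIT_MODE_TREE = 0o40000      # bup's mode for tree entries

    def _squish(maketree, stacks, level):
        # Roll every stack below 'level' -- and any stack that has
        # reached MAX_PER_TREE -- into one tree entry a level up.
        i = 0
        while i < level or len(stacks[i]) >= MAX_PER_TREE:
            if len(stacks) <= i + 1:
                stacks.append([])
            if stacks[i]:
                sha = maketree(stacks[i])    # write a git tree
                size = sum(ent[2] for ent in stacks[i])
                stacks[i + 1].append((GIT_MODE_TREE, sha, size))
                stacks[i] = []
            i += 1

With level == 0 and fewer than MAX_PER_TREE pending entries, the loop
condition is false immediately, so calling _squish() on every blob
costs almost nothing; once stacks[0] fills up, its entries are rolled
into a subtree regardless of the hashsplit level.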
Thanks to trebor <robert.rebstock@tempelhof-projekt.de> for reporting
the problem a long while back, Yung-Chin Oei <yungchin@yungchin.nl>
for tracking down the cause and proposing a slightly different fix,
and Aidan Hobson Sayers <aidanphs@gmail.com> for suggesting this
particular approach.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
     stacks = [[]]
     for (sha,size,level) in sl:
         stacks[0].append((GIT_MODE_FILE, sha, size))
-        if level:
-            _squish(maketree, stacks, level)
+        _squish(maketree, stacks, level)
     #log('stacks: %r\n' % [len(i) for i in stacks])
     _squish(maketree, stacks, len(stacks)-1)
     #log('stacks: %r\n' % [len(i) for i in stacks])