Enforce MAX_PER_TREE by always _squish()ing in split_to_shalist().
Author:     Rob Browning <rlb@defaultvalue.org>
AuthorDate: Wed, 20 Nov 2013 20:57:53 +0000 (14:57 -0600)
Commit:     Rob Browning <rlb@defaultvalue.org>
CommitDate: Sat, 23 Nov 2013 19:12:45 +0000 (13:12 -0600)
Previously bup would ignore MAX_PER_TREE whenever it hit a long run of
data that didn't produce a non-zero hashsplit "level" (see the
discussion of fanout in DESIGN).  That can happen, for example, when
traversing a file containing large zero regions (sparse or not).

As a result, bup could produce an arbitrarily large number of blobs at
level 0 of the hashsplit tree, causing it to consume increasing memory
during split/save, and to behave badly during join/restore.

To fix that, don't try to outsmart _squish() -- just call it every
time, and let it enforce MAX_PER_TREE when appropriate.

Thanks to trebor <robert.rebstock@tempelhof-projekt.de> for reporting
the problem a long while back, Yung-Chin Oei <yungchin@yungchin.nl>
for tracking down the cause and proposing a slightly different fix,
and Aidan Hobson Sayers <aidanphs@gmail.com> for suggesting this
particular approach.

Signed-off-by: Rob Browning <rlb@defaultvalue.org>
lib/bup/hashsplit.py

index 2c4ec3a286cd0b37b6a1b6371a64075f5195ceda..3dffbbe857874c3142d054d3b3cd15d9bd22c4d9 100644
@@ -159,8 +159,7 @@ def split_to_shalist(makeblob, maketree, files,
         stacks = [[]]
         for (sha,size,level) in sl:
             stacks[0].append((GIT_MODE_FILE, sha, size))
-            if level:
-                _squish(maketree, stacks, level)
+            _squish(maketree, stacks, level)
         #log('stacks: %r\n' % [len(i) for i in stacks])
         _squish(maketree, stacks, len(stacks)-1)
         #log('stacks: %r\n' % [len(i) for i in stacks])
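The effect of the unconditional _squish() call can be seen in a toy model of the hashsplit stacks. This is a hedged sketch, not bup's actual code: maketree, MAX_PER_TREE (really 256 in bup; shrunk here to 3 for illustration), and split_to_tree are simplified stand-ins, but the stack-collapsing logic mirrors the patched loop, so a long run of level-0 blobs can no longer pile up without bound at level 0.

```python
MAX_PER_TREE = 3  # illustrative; bup's real limit is 256

def maketree(shalist):
    # Toy tree constructor: wraps a list of entries as a subtree node.
    return ('tree', tuple(shalist))

def _squish(maketree, stacks, n):
    # Collapse stack levels: walk up to level n, and keep collapsing
    # any level that has reached MAX_PER_TREE entries -- this second
    # condition is what enforces the limit even when every level is 0.
    i = 0
    while i < n or len(stacks[i]) >= MAX_PER_TREE:
        while len(stacks) <= i + 1:
            stacks.append([])
        if len(stacks[i]) == 1:
            # A lone entry is promoted as-is, not wrapped in a tree.
            stacks[i + 1] += stacks[i]
        elif stacks[i]:
            stacks[i + 1].append(maketree(stacks[i]))
        stacks[i] = []
        i += 1

def split_to_tree(blobs_with_levels):
    # Simplified analogue of split_to_shalist()'s main loop.
    stacks = [[]]
    for (sha, level) in blobs_with_levels:
        stacks[0].append(sha)
        _squish(maketree, stacks, level)  # the patched, unconditional call
    _squish(maketree, stacks, len(stacks) - 1)
    return stacks[-1]

# Ten level-0 blobs, as produced by e.g. a large zero region: level 0 is
# repeatedly collapsed at MAX_PER_TREE instead of growing without bound.
result = split_to_tree([('b%d' % i, 0) for i in range(10)])
```

With the pre-patch `if level:` guard, all ten entries would have stayed at level 0; here they are folded into subtrees of at most MAX_PER_TREE entries, with only the trailing blob left unwrapped.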