From: Rob Browning
Date: Wed, 20 Nov 2013 20:57:53 +0000 (-0600)
Subject: Enforce MAX_PER_TREE by always _squish()ing in split_to_shalist().
X-Git-Tag: 0.25-rc5~13
X-Git-Url: https://arthur.barton.de/cgi-bin/gitweb.cgi?p=bup.git;a=commitdiff_plain;h=f3c4f057d98f84f411c436c28c3e50230aed2f45

Enforce MAX_PER_TREE by always _squish()ing in split_to_shalist().

Previously bup would ignore MAX_PER_TREE whenever it hit a long run of
data that didn't produce a non-zero hashsplit "level" (see the
discussion of fanout in DESIGN).  That can happen, for example, when
traversing a file containing large zero regions (sparse or not).

As a result, bup could produce an arbitrarily large number of blobs at
level 0 of the hashsplit tree, causing it to consume increasing amounts
of memory during split/save, and to behave badly during join/restore.

To fix that, don't try to outsmart _squish() -- just call it every
time, and let it enforce MAX_PER_TREE when appropriate.

Thanks to trebor for reporting the problem a long while back, to
Yung-Chin Oei for tracking down the cause and proposing a slightly
different fix, and to Aidan Hobson Sayers for suggesting this
particular approach.

Signed-off-by: Rob Browning
---

diff --git a/lib/bup/hashsplit.py b/lib/bup/hashsplit.py
index 2c4ec3a..3dffbbe 100644
--- a/lib/bup/hashsplit.py
+++ b/lib/bup/hashsplit.py
@@ -159,8 +159,7 @@ def split_to_shalist(makeblob, maketree, files,
         stacks = [[]]
         for (sha,size,level) in sl:
             stacks[0].append((GIT_MODE_FILE, sha, size))
-            if level:
-                _squish(maketree, stacks, level)
+            _squish(maketree, stacks, level)
         #log('stacks: %r\n' % [len(i) for i in stacks])
         _squish(maketree, stacks, len(stacks)-1)
         #log('stacks: %r\n' % [len(i) for i in stacks])
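
For readers unfamiliar with the hashsplit stacks, the following is a
simplified, hypothetical sketch of the idea the fix relies on -- it is not
bup's actual _squish() implementation, and the names squish(), make_tree()
and the MAX_PER_TREE value below are chosen for illustration only.  The
point it shows: a squish step that also fires when a stack reaches
MAX_PER_TREE keeps every stack bounded, but only if it is called for every
blob, including level-0 ones.

    # Hypothetical sketch, not bup's real code.
    MAX_PER_TREE = 256   # illustrative value; the real constant is in hashsplit.py

    def make_tree(entries):
        # Stand-in for maketree(): pretend to write a git tree for `entries`
        # and return something identifying it.
        return ('tree', len(entries))

    def squish(stacks, level):
        # Fold stacks[i] into stacks[i+1] while either the blob's level
        # requires it or stacks[i] has grown to MAX_PER_TREE entries.
        i = 0
        while i < level or len(stacks[i]) >= MAX_PER_TREE:
            if len(stacks) <= i + 1:
                stacks.append([])
            if stacks[i]:
                stacks[i + 1].append(make_tree(stacks[i]))
                stacks[i] = []
            i += 1

    # A long run of level-0 blobs, as a large zero-filled region can produce.
    # With the old "if level:" guard the squish step never ran here, so the
    # bottom stack grew without bound; calling it every time caps each stack.
    stacks = [[]]
    for n in range(10000):
        stacks[0].append(('blob', n))
        squish(stacks, 0)        # always squish, as in the fix
    assert all(len(s) < MAX_PER_TREE for s in stacks)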