From f3c4f057d98f84f411c436c28c3e50230aed2f45 Mon Sep 17 00:00:00 2001
From: Rob Browning
Date: Wed, 20 Nov 2013 14:57:53 -0600
Subject: [PATCH] Enforce MAX_PER_TREE by always _squish()ing in
 split_to_shalist().

Previously bup would ignore MAX_PER_TREE whenever it hit a long run of
data that didn't produce a non-zero hashsplit "level" (see the
discussion of fanout in DESIGN).  That can happen, for example, when
traversing a file containing large zero regions (sparse or not).

As a result, bup could produce an arbitrarily large number of blobs at
level 0 of the hashsplit tree, causing it to consume increasing memory
during split/save, and to behave badly during join/restore.

To fix that, don't try to outsmart _squish() -- just call it every
time, and let it enforce MAX_PER_TREE when appropriate.

Thanks to trebor for reporting the problem a long while back, Yung-Chin
Oei for tracking down the cause and proposing a slightly different fix,
and Aidan Hobson Sayers for suggesting this particular approach.

Signed-off-by: Rob Browning
---
 lib/bup/hashsplit.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/bup/hashsplit.py b/lib/bup/hashsplit.py
index 2c4ec3a..3dffbbe 100644
--- a/lib/bup/hashsplit.py
+++ b/lib/bup/hashsplit.py
@@ -159,8 +159,7 @@ def split_to_shalist(makeblob, maketree, files,
         stacks = [[]]
         for (sha,size,level) in sl:
             stacks[0].append((GIT_MODE_FILE, sha, size))
-            if level:
-                _squish(maketree, stacks, level)
+            _squish(maketree, stacks, level)
         #log('stacks: %r\n' % [len(i) for i in stacks])
         _squish(maketree, stacks, len(stacks)-1)
         #log('stacks: %r\n' % [len(i) for i in stacks])
-- 
2.39.2
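
For readers unfamiliar with the mechanism the patch relies on, below is
a minimal, self-contained Python sketch of the stacks/squish scheme.
The names here (squish, make_node, split) and the MAX_PER_TREE value
are illustrative stand-ins for bup's _squish(), maketree(), and
split_to_shalist(), not the real implementation; the point is only to
show why calling the squish step unconditionally caps every level,
including level 0, at MAX_PER_TREE entries.

    MAX_PER_TREE = 180  # illustrative; see lib/bup/hashsplit.py for bup's value

    def make_node(entries):
        # Stand-in for maketree(): fold a run of entries into a single
        # node that lives one level higher in the hashsplit tree.
        return ('tree', list(entries))

    def squish(stacks, level):
        # stacks[i] holds the pending nodes for tree level i.  Collapse
        # every level below `level`, plus any level that has filled up
        # to MAX_PER_TREE -- this second condition is what the patch
        # restores by calling squish() on every chunk, even at level 0.
        i = 0
        while i < level or len(stacks[i]) >= MAX_PER_TREE:
            if len(stacks) <= i + 1:
                stacks.append([])
            if stacks[i]:
                stacks[i + 1].append(make_node(stacks[i]))
                stacks[i] = []
            i += 1

    def split(chunks_with_levels):
        # Simplified analogue of split_to_shalist(): feed (chunk, level)
        # pairs through the stacks, squishing unconditionally.
        stacks = [[]]
        for chunk, level in chunks_with_levels:
            stacks[0].append(chunk)
            squish(stacks, level)            # unconditional, as in the patch
        squish(stacks, len(stacks) - 1)      # final collapse to the top level
        return stacks[-1]

    # A long run of level-0 chunks (e.g. a big zero region) now gets
    # folded into a subtree every MAX_PER_TREE entries instead of
    # accumulating without bound at level 0:
    top = split(('blob%d' % i, 0) for i in range(1000))
    assert all(kind == 'tree' for (kind, _) in top)

With the pre-patch "if level:" guard, a run of 1000 level-0 chunks in
this model would leave all 1000 entries sitting in stacks[0]; with the
unconditional call, stacks[0] is folded upward each time it reaches
MAX_PER_TREE, bounding per-level memory use.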