% bup-margin(1) Bup %BUP_VERSION%
% Avery Pennarun <apenwarr@gmail.com>

# NAME

bup-margin - figure out your deduplication safety margin

# SYNOPSIS

bup margin [options...]

# DESCRIPTION

`bup margin` iterates through all objects in your bup
repository, calculating the largest number of prefix bits
shared between any two entries.  This number, `n`, is the
longest SHA-1 prefix length at which two of your object
ids would still collide.
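
One way to compute `n` relies on a property of sorted data: once the object ids are sorted, the pair sharing the longest prefix is always adjacent, so a single pass over neighbouring entries is enough. The Python below is an illustrative sketch under that approach, not bup's own implementation; the function names `shared_prefix_bits` and `max_prefix_bits` and the generated sample ids are hypothetical.

```python
import hashlib


def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Count the leading bits that two equal-length binary digests share."""
    bits = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            bits += 8  # whole byte matches
            continue
        # Leading zero bits of the XOR within this byte are the shared ones.
        return bits + (8 - diff.bit_length())
    return bits


def max_prefix_bits(object_ids) -> int:
    """Largest prefix length shared by any two distinct ids.

    After sorting, the pair with the longest common prefix is always
    adjacent, so only neighbouring entries need to be compared.
    """
    ids = sorted(set(object_ids))  # duplicates are the same object; skip them
    best = 0
    for prev, cur in zip(ids, ids[1:]):
        best = max(best, shared_prefix_bits(prev, cur))
    return best


if __name__ == "__main__":
    # Hypothetical stand-in for a repository: SHA-1s of 100000 small blobs.
    ids = [hashlib.sha1(str(i).encode()).digest() for i in range(100_000)]
    print(max_prefix_bits(ids))
```

Because each id is compared only with its neighbour, the pass is linear once the ids are sorted; git-style `.idx` files already store object ids in sorted order.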

For example, one system that was tested had a collection of
11 million objects (70 GB), and `bup margin` returned 45.
That means a 46-bit hash would be sufficient to avoid all
collisions among that set of objects; each object in that
repository could be uniquely identified by its first 46
bits.

The number of bits needed seems to increase by about 1 or 2
for every doubling of the number of objects.  Since SHA-1
hashes have 160 bits, the system above still has 115 bits
(160 - 45) of margin.  Of course, because SHA-1 hashes are
essentially random, it's theoretically possible for a much
smaller set of objects to share many more prefix bits.

If you're paranoid about the possibility of SHA-1
collisions, you can monitor your repository by running `bup
margin` occasionally to see if you're getting dangerously
close to 160 bits.

# OPTIONS

`--predict`
:   Guess the offset into each index file where a
    particular object will appear, and report the maximum
    deviation of the correct answer from the guess.  This
    is potentially useful for tuning an interpolation
    search algorithm.

`--ignore-midx`
:   Don't use `.midx` files; use only `.idx` files.  This is
    only really useful when used with `--predict`.
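
The `--predict` guess can be read as the first probe of an interpolation search: if object ids are spread uniformly over the hash space, an id's position in a sorted index of `total` entries should be close to its leading bits scaled to `total`. The Python sketch below measures the worst deviation under that assumption. The 64-bit linear guess, the function name `max_predict_deviation`, and the sample data are assumptions for illustration, not bup's documented behavior.

```python
import hashlib
import struct


def max_predict_deviation(sorted_ids) -> int:
    """Largest gap between an id's real position and its interpolated guess.

    Assumes the guess is a linear interpolation on the first 64 bits of the
    hash, i.e. the ids are treated as uniformly spread over the hash space.
    """
    total = len(sorted_ids)
    worst = 0
    for actual, oid in enumerate(sorted_ids):
        prefix = struct.unpack("!Q", oid[:8])[0]  # top 64 bits, big-endian
        guess = prefix * total >> 64  # scale prefix into [0, total)
        worst = max(worst, abs(actual - guess))
    return worst


if __name__ == "__main__":
    # Hypothetical index: sorted SHA-1s of 100000 small blobs.
    ids = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(100_000))
    dev = max_predict_deviation(ids)
    print("%d of %d (%.3f%%)" % (dev, len(ids), dev * 100.0 / len(ids)))
```

A maximum deviation that is tiny relative to the index size suggests an interpolation search could place its first probe very close to the target entry.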

# EXAMPLES

    $ bup margin
    Reading indexes: 100.00% (1612581/1612581), done.
    40 matching prefix bits
    1.94 bits per doubling
    120 bits (61.86 doublings) remaining
    4.19338e+18 times larger is possible

    Everyone on earth could have 625878182 data sets
    like yours, all in one repository, and we would
    expect 1 object collision.

    $ bup margin --predict
    PackIdxList: using 1 index.
    Reading indexes: 100.00% (1612581/1612581), done.
    915 of 1612581 (0.057%)
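
The summary lines in the first example can be derived from just two inputs, the object count and the matching prefix bits, if one assumes the bits-per-doubling rate stays constant. The Python below is a hypothetical reconstruction of that arithmetic, not bup's documented interface; the function name `margin_report` and the world-population constant of 6.7 billion are assumptions (that constant is chosen because it reproduces the 625878182 figure). With 1612581 objects and 40 bits it reproduces the numbers in the example output.

```python
import math

SHA1_BITS = 160
# Assumed world population; 6.7e9 reproduces the example's data-set count.
POPULATION = 6.7e9


def margin_report(object_count: int, matching_bits: int) -> dict:
    """Derive the summary figures from the object count and prefix bits."""
    doublings = math.log2(object_count)  # doublings needed to reach this count
    bits_per_doubling = matching_bits / doublings
    remaining_bits = SHA1_BITS - matching_bits
    # Assume the same bits-per-doubling rate holds as the repository grows.
    remaining_doublings = remaining_bits / bits_per_doubling
    times_larger = 2 ** remaining_doublings
    return {
        "bits_per_doubling": bits_per_doubling,
        "remaining_bits": remaining_bits,
        "remaining_doublings": remaining_doublings,
        "times_larger": times_larger,
        "data_sets_per_person": int(times_larger / POPULATION),
    }


if __name__ == "__main__":
    r = margin_report(1612581, 40)  # figures from the first example
    print("%.2f bits per doubling" % r["bits_per_doubling"])
    print("%d bits (%.2f doublings) remaining"
          % (r["remaining_bits"], r["remaining_doublings"]))
    print("%g times larger is possible" % r["times_larger"])
    print("Everyone on earth could have %d data sets" % r["data_sets_per_person"])
```

Under this model the "times larger" figure simplifies to `object_count ** ((160 - n) / n)`; with `n` = 40 that is exactly the object count cubed, and 1612581 cubed is about 4.19338e+18.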

# SEE ALSO

`bup-midx`(1), `bup-save`(1)

# BUP

Part of the `bup`(1) suite.