X-Git-Url: https://arthur.barton.de/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=lib%2Fbup%2Fbloom.py;h=ef896f564027158523594ff71728d51f20c64ee0;hb=7e1f05fe8b8d580d61c6a233ec1440adc248c97c;hp=5444fd5d0a67717e561bb23d92800a36dcfa1ade;hpb=37f43d741eb44ae8f9e36f42770d2bb494999d92;p=bup.git diff --git a/lib/bup/bloom.py b/lib/bup/bloom.py index 5444fd5..ef896f5 100644 --- a/lib/bup/bloom.py +++ b/lib/bup/bloom.py @@ -4,7 +4,7 @@ There are four basic things to consider when building a bloom filter: The size, in bits, of the filter The capacity, in entries, of the filter The probability of a false positive that is tolerable -The number of bits readily available to use for addresing filter bits +The number of bits readily available to use for addressing filter bits There is one major tunable that is not directly related to the above: k: the number of bits set in the filter per entry @@ -72,16 +72,20 @@ reasonable size. Based on these parameters, a combination of k=4 and k=5 provides the behavior that bup needs. As such, I've implemented bloom addressing, adding and checking functions in C for these two values. Because k=5 requires less space -and gives better overall pfalse_positive perofrmance, it is preferred if a +and gives better overall pfalse_positive performance, it is preferred if a table with k=5 can represent the repository. None of this tells us what max_pfalse_positive to choose. Brandon Low 2011-02-04 """ -import sys, os, math, mmap + +import sys, os, math, mmap, struct + from bup import _helpers -from bup.helpers import * +from bup.helpers import (debug1, debug2, log, mmap_read, mmap_readwrite, + mmap_readwrite_private) + BLOOM_VERSION = 2 MAX_BITS_EACH = 32 # Kinda arbitrary, but 4 bytes per entry is pretty big @@ -94,6 +98,9 @@ _total_steps = 0 bloom_contains = _helpers.bloom_contains bloom_add = _helpers.bloom_add +# FIXME: check bloom create() and ShaBloom handling/ownership of "f". +# The ownership semantics should be clarified since the caller needs +# to know who is responsible for closing it. class ShaBloom: """Wrapper which contains data from multiple index files. """