The size, in bits, of the filter
The capacity, in entries, of the filter
The probability of a false positive that is tolerable
-The number of bits readily available to use for addresing filter bits
+The number of bits readily available to use for addressing filter bits
There is one major tunable that is not directly related to the above:
k: the number of bits set in the filter per entry
Based on these parameters, a combination of k=4 and k=5 provides the behavior
that bup needs. As such, I've implemented bloom addressing, adding and
checking functions in C for these two values. Because k=5 requires less space
-and gives better overall pfalse_positive perofrmance, it is preferred if a
+and gives better overall pfalse_positive performance, it is preferred if a
table with k=5 can represent the repository.
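
The usual textbook approximation for the false positive rate of a bloom
filter with m bits, n entries and k bits set per entry can be used to
compare these two choices:

    pfalse_positive ~= (1 - e^(-k*n/m))^k

For example, at m/n = 8 bits per entry it gives about 2.40% for k=4 and
about 2.17% for k=5.
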
None of this tells us what max_pfalse_positive to choose.
Brandon Low <lostlogic@lostlogicx.com> 2011-02-04
"""
-import sys, os, math, mmap
+
+import sys, os, math, mmap, struct
+
from bup import _helpers
-from bup.helpers import *
+from bup.helpers import (debug1, debug2, log, mmap_read, mmap_readwrite,
+                         mmap_readwrite_private)
+
BLOOM_VERSION = 2
MAX_BITS_EACH = 32 # Kinda arbitrary, but 4 bytes per entry is pretty big
bloom_contains = _helpers.bloom_contains
bloom_add = _helpers.bloom_add
+# FIXME: check bloom create() and ShaBloom handling/ownership of "f".
+# The ownership semantics should be clarified since the caller needs
+# to know who is responsible for closing it.
class ShaBloom:
    """Wrapper which contains data from multiple index files."""