arthur.barton.de Git - bup.git/log
bup.git
13 years agoUse debug1() when reporting paths skipped bup-0.24a
Aneurin Price [Wed, 9 Mar 2011 15:36:48 +0000 (15:36 +0000)]
Use debug1() when reporting paths skipped

Skipping paths during indexing is a normal event not indicative of any
problems, so need not be reported in normal operation.

Signed-off-by: Aneurin Price <aneurin.price@gmail.com>
13 years agoSave was using a quoted instead of octal gitmode.
Brandon Low [Mon, 7 Mar 2011 19:17:40 +0000 (11:17 -0800)]
Save was using a quoted instead of octal gitmode.

This tripped an assert on python 2.7 for me, and I believe it was
incorrect but functional behavior.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoVerify permissions in check_repo_or_die()
Gabriel Filion [Thu, 10 Mar 2011 20:41:54 +0000 (12:41 -0800)]
Verify permissions in check_repo_or_die()

Currently, if one doesn't have read or access permission up to
repo('objects/pack'), bup exits with the following error:

error: repo() is not a bup/git repository

(with repo() replaced with the actual path).

This is misleading, since there is possibly really a repository there
but the user can't access it.

Make git.check_repo_or_die() verify that the current user has the
permission to access repo('objects/pack'), and if not, output a
meaningful error message.

As a bonus, we get an error if the bup_dir path is not a directory.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
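
A minimal sketch of the kind of check the commit above describes, written for
Python 3.  The helper name check_repo_dir() and the RepoError type are
hypothetical; the real git.check_repo_or_die() may be structured differently.

```python
import errno
import os


class RepoError(Exception):
    """Hypothetical error type for this sketch."""


def check_repo_dir(bup_dir):
    """Return the pack directory, or raise RepoError saying what is wrong."""
    if not os.path.isdir(bup_dir):
        raise RepoError('%s is not a directory' % bup_dir)
    pack_dir = os.path.join(bup_dir, 'objects', 'pack')
    try:
        os.stat(pack_dir)
    except OSError as e:
        if e.errno == errno.EACCES:
            # The repository may well exist; we just cannot look inside it.
            raise RepoError('permission denied: %s' % pack_dir)
        raise RepoError('%s is not a bup/git repository' % bup_dir)
    if not os.access(pack_dir, os.R_OK | os.X_OK):
        raise RepoError('permission denied: %s' % pack_dir)
    return pack_dir
```

The point is only that "cannot access" and "not a repository" produce
different messages.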
13 years agoMerge commit '6f02181' bup-0.24
Avery Pennarun [Sat, 26 Feb 2011 09:00:51 +0000 (01:00 -0800)]
Merge commit '6f02181'

* commit '6f02181':
  helpers: separately determine if stdout and stderr are ttys.
  cmd/newliner: restrict progress lines to the screen width.
  hashsplit: use shorter offset-filenames inside trees.
  Replace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}
  git.py: rename treeparse to tree_decode() and add tree_encode().
  hashsplit.py: remove PackWriter-specific knowledge.
  cmd/split: fixup progress message, and print -b output incrementally.
  hashsplit.py: convert from 'bits' to 'level' earlier in the sequence.
  hashsplit.py: okay, *really* fix BLOB_MAX.
  hashsplit.py: simplify code and fix BLOB_MAX handling.
  options.py: o.fatal(): print error after, not before, usage message.
  options.py: make --usage just print the usage message.

13 years agomidx/bloom: use progress() and debug1() for non-critical messages
Gabriel Filion [Fri, 25 Feb 2011 16:16:05 +0000 (11:16 -0500)]
midx/bloom: use progress() and debug1() for non-critical messages

Some messages in these two commands indicate progress but are not
filtered out when the command is not run under a tty. This makes bup
return some unwanted messages when run under cron.

Using progress() and debug1() instead should fix that.

(Changed a few from progress() to debug1() by apenwarr.)

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agohelpers: separately determine if stdout and stderr are ttys.
Avery Pennarun [Sun, 20 Feb 2011 05:21:45 +0000 (21:21 -0800)]
helpers: separately determine if stdout and stderr are ttys.

Previously we only cared if stderr was a tty (since we use that to determine
if we should print progress() or not).  But we might want to check stdout as
well, for the same reason that gzip does: we should be refusing to write
binary data to a terminal.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
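
A rough illustration of checking the two streams separately, in Python 3.
The variable and function names here are illustrative assumptions, not
necessarily the ones used in helpers.py.

```python
import os
import sys

# Check each stream on its own: fd 1 is stdout, fd 2 is stderr.
stdout_is_tty = os.isatty(1)
stderr_is_tty = os.isatty(2)


def progress(msg):
    """Print a progress message only when stderr is a terminal."""
    if stderr_is_tty:
        sys.stderr.write(msg)
        sys.stderr.flush()


def refuse_binary_to_tty(is_tty=None):
    """Raise if binary output is about to be written to a terminal."""
    if is_tty is None:
        is_tty = stdout_is_tty
    if is_tty:
        raise RuntimeError('refusing to write binary data to a terminal')
```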
13 years agocmd/newliner: restrict progress lines to the screen width.
Avery Pennarun [Sun, 20 Feb 2011 04:48:15 +0000 (20:48 -0800)]
cmd/newliner: restrict progress lines to the screen width.

Otherwise \r won't work as expected.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
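
One way to picture the restriction, as a hedged Python 3 sketch: a progress
line that wraps onto a second terminal row cannot be fully overwritten by a
later \r, so it is clipped first.  clip_progress() is a hypothetical helper,
not the code in cmd/newliner.

```python
import shutil


def clip_progress(line, width=None):
    """Clip a progress line to the terminal width (80 columns if unknown)."""
    if width is None:
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return line[:max(width, 0)]
```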
13 years agohashsplit: use shorter offset-filenames inside trees.
Avery Pennarun [Sun, 20 Feb 2011 02:48:06 +0000 (18:48 -0800)]
hashsplit: use shorter offset-filenames inside trees.

We previously zero-padded all the filenames (which are hexified versions of
the file offsets) to 16 characters, which corresponds to a maximum file size
that fits into a 64-bit integer.  I realized that there's no reason to
use a fixed padding length; just pad all the entries in a particular tree to
the length of the longest entry (to ensure that sorting
alphabetically is still equivalent to sorting numerically).

This saves a small amount of space in each tree, which is probably
irrelevant given that gzip compression can quite easily compress extra
zeroes.  But it also makes browsing the tree in git look a little prettier.

This is backwards compatible with old versions of vfs.py, since vfs.py has
always just treated the numbers as an ordered set of numbers, and doesn't
care how much zero padding they have.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
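
The padding rule described above can be sketched as follows (Python 3).
offset_names() is a hypothetical helper used only to show why padding to the
longest entry keeps alphabetical and numerical order the same.

```python
def offset_names(offsets):
    """Hex-encode offsets, zero-padded to the longest entry in this tree."""
    hexes = ['%x' % o for o in offsets]
    width = max((len(h) for h in hexes), default=0)
    return [h.rjust(width, '0') for h in hexes]


names = offset_names([0, 0x1f40, 0x12345])
# names is ['00000', '01f40', '12345'] rather than three 16-character strings
```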
13 years agoReplace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}
Avery Pennarun [Sun, 20 Feb 2011 02:02:12 +0000 (18:02 -0800)]
Replace 040000 and 0100644 constants with GIT_MODE_{TREE,FILE}

Those constants were scattered in *way* too many places.  While we're there,
fix the inconsistent usage of strings vs. ints when specifying the file
mode; there's no good reason to be passing strings around (except that I
foolishly did that in the original code in version 0.01).

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
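
For illustration, the two constants as integers in Python 3 syntax (the 2011
code used Python 2 literals such as 040000).  mode_str() is a hypothetical
helper showing that the octal text form is only needed at output time.

```python
import stat

GIT_MODE_TREE = 0o40000
GIT_MODE_FILE = 0o100644


def mode_str(mode):
    """Format an integer mode the way it appears in a git tree entry."""
    return '%o' % mode


# Integer modes can be tested directly; string modes would need parsing first.
is_tree = stat.S_ISDIR(GIT_MODE_TREE)
is_file = stat.S_ISREG(GIT_MODE_FILE)
```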
13 years agogit.py: rename treeparse to tree_decode() and add tree_encode().
Avery Pennarun [Sun, 20 Feb 2011 01:57:48 +0000 (17:57 -0800)]
git.py: rename treeparse to tree_decode() and add tree_encode().

tree_encode() gets most of its functionality from PackWriter.new_tree(),
which is now just a one liner that calls tree_encode().  We will soon want
to be able to calculate tree hashes without actually writing a tree to a
packfile, so let's split out that functionality.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
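
A sketch of what a standalone encoder makes possible, in Python 3: hashing a
tree without writing it to a packfile.  The entry layout (octal mode, space,
name, NUL, 20 raw hash bytes) and the "tree <length>" header are git's object
format; the function signatures below are assumptions and may not match
git.py.

```python
import hashlib

GIT_MODE_TREE = 0o40000


def tree_encode(entries):
    """Encode (mode, name, sha) tuples; name is bytes, sha is 20 raw bytes."""
    def sort_key(entry):
        mode, name, _ = entry
        # git sorts subtrees as if their names ended in '/'.
        return name + b'/' if mode == GIT_MODE_TREE else name

    out = []
    for mode, name, sha in sorted(entries, key=sort_key):
        assert len(sha) == 20
        out.append(b'%o %s\0' % (mode, name) + sha)
    return b''.join(out)


def tree_hash(entries):
    """Compute the tree's object id without storing the tree anywhere."""
    content = tree_encode(entries)
    return hashlib.sha1(b'tree %d\0' % len(content) + content).hexdigest()
```

With no entries, tree_hash([]) gives git's well-known empty-tree id,
4b825dc642cb6eb9a060e54bf8d69288fbee4904.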
13 years agohashsplit.py: remove PackWriter-specific knowledge.
Avery Pennarun [Thu, 17 Feb 2011 12:22:50 +0000 (04:22 -0800)]
hashsplit.py: remove PackWriter-specific knowledge.

Let's use callback functions explicitly instead of passing around special
objects; that makes the dependencies a bit more clear and hopefully opens
the way to some more refactoring for clarity.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/split: fixup progress message, and print -b output incrementally.
Avery Pennarun [Thu, 17 Feb 2011 11:10:23 +0000 (03:10 -0800)]
cmd/split: fixup progress message, and print -b output incrementally.

As a side effect, you can no longer combine -b with -t, -c, or -n.  But that
was kind of a pointless thing to do anyway, because it silently enforced
--fanout=0, which is almost certainly not what you wanted, especially if you
were using -t, -c, or -n.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agohashsplit.py: convert from 'bits' to 'level' earlier in the sequence.
Avery Pennarun [Thu, 17 Feb 2011 10:30:47 +0000 (02:30 -0800)]
hashsplit.py: convert from 'bits' to 'level' earlier in the sequence.

The hierarchy level is a more directly useful measurement than the bit count,
although right now neither is used very heavily.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agohashsplit.py: okay, *really* fix BLOB_MAX.
Avery Pennarun [Sun, 20 Feb 2011 04:33:36 +0000 (20:33 -0800)]
hashsplit.py: okay, *really* fix BLOB_MAX.

In some conditions, we were still splitting into blobs larger than BLOB_MAX.
Fix that too.

Unfortunately adding an assertion about it in the 'bup split' main loop
slows things down by a measurable amount, so I can't easily add that to
prevent this from happening by accident again in the future.

After implementing this, it looks like 8192 (typical blob size) times two
isn't big enough to prevent this from kicking in in "normal" cases; let's
use 4x instead.  In my test file, we exceed this maximum much less.  (Every
time we exceed BLOB_MAX, it means the bupsplit algorithm isn't working, so
we won't be deduplicating as effectively.  So we want that to be rare.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
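
To make the "hard maximum" idea concrete, here is a toy splitter in Python 3.
It is not bup's hashsplit code: is_boundary is a stand-in for the bupsplit
rolling checksum, and BLOB_BITS is an assumed name for the 8192-byte typical
size mentioned above.

```python
BLOB_BITS = 13                    # assumed name; 2**13 = 8192 typical blob size
BLOB_MAX = 4 * (1 << BLOB_BITS)   # 4x the typical size, as described above


def split_blobs(data, is_boundary):
    """Yield chunks, cutting at boundaries or when a chunk reaches BLOB_MAX."""
    start = 0
    for i in range(len(data)):
        size = i + 1 - start
        if size >= BLOB_MAX or is_boundary(data, i):
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]
```

If is_boundary never fires, every chunk is cut by the cap alone, which is the
case the commit message says should be rare.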
13 years agohashsplit.py: simplify code and fix BLOB_MAX handling.
Avery Pennarun [Thu, 17 Feb 2011 09:56:31 +0000 (01:56 -0800)]
hashsplit.py: simplify code and fix BLOB_MAX handling.

This reduces the number of lines without removing functionality.  I renamed
a few constants to make more sense.

The only functional change is that BLOB_MAX is now an actual maximum instead
of a variable number depending on buf.used().  Previously, it might have
been as large as BLOB_READ_SIZE = 1MB, which is much larger than BLOB_MAX =
16k.  Now BLOB_MAX is actually the max.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agooptions.py: o.fatal(): print error after, not before, usage message.
Avery Pennarun [Sun, 20 Feb 2011 05:37:16 +0000 (21:37 -0800)]
options.py: o.fatal(): print error after, not before, usage message.

git prints the error *before* the usage message, but the more I play with
it, the more I'm annoyed by that behaviour.  The usage message can be pretty
long, and the error gets lost way above the usage message.  The most
important thing *is* the error, so let's print it last.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agooptions.py: make --usage just print the usage message.
Avery Pennarun [Sun, 20 Feb 2011 05:34:51 +0000 (21:34 -0800)]
options.py: make --usage just print the usage message.

This is a relatively common option in other programs, so let's make it work
in case someone tries to use it.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agodoc/import-rsnapshot: small corrections and clarification bup-0.23a
Gabriel Filion [Fri, 18 Feb 2011 19:15:41 +0000 (14:15 -0500)]
doc/import-rsnapshot: small corrections and clarification

There's a typo in the --dry-run option explanation.

The form "[...] or only imports all [...]" is confusing. Turn it around
a little bit so that the quantifiers are associated more easily to the
right portions of the sentence.

Also, add an example for using the backuptarget argument.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years agocmd/midx, git.py: all else being equal, delete older midxes first.
Avery Pennarun [Fri, 18 Feb 2011 09:15:47 +0000 (01:15 -0800)]
cmd/midx, git.py: all else being equal, delete older midxes first.

Previous runs of 'bup midx -f' might have created invalid midx files with
exactly the same length as a newer run.  bup's "prune redundant midx" logic
would quasi-randomly choose one or the other to delete (based on
alphabetical order of filenames, basically) and sometimes that would be the
new one, not the old one, so the 'bup midx -f' results never actually kicked
in.

Now if the file sizes are equal we'll use the mtime as a tie breaker; newer
is better.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
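
The tie-breaker can be pictured as a two-part sort key.  This Python 3 sketch
assumes that larger files are preferred first; only the mtime rule for equal
sizes is stated in the commit message, and keep_order() is a hypothetical
helper.

```python
def keep_order(files):
    """Order (name, size, mtime) tuples from most to least worth keeping."""
    ranked = sorted(files, key=lambda f: (-f[1], -f[2]))
    return [name for name, _size, _mtime in ranked]


order = keep_order([
    ('old.midx', 100, 1.0),
    ('new.midx', 100, 2.0),   # same size as old.midx, but newer
    ('small.midx', 50, 3.0),
])
# order is ['new.midx', 'old.midx', 'small.midx']
```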
13 years agot/test.sh: a test for the recently-uncovered midx4 problem.
Avery Pennarun [Fri, 18 Feb 2011 08:17:17 +0000 (00:17 -0800)]
t/test.sh: a test for the recently-uncovered midx4 problem.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago_helpers.c: midx4 didn't handle multiple index with the same object.
Avery Pennarun [Fri, 18 Feb 2011 08:44:32 +0000 (00:44 -0800)]
_helpers.c: midx4 didn't handle multiple index with the same object.

It *tried* to handle it, but would end up with a bunch of zero entries at
the end, which prevents .exists() from working correctly in some cases.

In midx2, it made sense to never include the same entry twice, because the
only information we had about a particular entry was that it existed.  In
midx4 this is no longer true; we might sometimes want to know *all* the idx
files that contain a particular object (for example, when we implement
expiry later).  So the easiest fix for this bug is to just include multiple
entries when we have them.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/midx: add a --check option.
Avery Pennarun [Fri, 18 Feb 2011 08:11:59 +0000 (00:11 -0800)]
cmd/midx: add a --check option.

Running this on my system does reveal that some objects return
exists()==False on my midx even though they show up during iteration.

Now to actually find and fix it...

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoAdd git.shorten_hash(), printing only the first few bytes of a sha1.
Avery Pennarun [Fri, 18 Feb 2011 07:36:49 +0000 (23:36 -0800)]
Add git.shorten_hash(), printing only the first few bytes of a sha1.

The full name is rarely needed and clutters the output.  Let's try this
instead in a few places.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
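
A hypothetical sketch of the idea in Python 3; the real git.shorten_hash()
may take different input and produce a different format.

```python
def shorten_hash_sketch(sha, nbytes=4):
    """Return the hex form of the first nbytes of a raw 20-byte sha1."""
    return sha[:nbytes].hex() + '*'


short = shorten_hash_sketch(bytes(range(20)))
# short is '00010203*'
```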
13 years agotclient.py: add some additional tests that objcache.refresh() is called.
Avery Pennarun [Fri, 18 Feb 2011 05:35:39 +0000 (21:35 -0800)]
tclient.py: add some additional tests that objcache.refresh() is called.

...which it is, so no bugs were fixed here.  Aneurin is still exposing a bug
somehow though.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/server: add a debug message saying which object caused a suggestion.
Avery Pennarun [Thu, 17 Feb 2011 12:54:13 +0000 (04:54 -0800)]
cmd/server: add a debug message saying which object caused a suggestion.

Let's use this to try to debug Aneurin's problem (and potentially others).

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/list-idx: a quick tool for searching the contents of idx/midx files.
Avery Pennarun [Thu, 17 Feb 2011 12:50:04 +0000 (04:50 -0800)]
cmd/list-idx: a quick tool for searching the contents of idx/midx files.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoAdd tests around the bloom ruin and check options
Brandon Low [Thu, 17 Feb 2011 04:09:47 +0000 (20:09 -0800)]
Add tests around the bloom ruin and check options

This generally improves our test coverage of bloom filter behavior and
more specifically makes sure that check and ruin do something.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoAdd a bloom --ruin for testing failure cases
Brandon Low [Thu, 17 Feb 2011 06:41:33 +0000 (22:41 -0800)]
Add a bloom --ruin for testing failure cases

This command option ruins a bloom filter by setting all of its bits to
zero.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoOne more constant for header lengths
Brandon Low [Thu, 17 Feb 2011 04:09:45 +0000 (20:09 -0800)]
One more constant for header lengths

I missed bloom header length in the last pass.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoSplit PackMidx from git.py into a new midx.py.
Avery Pennarun [Thu, 17 Feb 2011 02:55:41 +0000 (18:55 -0800)]
Split PackMidx from git.py into a new midx.py.

git.py is definitely too big.  It still is, but this helps a bit.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agobloom.py: move bloom.ShaBloom.create to just bloom.create.
Avery Pennarun [Thu, 17 Feb 2011 02:39:35 +0000 (18:39 -0800)]
bloom.py: move bloom.ShaBloom.create to just bloom.create.

I don't really like class-level functions.  Ideally we'd just move all the
creation stuff into cmd/bloom, but tbloom.py is testing them, so it's not
really worth it.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoMerge branch 'bl/bloomcheck' into ap/cleanups
Avery Pennarun [Thu, 17 Feb 2011 02:34:36 +0000 (18:34 -0800)]
Merge branch 'bl/bloomcheck' into ap/cleanups

* bl/bloomcheck:
  Bail out immediately instead of redownloading .idx
  Add a --check behavior to verify bloom
  Defines/preprocessor lengths > magic numbers

Conflicts:
cmd/bloom-cmd.py

13 years agoMove bloom-related stuff from git.py to a new bloom.py.
Avery Pennarun [Thu, 17 Feb 2011 02:05:18 +0000 (18:05 -0800)]
Move bloom-related stuff from git.py to a new bloom.py.

No other functionality changes other than that cmd/memtest now reports the
number of bloom steps separately from the midx/idx steps.  (This is mostly
so they don't have to share the same global variables, but it's also
interesting information to break out.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/bloom: add a --force option to force regenerating the bloom.
Avery Pennarun [Thu, 17 Feb 2011 01:32:51 +0000 (17:32 -0800)]
cmd/bloom: add a --force option to force regenerating the bloom.

This corresponds to midx's --force option.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoUse the new qprogress() function in more places.
Avery Pennarun [Thu, 17 Feb 2011 00:11:26 +0000 (16:11 -0800)]
Use the new qprogress() function in more places.

qprogress() was introduced in the last commit and has smarter default
behaviour that automatically reduces progress output so we don't print too
many messages per second.  Various commands/etc were doing this in various
different ad-hoc ways, but let's centralize it all in one place.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
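
A minimal sketch of centralized rate limiting, in Python 3.  The class name,
the 0.1 second default, and the injectable clock are assumptions made for
illustration; they are not taken from the qprogress() implementation.

```python
import sys
import time


class RateLimitedProgress:
    """Drop progress messages that arrive too soon after the previous one."""

    def __init__(self, out=sys.stderr, min_interval=0.1, clock=time.monotonic):
        self.out = out
        self.min_interval = min_interval
        self.clock = clock
        self._last = None

    def __call__(self, msg):
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self.out.write(msg)
            self.out.flush()
            self._last = now
            return True
        return False
```

Callers simply invoke the object in their inner loop and no longer need their
own counters or timers.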
13 years agoBail out immediately instead of redownloading .idx
Brandon Low [Thu, 17 Feb 2011 01:46:01 +0000 (17:46 -0800)]
Bail out immediately instead of redownloading .idx

This should make diagnosing / fixing corrupted bloom filters and midx
files easier, and is generally more sane behavior.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoAdd a --check behavior to verify bloom
Brandon Low [Thu, 17 Feb 2011 01:46:02 +0000 (17:46 -0800)]
Add a --check behavior to verify bloom

This new behavior is useful when diagnosing weird behavior: it lets a bloom
filter claiming to contain a particular idx be verified against that idx
file.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoDefines/preprocessor lengths > magic numbers
Brandon Low [Thu, 17 Feb 2011 01:46:00 +0000 (17:46 -0800)]
Defines/preprocessor lengths > magic numbers

This just changes some instances of "8", "12" and "20" to use the
equivalent sizeof or #defined constants to make the code more readable.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agocmd/{bloom,midx}: clean up progress messages.
Avery Pennarun [Thu, 17 Feb 2011 00:01:30 +0000 (16:01 -0800)]
cmd/{bloom,midx}: clean up progress messages.

bloom was printing messages more often than necessary on fast computers,
which could overwhelm the stderr output a bit.  Also change to a percentage
+ number of objects, like midx and save do, rather than just printing the
current file number.

And don't print so many lines of output by default: now if bloom doesn't
end up doing anything, it doesn't print any output.  And if it does do
something, it prints only one output line per file.

bloom and midx now both print the name of the directory where they're
creating their output files; if you have multiple directories in
.bup/index-cache, it was a little confusing to see them doing
multiple runs for no apparent reason.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/bloom: by default generate bloom filters in *all* index-cache dirs.
Avery Pennarun [Wed, 16 Feb 2011 23:58:14 +0000 (15:58 -0800)]
cmd/bloom: by default generate bloom filters in *all* index-cache dirs.

This matches with 'bup midx -a' and 'bup midx -f' behaviour.  People might
have been thinking they were regenerating bloom filters without actually
doing them all.

Moved some shared code to do this from cmd/midx to git.py.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/newliner: avoid printing a blank line if the final output ended in \r.
Avery Pennarun [Wed, 16 Feb 2011 23:55:24 +0000 (15:55 -0800)]
cmd/newliner: avoid printing a blank line if the final output ended in \r.

If the last output was a progress message, we would blank out the line
(which was correct) but then we'd print a newline, which was wrong.  Only
print the leftover output, followed by a newline, if the last output was
nonempty.

'bup midx' suffered from this.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/index: make the default mode '-u'.
Avery Pennarun [Wed, 16 Feb 2011 23:14:01 +0000 (15:14 -0800)]
cmd/index: make the default mode '-u'.

I always forget the -u option, and it's by far the most common thing to do
with 'bup index'.  So if no mode option is provided, just default to that
one.

While we're here, update the man page and usage message a bit.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago_helpers.c: cygwin doesn't set any win32 defines.
Avery Pennarun [Wed, 16 Feb 2011 22:56:38 +0000 (14:56 -0800)]
_helpers.c: cygwin doesn't set any win32 defines.

...so let's #ifdef for cygwin specifically.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago_helpers.c: don't unpythonize_argv() on win32.
Avery Pennarun [Wed, 16 Feb 2011 03:09:29 +0000 (19:09 -0800)]
_helpers.c: don't unpythonize_argv() on win32.

Py_GetArgcArgv() doesn't exist on win32 platforms.  Which isn't so bad,
since neither does the 'ps' command, really.

Reported by Aneurin Price.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoRemove .c and .o rules, apply flags to csetup.py
Brandon Low [Mon, 14 Feb 2011 19:27:29 +0000 (11:27 -0800)]
Remove .c and .o rules, apply flags to csetup.py

The .c and .o rules were not used and were misleadingly implying that we
were already paying attention to LDFLAGS and CFLAGS.  Instead apply the
flags to csetup.py where they will actually do something.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoFix a valid warning that some compilers emitted
Brandon Low [Mon, 14 Feb 2011 23:51:16 +0000 (15:51 -0800)]
Fix a valid warning that some compilers emitted

And it's a good idea not to ignore fwrite's return value I suppose.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoMove .idx file writing to C bup-0.23
Brandon Low [Sun, 13 Feb 2011 19:17:06 +0000 (11:17 -0800)]
Move .idx file writing to C

This was a remaining CPU bottleneck in bup-dumb-server mode.  In a quick
test, writing 10 .idx files of 100000 elements on my netbook went from
50s to 4s.  There may be more performance available by adjusting the
definition of the PackWriter.idx object, but list(list(tuple)) isn't
bad.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agomain.py: fix whitespace in the usage string.
Avery Pennarun [Sun, 13 Feb 2011 12:47:18 +0000 (04:47 -0800)]
main.py: fix whitespace in the usage string.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/daemon: FD_CLOEXEC the listener socket and don't leak fd for the connection.
Avery Pennarun [Sun, 13 Feb 2011 12:10:08 +0000 (04:10 -0800)]
cmd/daemon: FD_CLOEXEC the listener socket and don't leak fd for the connection.

Otherwise the listener gets inherited by all the child processes (mostly
harmless) and subprograms run by bup-server inherit an extra fd for the
connection socket (problematic since we want the connection to close as soon
as bup-server closes).

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
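
For reference, setting the close-on-exec flag on a socket looks roughly like
this in Python 3 on a Unix system.  set_cloexec() is a hypothetical helper.
Note that Python 3.4 and later already create sockets as non-inheritable, so
the explicit call mainly mattered for the Python 2 code of the time.

```python
import fcntl
import socket


def set_cloexec(sock):
    """Mark the socket's fd close-on-exec so exec'd children don't inherit it."""
    fd = sock.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```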
13 years agocmd/daemon: close file descriptors correctly in parent process.
Avery Pennarun [Sun, 13 Feb 2011 10:29:59 +0000 (02:29 -0800)]
cmd/daemon: close file descriptors correctly in parent process.

The client wasn't getting disconnected when the server died, because the
daemon was still hanging on to its copy of the original socket, due to some
misplaced os.dup() calls.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/daemon: use SO_REUSEADDR.
Avery Pennarun [Sun, 13 Feb 2011 10:23:17 +0000 (02:23 -0800)]
cmd/daemon: use SO_REUSEADDR.

Otherwise we can't re-listen on that socket until the TIME_WAIT period ends,
under certain conditions.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
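
A small Python 3 sketch of the option in use.  make_listener() is a
hypothetical helper; the daemon's actual socket setup may differ.

```python
import socket


def make_listener(host, port):
    """Create a listening TCP socket that can rebind during TIME_WAIT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() to have any effect on the bind.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(1)
    return s
```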
13 years agocmd/daemon: pass extra options along to 'bup server'.
Avery Pennarun [Sun, 13 Feb 2011 10:18:57 +0000 (02:18 -0800)]
cmd/daemon: pass extra options along to 'bup server'.

Currently 'bup server' doesn't take any options, but that might change
someday.

Also use a '--' to separate the bup mux command from its arguments, so it
doesn't accidentally try to parse them.  This didn't matter before (since
none of the options we were passing along started with a dash) but if the
user provides extra options, it might matter.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
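
The effect of the separator can be sketched as follows (Python 3).  The exact
argument list built by cmd/daemon is an assumption here; the point is that
everything after '--' belongs to the wrapped command.

```python
def build_server_argv(extra_opts):
    """Build the command line; options after '--' are not parsed by 'bup mux'."""
    return ['bup', 'mux', '--', 'bup', 'server'] + list(extra_opts)


argv = build_server_argv(['--some-option'])
# Everything after the first '--' is passed through to 'bup server' untouched.
passed_through = argv[argv.index('--') + 1:]
```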
13 years agocmd/daemon: correctly report socket binding/listening errors.
Avery Pennarun [Sun, 13 Feb 2011 10:17:31 +0000 (02:17 -0800)]
cmd/daemon: correctly report socket binding/listening errors.

We should never, ever throw away the string from an exception, because
that's how people debug problems.  (In this case, my problem was "address
already in use.")

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agomain.py: use execvp() instead of subprocess.Popen() when possible.
Avery Pennarun [Sun, 13 Feb 2011 09:52:51 +0000 (01:52 -0800)]
main.py: use execvp() instead of subprocess.Popen() when possible.

This avoids an extra process showing up in the 'ps' listing if we're not
going to be using bup-newliner anyhow.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago_helpers.c: Remove ugly 'python' junk from argv[0] so 'ps' is prettier.
Avery Pennarun [Sun, 13 Feb 2011 09:50:50 +0000 (01:50 -0800)]
_helpers.c: Remove ugly 'python' junk from argv[0] so 'ps' is prettier.

Okay, this is pretty gross.  But the 'ps' output was looking ugly, and
now it doesn't.  We remove the 'python' interpreter string and the expanded
pathname of the command being run, so it now shows as (eg.) "bup-join" instead
of "python /blah/blah/blah/cmd/bup-join".

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/bloom: fix a message pluralization.
Avery Pennarun [Sun, 13 Feb 2011 08:38:48 +0000 (00:38 -0800)]
cmd/bloom: fix a message pluralization.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/join: add a new -o (output filename) option.
Avery Pennarun [Sun, 13 Feb 2011 06:53:50 +0000 (22:53 -0800)]
cmd/join: add a new -o (output filename) option.

This is a helpful way to have it open and write to the given output file.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/ls: fix a typo causing 'bup ls foo/latest' to not work.
Avery Pennarun [Sun, 13 Feb 2011 06:50:34 +0000 (22:50 -0800)]
cmd/ls: fix a typo causing 'bup ls foo/latest' to not work.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/server: add a new 'help' command.
Avery Pennarun [Sun, 13 Feb 2011 05:56:29 +0000 (21:56 -0800)]
cmd/server: add a new 'help' command.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agomidx4: Fix the other side of my previous nasty bug
Brandon Low [Thu, 10 Feb 2011 21:23:36 +0000 (13:23 -0800)]
midx4: Fix the other side of my previous nasty bug

The previous one was a problem with midx4s generated from idx files,
this one is similar but when they are generated from other .midx4 files.

Many thanks to Aneurin Price for putting up with the awful behavior and
prodding at bup and whatnot while I was trying to make this one
disappear under a rug.

Once again, midx4 files generated prior to this patch will want to be
regenerated.  Once again, only smart servers which have objects not on
the client's index cache will be affected, but they sure as hell will be
affected.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agomidx4: Fix name offsets when generated from idx
Brandon Low [Tue, 8 Feb 2011 18:43:22 +0000 (10:43 -0800)]
midx4: Fix name offsets when generated from idx

This was a nasty bug, glad it got found before release.  Only affected
the server's ability to suggest .idxs so far, but would have affected
any attempt to have bup retrieve objects directly too.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoFix a couple of python 2.4 incompatibilities.
Avery Pennarun [Tue, 8 Feb 2011 12:59:54 +0000 (04:59 -0800)]
Fix a couple of python 2.4 incompatibilities.

Thanks to Jimmy Tang for his help testing these since I don't have python
2.4 easily available.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoRemove incorrect comment
Brandon Low [Tue, 8 Feb 2011 06:14:45 +0000 (22:14 -0800)]
Remove incorrect comment

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoMerge branch 'bloom'
Avery Pennarun [Tue, 8 Feb 2011 06:16:08 +0000 (22:16 -0800)]
Merge branch 'bloom'

* bloom:
  bloom: avoid kernel disk flushes when we dirty a lot of pages.
  midx4: Properly decide whether to do progress in C
  midx4: Don't use Py_ssize_t, it's not in python2.4
  cmd/bloom: map only one .idx file at a time.
  bloom: Use truncate not writing zeros in create
  bloom: Don't use function pointers in tight loops
  Fix updating of bloom with additional files
  ShaBloom.init(): initialize members before the assert().
  cmd/bloom: actually, always use the same temp filename.
  cmd/bloom: use mkstemp() instead of NamedTemporaryFile().
  midx: Write midx4 in C rather than python
  midx4: midx2 with idx backreferences
  ShaBloom: Add k=4 support for large repositories
  ShaBloom prefilter to detect nonexistent objects
  mmap: Make closing source file optional

13 years agobloom: avoid kernel disk flushes when we dirty a lot of pages.
Avery Pennarun [Tue, 8 Feb 2011 03:09:06 +0000 (19:09 -0800)]
bloom: avoid kernel disk flushes when we dirty a lot of pages.

Based on the number of objects we'll add to the bloom, decide if we want to
mmap() the pages as shared-writable ('immediate' write) or else map them
private-writable for later manual writing back to the file ('delayed'
write).

A bloom table's write access pattern is such that we dirty almost all the
pages after adding very few entries; essentially, we can expect to dirty
about n*k/4096 pages if we add n objects to the bloom with k hashes. But the
table is so big that dirtying *all* the pages often exceeds Linux's default
/proc/sys/vm/dirty_ratio or /proc/sys/vm/dirty_background_ratio,
thus causing it to start flushing the table before we're
finished... even though there's more than enough space to
store the bloom table in RAM.

To work around that behaviour, if we calculate that we'll probably end up
touching the whole table anyway (at least one bit flipped per memory page),
let's use a "private" mmap, which defeats Linux's ability to flush it to
disk.  Then we'll flush it as one big lump during close(), which doesn't
lose any time since we would have had to flush all the pages anyway.

While we're here, let's remove the readwrite=True option to
ShaBloom.create(); nobody's going to create a bloom file that isn't
writable.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
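
The decision described above can be sketched in Python 3 using the estimate
from the commit message (about n*k/4096 dirtied pages).  The function names
are hypothetical and the threshold is a simplification; bloom.py may compute
it differently.

```python
import mmap

PAGE_SIZE = 4096

# Shared mapping: the kernel may write dirty pages back whenever it likes.
# Private (copy-on-write) mapping: nothing reaches the file until the
# program writes the pages back itself.
MMAP_ACCESS = {'immediate': mmap.ACCESS_WRITE, 'delayed': mmap.ACCESS_COPY}


def expected_dirty_pages(n_objects, k):
    """Estimate from the commit message: about n*k/4096 pages."""
    return n_objects * k // PAGE_SIZE


def choose_write_mode(n_objects, k, table_bytes):
    """Pick 'delayed' when we expect to touch the whole table anyway."""
    total_pages = -(-table_bytes // PAGE_SIZE)   # ceiling division
    if expected_dirty_pages(n_objects, k) >= total_pages:
        return 'delayed'
    return 'immediate'
```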
13 years agomidx4: Properly decide whether to do progress in C
Brandon Low [Tue, 8 Feb 2011 02:30:04 +0000 (18:30 -0800)]
midx4: Properly decide whether to do progress in C

Basically just gives us a _helpers.istty to go along with helpers.istty
and uses it to decide whether or not to write progress messages from
midx4 generation.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agomidx4: Don't use Py_ssize_t, it's not in python2.4
Brandon Low [Tue, 8 Feb 2011 02:25:44 +0000 (18:25 -0800)]
midx4: Don't use Py_ssize_t, it's not in python2.4

This also uses a slightly more error-checked conversion of input values
to appropriate C structures.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agocmd/bloom: map only one .idx file at a time.
Avery Pennarun [Tue, 8 Feb 2011 01:41:00 +0000 (17:41 -0800)]
cmd/bloom: map only one .idx file at a time.

This massively decreases virtual memory allocation since we only ever need
to look at a single idx at once.

In theory, VM doesn't cost us anything, but on 32-bit systems we can
actually run out of address space if we try to map all the idx files at
once on a very large repo.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agobloom: Use truncate not writing zeros in create
Brandon Low [Mon, 7 Feb 2011 17:08:00 +0000 (09:08 -0800)]
bloom: Use truncate not writing zeros in create

This lets us test more of bloom's code without writing gigabyte(s) of
zeros to disk.  As the NOTE says, this works on all of the common
modern unixes that I checked, but may need special handling on other
systems.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
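
A small sketch of the truncate approach in Python 3; create_zeroed_file()
is a hypothetical helper, not bup's code. On the common unixes the
extended region reads back as zeros and is usually stored sparsely, but
as the commit message warns, other platforms may behave differently.

```python
import os


def create_zeroed_file(path, nbytes):
    """Create a file of nbytes without explicitly writing any zeros."""
    with open(path, "wb") as f:
        # Extending via truncate() lets the filesystem supply the zeros.
        f.truncate(nbytes)
    return os.path.getsize(path)
```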
13 years agobloom: Don't use function pointers in tight loops
Brandon Low [Mon, 7 Feb 2011 17:07:59 +0000 (09:07 -0800)]
bloom: Don't use function pointers in tight loops

They really just confused the code at this point and may have prevented
GCC from doing some optimization.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoFix updating of bloom with additional files
Brandon Low [Mon, 7 Feb 2011 16:19:04 +0000 (08:19 -0800)]
Fix updating of bloom with additional files

Make bloom add additional .idx files when it's run on a repo with an
existing bloom filter file rather than just regenerating all the time.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoShaBloom.init(): initialize members before the assert().
Avery Pennarun [Mon, 7 Feb 2011 09:25:32 +0000 (01:25 -0800)]
ShaBloom.init(): initialize members before the assert().

Otherwise __del__() throws an exception if the assert triggers, thus hiding
the original problem.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
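
The pattern can be sketched as follows in Python 3. The Table class is a
hypothetical stand-in for ShaBloom: every attribute that __del__() reads
is set before the first statement that can raise, so a failed assert
surfaces as the original AssertionError rather than being obscured by a
secondary AttributeError from __del__().

```python
class Table:
    def __init__(self, data):
        # Initialize everything __del__() touches before anything can fail.
        self.data = None
        self.closed = True
        assert data is not None, "no data given"
        self.data = data
        self.closed = False

    def close(self):
        self.data = None
        self.closed = True

    def __del__(self):
        # Safe even for a half-constructed object, since closed always exists.
        if not self.closed:
            self.close()
```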
13 years agocmd/bloom: actually, always use the same temp filename.
Avery Pennarun [Mon, 7 Feb 2011 09:28:06 +0000 (01:28 -0800)]
cmd/bloom: actually, always use the same temp filename.

There's no reason to use a different temp filename every time, since we're
going to just be overwriting the same output file anyhow.  And if we got
interrupted, we left the temp file lying around.  Let's just always use the
same temp filename, which means if we get interrupted, we'll clean it up
next time.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
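
A minimal sketch of the fixed-temp-name approach in Python 3. The
write_via_fixed_temp() helper and the ".tmp" suffix are assumptions made
for illustration; the commit message does not say which name bup uses.

```python
import os


def write_via_fixed_temp(outpath, data):
    """Write data to a predictable temp file, then rename it into place."""
    tmppath = outpath + ".tmp"  # hypothetical suffix
    # Opening with "wb" truncates any leftover from an interrupted run.
    with open(tmppath, "wb") as f:
        f.write(data)
    os.replace(tmppath, outpath)
    return outpath
```

Because the temp name never changes, a stale file from an interrupted run
is simply overwritten by the next run instead of accumulating.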
13 years agocmd/bloom: use mkstemp() instead of NamedTemporaryFile().
Avery Pennarun [Mon, 7 Feb 2011 08:55:10 +0000 (00:55 -0800)]
cmd/bloom: use mkstemp() instead of NamedTemporaryFile().

Older versions of python (I tested python 2.5) don't support the
delete=False parameter to NamedTemporaryFile().  In any case, it's not
actually a temporary file since we're not planning to delete it.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
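
For comparison, a sketch of the mkstemp() variant in Python 3;
write_via_mkstemp() is a hypothetical helper. tempfile.mkstemp() returns
an open file descriptor plus the path, and never deletes the file itself,
which suits a file that is going to be renamed into place.

```python
import os
import tempfile


def write_via_mkstemp(outpath, data):
    """Write data to a uniquely named file beside outpath, then rename it."""
    dirname = os.path.dirname(os.path.abspath(outpath))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmppath = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmppath, outpath)
    return outpath
```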
13 years agomidx: Write midx4 in C rather than python
Brandon Low [Mon, 7 Feb 2011 06:06:09 +0000 (22:06 -0800)]
midx: Write midx4 in C rather than python

Obviously this is dramatically faster.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agomidx4: midx2 with idx backreferences
Brandon Low [Mon, 7 Feb 2011 06:06:08 +0000 (22:06 -0800)]
midx4: midx2 with idx backreferences

Like midx3, this adds a lookup table of 4 bytes per entry to
reference an entry in the idxnames list.  2 bytes should be plenty, but
disk is cheap and the table will only be referenced when bup server gets
an object that's already in the midx.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoShaBloom: Add k=4 support for large repositories
Brandon Low [Mon, 7 Feb 2011 06:06:07 +0000 (22:06 -0800)]
ShaBloom: Add k=4 support for large repositories

The comments pretty much tell the story: 3TiB is really not large enough
for a backup system to support, so this adds k=4 support to ShaBloom,
which lets it hold 100s of TiB without too many negative tradeoffs.  It is
still better to use k=5 for smaller repositories, so it switches to k=4
only when the repository exceeds 3TiB.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoShaBloom prefilter to detect nonexistent objects
Brandon Low [Mon, 7 Feb 2011 06:06:06 +0000 (22:06 -0800)]
ShaBloom prefilter to detect nonexistent objects

This inserts a bloom prefilter ahead of midx for efficient checking of
objects, most of which do not exist.  As long as you have enough RAM for
the bloom filter to stay in memory, this saves a lot of time compared to
midx files.  The bloom filter is between 1/5th and 1/20th the size of midx
given the parameters I'm using so far.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
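
To show why a prefilter helps, here is a toy bloom filter in Python 3. It
is only a sketch of the general technique: the TinyBloom class, its bit
layout, and the way it slices the digest into k positions are assumptions
for illustration and do not reproduce ShaBloom's on-disk format. A
negative answer is definite, so most lookups for absent objects can stop
before the midx files are consulted at all.

```python
import hashlib


class TinyBloom:
    def __init__(self, nbits, k=5):
        assert 1 <= k <= 5, "a 20-byte SHA-1 yields at most five 4-byte slices"
        self.nbits = nbits
        self.k = k
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, sha):
        # The SHA is already uniformly random, so slices of it act as hashes.
        for i in range(self.k):
            yield int.from_bytes(sha[4 * i:4 * i + 4], "big") % self.nbits

    def add(self, sha):
        for pos in self._positions(sha):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, sha):
        # False means "definitely absent"; True means "check the real index".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(sha))


bloom = TinyBloom(1 << 16, k=5)
known = hashlib.sha1(b"some object").digest()
bloom.add(known)
assert bloom.might_contain(known)
```

A smaller k sets fewer bits per object, which is the tradeoff the k=4
commit above makes for very large repositories.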
13 years agommap: Make closing source file optional
Brandon Low [Mon, 7 Feb 2011 06:06:05 +0000 (22:06 -0800)]
mmap: Make closing source file optional

New index file formats require this behavior (bloom, midx3, etc.)

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoMerge branch 'daemon_msg' of git://github.com/leto/bup bup-0.22a
Avery Pennarun [Mon, 7 Feb 2011 08:47:00 +0000 (00:47 -0800)]
Merge branch 'daemon_msg' of git://github.com/leto/bup

* 'daemon_msg' of git://github.com/leto/bup:
  Make 'bup daemon' print a message at startup regardless of debug level

13 years agooptions.py: update docstrings and detail optspec
Gabriel Filion [Sat, 5 Feb 2011 22:17:47 +0000 (17:17 -0500)]
options.py: update docstrings and detail optspec

The docstring on the Options class currently refers to a man page which
does not exist, and still talks about the now-removed 'exe' parameter.
Update this to be more accurate.

Add a docstring to OptDict.

Finally, the options.py file introduces the concept of an option spec
string. Its construction should be documented. Since we'd like the
options.py file to be a one-file drop-in so that it can be easily used in
other projects, let's document the option specs in the module's docstring.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years agocmd/memtest: don't die if /proc/self/status is the wrong format.
Avery Pennarun [Sat, 5 Feb 2011 01:30:11 +0000 (17:30 -0800)]
cmd/memtest: don't die if /proc/self/status is the wrong format.

Apparently Solaris has /proc/self/status, but it's binary and so our
Linux-centric parser couldn't handle it.  The data we're getting from it is
non-critical, so just ignore the parse error and let the high-level code in
report() deal with it.

Reported by henning mueller, diagnosed by Gabriel Filion.  Thanks guys!

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
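
A sketch of a tolerant parser in Python 3. parse_status() and
read_status() are hypothetical helpers, not the functions in cmd/memtest;
the point is only that unreadable or binary content yields an empty
result for the caller to handle instead of an exception.

```python
def parse_status(text):
    """Parse 'Key:  value' lines, silently skipping anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_status(path="/proc/self/status"):
    """Return the parsed fields, or {} if the file is missing or binary."""
    try:
        with open(path, "rb") as f:
            return parse_status(f.read().decode("ascii"))
    except (OSError, ValueError):
        # UnicodeDecodeError is a ValueError, so binary formats land here.
        return {}
```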
13 years agoMake 'bup daemon' print a message at startup regardless of debug level
Jonathan "Duke" Leto [Sat, 5 Feb 2011 00:43:40 +0000 (16:43 -0800)]
Make 'bup daemon' print a message at startup regardless of debug level

13 years agoclient.py: replace a never-used GitError with a ClientError.
Avery Pennarun [Fri, 4 Feb 2011 11:01:33 +0000 (03:01 -0800)]
client.py: replace a never-used GitError with a ClientError.

Nobody ever tried calling that function, so it's really just an assertion
that never triggered.  Which is good, because it was trying to throw an
exception that wasn't available in the current namespace.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoDemuxConn.__init__: you can't assume the *last* 6 bytes are BUPMUX.
Avery Pennarun [Thu, 3 Feb 2011 23:12:52 +0000 (15:12 -0800)]
DemuxConn.__init__: you can't assume the *last* 6 bytes are BUPMUX.

The actual muxed data might arrive immediately after it, and since we're not
buffering that, we have to read one byte at a time.

(Buffering would be more efficient if we expected this to happen frequently,
but it shouldn't.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
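
The byte-at-a-time scan can be sketched like this in Python 3.
read_until_marker() is a hypothetical helper, not DemuxConn itself.
Reading a single byte per call guarantees that nothing after the marker
is consumed, and an empty read is treated as EOF so the loop cannot spin
forever.

```python
import io


def read_until_marker(stream, marker=b"BUPMUX"):
    """Consume bytes up to and including marker, and not one byte more."""
    tail = b""
    while tail != marker:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended before %r was seen" % marker)
        # Keep only as many trailing bytes as the marker is long.
        tail = (tail + b)[-len(marker):]


stream = io.BytesIO(b"ssh banner noise BUPMUXpayload")
read_until_marker(stream)
assert stream.read() == b"payload"
```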
13 years agoDemuxConn.__init__: abort the loop if read() returns EOF.
Avery Pennarun [Thu, 3 Feb 2011 23:07:48 +0000 (15:07 -0800)]
DemuxConn.__init__: abort the loop if read() returns EOF.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agohelpers.py: always use two blank lines between functions/classes.
Avery Pennarun [Thu, 3 Feb 2011 23:04:47 +0000 (15:04 -0800)]
helpers.py: always use two blank lines between functions/classes.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoclient.py: avoid an exception when no new remote packs were generated.
Avery Pennarun [Thu, 3 Feb 2011 10:16:54 +0000 (02:16 -0800)]
client.py: avoid an exception when no new remote packs were generated.

This is probably pretty rare, but it can happen if you needed to download a
remote index, and that index had *all* your objects, so we did end up
writing some objects to the remote server, but it didn't end up generating
any packs.  If that happened, we would try to return the contents of a
nonexistent variable.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoFix documentation for `bup daemon`
Brandon Low [Thu, 3 Feb 2011 03:06:41 +0000 (19:06 -0800)]
Fix documentation for `bup daemon`

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoruntests: Apparently $(wildcard) in make doesn't always sort its output. bup-0.22
Avery Pennarun [Tue, 1 Feb 2011 10:04:53 +0000 (02:04 -0800)]
runtests: Apparently $(wildcard) in make doesn't always sort its output.

This meant that on Solaris, tests would be run in a different order, so that
BUP_MAIN_EXTRA (set in tclient.py) wouldn't be set the same as on Linux.

In this case, we know the wildcard will always match something anyway, so we
might as well just let the shell expand it out rather than asking make to do
it.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agocmd/help: earlier path.exedir() change made it not find manpages correctly.
Avery Pennarun [Tue, 1 Feb 2011 09:47:09 +0000 (01:47 -0800)]
cmd/help: earlier path.exedir() change made it not find manpages correctly.

...when the binary wasn't actually installed.  Previously, it would use
sys.argv[0], which was the path to bup-help, but now it uses path.exedir(),
which has the path to bup, which is one directory up.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoMerge branch 'mux'
Avery Pennarun [Tue, 1 Feb 2011 08:22:07 +0000 (00:22 -0800)]
Merge branch 'mux'

* mux:
  If you specified the port number on the command line, convert it to an int.
  Add `bup daemon` command for simple socket server
  Add DemuxConn and `bup mux` for client-server

13 years agoIf you specified the port number on the command line, convert it to an int.
Avery Pennarun [Tue, 1 Feb 2011 06:13:00 +0000 (22:13 -0800)]
If you specified the port number on the command line, convert it to an int.

This gets rid of an exception.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agoAdd `bup daemon` command for simple socket server
Brandon Low [Thu, 27 Jan 2011 02:30:21 +0000 (18:30 -0800)]
Add `bup daemon` command for simple socket server

Nothing special here, just listens on a host:port combination and spawns
`bup mux server` instances.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
13 years agoAdd DemuxConn and `bup mux` for client-server
Brandon Low [Thu, 27 Jan 2011 02:30:20 +0000 (18:30 -0800)]
Add DemuxConn and `bup mux` for client-server

`bup mux` works with any bup command to multiplex its stdout and stderr
streams over a single stdout stream.

DemuxConn works on the client side to demultiplex stderr and data
streams from a single stream, emulating a simple connection.

For now, these are only used in the case of simple socket bup://
client-server connections, because rsh and local connections don't need
them.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
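
To illustrate multiplexing two streams over one, the Python 3 sketch
below frames each chunk with a length and a channel number. The framing
(a 4-byte big-endian length followed by a 1-byte channel id) and the names
mux_frame() and demux() are assumptions for this example; the commit
message does not describe bup's actual wire format.

```python
import io
import struct

HEADER = struct.Struct("!IB")  # assumed layout: payload length, channel id
DATA, ERR = 1, 2               # hypothetical channel numbers


def mux_frame(channel, data):
    """Wrap one chunk so it can share a stream with other channels."""
    return HEADER.pack(len(data), channel) + data


def demux(stream):
    """Split a muxed stream back into per-channel byte strings."""
    out = {}
    while True:
        hdr = stream.read(HEADER.size)
        if not hdr:
            break
        if len(hdr) < HEADER.size:
            raise EOFError("truncated frame header")
        length, channel = HEADER.unpack(hdr)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated frame payload")
        out.setdefault(channel, bytearray()).extend(payload)
    return {ch: bytes(buf) for ch, buf in out.items()}


wire = mux_frame(DATA, b"pack ") + mux_frame(ERR, b"warning\n") + mux_frame(DATA, b"data")
assert demux(io.BytesIO(wire)) == {DATA: b"pack data", ERR: b"warning\n"}
```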
13 years agot/test.sh: Fix a test for 'split' on solaris
Gabriel Filion [Fri, 28 Jan 2011 06:14:08 +0000 (01:14 -0500)]
t/test.sh: Fix a test for 'split' on solaris

When looking at output from a test run on Solaris, one test in the
'split' suite showed up as OK but was actually showing a diff
invocation error.

The -q argument (for quiet) does not exist on the version of diff that
is installed on Solaris. Since wvtest intercepts output from tested
commands, the -q argument is actually not needed. Remove the argument in
order to make the test execute correctly under all operating systems
that were tested thus far.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years agoGive main.py a --profile option
Brandon Low [Wed, 26 Jan 2011 06:54:23 +0000 (22:54 -0800)]
Give main.py a --profile option

This is just a convenience for anyone who is interested in seeing where
CPU seconds are going.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
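
A sketch of what such a profiling hook might look like in Python 3, using
the standard cProfile and pstats modules. run_profiled() is a
hypothetical helper; the commit message does not show how main.py wires
up its --profile option.

```python
import cProfile
import io
import pstats


def run_profiled(func, *args):
    """Run func under the profiler; return its result and a text report."""
    prof = cProfile.Profile()
    result = prof.runcall(func, *args)
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```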
13 years agooptions.py: generate usage string correctly for no-* options.
Avery Pennarun [Wed, 26 Jan 2011 05:14:35 +0000 (21:14 -0800)]
options.py: generate usage string correctly for no-* options.

(copied from the sshuttle project)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years agooptions.py: don't die if tty width is set to 0.
Avery Pennarun [Sun, 23 Jan 2011 00:42:32 +0000 (16:42 -0800)]
options.py: don't die if tty width is set to 0.

This sometimes happens if weird people, such as myself, open a pty without
setting the width field correctly.

(copied from the sshuttle project)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
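
A minimal sketch of a width lookup that tolerates a zero width, in
Python 3 on a Unix system. tty_width() and its default of 70 columns are
assumptions for illustration, not the code in options.py.

```python
import fcntl
import struct
import termios


def tty_width(fd=2, default=70):
    """Return the terminal width, falling back when it is unknown or 0."""
    try:
        raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
        cols = struct.unpack("HHHH", raw)[1]  # rows, cols, xpixel, ypixel
    except (OSError, struct.error):
        # Not a tty, or the ioctl is unsupported.
        return default
    # A pty opened without a window size reports 0 columns.
    return cols or default
```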
13 years agoCombine and speed up idx->midx and bupindex merge
Brandon Low [Mon, 24 Jan 2011 03:31:51 +0000 (19:31 -0800)]
Combine and speed up idx->midx and bupindex merge

These two processes used almost identical algorithms, but were
implemented separately.  The main difference was one was ascending and
the other was descending.

This patch reverses the cmp on index.Entry so that both can share an
algorithm.

It also cuts some overhead in the algorithm by using it.next() instead of
the next() wrapper, yielding a ~6% speedup on midx generation and index merging.

Signed-off-by: Brandon Low <lostlogic@lostlogicx.com>
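
The shared algorithm is a k-way merge of already-sorted sequences. The
Python 3 sketch below shows the general technique with a heap;
merge_sorted() is a hypothetical function, not bup's merge code, and it
assumes every input is sorted in the same direction, which is what
reversing the comparison on index.Entry makes possible.

```python
import heapq


def merge_sorted(iterables):
    """Yield all items from several ascending iterables in ascending order."""
    heap = []
    for idx, it in enumerate(map(iter, iterables)):
        try:
            # idx breaks ties, so the iterators themselves are never compared.
            heap.append((next(it), idx, it))
        except StopIteration:
            pass
    heapq.heapify(heap)
    while heap:
        value, idx, it = heap[0]
        yield value
        try:
            heapq.heapreplace(heap, (next(it), idx, it))
        except StopIteration:
            heapq.heappop(heap)


assert list(merge_sorted([[1, 4, 7], [2, 5], [3, 6]])) == [1, 2, 3, 4, 5, 6, 7]
```

The standard library's heapq.merge() implements the same idea and would
normally be the first choice in new code.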