arthur.barton.de Git - bup.git/log
13 years ago tag-cmd: Some fixups
Gabriel Filion [Thu, 2 Dec 2010 23:01:40 +0000 (18:01 -0500)]
tag-cmd: Some fixups

* Make some error messages nicer in telling the tag name that was used.

* Move tag listing code into git.py and use this code in tag-cmd and vfs.py

* Make tag-cmd output the list of tags on stdout instead of stderr

* Don't error out when more than 2 arguments are given. When there are fewer
  than 2, be nice and show the usage.

* In tag-cmd, catch GitErrors that come out of git.rev_parse()

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago Skip over invalid .idx files if we find any.
Avery Pennarun [Thu, 23 Dec 2010 02:08:58 +0000 (18:08 -0800)]
Skip over invalid .idx files if we find any.

There's no particular reason to make it fatal; just pretend they're not
there.

Zoran reported a bug where he had (it seems) some zero-length .idx files,
which is weird, but nothing worth aborting a backup over.

Also, fix _mmap_do() to be able to handle mmap'ing a zero-length file
without an error.  It's a trivial and somewhat pointless operation, but it
shouldn't throw an exception.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
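
The zero-length mmap case described above can be sketched roughly as follows. This is only an illustration, not the actual _mmap_do() code: the helper name mmap_read() is hypothetical, and it relies on the fact that CPython's mmap.mmap() refuses to map an empty file.

```python
import mmap
import os


def mmap_read(path):
    """Return a read-only mapping of path, or b'' if the file is empty.

    Hypothetical helper; the name and return convention are assumptions.
    """
    with open(path, 'rb') as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            # Mapping an empty file raises ValueError in CPython, so hand
            # back an empty buffer instead of failing.
            return b''
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
```

A caller can then treat an empty .idx file like any other unreadable index and simply skip it.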
13 years ago cmd/server: find .idx filenames more efficiently when needed.
Avery Pennarun [Wed, 22 Dec 2010 18:49:20 +0000 (10:49 -0800)]
cmd/server: find .idx filenames more efficiently when needed.

Rather than mapping *all* the .idx files into memory at once just to look up
a single object, just open/read/close them sequentially.  This should
significantly increase the maximum total repo size we can handle on a 32-bit
system.  (Of course, it's still not ideal; we really should have some kind of
fallback mode for when our total set of indexes starts getting too big.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago README.md: suggest using apt-get build-dep.
Jon Dowland [Sat, 18 Dec 2010 07:14:15 +0000 (23:14 -0800)]
README.md: suggest using apt-get build-dep.

13 years ago cmd/memtest: stop using weird mmap() and /dev/urandom tricks.
Avery Pennarun [Sat, 4 Dec 2010 14:13:16 +0000 (06:13 -0800)]
cmd/memtest: stop using weird mmap() and /dev/urandom tricks.

I'll just write a C function that can rapidly generate random sha1s.  This
should make it more portable, hopefully fixing a problem reported by Michael
Budde on a Linux/SPARC system.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Replace a try/except/finally with a nested try block.
Michael Budde [Sat, 4 Dec 2010 13:49:08 +0000 (05:49 -0800)]
Replace a try/except/finally with a nested try block.

try/except/finally doesn't work in python 2.4.
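
For context, the combined try/except/finally statement only arrived in Python 2.5, so code that must run on 2.4 nests a try/except inside a try/finally. A minimal sketch of that pattern follows; the function parse_count() and its events list are made up for illustration and are not taken from the patch.

```python
def parse_count(text, events):
    """Parse text as an int, always recording the cleanup step.

    Illustrative only: 'events' is a plain list used to show ordering.
    """
    try:
        try:
            return int(text)
        except ValueError:
            # Inner block handles the error...
            events.append('bad input')
            return None
    finally:
        # ...outer block guarantees the cleanup runs either way.
        events.append('cleanup')
```

The nested form behaves the same as the combined statement on newer interpreters, so it costs nothing to keep.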

13 years ago Use PyLong_FromUnsignedLong instead of Py_BuildValue("I")
Avery Pennarun [Fri, 3 Dec 2010 19:08:03 +0000 (11:08 -0800)]
Use PyLong_FromUnsignedLong instead of Py_BuildValue("I")

...for python-pre-2.4.3 compatibility.  The "I" option was broken before
python 2.4.3, even though it was supposed to be supported since python 2.3.

Reported by Michael Budde.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Renames --exclude-file to --exclude-from and encapsulates exclude-parsing.
Zoran Zaric [Thu, 2 Dec 2010 06:24:14 +0000 (07:24 +0100)]
Renames --exclude-file to --exclude-from and encapsulates exclude-parsing.

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago Adds documentation for --exclude and --exclude-file to bup index's manpage.
Zoran Zaric [Thu, 2 Dec 2010 06:24:13 +0000 (07:24 +0100)]
Adds documentation for --exclude and --exclude-file to bup index's manpage.

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago Adds --exclude-file option to bup index.
Zoran Zaric [Thu, 2 Dec 2010 06:24:12 +0000 (07:24 +0100)]
Adds --exclude-file option to bup index.

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago Adds --exclude option to bup index and bup drecurse
Zoran Zaric [Thu, 2 Dec 2010 06:24:11 +0000 (07:24 +0100)]
Adds --exclude option to bup index and bup drecurse

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago cmd/midx: differentiate the log message from the index.py merging.
Avery Pennarun [Thu, 2 Dec 2010 00:02:03 +0000 (16:02 -0800)]
cmd/midx: differentiate the log message from the index.py merging.

It's a curse (inherited from git) that .idx files are called "indexes" and
the bupindex is called an "index."  Let's change the message in cmd/midx so
at least we'll know which kind of index people are complaining about.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/index: fix documented default value for --indexfile.
Avery Pennarun [Wed, 1 Dec 2010 11:11:26 +0000 (03:11 -0800)]
cmd/index: fix documented default value for --indexfile.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago midx: auto-remove midx files that refer to missing .idx files.
Avery Pennarun [Wed, 1 Dec 2010 10:44:18 +0000 (02:44 -0800)]
midx: auto-remove midx files that refer to missing .idx files.

Normally an .idx file doesn't ever disappear, but it could happen if you run
'git gc' on your repository.  Which I thought would be a terrible idea, but
apparently it can actually save a lot of space for some people (although it
takes a pretty long time to run).  And when that happens, all your .idx
files move around.  So let's be polite when that happens.  We'll print a
warning the first time, but then shut up after that since the flawed midx
file will just go away.

Reported by Peter Rabbitson.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago add a tag command
Gabriel Filion [Fri, 26 Nov 2010 11:00:35 +0000 (06:00 -0500)]
add a tag command

Currently implemented: list all tags, add a tag on a specific commit or
head, delete a known tag.

Also, make vfs expose a new directory called '/.tag' which contains a
link for each tag to its associated commit directory situated in
'/.commit'. Finally, make tags appear as symlinks on branch directories
on which the tagged commit is visible.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago Move commit directories in /.commit/??/
Gabriel Filion [Fri, 26 Nov 2010 11:00:34 +0000 (06:00 -0500)]
Move commit directories in /.commit/??/

Currently, directories in which we can access files of a particular
commit are placed in each branch directory by which it is reachable.

To avoid possible repetitions of commit directories, move the
directories into a new top-level hidden directory named /.commit.

This hidden directory is structured as a two-level-deep directory
structure, wherein the first level represents the first byte (two
hexadecimal characters) of commit hashes, and the second level
represents the remainder of the hash.

With this movement, branch directories now contain only symlinks to the
commit directories in /.commit/??/

Also, in BranchList (formerly CommitList), the 'latest' commit was
computed on every iteration over a commit. I moved this calculation up
one level so that it is computed only once.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
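
The /.commit/??/ layout described above amounts to splitting the hex commit id after its first two characters. A small sketch, assuming 40-character sha1 ids; the function name commit_dir() is hypothetical and not part of vfs.py:

```python
def commit_dir(hexsha):
    """Map a 40-character hex commit id to its /.commit/??/ path.

    Hypothetical helper used only to illustrate the layout.
    """
    if len(hexsha) != 40:
        raise ValueError('expected a 40-character hex sha1')
    # First byte (two hex characters) is the first level; the rest of
    # the hash is the second level.
    return '/.commit/%s/%s' % (hexsha[:2], hexsha[2:])
```

Branch directories would then hold symlinks pointing at paths of this shape.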
13 years ago Adds test for exclude-bupdir.
Zoran Zaric [Tue, 16 Nov 2010 09:12:55 +0000 (10:12 +0100)]
Adds test for exclude-bupdir.

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago Excludes BUP_DIR from index.
Zoran Zaric [Tue, 16 Nov 2010 09:12:54 +0000 (10:12 +0100)]
Excludes BUP_DIR from index.

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago Documentation/bup-split.md: fix a parser error.
Avery Pennarun [Wed, 1 Dec 2010 09:31:47 +0000 (01:31 -0800)]
Documentation/bup-split.md: fix a parser error.

The version of pandoc that I use doesn't like quotation marks, so use single
quotes instead.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/memtest: remove an unused function.
Avery Pennarun [Wed, 1 Dec 2010 09:31:31 +0000 (01:31 -0800)]
cmd/memtest: remove an unused function.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago README.md: give a suggestion for how to get more documentation.
Avery Pennarun [Wed, 1 Dec 2010 09:30:39 +0000 (01:30 -0800)]
README.md: give a suggestion for how to get more documentation.

Based on a request on the mailing list that pointed out there was no obvious
route from the README to the docs.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago t/test.sh: use /bin/pwd instead of just pwd.
Avery Pennarun [Sat, 13 Nov 2010 05:58:03 +0000 (21:58 -0800)]
t/test.sh: use /bin/pwd instead of just pwd.

$(pwd) seems to sometimes lie, because the shell uses the $PWD environment
variable.  If your PWD is a symlink, this can cause the test to fail since
bup figures out the path using a real call to getcwd().

Problem reported by Zenaan Harkness, though he never did acknowledge if this
fixes his problem :(

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
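
The mismatch between $PWD and getcwd() can be reproduced from Python on a system with symlink support. This is a sketch under assumptions: pwd_views() and demo() are hypothetical names, and setting PWD by hand stands in for what a shell does after 'cd'.

```python
import os
import shutil
import tempfile


def pwd_views():
    """Return (logical, physical) views of the current directory."""
    logical = os.environ.get('PWD')  # what a shell builtin pwd may echo
    physical = os.getcwd()           # resolved by a real getcwd() call
    return logical, physical


def demo():
    """Enter a directory through a symlink and compare both views."""
    base = os.path.realpath(tempfile.mkdtemp())
    real = os.path.join(base, 'real')
    link = os.path.join(base, 'link')
    os.mkdir(real)
    os.symlink(real, link)
    old_cwd, old_pwd = os.getcwd(), os.environ.get('PWD')
    try:
        os.chdir(link)
        os.environ['PWD'] = link     # mimic what 'cd link' does in a shell
        logical, physical = pwd_views()
    finally:
        # Restore the process state and remove the scratch directory.
        os.chdir(old_cwd)
        if old_pwd is None:
            os.environ.pop('PWD', None)
        else:
            os.environ['PWD'] = old_pwd
        shutil.rmtree(base)
    return logical, physical, real, link
```

Here the logical view names the symlink while the physical view names its target, which is the difference that made the test compare unequal paths.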
13 years ago Fix 'make test' on MacOS ("wc -l" returns extra whitespace).
Avery Pennarun [Wed, 10 Nov 2010 16:54:22 +0000 (08:54 -0800)]
Fix 'make test' on MacOS ("wc -l" returns extra whitespace).

By removing the comparison value from quotes, we tell the shell to ignore
whitespace.

Reported by Jimmy Tang.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Add make export-docs/push-docs/import-docs targets. bup-0.20
Avery Pennarun [Tue, 9 Nov 2010 06:05:54 +0000 (22:05 -0800)]
Add make export-docs/push-docs/import-docs targets.

export-docs: update local 'man' and 'html' branches with pregenerated doc
files.

push-docs: push those to origin/man and origin/html.

import-docs: extract the documentation from origin/man and origin/html into
the local tree so it can be installed.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Revert (most of) "--remote parameter requires a colon"
Avery Pennarun [Tue, 9 Nov 2010 05:14:04 +0000 (21:14 -0800)]
Revert (most of) "--remote parameter requires a colon"

This reverts (most of) commit c135a5834a9bf9cd9c3382d6489f93e3fdabeafd.

Requiring a colon seems to be too fascist; it makes people think that you
can't use local repositories anymore, which wasn't true: you could just
refer to them as ":/path/to/repo".  But that's just too weird and
non-obvious.  It already resulted in a query on the mailing list, the
avoidance of which is why we added this patch in the first place.  So let's
take it back out.

I kept some minor clarifications and unit test improvements, however.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
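
With the colon requirement removed, a --remote value can be read as either "host:path" or a bare local path. The following is a rough sketch of that interpretation, not bup's actual parser; parse_remote() is a hypothetical name.

```python
def parse_remote(remote):
    """Split a --remote value into (host, path); host is None if local.

    Hypothetical helper illustrating the rule described above.
    """
    if ':' in remote:
        host, path = remote.split(':', 1)
        # An empty host, as in ':/path/to/repo', also means local.
        return (host or None), path
    # No colon: treat the whole value as a local path.
    return None, remote
```

Note that under this rule a bare "hostname" is still taken as a path, which is the confusion the original patch tried to prevent.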
13 years ago Add a coding style document.
Gabriel Filion [Mon, 11 Oct 2010 18:41:26 +0000 (14:41 -0400)]
Add a coding style document.

The document is largely inspired by the one in Scott Chacon's "HACKING"
file [1] in his 'agitmemnon-server' repository on GitHub, with some
added detail on the docstring style that was adopted for bup.

[1] http://github.com/schacon/agitmemnon-server/blob/master/HACKING

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago Revert new-style classes
Gabriel Filion [Mon, 11 Oct 2010 18:30:51 +0000 (14:30 -0400)]
Revert new-style classes

Some classes were changed to "new-style" Python classes in c7a0f06.

Following a discussion on the mailing list about the relevance of such a
change, it was noted that the features that new-style classes bring are
not used in bup, and considering their slightly higher cost in
instantiating them and accessing their attributes, it is decided that we
don't change to using them.

Revert the changed classes back to old-style classes so that all code is
consistent.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago If not directory root, prepend list with ".."
David Roda [Thu, 23 Sep 2010 12:35:55 +0000 (08:35 -0400)]
If not directory root, prepend list with ".."

Add a link to the file list to traverse up a directory if we
are not already at the top

Signed-off-by: David Roda <davidcroda@gmail.com>
13 years ago Use relative width for wrapper. Don't stretch table
David Roda [Thu, 23 Sep 2010 12:35:54 +0000 (08:35 -0400)]
Use relative width for wrapper. Don't stretch table

Switched wrapper from 960px to 90%
Removed width from table and columns to allow the
browser to properly auto align them

Signed-off-by: David Roda <davidcroda@gmail.com>
13 years ago DESIGN: 'bup restore' exists now.
Avery Pennarun [Mon, 18 Oct 2010 16:45:11 +0000 (10:45 -0600)]
DESIGN: 'bup restore' exists now.

Reported by Dieter Plaetinck.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/save: if file.read() returns an error, don't abort.
Avery Pennarun [Sat, 16 Oct 2010 23:55:16 +0000 (17:55 -0600)]
cmd/save: if file.read() returns an error, don't abort.

Apparently some mis-implemented Linux filesystems (selinuxfs) have regular
files that can be opened for read, but return EINVAL when you try to read
them.  We would throw a fatal exception in that case (since we're not
supposed to have read errors ever, and thus that implies something happened
that we didn't think of) but I guess we'd better make this into a non-fatal
error.  It still makes the exit code nonzero so you can see that something
didn't work, though.

Reported by Zoran Zaric.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
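
The "non-fatal error, nonzero exit code" behaviour can be sketched like this. The names saved_errors, add_error(), read_or_skip() and exit_code() are illustrative assumptions, not a copy of the cmd/save code.

```python
saved_errors = []


def add_error(msg):
    """Remember a non-fatal error so it can affect the exit code later."""
    saved_errors.append(msg)


def read_or_skip(path, opener=open):
    """Return the file's content, or None if opening or reading fails."""
    try:
        f = opener(path, 'rb')
        try:
            return f.read()
        finally:
            f.close()
    except (IOError, OSError) as e:
        # Record the problem and carry on with the next file.
        add_error('%s: %s' % (path, e))
        return None


def exit_code():
    """Nonzero if anything went wrong along the way."""
    return 1 if saved_errors else 0
```

The backup keeps going past the unreadable file, and the caller still sees from the exit status that something was skipped.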
13 years ago Merge remote branch 'origin/master'
Avery Pennarun [Mon, 4 Oct 2010 04:34:57 +0000 (21:34 -0700)]
Merge remote branch 'origin/master'

* origin/master:
  cmd/web: stream large files asynchronously.
  cmd/save: oops, missing a mangle_name() call.

13 years ago cmd/web: stream large files asynchronously.
Avery Pennarun [Mon, 4 Oct 2010 03:41:09 +0000 (20:41 -0700)]
cmd/web: stream large files asynchronously.

We had a nice chunkyreader() loop for writing files, but unfortunately,
Tornado captured the full content of those files before writing them to the
client.  Oops.

Change things around so we don't end up buffering some multiple of the
ENTIRE FILE in memory.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
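
The idea of handing data to the client one block at a time, rather than buffering the whole file, can be sketched without Tornado. The name chunkyreader() comes from the message above, but its signature here is an assumption, and stream() is a hypothetical stand-in for the web handler's write loop.

```python
def chunkyreader(f, count=65536):
    """Yield successive blocks of at most count bytes until EOF."""
    while True:
        block = f.read(count)
        if not block:
            break
        yield block


def stream(f, write, count=65536):
    """Pass each block to write() as it is read; return total bytes sent."""
    total = 0
    for block in chunkyreader(f, count):
        # Only one block is held in memory at any moment.
        write(block)
        total += len(block)
    return total
```

In an asynchronous server the write() call would additionally wait for the previous block to be flushed before reading the next one; that part is not shown here.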
13 years ago cmd/save: oops, missing a mangle_name() call.
Avery Pennarun [Mon, 4 Oct 2010 03:02:20 +0000 (20:02 -0700)]
cmd/save: oops, missing a mangle_name() call.

Directories with names ending in '.bup' - including ~/.bup, sigh - didn't
get the .bupl suffix added, thus making their sizes not calculate correctly.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/split: print a progress counter.
Avery Pennarun [Wed, 22 Sep 2010 14:13:18 +0000 (07:13 -0700)]
cmd/split: print a progress counter.

We don't know how many bytes we're going to split in total, but we can at
least print the total number of bytes we've seen so far.

Also fix cmd/random to *not* print progress messages by default, since my
test situation is

   bup random 100M | bup split -b

and they scribble over each other when they both print progress output.  bup
random now gets a '-v' option.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago git.py: support the old git pack .idx version 1 format.
Avery Pennarun [Wed, 22 Sep 2010 13:46:16 +0000 (06:46 -0700)]
git.py: support the old git pack .idx version 1 format.

It's produced by git-fast-import, and I happen to want to read some of
those.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
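
Telling the two .idx formats apart comes down to the header: version 2 files begin with the magic bytes "\377tOc" followed by a 4-byte big-endian version number, while version 1 files have no header and start directly with the fanout table. A sketch of that check follows; idx_version() is a hypothetical name and not the git.py code.

```python
import struct

IDX_V2_MAGIC = b'\377tOc'


def idx_version(header):
    """Return the .idx format version implied by the first 8 bytes."""
    if len(header) < 8:
        raise ValueError('idx header too short')
    if header[:4] != IDX_V2_MAGIC:
        # Version 1 files have no header, only the fanout table.
        return 1
    (version,) = struct.unpack('!I', header[4:8])
    if version != 2:
        # Some newer, as-yet-undefined version: refuse with a clear error.
        raise ValueError('unsupported idx version %d' % version)
    return version
```

A reader can dispatch on the result to pick the right table offsets for each format.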
13 years ago git.py: more careful handling of .idx version codes.
Avery Pennarun [Wed, 22 Sep 2010 13:27:05 +0000 (06:27 -0700)]
git.py: more careful handling of .idx version codes.

Now it prints a useful error if it sees a version 1 pack (no header) or some
newer as-yet-undefined version.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/split: add a --git-ids option.
Avery Pennarun [Wed, 22 Sep 2010 13:02:32 +0000 (06:02 -0700)]
cmd/split: add a --git-ids option.

This lets you provide a list of git object ids on stdin instead of the raw
content.  bup-split then uses a CatPipe to retrieve the objects from git and
hashsplit them.  You could use this as a helper for converting a git repo
that contains a bunch of large files into one that uses bup-style hashsplit
files.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/split: add a new --keep-boundaries option.
Avery Pennarun [Wed, 22 Sep 2010 12:07:33 +0000 (05:07 -0700)]
cmd/split: add a new --keep-boundaries option.

If you provide multiple input files on the command line, sometimes you want
to merge them together into a single file before re-chunking them (the
default).  But sometimes you want all the files to be treated separately for
chunking purposes, i.e. when you know that some of the files will never
change so there's never any point in merging them with previous/subsequent
files.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Merge branch 'dr/web'
Avery Pennarun [Wed, 22 Sep 2010 02:00:00 +0000 (19:00 -0700)]
Merge branch 'dr/web'

* drweb:
  Add simple styling to bup web.
  If we are showing hidden files, continue to do so.
  Enable static resources. Move css to external file.

13 years ago Merge branch 'maint'
Avery Pennarun [Wed, 22 Sep 2010 01:59:57 +0000 (18:59 -0700)]
Merge branch 'maint'

* maint:
  Fixup ctime/mtime that are outside a 32-bit range.

13 years ago Add simple styling to bup web.
David Roda [Tue, 14 Sep 2010 01:09:16 +0000 (21:09 -0400)]
Add simple styling to bup web.

This adds a wrapper set to 960px to the bup web layout. It also
sets widths on the table and columns.

I added a wrapper div to list-directory.html but I think that must
have snuck in with a previous commit.  I am not sure how to fix that
so I will leave it for now.  Sorry!

Adds a wrapper div to the html template.

Signed-off-by: David Roda <davidcroda@gmail.com>
13 years ago If we are showing hidden files, continue to do so.
David Roda [Wed, 22 Sep 2010 01:50:27 +0000 (18:50 -0700)]
If we are showing hidden files, continue to do so.

This appends ?hidden=1 to all URLs output to the template
if we are currently showing hidden files.  I added a variable
url_append which is appended to the URLs outside of the escaping.

Signed-off-by: David Roda <davidcroda@gmail.com>
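
Appending the query string outside of the escaping matters because escaping it would percent-encode the '?' and '='. A small sketch using the standard library; link_for() is a hypothetical helper, and only the url_append name is taken from the message above.

```python
from urllib.parse import quote


def link_for(path, show_hidden):
    """Build a link for path, carrying the hidden-files flag along."""
    url_append = '?hidden=1' if show_hidden else ''
    # Escape only the path; the query string is appended afterwards so
    # its '?' and '=' are not percent-encoded.
    return quote(path) + url_append
```

For example, a directory named "my dir" becomes "/my%20dir/?hidden=1" when hidden files are being shown.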
13 years ago Enable static resources. Move css to external file.
David Roda [Tue, 14 Sep 2010 01:09:13 +0000 (21:09 -0400)]
Enable static resources. Move css to external file.

Add to the settings variable in web-cmd.py to set
/web/static to be served as static resources.  This is for
css, javascript, and images.

Move the current styles from the head to static/css/styles.css.
Remove a few unnecessary styles and change the tab stop
to 4 spaces to match the rest of the code.

Add to Makefile to copy new directory structure.

Signed-off-by: David Roda <davidcroda@gmail.com>
13 years ago Fixup ctime/mtime that are outside a 32-bit range.
Avery Pennarun [Thu, 9 Sep 2010 05:42:21 +0000 (22:42 -0700)]
Fixup ctime/mtime that are outside a 32-bit range.

This avoids a DeprecationWarning on python 2.6.  That warning probably
should have been an error instead.

Reported by David Roda.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
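
One way to fix up out-of-range values before packing is to clamp them. The sketch below assumes an unsigned 32-bit field and a clamping policy; neither is confirmed by the message above, and fix_uint32() is a hypothetical name. On current Python versions struct.pack() raises struct.error for such values rather than warning.

```python
import struct

MAX_UINT32 = 0xFFFFFFFF


def fix_uint32(value):
    """Clamp value into the range struct's 'I' format can hold.

    Assumption: clamping is acceptable for out-of-range timestamps.
    """
    return max(0, min(int(value), MAX_UINT32))


def pack_time(value):
    """Pack a timestamp as a big-endian unsigned 32-bit integer."""
    return struct.pack('!I', fix_uint32(value))
```

Clamping loses the original value, so it only suits fields where an approximate timestamp is better than a failed backup.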
13 years ago Adds a date option to save, split and bup.git.PackWriter._new_commit()
Zoran Zaric [Wed, 8 Sep 2010 20:11:36 +0000 (22:11 +0200)]
Adds a date option to save, split and bup.git.PackWriter._new_commit()

Signed-off-by: Zoran Zaric <zz@zoranzaric.de>
13 years ago --remote parameter requires a colon
David Roda [Wed, 8 Sep 2010 12:58:09 +0000 (08:58 -0400)]
--remote parameter requires a colon

This patch checks for the presence of a colon if the --remote option
is used in bup save, bup split, bup join, and bup init.  Even though
specifying *only* a pathname without a hostname: is perfectly valid,
it's confusing to allow users to do so, because if they specify
"-r hostname" it will be treated as a path and thus give them a
confusing error message. Requiring a colon will avoid this.

It adds a few test cases to demonstrate that the code
works properly.

It also wraps the remote connection in a try/except to prevent
a traceback if there is an error (so far I have only seen this
happen with an invalid bup dir parameter).

And I added the netbeans project folder to gitignore.

Signed-off-by: David Roda <davidcroda@gmail.com>
13 years ago cmd/restore: embarrassingly slow implementation of 'bup restore' bup-0.19
Avery Pennarun [Wed, 8 Sep 2010 09:37:54 +0000 (02:37 -0700)]
cmd/restore: embarrassingly slow implementation of 'bup restore'

Well, that was easy, since vfs.py already existed and is doing most of the
hard work.  Only 103 lines including all the log message handling and
whatnot.

Only one catch: the restoring code is definitely not optimized.  Among other
things (like the probably-excessive-for-our-restoring-needs layering in
vfs.py), we're still calling into 'git cat-file --stdin' to retrieve our
objects.  This involves lots and lots of context switches, plus it can't use
midx files for its lookups.  The result is that restoring takes much more
CPU time and memory than it really should.  But oh well, we have to start
somewhere.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/save: always print a progress() message after a log() message.
Avery Pennarun [Wed, 8 Sep 2010 08:23:53 +0000 (01:23 -0700)]
cmd/save: always print a progress() message after a log() message.

An earlier commit (634df2f8b26a1439f22dc9f6a23d55a006bf0429) made 'bup save'
update the progress line much less frequently.  Unfortunately, if you used
-v or -vv, this would mean that there was *no* progress bar for a short time
after every log() message (directory or filename).  That made the progress
bar flicker annoyingly.

To fix it, make sure we reset the progress bar timer after every filename we
print with log().  It's subtle, but it makes a very visible difference.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago options.py: get the real tty width for word wrapping purposes.
Avery Pennarun [Wed, 8 Sep 2010 08:13:09 +0000 (01:13 -0700)]
options.py: get the real tty width for word wrapping purposes.

Previously we just assumed it was 70 chars, which was safe enough, but not
as elegant as actually reading the real value and adjusting the word wrap of
the usage string accordingly.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
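
Wrapping the usage text to the real terminal width, with 70 columns as the fallback, might look like the following on a current Python. This sketch uses shutil.get_terminal_size(), which did not exist when this commit was written, so it is a modern substitute rather than the method options.py used; wrap_usage() is a hypothetical name.

```python
import shutil
import textwrap


def wrap_usage(text, fallback_width=70):
    """Wrap text to the terminal width, or to fallback_width if unknown."""
    size = shutil.get_terminal_size(fallback=(fallback_width, 24))
    # Leave one column spare so lines never touch the right edge.
    width = max(20, size.columns - 1)
    return textwrap.wrap(text, width=width)
```

shutil.get_terminal_size() also honours the COLUMNS environment variable, which makes the behaviour easy to override when output is piped.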
13 years ago options.py: remove extra newlines in usage string.
Avery Pennarun [Wed, 8 Sep 2010 07:42:39 +0000 (00:42 -0700)]
options.py: remove extra newlines in usage string.

If the first line after the "--" was a comment (started with whitespace),
then we'd end up printing a double newline instead of a single one after the
synopsis string.

It would also look weird if we had a multi-line comment; the lines would be
separated by blank lines.

'bup damage -?' encountered this problem.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago options.py: handle optspecs that include inline square brackets.
Avery Pennarun [Wed, 8 Sep 2010 07:37:44 +0000 (00:37 -0700)]
options.py: handle optspecs that include inline square brackets.

We recently made it so if the last thing on an options line was [defval],
then the value in brackets became the default for that option.  However, we
inadvertently matched *any* bracketed value on that line, not just the one
at the end of the line, which basically prevents us from using square
brackets anywhere on the line.  That's no fun.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
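
The difference between matching any bracketed value and matching only a trailing one is a matter of anchoring the pattern at the end of the line. A sketch with a hypothetical helper, split_default(); the regular expression is an illustration and not necessarily the one used in options.py.

```python
import re

# Anchored with '$' so only a bracketed value at the very end counts.
_DEFAULT_RE = re.compile(r'\[([^\]]*)\]$')


def split_default(helptext):
    """Return (text, default); default is None unless a trailing [..] exists."""
    m = _DEFAULT_RE.search(helptext)
    if not m:
        return helptext, None
    return helptext[:m.start()].rstrip(), m.group(1)
```

With the anchor in place, brackets earlier on the line are left alone and remain part of the help text.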
13 years ago options.py: better support for explicit no-* options.
Avery Pennarun [Wed, 8 Sep 2010 07:31:19 +0000 (00:31 -0700)]
options.py: better support for explicit no-* options.

If a declared option name has the form no-xxx, then the 'xxx' option starts
off set to True and no-xxx starts off False.  Passing --no-xxx then has the
desired effect of setting no-xxx=True (and thus xxx=False).

Previously, trying to list a --no-xxx option in the argument list would
trigger an assertion failure.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
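
The defaulting rule for no-* options can be sketched with a plain dictionary. Both function names are hypothetical, and treating every other declared option as a boolean that starts False is an assumption made for the example.

```python
def initial_values(declared):
    """Build default option values from a list of declared long names."""
    values = {}
    for name in declared:
        if name.startswith('no-'):
            values[name[3:]] = True   # 'xxx' starts enabled
            values[name] = False      # so 'no-xxx' starts disabled
        else:
            values[name] = False      # assumption: plain boolean flag
    return values


def apply_flag(values, flag):
    """Apply one '--name' flag, keeping 'xxx' and 'no-xxx' in sync."""
    name = flag.lstrip('-')
    if name.startswith('no-'):
        values[name], values[name[3:]] = True, False
    else:
        values[name] = True
        if 'no-' + name in values:
            values['no-' + name] = False
    return values
```

Declaring 'no-progress' therefore gives progress=True until --no-progress is passed on the command line.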
13 years ago Introduce BUP_DEBUG, --debug, and tone down the log messages a lot. bup-0.18
Avery Pennarun [Tue, 7 Sep 2010 01:00:57 +0000 (18:00 -0700)]
Introduce BUP_DEBUG, --debug, and tone down the log messages a lot.

There's a new global bup option, --debug (-D) that increments BUP_DEBUG.  If
BUP_DEBUG >=1, debug1() prints; if >= 2, debug2() prints.

We change a bunch of formerly-always-printing log() messages to debug1 or
debug2, so now a typical bup session should be a lot less noisy.

This affects midx in particular, which was *way* too noisy now that 'bup
save' and 'bup server' were running it automatically every now and then.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
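
The level check behind debug1() and debug2() can be sketched as below. The function names and the BUP_DEBUG variable come from the message above; the 'out' parameter and the choice to read the environment on every call are assumptions made to keep the example testable.

```python
import os
import sys


def debug_level():
    """Read BUP_DEBUG from the environment, treating junk as 0."""
    try:
        return int(os.environ.get('BUP_DEBUG', '0'))
    except ValueError:
        return 0


def debug1(msg, out=None):
    """Print msg when BUP_DEBUG >= 1."""
    if debug_level() >= 1:
        (out if out is not None else sys.stderr).write(msg)


def debug2(msg, out=None):
    """Print msg when BUP_DEBUG >= 2."""
    if debug_level() >= 2:
        (out if out is not None else sys.stderr).write(msg)
```

Each -D on the command line would raise the level by one before subcommands are started, so child processes inherit it through the environment.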
13 years ago client.py,git.py: run 'bup midx -a' automatically sometimes.
Avery Pennarun [Tue, 7 Sep 2010 00:14:48 +0000 (17:14 -0700)]
client.py,git.py: run 'bup midx -a' automatically sometimes.

Now that 'bup midx -a' is smarter, we should run it automatically after
creating a new index file.  This should remove the need for running it by
hand.

Thus, we also remove 'bup midx' from the lists of commonly-used subcommands.
(While we're here, let's take out 'split' and 'join' too; you should be
using 'index' and 'save' most of the time.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Rename 'bup rbackup' to 'bup on'
Avery Pennarun [Mon, 6 Sep 2010 23:18:14 +0000 (16:18 -0700)]
Rename 'bup rbackup' to 'bup on'

'rbackup' was a dumb name but I couldn't think of anything better at the
time.  This works nicely in a grammatical sort of way:

   bup on myserver save -n myserver-backup /etc

Now that we've settled on a name, also add some documentation for the
command.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago rot13 the t/testfile* sample data files.
Avery Pennarun [Mon, 6 Sep 2010 22:48:35 +0000 (15:48 -0700)]
rot13 the t/testfile* sample data files.

They were generated by catting bunches of bup source code together, which,
as it turns out, makes 'git grep' super annoying.  Let's rot13 them so
grepping doesn't do anything interesting but the other characteristics are
the same.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
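
rot13 keeps the length and character distribution of the text while making source-like strings unrecognisable to grep. A minimal illustration using the standard library codec; the wrapper name rot13() is just for the example.

```python
import codecs


def rot13(text):
    """Rotate ASCII letters by 13 places; everything else is unchanged."""
    return codecs.encode(text, 'rot_13')
```

Applying it twice returns the original text, so the sample files can be recovered if ever needed.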
13 years ago cmd/midx: --auto mode can combine existing midx files now.
Avery Pennarun [Thu, 2 Sep 2010 20:12:12 +0000 (13:12 -0700)]
cmd/midx: --auto mode can combine existing midx files now.

Previously, --auto would *only* create a midx from not-already-midxed .idx
files.  This wasn't optimal since you'd eventually end up with a tonne of
.midx files, which is just as bad as a tonne of .idx files.

Now we'll try to maintain a maximum number of midx files using a
highwater/lowwater mark.  That means the number of active midx files should
now stay between 2 and 5, and you can run 'bup midx -a' as often as you
want.

'bup midx -f' will still make sure everything is in a single .midx file,
which is an efficient thing to run every now and then.

'bup midx -af' is the same, but uses existing midx files rather than forcing
bup to start from only .idx files.  Theoretically this should always be
faster than, and never be worse than, 'bup midx -f'.

Bonus: 'bup midx -a' now works when there's a limited number of file
descriptors.  The previous fix only worked properly with 'bup midx -f'.
(This was rarely a problem since 'bup midx -a' would only ever touch the
last few .idx files, so it didn't need many file descriptors.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
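
A highwater/lowwater policy of this kind can be sketched as follows. Only the 2 and 5 limits come from the message above; the choice to merge the smallest files first, and the function name plan_merge(), are assumptions and may differ from what cmd/midx actually does.

```python
LOW_WATER = 2
HIGH_WATER = 5


def plan_merge(sizes):
    """Given midx file sizes, return (kept, merged) lists of sizes.

    Nothing is merged until the count exceeds HIGH_WATER; then the
    smallest files are combined so that LOW_WATER files remain.
    """
    ordered = sorted(sizes, reverse=True)
    if len(ordered) <= HIGH_WATER:
        return ordered, []
    # Keep the largest LOW_WATER - 1 files; the rest become one new file.
    keep = ordered[:LOW_WATER - 1]
    merge = ordered[LOW_WATER - 1:]
    return keep, merge
```

Because work only happens when the high mark is crossed, running the planner often is cheap, which matches the intent that 'bup midx -a' can be run as frequently as desired.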
13 years ago Merge branch 'maint'
Avery Pennarun [Mon, 6 Sep 2010 10:08:19 +0000 (03:08 -0700)]
Merge branch 'maint'

* maint:
  cmd/midx: use getrlimit() to find the max open files.

13 years ago cmd/midx: use getrlimit() to find the max open files.
Avery Pennarun [Mon, 6 Sep 2010 08:35:07 +0000 (01:35 -0700)]
cmd/midx: use getrlimit() to find the max open files.

It turns out the default file limit on MacOS is 256, which is less than our
default of 500.  I guess this means trouble after all, so let's auto-detect
it.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
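
Auto-detecting the open-file limit is possible from Python through the resource module on Unix-like systems. In this sketch the function name max_open_files() and the 'reserve' margin are assumptions; the default of 500 mirrors the figure mentioned above.

```python
import resource


def max_open_files(default=500, reserve=10):
    """Return how many files we may open at once, based on RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # No limit reported: fall back to the fixed default.
        return default
    # Keep a few descriptors free for stdio, the output file and so on.
    return max(1, soft - reserve)
```

On a system with a soft limit of 256 this yields 246 rather than the fixed 500 that caused trouble.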
13 years ago Merge branch 'maint'
Avery Pennarun [Mon, 6 Sep 2010 07:52:54 +0000 (00:52 -0700)]
Merge branch 'maint'

* maint:
  index.py: handle uid/gid == -1 on cygwin
  cmd/memtest: use getrusage() instead of /proc/self/stat.
  cmd/index: catch exception for paths that don't exist.
  Don't use $(wildcard) during 'make install'.
  Don't forget to install _helpers.dll on cygwin.

13 years ago cmd/margin: interpret the meaning of the margin bits.
Avery Pennarun [Sun, 5 Sep 2010 18:59:39 +0000 (11:59 -0700)]
cmd/margin: interpret the meaning of the margin bits.

Maybe you were wondering how good it is when 'bup margin' returns 40 or 45.
Well, now it'll tell you.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago index.py: handle uid/gid == -1 on cygwin bup-0.17b
Avery Pennarun [Mon, 6 Sep 2010 07:32:59 +0000 (00:32 -0700)]
index.py: handle uid/gid == -1 on cygwin

On cygwin, the uid or gid might be -1 for some reason.  struct.pack()
complains about a DeprecationWarning when packing a negative number into an
unsigned int, so fix it up first.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/memtest: use getrusage() instead of /proc/self/stat.
Avery Pennarun [Mon, 6 Sep 2010 04:52:10 +0000 (21:52 -0700)]
cmd/memtest: use getrusage() instead of /proc/self/stat.

Only Linux has /proc/self/stat, so 'bup memtest' didn't work on anything
except Linux.  Unfortunately, getrusage() on *Linux* doesn't have a valid
RSS field (sigh), so we have to use /proc/self/stat as a fallback if it's
zero.

Now memtest works on MacOS as well, which means 'make test' passes again.
(It stopped passing because 'bup memtest' recently got added to one of the
tests.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/index: catch exception for paths that don't exist.
David Roda [Tue, 31 Aug 2010 22:25:34 +0000 (18:25 -0400)]
cmd/index: catch exception for paths that don't exist.

Rather than aborting completely if a path specified on the command line
doesn't exist, report it as a non-fatal error instead.

(Heavily modified by apenwarr from David Roda's original patch.)

Signed-off-by: David Roda <davidcroda@gmail.com>
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Documentation/*.md: add some options that we forgot to document.
Avery Pennarun [Sun, 5 Sep 2010 18:44:41 +0000 (11:44 -0700)]
Documentation/*.md: add some options that we forgot to document.

Software evolves, but documentation evolves... slower.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Rename Documentation/*.1.md to Documentation/*.md
Avery Pennarun [Sun, 5 Sep 2010 18:04:42 +0000 (11:04 -0700)]
Rename Documentation/*.1.md to Documentation/*.md

All our man pages end up in section 1 of man anyway, and it looks like that
will probably never change.  So let's make our filenames simpler and easier
to understand.

Even if we do end up adding a page in (say) section 5 someday, it's no big
deal; we can just add an exception to the Makefile for it or something.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Don't use $(wildcard) during 'make install'. bup-0.17a
Avery Pennarun [Sat, 4 Sep 2010 23:42:47 +0000 (16:42 -0700)]
Don't use $(wildcard) during 'make install'.

It seems the $(wildcard) is evaluated once at make's startup, so any changes
made *during* build don't get noticed.

That means 'make install' would fail if you ran it without first running
'make all', because $(wildcard cmd/bup-*) wouldn't match anything at startup
time; the files we were copying only got created during the build.

Problem reported by David Roda.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Don't forget to install _helpers.dll on cygwin.
Avery Pennarun [Sat, 4 Sep 2010 23:33:25 +0000 (16:33 -0700)]
Don't forget to install _helpers.dll on cygwin.

We were installing *.so, but not *$(SOEXT) like we should have.  Now we do,
which should fix some cygwin install problems reported by David Roda.

Also, when installing *.so and *.dll files, make them 0755 instead of 0644,
also to prevent permissions problems on cygwin, also reported by David Roda.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Merge branch 'maint'
Avery Pennarun [Thu, 2 Sep 2010 21:33:04 +0000 (14:33 -0700)]
Merge branch 'maint'

* maint:
  git.py: recover more elegantly if a MIDX file has the wrong version.
  cmd/midx: add a new --max-files parameter.

Conflicts:
lib/bup/git.py

13 years ago Merge branch 'guesser'
Avery Pennarun [Thu, 2 Sep 2010 21:29:37 +0000 (14:29 -0700)]
Merge branch 'guesser'

* guesser:
  _helpers.extract_bits(): rewrite git.extract_bits() in C.
  _helpers.firstword(): a new function to extract the first 32 bits.
  git.py: when seeking inside a midx, use statistical guessing.

13 years ago git.py: recover more elegantly if a MIDX file has the wrong version.
Avery Pennarun [Thu, 2 Sep 2010 21:26:25 +0000 (14:26 -0700)]
git.py: recover more elegantly if a MIDX file has the wrong version.

Previously we'd throw an assertion for any too-new-format MIDX file, which
isn't so good.  Let's recover more politely (and just ignore the file in
question) if that happens.
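
A minimal sketch of this kind of recovery, assuming (hypothetically) that a midx file starts with a 4-byte 'MIDX' magic followed by a 32-bit big-endian version number. MIDX_VERSION and load_midx() are names made up for this illustration, not taken from git.py:

```python
import struct
import sys

MIDX_VERSION = 2  # hypothetical: the only version this reader understands


def load_midx(name, data):
    """Return the midx payload, or None if the file should be ignored."""
    if len(data) < 8 or data[:4] != b'MIDX':
        sys.stderr.write('Warning: skipping %r: not a midx file\n' % name)
        return None
    version = struct.unpack('!I', data[4:8])[0]
    if version != MIDX_VERSION:
        # Warn and skip rather than assert, so one odd file is not fatal.
        sys.stderr.write('Warning: skipping %r: version %d not supported\n'
                         % (name, version))
        return None
    return data[8:]
```

A caller would simply drop any file for which load_midx() returns None and carry on with the remaining indexes.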

Noticed by Zoran Zaric who was testing my midx3 branch.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/midx: add a new --max-files parameter.
Avery Pennarun [Thu, 2 Sep 2010 20:16:39 +0000 (13:16 -0700)]
cmd/midx: add a new --max-files parameter.

Zoran reported that 'bup midx -f' on his system tried to open 3000 files at
a time and wouldn't work.  That's no good, so let's limit the maximum files
to open; the default is 500 for now, since that ought to be usable for
normal people.  Arguably we could use getrlimit() or something to find out
the actual maximum, or just keep opening stuff until we get an error, but
maybe there's no point.
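
As a rough sketch of the getrlimit() idea, Python's standard resource module (Unix only) can report the per-process limit on open files. The helper names and the reserve of 32 descriptors are assumptions for illustration, not part of this patch:

```python
import resource


def guess_max_files(default=500, reserve=32):
    """Pick a file limit from RLIMIT_NOFILE, falling back to a default."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return default
    # Leave some descriptors free for stdin/stdout, pipes, and the like.
    return max(1, soft - reserve)


def batches(names, max_files):
    """Yield successive groups of at most max_files names."""
    for start in range(0, len(names), max_files):
        yield names[start:start + max_files]
```

With 3000 index files and a limit of 500, batches() would yield six groups.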

Unfortunately this patch isn't really perfect, because it limits the
usefulness of midx files.  If you could merge midx files into other midx
files, then you could at least group them all together after multiple runs,
but that's not currently supported.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago _helpers.extract_bits(): rewrite git.extract_bits() in C.
Avery Pennarun [Fri, 27 Aug 2010 03:19:49 +0000 (20:19 -0700)]
_helpers.extract_bits(): rewrite git.extract_bits() in C.

That makes our memtest run just slightly faster: 2.8 seconds instead of 3.0
seconds, which catches us back up with the pre-interpolation-search code.
Thus we should now be able to release this patch without feeling embarrassed
:)
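
For reference, the operation being moved into C can be sketched in pure Python. This assumes the function takes a binary hash plus a bit count of at most 32 and reads the hash in big-endian order; the signature is an illustration rather than a copy of the real one:

```python
import struct


def extract_bits(buf, nbits):
    """Return the first nbits bits of buf as an integer (nbits <= 32)."""
    first32 = struct.unpack('!I', buf[:4])[0]
    # Keep only the top nbits of the leading 32-bit word.
    return first32 >> (32 - nbits)
```

For example, extract_bits(b'\x12\x34\x56\x78', 16) gives 0x1234.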

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago _helpers.firstword(): a new function to extract the first 32 bits.
Avery Pennarun [Fri, 27 Aug 2010 03:11:45 +0000 (20:11 -0700)]
_helpers.firstword(): a new function to extract the first 32 bits.

This is a pretty common operation in git.py and it speeds up cmd/memtest
results considerably: from 3.7 seconds to 3.0 seconds.

That gets us *almost* as fast as we were before the whole statistical
guessing thing, but we still enjoy the improved memory usage.
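
A pure-Python equivalent of what such a helper computes might look like this. It is only a sketch: big-endian byte order is an assumption, chosen because it makes integer order match the byte order of sorted hashes:

```python
import struct


def firstword(buf):
    """Return the first 32 bits of buf as an unsigned integer."""
    return struct.unpack('!I', buf[:4])[0]
```

So firstword(b'\x00\x00\x01\x00' + rest) is 256 regardless of what follows the first four bytes.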

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago git.py: when seeking inside a midx, use statistical guessing.
Avery Pennarun [Fri, 27 Aug 2010 02:31:24 +0000 (19:31 -0700)]
git.py: when seeking inside a midx, use statistical guessing.

Instead of using a pure binary search (where we seek to the middle of the
area and do a greater/lesser comparison) we now use an "interpolation
search" (http://en.wikipedia.org/wiki/Interpolation_search), which means we
seek to where we statistically *expect* the desired value to be.

With my test data, this reduces the typical number of search steps in the
midx from 8.7 steps/object to 4.8 steps/object.
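
To illustrate the technique, here is a small interpolation search over a sorted list of integers that also counts its steps. It is a sketch of the general algorithm, not the midx code itself, which works on sha1 prefixes; the function name and return shape are made up for the example:

```python
def interpolation_search(values, want):
    """Search a sorted list; return (index or None, steps taken)."""
    lo, hi = 0, len(values) - 1
    steps = 0
    while lo <= hi and values[lo] <= want <= values[hi]:
        steps += 1
        if values[hi] == values[lo]:
            mid = lo
        else:
            # Guess the position by assuming values are spread evenly.
            mid = lo + (want - values[lo]) * (hi - lo) // (values[hi] - values[lo])
        v = values[mid]
        if v == want:
            return mid, steps
        if v < want:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, steps
```

On perfectly uniform data the first guess lands on the target; the more evenly the keys are distributed (as hash values tend to be), the fewer steps are needed compared with halving the range each time.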

This reduces memory churn when using a midx: a given search region can span
two pages, and this technique sometimes lets us eliminate one of the two
more quickly, so we dirty one fewer page.

Unfortunately the implementation requires some futzing, so this actually
makes memtest run about 35% *slower*.  Will try to fix that next.

The original link to this algorithm came from this article:
http://sna-projects.com/blog/2010/06/beating-binary-search/

Thanks, article!

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago git.py: keep statistics on how much sha1 searching we had to do.
Avery Pennarun [Fri, 27 Aug 2010 01:29:36 +0000 (18:29 -0700)]
git.py: keep statistics on how much sha1 searching we had to do.

And cmd/memtest prints out the results.  Unfortunately this slows down
memtest runs by 0.126/2.526 = 5% or so.  Yuck.  Well, we can take it out
later.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/memtest: add a --existing option to test with existing objects.
Avery Pennarun [Fri, 27 Aug 2010 02:16:34 +0000 (19:16 -0700)]
cmd/memtest: add a --existing option to test with existing objects.

This is useful for testing behaviour when we're looking for objects
that *do* exist.  Of course, it just goes through the objects in order, so
it's not actually that realistic.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/midx: fix SHA_PER_PAGE calculation.
Avery Pennarun [Thu, 26 Aug 2010 04:06:46 +0000 (21:06 -0700)]
cmd/midx: fix SHA_PER_PAGE calculation.

For some reason we were dividing by 200 instead of by 20, which was way off.
Switch to 20 instead.  Suspiciously, this makes memory usage slightly worse
in my current (smallish) set of test data, so we might need to revert it
later...?  But if we're going to have an adjustment, we should at least make
it clear what for, rather than hiding it in something that looks
suspiciously like a typo.
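
The arithmetic is easy to check. This sketch assumes a 4096-byte page (the real page size depends on the platform) and that the divisor 20 is the size of a binary SHA-1:

```python
PAGE_SIZE = 4096   # assumption: a common page size, in bytes
SHA_SIZE = 20      # a binary SHA-1 is 20 bytes long

sha_per_page = PAGE_SIZE / SHA_SIZE   # 204.8 hashes fit in one page
old_value = PAGE_SIZE / 200           # 20.48, ten times too small

print(sha_per_page, old_value)
```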

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/margin: add a new --predict option.
Avery Pennarun [Thu, 26 Aug 2010 03:40:34 +0000 (20:40 -0700)]
cmd/margin: add a new --predict option.

When --predict is given, it tries to guess the offset in the indexfile of
each hash, based on the assumption that the hashes are distributed evenly
throughout the file.  Then it prints the maximum amount by which this guess
deviates from reality.

I was hoping the results would show that the maximum deviation in a typical
midx was less than a page's worth of hashes; that would mean the toplevel
lookup table could be redundant, which means fewer pages hit in the
common case.  No such luck, unfortunately; with 1.6 million objects, my
maximum deviation was 913 hashes (about 18 kbytes, or 5 pages).

By comparison, midx files should hit about 2 pages in the common case (1
lookup table + 1 data page).  Or 3 pages if we're unlucky and the search
spans two data pages.
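
The measurement described above can be sketched as follows. This is an illustration under assumptions, not the cmd/margin code: it predicts each hash's position from its first 32 bits and reports the worst miss, using generated SHA-1 values in place of a real index:

```python
import hashlib
import struct


def max_prediction_error(shas):
    """shas: sorted list of binary hashes. Return the worst guess error."""
    n = len(shas)
    worst = 0
    for actual, sha in enumerate(shas):
        prefix = struct.unpack('!I', sha[:4])[0]
        # If hashes were spread perfectly evenly, this would be the index.
        guess = prefix * n // 2**32
        worst = max(worst, abs(guess - actual))
    return worst


# Stand-in data: 10000 generated hashes instead of a real index file.
shas = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(10000))
worst = max_prediction_error(shas)
print('max deviation: %d hashes (%d bytes)' % (worst, worst * 20))
```

For scale, 913 hashes at 20 bytes each is 18260 bytes, which is a little under 4.5 pages if a page is 4096 bytes, hence the rounded figure of 5 pages.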

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/memtest: print per-cycle and total times.
Avery Pennarun [Thu, 26 Aug 2010 03:06:16 +0000 (20:06 -0700)]
cmd/memtest: print per-cycle and total times.

This makes it easier to compare output from other people or between
machines, and also gives a clue as to swappiness.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago Rename _faster.so to _helpers.so. bup-0.17
Avery Pennarun [Mon, 23 Aug 2010 03:27:03 +0000 (20:27 -0700)]
Rename _faster.so to _helpers.so.

Okay, _faster.so wasn't a good choice of names.  Partly because not
everything in there is just to make stuff faster, and partly because some
*proposed* changes to it don't just make stuff faster.  So let's rename it
one more time.  Hopefully the last time for a while!

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago lib/bup/ssh: Add docstrings
Gabriel Filion [Mon, 16 Aug 2010 01:29:34 +0000 (21:29 -0400)]
lib/bup/ssh: Add docstrings

Document the code with docstrings.

Also add an "import sys" line since it is used by sys.argv[0] on line 6.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago lib/bup/options: Add docstrings
Gabriel Filion [Mon, 16 Aug 2010 01:29:33 +0000 (21:29 -0400)]
lib/bup/options: Add docstrings

Document the code with docstrings.

Use one line per imported module as recommended by PEP 8 to make it
easier to spot unused modules.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago import cleanup
Gabriel Filion [Mon, 16 Aug 2010 01:29:29 +0000 (21:29 -0400)]
import cleanup

Remove unused imported modules.

I started using the pyflakes.vim plugin and it automagically shows a
bunch of problems/uncleanliness in the code. It helped me pull this out
in 15 minutes.

This change shouldn't have any impact on performance or functionality
but it makes the code cleaner.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago cmd/ftp: don't die if we can't import the ctypes module.
Avery Pennarun [Sun, 22 Aug 2010 06:44:49 +0000 (23:44 -0700)]
cmd/ftp: don't die if we can't import the ctypes module.

It's only needed on some rare broken versions of readline anyway.  If we
can't find the module, chances are the system doesn't have that broken
version of readline.
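
The usual shape of such an optional import is shown below. It is a generic sketch; the helper name is hypothetical and the actual readline workaround is not reproduced here:

```python
try:
    import ctypes
except ImportError:
    # No ctypes: assume the readline workaround is not needed and carry on.
    ctypes = None


def can_apply_readline_workaround():
    """True only when ctypes was importable on this system."""
    return ctypes is not None
```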

Based on suggestions by Gabriel Filion and Aaron Ucko.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago lib/bup/vfs: bring back Python 2.4 support
Gabriel Filion [Fri, 20 Aug 2010 06:24:57 +0000 (02:24 -0400)]
lib/bup/vfs: bring back Python 2.4 support

There is currently one test failure when running tests against Python
2.4: a try..except..finally block that's interpreted as a syntax error.
The commit introducing this incompatibility with 2.4 is f77a0829.

This is a well-known Python 2.4 limitation and the workaround, although
ugly, is easy.
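
The standard workaround is to nest a try..except inside a try..finally, since the combined form only became valid syntax in Python 2.5. A minimal sketch, with a made-up function that is not taken from vfs.py:

```python
def read_or_none(path):
    """Return the file's contents, or None if it cannot be read."""
    f = None
    try:
        try:
            f = open(path, 'rb')
            return f.read()
        except IOError:
            return None
    finally:
        # Runs whether the inner block returned normally or hit the except.
        if f is not None:
            f.close()
```

The behaviour matches a single try..except..finally block; only the extra level of indentation differs.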

With this test passing, Python 2.4 support is back.

Signed-off-by: Gabriel Filion <lelutin@gmail.com>
13 years ago lib/bup/vfs: Add docstrings
Gabriel Filion [Mon, 2 Aug 2010 06:20:06 +0000 (02:20 -0400)]
lib/bup/vfs: Add docstrings

Since the vfs module uses the function git._treeparse, it should not be
named as if it was a private function. Rename git._treeparse to
git.treeparse and document it (add a docstring to it).

Also, transform _ChunkReader, _FileReader and Node into new-style
classes.

Finally, remove trailing spaces from lib/bup/vfs.py.

13 years ago DESIGN: update mentions of stupidsum to reflect new rollsum algorithm.
Avery Pennarun [Mon, 2 Aug 2010 03:01:56 +0000 (23:01 -0400)]
DESIGN: update mentions of stupidsum to reflect new rollsum algorithm.

Pointed out by Gabriel Filion.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago README: typo.
Avery Pennarun [Sun, 1 Aug 2010 15:18:23 +0000 (11:18 -0400)]
README: typo.

Noticed by Zoran Zaric.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/save: update the progress meter less often.
Avery Pennarun [Sat, 31 Jul 2010 06:33:38 +0000 (02:33 -0400)]
cmd/save: update the progress meter less often.

If you ran 'bup save' in an ssh session, you could end up sending huge
amounts of data back over ssh *just* to update the progress meter after
every single block!  Oops.  Limit the updates to only about 5 per second,
which is much better.
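
A throttled progress meter along these lines can be sketched as below. The class and parameter names are hypothetical, and this is not the code from cmd/save; the clock and output stream are injectable only to make the sketch easy to test:

```python
import sys
import time


class ProgressMeter:
    """Write progress messages at most max_per_second times per second."""

    def __init__(self, max_per_second=5, clock=time.time, out=sys.stderr):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.out = out
        self.last = None

    def update(self, message):
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval:
            return False  # too soon since the last write: skip this update
        self.last = now
        self.out.write(message + '\r')
        return True
```

Calling update() after every block is then cheap: most calls return immediately without writing anything to the terminal or the ssh connection.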

13 years ago Rename _hashsplit.so to _faster.so, and move bupsplit into its own source file.
Avery Pennarun [Sat, 31 Jul 2010 00:23:08 +0000 (20:23 -0400)]
Rename _hashsplit.so to _faster.so, and move bupsplit into its own source file.

A lot of stuff in _hashsplit.c wasn't actually about hashsplitting; it was
just a catch-all for all our C accelerator functions.  Now the module name
reflects that.

Also move the bupsplit functions into their own non-python-dependent C
source file so they can be used as part of other projects.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago test.sh: check the return code of 'bup random'
Avery Pennarun [Sat, 31 Jul 2010 00:17:15 +0000 (20:17 -0400)]
test.sh: check the return code of 'bup random'

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago cmd/{random,memtest}: use the new options.py default value support.
Avery Pennarun [Wed, 28 Jul 2010 06:37:38 +0000 (02:37 -0400)]
cmd/{random,memtest}: use the new options.py default value support.

13 years ago options.py: support for putting default values in [square brackets].
Avery Pennarun [Fri, 16 Jul 2010 06:45:33 +0000 (02:45 -0400)]
options.py: support for putting default values in [square brackets].

This looks good in the usage message, and is a better place to hardcode such
things than in the code itself.
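
One way to pull a default out of a help string is sketched here. The regular expression, the function name, and the sample help text are assumptions for illustration, not the actual options.py parser:

```python
import re


def split_default(helptext):
    """Return (description, default), where default is None if absent."""
    m = re.search(r'\[([^\]]*)\]\s*$', helptext)
    if not m:
        return helptext.strip(), None
    # The default is whatever sits inside the trailing square brackets.
    return helptext[:m.start()].strip(), m.group(1)
```

For instance, split_default('number of cycles to run [100]') returns ('number of cycles to run', '100'), while a help string without brackets yields a default of None.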

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago _hashsplit.c: get rid of some warnings indicated by a C++ compiler.
Avery Pennarun [Tue, 27 Jul 2010 07:05:55 +0000 (03:05 -0400)]
_hashsplit.c: get rid of some warnings indicated by a C++ compiler.

Not hugely important, but might as well fix them.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago _hashsplit.c: replace the stupidsum algorithm with rsync's adler32-based one.
Avery Pennarun [Tue, 27 Jul 2010 05:27:54 +0000 (01:27 -0400)]
_hashsplit.c: replace the stupidsum algorithm with rsync's adler32-based one.

I've been meaning to do this for a while, but a particular test dataset that
really caused problems with stupidsum() (i.e. it split things into way more
chunks than it should have) finally screwed me over.  Let's change over to a
"real" checksum algorithm.

Non-annoying datasets shouldn't be noticeably affected, but bad ones (such
as my test case from EQL Data) can be 10x more sensible.  Typical backup
sets now have about 20% fewer chunks, although this has little effect on the
overall repository size.

WARNING: After this patch, all your chunk boundaries will be different from
before!  That means your incremental backups won't be terribly incremental
and your backup repositories will jump in size.  This should only happen
once.
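
To show the general idea of an rsync-style rolling checksum driving chunk boundaries, here is a small Python sketch. It is not the code from _hashsplit.c: the window size of 64, the character offset of 31, the 13-bit boundary mask, and the choice to test the s2 half of the sum are all assumptions made for the illustration:

```python
WINDOW = 64      # assumption: bytes covered by the rolling window
OFFSET = 31      # assumption: constant added to every byte
MASK16 = 0xffff


class RollSum:
    """Two 16-bit sums over the last WINDOW bytes, updated in O(1)."""

    def __init__(self):
        self.window = bytearray(WINDOW)   # starts out as all zero bytes
        self.pos = 0
        self.s1 = (WINDOW * OFFSET) & MASK16
        self.s2 = (OFFSET * WINDOW * (WINDOW + 1) // 2) & MASK16

    def roll(self, byte):
        drop = self.window[self.pos]      # oldest byte leaves the window
        self.window[self.pos] = byte
        self.pos = (self.pos + 1) % WINDOW
        self.s1 = (self.s1 + byte - drop) & MASK16
        self.s2 = (self.s2 + self.s1 - WINDOW * (drop + OFFSET)) & MASK16

    def digest(self):
        return (self.s1 << 16) | self.s2


def direct_sum(window_bytes):
    """Recompute the same checksum from scratch (oldest byte first)."""
    s1 = sum(b + OFFSET for b in window_bytes) & MASK16
    s2 = sum((WINDOW - i) * (b + OFFSET)
             for i, b in enumerate(window_bytes)) & MASK16
    return (s1 << 16) | s2


def split_offsets(data, bits=13):
    """Yield the offset after each byte where the low bits are all ones."""
    mask = (1 << bits) - 1
    r = RollSum()
    for i, byte in enumerate(bytearray(data)):
        r.roll(byte)
        if (r.s2 & mask) == mask:
            yield i + 1
```

Because a boundary depends only on the last WINDOW bytes, inserting data early in a file moves nearby boundaries but leaves later chunks identical. It also shows why changing the checksum function changes every boundary at once.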

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago _hashsplit.c: switch rollsum_roll() to a macro instead of an inline function.
Avery Pennarun [Tue, 27 Jul 2010 07:06:26 +0000 (03:06 -0400)]
_hashsplit.c: switch rollsum_roll() to a macro instead of an inline function.

gcc 4.3's optimizer manages to fail at optimizing the inline, but works okay
with the macro.

Mysteriously, if find_ofs() is *not* static (and therefore presumably
*harder* to optimize), the optimizer works either way.  But removing the
static is just wrong, so use the macro instead.

The difference in speed is about 53 megs/sec vs 80 megs/sec on my machine
for this command:

bup random 100M 2>/dev/null | bup split -N --bench

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago _hashsplit.c: refactor a bit, and add a self-test.
Avery Pennarun [Tue, 27 Jul 2010 04:49:20 +0000 (00:49 -0400)]
_hashsplit.c: refactor a bit, and add a self-test.

In preparation for replacing the stupidsum algorithm with the rsync
adler32-based one.

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
13 years ago make clean: remove some leftover files.
Avery Pennarun [Wed, 28 Jul 2010 04:43:11 +0000 (00:43 -0400)]
make clean: remove some leftover files.

Stuff has moved around a bit recently, and we weren't cleaning up everything
like we should.