arthur.barton.de Git - bup.git/log

commit | commitdiff | tree

Avery Pennarun [Mon, 11 Jan 2010 20:41:01 +0000 (15:41 -0500)]

Update the README to reflect recent changes.

commit | commitdiff | tree

Avery Pennarun [Mon, 11 Jan 2010 20:18:35 +0000 (15:18 -0500)]

Merge branch 'cygwin'

* cygwin:
  Assorted cleanups to Luke's cygwin fixes.
  Makefile: work with cygwin on different windows versions.
  .gitignore sanity.
  Makefile:  On Windows, executable files must end with .exe.
  client.py:  Windows files don't support ':', so rename cachedir.
  index.py:  os.rename() fails on Windows if dstfile already exists.
  Don't try to rename tmpfiles into existing open files.
  helpers.py:  Cygwin doesn't support `hostname -f`, use `hostname`.
  cmd-index.py:  Retry os.open without O_LARGEFILE if not supported.
  Makefile:  Build on Windows under Cygwin.

commit | commitdiff | tree

Avery Pennarun [Mon, 11 Jan 2010 20:06:03 +0000 (15:06 -0500)]

Assorted cleanups to Luke's cygwin fixes.

There were a few things that weren't quite done how I would have done them,
so I changed the implementation.  Should still work in cygwin, though.

The only actual functional changes are:
- index.Reader.close() now actually sets m=None rather than just closing it
- removed the "if rename fails, then unlink first" logic, which is
   seemingly not needed after all.
- rather than special-casing cygwin to use "hostname" instead of "hostname
   -f", it turns out python has a socket.getfqdn() that does what we want.
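The last item above can be sketched as follows. This is only an illustration using the standard library; `local_hostname` is a hypothetical name, not necessarily the helper that helpers.py defines.

```python
import socket


def local_hostname():
    # Hypothetical helper. socket.getfqdn() asks the resolver for a fully
    # qualified name and falls back to the plain hostname, so there is no
    # need to shell out to `hostname -f` (which Cygwin lacks).
    return socket.getfqdn()
```

Because no subprocess is involved, the same call works on Linux, MacOS X and Cygwin alike.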

commit | commitdiff | tree

Avery Pennarun [Mon, 11 Jan 2010 19:57:23 +0000 (14:57 -0500)]

Makefile: work with cygwin on different windows versions.

Just check the CYGWIN part; don't depend on the fact that it's NT 5.1. (Of
course, uname isn't supposed to report such things by default anyway... but
that's cygwin for you.)

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 09:38:32 +0000 (04:38 -0500)]

Merge branch 'master' of git://github.com/apenwarr/bup

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 09:19:33 +0000 (04:19 -0500)]

.gitignore sanity.

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 09:18:49 +0000 (04:18 -0500)]

Makefile: On Windows, executable files must end with .exe.

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 09:15:07 +0000 (04:15 -0500)]

client.py: Windows files don't support ':', so rename cachedir.

Cachedir was previously $host:$dir, and is now $host-$dir.
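A rough sketch of the naming change, assuming a hypothetical helper called `cachedir_name`. Replacing the remaining unsafe characters with '_' is an extra assumption of this example, not something the commit message states.

```python
import re


def cachedir_name(host, path):
    # Hypothetical helper: join host and dir with '-' instead of ':',
    # since ':' is not allowed in Windows filenames.
    name = '%s-%s' % (host, path)
    # Assumption: anything other than word characters, '@', '.' and '-'
    # (for example '/') is replaced with '_'.
    return re.sub(r'[^@\w.-]', '_', name)
```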

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 09:07:10 +0000 (04:07 -0500)]

index.py: os.rename() fails on Windows if dstfile already exists.

Hence, we perform an os.unlink on the dstfile if os.rename() receives
an OSError exception, and try again.
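The retry described above might look roughly like this; `replace_file` is a hypothetical name and the real code in index.py may differ.

```python
import os


def replace_file(src, dst):
    # Hypothetical helper illustrating the unlink-and-retry approach.
    try:
        os.rename(src, dst)
    except OSError:
        # On Windows, rename() refuses to overwrite an existing file,
        # so remove the destination and try once more.
        os.unlink(dst)
        os.rename(src, dst)
```

Note that the fallback is not atomic: there is a short window in which dst does not exist. On Python 3.3 and later, os.replace() overwrites the destination on both POSIX and Windows.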

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 09:04:17 +0000 (04:04 -0500)]

Don't try to rename tmpfiles into existing open files.

Linux and friends have no problem with this, but Windows doesn't allow
this without some effort, which we can avoid by... not needing to write
to an already-open file.

Give index.Reader a 'close' method which identifies and closes any open
mmapped files, and make cmd-index.py use this before trying to close an
index.Writer instance (which renames a tmpfile into the same file the
Reader has mmapped).

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 08:59:20 +0000 (03:59 -0500)]

helpers.py: Cygwin doesn't support `hostname -f`, use `hostname`.

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 08:57:42 +0000 (03:57 -0500)]

cmd-index.py: Retry os.open without O_LARGEFILE if not supported.

Python under Cygwin doesn't have os.O_LARGEFILE, so if we receive an
'AttributeError' exception trying to open something, just remove
O_LARGEFILE and try again.
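A minimal sketch of the fallback, assuming a hypothetical `open_readonly` helper. It probes for the flag up front instead of retrying the open, which has the same effect.

```python
import os


def open_readonly(path):
    # Hypothetical helper. Start with the flags every platform has.
    flags = os.O_RDONLY
    try:
        flags |= os.O_LARGEFILE
    except AttributeError:
        # Python under Cygwin has no os.O_LARGEFILE; open without it.
        pass
    return os.open(path, flags)
```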

commit | commitdiff | tree

Lukasz Kosewski [Sun, 10 Jan 2010 08:52:52 +0000 (03:52 -0500)]

Makefile:  Build on Windows under Cygwin.

- Python modules have to end with .dll instead of .so to load into Python
  via 'import'.
- GCC under Windows builds all programs with -fPIC, and doesn't accept
  this command-line option.
- libpython2.5.dll is found in /usr/bin under Cygwin (wtf?), so we need
  to add this to the LDFLAGS path.
- 'make clean' should remove .dll files too.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 07:12:47 +0000 (02:12 -0500)]

Oops, 'bup save /' produced an invalid tree.

Add a bunch of assertions to make sure that never happens.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 06:13:10 +0000 (01:13 -0500)]

This adds the long-awaited indexfile feature, so you no longer have to feed
your backups through tar.

Okay, 'bup save' is still a bit weak... but it could be much worse.

Merge branch 'indexfile'

* indexfile:
  Minor fix for python 2.4.4 compatibility.
  cmd-save: completely reimplement using the indexfile.
  Moved some reusable index-handling code from cmd-index.py to index.py.
  A bunch of wvtests for the 'bup index' command.
  Start using wvtest.sh for shell-based tests in test-sh.
  cmd-index: default indexfile path is ~/.bup/bupindex, not $PWD/index
  cmd-index: skip merging the index if nothing was written to the new one.
  cmd-index: only update if -u is given; print only given file/dirnames.
  cmd-index: correct reporting of deleted vs. added vs. modified status.
  Generalize the multi-index-walking code.
  cmd-index: indexfiles should start with a well-known header.
  cmd-index: eliminate redundant paths from index update command.
  cmd-index: some handy options.
  index: add --xdev (--one-file-system) option.
  Fix some bugs with indexing '/'
  cmd-index: basic index reader/writer/merger.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 05:56:58 +0000 (00:56 -0500)]

Minor fix for python 2.4.4 compatibility.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 03:43:48 +0000 (22:43 -0500)]

cmd-save: completely reimplement using the indexfile.

'bup save' no longer walks the filesystem: instead it walks the indexfile
(which is much faster) and doesn't bother opening any files that haven't had
an attribute change, since it can just reuse their sha1 from before. That
makes it *much* faster in the common case.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 02:41:07 +0000 (21:41 -0500)]

Moved some reusable index-handling code from cmd-index.py to index.py.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 02:17:10 +0000 (21:17 -0500)]

A bunch of wvtests for the 'bup index' command.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 01:35:16 +0000 (20:35 -0500)]

Start using wvtest.sh for shell-based tests in test-sh.

This makes the output a little prettier... at least in the common case where
it passes :)

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 00:50:13 +0000 (19:50 -0500)]

cmd-index: default indexfile path is ~/.bup/bupindex, not $PWD/index

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 00:47:14 +0000 (19:47 -0500)]

cmd-index: skip merging the index if nothing was written to the new one.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 00:27:26 +0000 (19:27 -0500)]

cmd-index: only update if -u is given; print only given file/dirnames.

cmd-index now does two things:
- it updates the index with the given names if -u is given
- it prints the index if -p, -s, or -m are given.

In both cases, if filenames are given, it operates (recursively) on the
given filenames or directories. If no filenames are given, -u fails (we
don't want to default to /; it's too slow) but -p/s/m just prints the whole
index.

commit | commitdiff | tree

Avery Pennarun [Sun, 10 Jan 2010 00:07:05 +0000 (19:07 -0500)]

cmd-index: correct reporting of deleted vs. added vs. modified status.

A file with an all-zero sha1 is considered Added instead of Modified, since
it has obviously *never* had a valid sha1.  (A modified file has an old
sha1, but IX_HASHVALID isn't set.)

We also now don't remove old files from the index - for now - so that we can
report old files with a D status.  This might perhaps be useful eventually.

Furthermore, we had a bug where reindexing a particular filename would
"sometimes" cause siblings of that file to be marked as deleted.  The
sibling entries should never be updated, because we didn't check them and
thus have no idea of their new status.  This bug was mostly caused by the
silly way we currently pass dirnames and filenames around...
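The status rules above can be sketched like this. IX_HASHVALID is named in the commit message, but its numeric value here, the IX_EXISTS flag and the `status_char` function are all assumptions made for illustration.

```python
IX_EXISTS = 0x8000      # assumed flag: the file was present when last indexed
IX_HASHVALID = 0x4000   # assumed value: the stored sha1 matches the file
EMPTY_SHA = b'\0' * 20  # an all-zero sha1 means "never hashed"


def status_char(flags, sha):
    # Hypothetical helper returning the one-letter status.
    if not flags & IX_EXISTS:
        return 'D'  # still in the index, but gone from disk
    if not flags & IX_HASHVALID:
        # Never had a valid sha1: Added. Had one that is now stale: Modified.
        return 'A' if sha == EMPTY_SHA else 'M'
    return ' '      # up to date
```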

commit | commitdiff | tree

Avery Pennarun [Sat, 9 Jan 2010 23:25:23 +0000 (18:25 -0500)]

Generalize the multi-index-walking code.

Now you can walk through multiple indexes correctly from anywhere, avoiding
the need for merging a huge index just to update a few files.

commit | commitdiff | tree

Avery Pennarun [Sat, 9 Jan 2010 22:12:36 +0000 (17:12 -0500)]

cmd-index: indexfiles should start with a well-known header.

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 23:54:40 +0000 (18:54 -0500)]

cmd-index: eliminate redundant paths from index update command.

If someone asks to update "/etc" and "/etc/passwd", the latter is redundant
because it's included in the first.  Don't bother updating the file twice
(and thus causing two index merges, etc).

Ideally we would only do one merge for *any* number of updates (eg. /etc and
/var).  This should be possible as long as we sort the entries correctly
(/var/ and then /etc/), since a single sequential indexfile could just have
one appended to the other.  But we don't do that yet.
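One way to drop redundant paths, as a sketch only. `reduce_paths` is a hypothetical name, and POSIX-style paths are assumed.

```python
import posixpath


def reduce_paths(paths):
    # Hypothetical helper: drop every path that lies inside another
    # path in the same list.
    kept = []
    # Sorting puts a parent directory before everything beneath it.
    for p in sorted(set(posixpath.normpath(p) for p in paths)):
        if any(p.startswith(k.rstrip('/') + '/') for k in kept):
            continue  # already covered by a directory we kept
        kept.append(p)
    return kept
```

Comparing against the parent plus a trailing '/' keeps "/etc-old" from being mistaken for a child of "/etc".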

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 23:43:02 +0000 (18:43 -0500)]

cmd-index: some handy options.

New options:
--modified: print only files that aren't up to date
--status: prefix printouts with status chars
--fake-valid: mark all entries as up to date
--indexfile: override the default index filename

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 07:54:45 +0000 (02:54 -0500)]

index: add --xdev (--one-file-system) option.

For not traversing across filesystem boundaries.

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 07:50:09 +0000 (02:50 -0500)]

Fix some bugs with indexing '/'

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 04:55:48 +0000 (23:55 -0500)]

cmd-index: basic index reader/writer/merger.

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 23:16:52 +0000 (18:16 -0500)]

On MacOS X, "wc -l" returns extra whitespace.

ie. " 0" instead of "0".

The easiest workaround is to compare as a number instead of as a string.
This seems to work correctly on both MacOS and Linux.

commit | commitdiff | tree

Avery Pennarun [Thu, 7 Jan 2010 23:14:20 +0000 (18:14 -0500)]

More compile options for MacOS X.

Based on a patch from Dave Coombs. I changed it to auto-detect the OS
platform, so I might have broken it, however.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 21:42:54 +0000 (16:42 -0500)]

splitting to a remote server would cause "already busy" errors.

Specifically:
client.ClientError: already busy with command 'receive-objects'

That's because recent changes removed the call to onclose() from
PackWriter_Remote. Now it's back, plus I added an extra unit test to reveal
the problem.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 18:03:23 +0000 (13:03 -0500)]

client: enhance the PATH when searching for the 'bup' binary.

Automatically adds the *local* $PWD to the *remote* $PATH before trying to
run 'bup server'. That way, if you build the source in exactly the same
folder on two machines - or if those two machines are actually the same
machine and you're just doing a test against localhost - it'll work.

I hereby curse both "sh -c <command>" and "ssh hostname -- <command>" for
not allowing a sensible way to just set argv[] without doing any stupid
quoting. Nasty.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 18:02:58 +0000 (13:02 -0500)]

wvtest: coerce non-string arguments when printing.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 17:07:59 +0000 (12:07 -0500)]

Much more user-friendly error messages when bup can't exec the server.

...which happens unfortunately often, including in 'make test' when PATH
doesn't include bup. I'll fix that next. But it makes sense to fix the
error messages first :)

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 16:48:35 +0000 (11:48 -0500)]

Add a 'make stupid' target that does 'make test' with a minimal PATH.

Because I'm stupid and I keep forgetting to test what happens if you don't
have 'bup' in your PATH.

Thanks to Dave Coombs and Andy Chong for reporting the problem. And in
v0.01 too.

commit | commitdiff | tree

Michael Wolf [Wed, 6 Jan 2010 15:18:37 +0000 (10:18 -0500)]

Merge remote branch 'remotes/apenwarr-master/master'

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 05:19:11 +0000 (00:19 -0500)]

split: Prevent memory drain from excessively long shalists.

This avoids huge RAM usage when you're splitting a really huge object, plus
git probably doesn't work too well with single trees that contain millions
of objects anyway.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 04:42:15 +0000 (23:42 -0500)]

Split packs around 100M objects or 1G bytes.

This will make pruning much easier later, plus avoids any problems with
packs >= 2GB (not that we've had any of those yet, but...), plus avoids
wasting RAM with an overly full MultiPackIndex.also{} dictionary.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 04:50:41 +0000 (23:50 -0500)]

OOPS! Was writing one byte at a time to the server.

_raw_write() expects a list, not a string, so it was iterating over it
character by character. Magically it worked anyway. Which is sort of cool,
and yet not.
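Why it "magically worked" can be shown with a stand-in for _raw_write(). The `raw_write` function below is hypothetical and just records each write it would have made.

```python
def raw_write(chunks, sink):
    # Hypothetical stand-in for _raw_write(): expects a list of strings
    # and performs one write per element.
    for chunk in chunks:
        sink.append(chunk)


buggy, fixed = [], []
raw_write('abc', buggy)    # a bare string iterates character by character
raw_write(['abc'], fixed)  # a list yields a single write
```

Both calls deliver the same bytes in the same order, which is why nothing broke; the first one just issues a separate write for every character.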

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 03:21:18 +0000 (22:21 -0500)]

Fix compatibility with git 1.5.4.3 (Ubuntu Hardy).

Thanks to Andy Chong for reporting the problem.

Basically it comes down to two things that are missing in that version but
exist in git 1.5.6:

  - git init --bare doesn't work, but git --bare init does.
  - git cat-file --batch doesn't exist in that version.

Unfortunately, the latter problem is pretty serious; bup join is really slow
without it.  I guess it might be time to implement an internal version of
cat-file.

commit | commitdiff | tree

Michael Wolf [Wed, 6 Jan 2010 03:00:59 +0000 (22:00 -0500)]

Figure out where Python dynamic libraries live. Use them, too.

commit | commitdiff | tree

Avery Pennarun [Wed, 6 Jan 2010 02:41:07 +0000 (21:41 -0500)]

Older git needs 'git --bare init' instead of 'git init --bare'

Needed in at least 1.5.4 (Ubuntu Hardy), but not 1.5.6 (Debian Lenny).

commit | commitdiff | tree

Avery Pennarun [Tue, 5 Jan 2010 17:11:27 +0000 (12:11 -0500)]

test-sh: don't truncate stderr.

Normally you wouldn't notice this problem, unless you tried to run:

make test >&test.out

and found that it was full of NUL characters. Oops.

commit | commitdiff | tree

Avery Pennarun [Tue, 5 Jan 2010 18:02:16 +0000 (13:02 -0500)]

Makefile: avoid using backquotes.

commit | commitdiff | tree

David Wolever [Tue, 5 Jan 2010 05:59:11 +0000 (00:59 -0500)]

Make now calls python2.5-config.

commit | commitdiff | tree

Avery Pennarun [Mon, 4 Jan 2010 16:48:38 +0000 (11:48 -0500)]

Fix two bugs reported by dcoombs.

test-sh was assuming 'bup' was on the PATH. (It wasn't *supposed* to be
assuming that, but the "alias bup=whatever" line wasn't working,
apparently.)

randomgen.c triggered a warning in some versions of gcc about the return
value of write() being ignored. It really doesn't bother me if some of my
random bytes don't get written, but whatever; I'll assert instead, which
should shut it up.

commit | commitdiff | tree

Avery Pennarun [Mon, 4 Jan 2010 04:39:52 +0000 (23:39 -0500)]

Add a README for v0.01.

commit | commitdiff | tree

Avery Pennarun [Mon, 4 Jan 2010 03:06:29 +0000 (22:06 -0500)]

Force generated pack indexes to be version 2.

We don't support parsing earlier versions, and the earlier versions are
inferior anyway.

commit | commitdiff | tree

Avery Pennarun [Mon, 4 Jan 2010 02:51:41 +0000 (21:51 -0500)]

Rewrite bup-join in python and add remote server support.

commit | commitdiff | tree

Avery Pennarun [Mon, 4 Jan 2010 01:30:25 +0000 (20:30 -0500)]

We can now update refs when we do a backup.

Supported by both cmd-save and cmd-split, albeit with a disturbing amount of
code duplication.

Also updated bup-join to use a default BUP_DIR if none is specified.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 20:46:45 +0000 (15:46 -0500)]

Refactored client stuff into client.py; now cmd-save and cmd-init use it too.

Still not updating refs, however, so it remains a bit stupid.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 11:32:32 +0000 (06:32 -0500)]

Fix some problems running on older Debian.

python 2.5 (pre-2.5.2) can't struct.unpack from a buffer(); coerce it to a
string first.

The default python is 2.4, so run /usr/bin/python2.5 explicitly.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 11:17:30 +0000 (06:17 -0500)]

Support incremental backups to a remote server.

We now cache the server's packfile indexes locally, so we know which objects
he does and doesn't have. That way we can send him a packfile with only the
ones he's missing.

cmd-split supports this now, but cmd-save still doesn't support remote
servers.

The -n option (set a ref correctly) doesn't work yet either.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 10:00:38 +0000 (05:00 -0500)]

Extremely basic 'bup server' support.

It's enough to send a pack to the remote end with 'bup split', though 'bup
save' doesn't support it yet, and we're not smart enough to do incremental
backups, which means we generate the gigantic pack every single time.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 08:01:40 +0000 (03:01 -0500)]

bup save: handle symlinks correctly.

ie. save symlinks as symlinks, don't blindly recurse into them.

Also, directory names in trees need to be sorted as if they were "name/",
not "name". Otherwise git fsck complains.

Improved 'make test' to check 'git fsck' output and to actually run bup
save.
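The sort rule for tree entries can be sketched as below. `sort_tree_entries` is a hypothetical name, and entries are assumed to be (name, is_dir) pairs.

```python
def sort_tree_entries(entries):
    # Hypothetical helper. git compares directory names as if they ended
    # in '/', so the sort key has to do the same.
    return sorted(entries, key=lambda e: (e[0] + '/') if e[1] else e[0])


entries = [('foo', True), ('foo.txt', False), ('foo-bar', False)]
names = [name for name, _ in sort_tree_entries(entries)]
```

With a plain name sort, the directory "foo" would come first; with the trailing-slash rule it sorts after "foo-bar" and "foo.txt", because '/' compares higher than '-' and '.'.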

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 07:18:35 +0000 (02:18 -0500)]

cmd-join didn't honour $BUP_DIR.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 04:58:45 +0000 (23:58 -0500)]

Add 'bup init' command.

Basically an alias for git init, but uses BUP_DIR instead of GIT_DIR
environment variable.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 04:19:05 +0000 (23:19 -0500)]

Moved some git.* globals into PackWriter; use BUP_DIR as repo path.

If BUP_DIR isn't set, it defaults to ./.git, which is probably not so smart,
but works for now (and was what we were using *until* now anyway).

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 03:42:58 +0000 (22:42 -0500)]

Name temp files from 'make test' as *.tmp to make them easier to clean.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 03:34:43 +0000 (22:34 -0500)]

Better behaviour with --verbose.

Also added --verbose to cmd-save.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 03:10:57 +0000 (22:10 -0500)]

Use binary sha1 instead of hex sha1 whenever possible.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 03:10:10 +0000 (22:10 -0500)]

'make test' now tests the -t and -c options of bup split.

commit | commitdiff | tree

Avery Pennarun [Sun, 3 Jan 2010 02:10:17 +0000 (21:10 -0500)]

git.PackIndex: a class for quickly searching a git packfile index.

This will allow us to generate incremental backups more efficiently, since
we can avoid rewriting already-known objects into a new pack.

commit | commitdiff | tree

Avery Pennarun [Sat, 2 Jan 2010 09:16:25 +0000 (04:16 -0500)]

Write git pack files instead of loose object files.

This causes much, much less disk grinding than creating zillions of files,
plus it's even more disk space efficient.

We could theoretically make it go even faster by generating the .idx file
ourselves, but for now, we just call "git index-pack" to do it. That
helpfully also confirms that the data was written in a git-compatible way.

commit | commitdiff | tree

Avery Pennarun [Sat, 2 Jan 2010 07:36:13 +0000 (02:36 -0500)]

bup split: print extra output to stderr if -v or -vv is given.

commit | commitdiff | tree

Avery Pennarun [Sat, 2 Jan 2010 06:46:06 +0000 (01:46 -0500)]

'bup split': speed optimization for never-ending blocks.

For blocks which never got split (eg. huge endless streams of zeroes) we
would constantly scan and re-scan the same sub-blocks, making things go
really slowly. In such a bad situation, there's no point in being so careful;
just dump the *entire* input buffer to a chunk and move on. This vastly
speeds up splitting of files with lots of blank space in them, eg.
VirtualBox images.

Also add a cache for git.hash_raw() so it doesn't have to stat() the same
blob files over and over if the same blocks (especially zeroes) occur more
than once.

commit | commitdiff | tree

Avery Pennarun [Sat, 2 Jan 2010 06:45:14 +0000 (01:45 -0500)]

Fix 'bup split --bench'.

This was broken earlier and apparently didn't have a test; now it does.

commit | commitdiff | tree

Avery Pennarun [Sat, 2 Jan 2010 06:44:04 +0000 (01:44 -0500)]

Fix generated commit messages.

The first (summary) line should be shorter so that git log looks prettier.

commit | commitdiff | tree

Avery Pennarun [Sat, 2 Jan 2010 00:28:14 +0000 (19:28 -0500)]

'bup save' now generates a hierarchical set of git trees as it should.

commit | commitdiff | tree

Avery Pennarun [Fri, 1 Jan 2010 23:15:12 +0000 (18:15 -0500)]

Initial version of 'bup save'.

Saves a complete tree by recursively iterating into subdirs, and splits
large files into chunks using the same algorithm as 'bup split'.

Currently no support for special files (symlinks etc), and it generates the
resulting git tree incorrectly (by just turning / into _ in filenames).

commit | commitdiff | tree

Avery Pennarun [Fri, 1 Jan 2010 02:59:30 +0000 (21:59 -0500)]

'bup join' now takes objects on the command line and handles commitids.

It converts commitids directly into trees and cats the entire tree
recursively.

If no ids are provided on the command line, it reverts back to reading the
list of objects from stdin.

commit | commitdiff | tree

Avery Pennarun [Fri, 1 Jan 2010 02:51:12 +0000 (21:51 -0500)]

Refactor splitting functions from cmd-split.py into hashsplit.py.

Now we can split other stuff from other programs (which don't exist yet).

commit | commitdiff | tree

Avery Pennarun [Fri, 1 Jan 2010 02:28:53 +0000 (21:28 -0500)]

Oops, multi-file split forced a split between each file.

Now if you have multiple files on input, it's possible for a single
resulting blob to contain parts of more than one file.

commit | commitdiff | tree

Avery Pennarun [Fri, 1 Jan 2010 00:04:59 +0000 (19:04 -0500)]

split: name chunkfiles more carefully to prevent name changes.

This isn't perfect, but a bit of byte jitter here and there now won't cause
unnecessary filename changes.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 23:46:04 +0000 (18:46 -0500)]

'bup split' takes a list of filenames on the command line.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 23:35:55 +0000 (18:35 -0500)]

'bup split' now outputs any combination of blobs, tree, and commit.

Maximum flexibility.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 23:29:34 +0000 (18:29 -0500)]

'bup split' can now update a git ref if you give it the -n option.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 23:10:02 +0000 (18:10 -0500)]

Automatically handle "--no-" prefix on long options.

Similar to how git does it.
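A sketch of the idea, assuming a hypothetical `parse_long_flag` helper; the real logic lives in options.py and may be structured differently.

```python
def parse_long_flag(arg, known):
    # Hypothetical helper: map '--name' to (name, True) and
    # '--no-name' to (name, False) for any known option name.
    name = arg[2:]
    if name in known:
        return name, True
    if name.startswith('no-') and name[3:] in known:
        return name[3:], False
    raise KeyError(arg)
```

Checking the literal name first means an option that really is called "no-something" still works as declared.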

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 22:55:14 +0000 (17:55 -0500)]

'bup split' now has a -c option to generate a git commit object.

There's no way to set its parent yet, but at least this is all you need if
you want to repack.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 22:06:15 +0000 (17:06 -0500)]

Completely revamped option parsing based on git-shell-style.

This is in options.py. Also added some wvtests for option parsing stuff.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 21:43:32 +0000 (16:43 -0500)]

Import wvtestrun and wvtest.py from the wvtest.git project.

Corresponding wvtest commit is db65ff5907571a5004bb3c500efd19421cb06d1a.

commit | commitdiff | tree

Avery Pennarun [Thu, 31 Dec 2009 19:45:04 +0000 (14:45 -0500)]

Rename datagen.c to randomgen.c, to better reflect its purpose.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 23:18:35 +0000 (18:18 -0500)]

Add a '-t' option to 'bup split' and make 'bup join' support it.

This lets you generate a git "tree" object with the list of hashes in a
particular file, so you can treat the file as a directory as far as git is
concerned. And 'bup join' knows how to take a tree and concatenate it
together to reverse the operation.

Also refactored a bunch of stuff in cmd-split.py.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 22:14:21 +0000 (17:14 -0500)]

Get rid of extra CFLAGS/LDFLAGS that I don't understand anyway.

I just copied them from some other python module. But it's better to keep
things simple, I think.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 22:10:03 +0000 (17:10 -0500)]

Add a 'bup' wrapper program.

We're going to use that with some subcommands, git-style.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 21:10:21 +0000 (16:10 -0500)]

Remove old hashsplit.c, since hashsplit.py replaces it.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 09:46:30 +0000 (04:46 -0500)]

Do less work for objects that occur more than once.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 09:09:08 +0000 (04:09 -0500)]

Clean up buffering to reduce number of buffer copies.

Slight performance improvement, but not inspirational.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 08:28:36 +0000 (03:28 -0500)]

Use t# instead of et# for hashsplit parameter type.

This lets us work with any kind of buffer object, which means there's no
unnecessary data copying when coming into our module. Causes a bit of
speedup.

Also refactored the checksum code for easier experimentation.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 07:55:07 +0000 (02:55 -0500)]

Hey wow, turning on -O2 gives about a 50% speedup.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 07:33:35 +0000 (02:33 -0500)]

Add a C module to do the rolling checksum.

This is about 80x faster than the old speed (27megs/sec instead of 330k/sec)
but still quite a lot slower than the 60+megs/sec I get *without* the
checksum stuff. There are a few inefficiencies remaining, but not such easy
ones as before...
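For readers unfamiliar with the technique, here is a deliberately simple rolling-sum splitter in pure Python. It is not the checksum the C module implements; WINDOW, MASK and `split_points` are assumptions chosen for illustration.

```python
WINDOW = 64            # assumed window size in bytes
MASK = (1 << 13) - 1   # assumed: split when the low 13 bits are all ones


def split_points(data):
    """Yield the offsets at which a chunk ends."""
    total = 0
    for i, byte in enumerate(data):
        total += byte                  # newest byte enters the window
        if i >= WINDOW:
            total -= data[i - WINDOW]  # oldest byte leaves the window
        if (total & MASK) == MASK:
            yield i + 1
```

Since the sum depends only on the last WINDOW bytes, inserting data near the start of a file shifts the later split points without changing where they fall relative to the content. Doing this per-byte loop in Python is the kind of work the commit above moves into C.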

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 06:17:24 +0000 (01:17 -0500)]

hashsplit.py: print performance timings to stderr on exit.

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 06:08:27 +0000 (01:08 -0500)]

datagen.c: a quick program to generate a repeatable series of bytes.

Useful for testing. Note that we *don't* seed the random number generator,
so every time you generate the bytes, you get the same sequence.

This is also vastly faster than /dev/urandom, since it doesn't try to be
cryptographically secure. It generates about 200 megs/sec on my computer,
which is much faster than a disk and thus useful for testing the speed of
hashsplit.
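The same idea in Python, as a sketch: datagen.c is a C program, so this is only an analogue. A fixed seed stands in for the never-seeded C generator, and `repeatable_bytes` is a hypothetical name.

```python
import random


def repeatable_bytes(n):
    # Hypothetical helper. A fixed seed gives the same sequence on every
    # run, much like a C generator that is never seeded.
    rng = random.Random(0)
    return bytes(rng.getrandbits(8) for _ in range(n))
```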

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 06:06:16 +0000 (01:06 -0500)]

hashsplit.py: less excessive logging, more suitable for speed tests.

Result of speed tests: it's slow. Almost entirely because of how slow
splitbuf() is in python (which is no surprise at all).

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 00:50:14 +0000 (19:50 -0500)]

hashsplit.py: create blob objects by hand, not with 'git hash-object'

Much quicker this way, since we never need to fork anything.
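Computing a blob id by hand takes only a few lines. This sketch uses the standard git object format (a "blob <size>" header, a NUL byte, then the content); `blob_sha1` is a hypothetical name.

```python
import hashlib


def blob_sha1(data):
    # A git blob id is the sha1 of "blob <size>\0" followed by the content.
    header = b'blob %d\0' % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

The result matches what `git hash-object` prints for the same bytes. Storing the object would additionally require zlib-compressing the header plus content, which this sketch leaves out.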

commit | commitdiff | tree

Avery Pennarun [Wed, 30 Dec 2009 00:20:35 +0000 (19:20 -0500)]

hashsplit.py is now much, much faster than before.

4.8 secs vs. 0.8 secs for testfile1.

Still vastly slower than the C version (0.17 secs including time to fork
git for each blob), but a significant improvement.

The remaining slowness seems to be entirely from:

- running git hash-object (which we can avoid by hashing the object
ourselves)

- running the rolling checksum algorithm (which we can speed up using a C
module)

So it's looking good.

commit | commitdiff | tree

Avery Pennarun [Tue, 29 Dec 2009 19:24:50 +0000 (14:24 -0500)]

hashsplit.py: a python version of hashsplit.c

It's slow. Very slow. But hopefully it won't be too hard to optimize.

Alex' bup ("It backs things up") development repository
