arthur.barton.de Git - bup.git/log
Avery Pennarun [Thu, 31 Dec 2009 23:35:55 +0000 (18:35 -0500)]
'bup split' now outputs any combination of blobs, tree, and commit.
maximum flexibility.
Avery Pennarun [Thu, 31 Dec 2009 23:29:34 +0000 (18:29 -0500)]
'bup split' can now update a git ref if you give it the -n option.
Avery Pennarun [Thu, 31 Dec 2009 23:10:02 +0000 (18:10 -0500)]
Automatically handle "--no-" prefix on long options.
Similar to how git does it.
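A minimal sketch of the "--no-" handling described in the commit above, written in Python. The function name `parse_long_flag` and the `known_flags` set are hypothetical and are not taken from options.py; the sketch only covers boolean long options.

```python
def parse_long_flag(arg, known_flags):
    """Return (name, value) for a boolean long option.

    A flag that is known as-is wins; otherwise a leading "no-" is
    stripped and the flag is reported as switched off.
    """
    if not arg.startswith("--"):
        raise ValueError("not a long option: %r" % arg)
    name = arg[2:]
    if name in known_flags:
        return name, True
    if name.startswith("no-") and name[3:] in known_flags:
        return name[3:], False
    raise ValueError("unknown option: %r" % arg)


if __name__ == "__main__":
    flags = {"verbose", "commit"}
    print(parse_long_flag("--verbose", flags))
    print(parse_long_flag("--no-commit", flags))
```

Checking the literal name first means an option that is itself called something like "no-color" still parses as enabled, instead of being misread as the negation of "color".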
Avery Pennarun [Thu, 31 Dec 2009 22:55:14 +0000 (17:55 -0500)]
'bup split' now has a -c option to generate a git commit object.
There's no way to set its parent yet, but at least this is all you need if
you want to repack.
Avery Pennarun [Thu, 31 Dec 2009 22:06:15 +0000 (17:06 -0500)]
Completely revamped option parsing based on git-shell-style.
This is in options.py. Also added some wvtests for option parsing stuff.
Avery Pennarun [Thu, 31 Dec 2009 21:43:32 +0000 (16:43 -0500)]
Import wvtestrun and wvtest.py from the wvtest.git project.
Corresponding wvtest commit is
db65ff5907571a5004bb3c500efd19421cb06d1a .
Avery Pennarun [Thu, 31 Dec 2009 19:45:04 +0000 (14:45 -0500)]
Rename datagen.c to randomgen.c, to better reflect its purpose.
Avery Pennarun [Wed, 30 Dec 2009 23:18:35 +0000 (18:18 -0500)]
Add a '-t' option to 'bup split' and make 'bup join' support it.
This lets you generate a git "tree" object with the list of hashes in a
particular file, so you can treat the file as a directory as far as git is
concerned. And 'bup join' knows how to take a tree and concatenate it
together to reverse the operation.
Also refactored a bunch of stuff in cmd-split.py.
Avery Pennarun [Wed, 30 Dec 2009 22:14:21 +0000 (17:14 -0500)]
Get rid of extra CFLAGS/LDFLAGS that I don't understand anyway.
I just copied them from some other python module. But it's better to keep
things simple, I think.
Avery Pennarun [Wed, 30 Dec 2009 22:10:03 +0000 (17:10 -0500)]
Add a 'bup' wrapper program.
We're going to use that with some subcommands, git-style.
Avery Pennarun [Wed, 30 Dec 2009 21:10:21 +0000 (16:10 -0500)]
Remove old hashsplit.c, since hashsplit.py replaces it.
Avery Pennarun [Wed, 30 Dec 2009 09:46:30 +0000 (04:46 -0500)]
Do less work for objects that occur more than once.
Avery Pennarun [Wed, 30 Dec 2009 09:09:08 +0000 (04:09 -0500)]
Clean up buffering to reduce number of buffer copies.
Slight performance improvement, but not inspirational.
Avery Pennarun [Wed, 30 Dec 2009 08:28:36 +0000 (03:28 -0500)]
Use t# instead of et# for hashsplit parameter type.
This lets us work with any kind of buffer object, which means there's no
unnecessary data copying when coming into our module. Causes a bit of
speedup.
Also refactored the checksum code for easier experimentation.
Avery Pennarun [Wed, 30 Dec 2009 07:55:07 +0000 (02:55 -0500)]
Hey wow, turning on -O2 gives about a 50% speedup.
Avery Pennarun [Wed, 30 Dec 2009 07:33:35 +0000 (02:33 -0500)]
Add a C module to do the rolling checksum.
This is about 80x faster than the old speed (27megs/sec instead of 330k/sec)
but still quite a lot slower than the 60+megs/sec I get *without* the
checksum stuff. There are a few inefficiencies remaining, but not such easy
ones as before...
Avery Pennarun [Wed, 30 Dec 2009 06:17:24 +0000 (01:17 -0500)]
hashsplit.py: print performance timings to stderr on exit.
Avery Pennarun [Wed, 30 Dec 2009 06:08:27 +0000 (01:08 -0500)]
datagen.c: a quick program to generate a repeatable series of bytes.
Useful for testing. Note that we *don't* seed the random number generator,
so every time you generate the bytes, you get the same sequence.
This is also vastly faster than /dev/urandom, since it doesn't try to be
cryptographically secure. It generates about 200 megs/sec on my computer,
which is much faster than a disk and thus useful for testing the speed of
hashsplit.
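The same idea (a fixed, non-cryptographic generator that always yields the same byte stream) can be sketched in Python. This is not a port of datagen.c; the function name `repeatable_bytes` and the fixed seed value are assumptions made for illustration.

```python
import random


def repeatable_bytes(n, seed=0):
    """Return n pseudo-random bytes that are identical on every call.

    A private Random instance with a fixed seed is used, so the output
    never depends on the clock or on other users of the random module.
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    first = repeatable_bytes(1024)
    second = repeatable_bytes(1024)
    print(first == second)  # the two runs produce the same sequence
```

Because the stream is reproducible, two speed-test runs operate on exactly the same input, so any difference in timing comes from the code under test and not from the data.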
Avery Pennarun [Wed, 30 Dec 2009 06:06:16 +0000 (01:06 -0500)]
hashsplit.py: less excessive logging, more suitable for speed tests.
Result of speed tests: it's slow. Almost entirely because of how slow
splitbuf() is in python (which is no surprise at all).
Avery Pennarun [Wed, 30 Dec 2009 00:50:14 +0000 (19:50 -0500)]
hashsplit.py: create blob objects by hand, not with 'git hash-object'
Much quicker this way, since we never need to fork anything.
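For reference, git names a blob by the SHA-1 of a short header ("blob", a space, the decimal length, a NUL byte) followed by the raw content, so the ID can be computed in-process without forking 'git hash-object'. A minimal sketch follows; the function name `git_blob_id` is hypothetical and not taken from hashsplit.py, and only the ID calculation is shown, not writing the zlib-compressed object into the repository.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the hex object ID git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID in every git repository.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```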
Avery Pennarun [Wed, 30 Dec 2009 00:20:35 +0000 (19:20 -0500)]
hashsplit.py is now much, much faster than before.
4.8 secs vs. 0.8 secs for testfile1.
Still vastly slower than the C version (0.17 secs including time to fork
git for each blob) but still a significant improvement.
The remaining slowness seems to be entirely from:
- running git hash-object (which we can avoid by hashing the object
ourselves)
- running the rolling checksum algorithm (which we can speed up using a C
module)
So it's looking good.
Avery Pennarun [Tue, 29 Dec 2009 19:24:50 +0000 (14:24 -0500)]
hashsplit.py: a python version of hashsplit.c
It's slow. Very slow. But hopefully it won't be too hard to optimize.
Avery Pennarun [Tue, 29 Dec 2009 18:07:22 +0000 (13:07 -0500)]
Make split condition depend on ~0, not 0.
Otherwise we could end up splitting on one-byte blocks, which is pretty
dumb.
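One reading of the change above, sketched in Python: if a block boundary is declared whenever the low bits of the rolling sum are all zero, then input that keeps the sum at zero (for example a run of zero bytes) splits after every byte; requiring the low bits to be all ones avoids that case. The window size, the number of bits, and the plain additive sum are assumptions for illustration and are not the values or the checksum used in hashsplit.c.

```python
def split_points(data, window=16, bits=6, target=None):
    """Return the offsets at which a rolling sum says to end a block.

    data   -- bytes to scan
    window -- number of trailing bytes included in the rolling sum
    bits   -- how many low bits of the sum are tested
    target -- value the low bits must equal; defaults to all ones
    """
    mask = (1 << bits) - 1
    if target is None:
        target = mask  # the "~0" style condition
    total = 0
    points = []
    for i, byte in enumerate(data):
        total += byte
        if i >= window:
            total -= data[i - window]  # drop the byte leaving the window
        if (total & mask) == target:
            points.append(i + 1)
    return points


if __name__ == "__main__":
    zeros = bytes(32)
    print(len(split_points(zeros, target=0)))  # splits after every byte
    print(len(split_points(zeros)))            # no splits at all
```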
Avery Pennarun [Tue, 29 Dec 2009 18:02:03 +0000 (13:02 -0500)]
Report the block size when splitting each block.
Avery Pennarun [Sun, 4 Oct 2009 02:55:42 +0000 (22:55 -0400)]
Add a README
Avery Pennarun [Sun, 4 Oct 2009 02:33:28 +0000 (22:33 -0400)]
Add some comments so nobody thinks I think fgetc/fputc are fast.
Avery Pennarun [Sun, 4 Oct 2009 01:51:16 +0000 (21:51 -0400)]
Add a comment to stupidsum_add() so people don't think I'm an idiot.
Yes, I know shift-and-xor is a supremely lame algorithm.
Avery Pennarun [Sun, 4 Oct 2009 01:49:43 +0000 (21:49 -0400)]
Rename hsplit/hjoin to hashsplit/hashjoin.
Avery Pennarun [Sun, 4 Oct 2009 00:38:43 +0000 (20:38 -0400)]
Add a trivial hjoin, the reverse of hsplit.
Avery Pennarun [Sun, 4 Oct 2009 00:31:41 +0000 (20:31 -0400)]
Aha, fixed a bug causing the split points not to resync.
Avery Pennarun [Sun, 4 Oct 2009 00:06:56 +0000 (20:06 -0400)]
Actually hash stuff, and add a basic 'make test'.
Unfortunately the test fails: after the first difference, it never manages
to resync.
Avery Pennarun [Sat, 3 Oct 2009 23:48:49 +0000 (19:48 -0400)]
Extremely cheesy initial implementation of rolling-sum-based splitting.
The checksum algorithm is crap, and we don't actually generate the output
files yet, so I'm guessing it's still junk.