arthur.barton.de Git - bup.git/log
Avery Pennarun [Wed, 30 Dec 2009 22:14:21 +0000 (17:14 -0500)]
Get rid of extra CFLAGS/LDFLAGS that I don't understand anyway.
I just copied them from some other python module. But it's better to keep
things simple, I think.
Avery Pennarun [Wed, 30 Dec 2009 22:10:03 +0000 (17:10 -0500)]
Add a 'bup' wrapper program.
We're going to use that with some subcommands, git-style.
Avery Pennarun [Wed, 30 Dec 2009 21:10:21 +0000 (16:10 -0500)]
Remove old hashsplit.c, since hashsplit.py replaces it.
Avery Pennarun [Wed, 30 Dec 2009 09:46:30 +0000 (04:46 -0500)]
Do less work for objects that occur more than once.
Avery Pennarun [Wed, 30 Dec 2009 09:09:08 +0000 (04:09 -0500)]
Clean up buffering to reduce number of buffer copies.
Slight performance improvement, but not inspirational.
Avery Pennarun [Wed, 30 Dec 2009 08:28:36 +0000 (03:28 -0500)]
Use t# instead of et# for hashsplit parameter type.
This lets us work with any kind of buffer object, which means there's no
unnecessary data copying when coming into our module. Causes a bit of
speedup.
Also refactored the checksum code for easier experimentation.
Avery Pennarun [Wed, 30 Dec 2009 07:55:07 +0000 (02:55 -0500)]
Hey wow, turning on -O2 gives about a 50% speedup.
Avery Pennarun [Wed, 30 Dec 2009 07:33:35 +0000 (02:33 -0500)]
Add a C module to do the rolling checksum.
This is about 80x faster than the old speed (27megs/sec instead of 330k/sec)
but still quite a lot slower than the 60+megs/sec I get *without* the
checksum stuff. There are a few inefficiencies remaining, but not such easy
ones as before...
Avery Pennarun [Wed, 30 Dec 2009 06:17:24 +0000 (01:17 -0500)]
hashsplit.py: print performance timings to stderr on exit.
Avery Pennarun [Wed, 30 Dec 2009 06:08:27 +0000 (01:08 -0500)]
datagen.c: a quick program to generate a repeatable series of bytes.
Useful for testing. Note that we *don't* seed the random number generator,
so every time you generate the bytes, you get the same sequence.
This is also vastly faster than /dev/urandom, since it doesn't try to be
cryptographically secure. It generates about 200 megs/sec on my computer,
which is much faster than a disk and thus useful for testing the speed of
hashsplit.
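
The idea of a repeatable byte stream can be sketched in Python as follows. This is only an illustration, not datagen.c itself: the function name `datagen` and the `seed` parameter are hypothetical, and because Python seeds its generator from the OS by default, a fixed seed is passed explicitly to get the same effect as leaving C's generator unseeded.

```python
import random


def datagen(nbytes, seed=1):
    """Return nbytes of pseudo-random but repeatable data (hypothetical helper)."""
    # A fixed seed means the generator always starts from the same state,
    # so every call produces the same byte sequence.
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))


if __name__ == "__main__":
    first = datagen(1024)
    second = datagen(1024)
    print("repeatable:", first == second)
```

This generator is not cryptographically secure, which is fine for feeding a speed test.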
Avery Pennarun [Wed, 30 Dec 2009 06:06:16 +0000 (01:06 -0500)]
hashsplit.py: less excessive logging, more suitable for speed tests.
Result of speed tests: it's slow. Almost entirely because of how slow
splitbuf() is in python (which is no surprise at all).
Avery Pennarun [Wed, 30 Dec 2009 00:50:14 +0000 (19:50 -0500)]
hashsplit.py: create blob objects by hand, not with 'git hash-object'
Much quicker this way, since we never need to fork anything.
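
For reference, git's blob ID is the SHA-1 of a short header ("blob", a space, the decimal length, a NUL byte) followed by the raw content, so it can be computed without forking 'git hash-object'. A minimal sketch, assuming only the ID is needed; the function name `git_blob_id` is hypothetical and this is not the code from hashsplit.py:

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the ID git would assign to a blob holding `data`."""
    # Header format: b"blob <decimal length>\0", then the content itself.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID:
    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b""))
```

Writing the object to disk takes one more step that is not shown here: git stores the zlib-compressed header plus content under a path derived from the ID.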
Avery Pennarun [Wed, 30 Dec 2009 00:20:35 +0000 (19:20 -0500)]
hashsplit.py is now much, much faster than before.
4.8 secs vs. 0.8 secs for testfile1.
Still vastly slower than the C version (0.17 secs including time to fork
git for each blob) but still a significant improvement.
The remaining slowness seems to be entirely from:
- running git hash-object (which we can avoid by hashing the object
ourselves)
- running the rolling checksum algorithm (which we can speed up using a C
module)
So it's looking good.
Avery Pennarun [Tue, 29 Dec 2009 19:24:50 +0000 (14:24 -0500)]
hashsplit.py: a python version of hashsplit.c
It's slow. Very slow. But hopefully it won't be too hard to optimize.
Avery Pennarun [Tue, 29 Dec 2009 18:07:22 +0000 (13:07 -0500)]
Make split condition depend on ~0, not 0.
Otherwise we could end up splitting on one-byte blocks, which is pretty
dumb.
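
One plausible reading of this change, which the log does not confirm: if the checksum starts at (or returns to) zero, a test for "low bits all zero" fires immediately, while a test for "low bits all ones" does not. The sketch below illustrates that difference. `SPLIT_BITS` and both function names are hypothetical, not taken from hashsplit.c.

```python
SPLIT_BITS = 13                 # hypothetical: number of low bits tested
MASK = (1 << SPLIT_BITS) - 1    # low SPLIT_BITS bits all set


def is_split_point_zero(rolling_sum):
    """Old-style test: split when the low bits are all zero."""
    return (rolling_sum & MASK) == 0


def is_split_point_ones(rolling_sum):
    """New-style test: split when the low bits are all one (~0 masked)."""
    return (rolling_sum & MASK) == MASK


if __name__ == "__main__":
    fresh_sum = 0  # a checksum that has not accumulated anything yet
    print("zero test fires on a fresh sum:", is_split_point_zero(fresh_sum))
    print("ones test fires on a fresh sum:", is_split_point_ones(fresh_sum))
```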
Avery Pennarun [Tue, 29 Dec 2009 18:02:03 +0000 (13:02 -0500)]
Report the block size when splitting each block.
Avery Pennarun [Sun, 4 Oct 2009 02:55:42 +0000 (22:55 -0400)]
Add a README
Avery Pennarun [Sun, 4 Oct 2009 02:33:28 +0000 (22:33 -0400)]
Add some comments so nobody thinks I think fgetc/fputc are fast.
Avery Pennarun [Sun, 4 Oct 2009 01:51:16 +0000 (21:51 -0400)]
Add a comment to stupidsum_add() so people don't think I'm an idiot.
Yes, I know shift-and-xor is a supremely lame algorithm.
Avery Pennarun [Sun, 4 Oct 2009 01:49:43 +0000 (21:49 -0400)]
Rename hsplit/hjoin to hashsplit/hashjoin.
Avery Pennarun [Sun, 4 Oct 2009 00:38:43 +0000 (20:38 -0400)]
Add a trivial hjoin, the reverse of hsplit.
Avery Pennarun [Sun, 4 Oct 2009 00:31:41 +0000 (20:31 -0400)]
Aha, fixed a bug causing the split points not to resync.
Avery Pennarun [Sun, 4 Oct 2009 00:06:56 +0000 (20:06 -0400)]
Actually hash stuff, and add a basic 'make test'.
Unfortunately the test fails: after the first difference, it never manages
to resync.
Avery Pennarun [Sat, 3 Oct 2009 23:48:49 +0000 (19:48 -0400)]
Extremely cheesy initial implementation of rolling-sum-based splitting.
The checksum algorithm is crap, and we don't actually generate the output
files yet, so I'm guessing it's still junk.
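
The general technique can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation from this commit: it uses a plain sum over a sliding window rather than the author's checksum, and `WINDOW`, `MASK`, and `split` are hypothetical names and values. Because the sum depends only on the last `WINDOW` bytes, split points fall in the same places relative to the content even after bytes are inserted earlier in the stream.

```python
WINDOW = 64             # hypothetical window size in bytes
MASK = (1 << 6) - 1     # hypothetical: test the low 6 bits of the sum


def split(data: bytes):
    """Cut data into chunks wherever the windowed sum's low bits are all ones."""
    chunks = []
    start = 0
    total = 0
    for i, byte in enumerate(data):
        total += byte                   # byte entering the window
        if i >= WINDOW:
            total -= data[i - WINDOW]   # byte leaving the window
        if (total & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])     # trailing partial chunk
    return chunks


if __name__ == "__main__":
    data = bytes((i * 7919 + (i >> 3)) % 256 for i in range(5000))
    chunks = split(data)
    # Joining the chunks in order restores the original input.
    print("round trip ok:", b"".join(chunks) == data)
    print("chunks:", len(chunks))
```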