arthur.barton.de Git - bup.git/log
Avery Pennarun [Wed, 30 Dec 2009 00:50:14 +0000 (19:50 -0500)]
hashsplit.py: create blob objects by hand, not with 'git hash-object'

Much quicker this way, since we never need to fork anything.
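
For illustration, here is a minimal Python sketch of what building a blob "by hand" can look like. It relies only on Git's object format (a `blob <size>\0` header followed by the content, hashed with SHA-1 and zlib-compressed for loose storage). The function names `blob_id` and `blob_loose_object` are made up for this sketch and are not taken from hashsplit.py.

```python
import hashlib
import zlib


def blob_id(data: bytes) -> str:
    """Return the hex SHA-1 that Git assigns to a blob holding `data`."""
    # Git hashes a small header plus the raw content, not the content alone.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def blob_loose_object(data: bytes) -> bytes:
    """Return the zlib-compressed bytes Git stores for a loose blob."""
    header = b"blob %d\0" % len(data)
    return zlib.compress(header + data)


if __name__ == "__main__":
    # Hypothetical usage: print the id of an empty blob.
    print(blob_id(b""))
```

As a sanity check, `blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the id Git reports for an empty file. No process is forked at any point, which is presumably where the speedup comes from.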

Avery Pennarun [Wed, 30 Dec 2009 00:20:35 +0000 (19:20 -0500)]
hashsplit.py is now much, much faster than before.

4.8 secs before vs. 0.8 secs now for testfile1.

Still vastly slower than the C version (0.17 secs including time to fork
git for each blob) but still a significant improvement.

The remaining slowness seems to be entirely from:

- running git hash-object (which we can avoid by hashing the object
  ourselves)

- running the rolling checksum algorithm (which we can speed up using a C
  module)

So it's looking good.

Avery Pennarun [Tue, 29 Dec 2009 19:24:50 +0000 (14:24 -0500)]
hashsplit.py: a python version of hashsplit.c

It's slow.  Very slow.  But hopefully it won't be too hard to optimize.

Avery Pennarun [Tue, 29 Dec 2009 18:07:22 +0000 (13:07 -0500)]
Make split condition depend on ~0, not 0.

Otherwise we could end up splitting on one-byte blocks, which is pretty
dumb.
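
The commit does not spell out how one-byte blocks arise. One plausible reading, sketched here as an assumption, is that a run of zero bytes (or a freshly zeroed window) keeps the rolling sum at 0, so a "low bits are all 0" test fires on every byte, while an "all low bits set" test does not. The rotate-and-xor sum, `WINDOW`, `BLOB_BITS`, and `split_sizes` are hypothetical stand-ins, not the code from hashsplit.c.

```python
WINDOW = 32                       # assumed window size, in bytes
BLOB_BITS = 13                    # assumed: split when 13 low bits match
LOW_MASK = (1 << BLOB_BITS) - 1   # the low bits of ~0


def split_sizes(data: bytes, target: int) -> list:
    """Return block sizes when splitting wherever the low bits equal `target`."""
    window = bytearray(WINDOW)    # last WINDOW bytes, initially all zero
    total, start, sizes = 0, 0, []
    for i, byte in enumerate(data):
        slot = i % WINDOW
        # Rotate left by one bit, xor out the oldest byte, xor in the new one.
        total = (((total << 1) | (total >> 31)) & 0xFFFFFFFF) ^ window[slot] ^ byte
        window[slot] = byte
        if (total & LOW_MASK) == target:
            sizes.append(i + 1 - start)
            start = i + 1
    if start < len(data):
        sizes.append(len(data) - start)   # trailing partial block
    return sizes


if __name__ == "__main__":
    zeros = bytes(100)
    print(split_sizes(zeros, 0))          # splitting on 0
    print(split_sizes(zeros, LOW_MASK))   # splitting on ~0 (low bits)
```

For 100 zero bytes, the sum never leaves 0, so the first call produces one hundred 1-byte blocks and the second produces a single 100-byte block.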

Avery Pennarun [Tue, 29 Dec 2009 18:02:03 +0000 (13:02 -0500)]
Report the block size when splitting each block.

Avery Pennarun [Sun, 4 Oct 2009 02:55:42 +0000 (22:55 -0400)]
Add a README

Avery Pennarun [Sun, 4 Oct 2009 02:33:28 +0000 (22:33 -0400)]
Add some comments so nobody thinks I think fgetc/fputc are fast.

Avery Pennarun [Sun, 4 Oct 2009 01:51:16 +0000 (21:51 -0400)]
Add a comment to stupidsum_add() so people don't think I'm an idiot.

Yes, I know shift-and-xor is a supremely lame algorithm.
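
To make "shift-and-xor" concrete, here is a rough sketch of a rolling sum in that style. It is an assumed variant (rotate a 32-bit sum left by one bit, xor in the new byte, xor out the byte leaving a 32-byte window) and not necessarily what stupidsum_add() does. The names `roll`, `rolling_sums`, and `direct_sum` are hypothetical.

```python
WINDOW = 32          # assumed; equal to the word width so the dropped byte cancels
MASK32 = 0xFFFFFFFF


def rotl1(x: int) -> int:
    """Rotate a 32-bit value left by one bit."""
    return ((x << 1) | (x >> 31)) & MASK32


def roll(total: int, drop: int, add: int) -> int:
    """Advance the sum by one byte: `drop` leaves the window, `add` enters it."""
    # After WINDOW (= 32) rotations the dropped byte is back in its original
    # bit position, so a plain xor removes its contribution.
    return rotl1(total) ^ drop ^ add


def rolling_sums(data: bytes) -> list:
    """Return the rolling sum after each byte of `data`."""
    window = bytearray(WINDOW)
    total, out = 0, []
    for i, byte in enumerate(data):
        slot = i % WINDOW
        total = roll(total, window[slot], byte)
        window[slot] = byte
        out.append(total)
    return out


def direct_sum(last_bytes: bytes) -> int:
    """Recompute the sum from scratch over one window's worth of bytes."""
    total = 0
    for byte in last_bytes:
        total = rotl1(total) ^ byte
    return total
```

The useful property is that each rolling value equals `direct_sum` over just the last `WINDOW` bytes, so the sum carries no memory of anything older. It is cheap, but it mixes bits poorly, which is a fair reason to call it lame.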

Avery Pennarun [Sun, 4 Oct 2009 01:49:43 +0000 (21:49 -0400)]
Rename hsplit/hjoin to hashsplit/hashjoin.

Avery Pennarun [Sun, 4 Oct 2009 00:38:43 +0000 (20:38 -0400)]
Add a trivial hjoin, the reverse of hsplit.

Avery Pennarun [Sun, 4 Oct 2009 00:31:41 +0000 (20:31 -0400)]
Aha, fixed a bug causing the split points not to resync.

Avery Pennarun [Sun, 4 Oct 2009 00:06:56 +0000 (20:06 -0400)]
Actually hash stuff, and add a basic 'make test'.

Unfortunately the test fails: after the first difference, it never manages
to resync.
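
To show what "resync" means here, the sketch below splits a buffer, inserts one byte near the start, and splits again. Because the rolling sum depends only on the last `WINDOW` bytes, the split points should line up again shortly after the edit, and most chunks come out identical. Everything in it (the sum, the window size, the 6-bit split mask, `chunks`, `digests`) is an assumed stand-in, not the code from this commit.

```python
import hashlib
import random

WINDOW = 32
LOW_MASK = (1 << 6) - 1   # assumed: a split roughly every 64 bytes


def chunks(data: bytes) -> list:
    """Split `data` wherever the low bits of the rolling sum are all ones."""
    window = bytearray(WINDOW)
    total, start, out = 0, 0, []
    for i, byte in enumerate(data):
        slot = i % WINDOW
        total = (((total << 1) | (total >> 31)) & 0xFFFFFFFF) ^ window[slot] ^ byte
        window[slot] = byte
        if (total & LOW_MASK) == LOW_MASK:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def digests(data: bytes) -> list:
    """Return the SHA-1 of each chunk, in order."""
    return [hashlib.sha1(c).hexdigest() for c in chunks(data)]


rng = random.Random(0)                       # fixed seed for repeatability
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = original[:100] + b"!" + original[100:]

before, after = digests(original), digests(edited)
shared = set(before) & set(after)
print(len(before), len(after), len(shared))  # chunk counts and how many match
```

If the splitter failed to resync, as the test in this commit reports, nearly every chunk after the inserted byte would differ and `shared` would stay small.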

Avery Pennarun [Sat, 3 Oct 2009 23:48:49 +0000 (19:48 -0400)]
Extremely cheesy initial implementation of rolling-sum-based splitting.

The checksum algorithm is crap, and we don't actually generate the output
files yet, so I'm guessing it's still junk.