X-Git-Url: https://arthur.barton.de/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=README;h=42061c01a1c70097d1e4579f29a5adf40abdec95;hb=ea80387c098944bcfe17f63ea156b31cea27dce8;hp=d586018e2c4176f5e630275bd85107b8dc69cb81;hpb=db8aa7cc260711aadc6e9892d3653cfdc0360a86;p=bup.git

diff --git a/README b/README
deleted file mode 100644
index d586018..0000000
--- a/README
+++ /dev/null
@@ -1,302 +0,0 @@
-
-bup 0.04: It backs things up
-============================
-
-bup is a program that backs things up. It's short for "backup." Can you
-believe that nobody else has named an open source program "bup" after all
-this time? Me neither.
-
-Despite its unassuming name, bup is pretty cool. To give you an idea of
-just how cool it is, I wrote you this poem:
-
-    Bup is teh awesome
-    What rhymes with awesome?
-    I guess maybe possum
-    But that's irrelevant.
-
-Hmm. Did that help? Maybe prose is more useful after all.
-
-
-Reasons bup is awesome
-----------------------
-
-bup has a few advantages over other backup software:
-
-    - It uses a rolling checksum algorithm (similar to rsync) to split large
-      files into chunks. The most useful result of this is you can backup huge
-      virtual machine (VM) disk images, databases, and XML files incrementally,
-      even though they're typically all in one huge file, and not use tons of
-      disk space for multiple versions.
-
-    - It uses the packfile format from git (the open source version control
-      system), so you can access the stored data even if you don't like bup's
-      user interface.
-
-    - Unlike git, it writes packfiles *directly* (instead of having a separate
-      garbage collection / repacking stage) so it's fast even with gratuitously
-      huge amounts of data.
-
-    - Data is "automagically" shared between incremental backups without having
-      to know which backup is based on which other one - even if the backups
-      are made from two different computers that don't even know about each
-      other. You just tell bup to back stuff up, and it saves only the minimum
-      amount of data needed.
-
-    - Even when a backup is incremental, you don't have to worry about
-      restoring the full backup, then each of the incrementals in turn; an
-      incremental backup *acts* as if it's a full backup, it just takes less
-      disk space.
-
-    - It's written in python (with some C parts to make it faster) so it's easy
-      for you to extend and maintain.
-
-
-Reasons you might want to avoid bup
------------------------------------
-
-    - This is a very early version. Therefore it will most probably not work
-      for you, but we don't know why. It is also missing some
-      probably-critical features.
-
-    - It requires python 2.5, a C compiler, and an installed git version >= 1.5.2.
-
-    - It currently only works on Linux, MacOS X 10.5, or Windows (with Cygwin).
-      Patches to support other platforms are welcome.
-
-    - It has almost no documentation. Not even a man page! This file is all
-      you get for now.
-
-
-Getting started
----------------
-
-    - check out the bup source code using git:
-
-        git clone git://github.com/apenwarr/bup
-
-    - install the python 2.5 development libraries. On Debian or Ubuntu, this
-      is:
-        apt-get install python2.5-dev
-
-    - build the python module and symlinks:
-
-        make
-
-    - run the tests:
-
-        make test
-
-      (The tests should pass. If they don't pass for you, stop here and send
-      me an email.)
-
-    - Try making a local backup as a tar file:
-
-        tar -cvf - /etc | bup split -n local-etc -vv
-
-    - Try restoring your backup tarball:
-
-        bup join local-etc | tar -tf -
-
-    - Look at how much disk space your backup took:
-
-        du -s ~/.bup
-
-    - Make another backup (which should be mostly identical to the last one;
-      notice that you don't have to *specify* that this backup is incremental,
-      it just saves space automatically):
-
-        tar -cvf - /etc | bup split -n local-etc -vv
-
-    - Look how little extra space your second backup used on top of the first:
-
-        du -s ~/.bup
-
-    - Restore your old backup again (the ~1 is git notation for "one older than
-      the most recent"):
-
-        bup join local-etc~1 | tar -tf -
-
-    - get a list of your previous backups:
-
-        GIT_DIR=~/.bup git log local-etc
-
-    - make a backup on a remote server (which must already have the 'bup' command
-      somewhere in the PATH, and be accessible via ssh; make sure to replace
-      SERVERNAME with the actual hostname of your server):
-
-        tar -cvf - /etc | bup split -r SERVERNAME: -n local-etc -vv
-
-    - try restoring the remote backup tarball:
-
-        bup join -r SERVERNAME: local-etc | tar -tf -
-
-    - try using the new (slightly experimental) 'bup index' and 'bup save'
-      style backups, which bypass 'tar' but have some missing features (see
-      "Things that are stupid" below):
-
-        bup index -uv /etc
-        bup save -n local-etc /etc
-
-    - do it again and see how fast an incremental backup can be:
-
-        bup index -uv /etc
-        bup save -n local-etc /etc
-
-      (You can also use the "-r SERVERNAME:" option to 'bup save', just like
-      with 'bup split' and 'bup join'. The index itself is always local,
-      so you don't need -r there.)
-
-That's all there is to it!
-
-
-How it works
-------------
-
-Basic storage:
-
-bup stores its data in a git-formatted repository. Unfortunately, git
-itself doesn't actually behave very well for bup's use case (huge numbers of
-files, files with huge sizes, retaining file permissions/ownership are
-important), so we mostly don't use git's *code* except for a few helper
-programs. For example, bup has its own git packfile writer written in
-python.
-
-Basically, 'bup split' reads the data on stdin (or from files specified on
-the command line), breaks it into chunks using a rolling checksum (similar to
-rsync), and saves those chunks into a new git packfile. There is one git
-packfile per backup.
-
-When deciding whether to write a particular chunk into the new packfile, bup
-first checks all the other packfiles that exist to see if they already have that
-chunk. If they do, the chunk is skipped.
-
-git packs come in two parts: the pack itself (*.pack) and the index (*.idx).
-The index is pretty small, and contains a list of all the objects in the
-pack. Thus, when generating a remote backup, we don't have to have a copy
-of the packfiles from the remote server: the local end just downloads a copy
-of the server's *index* files, and compares objects against those when
-generating the new pack, which it sends directly to the server.
-
-The "-n" option to 'bup split' and 'bup save' is the name of the backup you
-want to create, but it's actually implemented as a git branch. So you can
-do cute things like checkout a particular branch using git, and receive a
-bunch of chunk files corresponding to the file you split.
-
-If you use '-b' or '-t' or '-c' instead of '-n', bup split will output a
-list of blobs, a tree containing that list of blobs, or a commit containing
-that tree, respectively, to stdout. You can use this to construct your own
-scripts that do something with those values.
-
-The bup index:
-
-'bup index' walks through your filesystem and updates a file (whose name is,
-by default, ~/.bup/bupindex) to contain the name, attributes, and an
-optional git SHA1 (blob id) of each file and directory.
-
-'bup save' basically just runs the equivalent of 'bup split' a whole bunch
-of times, once per file in the index, and assembles a git tree
-that contains all the resulting objects. Among other things, that makes
-'git diff' much more useful (compared to splitting a tarball, which is
-essentially a big binary blob). However, since bup splits large files into
-smaller chunks, the resulting tree structure doesn't *exactly* correspond to
-what git itself would have stored. Also, the tree format used by 'bup save'
-will probably change in the future to support storing file ownership, more
-complex file permissions, and so on.
-
-If a file has previously been written by 'bup save', then its git blob/tree
-id is stored in the index. This lets 'bup save' avoid reading that file to
-produce future incremental backups, which means it can go *very* fast unless
-a lot of files have changed.
-
-
-Things that are stupid for now but which we'll fix later
---------------------------------------------------------
-
-Help with any of these problems, or others, is very, very welcome. Let me
-know if you'd like to help. Maybe we can start a mailing list.
-
-    - 'bup save' doesn't know about file metadata.
-
-      That means we aren't saving file attributes, mtimes, ownership, hard
-      links, MacOS resource forks, etc. Clearly this needs to be improved.
-
-    - There's no 'bup restore' yet.
-
-      'bup save' saves files in the standard git 'tree of blobs' format, so you
-      could then "restore" the files using something like 'git checkout'. But
-      that's a git command, not a bup command, so it's hard to explain and
-      doesn't support retrieving objects from a remote bup server without first
-      fetching and packing an entire (possibly huge) pack, which could be very
-      slow. Also, like 'bup save', you would need extra features in order to
-      properly restore file metadata. And files that bup has split into
-      chunks would need to be recombined somehow.
-
-    - 'bup index' is slower than it should be.
-
-      It's still rather fast: it can iterate through all the filenames on my
-      600,000 file filesystem in a few seconds. But sometimes you just want to
-      change a filename or two, so this is needlessly slow. There should be
-      a way to binary search through the file list rather than always going
-      through it sequentially. And if you only add a couple of filenames,
-      there's no need to rewrite the entire index; just leave the new files
-      in a second "extra index" file or something.
-
-    - bup could use inotify for *really* efficient incremental backups.
-
-      You could even have your system doing "continuous" backups: whenever a
-      file changes, we immediately send an image of it to the server. We could
-      give the continuous-backup process a really low CPU and I/O priority so
-      you wouldn't even know it was running.
-
-    - bup currently has no features that prune away *old* backups.
-
-      Because of the way the packfile system works, backups become "entangled"
-      in weird ways and it's not actually possible to delete one pack
-      (corresponding approximately to one backup) without risking screwing up
-      other backups.
-
-      git itself has lots of ways of optimizing this sort of thing, but its
-      methods aren't really applicable here; bup packfiles are just too huge.
-      We'll have to do it in a totally different way. There are lots of
-      options. For now: make sure you've got lots of disk space :)
-
-    - bup doesn't ever validate existing backups/packs to ensure they're
-      correct.
-
-      This would be easy to implement (given that git uses hashes and CRCs all
-      over the place), but nobody has implemented it. For now, you could try
-      doing a test restore of your tarball; doing so should trigger git's error
-      handling if any of the objects are corrupted. 'git fsck' would
-      theoreticaly work too, but it's too slow for huge backups.
-
-    - bup has never been tested on anything but Linux, MacOS, and Linux+Cygwin.
-
-      There's nothing that makes it *inherently* non-portable, though, so
-      that's mostly a matter of someone putting in some effort. (For a
-      "native" Windows port, the most annoying thing is the absence of ssh in
-      a default Windows installation.)
-
-    - bup has no GUI. Actually, that's not stupid, but you might consider it
-      a limitation. There are a bunch of Linux GUI backup programs; someday
-      I expect someone will adapt one of them to use bup.
-
-
-How you can help
-----------------
-
-bup is a work in progress and there are many ways it can still be improved.
-If you'd like to contribute patches, ideas, or bug reports, please join the
-bup mailing list.
-
-You can find the mailing list archives here:
-
-    http://groups.google.com/group/bup-list
-
-and you can subscribe by sending a message to:
-
-    bup-list+subscribe@googlegroups.com
-
-Have fun,
-
-Avery
-January 2010
diff --git a/README b/README
new file mode 120000
index 0000000..42061c0
--- /dev/null
+++ b/README
@@ -0,0 +1 @@
+README.md
\ No newline at end of file