Minor README format changes to make it markdown-compliant.

author Avery Pennarun <apenwarr@gmail.com>

Sun, 24 Jan 2010 01:54:37 +0000 (20:54 -0500)

committer Avery Pennarun <apenwarr@gmail.com>

Sun, 24 Jan 2010 02:13:39 +0000 (21:13 -0500)
diff --git a/README b/README
deleted file mode 100644
index d586018..0000000
--- a/README
+++ /dev/null
@@ -1,302 +0,0 @@
-
-bup 0.04: It backs things up
-============================
-
-bup is a program that backs things up.  It's short for "backup." Can you
-believe that nobody else has named an open source program "bup" after all
-this time?  Me neither.
-
-Despite its unassuming name, bup is pretty cool.  To give you an idea of
-just how cool it is, I wrote you this poem:
-
-                             Bup is teh awesome
-                          What rhymes with awesome?
-                            I guess maybe possum
-                           But that's irrelevant.
-                       
-Hmm.  Did that help?  Maybe prose is more useful after all.
-
-
-Reasons bup is awesome
-----------------------
-
-bup has a few advantages over other backup software:
-
- - It uses a rolling checksum algorithm (similar to rsync) to split large
-   files into chunks.  The most useful result of this is you can backup huge
-   virtual machine (VM) disk images, databases, and XML files incrementally,
-   even though they're typically all in one huge file, and not use tons of
-   disk space for multiple versions.
-   
- - It uses the packfile format from git (the open source version control
-   system), so you can access the stored data even if you don't like bup's
-   user interface.
-   
- - Unlike git, it writes packfiles *directly* (instead of having a separate
-   garbage collection / repacking stage) so it's fast even with gratuitously
-   huge amounts of data.
-   
- - Data is "automagically" shared between incremental backups without having
-   to know which backup is based on which other one - even if the backups
-   are made from two different computers that don't even know about each
-   other.  You just tell bup to back stuff up, and it saves only the minimum
-   amount of data needed.
-   
- - Even when a backup is incremental, you don't have to worry about
-   restoring the full backup, then each of the incrementals in turn; an
-   incremental backup *acts* as if it's a full backup, it just takes less
-   disk space.
-   
- - It's written in python (with some C parts to make it faster) so it's easy
-   for you to extend and maintain.
-
-
-Reasons you might want to avoid bup
------------------------------------
-
- - This is a very early version. Therefore it will most probably not work
-   for you, but we don't know why.  It is also missing some
-   probably-critical features.
-   
- - It requires python 2.5, a C compiler, and an installed git version >= 1.5.2.
- 
- - It currently only works on Linux, MacOS X 10.5, or Windows (with Cygwin).
-   Patches to support other platforms are welcome.
- 
- - It has almost no documentation.  Not even a man page!  This file is all
-   you get for now.
-   
-   
-Getting started
----------------
-
- - check out the bup source code using git:
- 
-       git clone git://github.com/apenwarr/bup
-
- - install the python 2.5 development libraries.  On Debian or Ubuntu, this
-   is:
-       apt-get install python2.5-dev
-       
- - build the python module and symlinks:
- 
-       make
-       
- - run the tests:
- 
-       make test
-       
-   (The tests should pass.  If they don't pass for you, stop here and send
-   me an email.)
-   
- - Try making a local backup as a tar file:
- 
-       tar -cvf - /etc | bup split -n local-etc -vv
-       
- - Try restoring your backup tarball:
- 
-       bup join local-etc | tar -tf -
-       
- - Look at how much disk space your backup took:
- 
-       du -s ~/.bup
-       
- - Make another backup (which should be mostly identical to the last one;
-   notice that you don't have to *specify* that this backup is incremental,
-   it just saves space automatically):
- 
-       tar -cvf - /etc | bup split -n local-etc -vv
-       
- - Look how little extra space your second backup used on top of the first:
- 
-       du -s ~/.bup
-       
- - Restore your old backup again (the ~1 is git notation for "one older than
-   the most recent"):
-   
-       bup join local-etc~1 | tar -tf -
- 
- - get a list of your previous backups:
- 
-       GIT_DIR=~/.bup git log local-etc
-       
- - make a backup on a remote server (which must already have the 'bup' command
-   somewhere in the PATH, and be accessible via ssh; make sure to replace
-   SERVERNAME with the actual hostname of your server):
-   
-       tar -cvf - /etc | bup split -r SERVERNAME: -n local-etc -vv
- 
- - try restoring the remote backup tarball:
- 
-       bup join -r SERVERNAME: local-etc | tar -tf -
-       
- - try using the new (slightly experimental) 'bup index' and 'bup save'
-   style backups, which bypass 'tar' but have some missing features (see
-   "Things that are stupid" below):
-       
-       bup index -uv /etc
-       bup save -n local-etc /etc
-       
- - do it again and see how fast an incremental backup can be:
- 
-       bup index -uv /etc
-       bup save -n local-etc /etc
-       
-   (You can also use the "-r SERVERNAME:" option to 'bup save', just like
-    with 'bup split' and 'bup join'.  The index itself is always local,
-    so you don't need -r there.)
-       
-That's all there is to it!
-
-
-How it works
-------------
-
-Basic storage:
-
-bup stores its data in a git-formatted repository.  Unfortunately, git
-itself doesn't actually behave very well for bup's use case (huge numbers of
-files, files with huge sizes, retaining file permissions/ownership are
-important), so we mostly don't use git's *code* except for a few helper
-programs.  For example, bup has its own git packfile writer written in
-python.
-
-Basically, 'bup split' reads the data on stdin (or from files specified on
-the command line), breaks it into chunks using a rolling checksum (similar to
-rsync), and saves those chunks into a new git packfile.  There is one git
-packfile per backup.
-
-When deciding whether to write a particular chunk into the new packfile, bup
-first checks all the other packfiles that exist to see if they already have that
-chunk.  If they do, the chunk is skipped.
-
-git packs come in two parts: the pack itself (*.pack) and the index (*.idx).
-The index is pretty small, and contains a list of all the objects in the
-pack.  Thus, when generating a remote backup, we don't have to have a copy
-of the packfiles from the remote server: the local end just downloads a copy
-of the server's *index* files, and compares objects against those when
-generating the new pack, which it sends directly to the server.
-
-The "-n" option to 'bup split' and 'bup save' is the name of the backup you
-want to create, but it's actually implemented as a git branch.  So you can
-do cute things like checkout a particular branch using git, and receive a
-bunch of chunk files corresponding to the file you split.
-
-If you use '-b' or '-t' or '-c' instead of '-n', bup split will output a
-list of blobs, a tree containing that list of blobs, or a commit containing
-that tree, respectively, to stdout.  You can use this to construct your own
-scripts that do something with those values.
-
-The bup index:
-
-'bup index' walks through your filesystem and updates a file (whose name is,
-by default, ~/.bup/bupindex) to contain the name, attributes, and an
-optional git SHA1 (blob id) of each file and directory.
-
-'bup save' basically just runs the equivalent of 'bup split' a whole bunch
-of times, once per file in the index, and assembles a git tree
-that contains all the resulting objects.  Among other things, that makes
-'git diff' much more useful (compared to splitting a tarball, which is
-essentially a big binary blob).  However, since bup splits large files into
-smaller chunks, the resulting tree structure doesn't *exactly* correspond to
-what git itself would have stored.  Also, the tree format used by 'bup save'
-will probably change in the future to support storing file ownership, more
-complex file permissions, and so on.
-
-If a file has previously been written by 'bup save', then its git blob/tree
-id is stored in the index.  This lets 'bup save' avoid reading that file to
-produce future incremental backups, which means it can go *very* fast unless
-a lot of files have changed.
-
- 
-Things that are stupid for now but which we'll fix later
---------------------------------------------------------
-
-Help with any of these problems, or others, is very, very welcome.  Let me
-know if you'd like to help.  Maybe we can start a mailing list.
-
- - 'bup save' doesn't know about file metadata.
- 
-   That means we aren't saving file attributes, mtimes, ownership, hard
-   links, MacOS resource forks, etc.  Clearly this needs to be improved.
-
- - There's no 'bup restore' yet.
- 
-   'bup save' saves files in the standard git 'tree of blobs' format, so you
-   could then "restore" the files using something like 'git checkout'.  But
-   that's a git command, not a bup command, so it's hard to explain and
-   doesn't support retrieving objects from a remote bup server without first
-   fetching and packing an entire (possibly huge) pack, which could be very
-   slow.  Also, like 'bup save', you would need extra features in order to
-   properly restore file metadata.  And files that bup has split into
-   chunks would need to be recombined somehow.
-   
- - 'bup index' is slower than it should be.
- 
-   It's still rather fast: it can iterate through all the filenames on my
-   600,000 file filesystem in a few seconds.  But sometimes you just want to
-   change a filename or two, so this is needlessly slow.  There should be
-   a way to binary search through the file list rather than always going
-   through it sequentially.  And if you only add a couple of filenames,
-   there's no need to rewrite the entire index; just leave the new files
-   in a second "extra index" file or something.
-   
- - bup could use inotify for *really* efficient incremental backups.
-
-   You could even have your system doing "continuous" backups: whenever a
-   file changes, we immediately send an image of it to the server.  We could
-   give the continuous-backup process a really low CPU and I/O priority so
-   you wouldn't even know it was running.
-
- - bup currently has no features that prune away *old* backups.
- 
-   Because of the way the packfile system works, backups become "entangled"
-   in weird ways and it's not actually possible to delete one pack
-   (corresponding approximately to one backup) without risking screwing up
-   other backups.
-   
-   git itself has lots of ways of optimizing this sort of thing, but its
-   methods aren't really applicable here; bup packfiles are just too huge.
-   We'll have to do it in a totally different way.  There are lots of
-   options.  For now: make sure you've got lots of disk space :)
-
- - bup doesn't ever validate existing backups/packs to ensure they're
-   correct.
-   
-   This would be easy to implement (given that git uses hashes and CRCs all
-   over the place), but nobody has implemented it.  For now, you could try
-   doing a test restore of your tarball; doing so should trigger git's error
-   handling if any of the objects are corrupted.  'git fsck' would
-   theoreticaly work too, but it's too slow for huge backups.
-
- - bup has never been tested on anything but Linux, MacOS, and Linux+Cygwin.
- 
-   There's nothing that makes it *inherently* non-portable, though, so
-   that's mostly a matter of someone putting in some effort.  (For a
-   "native" Windows port, the most annoying thing is the absence of ssh in
-   a default Windows installation.)
-   
- - bup has no GUI.  Actually, that's not stupid, but you might consider it
-   a limitation.  There are a bunch of Linux GUI backup programs; someday
-   I expect someone will adapt one of them to use bup.
-
-
-How you can help
-----------------
-
-bup is a work in progress and there are many ways it can still be improved.
-If you'd like to contribute patches, ideas, or bug reports, please join the
-bup mailing list.
-
-You can find the mailing list archives here:
-
-       http://groups.google.com/group/bup-list
-       
-and you can subscribe by sending a message to:
-
-       bup-list+subscribe@googlegroups.com
-
-Have fun,
-
-Avery
-January 2010
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..077645f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,302 @@
+
+bup 0.04: It backs things up
+============================
+
+bup is a program that backs things up.  It's short for "backup." Can you
+believe that nobody else has named an open source program "bup" after all
+this time?  Me neither.
+
+Despite its unassuming name, bup is pretty cool.  To give you an idea of
+just how cool it is, I wrote you this poem:
+
+                             Bup is teh awesome
+                          What rhymes with awesome?
+                            I guess maybe possum
+                           But that's irrelevant.
+                       
+Hmm.  Did that help?  Maybe prose is more useful after all.
+
+
+Reasons bup is awesome
+----------------------
+
+bup has a few advantages over other backup software:
+
+ - It uses a rolling checksum algorithm (similar to rsync) to split large
+   files into chunks.  The most useful result of this is you can backup huge
+   virtual machine (VM) disk images, databases, and XML files incrementally,
+   even though they're typically all in one huge file, and not use tons of
+   disk space for multiple versions.
+   
+ - It uses the packfile format from git (the open source version control
+   system), so you can access the stored data even if you don't like bup's
+   user interface.
+   
+ - Unlike git, it writes packfiles *directly* (instead of having a separate
+   garbage collection / repacking stage) so it's fast even with gratuitously
+   huge amounts of data.
+   
+ - Data is "automagically" shared between incremental backups without having
+   to know which backup is based on which other one - even if the backups
+   are made from two different computers that don't even know about each
+   other.  You just tell bup to back stuff up, and it saves only the minimum
+   amount of data needed.
+   
+ - Even when a backup is incremental, you don't have to worry about
+   restoring the full backup, then each of the incrementals in turn; an
+   incremental backup *acts* as if it's a full backup, it just takes less
+   disk space.
+   
+ - It's written in python (with some C parts to make it faster) so it's easy
+   for you to extend and maintain.
+
+
+Reasons you might want to avoid bup
+-----------------------------------
+
+ - This is a very early version. Therefore it will most probably not work
+   for you, but we don't know why.  It is also missing some
+   probably-critical features.
+   
+ - It requires python 2.5, a C compiler, and an installed git version >= 1.5.2.
+ 
+ - It currently only works on Linux, MacOS X 10.5, or Windows (with Cygwin).
+   Patches to support other platforms are welcome.
+ 
+ - It has almost no documentation.  Not even a man page!  This file is all
+   you get for now.
+   
+   
+Getting started
+---------------
+
+ - check out the bup source code using git:
+ 
+        git clone git://github.com/apenwarr/bup
+
+ - install the python 2.5 development libraries.  On Debian or Ubuntu, this
+   is:
+        apt-get install python2.5-dev
+       
+ - build the python module and symlinks:
+ 
+        make
+       
+ - run the tests:
+ 
+        make test
+       
+    (The tests should pass.  If they don't pass for you, stop here and send
+    me an email.)
+   
+ - Try making a local backup as a tar file:
+ 
+        tar -cvf - /etc | bup split -n local-etc -vv
+       
+ - Try restoring your backup tarball:
+ 
+        bup join local-etc | tar -tf -
+       
+ - Look at how much disk space your backup took:
+ 
+        du -s ~/.bup
+       
+ - Make another backup (which should be mostly identical to the last one;
+   notice that you don't have to *specify* that this backup is incremental,
+   it just saves space automatically):
+ 
+        tar -cvf - /etc | bup split -n local-etc -vv
+       
+ - Look how little extra space your second backup used on top of the first:
+ 
+       du -s ~/.bup
+       
+ - Restore your old backup again (the ~1 is git notation for "one older than
+   the most recent"):
+   
+        bup join local-etc~1 | tar -tf -
+ 
+ - get a list of your previous backups:
+ 
+        GIT_DIR=~/.bup git log local-etc
+       
+ - make a backup on a remote server (which must already have the 'bup' command
+   somewhere in the PATH, and be accessible via ssh; make sure to replace
+   SERVERNAME with the actual hostname of your server):
+   
+        tar -cvf - /etc | bup split -r SERVERNAME: -n local-etc -vv
+ 
+ - try restoring the remote backup tarball:
+ 
+        bup join -r SERVERNAME: local-etc | tar -tf -
+       
+ - try using the new (slightly experimental) 'bup index' and 'bup save'
+   style backups, which bypass 'tar' but have some missing features (see
+   "Things that are stupid" below):
+       
+        bup index -uv /etc
+        bup save -n local-etc /etc
+       
+ - do it again and see how fast an incremental backup can be:
+ 
+        bup index -uv /etc
+        bup save -n local-etc /etc
+       
+    (You can also use the "-r SERVERNAME:" option to 'bup save', just like
+     with 'bup split' and 'bup join'.  The index itself is always local,
+     so you don't need -r there.)
+       
+That's all there is to it!
+
+
+How it works
+------------
+
+Basic storage:
+
+bup stores its data in a git-formatted repository.  Unfortunately, git
+itself doesn't actually behave very well for bup's use case (huge numbers of
+files, files with huge sizes, retaining file permissions/ownership are
+important), so we mostly don't use git's *code* except for a few helper
+programs.  For example, bup has its own git packfile writer written in
+python.
+
+Basically, 'bup split' reads the data on stdin (or from files specified on
+the command line), breaks it into chunks using a rolling checksum (similar to
+rsync), and saves those chunks into a new git packfile.  There is one git
+packfile per backup.
+
+When deciding whether to write a particular chunk into the new packfile, bup
+first checks all the other packfiles that exist to see if they already have that
+chunk.  If they do, the chunk is skipped.
+
+git packs come in two parts: the pack itself (*.pack) and the index (*.idx).
+The index is pretty small, and contains a list of all the objects in the
+pack.  Thus, when generating a remote backup, we don't have to have a copy
+of the packfiles from the remote server: the local end just downloads a copy
+of the server's *index* files, and compares objects against those when
+generating the new pack, which it sends directly to the server.
+
+The "-n" option to 'bup split' and 'bup save' is the name of the backup you
+want to create, but it's actually implemented as a git branch.  So you can
+do cute things like checkout a particular branch using git, and receive a
+bunch of chunk files corresponding to the file you split.
+
+If you use '-b' or '-t' or '-c' instead of '-n', bup split will output a
+list of blobs, a tree containing that list of blobs, or a commit containing
+that tree, respectively, to stdout.  You can use this to construct your own
+scripts that do something with those values.
+
+The bup index:
+
+'bup index' walks through your filesystem and updates a file (whose name is,
+by default, ~/.bup/bupindex) to contain the name, attributes, and an
+optional git SHA1 (blob id) of each file and directory.
+
+'bup save' basically just runs the equivalent of 'bup split' a whole bunch
+of times, once per file in the index, and assembles a git tree
+that contains all the resulting objects.  Among other things, that makes
+'git diff' much more useful (compared to splitting a tarball, which is
+essentially a big binary blob).  However, since bup splits large files into
+smaller chunks, the resulting tree structure doesn't *exactly* correspond to
+what git itself would have stored.  Also, the tree format used by 'bup save'
+will probably change in the future to support storing file ownership, more
+complex file permissions, and so on.
+
+If a file has previously been written by 'bup save', then its git blob/tree
+id is stored in the index.  This lets 'bup save' avoid reading that file to
+produce future incremental backups, which means it can go *very* fast unless
+a lot of files have changed.
+
+ 
+Things that are stupid for now but which we'll fix later
+--------------------------------------------------------
+
+Help with any of these problems, or others, is very, very welcome.  Let me
+know if you'd like to help.  Maybe we can start a mailing list.
+
+ - 'bup save' doesn't know about file metadata.
+ 
+    That means we aren't saving file attributes, mtimes, ownership, hard
+    links, MacOS resource forks, etc.  Clearly this needs to be improved.
+
+ - There's no 'bup restore' yet.
+ 
+    'bup save' saves files in the standard git 'tree of blobs' format, so you
+    could then "restore" the files using something like 'git checkout'.  But
+    that's a git command, not a bup command, so it's hard to explain and
+    doesn't support retrieving objects from a remote bup server without first
+    fetching and packing an entire (possibly huge) pack, which could be very
+    slow.  Also, like 'bup save', you would need extra features in order to
+    properly restore file metadata.  And files that bup has split into
+    chunks would need to be recombined somehow.
+   
+ - 'bup index' is slower than it should be.
+ 
+    It's still rather fast: it can iterate through all the filenames on my
+    600,000 file filesystem in a few seconds.  But sometimes you just want to
+    change a filename or two, so this is needlessly slow.  There should be
+    a way to binary search through the file list rather than always going
+    through it sequentially.  And if you only add a couple of filenames,
+    there's no need to rewrite the entire index; just leave the new files
+    in a second "extra index" file or something.
+   
+ - bup could use inotify for *really* efficient incremental backups.
+
+    You could even have your system doing "continuous" backups: whenever a
+    file changes, we immediately send an image of it to the server.  We could
+    give the continuous-backup process a really low CPU and I/O priority so
+    you wouldn't even know it was running.
+
+ - bup currently has no features that prune away *old* backups.
+ 
+    Because of the way the packfile system works, backups become "entangled"
+    in weird ways and it's not actually possible to delete one pack
+    (corresponding approximately to one backup) without risking screwing up
+    other backups.
+   
+    git itself has lots of ways of optimizing this sort of thing, but its
+    methods aren't really applicable here; bup packfiles are just too huge.
+    We'll have to do it in a totally different way.  There are lots of
+    options.  For now: make sure you've got lots of disk space :)
+
+ - bup doesn't ever validate existing backups/packs to ensure they're
+    correct.
+   
+    This would be easy to implement (given that git uses hashes and CRCs all
+    over the place), but nobody has implemented it.  For now, you could try
+    doing a test restore of your tarball; doing so should trigger git's error
+    handling if any of the objects are corrupted.  'git fsck' would
+    theoreticaly work too, but it's too slow for huge backups.
+
+ - bup has never been tested on anything but Linux, MacOS, and Linux+Cygwin.
+ 
+    There's nothing that makes it *inherently* non-portable, though, so
+    that's mostly a matter of someone putting in some effort.  (For a
+    "native" Windows port, the most annoying thing is the absence of ssh in
+    a default Windows installation.)
+   
+ - bup has no GUI.  Actually, that's not stupid, but you might consider it
+   a limitation.  There are a bunch of Linux GUI backup programs; someday
+   I expect someone will adapt one of them to use bup.
+
+
+How you can help
+----------------
+
+bup is a work in progress and there are many ways it can still be improved.
+If you'd like to contribute patches, ideas, or bug reports, please join the
+bup mailing list.
+
+You can find the mailing list archives here:
+
+       http://groups.google.com/group/bup-list
+       
+and you can subscribe by sending a message to:
+
+       bup-list+subscribe@googlegroups.com
+
+Have fun,
+
+Avery
+January 2010
README	[deleted file]
README.md	[new file with mode: 0644]
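
The "How it works" text carried in this diff says 'bup split' breaks its input into chunks using a rolling checksum and skips any chunk that an existing packfile already has. The sketch below illustrates that idea only; it is not bup's code. The window size, the boundary mask, the plain rolling sum, the function names `split_chunks` and `store`, and the in-memory dict standing in for packfiles are all assumptions made for illustration, and the keys are plain SHA-1 digests of the chunk bytes rather than real git blob ids.

```python
import hashlib

WINDOW = 64            # bytes covered by the rolling sum (assumed value)
MASK = (1 << 11) - 1   # cut when the low 11 bits of the sum are all ones (assumed)


def split_chunks(data: bytes):
    """Yield content-defined chunks of data, cut by a simple rolling sum."""
    start = 0      # index where the current chunk begins
    rolling = 0    # sum of the last WINDOW bytes of the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte that left the window
        # Only cut once the window is full, so every chunk but the last
        # is at least WINDOW bytes long.
        if i - start + 1 >= WINDOW and (rolling & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]                # trailing partial chunk


def store(data: bytes, repo: dict) -> int:
    """Add unseen chunks to repo (hex SHA-1 -> bytes); return how many were new."""
    new = 0
    for chunk in split_chunks(data):
        key = hashlib.sha1(chunk).hexdigest()
        if key not in repo:               # chunk already stored: skip it
            repo[key] = chunk
            new += 1
    return new


if __name__ == "__main__":
    # Pseudo-random demo input built from SHA-256 digests.
    data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(6000))
    repo = {}
    print("first backup, new chunks: ", store(data, repo))
    print("same data again:          ", store(data, repo))
    print("with a few bytes inserted:", store(b"inserted prefix" + data, repo))
```

Because each cut depends only on the bytes inside the window, inserting data near the start of the input shifts the later chunk boundaries along with the content, so most later chunks hash to keys that are already stored. That is the property the README credits for incremental backups of large single files.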