Update the README to reflect recent changes.

author Avery Pennarun <apenwarr@gmail.com>

Mon, 11 Jan 2010 20:41:01 +0000 (15:41 -0500)

committer Avery Pennarun <apenwarr@gmail.com>

Mon, 11 Jan 2010 20:41:01 +0000 (15:41 -0500)
diff --git a/README b/README
index 8b6ec6fafe1c2c0e568296503933f00ae74ec37f..d586018e2c4176f5e630275bd85107b8dc69cb81 100644
--- a/README
+++ b/README
@@ -1,5 +1,5 @@
  
-bup 0.01: It backs things up
+bup 0.04: It backs things up
  ============================
  
  bup is a program that backs things up.  It's short for "backup." Can you
@@ -54,12 +54,14 @@ bup has a few advantages over other backup software:
  Reasons you might want to avoid bup
  -----------------------------------
  
- - This is version 0.01.  What that means is this is the very first version. 
-   Therefore it will most probably not work for you, but we don't know why.
+ - This is a very early version. Therefore it will most probably not work
+   for you, but we don't know why.  It is also missing some
+   probably-critical features.
     
   - It requires python 2.5, a C compiler, and an installed git version >= 1.5.2.
   
- - It only works on Linux (for now).
+ - It currently only works on Linux, MacOS X 10.5, or Windows (with Cygwin).
+   Patches to support other platforms are welcome.
   
   - It has almost no documentation.  Not even a man page!  This file is all
     you get for now.
@@ -87,11 +89,11 @@ Getting started
     (The tests should pass.  If they don't pass for you, stop here and send
     me an email.)
     
- - Try making a local backup:
+ - Try making a local backup as a tar file:
   
         tar -cvf - /etc | bup split -n local-etc -vv
         
- - Try restoring your backup:
+ - Try restoring your backup tarball:
   
         bup join local-etc | tar -tf -
         
@@ -124,16 +126,34 @@ Getting started
     
         tar -cvf - /etc | bup split -r SERVERNAME: -n local-etc -vv
   
- - try restoring the remote backup:
+ - try restoring the remote backup tarball:
   
         bup join -r SERVERNAME: local-etc | tar -tf -
-
+       
+ - try using the new (slightly experimental) 'bup index' and 'bup save'
+   style backups, which bypass 'tar' but have some missing features (see
+   "Things that are stupid" below):
+       
+       bup index -uv /etc
+       bup save -n local-etc /etc
+       
+ - do it again and see how fast an incremental backup can be:
+ 
+       bup index -uv /etc
+       bup save -n local-etc /etc
+       
+   (You can also use the "-r SERVERNAME:" option to 'bup save', just like
+    with 'bup split' and 'bup join'.  The index itself is always local,
+    so you don't need -r there.)
+       
  That's all there is to it!
  
  
  How it works
  ------------
  
+Basic storage:
+
  bup stores its data in a git-formatted repository.  Unfortunately, git
  itself doesn't actually behave very well for bup's use case (huge numbers of
  files, files with huge sizes, retaining file permissions/ownership are
@@ -167,16 +187,27 @@ list of blobs, a tree containing that list of blobs, or a commit containing
  that tree, respectively, to stdout.  You can use this to construct your own
  scripts that do something with those values.
  
-'bup save' basically just runs 'bup split' a whole bunch of times, once per
-file in a directory hierarchy, and assembles a git tree that contains all
-the resulting objects.  Among other things, that makes 'git diff' much more
-useful (compared to splitting a tarball, which is essentially a big binary
-blob).  However, since bup splits large files into smaller chunks, the
-resulting tree structure doesn't *exactly* correspond to what git itself
-would have stored.  Also, the tree format used by 'bup save' will probably
-change in the future to support storing file ownership, more complex file
-permissions, and so on.
- 
+The bup index:
+
+'bup index' walks through your filesystem and updates a file (whose name is,
+by default, ~/.bup/bupindex) to contain the name, attributes, and an
+optional git SHA1 (blob id) of each file and directory.
+
+'bup save' basically just runs the equivalent of 'bup split' a whole bunch
+of times, once per file in the index, and assembles a git tree
+that contains all the resulting objects.  Among other things, that makes
+'git diff' much more useful (compared to splitting a tarball, which is
+essentially a big binary blob).  However, since bup splits large files into
+smaller chunks, the resulting tree structure doesn't *exactly* correspond to
+what git itself would have stored.  Also, the tree format used by 'bup save'
+will probably change in the future to support storing file ownership, more
+complex file permissions, and so on.
+
+If a file has previously been written by 'bup save', then its git blob/tree
+id is stored in the index.  This lets 'bup save' avoid reading that file to
+produce future incremental backups, which means it can go *very* fast unless
+a lot of files have changed.
+
   
  Things that are stupid for now but which we'll fix later
  --------------------------------------------------------
@@ -184,31 +215,31 @@ Things that are stupid for now but which we'll fix later
  Help with any of these problems, or others, is very, very welcome.  Let me
  know if you'd like to help.  Maybe we can start a mailing list.
  
- - bup's incremental backup algorithm is braindead.
+ - 'bup save' doesn't know about file metadata.
   
-   Bup reads the contents of every single file you want to back up, *then*
-   it checks if it has that content already, and if not, it backs up the
-   file.  Now, it happens to do that very fast (using mmap'ed git packfile
-   indexes), all things considered, but it's not nearly as fast as simply
-   noticing that the file inode+ctime is the same as before and just
-   skipping it.  There's nothing preventing us from adding this
-   optimization, though.  (Perhaps we could use the git indexfile format for
-   tracking this?)
-   
- - 'bup save' is incomplete and there's no 'bup restore' yet.
+   That means we aren't saving file attributes, mtimes, ownership, hard
+   links, MacOS resource forks, etc.  Clearly this needs to be improved.
+
+ - There's no 'bup restore' yet.
   
-   'bup save' is supposed to recursively go through a given directory and
-   store all the files efficiently, and then you could use 'bup restore' to
-   restore all or some of them.  However, these features don't really work
-   yet.
+   'bup save' saves files in the standard git 'tree of blobs' format, so you
+   could then "restore" the files using something like 'git checkout'.  But
+   that's a git command, not a bup command, so it's hard to explain and
+   doesn't support retrieving objects from a remote bup server without first
+   fetching and packing an entire (possibly huge) pack, which could be very
+   slow.  Also, like 'bup save', you would need extra features in order to
+   properly restore file metadata.  And files that bup has split into
+   chunks would need to be recombined somehow.
+   
+ - 'bup index' is slower than it should be.
   
-   Instead, for now the best way to use bup is to feed 'bup split' a big tar
-   file of your backup, then restore that tar file later with 'bup join'. 
-   This is cute, but inefficient; for example, tar files don't have an
-   index, so to restore a single file would require linearly reading through
-   the entire tarball.  (This is exactly like what always happens when you
-   make a backup using tar, but if we use git's native trees/blobs the way
-   they're meant to be used, it will be ridiculously faster.)
+   It's still rather fast: it can iterate through all the filenames on my
+   600,000 file filesystem in a few seconds.  But sometimes you just want to
+   change a filename or two, so this is needlessly slow.  There should be
+   a way to binary search through the file list rather than always going
+   through it sequentially.  And if you only add a couple of filenames,
+   there's no need to rewrite the entire index; just leave the new files
+   in a second "extra index" file or something.
     
   - bup could use inotify for *really* efficient incremental backups.
  
@@ -235,21 +266,35 @@ know if you'd like to help.  Maybe we can start a mailing list.
     This would be easy to implement (given that git uses hashes and CRCs all
     over the place), but nobody has implemented it.  For now, you could try
     doing a test restore of your tarball; doing so should trigger git's error
-   handling if any of the objects are corrupted.
+   handling if any of the objects are corrupted.  'git fsck' would
+   theoretically work too, but it's too slow for huge backups.
  
- - bup has never been tested on anything but Linux.
+ - bup has never been tested on anything but Linux, MacOS, and Windows+Cygwin.
   
     There's nothing that makes it *inherently* non-portable, though, so
-   that's mostly a matter of someone putting in some effort.
+   that's mostly a matter of someone putting in some effort.  (For a
+   "native" Windows port, the most annoying thing is the absence of ssh in
+   a default Windows installation.)
+   
+ - bup has no GUI.  Actually, that's not stupid, but you might consider it
+   a limitation.  There are a bunch of Linux GUI backup programs; someday
+   I expect someone will adapt one of them to use bup.
  
  
  How you can help
  ----------------
  
-bup is a work in progress and there are many ways it can still be improved. 
-If you'd like to contribute, please email me at <apenwarr@gmail.com>.  If
-enough people are interested, perhaps we should start a mailing list for
-it!
+bup is a work in progress and there are many ways it can still be improved.
+If you'd like to contribute patches, ideas, or bug reports, please join the
+bup mailing list.
+
+You can find the mailing list archives here:
+
+       http://groups.google.com/group/bup-list
+       
+and you can subscribe by sending a message to:
+
+       bup-list+subscribe@googlegroups.com
  
  Have fun,
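
The incremental-backup idea described under "The bup index" above can be sketched in a few lines of Python.  This is only an illustration, not bup's actual code or index format: the `save` function and the in-memory `index` dict are hypothetical, and it is written for Python 3 rather than the python 2.5 the README asks for.  The one part that is real git behaviour is the blob id, which is the SHA1 of a "blob <size>\0" header followed by the file contents.

```python
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Return the id git assigns to a blob holding `data`."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def save(paths, index):
    """Hash each file unless its size and mtime match its index entry.

    `index` is a hypothetical stand-in for the bup index: a dict mapping
    path -> (size, mtime_ns, blob_id).  Returns the list of paths that
    actually had to be read.
    """
    reread = []
    for path in paths:
        st = os.stat(path)
        entry = index.get(path)
        if entry is not None and entry[:2] == (st.st_size, st.st_mtime_ns):
            continue  # looks unchanged since the last save: reuse stored id
        with open(path, "rb") as f:
            blob_id = git_blob_id(f.read())
        index[path] = (st.st_size, st.st_mtime_ns, blob_id)
        reread.append(path)
    return reread
```

Calling `save` twice on the same unchanged files with the same `index` should read every file the first time and none the second time, which is roughly why a second 'bup save' can be so much faster than the first.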