% bup-split(1) Bup %BUP_VERSION%
% Avery Pennarun <apenwarr@gmail.com>

# NAME

bup-split - save individual files to bup backup sets

# SYNOPSIS

bup split [-r *host*:*path*] <-b|-t|-c|-n *name*> [-v] [-q]
  [--bench] [--max-pack-size=*bytes*]
  [--max-pack-objects=*n*] [--fanout=*count*] [filenames...]

# DESCRIPTION

`bup split` concatenates the contents of the given files
(or if no filenames are given, reads from stdin), splits
the content into chunks of around 8k using a rolling
checksum algorithm, and saves the chunks into a bup
repository. Chunks which have previously been stored are
not stored again (i.e. they are "deduplicated").

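For example, a minimal local session might look like the
following sketch (the branch name `mydata` and the file name
are illustrative):

    $ bup init
    $ bup split -n mydata /path/to/large-file
    $ bup join mydata > restored-copy   # recover the stream later
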
Because of the way the rolling checksum works, chunks
tend to be very stable across changes to a given file,
including adding, deleting, and changing bytes.

For example, if you use `bup split` to back up an XML dump
of a database, and the XML file changes slightly from one
run to the next, nearly all the data will still be
deduplicated and the size of each backup after the first
will typically be quite small.

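As an illustrative sketch (the dump command and names here are
hypothetical), each run stores only the chunks that changed since
the previous one:

    $ dump-database --xml mydb >dump.xml   # hypothetical dump command
    $ bup split -n mydb-backup dump.xml
    $ dump-database --xml mydb >dump.xml   # next run, slightly changed
    $ bup split -n mydb-backup dump.xml    # only new chunks are stored
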
Another technique is to pipe the output of the `tar`(1) or
`cpio`(1) programs to `bup split`. When individual files
in the tarball change slightly or are added or removed, bup
still processes the remainder of the tarball efficiently.
(Note that `bup save` is usually a more efficient way to
accomplish this, however.)

To get the data back, use `bup-join`(1).

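For example, assuming a tarball was saved with `-n mytar` (an
illustrative name), the stream can be reassembled and inspected:

    $ tar -cf - /etc | bup split -n mytar
    $ bup join mytar | tar -tf -   # list the saved tarball's contents
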
# OPTIONS

-r, --remote=*host*:*path*
:   save the backup set to the given remote server. If
    *path* is omitted, uses the default path on the remote
    server (you still need to include the ':').

-b, --blobs
:   output a series of git blob ids that correspond to the
    chunks in the dataset.

-t, --tree
:   output the git tree id of the resulting dataset.

-c, --commit
:   output the git commit id of the resulting dataset.

-n, --name=*name*
:   after creating the dataset, create a git branch
    named *name* so that it can be accessed using
    that name. If *name* already exists, the new dataset
    will be considered a descendant of the old *name*.
    (Thus, you can continually create new datasets with
    the same name, and later view the history of that
    dataset to see how it has changed over time.)

-q, --quiet
:   disable progress messages.

-v, --verbose
:   increase verbosity (can be used more than once).

--noop
:   read the data and split it into blocks based on the "bupsplit"
    rolling checksum algorithm, but don't do anything with
    the blocks. This is mostly useful for benchmarking.

--copy
:   like --noop, but also write the data to stdout. This
    can be useful for benchmarking the speed of read+bupsplit+write
    for large amounts of data.

--bench
:   print benchmark timings to stderr.

--max-pack-size=*bytes*
:   never create git packfiles larger than the given number
    of bytes. Default is 1 billion bytes. Usually there
    is no reason to change this.

--max-pack-objects=*numobjs*
:   never create git packfiles with more than the given
    number of objects. Default is 200 thousand objects.
    Usually there is no reason to change this.

--fanout=*count*
:   when splitting very large files, never put more than
    this number of git blobs in a single git tree. Instead,
    generate a new tree and link to that. Default is
    4096 objects per tree.

--bwlimit=*bytes/sec*
:   don't transmit more than *bytes/sec* bytes per second
    to the server. This is good for making your backups
    not suck up all your network bandwidth. Use a suffix
    like k, M, or G to specify multiples of 1024,
    1024*1024, and 1024*1024*1024, respectively.

# EXAMPLES

    $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
    tar: Removing leading `/' from member names
    Indexing objects: 100% (196/196), done.

    $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l

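A few more illustrative invocations (the names are examples, and
output is omitted):

    # Print the ids of the resulting chunks instead of naming a branch:
    $ bup split -b </etc/motd

    # Limit upload bandwidth to 1 MiB/sec while saving to a remote:
    $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar --bwlimit=1M
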
# SEE ALSO

`bup-join`(1), `bup-index`(1), `bup-save`(1), `bup-on`(1)

# BUP

Part of the `bup`(1) suite.