% bup-split(1) Bup %BUP_VERSION%
% Avery Pennarun <apenwarr@gmail.com>

# NAME

bup-split - save individual files to bup backup sets

# SYNOPSIS

bup split [-r *host*:*path*] <-b|-t|-c|-n *name*> [-v] [-q]
  [--bench] [--max-pack-size=*bytes*]
  [--max-pack-objects=*n*] [--fanout=*count*]
  [--keep-boundaries] [filenames...]

# DESCRIPTION

`bup split` concatenates the contents of the given files
(or, if no filenames are given, reads from stdin), splits
the content into chunks of around 8 KB using a rolling
checksum algorithm, and saves the chunks into a bup
repository. Chunks that have already been stored are
not stored again (i.e. they are "deduplicated").
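
The splitting and deduplication described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not bup's implementation: the checksum is a simple rsync-style rolling sum, the 64-byte window and 13-bit mask are illustrative values chosen to give chunks of roughly 8 KB, and `split` and `save` are hypothetical helper names. A plain dictionary keyed by SHA-1 stands in for the repository.

```python
import hashlib
import random

WINDOW = 64            # bytes of history the checksum depends on (assumed value)
MASK = (1 << 13) - 1   # cut when the low 13 bits are all ones: ~8 KiB on average


def split(data):
    """Return a list of chunks, cut wherever the toy rolling checksum hits MASK."""
    s1 = s2 = 0
    start = 0
    chunks = []
    for i, byte in enumerate(data):
        # Byte that slides out of the window (zero until the window is full).
        old = data[i - WINDOW] if i >= WINDOW else 0
        s1 = (s1 + byte - old) & 0xFFFF
        s2 = (s2 + s1 - WINDOW * old) & 0xFFFF
        if (s2 & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing data forms the last chunk
    return chunks


def save(store, data):
    """Add the chunks of data to store; return how many were not there yet."""
    new = 0
    for chunk in split(data):
        key = hashlib.sha1(chunk).hexdigest()   # stand-in for an object id
        if key not in store:
            store[key] = chunk
            new += 1
    return new


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
edited = original[:100_000] + b"a few new bytes" + original[100_000:]

store = {}
first = save(store, original)      # every chunk is new
again = save(store, original)      # identical data: nothing new to store
after_edit = save(store, edited)   # only chunks near the edit are new
print(first, again, after_edit)
```

Because the checksum depends only on the last `WINDOW` bytes, the second save should add nothing and the edited copy should add only the chunks that overlap the inserted bytes.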

Because chunk boundaries are chosen by the rolling checksum
rather than by fixed offsets, chunks tend to be very stable
across changes to a given file, including added, deleted,
and changed bytes.

For example, if you use `bup split` to back up an XML dump
of a database, and the XML file changes slightly from one
run to the next, nearly all the data will still be
deduplicated, and the size of each backup after the first
will typically be quite small.
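
To see why a small edit leaves most chunks untouched, it helps to compare fixed-size blocks with boundaries derived from the content itself. The sketch below is only an illustration: cutting after every newline is a deliberately simple stand-in for a rolling checksum, and `fixed_chunks`, `content_chunks` and `reused` are hypothetical helpers, not part of bup.

```python
def fixed_chunks(data, size=8):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data):
    """Cut after every newline: a stand-in for a content-defined boundary."""
    return data.splitlines(keepends=True)


def reused(old, new):
    """Count the chunks of `new` that already appear in `old`."""
    seen = set(old)
    return sum(1 for chunk in new if chunk in seen)


before = b"<a>1</a>\n<b>2</b>\n<c>3</c>\n<d>4</d>\n"
after = b"<a>10</a>\n<b>2</b>\n<c>3</c>\n<d>4</d>\n"   # one byte inserted

print(reused(fixed_chunks(before), fixed_chunks(after)))      # 0
print(reused(content_chunks(before), content_chunks(after)))  # 3
```

With fixed offsets, the inserted byte shifts every later block, so nothing matches. With content-defined boundaries, only the chunk containing the edit changes.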

Another technique is to pipe the output of the `tar`(1) or
`cpio`(1) programs to `bup split`. When individual files
in the tarball change slightly or are added or removed, bup
still processes the remainder of the tarball efficiently.
(Note, however, that `bup save` is usually a more efficient
way to accomplish this.)

To get the data back, use `bup-join`(1).

# OPTIONS

-r, --remote=*host*:*path*
:   save the backup set to the given remote server. If
    *path* is omitted, uses the default path on the remote
    server (you still need to include the ':').

-b, --blobs
:   output a series of git blob ids that correspond to the
    chunks in the dataset.

-t, --tree
:   output the git tree id of the resulting dataset.

-c, --commit
:   output the git commit id of the resulting dataset.

-n, --name=*name*
:   after creating the dataset, create a git branch
    named *name* so that it can be accessed using
    that name. If *name* already exists, the new dataset
    will be considered a descendant of the old *name*.
    (Thus, you can continually create new datasets with
    the same name, and later view the history of that
    dataset to see how it has changed over time.)

-q, --quiet
:   disable progress messages.

-v, --verbose
:   increase verbosity (can be used more than once).

--keep-boundaries
:   if multiple filenames are given on the command line,
    they are normally concatenated together as if the
    content all came from a single file. That is, the
    set of blobs/trees produced is identical to what it
    would have been if there had been a single input file.
    However, if you use `--keep-boundaries`, each file is
    split separately. You still only get a single tree or
    commit or series of blobs, but each blob comes from
    only one of the files; the end of one of the input
    files always ends a blob.

--noop
:   read the data and split it into blocks based on the "bupsplit"
    rolling checksum algorithm, but don't do anything with
    the blocks. This is mostly useful for benchmarking.

--copy
:   like `--noop`, but also write the data to stdout. This
    can be useful for benchmarking the speed of read+bupsplit+write
    for large amounts of data.

--bench
:   print benchmark timings to stderr.

--max-pack-size=*bytes*
:   never create git packfiles larger than the given number
    of bytes. Default is 1 billion bytes. Usually there
    is no reason to change this.

--max-pack-objects=*numobjs*
:   never create git packfiles with more than the given
    number of objects. Default is 200 thousand objects.
    Usually there is no reason to change this.

--fanout=*count*
:   when splitting very large files, never put more than
    this number of git blobs in a single git tree. Instead,
    generate a new tree and link to that. Default is
    4096 objects per tree.

--bwlimit=*bytes/sec*
:   don't transmit more than *bytes/sec* bytes per second
    to the server. This keeps your backups from using up
    all your network bandwidth. Use a suffix like k, M,
    or G to specify multiples of 1024, 1024\*1024, or
    1024\*1024\*1024 respectively.
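
The suffix arithmetic for `--bwlimit` can be checked with a short Python sketch. `parse_rate` is a hypothetical helper written for this illustration; it accepts exactly the three suffixes listed above, and bup's own parser may accept other spellings.

```python
MULTIPLIERS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_rate(text):
    """Convert a string such as '500k' into a number of bytes per second."""
    if text and text[-1] in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[text[-1]]
    return int(text)   # no suffix: the value is already in bytes


for value in ("1000", "500k", "2M", "1G"):
    print(value, parse_rate(value))
# 1000 1000
# 500k 512000
# 2M 2097152
# 1G 1073741824
```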

# EXAMPLE

    $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
    tar: Removing leading `/' from member names
    Indexing objects: 100% (196/196), done.

    $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l

# SEE ALSO

`bup-join`(1), `bup-index`(1), `bup-save`(1), `bup-on`(1)

# BUP

Part of the `bup`(1) suite.