1 % bup-split(1) Bup %BUP_VERSION%
2 % Avery Pennarun <apenwarr@gmail.com>
# NAME

bup-split - save individual files to bup backup sets

# SYNOPSIS
11 bup split \[-t\] \[-c\] \[-n *name*\] COMMON\_OPTIONS
13 bup split -b COMMON\_OPTIONS
15 bup split --copy COMMON\_OPTIONS
17 bup split --noop \[-t|-b\] COMMON\_OPTIONS
COMMON\_OPTIONS
  ~ \[-r *host*:*path*\] \[-v\] \[-q\] \[-d *seconds-since-epoch*\] \[\--bench\]
    \[\--max-pack-size=*bytes*\] \[-#\] \[\--bwlimit=*bytes*\]
    \[\--max-pack-objects=*n*\] \[\--fanout=*count*\]
    \[\--keep-boundaries\] \[\--git-ids | filenames...\]
# DESCRIPTION

`bup split` concatenates the contents of the given files
28 (or if no filenames are given, reads from stdin), splits
29 the content into chunks of around 8k using a rolling
30 checksum algorithm, and saves the chunks into a bup
31 repository. Chunks which have previously been stored are
not stored again (i.e., they are 'deduplicated').
34 Because of the way the rolling checksum works, chunks
35 tend to be very stable across changes to a given file,
36 including adding, deleting, and changing bytes.
38 For example, if you use `bup split` to back up an XML dump
39 of a database, and the XML file changes slightly from one
40 run to the next, nearly all the data will still be
41 deduplicated and the size of each backup after the first
42 will typically be quite small.
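
A sketch of such a run (hypothetical file and branch names; output
omitted):

    $ bup split -n mydb dump.xml    # first run: stores all the chunks
    $ bup split -n mydb dump.xml    # after small edits: stores only new chunks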
44 Another technique is to pipe the output of the `tar`(1) or
45 `cpio`(1) programs to `bup split`. When individual files
46 in the tarball change slightly or are added or removed, bup
47 still processes the remainder of the tarball efficiently.
(Note that `bup save` is usually a more efficient way to
accomplish this.)
51 To get the data back, use `bup-join`(1).
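
Continuing the hypothetical example above, the saved stream can be
reassembled to stdout:

    $ bup join mydb > restored.xml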
# MODES

These options select the primary behavior of the command, with -n
being the most likely choice. (A brief sketch of how the modes
differ follows the list.)
-n, \--name=*name*
: after creating the dataset, create a git branch
60 named *name* so that it can be accessed using
61 that name. If *name* already exists, the new dataset
62 will be considered a descendant of the old *name*.
63 (Thus, you can continually create new datasets with
64 the same name, and later view the history of that
65 dataset to see how it has changed over time.) The original data
66 will also be available as a top-level file named "data" in the VFS,
67 accessible via `bup fuse`, `bup ftp`, etc.
-t, \--tree
: output the git tree id of the resulting dataset.
-c, \--commit
: output the git commit id of the resulting dataset.
-b, \--blobs
: output a series of git blob ids that correspond to the chunks in
77 the dataset. Incompatible with -n, -t, and -c.
\--noop
: read the data and split it into blocks based on the "bupsplit"
81 rolling checksum algorithm, but don't store anything in the repo.
82 Can be combined with -b or -t to compute (but not store) the git
83 blobs or tree ids for the dataset. This is mostly useful for
benchmarking and validating the bupsplit algorithm. Incompatible
with -n and -c.

\--copy
: like `--noop`, but also write the data to stdout. This can be
89 useful for benchmarking the speed of read+bupsplit+write for large
90 amounts of data. Incompatible with -n, -t, -c, and -b.
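
As a rough sketch of how the modes differ (hypothetical invocations;
`somefile` is a placeholder, and the ids go to stdout):

    $ bup split -n mydata < somefile   # store chunks, update branch "mydata"
    $ bup split -t < somefile          # store chunks, print the tree id
    $ bup split -b < somefile          # store chunks, print one blob id per chunk
    $ bup split --noop -t < somefile   # compute the tree id, store nothing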
# OPTIONS

-r, \--remote=*host*:*path*
95 : save the backup set to the given remote server. If *path* is
96 omitted, uses the default path on the remote server (you still
97 need to include the ':'). The connection to the remote server is
98 made with SSH. If you'd like to specify which port, user or
99 private key to use for the SSH connection, we recommend you use
100 the `~/.ssh/config` file. Even though the destination is remote,
101 a local bup repository is still required.
103 -d, \--date=*seconds-since-epoch*
104 : specify the date inscribed in the commit (seconds since 1970-01-01).
-q, \--quiet
: disable progress messages.
-v, \--verbose
: increase verbosity (can be used more than once).
\--git-ids
: stdin is a list of git object ids instead of raw data.
114 `bup split` will read the contents of each named git
115 object (if it exists in the bup repository) and split
116 it. This might be useful for converting a git
117 repository with large binary files to use bup-style
118 hashsplitting instead. This option is probably most
119 useful when combined with `--keep-boundaries`.
\--keep-boundaries
: if multiple filenames are given on the command line,
123 they are normally concatenated together as if the
124 content all came from a single file. That is, the
125 set of blobs/trees produced is identical to what it
126 would have been if there had been a single input file.
127 However, if you use `--keep-boundaries`, each file is
128 split separately. You still only get a single tree or
129 commit or series of blobs, but each blob comes from
130 only one of the files; the end of one of the input
131 files always ends a blob.
\--bench
: print benchmark timings to stderr.
136 \--max-pack-size=*bytes*
137 : never create git packfiles larger than the given number
138 of bytes. Default is 1 billion bytes. Usually there
139 is no reason to change this.
141 \--max-pack-objects=*numobjs*
142 : never create git packfiles with more than the given
143 number of objects. Default is 200 thousand objects.
144 Usually there is no reason to change this.
\--fanout=*count*
: when splitting very large files, try to keep the number
of elements in trees to an average of *count*.
150 \--bwlimit=*bytes/sec*
151 : don't transmit more than *bytes/sec* bytes per second
152 to the server. This is good for making your backups
153 not suck up all your network bandwidth. Use a suffix
154 like k, M, or G to specify multiples of 1024,
155 1024*1024, 1024*1024*1024 respectively.
157 -*#*, \--compress=*#*
158 : set the compression level to # (a value from 0-9, where
159 9 is the highest and 0 is no compression). The default
is 1 (fast, loose compression).
# EXAMPLES

    $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
    tar: Removing leading `/' from member names
    Indexing objects: 100% (196/196), done.
    $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l
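
A round trip with some of the common options (hypothetical names;
`mydata.bin` is a placeholder):

    $ bup split -r myserver: -n mydata --compress=3 --bwlimit=1M mydata.bin
    $ bup join -r myserver: mydata | cmp - mydata.bin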
# SEE ALSO

`bup-join`(1), `bup-index`(1), `bup-save`(1), `bup-on`(1), `ssh_config`(5)
# BUP

Part of the `bup`(1) suite.