% bup-split(1) Bup %BUP_VERSION%
% Avery Pennarun <apenwarr@gmail.com>

# NAME

bup-split - save individual files to bup backup sets

# SYNOPSIS

bup split \[-t\] \[-c\] \[-n *name*\] COMMON\_OPTIONS

bup split -b COMMON\_OPTIONS

bup split \<--noop \[--copy\]|--copy\> COMMON\_OPTIONS

COMMON\_OPTIONS
  ~ \[-r *host*:*path*\] \[-v\] \[-q\] \[-d *seconds-since-epoch*\] \[\--bench\]
    \[\--max-pack-size=*bytes*\] \[-#\] \[\--bwlimit=*bytes/sec*\]
    \[\--max-pack-objects=*numobjs*\] \[\--fanout=*numobjs*\]
    \[\--keep-boundaries\] \[--git-ids | filenames...\]

# DESCRIPTION

`bup split` concatenates the contents of the given files
(or, if no filenames are given, reads from stdin), splits
the content into chunks of around 8k using a rolling
checksum algorithm, and saves the chunks into a bup
repository. Chunks which have previously been stored are
not stored again (i.e., they are 'deduplicated').

Because of the way the rolling checksum works, chunks
tend to be very stable across changes to a given file,
including adding, deleting, and changing bytes.

For example, if you use `bup split` to back up an XML dump
of a database, and the XML file changes slightly from one
run to the next, nearly all the data will still be
deduplicated and the size of each backup after the first
will typically be quite small.
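
The stability described above can be illustrated with a small Python sketch. It is not bup's actual "bupsplit" code: the window size, the rsync-style rolling sum, the 13-bit boundary test, and the names `split` and `new_chunks` are all assumptions chosen for illustration. What it shows is that a chunk boundary depends only on the last few bytes read, so an insertion in the middle of the input disturbs only the chunks around the edit.

```python
import hashlib
import random

WINDOW = 64            # bytes covered by the rolling sum (assumed value)
MASK = (1 << 13) - 1   # cut when the low 13 bits are all ones: ~8 KiB average


def split(data: bytes) -> list:
    """Cut data into chunks at content-defined boundaries."""
    window = bytearray(WINDOW)   # last WINDOW bytes seen, zero-padded at start
    pos = 0                      # index of the oldest byte in the ring buffer
    s1 = s2 = 0                  # plain sum and position-weighted sum
    chunks, start = [], 0
    for i, add in enumerate(data):
        drop = window[pos]
        window[pos] = add
        pos = (pos + 1) % WINDOW
        s1 = (s1 + add - drop) & 0xFFFF
        s2 = (s2 + s1 - WINDOW * drop) & 0xFFFF
        if (s2 & MASK) == MASK:  # depends only on the last WINDOW bytes
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks


def new_chunks(old: bytes, new: bytes) -> int:
    """Count chunks of `new` whose SHA-1 was not already stored for `old`."""
    stored = {hashlib.sha1(c).digest() for c in split(old)}
    return sum(hashlib.sha1(c).digest() not in stored for c in split(new))


if __name__ == "__main__":
    rng = random.Random(0)
    first = bytes(rng.randrange(256) for _ in range(200_000))
    # Simulate a small change in the middle of a large dump.
    second = first[:100_000] + b"<row>changed</row>" + first[100_000:]
    print(len(split(second)), "chunks,", new_chunks(first, second), "new")
```

Running the sketch prints the number of chunks in the edited input and how many of them were not already stored. The second number stays small because the boundaries resynchronize shortly after the inserted bytes.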

Another technique is to pipe the output of the `tar`(1) or
`cpio`(1) programs to `bup split`. When individual files
in the tarball change slightly or are added or removed, bup
still processes the remainder of the tarball efficiently.
(Note that `bup save` is usually a more efficient way to
accomplish this, however.)

To get the data back, use `bup-join`(1).

# MODES

These options select the primary behavior of the command, with -n
being the most likely choice.

-n, \--name=*name*
:   after creating the dataset, create a git branch
    named *name* so that it can be accessed using
    that name. If *name* already exists, the new dataset
    will be considered a descendant of the old *name*.
    (Thus, you can continually create new datasets with
    the same name, and later view the history of that
    dataset to see how it has changed over time.)

-t, \--tree
:   output the git tree id of the resulting dataset.

-c, \--commit
:   output the git commit id of the resulting dataset.

-b, \--blobs
:   output a series of git blob ids that correspond to the chunks in
    the dataset. Incompatible with -n, -t, and -c.

\--noop
:   read the data and split it into blocks based on the "bupsplit"
    rolling checksum algorithm, but don't do anything with the blocks.
    This is mostly useful for benchmarking. Incompatible with -n, -t,
    -c, and -b.

\--copy
:   like `--noop`, but also write the data to stdout. This can be
    useful for benchmarking the speed of read+bupsplit+write for large
    amounts of data. Incompatible with -n, -t, -c, and -b.
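
The compatibility rules for these modes can be summarized in a short Python sketch. The function name `check_modes` is hypothetical and this is not how bup parses its command line. It only encodes the rules stated above and in the synopsis, using the short spellings of the flags, and it does not check whether at least one mode has to be given.

```python
SAVE_MODES = {"-n", "-t", "-c"}      # may be combined with each other
BENCH_MODES = {"--noop", "--copy"}   # may be combined with each other


def check_modes(flags):
    """Return an error message for conflicting mode flags, or None."""
    flags = set(flags)
    unknown = flags - SAVE_MODES - BENCH_MODES - {"-b"}
    if unknown:
        return "not a mode option: " + ", ".join(sorted(unknown))
    # --noop and --copy exclude every other mode.
    if flags & BENCH_MODES and flags - BENCH_MODES:
        return "--noop and --copy are incompatible with -n, -t, -c, and -b"
    # -b excludes the tree, commit, and branch outputs.
    if "-b" in flags and flags & SAVE_MODES:
        return "-b is incompatible with -n, -t, and -c"
    return None
```

For example, `check_modes(["-t", "-c"])` returns `None`, while `check_modes(["-b", "-t"])` returns an error string.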

# OPTIONS

-r, \--remote=*host*:*path*
:   save the backup set to the given remote server. If *path* is
    omitted, uses the default path on the remote server (you still
    need to include the ':'). The connection to the remote server is
    made with SSH. If you'd like to specify which port, user or
    private key to use for the SSH connection, we recommend you use
    the `~/.ssh/config` file. Even though the destination is remote,
    a local bup repository is still required.

-d, \--date=*seconds-since-epoch*
:   specify the date inscribed in the commit (seconds since 1970-01-01).

-q, \--quiet
:   disable progress messages.

-v, \--verbose
:   increase verbosity (can be used more than once).

\--git-ids
:   stdin is a list of git object ids instead of raw data.
    `bup split` will read the contents of each named git
    object (if it exists in the bup repository) and split
    it. This might be useful for converting a git
    repository with large binary files to use bup-style
    hashsplitting instead. This option is probably most
    useful when combined with `--keep-boundaries`.

\--keep-boundaries
:   if multiple filenames are given on the command line,
    they are normally concatenated together as if the
    content all came from a single file. That is, the
    set of blobs/trees produced is identical to what it
    would have been if there had been a single input file.
    However, if you use `--keep-boundaries`, each file is
    split separately. You still only get a single tree or
    commit or series of blobs, but each blob comes from
    only one of the files; the end of one of the input
    files always ends a blob.

\--bench
:   print benchmark timings to stderr.

\--max-pack-size=*bytes*
:   never create git packfiles larger than the given number
    of bytes. Default is 1 billion bytes. Usually there
    is no reason to change this.

\--max-pack-objects=*numobjs*
:   never create git packfiles with more than the given
    number of objects. Default is 200 thousand objects.
    Usually there is no reason to change this.

\--fanout=*numobjs*
:   when splitting very large files, try to keep the number
    of elements in trees to an average of *numobjs*.

\--bwlimit=*bytes/sec*
:   don't transmit more than *bytes/sec* bytes per second
    to the server. This is good for making your backups
    not suck up all your network bandwidth. Use a suffix
    like k, M, or G to specify multiples of 1024,
    1024\*1024, 1024\*1024\*1024 respectively.

-*#*, \--compress=*#*
:   set the compression level to # (a value from 0-9, where
    9 is the highest and 0 is no compression). The default
    is 1 (fast, loose compression).
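
The suffix arithmetic described for `--bwlimit` can be sketched in Python as follows. The function name `parse_rate` is hypothetical and this is not bup's own parser, which may accept forms this sketch rejects. It only applies the k, M, and G multipliers exactly as stated above.

```python
def parse_rate(text: str) -> int:
    """Convert a value such as '512k' or '2M' to bytes per second."""
    multipliers = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip()
    # A trailing k, M, or G scales the number; otherwise it is plain bytes.
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)
```

Under these assumptions, a limit written as `512k` corresponds to 524288 bytes per second, and `2M` to 2097152.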

# EXAMPLES

    $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
    tar: Removing leading `/' from member names
    Indexing objects: 100% (196/196), done.

    $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l

# SEE ALSO

`bup-join`(1), `bup-index`(1), `bup-save`(1), `bup-on`(1), `ssh_config`(5)

# BUP

Part of the `bup`(1) suite.