On some platforms -Wstrict-prototypes is now the default, and readline
includes prototypes like "int foo()" rather than "int foo(void)", so
for now, just suppress those warnings for the readline includes.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 21 Jun 2020 16:29:23 +0000 (11:29 -0500)]
configure-sampledata: only create random paths if asked
Stop creating randomized paths in t/sampledata/ by default. I'd
originally just added this to allow some quick testing, and while it
now appears to be fine on Linux/ext4, it's too aggressive to be the
default, so hide it behind a BUP_TEST_RANDOMIZED_SAMPLEDATA_PATHS
environment variable.
Among other things, make-random-paths just crashes on cirrus macos,
and cirrus freebsd was having (different) trouble. It might also have
been macos where test-import-duplicity.sh failed on compare-trees
mismatches. Not sure whether that was an issue with bup, rsync, or
duplicity.
We'll want to restore broader randomized path testing, but likely via
a less blunt instrument, since placing the paths in t/sampledata
affects any test that relies on it, and existing testing constructs
like
WVPASSEQ ... $(... | wc -l)
are completely incompatible.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 20 Jun 2020 16:55:53 +0000 (11:55 -0500)]
Use pkg-config opportunistically
Use pkg-config's --cflags and --libs when they're available for
libreadline or libacl, but don't require pkg-config. When it's not
found, just check for the libraries with a test compile.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 19 Jun 2020 05:19:07 +0000 (07:19 +0200)]
grp_struct_to_py(): fix error handling
Both getgrgid_r() and getgrnam_r() *return* an error number
on failures, and don't store it to errno. Thus, rc will not
be less than zero, and we need to set errno before we can
create a python error from it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 1 May 2020 22:52:53 +0000 (00:52 +0200)]
DESIGN: document the actual hashsplit algorithm
The hashsplit algorithm, when used for the fanout, has a quirk
that appears to be due to an implementation bug.
In order for splitting to occur, the lowest 13 (BUP_BLOBBITS)
bits of the csum need to be 1. Then, per DESIGN, the next bits
that are 1 are used for the fanout. However, the implementation
doesn't actually work this way. What actually happens is that
the lower 13 bits need to be ones:
........1'1111'1111'1111
Then, the DESIGN document states that the next bits that are 1
should be used for the fanout:
....'111_'____'____'____
However, the implementation actually ignores the next bit ('x')
....'11x_'____'____'____
and it doesn't matter whether that's set to 0 or 1, the fanout
will be based on the next higher bits (marked '.' and '1').
Fix this in the DESIGN documentation rather than changing the
algorithm as the latter would cause a save of an identical file
to completely rewrite the tree objects that make up the file,
due to different fanout behaviour.
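The behaviour described above can be sketched in Python. This is only an illustration built from the description in this message, not the actual C implementation: the function name fanout_bits is hypothetical, and only BUP_BLOBBITS (13) comes from the text.

```python
BUP_BLOBBITS = 13  # from the message: the lowest 13 bits must all be 1


def fanout_bits(csum):
    """Return None if csum is not a split point, otherwise the number of
    consecutive 1 bits that feed the fanout (hypothetical helper)."""
    mask = (1 << BUP_BLOBBITS) - 1
    if csum & mask != mask:
        return None  # lower 13 bits are not all ones: no split
    # Skip the 13 split bits *and* the one ignored bit (the 'x' above).
    rest = csum >> (BUP_BLOBBITS + 1)
    count = 0
    while rest & 1:
        count += 1
        rest >>= 1
    return count
```

With this sketch, 0x1FFF and 0x3FFF give the same result, since the only bit in which they differ is the ignored one.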
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Johannes Berg [Tue, 26 May 2020 22:29:23 +0000 (00:29 +0200)]
metadata: fix test failure with xattrs
The test_apply_to_path_restricted_access() is broken because
the expected string doesn't take into account that it's now
a u'' string (on python2) due to path_msg(), but this doesn't
appear on any non-selinux systems because the original file
never has any xattr, so the test passes, just not on my Fedora
system.
Change the expected message and remove the quote entirely as
it's different between python 2 and 3, and also try to set an
xattr in the test - if that fails, just continue as it used
to be without reading/setting it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 30 May 2020 21:55:46 +0000 (23:55 +0200)]
metadata: port ACL support to C
Use our own C code instead of the posix1e python bindings, as those
don't have correct 'bytes' support (at least right now), which means
that we cannot use them with arbitrary file, user and group names.
Our own wrappers just use 'bytes' throughout.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
[rlb@defaultvalue.org: adjust to rely on pkg-config] Reviewed-by: Rob Browning <rlb@defaultvalue.org> Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 30 May 2020 02:50:25 +0000 (21:50 -0500)]
Split src tree python use to config/bin/python and dev/bup-python
Replace cmd/bup-python with config/bin/python, which is just a symlink
to the configured python, and dev/bup-python, which is what "bup
python" used to be. Adjust all the code to use config/bin/python when
we can (i.e. when we don't need bup modules), and dev/bup-python
otherwise. Drop "bup python", since we don't need it anymore.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Now that the last subcommand (web) has been ported to Python 3, we
have at least some randomized binary test coverage, and we think we've
addressed all the Python 3 issues we know of, remove the
BUP_ALLOW_UNEXPECTED_PYTHON_VERSION guardrail.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Mon, 25 May 2020 19:46:52 +0000 (14:46 -0500)]
Stop forcing LC_CTYPE=ISO-8859-1
Now that we've made adjustments to work around all the Python 3
problems with non-Unicode data (argv, env vars, readline, acls, users,
groups, hostname, etc.), and added randomized binary path and argv
testing, stop overriding the LC_CTYPE since that should no longer be
necessary.
Thanks to Johannes Berg for nudging me to consider whether we might
now be in a position to do this (with a bit more work), and for quite
a bit of help getting all the precursors in place once we thought it
was feasible.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 7 Jun 2020 14:05:26 +0000 (09:05 -0500)]
Bypass Python 3 glibc argv problems by routing args through env
Until/unless https://sourceware.org/bugzilla/show_bug.cgi?id=26034 is
resolved by Python or GNU libc, sidestep the problem, which can crash
Python 3 during initialization, with a trivial sh wrapper that diverts
the command line arguments into BUP_ARGV_{0,1,2,...} environment
variables, since those can be safely retrieved.
Add compat.argvb and compat.argv and populate them at startup with the
BUP_ARGV_* values. Adjust all the relevant commands to rely on those
vars instead of sys.argv.
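A rough sketch of the retrieval side, assuming Python 3 on a POSIX system where os.environb is available. The helper name argv_from_env is hypothetical, and how compat.argvb and compat.argv are really populated is not shown in this message.

```python
import os


def argv_from_env(environ):
    """Collect BUP_ARGV_0, BUP_ARGV_1, ... from a bytes-keyed mapping
    (hypothetical helper; stops at the first missing index)."""
    args = []
    i = 0
    while True:
        value = environ.get(b'BUP_ARGV_%d' % i)
        if value is None:
            break
        args.append(value)
        i += 1
    return args


# os.environb exposes the environment as raw bytes, so no decoding
# happens before we get to look at the values.
argvb = argv_from_env(getattr(os, 'environb', {}))
# Text form for code that needs str (an assumption of this sketch).
argv = [os.fsdecode(x) for x in argvb]
```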
Although the preamble says "rewritten during install", that's not in
place yet, but will be soon (when we drop LC_CTYPE and rework
bup-python).
Thanks to Johannes Berg for suggesting this, and help figuring out the
details.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Apr 2020 21:35:33 +0000 (23:35 +0200)]
hashsplit: avoid cat_bytes() if possible
If our current buffer is empty, there's no need to cat_bytes()
it with the new buffer, we can just replace the empty one with
the new one. This saves the memcpy() in many cases. Especially
if the whole file was read in one chunk (by bumping up the read
size) this saves a lot of time.
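The idea can be sketched as below. The helper name append_buffer is hypothetical, and plain bytes concatenation stands in for cat_bytes(), whose signature is not given here.

```python
def append_buffer(current, new):
    """Return the buffer to keep after reading `new` (hypothetical helper)."""
    if not current:
        return new  # nothing buffered: reuse the new object, no copy
    return current + new  # stands in for cat_bytes(), which copies
```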
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sun, 17 May 2020 19:43:45 +0000 (21:43 +0200)]
tests: web: also add some invalid UTF-8
The '¡excitement!' really tests only valid UTF-8 since it
comes from the original bash file, add another test that
explicitly creates a byte sequence that is invalid utf-8
and ensures that this is preserved properly as well.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 7 Jun 2020 20:32:10 +0000 (15:32 -0500)]
Wrap readline ourselves to avoid py3's interference
We don't want Python to "help" us by guessing and insisting on the
encoding before we can even look at the incoming data, so wrap
readline ourselves, with a bytes-oriented (and more direct) API. This
will allow us to preserve the status quo for now (and maintain parity
between Python 2 and 3) when using Python 3 as we remove our LC_CTYPE
override.
At least on Linux, readline --cflags currently defines _DEFAULT_SOURCE
and defines _XOPEN_SOURCE to 600, and the latter conflicts with a
setting of 700 via Python.h in some installations, so for now, just
defer to Python as long as it doesn't choose an older version.
Thanks to Johannes Berg for fixes for allocation issues, etc. in an
earlier version, and help figuring out the #define arrangement.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 31 May 2020 19:10:11 +0000 (14:10 -0500)]
distutils: handle CFLAGS and LDFLAGS directly
Otherwise it places LDFLAGS in the middle of the link arguments,
before lib/bup/*.o which means we can't add lib
dependencies (e.g. -lreadline). Pass the libs directly and specify them
via the appropriate extra_* arguments.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 30 May 2020 19:10:02 +0000 (21:10 +0200)]
bup: add own gethostname() wrapper
This is necessary because python3 insists that hostnames should
be utf-8, which is a rather questionable assumption.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: don't define HOST_NAME_MAX if it's not already] Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Thu, 28 May 2020 05:44:16 +0000 (00:44 -0500)]
pwdgrp: add C helpers to get user/group bytes directly
Thanks to Johannes Berg for fixing some bugs in an earlier revision.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 30 May 2020 19:41:10 +0000 (21:41 +0200)]
metadata: don't modify ACL list when writing
ACLs should be stored as a two-entry list on files, and four-entry
list for directories. Unfortunately, when writing, we expand the
two-entry list for files to four, because the metadata format is
always with four entries.
However, on reading, we trim the last two empty entries, so that
we can end up in an inconsistent situation: On a metadata entry
for a file that has been written already, it will still have four
entries, and that won't compare correctly etc.
This isn't an issue today because we only ever do the compare
in restore, where we didn't load from disk but from the metadata
in the repository, which always starts out with four entries.
Still, fix the inconsistency and don't erroneously extend the
list when writing.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Mon, 25 May 2020 19:55:46 +0000 (14:55 -0500)]
Move cmd to lib/ and reverse symlink
This prepares for removal of the bup-python wrapper. Given this
change we'll be able to have the same sys.path in the source tree and
install tree, and so won't have to go back to mangling that during
installs.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 13 May 2020 20:22:04 +0000 (22:22 +0200)]
index: fix -H option
hexlify(ent) doesn't work, that needs to be ent.sha. Fix it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Luca Carlon [Thu, 21 May 2020 20:28:41 +0000 (22:28 +0200)]
git/midx: provide context managers for idx classes
Opening files and then mmap()ing them keeps the files open at the
filesystem level, and then they cannot be fully removed until the
fd is closed when giving up the mapping.
On most filesystems, the file still exists but is no longer visible
in this case. However, at least on CIFS this results in the file
still being visible in the folder, but it can no longer be opened
again, or such. This leads to a crash in 'bup gc' because it wants
to re-evaluate the idx it just tried to delete.
Teach the PackIdx classes the context manager protocol so we can
easily unmap once they're no longer needed, and use that in bup gc
(for now only there).
For consistency, also add the context manager protocol now to
the midx, even if it's not strictly needed yet since bup gc won't
actually do this to an midx.
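A minimal sketch of the protocol, assuming a reader that mmap()s a non-empty file. The class MappedIdx is a hypothetical stand-in, not one of the actual PackIdx classes.

```python
import mmap


class MappedIdx:
    """Hypothetical stand-in for an idx reader that mmap()s its file."""

    def __init__(self, path):
        with open(path, 'rb') as f:
            # The mapping keeps the file alive even after f is closed.
            self.map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def close(self):
        if self.map is not None:
            self.map.close()  # unmap, releasing the underlying file
            self.map = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions
```

A caller would then write "with MappedIdx(path) as idx: ..." and could safely remove the file once the block has exited.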
Signed-off-by: Luca Carlon <carlon.luca@gmail.com>
[add commit message based on error description, add midx part,
remove shatable to avoid live pointers into unmapped region] Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 16 May 2020 07:03:31 +0000 (09:03 +0200)]
save: close files immediately
Use a with statement to close all files immediately after
hashsplitting. There's also no need to have two except
clauses, so unify them to simplify this change.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sun, 17 May 2020 21:45:21 +0000 (23:45 +0200)]
helpers: use float for format_filesize()
In format_filesize(), we really do want float division,
in order to display the value correctly. For example, if
there's a file with 45200000 bytes, that should be shown
as 43.1 MB, not 43.0. Fix this by using proper float
division here, not int division.
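The arithmetic for the example above, as a small sketch. The variable names are illustrative, and the formatting in format_filesize() itself may differ.

```python
size = 45200000
unit = 1024 * 1024  # bytes per MB, as in the example above

# Integer division drops the fraction before formatting.
int_result = '%.1f' % (size // unit)          # '43.0'
# Float division keeps it.
float_result = '%.1f' % (size / float(unit))  # '43.1'
```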
Fixes: a5809723352c ("helpers: use // not / for division") Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 1 May 2020 20:16:39 +0000 (22:16 +0200)]
mincore: fix reading information
The _fmincore_chunk_size is typically set to 64MiB, which
makes sense to avoid doing very large mmap() operations
(to save already precious VM on 32-bit systems).
However, since that's in bytes, we cannot divide a size in
pages by it, and expect any useful outcome.
Calculate the number of chunks (chunk_count) properly based
on the size of the file, rather than its number of pages.
Otherwise, chunk_count typically ends up just 1 even for a
very large file (my test file was ~500MiB), and mincore()
is run just once, so we fill the presence information only
for the first 64MiB of the file, even if it was previously
completely in RAM.
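A worked example of the two calculations, assuming a 4096-byte page size and a clamp to at least one chunk (both are assumptions of this sketch; only the 64MiB chunk size and the ~500MiB file come from the text).

```python
MiB = 1024 * 1024
page_size = 4096        # assumed page size
chunk_size = 64 * MiB   # _fmincore_chunk_size, in bytes
file_size = 500 * MiB   # roughly the test file mentioned above

page_count = (file_size + page_size - 1) // page_size

# Before: a page count divided by a byte count, clamped to one chunk.
old_chunk_count = max(1, page_count // chunk_size)

# After: bytes divided by bytes, rounded up.
new_chunk_count = (file_size + chunk_size - 1) // chunk_size
```

Here old_chunk_count is 1 while new_chunk_count is 8, which matches the symptom of only the first 64MiB being covered.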
Given a large enough test file (and enough RAM to keep it
there), the following should print about the same times
twice:
cat test > /dev/null ; \
time cat test > /dev/null ; \
bup split --noop test ; \
time cat test >/dev/null
Without the fix, it's evident that the file is evicted from
RAM almost entirely (apart from the first 64MiB) even in
this synthetic case.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sun, 19 Jan 2020 20:18:35 +0000 (21:18 +0100)]
git: create_commit_blob: allow timezones to be specified as 0
Checking "if adate_tz" means that if it's 0 (UTC) then we'll
actually use localtime, which is wrong. Do this only when it's
specified as None.
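The difference between the two checks, as a sketch. The function names and the local_offset parameter are hypothetical; only the "if adate_tz" test and the None comparison come from the message.

```python
def tz_offset_buggy(adate_tz, local_offset):
    # Truthiness test: 0 (UTC) is falsy, so it falls back to local time.
    return adate_tz if adate_tz else local_offset


def tz_offset_fixed(adate_tz, local_offset):
    # Only fall back when the caller really passed None.
    return local_offset if adate_tz is None else adate_tz
```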
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: elaborate on rationale in commit message]
Johannes Berg [Sat, 8 Feb 2020 21:54:56 +0000 (22:54 +0100)]
vfs: use None for unknown uid/gid
This means we show '?' instead of 0 for unknown UIDs when
numeric output is requested, as it was before.
This also uncovered a forgotten bytes annotation for the
"unknown" string ('?' should be b'?').
Somehow, this new behaviour (of printing 0 instead of ?)
also got quite enshrined in the test suite, fix that too.
And finally, on python 2, fuse doesn't accept None in the
stat struct (but does on python 3, go figure).
Fixes: f76c37383ddb ("Remove vfs (replaced by vfs2)") Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 18 Jan 2020 21:36:36 +0000 (22:36 +0100)]
git: add a test for not keeping midx files open
This test creates a few dummy idx files, generates an midx,
queries a PackIdxList for a non-existent object, unlinks the
midx and checks that we still have it open, but that we close
it at PackIdxList::refresh now.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 3 Jan 2020 12:53:20 +0000 (13:53 +0100)]
git: split out idx file writing to a separate class
Split the idx file writing into a separate class, to make that
kind of action available separately. This will be useful for the
next patch where we use it to test some idx/midx code.
In the future, it'll also be useful for encrypted repositories
since the idx format there will be useful for local caching to
take advantage of midx and bloom code as is, but the packwriter
will of course not be useful.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Mon, 16 Mar 2020 21:09:02 +0000 (22:09 +0100)]
test-save-errors: fix shebang for freebsd
It appears that freebsd doesn't support recursive interpreters,
so you cannot use a shell script directly as one. Instead, to
fix it, invoke it via /usr/bin/env.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Mon, 13 Jan 2020 19:53:26 +0000 (20:53 +0100)]
ssh: simplify the code
There's no point in shipping PATH to the remote server, since
it will be different there. We can also simplify the loopback
check, and we don't really need to munge the PATH there either
if we just use path.exe() in place of a plain 'bup' for it.
While at it, also fix the formatting instruction for the ints
to %d, instead of %s.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Tue, 11 Feb 2020 02:47:13 +0000 (20:47 -0600)]
git: overhaul git version checking
Rework the git version testing to handle versions like "1.5.2-rc3"
or (apparently) "1.5.2-rc3 (something ...)". Add tests for parsing of
all the version types in the current git tag history that we need to
support.
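One way such parsing might look, as a sketch only. The pattern and the name parse_version are hypothetical and cover just the two forms quoted above plus plain releases, not necessarily every format the real code supports.

```python
import re

# Hypothetical pattern: MAJOR.MINOR.PATCH, optional -rcN, optional " (...)".
_version_rx = re.compile(r'^(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?(?: \(.*\))?$')


def parse_version(text):
    """Return (major, minor, patch, rc), with rc None for plain releases,
    or None if the text does not match."""
    m = _version_rx.match(text)
    if not m:
        return None
    major, minor, patch, rc = m.groups()
    return (int(major), int(minor), int(patch), int(rc) if rc else None)
```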
Support and document BUP_ASSUME_GIT_VERSION_IS_FINE=1 as an escape
hatch in case the parsing isn't sufficiently comprehensive, or
upstream changes their practices in future releases.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
when there was no error output and the exit status was zero. A
previous commit 19f9faeb0055dadb7f76a953d51acec8373c6edb eliminated
the spurious reporting; now make sure we print a newline after the
par2 output whenever there actually is an error.
Signed-off-by: Christian Cornelssen <ccorn@1tein.de>
[rlb@defaultvalue.org: add information from the pr to commit message] Reviewed-by: Rob Browning <rlb@defaultvalue.org>
when there was no error output and the exit status was zero. Stop
printing the warning in that case.
Signed-off-by: Christian Cornelssen <ccorn@1tein.de>
[rlb@defaultvalue.org: add information from the pr to commit message] Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Mon, 17 Feb 2020 20:02:11 +0000 (21:02 +0100)]
index: make --fake-valid match the man page
The index command currently clobbers the hash of a file when
marking it as valid, but the man page states:
--fake-valid
mark specified paths as up-to-date even if they aren't.
This can be useful for testing, or to avoid unnecessarily
backing up files that you know are boring.
The latter part ("avoid unnecessarily backing up [...]") cannot be
implemented with --fake-valid as is, because of the clobbering of
the hash: the fake invented hash will not exist in the repository,
and thus save checks and saves the file.
Fix this by clobbering the hash only if it's the invalid EMPTY_SHA.
Add a test for this to test-save-smaller, just because that's where
we discovered it.
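The rule can be sketched as follows. The helper name sha_for_fake_valid is hypothetical, and the value used for EMPTY_SHA (20 zero bytes) is an assumption of this sketch.

```python
EMPTY_SHA = b'\0' * 20  # assumed value of the "invalid" placeholder hash


def sha_for_fake_valid(current_sha, fake_sha):
    """Hypothetical helper: keep a real hash, replace only the placeholder."""
    if current_sha == EMPTY_SHA:
        return fake_sha
    return current_sha
```

An entry that already carries a hash present in the repository keeps it, so a later save has no reason to re-read the file.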
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Mon, 17 Feb 2020 19:03:22 +0000 (20:03 +0100)]
save: add test for --smaller, fix DESIGN document
Add a test for --smaller, in particular showing that the actual --smaller
behaviour doesn't match what's described in the DESIGN file, which says:
Another interesting trick is that you can skip backing up files even if
IX_HASHVALID *isn't* set, as long as you have that file's sha1 in the
repository. What that means is you've chosen not to backup the latest
version of that file; instead, your new backup set just contains the
most-recently-known valid version of that file. This is a good trick if you
want to do frequent backups of smallish files and infrequent backups of
large ones (as in 'bup save --smaller'). Each of your backups will be
"complete," in that they contain all the small files and the large ones, but
intermediate ones will just contain out-of-date copies of the large files.
This ("Each of your backups will be 'complete,' [...]") would seem to indicate
all files should be present, but in fact neither new nor old files are actually
saved by 'bup save --smaller'.
To avoid confusion, also update the DESIGN documentation here.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 4 Feb 2020 20:27:23 +0000 (21:27 +0100)]
get: convert opt.source to bytes
I noticed this while playing with something else that
didn't just pass the repo_dir to git, but instead used
it with some os.path.join() calls that complain about
mixed unicode/bytes.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 31 Jan 2020 21:00:56 +0000 (22:00 +0100)]
compat: directly assign bytes_from_uint = chr
This is significantly faster than the indirection and
seems to reduce the runtime of the test suite by about
3% on my machine with python 2 (76.121s -> 73.725s,
but I tried only once which is clearly not enough.)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 13 Dec 2019 20:55:53 +0000 (21:55 +0100)]
BaseConn: let _read/_readline raise NotImplementedError
This way, it's easier to understand the code, since these
functions aren't referenced without existing in BaseConn.
Also change has_input() to raise NotImplementedError instead
of trying to instantiate NotImplemented - the latter is just
a singleton to return from the rich comparison methods.
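A small illustration of the difference, using a pared-down BaseConn. The error message text and the helper call_not_implemented are hypothetical.

```python
class BaseConn:
    """Minimal sketch; only the method relevant here is shown."""

    def has_input(self):
        # NotImplementedError is an exception class and can be raised.
        raise NotImplementedError('Subclasses must implement has_input')


def call_not_implemented():
    """Show what the old pattern did: NotImplemented is not callable,
    so trying to instantiate it fails with a TypeError instead."""
    try:
        raise NotImplemented('oops')
    except TypeError as ex:
        return ex
```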
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Jan 2020 23:21:01 +0000 (00:21 +0100)]
git/client/server: remove rev_list() count support
This is obviously not used, as passing count!=None would
crash the client method (client.py doesn't import Integral).
Rather than fixing that, just remove support for it entirely.
While at it, also clean up a duplicate rev_list_invocation()
call in the server.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Jan 2020 19:41:10 +0000 (20:41 +0100)]
save: remove pointless metalist check
The metalist can never be empty, since at every level we add
at least the directory's own metadata; it may not have actual
metadata, but there's an entry all the time.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 25 Jan 2020 23:17:38 +0000 (00:17 +0100)]
save/vfs: update comments wrt. tree/bupm ordering
After looking into this and thinking about it, the comments here
are a bit misleading - save states the entries must be in a given
order without a rationale, and vfs states that the order is wrong
but gives an explanation that's not quite right.
Update both comments to make this clearer, and to document that
there's no inherent reason; an order just happened to be picked when
the save code was written, and it turned out not to be the best.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Johannes Berg [Sat, 25 Jan 2020 22:13:17 +0000 (23:13 +0100)]
tests: add test for save encountering duplicates
Add a test for save encountering duplicates in the index, both
for a file and a directory, which was fixed in the previous patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 25 Jan 2020 22:41:56 +0000 (23:41 +0100)]
save: don't confuse metadata on duplicate files
If there are duplicate files, save removes the duplicates but
doesn't remove the duplicates from metadata, so the metadata
list gets messed up wrt. the file list. Fix this.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 25 Jan 2020 20:34:52 +0000 (21:34 +0100)]
save: minor code cleanups
Move some code around to avoid double ifs and nested ifs
where elif chains can be used instead.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 25 Jan 2020 21:21:24 +0000 (22:21 +0100)]
test-save-errors: add tests for inaccessible metadata
Add test cases for when metadata for a file and directory cannot
be read (causing an IOError instead).
Note that this test fails if the previous two patches aren't
applied.
Note also that if such an error ever happened without the patch
to save, the repository is essentially corrupt and there is no way
to figure out _which_ file didn't get metadata.
Note that this uses some python trickery to force an IOError in
metadata reading.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
[rlb@defaultvalue.org: name test in commit summary line] Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 18 Jan 2020 21:26:16 +0000 (22:26 +0100)]
git: fix PackIdxList keeping deleted files open
When an midx is deleted underneath bup, usually by itself running
'bup midx --auto', then PackIdxList may keep them open. This can
cause bup to run out of disk space easily since these files can
be fairly big, and can be recreated multiple times in a backup
run.
To fix this, remove any open PackMidx instances from the list and
close them explicitly.
Out of an abundance of caution, also explicitly close the bloom
instance if we have one - the same issue should apply here even if
I couldn't observe it, since the GC isn't guaranteed to clean up
the object immediately.
I remember debugging this issue years ago without coming to any
good conclusion, and it's been mentioned on the mailing list a few
times as well, e.g.
https://groups.google.com/d/msg/bup-list/AqIyv9n9WPE/-Wl2JVh5AQAJ
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 8 Feb 2020 20:10:58 +0000 (21:10 +0100)]
perf-glance: make compatible with python==python3
If python is python3, then print is a function. Fix the code
to make it a function since python2 doesn't really care (in
this particular case where we just have a single argument).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Fri, 3 Jan 2020 21:23:54 +0000 (15:23 -0600)]
pwdgrp: pass strings to python for python 3
Python 3's getpwnam and getgrnam functions only accept unicode
strings, so decode the bytes we have as iso-8859-1, which is what they
should be, given bup-python's LC_CTYPE override.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Fri, 3 Jan 2020 20:27:09 +0000 (14:27 -0600)]
hashsplit: replace join_bytes with cat_bytes
Add a C cat_bytes that can concatenate two bytes objects with offsets
and extents. This allows us to have the same implementation for
python 2 and 3, to drop another use of buffer(), and may be handy in
the future, particularly given the expense of getting a buffer offset
in python 3 (i.e. memoryview() adds about ~200 bytes).
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Thu, 2 Jan 2020 21:30:28 +0000 (15:30 -0600)]
fuse: adjust for python 3 and test there
The python 3 version could have issues until the fuse module supports
binary data more completely (e.g. bytes paths), or until we switch to
some other foundation, but it may be OK even so (with some
inefficiency) given our bup-python iso-8859-1 hack.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>