If the watched process ends a write to the pipe without a newline at
the end, but with newlines in the middle, then sep_rx.split() will
return multiple entries, the last of which will not end with a
newline and yet will not be the empty string. This line prefix needs
to be stashed in the pending buffer, too.
This turns out to be exactly the same logic as if sep_rx.split had not
split the string, so eliminate one layer of conditionals.
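A minimal Python sketch of this logic. It assumes a hypothetical feed() helper and a sep_rx that captures "\r" or "\n"; bup's real newliner code may be organized differently. Every element of the split except the last is paired with its separator. The last element (empty, an unterminated prefix, or the whole unsplit input) simply becomes the new pending buffer, so no separate "did it split?" branch is needed.

```python
import re

# Assumed separator pattern; the capturing group makes re.split() keep
# the separators in its result.
sep_rx = re.compile(br'([\r\n])')


def feed(pending, data):
    """Hypothetical helper: return (complete_lines, new_pending)."""
    split = sep_rx.split(pending + data)
    # split alternates text, separator, text, separator, ..., text.
    lines = [split[i] + split[i + 1] for i in range(0, len(split) - 1, 2)]
    # The final element is b'' if the data ended with a separator, an
    # unterminated line prefix otherwise, or all of the data if there was
    # no separator at all; in every case it is simply stashed.
    return lines, split[-1]


lines, pending = feed(b'', b'one\ntwo')
# lines == [b'one\n'], pending == b'two'
lines, pending = feed(pending, b' three\n')
# lines == [b'two three\n'], pending == b''
```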
This version incorporates feedback from Rob Browning to continue to
pass a list to extend().
Signed-off-by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
[rlb@defaultvalue.org: adjust commit summary and remove extra space in
"if split[0]" guard.]
Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Greg Troxel [Wed, 5 Jun 2019 20:55:32 +0000 (16:55 -0400)]
restore: create fifos with mkfifo, not mknod
I recently did a restore of a large bup backup, about 34G worth. All
worked well, including metadata, except that bup threw an exception on
restoring fifos (that I didn't need; they were in /var and were sockets
in use by daemons when the backup happened).
The problem was that mknod was being called for the fifo, and given only
two arguments. mknod(2) on NetBSD says it takes three arguments;
mkfifo takes two. I am guessing that mknod in Python calls the OS
mknod, and that on Linux the missing third (null) argument somehow works out.
But it seems irregular to make a fifo with mknod.
I realize python is not POSIX, but mknod(2) requires three arguments:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html
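For comparison, here is a minimal sketch of creating a fifo with os.mkfifo(), which takes only a path and a mode. The restore_fifo() name is hypothetical. Real restore code would also apply ownership, times, and other metadata afterward.

```python
import os
import stat


def restore_fifo(path, mode=0o600):
    """Hypothetical helper: create a fifo at path with mkfifo, not mknod."""
    # os.mkfifo() takes only a path and a mode; no device number is involved.
    os.mkfifo(path, mode)
    # Confirm that the new filesystem object really is a fifo.
    return stat.S_ISFIFO(os.lstat(path).st_mode)
```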
It would be nice to have a test of backing up and restoring a fifo; that
would have caught this.
The following patch makes my restore go smoothly.
Signed-off-by: Greg Troxel <gdt@lexort.com>
[rlb@defaultvalue.org: adjust commit summary]
Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 13 Apr 2019 17:07:18 +0000 (12:07 -0500)]
rev_list: handle multiple results/ref from remote for custom formats
Previously a remote rev_list invocation would crash if a custom format
was provided and the remote produced multiple results for any input
ref. Change the client/server code to use a blank line to indicate
the end of the rev-list results. Of course, that means that the parse
function provided must be able to handle (consume) any blank lines
that its format string produces, which may preclude the use of some
format strings, but should be sufficient for now.
Adjust test-ls to (trivially) cover this case, and broaden the use of
the commit hash length tests in the code.
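Here is a sketch of the client side of a blank-line-terminated response. The read_results() function and the two-line record parser are hypothetical, and bup's actual client/server code and format handling may differ. The sketch shows why the parse function has to consume its own blank lines: any blank line it left unread would be taken as the end of the results.

```python
import io


def read_results(f, parse):
    """Hypothetical reader: parse records until a lone blank line.

    parse(first_line, f) must consume the rest of its record from f,
    including any blank lines its own format produces.
    """
    results = []
    while True:
        line = f.readline()
        if not line:
            raise EOFError('stream ended before the terminating blank line')
        if line == b'\n':
            return results  # blank line: end of the rev-list results
        results.append(parse(line, f))


def parse_hash_and_author(first_line, f):
    # Assumed two-line record: a commit hash line, then an author line.
    return first_line.rstrip(b'\n'), f.readline().rstrip(b'\n')


stream = io.BytesIO(b'aaaa\nAlice\nbbbb\nBob\n\nnext response\n')
results = read_results(stream, parse_hash_and_author)
```

Here a single ref yields two results, and the stream is left positioned at whatever follows the terminating blank line.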
Thanks to Alex Roper for reporting the problem, providing an easy way
to reproduce it, and proposing a fix.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Fri, 12 Apr 2019 19:55:13 +0000 (14:55 -0500)]
Handle commit mergetags (at all)
Previously bup would just crash (e.g. bup ls) if it encountered a
commit with a mergetag header (apparently a new thing). For now,
adjust git.parse_commit to accept and ignore them as long as they only
appear as an optional, final header in the commit. That may or may
not turn out to be sufficient, and it does mean that for now we won't
be able to preserve mergetags (if we want to) whenever we rewrite
commits via bup gc, get, etc.
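Here is a regex-based sketch of accepting and discarding an optional, final mergetag header. It is not bup's git.parse_commit(). The pattern assumes lowercase hex ids and ignores other optional headers such as encoding or gpgsig. Mergetag continuation lines start with a single space, which is what lets the pattern find where the header ends.

```python
import re

HEX40 = b'[0-9a-f]{40}'

# Sketch: tree, parents, author, committer, then an optional, final
# mergetag header whose continuation lines begin with a space.
_commit_rx = re.compile(
    b'tree (?P<tree>' + HEX40 + b')\n'
    + b'(?P<parents>(?:parent ' + HEX40 + b'\n)*)'
    + b'author (?P<author>[^\n]*)\n'
    + b'committer (?P<committer>[^\n]*)\n'
    + b'(?:mergetag object ' + HEX40 + b'\n(?: [^\n]*\n)*)?'
    + b'\n'
    + b'(?P<message>.*)',
    re.DOTALL)


def parse_commit(content):
    """Parse a raw commit object, silently dropping any mergetag."""
    m = _commit_rx.match(content)
    if not m:
        raise ValueError('unrecognized commit object')
    return {
        'tree': m.group('tree'),
        'parents': re.findall(b'parent (' + HEX40 + b')\n',
                              m.group('parents')),
        'author': m.group('author'),
        'committer': m.group('committer'),
        'message': m.group('message'),
    }
```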
Rob Browning [Sun, 23 Mar 2014 17:41:06 +0000 (12:41 -0500)]
Add bup get; see the documentation for further information
WARNING: this is a new EXPERIMENTAL command that can (intentionally)
modify your data in destructive ways. Treat with caution.
Thanks to Karl Kiniger <karl.kiniger@med.ge.com> for helping track
down various bugs in earlier versions, and for noting that we might
want --verbose to be a little more effusive. And thanks to Patrick
Rouleau <prouleau72@gmail.com> for suggesting improvements to the
documentation.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 31 Mar 2018 20:32:52 +0000 (15:32 -0500)]
repo: avoid cyclic dependency with is_remote method
The current VFS operations (like resolve()) require a repo object, but
we're about to add a VFS resolve() method to the repos. In and of
itself, that isn't necessarily a problem, but as an optimization, we
want the VFS resolve() to be able to detect when the repo it's been
given is a RemoteRepo and redirect the call to remote_repo.resolve().
Doing so sends a single resolve() call to the remote instead of
executing resolve() locally via many individual calls to the
remote_repo's other methods.
Adding is_remote() makes that possible without having to 'import repo'
in the VFS (repo already imports vfs).
Perhaps we'll rework it later, but this will do for now.
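Here is a sketch of the dispatch this enables. The LocalRepo and RemoteRepo classes are hypothetical, and lookup() is a hypothetical method standing in for the individual repository calls. Because the VFS only calls repo.is_remote(), it never needs an isinstance() check against RemoteRepo, so it never has to import the repo module.

```python
class LocalRepo:
    """Hypothetical local repository: resolution takes one lookup per step."""

    def __init__(self):
        self.lookups = 0

    def is_remote(self):
        return False

    def lookup(self, name):
        self.lookups += 1  # stands in for one object/tree read
        return name


class RemoteRepo:
    """Hypothetical remote repository that can resolve a whole path itself."""

    def __init__(self):
        self.round_trips = 0

    def is_remote(self):
        return True

    def resolve(self, path):
        self.round_trips += 1  # one request carries the entire path
        return path.strip('/').split('/')


def resolve(repo, path):
    """VFS-level resolve; note that it never imports or names RemoteRepo."""
    if repo.is_remote():
        return repo.resolve(path)
    return [repo.lookup(part) for part in path.strip('/').split('/')]
```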
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 31 Mar 2018 19:14:43 +0000 (14:14 -0500)]
Move vfs resolve() tests to tresolve.py
Move resolve() tests from tvfs to tresolve, and the common tree_dict()
test code to a new test.vfs module, in preparation for more extensive
resolve() testing.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 12 Jan 2019 22:22:14 +0000 (16:22 -0600)]
test_resolve_loop: ensure exception is actually thrown
Make sure to resolve the correct path, and ensure the call never
returns. Previously when the path was wrong, and it *was* wrong, the
test would appear to succeed even though it wasn't actually testing
the intended ELOOP case.
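Here is a sketch of the test pattern, using a hypothetical dict-based resolve(). The else branch turns a normal return into a failure, so resolving the wrong path can no longer pass silently. The max_hops limit and the error message are assumptions.

```python
import errno


def resolve(links, path, max_hops=40):
    """Hypothetical resolver: follow entries in links (path -> target)."""
    hops = 0
    while path in links:
        hops += 1
        if hops > max_hops:
            raise OSError(errno.ELOOP, 'too many levels of symbolic links',
                          path)
        path = links[path]
    return path


def check_resolve_loop(links, path):
    """Fail unless resolving path raises ELOOP; a normal return is an error."""
    try:
        result = resolve(links, path)
    except OSError as ex:
        assert ex.errno == errno.ELOOP
    else:
        raise AssertionError('resolve() returned %r instead of raising ELOOP'
                             % (result,))
```

With links = {'/a': '/b', '/b': '/a'}, checking '/a' passes. Checking a mistyped path such as '/wrong' now raises AssertionError instead of appearing to succeed.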
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 20 Oct 2018 23:04:27 +0000 (18:04 -0500)]
metadata: always add/store/retrieve size for links and normal files
This simplifies cases where we need to transmit Metadata
objects (i.e. bup-get's repo.resolve()), and it means that for trees
created using this new v3 format, retrieving the sizes of chunked
files should be notably less expensive, since they'll be directly
available in the directory's .bupm file.
Without that, we have to seek around in the chunked tree to find the
last byte (cf. vfs._normal_or_chunked_file_size).
Only store the sizes for symlinks and regular files (which might be
chunked) until it's clear that other st_sizes are useful.
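Here is a sketch of the walk that the stored size avoids, over a hypothetical in-memory chunked tree. It assumes each tree's entries are (offset, node) pairs sorted by offset, with offsets relative to the start of that tree. Under that assumption, the size is the sum of the last offsets along the rightmost path plus the length of the final chunk.

```python
def chunked_size(tree):
    """Size of a hypothetical chunked tree without reading every chunk.

    tree is a non-empty list of (offset, node) pairs sorted by offset,
    where offset is relative to the start of that tree and node is either
    a bytes chunk (a blob) or another such list (a subtree).
    """
    total = 0
    node = tree
    while isinstance(node, list):
        offset, node = node[-1]  # only the last entry can hold the last byte
        total += offset
    return total + len(node)
```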
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Tue, 27 Feb 2018 07:32:54 +0000 (23:32 -0800)]
Replace lresolve with resolve(..., follow=False)
Although there's the NOFOLLOW precedent, this is really just to narrow
the API before we add it as a repo method, i.e. so we only have to
handle one function instead of two.
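Here is a sketch of a single resolve() with a follow flag, over a hypothetical dict that maps absolute symlink paths to absolute targets. Intermediate symlinks are always followed. The follow flag only decides whether a symlink in the final position is followed, which is the lstat()-like behavior that lresolve() provided.

```python
import errno
import posixpath


def resolve(links, path, follow=True, depth=0):
    """Hypothetical resolver over links (absolute link path -> absolute target).

    Intermediate symlinks are always followed; follow=False leaves a
    symlink in the final position unresolved, as lresolve() used to.
    """
    if depth > 40:
        raise OSError(errno.ELOOP, 'too many levels of symbolic links', path)
    parts = [p for p in path.split('/') if p]
    current = '/'
    for i, part in enumerate(parts):
        current = posixpath.join(current, part)
        is_last = i == len(parts) - 1
        if current in links and (follow or not is_last):
            current = resolve(links, links[current], True, depth + 1)
    return current
```

With links = {'/dir': '/real', '/real/ln': '/real/file'}, resolving '/dir/ln' yields '/real/file'. With follow=False, it yields '/real/ln'.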
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 9 Dec 2018 18:40:26 +0000 (12:40 -0600)]
vfs: change /save/latest back to a symlink to the latest save
The current, reworked vfs presents /somesave/latest as if it actually
is the latest commit. Change it back to a symlink to the latest save
to roughly match the previous behavior -- though now it's a link to
the save name, not to the (removed) /.commit/ subtree.
To restore the link, reintroduce the concept of a fake
symlink (i.e. one that has no corresponding blob in the repository).
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Mon, 3 Dec 2018 18:58:03 +0000 (12:58 -0600)]
Don't return invalid data for offset reads (observed via fuse)
Fix a serious bug in the vfs that could cause it to return invalid
data for chunked file read()s that didn't start at the beginning of
the file. The issue was first observed via fuse, which makes sense
given that it streams a file in chunks that (currently) each come from
independent, increasing seek-offset FileReaders.
The previous dropwhile() invocation in the _tree_chunks generator,
used to skip past chunks that were completely before the offset, was
simple but wrong, and would skip too far. Replace it with
_skip_chunks_before_offset().
Add randomized tests of both simple streaming reads, and seek offset
reads, which catch the problem, cover additional cases, and should
prevent regressions.
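Here is a flat sketch of the fix and of a randomized offset-read check. It assumes chunks are (start, data) pairs in order. bup's _tree_chunks() walks nested trees, so this is illustrative only and not bup's code. The key point is to drop only chunks that end at or before the offset. A predicate that tested only each chunk's start would also drop the chunk that straddles the offset.

```python
import random


def skip_chunks_before_offset(chunks, offset):
    """Yield the (start, data) chunks that end after offset.

    chunks must be in increasing start order. Testing only start < offset
    would wrongly drop the chunk that straddles offset as well.
    """
    for start, data in chunks:
        if start + len(data) > offset:
            yield start, data


def read_from(chunks, offset):
    """Return the file content from offset to the end."""
    parts = []
    for start, data in skip_chunks_before_offset(chunks, offset):
        # Only the first surviving chunk can begin before offset.
        parts.append(data[max(offset - start, 0):])
    return b''.join(parts)


def random_chunks(data, rng):
    """Split data into randomly sized (start, data) chunks."""
    chunks, start = [], 0
    while start < len(data):
        size = rng.randint(1, 16)
        chunks.append((start, data[start:start + size]))
        start += size
    return chunks


def check_offset_reads(trials=500, seed=1):
    """Randomized check: every offset read must match plain slicing."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randint(0, 200)))
        offset = rng.randint(0, len(data))
        assert read_from(random_chunks(data, rng), offset) == data[offset:]
```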
Thanks to voldial for reporting the problem and providing an easy way
to reproduce it.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 20 Oct 2018 17:35:11 +0000 (12:35 -0500)]
update-doc-branches: add command to update man and html
Create a new command to update the man and html branches, and move the
related code there from the Makefile.
Update the branches based on the current (clean) tree, rather than
consulting the git origin, and rely on ls-files rather than globbing
so that the file lists will always be correct -- we'll immediately
notice deletions, avoid picking up stray files in the directory, etc.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 30 Sep 2018 20:29:42 +0000 (15:29 -0500)]
Makefile: fix find -printf issue on FreeBSD
Apparently the use of -printf was causing the error: "printf: missing
format character" with FreeBSD 11.1-RELEASE. Change the helpers lib
count to rely on -print0 and tr, which should be more portable, but
still be undisturbed by unusual paths.
Thanks to Curtis Dunham for reporting the problem and proposing an
alternate solution.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 7 Jul 2018 17:20:47 +0000 (12:20 -0500)]
vfs: flatten resolution cache key
This will require more storage if there are a lot of lookups with the
same parent and differing paths, but otherwise, without more
intentional structure sharing among paths, this should be better, and
we can always revisit the arrangement later.
Serializing the parent path segments should also make sure the same
parent (semantically-speaking) contributes the same hash to the key.
Previously, Metadata objects could prevent that, given their trivial,
pointer-based hashes.
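Here is a sketch of a flattened key. It assumes a hypothetical parent represented as (name, item) pairs and an integer repo id. Only the names enter the key, so two equal parents produce equal keys even when their item objects hash by identity.

```python
def resolution_cache_key(repo_id, parent, path):
    """Hypothetical flat key for caching resolve(path) relative to parent.

    parent is a sequence of (name, item) pairs from the root down. Only the
    names are used, so semantically equal parents give equal keys even if
    their item objects (e.g. metadata) hash by identity.
    """
    return (repo_id,) + tuple(name for name, _item in parent) + (path,)
```

Keying on tuple(parent) directly would instead give two different keys for the same logical parent whenever its item objects were distinct instances with the default identity-based hashing.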
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 2 Sep 2018 17:29:01 +0000 (12:29 -0500)]
Refuse to run by default if python version is not 2
Exit with an error if the python major version isn't 2, since we're
working on support for py3, and we'll probably reach a point before
we're finished where bup doesn't immediately crash with py3, but might
well do very bad things to the repository.
Allow overriding the check by setting
BUP_ALLOW_UNEXPECTED_PYTHON_VERSION=true so people can still test py3
if they like.
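Here is a sketch of such a check. The function names, error message, and exit status are hypothetical. Only the BUP_ALLOW_UNEXPECTED_PYTHON_VERSION=true override comes from the description above.

```python
import os
import sys


def python_version_allowed(version_info=sys.version_info, environ=os.environ):
    """Return True if running on this interpreter is permitted (sketch)."""
    if version_info[0] == 2:
        return True
    # Escape hatch for people who want to test py3 support anyway.
    return environ.get('BUP_ALLOW_UNEXPECTED_PYTHON_VERSION') == 'true'


def check_python_version():
    """Exit with an error unless the version check passes."""
    if not python_version_allowed():
        sys.stderr.write('error: unexpected Python major version\n')
        sys.exit(2)
```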
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Julien Goodwin [Thu, 30 Aug 2018 09:57:18 +0000 (19:57 +1000)]
tindex: add 0o to current octal literal
Needed for python3.
This preserves current behaviour, but since I get no test failures if I
just chop off the leading 0 and make it a normal int, there's a good
chance there's underlying breakage.
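Here is a quick illustration of the literal forms, using 0644 as an example value (the literal actually changed in tindex may differ). The 0o prefix is accepted by Python 2.6 and later and by Python 3. Dropping the leading 0 instead yields a different, decimal number.

```python
import stat

# Python 2 read 0644 as octal; Python 3 rejects that spelling with a
# SyntaxError. The 0o prefix means the same value in both.
mode = 0o644
assert mode == 420 == int('644', 8)

# Chopping off the leading 0 gives a different (decimal) value, so tests
# that still pass afterward suggest the value is not really being checked.
assert 644 != 0o644

# The octal value corresponds to the expected permission bits.
assert mode == stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH
```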
Signed-off-by: Julien Goodwin <jgoodwin@studio442.com.au>
[rlb@defaultvalue.org: adjust commit message summary]
Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Julien Goodwin [Thu, 30 Aug 2018 08:25:25 +0000 (18:25 +1000)]
Replace remaining print statements with print function
Also add the appropriate __future__ import statement.
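Here is a small example of the replacement pattern; report() is hypothetical. With the __future__ import, print is a function on Python 2 as well, so the same call works on both interpreters.

```python
from __future__ import print_function

import sys


def report(count, out=sys.stdout):
    # Python 2 statement form was:  print >>out, 'indexed', count, 'paths'
    print('indexed', count, 'paths', file=out)
```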
Signed-off-by: Julien Goodwin <jgoodwin@studio442.com.au>
[rlb@defaultvalue.org: memtest-cmd.py: remove a few preexisting
trailing spaces that were carried over; adjust commit summary]
Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Fri, 6 Jul 2018 17:01:55 +0000 (12:01 -0500)]
vfs: include unique repo id in resolution cache key
...since resolve() currently requires a full parent path and the root
refs are only applicable to a particular repository.
Use differing integers to identify repositories that may be
independent (with respect to refs, tags, etc.), and use (typically
small) integers rather than the repo path/address so that they'll be
short if we want to embed them directly in cache keys later.
Use realpath() for local repositories in order to detect when the same
repository is reachable by multiple paths. (Something similar could
eventually be done for remotes.)
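Here is a sketch of assigning small integer ids to local repositories, keyed by os.path.realpath(). The local_repo_id() function and its module-level table are hypothetical.

```python
import itertools
import os

_repo_ids = {}
_next_id = itertools.count(1)


def local_repo_id(path):
    """Hypothetical: map a local repository path to a small, stable integer.

    realpath() resolves symlinks and '..' components, so the same
    repository reached by different paths gets the same id.
    """
    key = os.path.realpath(path)
    if key not in _repo_ids:
        _repo_ids[key] = next(_next_id)
    return _repo_ids[key]
```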
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 6 May 2018 16:46:45 +0000 (11:46 -0500)]
vfs: cache resolve() calls to improve (fuse) performance
Include resolve() results in the vfs cache. This substantially
improves fuse "cat somefile" performance. (Observed a ~2x rate
improvement with a 500MB urandom file).
This appears to be due to the fact that fuse read(path, offset, len)
is called many times for the file, resulting in many corresponding,
redundant resolve(path) calls.
The previous fuse implementation, based on the previous vfs, had its
own cache, but moving the caching to the vfs should be more generally
helpful.
Now bup fuse will again ignore repository changes that affect paths it
has already examined. This matches its behavior in the current stable
release (0.29.1).
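Here is a minimal memoizing sketch of the idea; cached_resolve() is hypothetical. The cache here is an unbounded dict, whereas a real cache would likely need a size limit. The sketch also shows the trade-off noted above: once a path is cached, later repository changes affecting it are not seen.

```python
_resolve_cache = {}


def cached_resolve(resolve, repo_id, path):
    """Hypothetical memoizing wrapper around a resolve(path) function.

    The key includes the repo id because the same path can mean different
    things in different repositories. Entries are never invalidated.
    """
    key = (repo_id, path)
    if key not in _resolve_cache:
        _resolve_cache[key] = resolve(path)
    return _resolve_cache[key]
```

A fuse-style sequence of many read(path, offset, len) calls for one file then costs a single real resolve().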
Thanks to voldial for reporting the problem.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Robert Evans [Sun, 29 Apr 2018 10:47:30 +0000 (06:47 -0400)]
Add bup split --noop <--blobs|--tree>
This prints the resulting id without storing anything in the repository.
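Ids like these can be computed without touching a repository because git object ids are content hashes. Here is a sketch for a single blob using git's standard object hashing: SHA-1 over a "blob <size>" header, a NUL byte, and the content. Splitting the input into chunks, and building trees for --tree, are not shown.

```python
import hashlib


def git_blob_id(data):
    """Compute a git blob id for data without writing anything anywhere."""
    header = b'blob %d\0' % len(data)  # "blob <size>" followed by a NUL byte
    return hashlib.sha1(header + data).hexdigest()
```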
Signed-off-by: Robert Evans <evansr@google.com>
[rlb@defaultvalue.org: remove trailing period from commit summary]
Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Patrick Rouleau [Mon, 23 Apr 2018 01:47:37 +0000 (21:47 -0400)]
fix t/root-status for Cygwin
Signed-off-by: Patrick Rouleau <prouleau72@gmail.com>
Tested-by: Patrick Rouleau <prouleau72@gmail.com>
Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 31 Mar 2018 16:53:24 +0000 (11:53 -0500)]
is_superuser: test for group 544 or 0 on cygwin
This appears to be the appropriate way to check for admin status in
cygwin right now: https://cygwin.com/ml/cygwin/2015-02/msg00057.html
Thanks to at least Andrew Skretvedt, Ruvim Pinka, renpj, and Iar De
for reporting the problem, Ruvim Pinka, Paul Kronenwetter, and renpj
for proposing earlier solutions, and Ben Kelly and Johannes Berg for
helping test this approach.
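Here is a sketch of that check written as a pure function so it can be tested on any platform; the names are hypothetical. The comment about gid 544 reflects the usual Cygwin mapping of the Windows Administrators group and is an assumption here.

```python
import os
import sys


def is_superuser(platform, euid, groups):
    """Decide admin status from already-fetched process facts (sketch)."""
    if platform.startswith('cygwin'):
        # Assumed Cygwin mapping: gid 544 is the Windows Administrators
        # group (S-1-5-32-544); gid 0 is accepted as well.
        return 0 in groups or 544 in groups
    return euid == 0


def current_process_is_superuser():
    return is_superuser(sys.platform, os.geteuid(), os.getgroups())
```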
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 31 Mar 2018 18:24:48 +0000 (13:24 -0500)]
buptest: base testing subproc funcs ex() and exo() on run()
Rework the subprocess functions, providing ex() and exo() as concise
test functions that print the commands they're executing. Base them
on a common, lower-level run() function.
Drop exc() since ex() and exo() both check the exit status by default.
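Here is a sketch of that layering using subprocess. The SubprocResult type, the capture parameter, and the printed "run:" prefix are assumptions, not bup's actual buptest API.

```python
import subprocess
import sys
from collections import namedtuple

# Hypothetical result type for the sketch.
SubprocResult = namedtuple('SubprocResult', ['out', 'err', 'rc'])


def run(cmd, check=True, capture=False):
    """Lower-level helper: print and run cmd, optionally capturing output."""
    print('run:', ' '.join(cmd), file=sys.stderr)
    pipe = subprocess.PIPE if capture else None
    p = subprocess.Popen(cmd, stdout=pipe, stderr=pipe)
    out, err = p.communicate()
    if check and p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, cmd)
    return SubprocResult(out, err, p.returncode)


def ex(cmd, **kwargs):
    """Run cmd with inherited stdio; fail on nonzero exit by default."""
    return run(cmd, capture=False, **kwargs)


def exo(cmd, **kwargs):
    """Run cmd capturing stdout/stderr; fail on nonzero exit by default."""
    return run(cmd, capture=True, **kwargs)
```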
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Tue, 27 Mar 2018 04:09:38 +0000 (23:09 -0500)]
Add "AND CONTRIBUTORS" to options.py license
From https://groups.google.com/d/msg/bup-list/8lcxXuXilTE/JC9rm69oAQAJ
From: Avery Pennarun <apenwarr@gmail.com>
Date: Sun, 25 Mar 2018 21:35:58 -0400
Message-ID: <CAHqTa-2LB2mqscnzZmmixeTbR86BPh=FTk3UyGQKjPwaQPrZ3g@mail.gmail.com>
Subject: Re: bupsplit.c copyright and patching
To: Rob Browning <rlb@defaultvalue.org>
Cc: Robert Evans <evansr@google.com>, bup-list <bup-list@googlegroups.com>
On Sun, Mar 25, 2018 at 1:48 PM, Rob Browning <rlb@defaultvalue.org> wrote:
> Avery Pennarun <apenwarr@gmail.com> writes:
>> On Sun, Mar 18, 2018 at 11:52 AM, Rob Browning <rlb@defaultvalue.org> wrote:
>>> Avery Pennarun <apenwarr@gmail.com> writes:
>>> So for the record, it sounds like you approve of changing the relevant
>>> bupsplit.h and bupsplit.c phrase to this?
>>>
>>> THIS SOFTWARE IS PROVIDED BY AVERY PENNARUN AND CONTRIBUTORS``AS IS'' AND ANY
>>
>> Yes, that's fine.
>
> And perhaps you'd approve of making the same change to options.py as
> well?
Good idea. Let's do that :)
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 25 Mar 2018 17:19:53 +0000 (12:19 -0500)]
Add "AND CONTRIBUTORS" to bupsplit.h and bupsplit.c licenses
From https://groups.google.com/d/msg/bup-list/8lcxXuXilTE/UyOe7VGuCQAJ
From: Avery Pennarun <apenwarr@gmail.com>
Date: Sun, 18 Mar 2018 18:35:15 -0400
Message-ID: <CAHqTa-1ghU6+Y0Y2pBOjbS=7CWKMytPvj-c1Z0aE3=PqpPi1OA@mail.gmail.com>
Subject: Re: bupsplit.c copyright and patching
To: Rob Browning <rlb@defaultvalue.org>
Cc: Robert Evans <evansr@google.com>, bup-list <bup-list@googlegroups.com>
...
> So for the record, it sounds like you approve of changing the relevant
> bupsplit.h and bupsplit.c phrase to this?
>
> THIS SOFTWARE IS PROVIDED BY AVERY PENNARUN AND CONTRIBUTORS``AS IS'' AND ANY
Yes, that's fine.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Tue, 27 Mar 2018 04:29:17 +0000 (23:29 -0500)]
test-sparse-files: check sparse file size more carefully
Test the actual sparse file with du, not the parent restore
directory. Unnecessarily testing the whole tree broke the tests on a
docker system because the directories themselves were large.
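Here is a Python sketch of checking one file rather than a directory tree. Comparing st_blocks (in 512-byte units on Linux and most other systems) with st_size is roughly what du on the file reports versus the file's apparent size. The helper names are hypothetical.

```python
import os


def allocated_bytes(path):
    """Disk space actually allocated to path (st_blocks * 512)."""
    return os.stat(path).st_blocks * 512


def looks_sparse(path):
    """True if path occupies less disk space than its apparent size."""
    return allocated_bytes(path) < os.stat(path).st_size
```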
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 27 Jan 2018 21:19:24 +0000 (15:19 -0600)]
main: don't set stdio to nonblocking
This was left in the newliner replacement patch, and I can't remember
a justification for it. Having it in there affects other processes,
so take it out.
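Here is a small demonstration of why this leaks into other processes. O_NONBLOCK is stored on the open file description, which is shared by every descriptor (and every process) that refers to it, such as stdio inherited from a shell. The helper is hypothetical and needs Python 3.5+ on a POSIX system for os.set_blocking().

```python
import os


def nonblocking_is_shared():
    """Show that O_NONBLOCK lives on the shared open file description.

    A dup()ed descriptor, like stdio shared between a parent shell and
    its children, sees the flag change too.
    """
    r, w = os.pipe()
    r2 = os.dup(r)  # second descriptor for the same open file description
    try:
        os.set_blocking(r, False)
        return os.get_blocking(r2) is False
    finally:
        for fd in (r, w, r2):
            os.close(fd)
```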
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 27 Jan 2018 17:40:02 +0000 (11:40 -0600)]
Use absolute_import from the __future__ everywhere
Without this, among other things, we can end up with conflicts with
new upstream modules. For example, given lib/bup/io.py:
    Traceback (most recent call last):
      File "/home/rlb/src/bup/main-4/cmd/bup-index", line 10, in <module>
        from bup import metadata, options, git, index, drecurse, hlinkdb
      File "/home/rlb/src/bup/main-4/lib/bup/metadata.py", line 10, in <module>
        from io import BytesIO
    ImportError: cannot import name BytesIO
This switch also revealed a circular dependency between midx and git,
and several places where we weren't qualifying our bup imports
properly, i.e. "import git" rather than "from bup import git".
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>