Rob Browning [Sat, 11 Sep 2021 21:02:19 +0000 (16:02 -0500)]
configure: add --with-pylint=[yes|no|maybe] defaulting to maybe
When set to no, don't run pylint from ./pylint, just exit
successfully. When set to maybe, use dev/have-pylint to figure out if
pylint is available, and if so, run it, otherwise exit successfully
after describing the situation. When set to yes, always try to run
pylint.
This may be useful more generally, but in particular, it makes it
possible to run this:
./configure --with-pylint=maybe
make check-both
in situations where pylint is available for, say, python 3, but not
python 2.
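The three settings amount to roughly the following decision. This is only a sketch in Python, not the actual ./pylint script; should_run_pylint and the have_pylint callable (standing in for whatever dev/have-pylint reports) are hypothetical names:

```python
def should_run_pylint(setting, have_pylint):
    """Return True if pylint should be run for a --with-pylint value.

    have_pylint is a zero-argument callable standing in for
    dev/have-pylint (a hypothetical interface for this sketch).
    """
    if setting == 'no':
        return False                # don't run, just exit successfully
    if setting == 'yes':
        return True                 # always try to run pylint
    if setting == 'maybe':
        return bool(have_pylint())  # run only if it is available
    raise ValueError('unexpected --with-pylint value: %r' % (setting,))
```

With 'maybe', a python without pylint then simply skips the check instead of failing it.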
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 27 Aug 2021 20:23:37 +0000 (22:23 +0200)]
test: add pylint and test imports
Add pylint and test for unnecessary imports as a first step, which
requires cleaning them all up.
[rlb@defaultvalue.org: add and use ./pylint]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 7 Aug 2021 17:41:55 +0000 (12:41 -0500)]
Use intprops for INTEGRAL_ASSIGNMENT_FITS INTEGER_TO_PY uadd
Rewrite INTEGRAL_ASSIGNMENT_FITS, INTEGER_TO_PY, and uadd using
intprops, which also avoids needing all the custom (and historically
fragile) compiler option manipulations.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 1 Aug 2021 20:47:07 +0000 (15:47 -0500)]
config.vars.in: add CC so a configure CC will be the default
Add CC to config.vars.in (included by GNUmakefile) so that the CC
detected by ./configure will actually become the default. Previously
it would still reflect GNU make's default, even after something like
this:
CC=/some/where/gcc ./configure
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 18 Dec 2019 21:03:58 +0000 (22:03 +0100)]
git: teach git_config_get() to read from a file
We want to use git_config_get() to have a bup config file
in the future, so teach git_config_get() to read from a
file and add some tests for it.
Use this also to test the opttype conversions from the
previous patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Thu, 9 Jan 2020 12:27:35 +0000 (13:27 +0100)]
git: use git's int parsing with pack.packSizeLimit config
Our parse_num() understands a little more than git, in particular
* T for terabytes
* b suffix when you specify Kb, Mb, Gb or Tb.
Neither of those is understood by git; it only understands the
K, M and G suffixes (case-insensitive). However, a git repository
that actually states 'pack.packSizeLimit' as something that git
doesn't understand is broken for every single git command, and as
we call git, the added flexibility of parse_num() cannot be used.
Thus, simplify the code and just use opttype='int' for this.
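For illustration, the narrower syntax described above (K, M and G only, case-insensitive) could be sketched as follows. The change itself just asks git for the value with opttype='int'; parse_git_style_int is a hypothetical name used only here, and the powers of 1024 are what git documents for these suffixes:

```python
_GIT_INT_SUFFIXES = {'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}

def parse_git_style_int(text):
    """Parse an integer with an optional K, M or G suffix.

    Anything else (a T suffix, or a trailing b as in Kb) is rejected
    with ValueError.
    """
    s = text.strip()
    if not s:
        raise ValueError('empty value')
    factor = 1
    last = s[-1].lower()
    if last in _GIT_INT_SUFFIXES:
        factor = _GIT_INT_SUFFIXES[last]
        s = s[:-1]
    return int(s) * factor  # int() raises ValueError on leftovers
```

A value such as '1T' or '1Gb' fails here, which mirrors why the extra flexibility of parse_num() could never be exercised.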
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 8 Jan 2020 21:48:25 +0000 (22:48 +0100)]
git: allow config retrieval as bool or int
Allow retrieving configuration values as bool or int,
letting 'git config' normalize all the values instead
of trying to do it ourselves in bup.
Also use git config --null to avoid having to worry about
any whitespace issues, although .strip() on the output
would probably work just as well. Previously, the result
would include a terminating newline written out by the
git config command, which is not desirable.
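A rough sketch of the approach, split into building the command line and interpreting its output so that neither part needs a repository. The helper names and the cfg_file parameter are assumptions for this example, not the signature of git_config_get(); the git options used (--null, --bool, --int, --file, --get) are standard 'git config' options:

```python
def git_config_argv(option, opttype=None, cfg_file=None):
    """Build a 'git config' command line (hypothetical helper)."""
    cmd = ['git', 'config', '--null']
    if cfg_file is not None:
        cmd += ['--file', cfg_file]
    if opttype == 'bool':
        cmd.append('--bool')   # git normalizes to true/false
    elif opttype == 'int':
        cmd.append('--int')    # git expands k/m/g suffixes
    elif opttype is not None:
        raise ValueError('unknown opttype: %r' % (opttype,))
    cmd += ['--get', option]
    return cmd

def parse_git_config_output(out, opttype=None):
    """Convert NUL-terminated 'git config --null' output to a value."""
    if not out:
        return None            # option not set
    value = out.split(b'\0', 1)[0]
    if opttype == 'bool':
        return value == b'true'
    if opttype == 'int':
        return int(value)
    return value               # bytes, whitespace preserved
```

Because the value ends at the NUL byte, no newline has to be stripped and trailing whitespace in a value survives intact.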
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Apr 2020 21:11:16 +0000 (23:11 +0200)]
hashsplit: increase READ_SIZE to 8 MB
It's not really possible to run bup with tiny amounts of memory,
so reading 1 MB or 8 MB doesn't make a significant difference
here.
However, python actually implements read() as mmap() (at least
on my Linux system), with the requested read size given to mmap
as the size. The kernel then doesn't appear to do any readahead
(which makes sense), which kills performance.
Even if this wasn't the case though, read() of 8MB isn't much
of an issue, so increase the size.
Note that 8 MB is also the size for the fadvise() code.
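As a minimal illustration of reading with the larger block size (read_blocks is a hypothetical name, not the hashsplit code):

```python
READ_SIZE = 8 * 1024 * 1024  # 8 MB, as described above

def read_blocks(f, size=READ_SIZE):
    """Yield successive blocks of at most `size` bytes until EOF."""
    while True:
        block = f.read(size)
        if not block:
            break
        yield block
```

Each read() then asks for up to 8 MB at once rather than 1 MB.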
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Jan 2020 20:24:09 +0000 (21:24 +0100)]
save: refactor the code using helper classes
Instead of keeping three lists, refactor the code to use
a helper class for a folder being collected, and an item
in that folder being collected.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Mon, 1 Feb 2021 23:01:47 +0000 (00:01 +0100)]
bup: make demux errors clearer
If we hit the
assert(n <= MAX_PACKET)
in the demux code then quite likely we've been reading
something that shouldn't even be demuxed, but got some
error message from ssh instead of a muxed connection
record.
Clarify this case a bit by printing out the data that
we got (under the assumption it's an error message) and
raising Exception("Connection broken") instead of just
asserting.
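In outline, the check becomes something like the sketch below. The MAX_PACKET value, the function name and the wording of the printed message are assumptions for this example; only the Exception("Connection broken") comes from the description above:

```python
import sys

MAX_PACKET = 128 * 1024  # assumed value for this sketch

def check_packet_size(n, data, err=sys.stderr):
    """Reject an implausible packet length, showing what was received."""
    if n > MAX_PACKET:
        # Quite likely an error message (e.g. from ssh), not mux data.
        err.write('unexpected data: %r\n' % (data,))
        raise Exception('Connection broken')
    return n
```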
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 5 Feb 2020 21:24:01 +0000 (22:24 +0100)]
ls: read metadata only if needed
If we just want to get a list of files, there's no point in
reading metadata, since it simply isn't needed. Pass want_meta
to the appropriate functions only when we need the metadata,
and similarly call vfs.augment_item_meta() only then.
Note that the previous vfs change is what really makes this effective;
we'd otherwise lose the information.
Note also that unfortunately this is necessary even for the
--file-type command line option because otherwise we cannot
identify FIFOs; for everything else the git mode appears to
be sufficient.
Together, this reduces the number of blob reads from 62 to 8
for 'bup ls' on a trivial folder (the bup git's root folder).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sun, 15 Mar 2020 21:38:15 +0000 (22:38 +0100)]
vfs: improve cache behaviour
If _commit_item_from_oid() is called first with require_meta=False,
and then again with require_meta=True, the second and further calls
will not use the cache, as the cached entry is without metadata.
Improve this by overwriting the cache entry if it couldn't be used.
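The idea can be sketched with a small cache that remembers whether each entry carries metadata. MetaAwareCache and its methods are hypothetical and far simpler than the vfs cache:

```python
class MetaAwareCache:
    """Cache items together with a flag saying whether they have metadata."""

    def __init__(self):
        self._entries = {}  # oid -> (item, has_meta)

    def get(self, oid, require_meta):
        entry = self._entries.get(oid)
        if entry is None:
            return None
        item, has_meta = entry
        if require_meta and not has_meta:
            return None  # cached without metadata, so unusable here
        return item

    def put(self, oid, item, has_meta):
        old = self._entries.get(oid)
        if old is not None and old[1] and not has_meta:
            return  # keep the richer entry that already has metadata
        # Otherwise overwrite, so a metadata-less entry gets upgraded.
        self._entries[oid] = (item, has_meta)
```

After the first require_meta=True miss, the caller stores the freshly loaded item and later lookups hit the cache again.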
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 5 Feb 2020 21:21:18 +0000 (22:21 +0100)]
vfs: read metadata only if needed
If we get into contents(), we have an indication of whether or
not metadata is needed (the want_meta argument), but it doesn't
get passed down to revlist_items() and further down, so that in
cache_commit() we eventually call _revlist_item_from_oid() with
metadata always. This is wasteful.
Fix this by passing the information all the way down.
To make the caching work properly in this case, store a special
entry in the revlist dict (_HAS_META_ENTRY) indicating whether
or not metadata was cached, and if that's not set on a later
lookup, then don't return it if it doesn't have metadata but we
need the metadata.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Thu, 31 Dec 2020 22:10:05 +0000 (23:10 +0100)]
Test symlink target changes between stat and readlink
Test the changes in "metadata: fix symlink stat() vs. readlink() race".
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: adjust commit message; rework to use injection] Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Thu, 31 Dec 2020 22:29:58 +0000 (23:29 +0100)]
metadata: fix symlink stat() vs. readlink() race
We might stat() the file and get some size, but by the time
we readlink() the link has changed and the size is something
else.
Arguably, we can't really avoid races here if we don't have
a consistent snapshot of the filesystem to save. However, in
this case we later get an assertion failure when the data is
read back from the index (or repo), and it's easy to avoid.
Set the size from the actually recorded symlink target.
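A minimal sketch of the principle, assuming a POSIX system; symlink_target_and_size is a hypothetical helper, not the metadata code itself:

```python
import os

def symlink_target_and_size(path):
    """Return (target, size) with size derived from the target we read.

    The link may be replaced between lstat() and readlink(), so the
    size comes from the recorded target rather than from st_size.
    """
    target = os.readlink(os.fsencode(path))  # bytes in, bytes out
    return target, len(target)
```

Whatever the link pointed to at stat() time, the recorded size and the recorded target can no longer disagree.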
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Thu, 31 Dec 2020 22:10:05 +0000 (23:10 +0100)]
Test file size changes during save
Test the changes in "save: fix race in data vs. metadata size".
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: adjust commit message; rework to use injection] Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Thu, 31 Dec 2020 22:20:53 +0000 (23:20 +0100)]
save: fix race in data vs. metadata size
Fix the race in data vs. metadata size by overriding the metadata
size with what we actually read and stored into the repository.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 30 Dec 2020 22:10:19 +0000 (23:10 +0100)]
midx: make passing idx files along with --dir work
If you do something like
bup midx --dir /some/dir/ ...
with some idx files ("..."), then things don't actually work
properly because --dir gets ignored in this case. Fix that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Python 2 doesn't support them, and regardless of what the current
python docs suggest, at the moment they're just trivial wrappers
around malloc and free.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sun, 17 May 2020 20:03:24 +0000 (22:03 +0200)]
web: omit '.' link
There's no value in offering a link to go to the same place
you're already at; it's just clutter. Should it really be
necessary, there's always a browser refresh instead.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Thu, 31 Dec 2020 22:33:03 +0000 (23:33 +0100)]
save: fix symlink target race
If the symlink target changes while save is running, we can end up
reading one target and recording it in the metadata, but then
recording a different one (with perhaps a different size) in the
blob.
Fix this by not reading the symlink again, but just using the one
we already have in the metadata.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 1 Jan 2021 10:04:59 +0000 (11:04 +0100)]
test: test index vs. save file type change
In this case, the item shouldn't be recorded at all.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 2 Jan 2021 19:58:44 +0000 (20:58 +0100)]
save: skip/balk if an entry changed type since indexing
If an entry changed file type (link, regular file, ...) since
indexing, then all kinds of weird things can happen. Skip the
item in such cases and record an error.
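The comparison itself can be sketched with the stat module, assuming the index and a fresh stat both provide a st_mode-style value; type_changed and entries_to_save are hypothetical names:

```python
import stat

def type_changed(indexed_mode, current_mode):
    """True if the file type bits differ between index and filesystem."""
    return stat.S_IFMT(indexed_mode) != stat.S_IFMT(current_mode)

def entries_to_save(entries, errors):
    """Yield names whose type is unchanged; record an error otherwise.

    entries is an iterable of (name, indexed_mode, current_mode).
    """
    for name, indexed_mode, current_mode in entries:
        if type_changed(indexed_mode, current_mode):
            errors.append('%s: file type changed since indexing' % name)
            continue  # skip the item entirely
        yield name
```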
This also requires adjusting the test that actually provokes
a failure in metadata read - now we don't store the file at
all, so modify the test to account for that behaviour in the
check.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Wed, 22 Jul 2020 19:35:32 +0000 (21:35 +0200)]
bup: add a test for index contents
Add a test that checks that our index files are exactly
identical to the ones git would write. This is slightly
dangerous as git might introduce a new format, but that
seems unlikely now as it's been stable for a long time;
if it happens we can deal with it then.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Johannes Berg [Wed, 22 Jul 2020 06:59:32 +0000 (08:59 +0200)]
PackWriter: match git's pack names
As reported by Jamie Wyrick, git appears to use the sha1 of
the entire pack file as the (default) name for it, not the
sha1(sorted-object-list) as the git-index-pack man page seems
to imply, since it says:
Once the index has been created, the list of object names is
sorted and the SHA-1 hash of that list is printed to stdout.
If --stdin was also used then this is prefixed by either
"pack\t", or "keep\t" if a new .keep file was successfully
created. This is useful to remove a .keep file used as a lock
to prevent the race with git repack mentioned above.
while also saying:
If <pack-file> is not specified, the pack is written to
objects/pack/ directory of the current Git repository with a
default name determined from the pack content.
Originally, git-index-pack was used by bup, and when that was
changed, the naming convention was kept, presumably according
to this documentation; see commit 4cab9ab71fff ("Write idxs
directly rather than using git-index-pack.")
Change things to use the pack's entire content sha1, that we
calculate anyway, as the name.
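Roughly, the name then falls out of a running checksum over the pack data as it is written. This is a sketch only: PackNamer is a hypothetical class, and exactly which bytes the checksum covers is left to the caller here:

```python
import hashlib

class PackNamer:
    """Accumulate a SHA-1 over pack data and derive the pack name."""

    def __init__(self):
        self._sum = hashlib.sha1()

    def update(self, chunk):
        self._sum.update(chunk)  # call this for every chunk written

    def name(self):
        return 'pack-%s' % self._sum.hexdigest()
```

Since the checksum is being computed during the write anyway, naming the pack this way needs no extra pass over the data.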
This requires updating the 'gc' test to not compare by name
but by (index) content.
Also add a test that our behaviour matches git's.
Reported-by: Jamie Wyrick <terrifiedquack80@gmail.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Johannes Berg [Fri, 31 Jan 2020 22:29:09 +0000 (23:29 +0100)]
metadata: implement write using encode, not vice versa
Using write() for encode() is slower due to the use of BytesIO,
and we never really stream the data out anyway since it is part
of an object that we build, so it's all in memory.
Thus, implement write() using encode() rather than the other
way around.
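The direction of the dependency can be shown with a toy class (Record is hypothetical, not the Metadata class):

```python
class Record:
    """Toy record whose write() is built on encode(), not vice versa."""

    def __init__(self, fields):
        self.fields = list(fields)  # each field is a bytes object

    def encode(self):
        # Build the result directly in memory; no BytesIO involved.
        return b''.join(self.fields)

    def write(self, port):
        port.write(self.encode())
```

Callers that only need the bytes use encode() and skip the stream entirely.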
Together with the previous patches, this speeds up encoding the
metadata by about 54%, and indexing by about 15-20% (my system goes
from ~16.3k paths/sec to ~19.7k paths/sec when all the filesystem
data is already buffered in memory).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Sat, 1 Feb 2020 21:47:09 +0000 (22:47 +0100)]
vint: implement the typical pack() in C
This estimates the size and, if we run over it, falls back to the
slower python code; it does the same if anything goes wrong with the
vuints/vints contained in there. It still speeds up the typical case
significantly, and improves raw index speed (on cached filesystem
data) by about 14% (~19.7k paths/sec to ~22.5k paths/sec).
Since it doesn't handle corner cases (very long strings or very large
numbers) add a few tests that cover the fallback to python.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: switch to PyMem_Raw*; rename to
limited_vint_pack; use Py_ssize_t with PyTuple_GET_SIZE] Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 31 Jan 2020 22:27:14 +0000 (23:27 +0100)]
vint: implement encoding in C
The vuint/vint encoding is quite slow, possibly due to the use
of BytesIO and byte-wise writing, for what's really almost always
a fixed-size buffer since we typically don't deal with values
that don't fit into a 64-bit signed integer.
Implement this in C to make it faster, at least for those that
fit into signed 64-bit integers (long long, because python makes
overflow checking for unsigned 64-bit integers hard), and if the
C version throws an OverflowError, fall back to the python code.
Also add a test for something that doesn't actually fit into a
64-bit integer.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: adjusted bup_vuint_encode and bup_vint_encode
to just directly return the PyBytes_FromStringAndSize result] Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Fri, 7 May 2021 20:06:34 +0000 (22:06 +0200)]
GNUmakefile: use correct dev/python path when installing bup
$(bup_python) no longer exists, we should use dev/python.
But since we change working directory, that needs a fully
qualified path.
Reported-by: Quaddle Me <quaddle.me@gmail.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
[rlb@defaultvalue.org: use $(CURDIR) instead of $(PWD); add . target dir] Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Jul 2020 20:42:46 +0000 (22:42 +0200)]
ls: make multiple arguments match real ls
Currently, passing multiple arguments to ls causes it to print
them all in a single list, which can be very confusing as it'll
even columnate them together.
Make this match real ls behaviour (at least as observed on my
system) that prints which path it's giving the output for.
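The intended output shape can be sketched as below; format_listings is a hypothetical helper, and the exact layout (a "path:" header per argument, blank line between groups) follows what common ls implementations print rather than anything stated above:

```python
def format_listings(listings):
    """Format (path, names) pairs, labelling each group when there are several."""
    if len(listings) == 1:
        return '\n'.join(listings[0][1])  # single argument: no header
    blocks = []
    for path, names in listings:
        blocks.append('\n'.join(['%s:' % path] + list(names)))
    return '\n\n'.join(blocks)
```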
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Jul 2020 20:33:43 +0000 (22:33 +0200)]
ftp: print pwd as part of the prompt
Print the current directory as part of the prompt.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Tue, 28 Jul 2020 20:38:01 +0000 (22:38 +0200)]
ftp: honour pwd for ls
Honour the current working directory for 'ls' by changing
ls.within_repo() to get the pwd, and using posixpath to
build the correct path to do the ls for.
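The path handling amounts to something like this sketch (resolve_in_pwd is a hypothetical name; the real change lives in ls.within_repo() and the ftp command):

```python
import posixpath

def resolve_in_pwd(pwd, arg):
    """Resolve an ls argument relative to the ftp working directory."""
    # join() keeps absolute arguments as they are; normpath() folds '..'.
    return posixpath.normpath(posixpath.join(pwd, arg))
```

Using posixpath rather than os.path keeps repository paths '/'-separated regardless of the host platform.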
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Johannes Berg [Mon, 28 Dec 2020 21:56:02 +0000 (22:56 +0100)]
git: remove get_commit_dates()
This function is unused.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: likely orphaned by vfs overhaul]
Johannes Berg [Wed, 30 Dec 2020 22:00:31 +0000 (23:00 +0100)]
git: remove all_packdirs()
This really never made sense - if you have a local repository
and want to run midx/bloom on it, then it doesn't make sense
to touch the index caches for other remote repositories.
And if you want to operate manually on the cache for some odd
reason (it's maintained automatically) then you need to (and
should) pass the --dir option for it explicitly anyway.
Remove this to make things easier to reason about.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
[rlb@defaultvalue.org: previous behavior also didn't respect the
documentation in bup-bloom(1).]
Johannes Berg [Sun, 24 Jan 2021 20:22:32 +0000 (21:22 +0100)]
configure: fix readline.h detection
Unfortunately, readline.h requires stdio.h to be included first,
except where it has been patched (e.g. Debian). Do that in the
configure script so we correctly detect readline.h on systems
that have an unpatched readline version.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 3 Apr 2021 20:17:01 +0000 (15:17 -0500)]
Convert bup to binary executable and run python subcommands directly
Don't execute python subcommands in a child process, run them
directly. Convert bup (and dev/python, etc.) to be C executables.
See the commit messages, comments, and doc changes (e.g. DESIGN) for
more information.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 27 Mar 2021 22:05:39 +0000 (17:05 -0500)]
Redirect to GNU make when possible
Rename Makefile to GNUmakefile and add a Makefile that, for at least
FreeBSD make, will redirect the build to GNU make. Also, if make is
not GNU make, don't fail without first checking for gmake.
Thanks to Greg Troxel for a reminder about the issue.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 27 Mar 2021 18:38:56 +0000 (13:38 -0500)]
Move msg() to bup/io.[hc] in preparation for more sharing
We're going to need to share some python compatibility code between
dev/python and bup, etc., so create src/bup for the shared code, and
move msg() there.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 27 Mar 2021 17:56:21 +0000 (12:56 -0500)]
Rework compilation of binaries; prepare for automatic dependencies
Don't involve user-adjustable variables like CFLAGS until the final
compilation steps, in part so it's clearer where they fit in. Handle
all of bup's flags independently, and split them into separate
library (helpers) and executable (bin/bup, etc.) groups via
bup_shared_cflags, helpers_cflags, embed_cflags, etc.
Change the build rules to specify -fPIE for the executables, and
include all dependencies ($^) and use OUTPUT_OPTION in preparation for
automatic C dependency generation.
Add -Wno-unused-command-line-argument for clang.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sun, 21 Mar 2021 17:29:11 +0000 (12:29 -0500)]
Makefile: don't depend on dev/python to clean
"make clean" should always work, even if we can't build dev/python.
Thanks to Greg Troxel for reporting the problem, and Johannes Berg for
suggesting additional improvements.
Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Rob Browning [Sat, 13 Feb 2021 19:14:19 +0000 (13:14 -0600)]
Stash the env PYTHONPATH during startup and restore it in main
We have to set the PYTHONPATH from bup.c so that Py_Main will be able
to find bup.*, but we don't want that change to affect subprocesses,
so stash the original PYTHONPATH in bup_main.env_pythonpath, and
restore the original environment value once we've made it to the
python side (in bup.main).
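On the python side, the restore step is roughly the following; restore_pythonpath and its arguments are hypothetical, with original standing in for the value stashed in bup_main.env_pythonpath (None when PYTHONPATH was unset):

```python
def restore_pythonpath(environ, original):
    """Put PYTHONPATH back to what it was before startup changed it."""
    if original is None:
        environ.pop('PYTHONPATH', None)  # it was unset originally
    else:
        environ['PYTHONPATH'] = original
```

Called with os.environ early in main, this keeps the temporary PYTHONPATH from leaking into subprocesses.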
Thanks to Johannes Berg for pointing out some issues with a previous
version of the changes.
Signed-off-by: Rob Browning <rlb@defaultvalue.org> Tested-by: Rob Browning <rlb@defaultvalue.org>