From: Rob Browning
Date: Tue, 10 Sep 2019 06:56:02 +0000 (-0500)
Subject: Prevent Python 3 from interfering with argv bytes
X-Git-Tag: 0.31~242
X-Git-Url: https://arthur.barton.de/gitweb/?p=bup.git;a=commitdiff_plain;h=ff935e1abef2ebe89a809c100edc7931523f3349

Prevent Python 3 from interfering with argv bytes

Python 3 insists on treating all arguments as Unicode, and if the
bytes don't fit, it shoehorns them in anyway[1].  We need the raw,
original bytes in many cases (paths being the obvious example), and
Python claims they can be extracted via fsencode.

But experimentation with 3.7 has demonstrated that while this is
necessary, it is not sufficient to handle all possible binary
arguments in at least a UTF-8 locale.  The interpreter may crash at
startup with some (randomly generated) argument values:

  Fatal Python error: _PyMainInterpreterConfig_Read: memory allocation failed
  ValueError: character U+134bd2 is not in range [U+0000; U+10ffff]

  Current thread 0x00007f2f0e1d8740 (most recent call first):
  Traceback (most recent call last):
    File "t/test-argv", line 28, in <module>
      out = check_output(cmd)
    File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
      **kwargs).stdout
    File "/usr/lib/python3.7/subprocess.py", line 487, in run
      output=stdout, stderr=stderr)

To fix that, always set the encoding to ISO-8859-1 before launching
Python, which should hopefully (given that ISO-8859-1 is a single-byte
"pass through" encoding) prevent Python from interfering with the
arguments.

Add t/test-argv to perform randomized testing for clean argv
pass-through.  At the moment, with Python 3.7.3, if I disable the code
before the python exec in cmd/bup-python, this test eventually
provokes the crash above (though not on every run).

[1] https://www.python.org/dev/peps/pep-0383/

Thanks to Aaron M. Ucko for pointing out LC_ALL had been overlooked in
an earlier version of this patch, and would have undone the
adjustments.

Signed-off-by: Rob Browning
Tested-by: Rob Browning
---

diff --git a/Makefile b/Makefile
index e5e2b67..8fbc251 100644
--- a/Makefile
+++ b/Makefile
@@ -175,6 +175,7 @@ cmdline_tests :=
 
 ifeq "2" "$(bup_python_majver)"
   cmdline_tests += \
+    t/test-argv \
     t/test-ftp \
     t/test-save-restore \
     t/test-packsizelimit \
diff --git a/cmd/python-cmd.sh b/cmd/python-cmd.sh
index cce1b8b..30a8434 100644
--- a/cmd/python-cmd.sh
+++ b/cmd/python-cmd.sh
@@ -13,6 +13,25 @@ done
 script_home="$(cd "$(dirname "$cmdpath")" && pwd -P)"
 cd "$top"
 
+# Force python to use ISO-8859-1 (aka Latin 1), a single-byte
+# encoding, to help avoid any manipulation of data from system APIs
+# (paths, users, groups, command line arguments, etc.)
+
+# Preserve for selective use
+if [ "${LC_CTYPE+x}" ]; then export BUP_LC_CTYPE="$LC_CTYPE"; fi
+if [ "${LC_ALL+x}" ]; then
+    export BUP_LC_ALL="$LC_ALL"
+    export LC_COLLATE="$LC_ALL"
+    export LC_MONETARY="$LC_ALL"
+    export LC_NUMERIC="$LC_ALL"
+    export LC_TIME="$LC_ALL"
+    export LC_MESSAGES="$LC_ALL"
+    unset LC_ALL
+fi
+
+export PYTHONCOERCECLOCALE=0  # Perhaps not necessary, but shouldn't hurt
+export LC_CTYPE=ISO-8859-1
+
 bup_libdir="$script_home/../lib"  # bup_libdir will be adjusted during install
 export PYTHONPATH="$bup_libdir${PYTHONPATH:+:$PYTHONPATH}"
 
diff --git a/lib/bup/compat.py b/lib/bup/compat.py
index f713f52..47d0fa0 100644
--- a/lib/bup/compat.py
+++ b/lib/bup/compat.py
@@ -12,6 +12,7 @@ py3 = py_maj >= 3
 
 if py3:
 
+    from os import fsencode
     from shlex import quote
     range = range
     str_type = str
@@ -27,6 +28,10 @@ if py3:
     def items(x):
         return x.items()
 
+    def argv_bytes(x):
+        """Return the original bytes passed to main() for an argv argument."""
+        return fsencode(x)
+
     def bytes_from_uint(i):
         return bytes((i,))
 
@@ -87,6 +92,10 @@ else: # Python 2
     def items(x):
         return x.iteritems()
 
+    def argv_bytes(x):
+        """Return the original bytes passed to main() for an argv argument."""
+        return x
+
     def bytes_from_uint(i):
         return chr(i)
 
diff --git a/t/echo-argv-bytes b/t/echo-argv-bytes
new file mode 100755
index 0000000..99a145e
--- /dev/null
+++ b/t/echo-argv-bytes
@@ -0,0 +1,21 @@
+#!/bin/sh
+"""": # -*-python-*-
+bup_python="$(dirname "$0")/../cmd/bup-python" || exit $?
+exec "$bup_python" "$0" ${1+"$@"}
+"""
+# end of bup preamble
+
+from __future__ import absolute_import, print_function
+
+from os.path import abspath, dirname
+from sys import stdout
+import os, sys
+
+script_home = abspath(dirname(sys.argv[0] or '.'))
+sys.path[:0] = [abspath(script_home + '/../lib'), abspath(script_home + '/..')]
+
+from bup.compat import argv_bytes
+
+for arg in [argv_bytes(x) for x in sys.argv]:
+    os.write(stdout.fileno(), arg)
+    os.write(stdout.fileno(), b'\0\n')
diff --git a/t/test-argv b/t/test-argv
new file mode 100755
index 0000000..2742364
--- /dev/null
+++ b/t/test-argv
@@ -0,0 +1,29 @@
+#!/bin/sh
+"""": # -*-python-*-
+bup_python="$(dirname "$0")/../cmd/bup-python" || exit $?
+exec "$bup_python" "$0" ${1+"$@"}
+"""
+# end of bup preamble
+
+from __future__ import absolute_import, print_function
+
+from os.path import abspath, dirname
+from random import randint
+from subprocess import check_output
+from sys import stderr, stdout
+import sys
+
+script_home = abspath(dirname(sys.argv[0] or '.'))
+sys.path[:0] = [abspath(script_home + '/../lib'), abspath(script_home + '/..')]
+
+from wvtest import wvcheck, wvfail, wvmsg, wvpass, wvpasseq, wvpassne, wvstart
+
+wvstart('command line arguments are not mangled')
+
+def rand_bytes(n):
+    return bytes([randint(1, 255) for x in range(n)])
+
+for trial in range(100):
+    cmd = [b't/echo-argv-bytes', rand_bytes(randint(1, 32))]
+    out = check_output(cmd)
+    wvpasseq(b'\0\n'.join(cmd) + b'\0\n', out)
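
Note (illustration only, not part of the patch): the "pass through"
property the commit message relies on can be sketched in a few lines of
Python 3.  The helper names latin1_round_trip and
surrogateescape_round_trip are hypothetical and do not exist in bup.
The sketch uses explicit codecs rather than the process locale, so it
shows only the encoding behaviour, not the interpreter startup crash
described above.

```python
def latin1_round_trip(raw):
    """Decode bytes as ISO-8859-1 and encode them back (hypothetical helper)."""
    text = raw.decode('iso-8859-1')
    return text, text.encode('iso-8859-1')


def surrogateescape_round_trip(raw):
    """Round-trip bytes through UTF-8 with the PEP 383 error handler."""
    text = raw.decode('utf-8', 'surrogateescape')
    return text, text.encode('utf-8', 'surrogateescape')


# Every non-NUL byte value, the same range t/test-argv draws from.
raw = bytes(range(1, 256))

text, back = latin1_round_trip(raw)
# Each byte maps to the code point with the same numeric value, so
# nothing is reinterpreted on the way in or out.
assert [ord(c) for c in text] == list(raw)
assert back == raw

# Under UTF-8, undecodable bytes are smuggled through as lone
# surrogates and restored on encode.
escaped, back = surrogateescape_round_trip(raw)
assert back == raw
```

With ISO-8859-1 the decoded string has exactly one character per input
byte, whereas the UTF-8 path depends on the surrogateescape handler
being applied consistently in both directions.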