On amd64, sys.getsizeof(b'') is a bit under 40 bytes on both Python 2
and Python 3, while a buffer(b'') adds 64 bytes in Python 2, and the
memoryview(b'') that replaces it in Python 3 adds 200. So just copy
the bytes unless the added overhead is worth it.
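
As a rough illustration of that trade-off (a sketch only, not the code
in this change): object_sizes(), chunk(), and COPY_THRESHOLD are
hypothetical names, and the 256-byte cutoff is an assumed value picked
to sit a little above the per-view overhead quoted above. Exact
sys.getsizeof() results vary by interpreter version and platform.

```python
import sys

# Hypothetical cutoff, not taken from the commit: spans no larger than
# this are copied, since a view's fixed overhead would dominate.
COPY_THRESHOLD = 256


def object_sizes(data=b''):
    """Return (bytes_size, view_size) as reported by sys.getsizeof()."""
    return sys.getsizeof(data), sys.getsizeof(memoryview(data))


def chunk(data, start, end):
    """Return data[start:end], copying small spans and viewing large ones."""
    if end - start <= COPY_THRESHOLD:
        return data[start:end]  # plain slice: a new bytes object
    return memoryview(data)[start:end]  # no copy, but per-view overhead
```

Calling object_sizes() on the interpreter in question shows what the
view actually costs there before settling on a cutoff.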

And while we're here, fix a few more Python 3 str/bytes compatibility
issues, and remove some redundant offset arithmetic by leaning on
range() a bit more. (Not that it likely matters, but aside from being
simpler, this is apparently more efficient too, because it moves more
of the work to C.)
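
The range() point can be sketched like this (illustrative only; both
function names are hypothetical and neither is the code touched by
this change). The first version tracks the offset by hand, while the
second lets range() generate the offsets:

```python
def split_manual(data, size):
    """Split data into size-byte pieces, advancing the offset by hand."""
    pieces = []
    ofs = 0
    while ofs < len(data):
        pieces.append(data[ofs:ofs + size])
        ofs += size
    return pieces


def split_range(data, size):
    """Same result, with range() producing each starting offset."""
    return [data[ofs:ofs + size] for ofs in range(0, len(data), size)]
```

Both return the same pieces; the second simply has less bookkeeping
in the Python-level loop.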

Signed-off-by: Rob Browning <rlb@defaultvalue.org>
Tested-by: Rob Browning <rlb@defaultvalue.org>