_hashsplit.c: switch rollsum_roll() to a macro instead of an inline function.

gcc 4.3's optimizer fails to optimize the inline function properly, but
does fine with the macro.

Mysteriously, if find_ofs() is *not* static (and therefore presumably
*harder* to optimize), the optimizer works either way. But removing the
static is just wrong, so use the macro instead.
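
For illustration, the two forms might look roughly like the sketch below.
This is not the actual _hashsplit.c code: the Rollsum layout, the WINDOWSIZE
and CHAR_OFFSET values, the checksum formula, and the helper names
rollsum_roll_inline() and rollsum_forms_agree() are all assumptions made up
for the example. It only shows that the macro can be a drop-in replacement
that produces the same state as the inline function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOWSIZE  64   /* assumed window size, for illustration only */
#define CHAR_OFFSET 31   /* assumed offset, for illustration only */

/* Hypothetical rolling-checksum state. */
typedef struct {
    unsigned s1, s2;
    uint8_t window[WINDOWSIZE];
    int wofs;
} Rollsum;

static void rollsum_init(Rollsum *r)
{
    r->s1 = WINDOWSIZE * CHAR_OFFSET;
    r->s2 = WINDOWSIZE * (WINDOWSIZE - 1) * CHAR_OFFSET;
    r->wofs = 0;
    memset(r->window, 0, WINDOWSIZE);
}

/* Drop the oldest byte from the sums and add the newest one. */
static void rollsum_add(Rollsum *r, uint8_t drop, uint8_t add)
{
    r->s1 += add - drop;
    r->s2 += r->s1 - (WINDOWSIZE * (drop + CHAR_OFFSET));
}

/* Inline-function form (the "before" shape). */
static inline void rollsum_roll_inline(Rollsum *r, uint8_t ch)
{
    rollsum_add(r, r->window[r->wofs], ch);
    r->window[r->wofs] = ch;
    r->wofs = (r->wofs + 1) % WINDOWSIZE;
}

/* Macro form (the "after" shape).  Note that ch is evaluated twice,
 * so callers must not pass an expression with side effects. */
#define rollsum_roll(r, ch) do { \
    rollsum_add((r), (r)->window[(r)->wofs], (ch)); \
    (r)->window[(r)->wofs] = (ch); \
    (r)->wofs = ((r)->wofs + 1) % WINDOWSIZE; \
} while (0)

/* Feed the same bytes through both forms; return 1 if the states match. */
int rollsum_forms_agree(const uint8_t *buf, size_t len)
{
    Rollsum a, b;
    size_t i;

    rollsum_init(&a);
    rollsum_init(&b);
    for (i = 0; i < len; i++) {
        rollsum_roll_inline(&a, buf[i]);
        rollsum_roll(&b, buf[i]);
    }
    return a.s1 == b.s1 && a.s2 == b.s2 && a.wofs == b.wofs
        && memcmp(a.window, b.window, WINDOWSIZE) == 0;
}
```

The do { ... } while (0) wrapper keeps the macro usable as a single
statement, e.g. in an unbraced if/else. Whether the macro is actually faster
depends on the compiler version and flags, so it is worth re-measuring
rather than assuming.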

The difference in speed is about 53 megs/sec with the inline function vs
80 megs/sec with the macro on my machine, for this command:

    bup random 100M 2>/dev/null | bup split -N --bench