This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
I think it's nice to mark the test as skipped instead of omitting
it entirely, hence #ifdefs in the code instead of excluding the test
in Makefile.am/meson.build.
The patch is not minimal, but a function to parse size_t is probably
going to come in handy in other places, so I think it's nicer to define
a proper parsing function than to open-code the cast.
compress_blob took src, src_size, dst and *dst_size, but dst_size
wasn't used as an input parameter with the size of dst, but only as an
output parameter. dst was implicitly assumed to be at least src_size-1.
This code wasn't *wrong*, because the only real caller in
journal-file.c got it right. But it was misleading, and the tests in
test-compress.c got it wrong, and worked only because the output
buffer happened to be the same size as input buffer. So add a seperate
dst_allocated_size parameter to make it explicit what the size of the
buffer is, and to allow test to proceed with different output buffer
sizes.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
If both lz4 and xz are enabled, this results in a limit of
2×3×2 s ~= 12 s runtime.
Previous implementation started with really small buffer sizes. When
combined with a short time limit this resulteded in abysmal results for xz.
It seems that the initialization overead is really significant for small
buffers. Since xz will not be used by default anymore, this does not
seem worth fixing. Instead buffer sizes are changed to run a
pseudo-random non-repeating pattern. This should allow reasonable testing
for all buffer sizes. For testing, both runtime and the buffer size seed
can be specified on the command line. Sufficiently large runtime allows
all buffer sizes up to 1MB to be tested.
Existing test would use highly-compressible repeatable
input. Two types of input are added:
- zeros
- random blocks interspersed with zeros
The idea is to get more information about behaviour in various cases.
On Intel Xeon the results are:
% ./test-compress-benchmark
XZ/zeros: compressed & decompressed 2535301373 bytes in 32.56s (74.26MiB/s), mean compresion 99.96%, skipped 3160 bytes
LZ4/zeros: compressed & decompressed 2535304362 bytes in 1.16s (2088.69MiB/s), mean compresion 99.60%, skipped 171 bytes
XZ/simple: compressed & decompressed 2535300963 bytes in 30.42s (79.48MiB/s), mean compresion 99.95%, skipped 3570 bytes
LZ4/simple: compressed & decompressed 2535303543 bytes in 1.22s (1978.86MiB/s), mean compresion 99.60%, skipped 990 bytes
XZ/random: compressed & decompressed 381756649 bytes in 60.02s (6.07MiB/s), mean compresion 39.64%, skipped 27813723 bytes
LZ4/random: compressed & decompressed 2507385036 bytes in 0.97s (2477.52MiB/s), mean compresion 54.77%, skipped 27919497 bytes
If someone has ideas for more realistic test cases, they can be easily
added to this framework.
This is useful to test the behaviour of the compressor for various buffer
sizes.
Time is limited to a minute per compression, since otherwise, when LZ4
takes more than a second which is necessary to reduce the noise, XZ
takes more than 10 minutes.
% build/test-compress-benchmark (without time limit)
XZ: compressed & decompressed 2535300963 bytes in 794.57s (3.04MiB/s), mean compresion 99.95%, skipped 3570 bytes
LZ4: compressed & decompressed 2535303543 bytes in 1.56s (1550.07MiB/s), mean compresion 99.60%, skipped 990 bytes
% build/test-compress-benchmark (with time limit)
XZ: compressed & decompressed 174321481 bytes in 60.02s (2.77MiB/s), mean compresion 99.76%, skipped 3570 bytes
LZ4: compressed & decompressed 2535303543 bytes in 1.63s (1480.83MiB/s), mean compresion 99.60%, skipped 990 bytes
It appears that there's a bug in lzma_end where it leaks 32 bytes.