Commit Graph

59 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek e4a321fc08 journal/compress: remove loop in decompress_startswith_zstd()
This should be more efficient with no downsides. Same considerations as in the
previous commit hold.
2020-07-21 17:42:15 +02:00
Zbigniew Jędrzejewski-Szmek a24153279e journal/compress: fix zstd decompression with capped output size
decompress_blob_zstd() would allocate ever bigger buffers in a loop trying to
get a buffer big enough to decompress the input data. This is wasteful, since
we can just query the size of the decompressed data from the compressed header.
Worse, it doesn't work when the output size is capped, i.e. when dst_max != 0.
If the decompressed blob happened to be bigger than dst_max, decompression
would fail with -ENOBUFS. We need to use "stream decompression" instead, and
only get min(uncompressed size, dst_max) bytes of output.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856037 in a second way.
2020-07-21 17:42:15 +02:00
Zbigniew Jędrzejewski-Szmek b4a11ca3f2 journal: use -EPROTONOSUPPORT for unknown compression
We might add more compression types in the future, and we should treat that
as unsupported, and not a format error.
2020-07-21 17:42:15 +02:00
Zbigniew Jędrzejewski-Szmek 8ab0f03266 journal/compress: drop "future" code in zstd compression
We generally don't include stuff that is not used. This can be
easily ressurected if ever needed.

Fixes CID#1430210.
2020-07-07 12:06:26 +02:00
Lennart Poettering 8653185a9e journal: support zstd compression for large objects in journal files 2020-06-25 15:02:18 +02:00
Lennart Poettering e9ece6a0e3 journal: fix definition of _OBJECT_COMPRESSED_MAX
The object flags field is a bitmask, hence don't sloppily define
_OBJECT_COMPRESSED_MAX as one mor than the previous flag. That worked OK
as long as we only had two flags, but will fall apart as soon as we have
three. Let's fix this.

(It's kinda sloppy how the string table is built here, as it will be
quite sparse as soon as we have more enum entries, but let's keep it for
now.)
2020-06-25 15:00:37 +02:00
Norbert Lange ef5924aa31 coredump: add zstandard support for coredumps
this will hook libzstd into coredump,
using this format as default.
2020-05-04 10:59:43 +02:00
Yu Watanabe 455fa9610c tree-wide: drop string.h when string-util.h or friends are included 2019-11-04 00:30:32 +09:00
Lennart Poettering 4094c4bfb7 journal: properly read unaligned le64 integers
Fixes: #13051

Replaces: #13064
2019-07-16 15:22:26 +02:00
Zbigniew Jędrzejewski-Szmek ca78ad1de9 headers: remove unneeded includes from util.h
This means we need to include many more headers in various files that simply
included util.h before, but it seems cleaner to do it this way.
2019-03-27 11:53:12 +01:00
Zbigniew Jędrzejewski-Szmek e41ef6fd00 journal: adapt for new improved LZ4_decompress_safe_partial()
With lz4 1.8.3, this function can now decompress partial results into a smaller
buffer. The release news don't say anything interesting, but the test case that
was previously failing now works OK.

Fixes #10259.

A test is added. It shows that with *older* lz4, a partial decompression can
occur with the returned size smaller then the requested number of bytes _and_
smaller then the size of the compressed data:

(lz4-libs-1.8.2-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 4194304
Decompressed partial 1/1 → -2 (bad)
Decompressed partial 2/2 → -2 (bad)
Decompressed partial 3/3 → -2 (bad)
Decompressed partial 4/4 → -2 (bad)
Decompressed partial 5/5 → -2 (bad)
Decompressed partial 6/6 → 6 (good)
Decompressed partial 7/7 → 6 (good)
Decompressed partial 8/8 → 6 (good)
Decompressed partial 9/9 → 6 (good)
Decompressed partial 10/10 → 6 (good)
Decompressed partial 11/11 → 6 (good)
Decompressed partial 12/12 → 6 (good)
Decompressed partial 13/13 → 6 (good)
Decompressed partial 14/14 → 6 (good)
Decompressed partial 15/15 → 6 (good)
Decompressed partial 16/16 → 6 (good)
Decompressed partial 17/17 → 6 (good)
Decompressed partial 18/18 → -16459 (bad)

(lz4-libs-1.8.3-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 12
Decompressed partial 1/1 → 1 (good)
Decompressed partial 2/2 → 2 (good)
Decompressed partial 3/3 → 3 (good)
Decompressed partial 4/4 → 4 (good)
...

If we got such a short "successful" decompression in decompress_startswith() as
implemented before this patch, we could be confused and return a false negative
result. But it turns out that this only occurs with small output buffer
sizes. We use greedy_realloc() to manager the buffer, so it is always at least
64 bytes. I couldn't hit a case where decompress_startswith() would actually
return a bogus result. But since the lack of proof is not conclusive, the code
for *older* lz4 is changed too, just to be safe. We cannot rule out that on a
different architecture or with some unlucky compressed string we could hit this
corner case.

The fallback code is guarded by a version check. The check uses a function not
the compile-time define, because there was no soversion bump in lz4 or new
symbols, and we could be compiled against a newer lz4 and linked at runtime
with an older one. (This happens routinely e.g. when somebody upgrades a subset
of distro packages.)
2018-10-30 11:04:51 +01:00
Zbigniew Jędrzejewski-Szmek e0a1d4b049 Drop support for lz4 < 1.3.0
lz4-r130 was released on May 29th, 2015. Let's drop the work-around for older
versions. In particular, we won't test any new code against those ancient
releases, so we shouldn't pretend they are supported.
2018-10-29 21:54:42 +01:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering 5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Zbigniew Jędrzejewski-Szmek 2504834861 journal: avoid undefined behaviour in float division by 0.0
Coverity says that's undefined. I'm pretty sure we always would get a nan, but
let's avoid (formally) undefined behaviour since that can cause compilers to do
strange things.
2017-11-28 21:34:50 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering 4aa1d31c89 Merge pull request #6974 from keszybz/clean-up-defines
Clean up define definitions
2017-10-04 19:25:30 +02:00
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00
Zbigniew Jędrzejewski-Szmek 349cc4a507 build-sys: use #if Y instead of #ifdef Y everywhere
The advantage is that is the name is mispellt, cpp will warn us.

$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build

squash! build-sys: use #if Y instead of #ifdef Y everywhere

v2:
- fix incorrect setting of HAVE_LIBIDN2
2017-10-04 12:09:29 +02:00
Zbigniew Jędrzejewski-Szmek 691b90d465 journal: fix warning about LZ4_compress_limitedOutput 2016-12-10 13:52:49 -05:00
Zbigniew Jędrzejewski-Szmek 8e170d2909 compress: fix gcc warnings about void* used in arithmetic
src/journal/compress.c: In function ‘compress_blob_lz4’:
src/journal/compress.c:115:49: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
         r = LZ4_compress_limitedOutput(src, dst + 8, src_size, (int) dst_alloc_size - 8);
                                                 ^
src/journal/compress.c: In function ‘decompress_blob_xz’:
src/journal/compress.c:179:35: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
                 s.next_out = *dst + used;
                                   ^
src/journal/compress.c: In function ‘decompress_blob_lz4’:
src/journal/compress.c:218:37: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
         r = LZ4_decompress_safe(src + 8, out, src_size - 8, size);
                                     ^
src/journal/compress.c: In function ‘decompress_startswith_xz’:
src/journal/compress.c:294:38: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
                 s.next_out = *buffer + *buffer_size - s.avail_out;
                                      ^
src/journal/compress.c:294:53: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
                 s.next_out = *buffer + *buffer_size - s.avail_out;
                                                     ^
src/journal/compress.c: In function ‘decompress_startswith_lz4’:
src/journal/compress.c:327:45: warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]
         r = LZ4_decompress_safe_partial(src + 8, *buffer, src_size - 8,
                                             ^

LZ4 and XZ functions use char* and unsigned char*, respectively,
so keep void* in our internal APIs and add casts.
2016-04-02 18:58:21 -04:00
Elias Probst 82e24b0068
Use `PRIu64` to print `uint64_t` in log msgs 2016-02-29 23:00:21 +01:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
2016-02-10 13:41:57 +01:00
Lennart Poettering afd806fc48 Merge pull request #1607 from keszybz/lz4-remove-v1
Remove the old version of the lz4 stream compressor
2016-01-20 17:24:59 +01:00
Zbigniew Jędrzejewski-Szmek d487b81513 journal: fix reporting of output size in compres_stream_lz4
The header is 7 bytes, and this size was not accounted for in
total_out. This means that we could create a file that was 7 bytes
longer than requested, and the debug output was also inconsistent.
2015-12-13 15:00:19 -05:00
Zbigniew Jędrzejewski-Szmek 5d6f46b6bf journal: add dst_allocated_size parameter for compress_blob
compress_blob took src, src_size, dst and *dst_size, but dst_size
wasn't used as an input parameter with the size of dst, but only as an
output parameter. dst was implicitly assumed to be at least src_size-1.

This code wasn't *wrong*, because the only real caller in
journal-file.c got it right. But it was misleading, and the tests in
test-compress.c got it wrong, and worked only because the output
buffer happened to be the same size as input buffer. So add a seperate
dst_allocated_size parameter to make it explicit what the size of the
buffer is, and to allow test to proceed with different output buffer
sizes.
2015-12-13 14:54:47 -05:00
Zbigniew Jędrzejewski-Szmek 1f4b467daa journal: in some cases we have to decompress the full lz4 field
lz4 has to decompress a whole "sequence" at a time. When the compressed
data is composed of a repeating pattern, the whole set of repeats has
do be docompressed, and the output buffer has to be big enough.

This is unfortunate, because potentially the slowdown is very big. We
are only interested in the field name, but we might have to decompress
the whole thing. But the full cost will be borne out only when the
full entry is a repeating pattern. In practice this shouldn't happen
(apart from tests and the like). Hopefully lz4 will be fixed to avoid
this problem, or it will grow a new function which we can use [1], so
this fix should be remporary.

[1] https://groups.google.com/d/msg/lz4c/_3kkz5N6n00/oTahzqErCgAJ
2015-12-13 14:54:47 -05:00
Zbigniew Jędrzejewski-Szmek b3aa622929 lz4: fix size check which had no chance of working on big-endian 2015-12-02 09:50:01 -05:00
Lennart Poettering b5efdb8af4 util-lib: split out allocation calls into alloc-util.[ch] 2015-10-27 13:45:53 +01:00
Lennart Poettering 8b43440b7e util-lib: move string table stuff into its own string-table.[ch] 2015-10-27 13:25:56 +01:00
Lennart Poettering c004493cde util-lib: split out IO related calls to io-util.[ch] 2015-10-26 01:24:38 +01:00
Tom Gundersen 7c8871d315 Merge pull request #1654 from poettering/util-lib
Various changes to src/basic/
2015-10-25 14:22:43 +01:00
Lennart Poettering 3ffd4af220 util-lib: split out fd-related operations into fd-util.[ch]
There are more than enough to deserve their own .c file, hence move them
over.
2015-10-25 13:19:18 +01:00
Lennart Poettering 07630cea1f util-lib: split our string related calls from util.[ch] into its own file string-util.[ch]
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.

This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.

Also touches a few unrelated include files.
2015-10-24 23:05:02 +02:00
Lennart Poettering 0240c60369 journal: irrelevant coding style fixes 2015-10-24 15:08:15 +02:00
Zbigniew Jędrzejewski-Szmek 8e64dd1ecf compress: remove the lz4 v1 compression
This was the original lz4 file header, custom in systemd, that was
not compatible with the lz4 binary. It was not compiled in by default,
and was only used for coredumps stored as files on disk. It is safe to
remove it after a transition period in which coredumps have been
rotated.
2015-10-23 09:46:23 -04:00
Zbigniew Jędrzejewski-Szmek 5146f9f065 compress: return errors without logging, do not fake errno
Logging for compression and decompression is assymetrical on purpose:
if compiled without some type of compression, those compression code
paths should never be invoked. OTOH, it is possible to encounter
unsupported format on decompression, so leave those log_debug statements
in, to make it easier to diagnose stuff.
2015-10-14 21:24:36 -04:00
Zbigniew Jędrzejewski-Szmek e068517205 compress: fix mmap error handling 2015-10-14 10:15:27 -04:00
Zbigniew Jędrzejewski-Szmek 4b5bc5396c coredump: use lz4frame api to compress coredumps
This converts the stream compression to use the new lz4frame api,
compatible with lz4cat. Previous code used custom headers, so the
compressed file was not compatible with lz4 command line tools.
I considered this the last blocker to using lz4 by default.

Speed seems to be reasonable, although a bit (a few percent) slower
than the lz4 binary, even though compression is the same. I don't
consider this important. It could be caused by the overhead of library
calls, but is probably caused by slightly different buffer sizes or
such. The code in this patch uses mmap, since since this allows the
buffer to be reused while not making the code more complicated at all.
In my testing, this version is noticably faster (~20%) than a naive
single-buffered version. mmap can cause the program to be killed with
SIGBUS, if the underlying file is truncated or a disk error occurs. We
only use this from within coredump and coredumpctl, so I don't
consider this an issue.

Old decompression code is retained and is used if the new code fails
indicating a format error. There have been reports of various smaller
distributions using previous lz4 code, i.e. the old format, and it is
nice to provide backwards compatibility. We can remove the legacy code
in a few versions.

The way that blobs are compressed in the journal is not affected.
2015-10-10 23:05:21 -04:00
Lennart Poettering 59f448cf15 tree-wide: never use the off_t unless glibc makes us use it
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.

Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
2015-09-10 18:16:18 +02:00
Zbigniew Jędrzejewski-Szmek a6dcc7e592 Introduce loop_read_exact helper
Usually when using loop_read(), we want to read the full buffer.
Add a helper that mirrors loop_write(), and returns 0 when full buffer
was read, and an error otherwise.

Use -ENODATA for the short read, to distinguish it from a read error.
2015-03-09 22:10:54 -04:00
Thomas Hindoe Paaboel Andersen 2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Zbigniew Jędrzejewski-Szmek 1fa2f38f0f Assorted format fixes
Types used for pids and uids in various interfaces are unpredictable.
Too bad.
2015-01-22 01:14:52 -05:00
Zbigniew Jędrzejewski-Szmek 553acb7b6b treewide: sanitize loop_write
loop_write() didn't follow the usual systemd rules and returned status
partially in errno and required extensive checks from callers. Some of
the callers dealt with this properly, but many did not, treating
partial writes as successful. Simplify things by conforming to usual rules.
2014-12-09 21:36:08 -05:00
Evangelos Foutras b4232628f3 journal/compress: use LZ4_compress_continue()
We can't use LZ4_compress_limitedOutput_continue() because in the
worst-case scenario the compressed output can be slightly bigger than
the input block. This generally affects very few blocks and is no reason
to abort the compression process.

I ran into this when I noticed that Chromium core dumps weren't being
compressed. After switching to LZ4_compress_continue() a ~330MB Chromium
core dump gets compressed to ~17M.
2014-08-30 17:41:15 -04:00
Zbigniew Jędrzejewski-Szmek fa1c4b518e Fix misuse of uint64_t as size_t
They have different size on 32 bit, so they are really not interchangable.
2014-08-03 23:53:49 -04:00
Zbigniew Jędrzejewski-Szmek 01c3322e01 compress: fix return value 2014-07-18 21:44:36 -04:00
Zbigniew Jędrzejewski-Szmek 3b1a55e110 Fix build without any compression enabled 2014-07-11 10:42:27 -04:00