decompress_blob_zstd() would allocate ever bigger buffers in a loop trying to
get a buffer big enough to decompress the input data. This is wasteful, since
we can just query the size of the decompressed data from the compressed header.
Worse, it doesn't work when the output size is capped, i.e. when dst_max != 0.
If the decompressed blob happened to be bigger than dst_max, decompression
would fail with -ENOBUFS. We need to use "stream decompression" instead, and
only get min(uncompressed size, dst_max) bytes of output.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856037 in a second way.
SD_JOURNAL_FOREACH_DATA() and SD_JOURNAL_FOREACH_UNIQUE() would immediately
terminate when a field couldn't be accessed. This can happen for example when a
field is compressed with an unavailable compression format. But it's likely
that this is the wrong thing to do: the caller for example might want to
iterate over the fields but isn't interested in all of them. coredumpctl is
like this: it uses SD_JOURNAL_FOREACH_DATA() but only uses a subset of the
fields.
Add two new functions sd_journal_enumerate_good_data() and
sd_journal_enumerate_good_unique() that retry sd_journal_enumerate_data() and
sd_journal_enumerate_unique() if the return value is something that applies to
a single field: ENOBUS, E2BIG, EOPNOTSUPP.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856037.
An alternative would be to make the macros themselves smarter instead of adding
new symbols, and do the looping internally in the macro. I don't like that
approach for two reasons. First, it would embed the logic in the macro, so
recompilation would be required if we decide to update the logic. With the
current version of the patch, recompilation is required to use the new symbols,
but after that, library upgrades are enough. So the current approach is safer
in case further updates are needed. Second, our headers use primitive C, and it
is hard to do the macros without using newer features.
Even with the new keyed hash table journal feature: if an attacker
manages to get access to the journal file id it could synthesize records
that result in hash collisions. Let's rotate automatically when we
notice that, so that a new journal file ID is generated, our performance
is restored and the attacker has to guess a new file ID before being
able to trigger the issue again.
That said, untrusted peers should never get access to journal files in
the first case...
This adds a new (incompatible) feature to journal files: if enabled the
hash function used for the hash tables is no longer jenkins hash with a
zero key, but siphash keyed by the file uuid that is included in the
file header anyway. This should make our hash tables more robust against
collision attacks, as long as the attacker has no read access to the
journal files. We switch from jenkins to siphash simply because it's
more well-known and we standardize for the rest of our codebase onto it.
This is hardening in order to make collision attacks harder for clients
that can forge log messages but have no read access to the logs. It has
no effect on clients that have read access.
Let's prefix this with "jenkins_" since it wraps the jenkins hash. We
want to add support for other hash functions to journald soon, hence
better be clear with what this is. In particular as all other symbols
defined by lookup3.h actually are prefixed "jenkins_".
Let's clean this up a bit, following our usual nomenclature to name
return parameters ret-xyz.
This is mostly a bit of renaming, but there's also some minor other
changes: if we return a pointer to a mmap'ed object plus its offset, in
almost all cases we are happy if either parameter is NULL in case the
caller is not interested in it. Let's fix the remaining case to do this
too, to minimize surprises.
The object flags field is a bitmask, hence don't sloppily define
_OBJECT_COMPRESSED_MAX as one mor than the previous flag. That worked OK
as long as we only had two flags, but will fall apart as soon as we have
three. Let's fix this.
(It's kinda sloppy how the string table is built here, as it will be
quite sparse as soon as we have more enum entries, but let's keep it for
now.)
This way journalctl can make use of libqrencode if it's there, but will
quietly not use it if it isn't.
This means libqrencode remains a build-time dep, but not a strict
runtime dependency.
I figure we should do something similar for a bunch of other "leaf"
libraries we only use few symbols of. Specifically the following are
probably good candidates:
* pcre2
* libpwquality
* p11kit
* elfutils
and possibly:
* libcryptsetup (only in some parts. i.e. building systemd-cryptsetup
without it makes no sense. However building the dissect option with
libcryptsetup as optional dep does make sense)
* possibly the compression libraries (at least the ones we never use for
compression, but only as alternative ones for decompression)
Already covered like this is:
* libxkcommon
Presently, CLI utilities such as systemctl will check whether they have a tty
attached or not to decide whether to parse /proc/cmdline or EFI variable
SystemdOptions looking for systemd.log_* entries.
But this check will be misleading if these tools are being launched by a
daemon, such as a monitoring daemon or automation service that runs in
background.
Make log handling of CLI tools uniform by never checking /proc/cmdline or EFI
variables to determine the logging level.
Furthermore, introduce a new log_setup_cli() shortcut to set up common options
used by most command-line utilities.
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.
Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
poll() sets POLLNVAL inside of the poll structures if an invalid fd is
passed. So far we generally didn't check for that, thus not taking
notice of the error. Given that this specific kind of error is generally
indication of a programming error, and given that our code is embedded
into our projects via NSS or because people link against our library,
let's explicitly check for this and convert it to EBADF.
(I ran into a busy loop because of this missing check when some of my
test code accidentally closed an fd it shouldn't close, so this is a
real thing)
This is a follow-up for 9f83091e3c.
Instead of reading the mtime off the configuration files after reading,
let's do so before reading, but with the fd we read the data from. This
is not only cleaner (as it allows us to save one stat()), but also has
the benefit that we'll detect changes that happen while we read the
files.
This also reworks unit file drop-ins to use the common code for
determining drop-in mtime, instead of reading system clock for that.
It annoyed me for quite a while that running "journalctl --file=…" on a
file that is not readable failed with a "File not found" error instead
of a permission error. Let's fix that.
We make this work by using the GLOB_NOCHECK flag for glob() which means
that files are not accessible will be returned in the array as they are
instead of being filtered away. This then means that our later attemps
to open the files will fail cleanly with a good error message.
- If length of formatted string >= LONG_LINE_MAX then return -ENOBUFS
- Normal Case:
- length of formatted string < POSIX defined LINE_MAX
- Allocate sbuf to accomodate the message
- Rare case:
- LINE_MAX < length of formatted string < LONG_LINE_MAX
- Allocate the required length using alloca()
Let's introduce an explicit line ending marker for line endings due to
pid change.
Let's also make sure we don't get confused with buffer management.
Fixes: #15654
If the previous received buffer length is almost equal to the allocated
buffer size, before this change the next read can only receive a couple
of bytes (in the worst case only 1 byte), which is not efficient.
We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
HUGE_SIZE was defined inconsistently.
> In file included from ../src/basic/alloc-util.h:9,
> from ../src/journal/test-compress.c:9:
> ../src/journal/test-compress.c: In function ‘main’:
> ../src/journal/test-compress.c:280:33: error: ‘HUGE_SIZE’ undeclared (first use in this function)
> 280 | assert_se(huge = malloc(HUGE_SIZE));
Don't assume that 4MB can be allocated from stack since there could be smaller
DefaultLimitSTACK= in force, so let's use malloc(). NUL terminate the huge
strings by hand, also ensure termination in test_lz4_decompress_partial() and
optimize the memset() for the string.
Some items in /proc and /etc may not be accessible to poor unprivileged users
due to e.g. SELinux, BOFH or both, so check for EACCES and EPERM.
/var/tmp may be a symlink to /tmp and then path_compare() will always fail, so
let's stick to /tmp like elsewhere.
/tmp may be mounted with noexec option and then trying to execute scripts from
there would fail.
Detect and warn if seccomp is already in use, which could make seccomp test
fail if the syscalls are already blocked.
Unset $TMPDIR so it will not break specifier tests where %T is assumed to be
/tmp and %V /var/tmp.
Our journal code is generally supposed to be written in a fashion that
the underlying file can be deallocated any time, i.e. our mmap of it
suddenly becomes all zeroes. The idea is that we catch that when parsing
everything. For that to work safely we need to make sure that when doing
arithmetics or comparisons on values read from the map we don't run into
TTOCTTOU issues when determining validity. Hence we need to copy out the
values before use and operate on the copies. This requires some special
care since the C compiler could suppress our copies as optimization.
Hence use the new READ_NOW() macro to force a copy by using memcpy(),
and use it whenever we start doing an arithmetic operation on it, or
validity checking of multiple steps.
Fixes: #14943
Mappings canbe replaced by all zeroes under our feet if vacuuming
decides to unallocate some file. Hence let's not check for this kind of
stuff in an assert.
(Typically, we should genreate runtime errors in this case, in
particular EBADMSG, which the callers generally look for. But in this
case this is just an extra precaution check anyway, so let's just remove
it.)
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
Let's make it optional whether auditing is enabled at journald start-up
or not.
Note that this only controls whether audit is enabled/disabled in the
kernel. Either way we'll still collect the audit data if it is
generated, i.e. if some other tool enables it, we'll collect it.
Fixes: #959
journal_file_fstat() returns an error if we call it on already unlinked
journal file and hence we never reach remove_file_real() which is the
entire point.
I must have made some mistake while testing the fix that got me thinking
the issue is gone while opposite was true.
Fixes#14695
This regression was introduced in #14913.
The current_file variable can be NULL, as, for example, with the
following commands:
* journalctl --list-boots
* journalctl -b -1 --no-pager
Since current_file is only checked for pointer equality with f, removing
the assertion is safe here.
When having a service which intentionally outputs multiple equal lines,
all these messages might be inserted with the same timestamp.
journalctl has a mechanism to avoid duplicate lines, which might be in
different journal files.
This patch allows duplicate lines, if they are from the same file.
It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().
socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
.msg_namelen was set to a bogus value before we actually stored the path in the
the structure. sockaddr_un_set_path() returns the length, so just use that.
Fixes#14799.
If we have exit on idle, then operations such as "journalctl
--namespace=foo --rotate" should work even if the journal daemon is
currently not running.
(Note that we don't do activation by varlink for the main instance of
journald, I am not sure the deadlocks it might introduce are worth it)
If we do, we operate on a separate set of logs and runtime objects
The namespace is configured via argv[1].
Fixes: #12123Fixes: #10230#9519
(These latter two issues ask for slightly different stuff, but the
usecases generally can be solved by running separate instances of
journald now, hence also declaring that as "Fixes:")