This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
The warning is not emitted for absolute paths like /dev/sda or /home, which are
converted to .device and .mount unit names without any fuss.
Most of the time it's unlikely that users use invalid unit names on purpose,
so let's warn them. Warnings are silenced when --quiet is used.
$ build/systemctl show -p Id hello@foo-bar/baz
Invalid unit name "hello@foo-bar/baz" was escaped as "hello@foo-bar-baz" (maybe you should use systemd-escape?)
Id=hello@foo-bar-baz.service
$ build/systemd-run --user --slice foo-bar/baz --unit foo-bar/foo true
Invalid unit name "foo-bar/foo" was escaped as "foo-bar-foo" (maybe you should use systemd-escape?)
Invalid unit name "foo-bar/baz" was escaped as "foo-bar-baz" (maybe you should use systemd-escape?)
Running as unit: foo-bar-foo.service
Fixes#8302.
We update the boot ID whenever the file is opened for writing (i.e. set
to ONLINE stat), even if we never write a single entry to it. Hence,
don't insist that the last entry's boot ID matches the file header.
As pointed out by Matthijs van Duin:
https://lists.freedesktop.org/archives/systemd-devel/2018-March/040499.html
Previously the compression threshold was hardcoded to 512, which meant that
smaller values wouldn't be compressed. This left some storage savings on the
table, so instead, we make that number tunable.
Even if pager_open() fails, in general, we should continue the operations.
All erroneous cases in pager_open() show log message in the function.
So, it is not necessary to check the returned value.
"noreturn" is reserved and can be used in other header files we include:
[ 16s] In file included from /usr/include/gcrypt.h:30:0,
[ 16s] from ../src/journal/journal-file.h:26,
[ 16s] from ../src/journal/journal-vacuum.c:31:
[ 16s] /usr/include/gpg-error.h:1544:46: error: expected ‘,’ or ‘;’ before ‘)’ token
[ 16s] void gpgrt_log_bug (const char *fmt, ...) GPGRT_ATTR_NR_PRINTF(1,2);
Here we include grcrypt.h (which in turns include gpg-error.h) *after* we
"noreturn" was defined in macro.h.
At various places we only want to close fds if they are not
stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call
for that, and port everything over.
When running journalctl --user-unit=foo as an unprivileged user we could get
the usual hint:
Hint: You are currently not seeing messages from the system and other users.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
...
But with --user-unit our filter is:
(((_UID=0 OR _UID=1000) AND OBJECT_SYSTEMD_USER_UNIT=foo.service) OR
((_UID=0 OR _UID=1000) AND COREDUMP_USER_UNIT=foo.service) OR
(_UID=1000 AND USER_UNIT=foo.service) OR
(_UID=1000 AND _SYSTEMD_USER_UNIT=foo.service))
so we would never see messages from other users.
We could still see messages from the system. In fact, on my machine the
only messages with OBJECT_SYSTEMD_USER_UNIT= are from the system:
journalctl $(journalctl -F OBJECT_SYSTEMD_USER_UNIT|sed 's/.*/OBJECT_SYSTEMD_USER_UNIT=\0/')
Thus, a more correct hint is that we cannot see messages from the system.
Make it so.
Fixes#7887.
The Linux kernel exposes the birth time now for files through statx()
hence make use of it where available. We keep the xattr logic in place
for this however, since only a subset of file systems on Linux currently
expose the birth time. NFS and tmpfs for example do not support it. OTOH
there are other file systems that do support the birth time but might
not support xattrs (smb…), hence make the best of the two, in particular
in order to deal with journal files copied between file system types and
to maintain compatibility with older file systems that are updated to
newer version of the file system.
Let's add a common implementation for regular file checks, that are
careful to return the right error code (EISDIR/EISLNK/EBADFD) when we
are encountering a wrong file node.
Let's make sure we aren't confused if a journal file is replaced by a
different one (for example due to rotation) if we are in a q overflow:
let's compare the inode/device information, and if it changed replace
any open file object as needed.
Fixes: #8198
Let's be more careful with the naming, and indicate that the function
is about *named* journal files, and will validate the name as needed.
(in opposition to add_any_file() which doesn't care about names)
Coverity now started warning about this ("Calling unlinkat without checking
return value (as is done elsewhere 12 out of 15 times).", and it is right:
most of the time we should at list print a log message so people can figure
out something is wrong when this happens.
v2:
- use warning level in journald too (this is unlikely to happen ever, so it
should be safe to something that is visible by default).
If `journalctl` take a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
(Lennart: slightly reworked version, that dropped some of the commenting
which was solved otherwise)
In that case we have no inotify fd yet, and there's nothing to process
hence. Let's make the call a NOP.
(Previously, without this change we'd end up trying to read off inotify
fd -1, which is quite a problem... 😢)
This ensures that clients can't keep all files pinned interfering with
our vacuuming logic.
This should fix the last issue pointed out in #7998 and #8032Fixes: #7998
This adds proper handling of IN_Q_OVERFLOW: when the inotify queue runs
over we'll reiterate all directories we are looking at. At the same time
we'll mark all files and directories we encounter that way with a
generation counter we first increased. All files and directories not
marked like this are then unloaded.
With this logic we do the best when the inotify queue overflows: we
synchronize our in-memory state again with what's on disk.
This contains some refactoring of the directory logic, to share more
code between uuid directories and "root" directories and generally make
things a bit more readable by splitting things up into smaller bits.
See: #7998#8032
If `journalctl` take a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
**Warning**
This change does not appear to solve the case of a "paused" output
stream. If somebody is using `journalctl | less` and pauses the
output, then without a background thread periodically listening for
inotify delete events and cleaning up, journal logs will eventually
stop flowing in cases where a journal client with enough open files
causes the "free" disk space threshold to be crossed.
LOG_FAC() is the general way to extract the logging facility (when it has
been combined with the logging priority).
LOG_FACMASK can be used to mask off the priority so you only have the
logging facility bits... but to get the logging facility e.g. LOG_USER,
you also have to bitshift it as well. (The priority is in the low bits,
and so only requires masking).
((priority & LOG_FACMASK) == LOG_KERN) happens to work only because
LOG_KERN is 0, and hence has the same value with or without the bitshift.
Code that relies on weird assumptions like this could make it harder to
realize how the logging values are treated.
Let the journal capture messages emitted by systemd, before it ran
exec("/usr/lib/systemd/systemd-journald"). Usually such messages will only
appear with `systemd.log_level=debug`. kmsg lines written after the exec()
will be ignored as before.
In other words, we are avoiding reading our own lines, which start
"systemd-journald[100]: " assuming we are PID 100. But now we will start
allowing ourself to read lines which start "systemd[100]: ", or any other
prefix which is not "systemd-journald[100]: ".
So this can't help you see messages when we fail to exec() journald :). But,
it makes it easier to see what the pre-exec() messages look like in
the successful case. Comparing messages like this can be useful when
debugging. Noticing weird omissions of messages, otoh, makes me anxious.
Red is used for highligting, the same as grep does. Except when the line is
highlighted red already, because it has high priority, in which case plain ansi
highlight is used for the matched substring.
Coloring is implemented for short and cat outputs, and not for other types.
I guess we could also add it for verbose output in the future.
Case sensitive or case insensitive matching can be requested using
--case-sensitive[=yes|no].
Unless specified, matching is case sensitive if the pattern contains any
uppercase letters, and case insensitive otherwise. This matches what
forward-search does in emacs, and recently also --ignore-case in less. This
works surprisingly well, because usually when one is wants to do case-sensitive
matching, the pattern is usually camel-cased. In the less frequent case when
case-sensitive matching is required with an all-lowercase pattern,
--case-sensitive can be used to override the automatic logic.
This changes real_journal_next() to leverage the IteratedCache for
accelerating iteration across the open journal files.
journalctl timing comparisons with 100 journal files of 8MiB size
party to this boot:
Pre (~v235):
# time ./journalctl -b --no-pager > /dev/null
real 0m9.613s
user 0m9.560s
sys 0m0.053s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.548s
user 0m9.525s
sys 0m0.023s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.612s
user 0m9.582s
sys 0m0.030s
Post-IteratedCache:
# time ./journalctl -b --no-pager > /dev/null
real 0m8.449s
user 0m8.425s
sys 0m0.024s
# time ./journalctl -b --no-pager > /dev/null
real 0m8.409s
user 0m8.382s
sys 0m0.027s
# time ./journalctl -b --no-pager > /dev/null
real 0m8.410s
user 0m8.350s
sys 0m0.061s
~12.5% improvement, the benefit increases the more log files there are.
Previously, we'd refuse open journal files with suffixes that aren't
either .journal or .journal~. With this change we only care when we are
creating the journal file.
I looked over the sources to see whether we ever pass files discovered
by directory enumeration to journal_file_open() without first checking
the suffix (in which case the old check made sense), but I couldn't find
any. hence I am pretty sure removing this check is safe.
Fixes: #7972
This removes LOG_TARGET_SAFE. It's made redundant by the new
"prohibit-ipc" logging flag, as it used to have a similar effect: avoid
logging to the journal/syslog, i.e. any local services in order to avoid
deadlocks when we lock from PID 1 or its utility processes (such as
generators).
All previous users of LOG_TARGET_SAFE are switched over to the new
setting. This makes things a bit safer for all, as not even the
SYSTEMD_LOG_TARGET env var can be used to accidentally log to the
journal anymore in these programs.
Apparently O_NONBLOCK is the modern name used in most documentation and
for most cases in our sources. Let's hence replace the old alias
O_NDELAY and stick to O_NONBLOCK everywhere.