The refactor in #9200 inadvertently dropped the variable assignment to
traverse the device and its hierarchy in add_matches_for_device().
This was uncovered by Coverity (CID #1393310).
Fix that by restoring the assignment.
Tested: `journalctl /dev/sda` now filters journalctl output again.
This is mostly fall-out from d1a1f0aaf0,
however some cases are older bugs.
There might be more issues lurking; this was a simple grep for "%m"
across the tree, with all lines removed that mention "errno" at all.
The journal verification functions would reject such an entry. It would probably
still display fine (because we prefer _SOURCE_REALTIME_TIMESTAMP= if present), but
it seems wrong to create an entry that would not pass verification.
The code to open journal files seems like the wrong place to enforce this. We
already check during boot and refuse to boot if machine-id is missing, no need
to enforce this here. In particular, it seems better to write logs from
journald even if they are not completely functional rather than refuse to
operate at all, and systemd-journal-remote also writes journal files and may
even be run on a system without systemd at all.
The docker image that oss-fuzz uses has an empty /etc/machine-id. Obviously
this is an error in the docker image, but docker is a fact of life, and it seems better
for systemd-journal-remote to work in such an incomplete environment.
In journal_file_set_online() the offline thread doesn't need to be
joined if it's been canceled before actually reaching the phase of
writing the offline state.
When dealing with a large number of template instances, for example
when launching daemons per VRF, it is hard for operators to correlate
log lines to arguments.
Add a new with-unit mode which, if available, prefixes unit and user
unit names when displaying its log messages instead of the syslog
identifier. It will also use the full timestamp with timezones, like
the short-full mode.
Most of our other parsing functions do this, let's do this here too,
internally we accept that anyway. Also, the closely related
load_env_file() and load_env_file_pairs() also do this, so let's be
systematic.
This makes most header files easier to look at. Also Emacs gets really
slow when browsing through large sections of overly long prototypes,
which is much improved by this macro.
We should probably not do something similar with too many other cases,
as macros like this might help readability for some, but make it worse
for others. But I think given the complexity of this specific prototype
and how often we use it, it's worth doing.
This simplifies the use of tempfiles in tests and fixes "leaked"
temporary files in test-fileio, test-catalog, test-conf-parser.
Not the whole tree is converted.
Configuration through environment variables is inconvenient with meson, because
they cannot be conveniently changed and/or are not preserved during
reconfiguration (https://github.com/mesonbuild/meson/issues/1503).
This adds -Dvalgrind=true/false, which has the advantage that it can be set
at any time with meson configure -Dvalgrind=... and ninja will rebuild targets
as necessary. Additional minor advantages are better consistency with the
options for hashmap debugging, and typo avoidance with '#if' instead of '#ifdef'.
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()'s return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
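As a rough illustration of the size_t convention described above, here is a
minimal sketch. The names example_strv_length() and example_strv_find() are
hypothetical stand-ins for strv_length()-style helpers and are not the actual
implementation; the point is only that sizes and indexes share one type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a strv_length()-style helper: counts the entries
 * of a NULL-terminated string array and returns the count as size_t. */
static size_t example_strv_length(char * const *l) {
        size_t n = 0;

        if (!l)
                return 0;

        while (l[n])
                n++;

        return n;
}

/* Looks up needle and returns its position as size_t too, so that no
 * unsigned <-> size_t conversion happens between the size and the index. */
static bool example_strv_find(char * const *l, const char *needle, size_t *ret_index) {
        size_t n;

        assert(needle);
        assert(ret_index);

        n = example_strv_length(l);
        for (size_t i = 0; i < n; i++)
                if (strcmp(l[i], needle) == 0) {
                        *ret_index = i;
                        return true;
                }

        return false;
}
```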
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.
In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.
Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphasize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.
Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so on, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.
Note that before this patch some of the APIs (notably libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
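To make the rule a bit more concrete, here is a minimal sketch of using the
generic _cleanup_() form directly on a high-level object. ExampleObject and
its helper functions are hypothetical and exist only for illustration; the
_cleanup_() definition shown assumes the GCC/Clang "cleanup" variable
attribute.

```c
#include <assert.h>
#include <stdlib.h>

/* Generic wrapper; assumes the GCC/Clang "cleanup" variable attribute. */
#define _cleanup_(f) __attribute__((cleanup(f)))

/* Hypothetical high-level object type, for illustration only. */
typedef struct ExampleObject {
        int *n_freed; /* incremented when the object is destroyed */
} ExampleObject;

static ExampleObject *example_object_new(int *n_freed) {
        ExampleObject *o;

        assert(n_freed);

        o = malloc(sizeof(ExampleObject));
        if (!o)
                return NULL;

        o->n_freed = n_freed;
        return o;
}

static ExampleObject *example_object_free(ExampleObject *o) {
        if (!o)
                return NULL;

        (*o->n_freed)++;
        free(o);
        return NULL;
}

/* Pointer-to-pointer wrapper: this is the signature the attribute expects. */
static void example_object_freep(ExampleObject **o) {
        assert(o);
        example_object_free(*o);
}

static int example_use_object(int *n_freed) {
        /* No type-specific macro: the generic form is spelled out directly. */
        _cleanup_(example_object_freep) ExampleObject *o = NULL;

        o = example_object_new(n_freed);
        if (!o)
                return -1;

        return 0; /* o is released automatically when it goes out of scope */
}
```

The generic spelling is only slightly longer than a dedicated macro would be,
and a reader does not need to look up what a per-type macro expands to.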
Double newlines (i.e. one empty line) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tighter.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd be nice to
obtain explicit acks from all involved authors before doing that.
It might happen that we try to bisect through a chain of offset arrays in the
journal whose last element was just allocated but no item yet written
to. In that case that array will be all NUL, but it might still end up
in our array chain cache. If it does, we cannot use it for bisection,
since for bisection we need to know the value of the first entry in that
array, but if it's uninitialized it does not have a first value. Hence,
as a simple fix, in this unlikely case, simply ignore the chain cache.
This is supposed to fix the issue pointed out in #8432, but in a more
permissive way, as this case isn't strictly a badly formatted journal
but actually a valid state (though one within a very short time window),
and we should make the best of it, and handle it gracefully.
Background: in each journal file entries are linked up in large arrays
of offsets. In each array the entries are strictly ordered by the
offsets of the entries, which permits search by bisection. These arrays
are allocated with a fixed size and then filled up as entries are added
to the journal file. If an array is fully filled up, a new array
(double in size as the old one) is appended to the journal file, and
linked up. This means, the journal file will contain a series of chained
up arrays, each time doubling in size, and strictly ordered. When
looking for an entry we maintain a "chain cache", which allows us to
bypass traversing the chain in full if we look for entries close to each
other in a short time. With the fix above we make sure we don't
erroneously use a chain cache item that doesn't carry enough information
for this bisection to work.
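The check described above can be sketched roughly as follows. This is a
simplified in-memory model, not the journal file code: ExampleOffsetArray and
both functions are hypothetical, and a value of 0 is assumed to mark a slot
that has been allocated but not written yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one offset array; 0 marks a slot not yet written. */
typedef struct ExampleOffsetArray {
        const uint64_t *items;
        size_t n_items;
} ExampleOffsetArray;

/* An array whose first slot is still unset cannot anchor a bisection. */
static bool example_array_usable_for_bisection(const ExampleOffsetArray *a) {
        assert(a);
        return a->n_items > 0 && a->items[0] != 0;
}

/* Finds the index of the last written item that is <= needle. Returns false
 * if the array is unusable or every written item is larger than needle. */
static bool example_bisect(const ExampleOffsetArray *a, uint64_t needle, size_t *ret) {
        size_t lo = 0, hi;

        assert(a);
        assert(ret);

        if (!example_array_usable_for_bisection(a))
                return false; /* caller falls back to walking the chain */

        if (a->items[0] > needle)
                return false;

        /* Invariant: items[lo] is written and <= needle; items[hi] is not. */
        hi = a->n_items;
        while (hi - lo > 1) {
                size_t mid = lo + (hi - lo) / 2;

                if (a->items[mid] != 0 && a->items[mid] <= needle)
                        lo = mid;
                else
                        hi = mid;
        }

        *ret = lo;
        return true;
}
```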
Original issue identified (with patch) by @Kxuan.
Replaces: #8432
This is similar to TAKE_PTR() but operates on file descriptors, and thus
assigns -1 to the fd parameter after returning it.
Removes 60 lines from our codebase. Pretty good too I think.
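A minimal sketch of the semantics described above, written as a plain
function rather than a macro. The name example_take_fd() is hypothetical and
this is not necessarily how TAKE_FD() itself is defined.

```c
#include <assert.h>

/* Hypothetical helper: returns the fd stored in *fd and resets the variable
 * to -1, so that the caller's variable no longer owns the descriptor. */
static int example_take_fd(int *fd) {
        int taken;

        assert(fd);

        taken = *fd;
        *fd = -1;
        return taken;
}
```

A typical use would be handing a successfully set up descriptor to the caller
while leaving the local variable in a state that a cleanup handler ignores.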
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro).
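For readers who have not seen the idiom, here is a rough sketch of such a
macro and of one way it might be used. EXAMPLE_TAKE_PTR() and
example_dup_string() are hypothetical names; the macro relies on GNU C
extensions (statement expressions and __typeof__) and is not claimed to match
the actual definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro: evaluates to the pointer's value and sets the pointer
 * variable to NULL. Uses GNU C statement expressions and __typeof__. */
#define EXAMPLE_TAKE_PTR(ptr)                            \
        ({                                               \
                __typeof__(ptr) _taken_ = (ptr);         \
                (ptr) = NULL;                            \
                _taken_;                                 \
        })

/* Copies s and hands ownership of the copy to the caller via ret. */
static int example_dup_string(const char *s, char **ret) {
        char *copy;
        size_t n;

        assert(s);
        assert(ret);

        n = strlen(s) + 1;
        copy = malloc(n);
        if (!copy)
                return -ENOMEM;

        memcpy(copy, s, n);

        /* Ownership moves to *ret; copy is NULL afterwards. */
        *ret = EXAMPLE_TAKE_PTR(copy);
        return 0;
}
```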
This rearranges chase_symlinks() a bit: if no special flags are
specified it will now revert to behaviour before
b12d25a8d6. However, if the new
CHASE_TRAIL_SLASH flag is specified it will follow the behaviour
introduced by that commit.
I wasn't sure which one to make the behaviour that requires specification
of a flag to enable. I opted to make the "append trailing slash"
behaviour the one to enable by a flag, following the thinking that the
function should primarily be used to generate a normalized path, and I
am pretty sure a path without trailing slash is the more "normalized"
one, as the trailing slash is not really a part of it, but merely a
"decorator" that tells various system calls to generate ENOTDIR if the
path doesn't refer to a directory.
Or to say this differently: if the slash was part of normalization then
we really should add it in all cases when the final path is a directory,
not just when the user originally specified it.
Fixes: #8544
Replaces: #8545
The warning is not emitted for absolute paths like /dev/sda or /home, which are
converted to .device and .mount unit names without any fuss.
Most of the time it's unlikely that users use invalid unit names on purpose,
so let's warn them. Warnings are silenced when --quiet is used.
$ build/systemctl show -p Id hello@foo-bar/baz
Invalid unit name "hello@foo-bar/baz" was escaped as "hello@foo-bar-baz" (maybe you should use systemd-escape?)
Id=hello@foo-bar-baz.service
$ build/systemd-run --user --slice foo-bar/baz --unit foo-bar/foo true
Invalid unit name "foo-bar/foo" was escaped as "foo-bar-foo" (maybe you should use systemd-escape?)
Invalid unit name "foo-bar/baz" was escaped as "foo-bar-baz" (maybe you should use systemd-escape?)
Running as unit: foo-bar-foo.service
Fixes #8302.
We update the boot ID whenever the file is opened for writing (i.e. set
to ONLINE state), even if we never write a single entry to it. Hence,
don't insist that the last entry's boot ID matches the file header.
As pointed out by Matthijs van Duin:
https://lists.freedesktop.org/archives/systemd-devel/2018-March/040499.html
Previously the compression threshold was hardcoded to 512, which meant that
smaller values wouldn't be compressed. This left some storage savings on the
table, so instead, we make that number tunable.
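The decision this tunable controls can be sketched as below. The struct,
field and function names are hypothetical, and the ">=" comparison is an
assumption derived from the description that values smaller than the
threshold are not compressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The previously hardcoded value, kept here as the default. */
#define EXAMPLE_DEFAULT_COMPRESS_THRESHOLD UINT64_C(512)

/* Hypothetical configuration holder, for illustration only. */
typedef struct ExampleJournalConfig {
        bool compress;
        uint64_t compress_threshold_bytes;
} ExampleJournalConfig;

/* Returns true if a field value of the given size should be compressed. */
static bool example_should_compress(const ExampleJournalConfig *c, uint64_t size) {
        assert(c);
        return c->compress && size >= c->compress_threshold_bytes;
}
```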
Even if pager_open() fails, in general, we should continue the operations.
All error cases in pager_open() already log a message within the function,
so it is not necessary to check the returned value.
"noreturn" is reserved and can be used in other header files we include:
[ 16s] In file included from /usr/include/gcrypt.h:30:0,
[ 16s] from ../src/journal/journal-file.h:26,
[ 16s] from ../src/journal/journal-vacuum.c:31:
[ 16s] /usr/include/gpg-error.h:1544:46: error: expected ‘,’ or ‘;’ before ‘)’ token
[ 16s] void gpgrt_log_bug (const char *fmt, ...) GPGRT_ATTR_NR_PRINTF(1,2);
Here we include gcrypt.h (which in turn includes gpg-error.h) *after*
"noreturn" was defined in macro.h.
At various places we only want to close fds if they are not
stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call
for that, and port everything over.
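Such a helper could look roughly like this. The names are hypothetical, and
returning -1 unconditionally (so callers can assign the result back to their
fd variable) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* True for stdin, stdout and stderr, i.e. fds 0, 1 and 2. */
static bool example_fd_is_stdio(int fd) {
        return fd >= STDIN_FILENO && fd <= STDERR_FILENO;
}

/* Closes fd unless it is invalid or one of the stdio fds. Always returns -1,
 * so that callers can write: fd = example_close_above_stdio(fd); */
static int example_close_above_stdio(int fd) {
        if (fd < 0 || example_fd_is_stdio(fd))
                return -1;

        (void) close(fd);
        return -1;
}
```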
When running journalctl --user-unit=foo as an unprivileged user we could get
the usual hint:
Hint: You are currently not seeing messages from the system and other users.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
...
But with --user-unit our filter is:
(((_UID=0 OR _UID=1000) AND OBJECT_SYSTEMD_USER_UNIT=foo.service) OR
((_UID=0 OR _UID=1000) AND COREDUMP_USER_UNIT=foo.service) OR
(_UID=1000 AND USER_UNIT=foo.service) OR
(_UID=1000 AND _SYSTEMD_USER_UNIT=foo.service))
so we would never see messages from other users.
We could still see messages from the system. In fact, on my machine the
only messages with OBJECT_SYSTEMD_USER_UNIT= are from the system:
journalctl $(journalctl -F OBJECT_SYSTEMD_USER_UNIT|sed 's/.*/OBJECT_SYSTEMD_USER_UNIT=\0/')
Thus, a more correct hint is that we cannot see messages from the system.
Make it so.
Fixes #7887.
The Linux kernel exposes the birth time now for files through statx()
hence make use of it where available. We keep the xattr logic in place
for this however, since only a subset of file systems on Linux currently
expose the birth time. NFS and tmpfs for example do not support it. OTOH
there are other file systems that do support the birth time but might
not support xattrs (smb…), hence make the best of the two, in particular
in order to deal with journal files copied between file system types and
to maintain compatibility with older file systems that are updated to
newer version of the file system.
Let's add a common implementation for regular file checks, that are
careful to return the right error code (EISDIR/ELOOP/EBADFD) when we
are encountering a wrong file node.
Let's make sure we aren't confused if a journal file is replaced by a
different one (for example due to rotation) if we are in an inotify queue overflow:
let's compare the inode/device information, and if it changed replace
any open file object as needed.
Fixes: #8198
Let's be more careful with the naming, and indicate that the function
is about *named* journal files, and will validate the name as needed.
(in opposition to add_any_file() which doesn't care about names)
Coverity now started warning about this ("Calling unlinkat without checking
return value (as is done elsewhere 12 out of 15 times)."), and it is right:
most of the time we should at least print a log message so people can figure
out something is wrong when this happens.
v2:
- use warning level in journald too (this is unlikely to happen ever, so it
should be safe to use something that is visible by default).
If `journalctl` takes a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
(Lennart: slightly reworked version, that dropped some of the commenting
which was solved otherwise)
In that case we have no inotify fd yet, and there's nothing to process
hence. Let's make the call a NOP.
(Previously, without this change we'd end up trying to read off inotify
fd -1, which is quite a problem... 😢)
This ensures that clients can't keep all files pinned interfering with
our vacuuming logic.
This should fix the last issue pointed out in #7998 and #8032.
Fixes: #7998
This adds proper handling of IN_Q_OVERFLOW: when the inotify queue runs
over we'll reiterate all directories we are looking at. At the same time
we'll mark all files and directories we encounter that way with a
generation counter we first increased. All files and directories not
marked like this are then unloaded.
With this logic we do the best when the inotify queue overflows: we
synchronize our in-memory state again with what's on disk.
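The mark-and-sweep idea can be sketched on a simplified in-memory model. All
types and names here are hypothetical, the on-disk listing is passed in as a
plain array instead of being enumerated, and loading of newly discovered
files is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one tracked file or directory. */
typedef struct ExampleEntry {
        const char *path;
        unsigned last_seen_generation;
        bool loaded;
} ExampleEntry;

typedef struct ExampleTracker {
        ExampleEntry *entries;
        size_t n_entries;
        unsigned generation;
} ExampleTracker;

/* Stamps a still-present entry with the current generation. */
static void example_mark_seen(ExampleTracker *t, const char *path) {
        assert(t);
        assert(path);

        for (size_t i = 0; i < t->n_entries; i++)
                if (t->entries[i].loaded && strcmp(t->entries[i].path, path) == 0)
                        t->entries[i].last_seen_generation = t->generation;
}

/* Called on queue overflow: bump the generation, mark everything that is
 * still on disk, then unload whatever was not marked. Returns the number of
 * entries that were unloaded. */
static size_t example_resync(ExampleTracker *t, const char * const *on_disk, size_t n_on_disk) {
        size_t n_unloaded = 0;

        assert(t);
        assert(on_disk || n_on_disk == 0);

        t->generation++;

        for (size_t i = 0; i < n_on_disk; i++)
                example_mark_seen(t, on_disk[i]);

        for (size_t i = 0; i < t->n_entries; i++)
                if (t->entries[i].loaded &&
                    t->entries[i].last_seen_generation != t->generation) {
                        t->entries[i].loaded = false;
                        n_unloaded++;
                }

        return n_unloaded;
}
```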
This contains some refactoring of the directory logic, to share more
code between uuid directories and "root" directories and generally make
things a bit more readable by splitting things up into smaller bits.
See: #7998 #8032
If `journalctl` takes a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
**Warning**
This change does not appear to solve the case of a "paused" output
stream. If somebody is using `journalctl | less` and pauses the
output, then without a background thread periodically listening for
inotify delete events and cleaning up, journal logs will eventually
stop flowing in cases where a journal client with enough open files
causes the "free" disk space threshold to be crossed.
LOG_FAC() is the general way to extract the logging facility (when it has
been combined with the logging priority).
LOG_FACMASK can be used to mask off the priority so you only have the
logging facility bits... but to get the logging facility e.g. LOG_USER,
you also have to bitshift it as well. (The priority is in the low bits,
and so only requires masking).
((priority & LOG_FACMASK) == LOG_KERN) happens to work only because
LOG_KERN is 0, and hence has the same value with or without the bitshift.
Code that relies on weird assumptions like this could make it harder to
realize how the logging values are treated.
Let the journal capture messages emitted by systemd, before it ran
exec("/usr/lib/systemd/systemd-journald"). Usually such messages will only
appear with `systemd.log_level=debug`. kmsg lines written after the exec()
will be ignored as before.
In other words, we are avoiding reading our own lines, which start
"systemd-journald[100]: " assuming we are PID 100. But now we will start
allowing ourself to read lines which start "systemd[100]: ", or any other
prefix which is not "systemd-journald[100]: ".
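Expressed as a prefix test, the rule above might look like the following
sketch. Both function names are hypothetical, and matching on the raw line
prefix is a simplification: it is an assumption of this example, not a
description of how the kmsg parser actually identifies the sender.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if line starts with "<identifier>[<pid>]: ". */
static bool example_line_is_from(const char *line, const char *identifier, long pid) {
        char prefix[128];
        int n;

        assert(line);
        assert(identifier);

        n = snprintf(prefix, sizeof(prefix), "%s[%ld]: ", identifier, pid);
        if (n < 0 || (size_t) n >= sizeof(prefix))
                return false;

        return strncmp(line, prefix, (size_t) n) == 0;
}

/* Skip only our own post-exec() lines; anything else logged under our PID,
 * such as pre-exec() messages from "systemd", is read. */
static bool example_should_read_kmsg_line(const char *line, long own_pid) {
        return !example_line_is_from(line, "systemd-journald", own_pid);
}
```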
So this can't help you see messages when we fail to exec() journald :). But,
it makes it easier to see what the pre-exec() messages look like in
the successful case. Comparing messages like this can be useful when
debugging. Noticing weird omissions of messages, otoh, makes me anxious.
Red is used for highlighting, the same as grep does. Except when the line is
highlighted red already, because it has high priority, in which case plain ansi
highlight is used for the matched substring.
Coloring is implemented for short and cat outputs, and not for other types.
I guess we could also add it for verbose output in the future.
Case sensitive or case insensitive matching can be requested using
--case-sensitive[=yes|no].
Unless specified, matching is case sensitive if the pattern contains any
uppercase letters, and case insensitive otherwise. This matches what
forward-search does in emacs, and recently also --ignore-case in less. This
works surprisingly well, because usually when one wants to do case-sensitive
matching, the pattern is camel-cased. In the less frequent case when
case-sensitive matching is required with an all-lowercase pattern,
--case-sensitive can be used to override the automatic logic.
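The automatic logic can be sketched as below. The enum and function are
hypothetical, and the uppercase test only considers what isupper() reports in
the current locale, so non-ASCII patterns are outside the scope of this
example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical tri-state mirroring "not specified", "=no" and "=yes". */
typedef enum ExampleCaseMode {
        EXAMPLE_CASE_AUTO = -1,
        EXAMPLE_CASE_INSENSITIVE = 0,
        EXAMPLE_CASE_SENSITIVE = 1,
} ExampleCaseMode;

/* Returns true if matching should be case sensitive. An explicit request
 * wins; otherwise any uppercase letter in the pattern turns it on. */
static bool example_match_case_sensitively(const char *pattern, ExampleCaseMode requested) {
        assert(pattern);

        if (requested != EXAMPLE_CASE_AUTO)
                return requested == EXAMPLE_CASE_SENSITIVE;

        for (const char *p = pattern; *p; p++)
                if (isupper((unsigned char) *p))
                        return true;

        return false;
}
```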
This changes real_journal_next() to leverage the IteratedCache for
accelerating iteration across the open journal files.
journalctl timing comparisons with 100 journal files of 8MiB size
party to this boot:
Pre (~v235):
# time ./journalctl -b --no-pager > /dev/null
real 0m9.613s
user 0m9.560s
sys 0m0.053s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.548s
user 0m9.525s
sys 0m0.023s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.612s
user 0m9.582s
sys 0m0.030s
Post-IteratedCache:
# time ./journalctl -b --no-pager > /dev/null
real 0m8.449s
user 0m8.425s
sys 0m0.024s
# time ./journalctl -b --no-pager > /dev/null
real 0m8.409s
user 0m8.382s
sys 0m0.027s
# time ./journalctl -b --no-pager > /dev/null
real 0m8.410s
user 0m8.350s
sys 0m0.061s
~12.5% improvement, the benefit increases the more log files there are.
Previously, we'd refuse to open journal files with suffixes that aren't
either .journal or .journal~. With this change we only care when we are
creating the journal file.
I looked over the sources to see whether we ever pass files discovered
by directory enumeration to journal_file_open() without first checking
the suffix (in which case the old check made sense), but I couldn't find
any. Hence I am pretty sure removing this check is safe.
Fixes: #7972
This removes LOG_TARGET_SAFE. It's made redundant by the new
"prohibit-ipc" logging flag, as it used to have a similar effect: avoid
logging to the journal/syslog, i.e. any local services in order to avoid
deadlocks when we lock from PID 1 or its utility processes (such as
generators).
All previous users of LOG_TARGET_SAFE are switched over to the new
setting. This makes things a bit safer for all, as not even the
SYSTEMD_LOG_TARGET env var can be used to accidentally log to the
journal anymore in these programs.
Apparently O_NONBLOCK is the modern name used in most documentation and
for most cases in our sources. Let's hence replace the old alias
O_NDELAY and stick to O_NONBLOCK everywhere.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, ditto.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a function log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
We use the same check at two places, let's add a tiny helper function
for it, since it's not entirely trivial, and we changed this multiple
times before, and it's a good thing if we can change it in one place
only instead of several.
This ensures that in all threads we fork off in the background in our
code we mask out all signals, so that our thread won't end up getting
signals delivered that the main process should be getting.
We always set the signal mask before forking off the thread, so that the
thread has the right mask set from its earliest existence on.
Instead of compiling those files twice, once for libsystemd and once for
libshared, compile once as a static archive and then link into both.
This reduces the number of meson targets for a man=no build to 1291.
gcrypt_util_sources had to be moved because otherwise they appeared twice
in libshared.so halfproducts, causing an error.
-fvisibility=default is added to libbasic, libshared_static so that the symbols
appear properly in the exported symbol list in libshared.
The advantage is that files are not compiled twice. When configured with -Dman=false,
the ninja target list is reduced from 1588 to 1347 targets. The difference in compilation
time is small (<10%). I think this is because of -O0 and ccache and multiple cores, and
in different settings the compilation time could be reduced. The main advantage is that
errors and warnings are not reported twice.