We now have the "BPF" pseudo-controllers. These should never be assumed
to be accessible as /sys/fs/cgroup/<controller> and not through
"cgroup.subtree_control" either, hence always check explicitly before we
go to the file system. We do this through our new CGROUP_MASK_V1 and
CGROUP_MASK_V2 definitions.
Also, when we fail, don't clobber the return value.
This brings the call more in-line with our usual coding style, and
removes surprises.
None of the callers seemed to care about this behaviour.
This was mostly prompted by seeing the expression "in_initrd() && flags
& PROC_CMDLINE_RD_STRICT", which uses & and && without any brackets.
Let's make that a bit more readable and hide all doubts about operator
precedence.
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.
In order to implement this all serialization functions are move to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.
While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end.
journald calls fd_get_path() a lot (it probably shouldn't, there's some
room for improvement there, but I'll leave that for another time), hence
it's worth optimizing the call a bit, in particular as it's easy.
Previously we'd open the dir /proc/self/fd/ first, before reading the
symlink inside it. This means the whole function requires three system
calls: open(), readlinkat(), close(). The reason for doing it this way
is to distinguish the case when we see ENOENT because /proc is not
mounted and the case when the fd doesn't exist.
With this change we'll directly go for the readlink(), and only if that
fails do an access() to see if /proc is mounted at all.
This optimizes the common case (where the fd is valid and /proc
mounted), in favour of the uncommon case (where the fd doesn#t exist or
/proc is not mounted).
I noticed while profiling journald that we invoke readlinkat() a ton on
open /proc/self/fd/<fd>, and that the returned paths are more often than
not longer than the 99 chars used before, when we look at archived
journal files. This means for these cases we generally need to execute
two rather than one syscalls.
Let's increase the buffer size a tiny bit, so that we reduce the number
of syscalls executed. This is really a low-hanging fruit of
optimization.
Our current set of flags allows an option to be either
use just in initrd or both in initrd and normal system.
This new flag is intended to be used in the case where
you want apply some settings just in initrd or just
in normal system.
This is useful for a couple of cases, I'm mostly interested in case #1:
1. Verifying "reasonable" values in a trivially scriptable way
2. Debugging unexpected time span parsing directly
Test Plan:
```
% build/systemd-analyze timespan 20
Original: 20
μs: 20
Human: 20us
% build/systemd-analyze timespan 20ms
Original: 20ms
μs: 20000
Human: 20ms
% build/systemd-analyze timespan 20z
Failed to parse time span '20z': Invalid argument
```
This is the counterpiece to the boot counting implemented in
systemd-boot: if a boot is detected as successful we mark drop the
counter again from the booted snippet or kernel image.
For the definition of assert_cc() we try to use static_assert and check
for it with "#ifdef". But that can only work if assert.h is imported
before. Hence let's do so.
This is a nice little optimization when using static const strings: we
can now use them directly as JsonVariant objecs, without any additional
allocation.
Simply as a safety precaution so that json objects we read are not
arbitrary amounts deep, so that code that processes json objects
recursively can't be easily exploited (by hitting stack limits).
Follow-up for oss-fuzz#10908
(Nice is that we can accomodate for this counter without increasing the
size of the JsonVariant object.)
Let's move things around a bit, so that the trailing unused whitespace
within the structure due to padding is placed together, so that it is
easier to use for new fields. (Found with pahole)