It was only used in one place, where we don't actually need it, and
it is too easy to forget to update it when adding new items to the table.
Let's just drop it.
systemd only uses functions that are as of Linux 4.15+ provided
externally to the CPU controller (currently usage_usec), so if we have a
new enough kernel, we don't need to set CGROUP_MASK_CPU for
CPUAccounting=true as the CPU controller does not necessarily need to be
enabled in this case.
Part of this patch is modelled on an earlier patch by Ryutaroh Matsumoto
(see PR #9665).
I decided to use a separate definition for this because it's too easy to return
positive from functions which don't need this distinction and only return
negative on error and success otherwise.
If we create a cgroup in one controller it might already have been
created in another too, if we have jointly mounted controllers. Take
that into consideration.
The function takes a pointer to a random block of memory and
the length of that block. It shouldn't crash every time it sees
a zero byte at the beginning there.
This should help the dev-kmsg fuzzer to keep going.
With gcc-7.1.1-3.fc26.aarch64:
../src/basic/json.c: In function ‘json_format’:
../src/basic/json.c:1409:40: warning: comparison is always true due to limited range of data type [-Wtype-limits]
if (*q >= 0 && *q < ' ')
^~
../src/basic/json.c: In function ‘inc_lines_columns’:
../src/basic/json.c:1762:31: warning: comparison is always true due to limited range of data type [-Wtype-limits]
} else if (*s >= 0 && *s < 127) /* Process ASCII chars quickly */
^~
Cast to (signed char) silences the warning, but a cast to (int) for some reason
doesn't.
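As a minimal sketch of the workaround (the helper name is hypothetical and
not part of json.c): where plain char is unsigned, as on aarch64, "*q >= 0"
is always true, while a cast to signed char keeps the check meaningful on
both kinds of platform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only: true for bytes 0x00..0x1F.
 * The (signed char) cast makes the ">= 0" check a real comparison even
 * where plain char is unsigned. Converting a value above 127 to signed
 * char is implementation-defined; gcc and clang wrap it to a negative. */
static bool sketch_is_ascii_control(const char *q) {
        assert(q);

        return (signed char) *q >= 0 && *q < ' ';
}
```

Bytes with the high bit set (e.g. the lead byte of a UTF-8 sequence) fail the
first half of the check regardless of the signedness of char.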
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just use "\n\r" always. This nicely
simplifies our code.
This introduces a wrapper around extract_first_word() called
proc_cmdline_extract_first(), which suppresses "rd." parameters
depending on the specified calls.
This allows us to share more code between proc_cmdline_parse_given() and
proc_cmdline_get_key(), and makes it easier to reuse this logic for
other purposes.
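The filtering such a wrapper performs could look roughly like the sketch
below. It operates on a single, already extracted word; the flag names, the
function name and the exact rules are assumptions made for illustration, not
the code of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for this sketch. */
enum {
        SKETCH_STRIP_RD_PREFIX = 1 << 0, /* report "rd.foo" as "foo" in the initrd */
        SKETCH_RD_STRICT       = 1 << 1, /* in the initrd, accept only "rd." options */
};

/* Returns the key to hand to the caller, or NULL if the word is suppressed. */
static const char *sketch_filter_word(const char *word, bool in_initrd, unsigned flags) {
        assert(word);

        if (strncmp(word, "rd.", 3) == 0) {
                /* Options meant for the initrd are ignored on the host. */
                if (!in_initrd)
                        return NULL;

                return (flags & SKETCH_STRIP_RD_PREFIX) ? word + 3 : word;
        }

        /* Optionally ignore host options while in the initrd. */
        if ((flags & SKETCH_RD_STRICT) && in_initrd)
                return NULL;

        return word;
}
```

Keeping this decision in one place means every caller that walks the command
line sees the same set of words.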
Normally, we want to immediately quit on ^C. But when we are running under
less, people may set SYSTEMD_LESS without K, in which case they can use ^C to
communicate with less, and e.g. start and stop following input.
Fixes #6405.
All users of the macro (except for one, in serialize.c), use the macro in
connection with read_line(), so they must include fileio.h. Let's not play
libc games and require multiple header files to be included for the most common
use of a function.
The removal of def.h includes is not exact. I mostly went over the commits that
switch over to use read_line() and add def.h at the same time and reverted the
addition of def.h in those files.
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
This lowers the value of DEPTH_MAX, as test-json otherwise fails with a
stack overflow.
Note that the test passes with 8k, but to be safe it is set to 4k here.
Fixes #10738.
This helper is useful to ensure pidns/userns joining is properly
executed (as that requires a fork after the setns()). This is
particularly important when it comes to /proc/self/ access or
SCM_CREDENTIALS, but is generally the safer mode of operation.
Rename rdrand64 to rdrand, and switch from uint64_t to unsigned long.
This produces code that will compile/assemble on both x86-64 and x86-32.
This could be useful when running a 32-bit copy of systemd on a modern
Intel processor.
RDRAND is inherently arch-specific, so relying on the compiler-defined
'long' type seems reasonable.
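A sketch of what a word-sized RDRAND wrapper can look like with gcc or clang.
The function name, the error codes and the CPUID check via <cpuid.h> are
choices made for this illustration, not necessarily what the patch does.

```c
#include <assert.h>
#include <errno.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Sketch only: returns 0 and stores one word on success, -EOPNOTSUPP if the
 * CPU or architecture has no RDRAND, -EAGAIN if the instruction reported
 * failure through the carry flag. */
static int sketch_rdrand(unsigned long *ret) {
        assert(ret);

#if defined(__i386__) || defined(__x86_64__)
        unsigned eax, ebx, ecx, edx;
        unsigned char ok;

        /* CPUID leaf 1, ECX bit 30 advertises RDRAND. */
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0 || !(ecx & (1U << 30)))
                return -EOPNOTSUPP;

        /* With an 'unsigned long' operand the compiler picks a 32-bit
         * register on x86-32 and a 64-bit register on x86-64 for %0. */
        __asm__ __volatile__("rdrand %0; setc %1"
                             : "=r" (*ret), "=qm" (ok)
                             :
                             : "cc");

        return ok ? 0 : -EAGAIN;
#else
        return -EOPNOTSUPP;
#endif
}
```

A caller would typically invoke this in a loop until the requested number of
bytes is filled, and fall back to another source on any error.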
We only use this when we don't require the best randomness. The primary
use case for this is UUID generation, as this means we don't drain
randomness from the kernel pool for them. Since UUIDs are usually not
secrets RDRAND should be good enough for them to avoid real-life
collisions.
Originally, the high_quality_required boolean argument controlled two
things: whether to extend any random data we successfully read with
pseudo-random data, and whether to return -ENODATA if we couldn't read
any data at all.
The boolean got replaced by RANDOM_EXTEND_WITH_PSEUDO, but this name
doesn't really cover the second part nicely. Moreover hiding both
changes of behaviour under a single flag is confusing. Hence, let's
split this part off under a new flag, and use it from random_bytes().
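To illustrate the idea of two independent flags replacing one boolean, here
is a small sketch. Both flag names, the helper and the -EAGAIN return are
hypothetical; only the -ENODATA case is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flags, one per behaviour the old boolean used to bundle. */
enum {
        SKETCH_EXTEND_WITH_PSEUDO = 1 << 0, /* pad short reads with pseudo-random bytes */
        SKETCH_FAIL_IF_EMPTY      = 1 << 1, /* return -ENODATA if nothing could be read */
};

/* 'got' is how many of the 'n' requested bytes the genuine source delivered. */
static int sketch_finish_read(unsigned char *buf, size_t n, size_t got, unsigned flags) {
        assert(buf || n == 0);
        assert(got <= n);

        if (got == 0 && n > 0 && (flags & SKETCH_FAIL_IF_EMPTY))
                return -ENODATA;

        if (got < n) {
                if (!(flags & SKETCH_EXTEND_WITH_PSEUDO))
                        return -EAGAIN; /* assumption: the caller retries */

                for (size_t i = got; i < n; i++)
                        buf[i] = (unsigned char) rand();
        }

        return 0;
}
```

With separate bits a caller can ask for either behaviour alone, which a single
boolean could not express.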
This should normally not happen, but given that the man page suggests
something about this in the context of interruption, let's handle this
and propagate an I/O error.
It's more descriptive, since we also have a function random_bytes()
which sounds very similar.
Also rename pseudorandom_bytes() to pseudo_random_bytes(). This way the
two functions are nicely systematic, one returning genuine random bytes
and the other pseudo random ones.
It's cheap to get RDRAND and given that srand() is anyway not really
useful for trusted randomness let's use RDRAND for it, after all we have
all the hard work for that already in place.
Otherwise comparing a CGroupMask (which is unsigned in effect)
with the result of CGROUP_CONTROLLER_TO_MASK() will result in warnings
about signedness differences.
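The gist, sketched with stand-in types (the Sketch* names are made up for
this example): shifting "1U" instead of "1" makes the macro's result unsigned,
so it can be compared with an unsigned mask without a signedness warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real controller enum and mask type. */
typedef enum {
        SKETCH_CONTROLLER_CPU,
        SKETCH_CONTROLLER_IO,
        SKETCH_CONTROLLER_MEMORY,
} SketchController;

typedef unsigned SketchMask;

/* "1U" rather than "1": the result has type unsigned, like SketchMask. */
#define SKETCH_CONTROLLER_TO_MASK(c) (1U << (c))

static bool sketch_mask_has(SketchMask mask, SketchController c) {
        assert(c <= SKETCH_CONTROLLER_MEMORY);

        /* Both sides of the comparison are unsigned here. */
        return (mask & SKETCH_CONTROLLER_TO_MASK(c)) == SKETCH_CONTROLLER_TO_MASK(c);
}
```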
We now have the "BPF" pseudo-controllers. These should never be assumed
to be accessible as /sys/fs/cgroup/<controller> and not through
"cgroup.subtree_control" either, hence always check explicitly before we
go to the file system. We do this through our new CGROUP_MASK_V1 and
CGROUP_MASK_V2 definitions.
Also, when we fail, don't clobber the return value.
This brings the call more in line with our usual coding style, and
removes surprises.
None of the callers seemed to care about this behaviour.
This was mostly prompted by seeing the expression "in_initrd() && flags
& PROC_CMDLINE_RD_STRICT", which uses & and && without any brackets.
Let's make that a bit more readable and hide all doubts about operator
precedence.
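For reference, "a && flags & X" parses as "a && (flags & X)", since & binds
more tightly than &&. A flag-test macro spells that out. The sketch below
uses made-up SKETCH_* names and is only one possible way to write such a
helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for this sketch. */
enum {
        SKETCH_STRIP_RD_PREFIX = 1 << 0,
        SKETCH_RD_STRICT       = 1 << 1,
        SKETCH_ALL_FLAGS       = SKETCH_STRIP_RD_PREFIX | SKETCH_RD_STRICT,
};

/* True if every bit in 'flags' is also set in 'v'. */
#define SKETCH_FLAGS_SET(v, flags) ((~(v) & (flags)) == 0)

static bool sketch_skip_host_option(bool in_initrd, unsigned flags) {
        assert((flags & ~(unsigned) SKETCH_ALL_FLAGS) == 0);

        /* Same result as "in_initrd && flags & SKETCH_RD_STRICT", but nobody
         * has to look up the precedence of & versus &&. */
        return in_initrd && SKETCH_FLAGS_SET(flags, SKETCH_RD_STRICT);
}
```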
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.
In order to implement this all serialization functions are moved to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.
While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end).
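A minimal sketch of a length-checked serialization helper. The function name,
the return convention and the 1 MiB value used for the limit are assumptions
for this example; the real LONG_LINE_MAX is defined in the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed limit for this sketch only. */
#define SKETCH_LONG_LINE_MAX (1024U * 1024U)

/* Returns 1 if a line was written, 0 if it was skipped (no value, or too
 * long to be read back safely), negative errno-style value on write error. */
static int sketch_serialize_item(FILE *f, const char *key, const char *value) {
        assert(f);
        assert(key);

        if (!value)
                return 0;

        /* "key=value\n" has to fit into what the reader will accept. */
        if (strlen(key) + 1 + strlen(value) + 1 > SKETCH_LONG_LINE_MAX) {
                fprintf(stderr, "Attempted to serialize overly long item '%s', skipping.\n", key);
                return 0;
        }

        if (fprintf(f, "%s=%s\n", key, value) < 0)
                return -EIO;

        return 1;
}
```

Callers that don't expect failures can cast the result to (void) and rely on
a final flush-and-check of the stream.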
journald calls fd_get_path() a lot (it probably shouldn't, there's some
room for improvement there, but I'll leave that for another time), hence
it's worth optimizing the call a bit, in particular as it's easy.
Previously we'd open the dir /proc/self/fd/ first, before reading the
symlink inside it. This means the whole function requires three system
calls: open(), readlinkat(), close(). The reason for doing it this way
is to distinguish the case when we see ENOENT because /proc is not
mounted and the case when the fd doesn't exist.
With this change we'll directly go for the readlink(), and only if that
fails do an access() to see if /proc is mounted at all.
This optimizes the common case (where the fd is valid and /proc is
mounted), at the expense of the uncommon case (where the fd doesn't exist
or /proc is not mounted).
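The new order of operations, sketched as a standalone function. The name, the
fixed-size output buffer and the choice of -EBADF and -ENOSYS as return values
are assumptions of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: resolve 'fd' to a path in 'buf'. Returns 0 on success, -EBADF if
 * the fd is not open, -ENOSYS if /proc does not appear to be mounted, and
 * another negative errno value otherwise. Overly long paths are silently
 * truncated here to keep the example short. */
static int sketch_fd_get_path(int fd, char *buf, size_t size) {
        char procfs_path[sizeof("/proc/self/fd/") + 3 * sizeof(int)];
        ssize_t n;

        assert(buf);
        assert(size > 0);

        snprintf(procfs_path, sizeof(procfs_path), "/proc/self/fd/%i", fd);

        /* Common case: a single syscall. */
        n = readlink(procfs_path, buf, size - 1);
        if (n >= 0) {
                buf[n] = 0;
                return 0;
        }
        if (errno != ENOENT)
                return -errno;

        /* ENOENT is ambiguous, so only now spend a second syscall to tell
         * "fd not open" apart from "/proc not mounted". */
        if (access("/proc/self/fd/", F_OK) < 0)
                return errno == ENOENT ? -ENOSYS : -errno;

        return -EBADF;
}
```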
I noticed while profiling journald that we invoke readlinkat() a ton on
open /proc/self/fd/<fd>, and that the returned paths are more often than
not longer than the 99 chars used before, when we look at archived
journal files. This means for these cases we generally need to execute
two syscalls rather than one.
Let's increase the buffer size a tiny bit, so that we reduce the number
of syscalls executed. This is really a low-hanging fruit of
optimization.
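Why the initial size matters can be seen in a sketch of the usual
grow-and-retry loop around readlink(). The function below is hypothetical: it
takes the initial size as a parameter and counts syscalls purely so the effect
can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: read a symlink into a freshly allocated, NUL-terminated buffer.
 * readlink() cannot report the needed size, so a result that fills the
 * buffer has to be treated as "possibly truncated" and retried with a
 * larger buffer. Every retry costs one more syscall. */
static int sketch_readlink_malloc(const char *path, size_t initial_size, char **ret, unsigned *ret_calls) {
        size_t size = initial_size;
        unsigned calls = 0;

        assert(path);
        assert(ret);
        assert(initial_size >= 2);

        for (;;) {
                char *buf = malloc(size);
                ssize_t n;

                if (!buf)
                        return -ENOMEM;

                n = readlink(path, buf, size - 1);
                calls++;
                if (n < 0) {
                        int saved = errno; /* free() may clobber errno */
                        free(buf);
                        return -saved;
                }

                if ((size_t) n < size - 1) {
                        buf[n] = 0;
                        *ret = buf;
                        if (ret_calls)
                                *ret_calls = calls;
                        return 0;
                }

                free(buf);
                size *= 2;
        }
}
```

For a 150 character link target, starting at 100 bytes takes two readlink()
calls, while any initial size above 151 bytes takes one.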
Our current set of flags allows an option to be either
used just in initrd or both in initrd and normal system.
This new flag is intended to be used in the case where
you want to apply some settings just in initrd or just
in normal system.
This is useful for a couple of cases, I'm mostly interested in case #1:
1. Verifying "reasonable" values in a trivially scriptable way
2. Debugging unexpected time span parsing directly
Test Plan:
```
% build/systemd-analyze timespan 20
Original: 20
μs: 20
Human: 20us
% build/systemd-analyze timespan 20ms
Original: 20ms
μs: 20000
Human: 20ms
% build/systemd-analyze timespan 20z
Failed to parse time span '20z': Invalid argument
```
This is the counterpiece to the boot counting implemented in
systemd-boot: if a boot is detected as successful we drop the
counter again from the booted snippet or kernel image.