This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
Let's fork off sync() ina process instead of a thread, as a safety
measure. This is beneficial to ensure that the original process can exit
without having to wait for the sync() to finish (note that the kernel
will delay process termination until all threads finished their
syscalls). In case of hanging NFS this increases the chance that PID 1
can safely transition to the "systemd-shutdown" process as the sync() is
initiated early on but definitely not waited for.
systemd creates several device nodes in /run/systemd/inaccessible/.
This makes CGroup's settings related to IO can take device node
files in the directory.
If multiple RestrictAddressFamilies= settings, some of them are
whitelist and the others are blacklist, are sent to bus, then parsing
result was corrupted.
This fixes the parse logic, now it is the same as one used in
load-fragment.c
If multiple SystemCallFilter= settings, some of them are whitelist
and the others are blacklist, are sent to bus, then the parse
result was corrupted.
This fixes the parse logic, now it is the same as one used in
load-fragment.c
Up until now, the behaviour in systemd has (mostly) been to silently
ignore failures to action unit directives that refer to an unavailble
controller. The addition of AssertControlGroupController and its
conditional counterpart allow explicit specification of the desired
behaviour when such a situation occurs.
As for how this can happen, it is possible that a particular controller
is not available in the cgroup hierarchy. One possible reason for this
is that, in the running kernel, the controller simply doesn't exist --
for example, the CPU controller in cgroup v2 has only recently been
merged and was out of tree until then. Another possibility is that the
controller exists, but has been forcibly disabled by `cgroup_disable=`
on the kernel command line.
In future this will also support whatever comes out of issue #7624,
`DefaultXAccounting=never`, or similar.
This code is executed before we parse command line/configuration
parameters, hence let's not use arg_system to figure our how to clean up
things, but instead PID == 1. Let's move that check inside of the
function, to make things a bit more robust abstract from the outside.
Also, let's add a log message about this, that was so far missing.
Let's place them in initialize_runtime(), where they appear to fit best.
Effectively this is just a move a little bit down, swapping places with
log_execution_mode(), which should require neither call to be done
first.
Note that changes the conditionalization a bit for these calls, from
(PID == 1) to (arg_system && arg_action == ACTION_RUN). At this point this is pretty much the same
however, as we don't allow PID 1 without ACTION_RUN and without
arg_system set, safety_checks() ensures that.
It's part of finalizing our runtime parameters, hence let's move this
into load_configuration() after we loaded everything else. This is safe,
since we don't use it between the location where it was and where we
place it now yet.
This just sets up some variables the loaded configuration will then
modify. Let's invoke it hence right before loading the configuration.
This moves the initialization just a tiny bit later, but that shouldn't
matter, since we never access it in-between.
Let's make sure that if we are PID 1 we are invoked in ACTION_RUN mode,
and in arg_system mode, as well as the opposite.
Everything else is untested and probably not worth supporting hence
let's bail out early if people try anyway.
Let's shorten main() a bit, and split out everything that loads our
configuration and runtime parameters into a function of its own.
No changes in behaviour.