The code so far logged about some errors but was silent on others. Let's
stream-line that and make the function fully self-logging on all error
conditions.
Similar to the previous commit: in many cases no further fd processing
needs to be done in forked of children before execve() or any of its
flavours are called. In those case we can use FORK_RLIMIT_NOFILE_SAFE
instead.
Whenever we invoke external, foreign code from code that has
RLIMIT_NOFILE's soft limit bumped to high values, revert it to 1024
first. This is a safety precaution for compatibility with programs using
select() which cannot operate with fds > 1024.
This commit adds the call to rlimit_nofile_safe() to all invocations of
exec{v,ve,l}() and friends that either are in code that we know runs
with RLIMIT_NOFILE bumped up (which is PID 1 and all journal code for
starters) or that is part of shared code that might end up there.
The calls are placed as early as we can in processes invoking a flavour
of execve(), but after the last time we do fd manipulations, so that we
can still take benefit of the high fd limits for that.
This is really basic stuff and in a follow-up commit will use it all
across the codebase, including in process-util.[ch] which is in
src/basic/. Hence let's move it back to src/basic/ itself.
This helper sets RLIMIT_NOFILE's soft limit to 1024 (FD_SETSIZE) for
compatibility with apps using select().
The idea is that we use this helper to reset the limit whenever we
invoke foreign code from our own processes which have bumped
RLIMIT_NOFILE high.
This is in many cases redundant, as a similar check is done by various
callers already, but in other cases (where we read the color from a
static table for example), it's nice to let the color check be done by
the table code itself, and since it doesn't hurt in the other cases just
do it again.
it's already part of @ipc, no need to have it in both. Given that @ipc
is much more popular (as it is part of @system-service for example),
let's not define it a second time.
Let's generalize this, so that we can use this in nspawn later on, which
is pretty useful as we need to be able to mask files from the inner
child of nspawn too, where the host's /run/systemd/inaccessible
directory is not visible anymore. Moreover, if nspawn can create these
nodes on its own before the payload this means the payload can run with
fewer privileges.
add new "systemd-run-generator" for running arbitrary commands from the kernel command line as system services using the "systemd.run=" kernel command line switch
Previously, we'd link the unit file into /etc in this case, but that
should only be done if the unit file is not in the search path anyway,
and this is already done implicitly anyway for all enabled unit files,
hence no reason to duplicate this here.
Fixes: #10253
Quite often when we generate objects some fields should only be
generated in some conditions. Let's add high-level support for that.
Matching the existing JSON_BUILD_PAIR() this adds
JSON_BUILD_PAIR_CONDITIONAL() which is very similar, but takes an
additional parameter: a boolean condition. If "true" this acts like
JSON_BUILD_PAIR(), but if false then the whole pair is suppressed.
This sounds simply, but requires a tiny bit of complexity: when complex
sub-variants are used in fields, then we also need to suppress them.
This adds SuccessActionExitStatus= and FailureActionExitStatus= that may
be used to configure the exit status to propagate in when
SuccessAction=exit or FailureAction=exit is used.
When not specified let's also propagate the exit status of the main
process we fork off for the unit.
Let's simplify things and drop the logic that /var/lib/machines is setup
as auto-growing btrfs loopback file /var/lib/machines.raw.
THis was done in order to make quota available for machine management,
but quite frankly never really worked properly, as we couldn't grow the
file system in sync with its use properly. Moreover philosophically it's
problematic overriding the admin's choice of file system like this.
Let's hence drop this, and simplify things. Deleting code is a good
feeling.
Now that regular file systems provide project quota we could probably
add per-machine quota support based on that, hence the btrfs quota
argument is not that interesting anymore (though btrfs quota is a bit
more powerful as it allows recursive quota, i.e. that the machine pool
gets an overall quota in addition to per-machine quota).
We invoke this usually on a temporary path before renaming it into
place. This means the log message is quite suprising as it mentions a
weird path with random characters in it. Hence, let's downgrade the
message in order not to confuse the user.
Otherwise a "reboot" or "poweroff" in the initramfs will have to wait
until systemd-fsck-root.service has completed, which might never happen
if the root device never shows up.