It was done for mount units already (see commit 142b8142d7). For the
same reasons and for consistency we should also stop activating automagically
swaps when their device is hot-plugged.
From the bug:
> According to the documentation of systemd.automount if the automoint point is
> automagically created if it doesn't exist yet. This ofcourse means the
> filesystem underneath has to be writable, which for / means not only does
> -.mount need to be started but also systemd-remount-fs.service has to be run,
> which isn't guaranteed by the default automount dependencies.
>
> For .mount units there is an automatic default After= dependency on
> local-fs-pre.target, would probably make sense to do the same for automount
> units to avoid it failing on the corner-case where it has to create directory.
Fixes#13306.
Just as `RuntimeMaxSec=` is supported for service units, add support for
it to scope units. This will gracefully kill a scope after the timeout
expires from the moment the scope enters the running state.
This could be used for time-limited login sessions, for example.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
Fixes: #12035
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
This makes it easy to tell that the function only uses the Unit* for
reporting, and only makes changes to the other argument (which most likely
also points at the same Unit structure) for modifications.
No functional change, just adjusting code to follow the same pattern
everywhere. In particular, never call _verify() on an already loaded unit,
but return early from the caller instead. This makes the code a bit easier
to follow.
Now all unit types define .load. But even if it wasn't defined, we'd need
to call unit_load_fragment_and_dropin() anyway, so this code would not have
worked correctly.
Also, unit_load_fragment_and_dropin() either returns -ENOENT or changes
UNIT_STUB to UNIT_LOADED, so we don't need to repeat this here.
There is a slight functional change when load_state == UNIT_MERGED. Before,
we would not call unit_load_dropin(), but now we do. I'm not sure if this
causes an actual difference in behaviour, but since all other unit types do
this, I think it's better to do the same thing here too.
unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional()
are really the same, with one minor difference in behaviour. Let's drop
the second function.
"_optional" in the name suggests that it's the "dropin" part that is optional.
(Which it is, but in this case, we mean the fragment to be optional.)
I think the new version with a flag is easier to understand.
The default message for ENOSPC is very misleading: it says that the disk is
filled, but in fact the inotify watch limit is the problem.
So let's introduce and use a wrapper that simply calls inotify_add_watch(2) and
which fixes the error message up in case ENOSPC is returned.
PID1 may modified the environment passed by the kernel when it starts
running. Commit 9d48671c62 unset $HOME for
example.
In case PID1 is going to switch to a new root and execute a new system manager
which is not systemd, we should restore the original environment as the new
manager might expect some variables to be set by default (more specifically
$HOME).
This is the most basic consumer of the new systemd-vs-kernel checker,
both acting as a reasonable standalone exerciser of the code, and also
as a way for easy inspection of deviations from systemd internal state.
We currently don't have any mitigations against another privileged user
on the system messing with the cgroup hierarchy, bringing the system out
of line with what we've set in systemd. We also don't have any real way
to surface this to the user (we do have logs, but you have to know to
look in the first place).
There are a few possible solutions:
1. Maintaining our own cgroup tree with the new fsopen API and having a
read-only copy for everyone else. However, there are some
complications on this front, and this may be infeasible in some
environments. I'd rate this as a longer term effort that's tangential
to this patch.
2. Actively checking for changes with {fa,i}notify and changing them
back afterwards to match our configuration again. This is also
possible, but it's also good to have a way to do passive monitoring
of the situation without taking hard action. Also, currently daemons
like senpai do actually need to modify the tree behind systemd's
back (although hopefully this should be more integrated soon).
This patch implements another option, where one can, on demand, monitor
deviations in cgroup memory configuration from systemd's internal state.
Currently the only consumer is `systemd-analyze dump`, but the interface
is generic enough that it can also be exposed elsewhere later (for
example, over D-Bus).
Currently only memory limit style properties are supported, but later I
also plan to expand this out to other properties that systemd should
have ultimate control over.
mkdir -p is called both when setting up the autofs mount, as well
as after being notified that the real mount unit should be called.
However the first mkdir -p is hardcoded with 0555, while the second
uses the value specified to DirectoryMode in the automount unit; the
second mkdir -p is only needed when called from coldplug, so under
normal operation the dirs are incorrectly created with mode 0555.
This replaces the hardcoded 0555 mode with the value of DirectoryMode.
Closes#13683.
Setting the log level based on the signal made sense when signals that
were used were fixed. Since we allow signals to be configured, it doesn't
make sense to log at notice level about e.g. a restart or stop operation
just because the signal used is different.
This avoids messages like:
six.service: Killing process 210356 (sleep) with signal SIGINT.
v2:
- if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary
to preserve backwards compatibility (and keep KillSignal= generally useful).
A later version of the DefaultMemory{Low,Min} patch changed these to
require explicitly setting memory_foo_set, but we only set that in
load-fragment, not dbus-cgroup.
Without these, we may fall back to either DefaultMemoryFoo or
CGROUP_LIMIT_MIN when we really shouldn't.
This is an oversight from https://github.com/systemd/systemd/pull/12332.
Sadly the tests didn't catch it since it requires a real cgroup
hierarchy to see, and it wasn't seen in prod since we're only currently
using DefaultMemoryLow, not DefaultMemoryMin. :-(
As documented in the man-page, readdir() may return a directory entry with
d_type == DT_UNKNOWN. This must be handled for regular filesystems.
dirent_ensure_type() is available to set d_type if necessary. Use it in
some more places.
Without this systemd will fail to boot correctly with nfsroot and some
other filesystems.
Closes#13609