This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.
This also fixes#6391.
2d79a0bbb9 did that for TimeoutSec=,
89beff89ed did that for JobTimeoutSec=,
and 0004f698df did that for
x-systemd.device-timeout=. But after parsing x-systemd.device-timeout=xxx
we write it out as JobRunningTimeoutSec=xxx. Two options:
- write out JobRunningTimeoutSec=<a very big number>,
- change JobRunningTimeoutSec= to behave like the other options.
I think it would be confusing for JobRunningTimeoutSec= to have different
syntax then TimeoutSec= and JobTimeoutSec=, so this patch implements the
second option.
Fixes#6264, https://bugzilla.redhat.com/show_bug.cgi?id=1462378.
cgroup namespace wasn't useful for delegation because it allowed resource
control interface files (e.g. memory.high) to be written from inside the
namespace - this allowed the namespace parent's resource distribution to be
disturbed by its namespace-scoped children.
A new mount option, "nsdelegate", was added to cgroup v2 to address this issue.
The flag is meangingful only when mounting cgroup v2 in the init namespace and
makes a cgroup namespace a delegation boundary. The kernel feature is pending
for v4.13.
This should have been the default behavior on cgroup namespaces and this commit
makes systemd try "nsdelegate" first when trying to mount cgroup v2 and fall
back if the option is not supported.
Note that this has danger of breaking usages which depend on modifying the
parent's resource settings from the namespace root, which isn't a valid thing
to do, but such usages may still exist.
If an error is encountered in any of the Exec* lines, WorkingDirectory,
SELinuxContext, ApparmorProfile, SmackProcessLabel, Service (in .socket
units), User, or Group, refuse to load the unit. If the config stanza
has support, ignore the failure if '-' is present.
For those configuration directives, even if we started the unit, it's
pretty likely that it'll do something unexpected (like write files
in a wrong place, or with a wrong context, or run with wrong permissions,
etc). It seems better to refuse to start the unit and have the admin
clean up the configuration without giving the service a chance to mess
up stuff.
Note that all "security" options that restrict what the unit can do
(Capabilities, AmbientCapabilities, Restrict*, SystemCallFilter, Limit*,
PrivateDevices, Protect*, etc) are _not_ treated like this. Such options are
only supplementary, and are not always available depending on the architecture
and compilation options, so unit authors have to make sure that the service
runs correctly without them anyway.
Fixes#6237, #6277.
When running systemd-analyze verify I would get a random subset of warnings
(sometimes none, sometimes one or two):
dev-mapper-luks\x2d8db85dcf\x2d6230\x2d4e88\x2d940d\x2dba176d062b31.swap: Unit is bound to inactive unit dev-mapper-luks\x2d8db85dcf\x2d6230\x2d4e88\x2d940d\x2dba176d062b31.device. Stopping, too.
home.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device. Stopping, too.
boot.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-56c56bfd\x2d93f0\x2d48fb\x2dbc4b\x2d90aa67144ea5.device. Stopping, too.
When running with debug on, it's pretty obvious what is happening:
home.mount: Changed dead -> mounted
home.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device. Stopping, too.
home.mount: Trying to enqueue job home.mount/stop/fail
home.mount: Installed new job home.mount/stop as 27
home.mount: Enqueued job home.mount/stop as 27
...
dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Installed new job dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device/start as 47
dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Changed dead -> plugged
dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Job dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device/start finished, result=done
Fixes#2206, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808151.
Commit 74dd6b515f (core: run each system
service with a fresh session keyring) broke adding keys to user keyring.
Added keys could not be accessed with error message:
keyctl_read_alloc: Permission denied
So link the user keyring to our session keyring.
When umounting an NFS filesystem, it is not safe to lstat(2) the mountpoint at
all as that can block indefinitely if the NFS server is down.
umount() will not block, but lstat() will.
This patch therefore removes the call to lstat(2) and defers the handling of
any error to the child process which will issue the umount call.
This extends 2d79a0bbb9 to the kernel
command line parsing.
The parsing is changed a bit to only understand "0" as infinity. If units are
specified, parse normally, e.g. "0s" is just 0. This makes it possible to
provide a zero timeout if necessary.
Simple test is added.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1462378.
Under nspawn, systemd would print:
Got address error code: Operation not permitted
Got address error code: Operation not permitted
Got start error code: Operation not permitted
which is quite unclear out of context. Change that to:
Failed to add address 127.0.0.1 to loopback interface: Operation not permitted
Failed to add address ::1 to loopback interface: Operation not permitted
Failed to bring loopback interface up: Operation not permitted
This is just a cosmetic issue.
Garbage collection of jobs (especially the ones that we create automatically)
is something of an internal implementation detail and should not be made
visible to the users. But it's probably still useful to log this in the
journal, so the code is rearranged to skip one of the messages if we log to the
console and the journal separately, and to keep the message if we log
everything to the console.
Fixes#6254.
Fun fact 1 suggests that a "close()" is needed, but that close() has long since been
removed. So the comment in now meaningless and possibly confusing.
Fun fact 2 refers to a bug that has been fixed in Linux prior to v4.12
Commit: 9fa4eb8e490a ("autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL")
so revise the comment so that no-one goes pointlessly looking for the bug.
The CapabilityBoundingSet option only makes sense if we are running as
PID1.
The system.conf.d(5) manpage, already states that the CapabilityBoundingSet
option:
Controls which capabilities to include in the capability bounding set
for PID 1 and its children.
https://github.com/systemd/systemd/issues/6080
This patch is a bit more complex thant I hoped. In particular the single
IOScheduling= property exposed on the bus is split up into
IOSchedulingClass= and IOSchedulingPriority= (though compat is
retained). Otherwise the asymmetry between setting props and getting
them is a bit too nasty.
Fixes#5613
This also alters the documentation to recommend memfds rather than /run
for serializing state across reboots. That's because /run doesn't
actually have the same lifecycle as the fd store, as it is cleared out
on restarts.
Fixes: #5606
When specifiers are included in the Environment block in StartTransientUnit,
we resolve specifiers on the PID1 side. Nevertheless we store the unresolved
version in the transient unit file, so that it'll be resolved when loading
the unit. I think this looks nicer.
I also removed the writing of the merged Environment block to the transient
file. Afaict, this resulted in variables being written multiple times, but
this needs to be tested properly.
Fixes#5699.
Apart from bugs (as in #6152), this can happen if we ever make
our requirements for environment entries more stringent. As with
the rest of deserialization, we should just warn and continue.
This changes loopback setup to not only start the loopback device but
also add the relevant IP addresses to it. This way, we can synchronously
wait until that's complete, and properly guarantee that loopback setup
is complete at the time we start our first processes.
This is a semi-revert of f3fc48150b, but
heavily updated.
Fixes: #5641
Device is gone and most likely it will get garbage collected. However in
cases when it doesn't get gc'ed (because it is referenced by some
other unit, e.g. mount from fstab) we need to unset sysfs. This is
because when device appears next time, possibly, with different sysfs
path we need to update the sysfs path. Current code could end up caching
stale sysfs path forever.
In reality this is not a problem for normal disks (unless you swap them
during system runtime). However this issue causes failures to mount
filesystems on LVM where sysfs path depends on activation
order (i.e. logical volumes from volume group that is activated first
get assigned lower dm-X numbers and corresponding syspaths).
Fixes#6126.
When a DBus name is released, NameOwnerChanged signal contains an empty string
as new_owner. Commit bbc2908 changed interpretation of the empty string to a
valid name, which is not consistent with values that are sent by dbus-daemon.
As a side effect, this masks symptoms of systemd-logind dbus disconnections
(#2925) by completely restarting it so it can freshly reconnect to dbus.
This reworks timer_enter_waiting() in a couple of ways in order to clean
it up a bit and fix#5629.
Most importantly, we previously we initialized ts_monotonic to either
the current time in CLOCK_MONOTONIC or in CLOCK_BOOTTIME, depending on
t->wake_system. Then given specific conditions we'd use this time as
base for our timers. And afterwards, if t->wake_system was on we'd
convetr the resulting value from CLOCK_MONOTONIC to CLOCK_BOOTTIME again
— which of course is wrong since we already were in CLOCK_BOOTTIME! This
fixes this logic, by using a triple timestamp so that we always have the
right base around, and initially only calculate in CLOCK_MONOTONIC and
only convert as last step.
Conversion between the clocks is now done with the generic
usec_shift_clock(), and additions via usec_add() making these
calculations a bit safer.
Fixes: #5629
Also called "ANSI-C Quoting" in info:(bash) ANSI-C Quoting.
The escaping rules are a POSIX proposal, and are described in
http://austingroupbugs.net/view.php?id=249. There's a lot of back-and-forth on
the details of escaping of control characters, but we'll be only using a small
subset of the syntax that is common to all proposals and is widely supported.
Unfortunately dash and fish and maybe some other shells do not support it (see
the man page patch for a list).
This allows environment variables to be safely exported using show-environment
and imported into the shell. Shells which do not support this syntax will have
to do something like
export $(systemctl show-environment|grep -v '=\$')
or whatever is appropriate in their case. I think csh and fish do not support
the A=B syntax anyway, so the change is moot for them.
Fixes#5536.
v2:
- also escape newlines (which currently disallowed in shell values, so this
doesn't really matter), and tabs (as $'\t'), and ! (as $'!'). This way quoted
output can be included directly in both interactive and noninteractive bash.
This is a fixup of commit a2df3ea4ae.
When there is a running job with JobRunningTimeoutSec= and systemd serializes
its state (e.g. during daemon-reload), the timer event source won't be properly
restored in job_coldplug().
Thus save and serialize begin_running_usec too and reinitialize the timer based
on that value.