Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).
Transient units require a unique name, that previously had no
configuration file on disk.
A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:
$ systemd-run /bin/ping www.heise.de
Will cause systemd to create a new transient service and run ping in it.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
Internally we store all time values in usec_t, however parse_usec()
actually was used mostly to parse values in seconds (unless explicit
units were specified to define a different unit). Hence, be clear about
this and name the function about what we pass into it, not what we get
out of it.
Previously it was necessary to pull in remote-fs-pre.target to order the
mount units against network.target since the ordering was done
transitively via remote-fs-pre.target.
As network implementations shouldn't need to know about the specific
use-case of network mounts we instead now simply order network.target
against all mounts too. This should make it unnecessary for network
managing services to import remote-fs-pre.target explicitly, as
network.target will now suffice.
This introduces remote-fs-setup.target independently of
remote-fs-pre.target. The former is only for pulling things in, the
latter only for ordering.
The new semantics:
remote-fs-setup.target: is pulled in automatically by all remote mounts.
Shall be used to pull in other units that want to run when at least one
remote mount is set up. Is not ordered against the actual mount units,
in order to allow activation of its dependencies even 'a posteriori',
i.e. when a mount is established outside of systemd and is only picked
up by it.
remote-fs-pre.target: needs to be pulled in automatically by the
implementing service, is otherwise not part of the initial transaction.
This is ordered before all remote mount units.
A service that wants to be pulled in and run before all remote mounts
should hence have:
a) WantedBy=remote-fs-setup.target -- so that it is pulled in
b) Wants=remote-fs-pre.target + Before=remote-fs-pre.target -- so that
it is ordered before the mount point, normally.
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
.mount units coming from /proc/self/mountinfo file are
unmounted after local-fs.target is reached during shutdown.
Problem: .mount units popping up in mountinfo file are
added to systemd without any dependency. For that reason,
they are the first one to be unmounted during shutdown.
Whichever program mounted the file system deserves a
chance to also unmount it. This patch ensures that
/proc/self/mountinfo units will be unmounted after
local-fs.target during shutdown (if they haven't been
unmounted already)
There are very few differences in the implementations of the kill method in the
unit types that have one. Let's unify them.
This does not yet unify unit_kill() with unit_kill_context().
If you enter unit_add_exec_dependencies with m->where = NULL, you'll
very likely end up aborting somewhere under socket_needs_mount.
(When systemd goes to check to see if the journald socket requires your
mount, it'll do path_startswith(path, m->where)... *kaboom*)
This patch should ensure that:
a) both branches in mount_add_one() set m->where, and
b) mount_add_extras() calls unit_add_exec_dependencies() *after*
setting m->where.
Under some circumstances this could lead to a segfault since we we
half-initialized a mount unit, then tried to hook it into the network of
things and while doing that recursively ended up looking at our
half-initialized mount unit again assuming it was fully initialized.
Note: I did s/MANAGER/SYSTEMD/ everywhere, even though it makes the
patch quite verbose. Nevertheless, keeping MANAGER prefix in some
places, and SYSTEMD prefix in others would just lead to confusion down
the road. Better to rip off the band-aid now.
In some cases, like wrong configuration, restarting after error
does not help, so administrator can specify statuses by RestartPreventExitStatus
which will not cause restart of a service.
Sometimes you have non-standart exit status, so this can be specified
by SuccessfulExitStatus.
It made no sense, and since we are documenting the bus calls now and
want to include them in our stability promise we really should get it
cleaned up sooner, not later.
If accessing an automount point triggers more changes to
/proc/self/mountinfo than just to add the directly wanted mount, these
changes can lead to spurious -ENODEV notifications on the automount unit
causing the request to fail when in fact the mount will be setup right
afterwards.
Having information from /proc/self/mountinfo is sufficient to consider a
mount unit loaded.
When there's no mountinfo, the loading of the fragment for the mount
unit is not optional. No extra dependency links must be added when the
loading fails.
https://bugzilla.redhat.com/show_bug.cgi?id=835848
The rule is that units that encapsulate our own code are prefixed with
"systemd-". Since the fsck units invoke our own code, hence add the
missing prefix. Since a long long time the fsck units didn't invoke the
naked fsck binaries anymore, and it is unlikely that this well ever
change. On the opposite: the code in systemd-fsck will probably get more
complex over time to handle fsck progress to plymouth forwarding.
Same for quotacheck (but not quotaon!)
As described in
https://bugs.freedesktop.org/show_bug.cgi?id=50184
the journal currently doesn't set fields such as _SYSTEMD_UNIT
properly for messages coming from processes that have already
terminated. This means among other things that "systemctl status" may
not show some of the output of services that wrote messages just
before they exited.
This patch fixes this by having processes that log to the journal
write their unit identifier to journald when the connection to
/run/systemd/journal/stdout is opened. Journald stores the unit ID
and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained
normally (i.e. from the cgroup). To prevent impersonating another
unit, this information is only used when the caller is root.
This doesn't fix the general problem of getting metadata about
messages from terminated processes (which requires some kernel
support), but it allows "systemctl status" and similar queries to do
the Right Thing for units that log via stdout/stderr.
Instead of generic "Starting..." and "Started" messages for all unit use
type-dependent messages. For example, mounts will announce "Mounting..."
and "Mounted".
Add status messages to units of types that used to be entirely silent
(automounts, sockets, targets, devices). For unit types whose jobs are
instantaneous, report only the job completion, not the starting event.
Socket units with non-instantaneous jobs are rare (Exec*= is not used
often in socket units), so I chose not to print the starting messages
for them either.
This will hopefully give people better understanding of the boot.
The kernel will only notify us of cgroups running empty if no subcgroups
exist anymore. Hence make sure we don't leave our own control/ subcgroup
around longer than necessary.
https://bugzilla.redhat.com/show_bug.cgi?id=818381
RequiresMountsFor= is a shortcut for adding requires and after
dependencies to all mount units neeed for the specified paths.
This solves a couple of issues regarding dep loop cycles for encrypted
swap.
Type=idle is much like Type=simple, however between the fork() and the
exec() in the child we wait until PID 1 informs us that no jobs are
left.
This is mostly a cosmetic fix to make gettys appear only after all boot
output is finished and complete.
Note that this does not impact the normal job logic as we do not delay
the completion of any jobs. We just delay the invocation of the actual
binary, and only for services that otherwise would be of Type=simple.
The ability to set MountAuto=no and SwapAuto=no was useful during the
adoption phase of systemd, so that distributions could stick to their
classic mount scripts a bit longer. It is about time to get rid of it
now.
Previously, we were brutally and onconditionally killing all processes
in a service's cgroup before starting the service anew, in order to
ensure that StartPre lines cannot be misused to spawn long-running
processes.
On logind-less systems this has the effect that restarting sshd
necessarily calls all active ssh sessions, which is usually not
desirable.
With this patch control processes for a service are placed in a
sub-cgroup called "control/". When starting a service anew we simply
kill this cgroup, but not the main cgroup, in order to avoid killing any
long-running non-control processes from previous runs.
https://bugzilla.redhat.com/show_bug.cgi?id=805942
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.