Change cunescape() to return a normal error code, so that we can
distuingish OOM errors from parse errors.
This also adds a flags parameter to control whether "relaxed" or normal
parsing shall be done. If set no parse failures are generated, and the
only reason why cunescape() can fail is OOM.
If you have for example ext4 on iscsi devices it is possible to setup
qoutas there. Unfortunately, because such fstab entry contains _netdev,
systemd will not add dependency to quotaon.service.
Because the order of coldplugging is not defined, we can reference a
not-yet-coldplugged unit and read its state while it has not yet been
set to a meaningful value.
This way, already active units may get started again.
We fix this by deferring such actions until all units have been at
least somehow coldplugged.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401
This change introduces a new state "tentative" for device units. Device
units are considered "plugged" when udev announced them, "dead" when
they are not available in the kernel, and "tentative" when they are
referenced in /proc/self/mountinfo or /proc/swaps but not (yet)
announced via udev.
This should fix a race when device nodes (like loop devices) are created
and immediately mounted. Previously, systemd might end up seeing the
mount unit before the device, and would thus pull down the mount because
its BindTo dependency on the device would not be fulfilled.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
Add unit dependencies for dynamic (i. e. not from fstab) mounts. With that,
mount units properly bind to their underlying device, and thus get
automatically stopped/unmounted when the underlying device goes away.
This cleans up stale mounts from unplugged devices.
Thanks to Lennart Poettering for pointing out the fix!
Unit _start() and _stop() implementations can fail with -EAGAIN to delay
execution temporarily. Thus, we should not output status messages before
invoking these calls, but after, and only when we know that the
invocation actually made a change.
After all, mounts below these directories are pretty much guaranteed to
be virtual, and it's hence unnecessary to unmount them during shutdown.
Moreover, in less-priviliged containers we might lack the rights to
unmount them, hence don't even try.
http://lists.freedesktop.org/archives/systemd-devel/2015-January/027113.html
deb6120920 'man: there's actually no "fail" fstab option, but only
"nofail" removed it from our documentation, which I missed.
fstab(5) only mentions "auto", "noauto", and "nofail". Stick to
those three.
strempty() will return an empty string in case the input parameter is
a NULL pointer. The correct test to check for an empty string is
isempty(), so use that instead.
This fixes a regression from commit 17a1c59 ("core/mount: filter out
noauto,auto,nofail,fail options").
We passed the full option string from fstab to /bin/mount. It would in
turn pass the full option string to its helper, if it needed to invoke
one. Some helpers would ignore things like "nofail", but others would
be confused. We could try to get all helpers to ignore those
"meta-options", but it seems better to simply filter them out.
In our model, /bin/mount simply has no business in knowing whether the
mount was configured as fail or nofail, auto or noauto, in the
fstab. If systemd tells invokes a command to mount something, and it
fails, it should always return an error. It seems cleaner to filter
out the option, since then there's no doubt how the command should
behave.
https://bugzilla.redhat.com/show_bug.cgi?id=1177823
This fixes parsing of options in shared/generator.c. Existing code
had some issues:
- it would treate whitespace and semicolons as seperators. fstab(5)
is pretty clear that only commas matter. And the syntax does
not allow for spaces to be inserted in the field in fstab.
Whitespace might be escaped, but then it should not seperate
options. Treat whitespace and semicolons as any other character.
- it assumed that x-systemd.device-timeout would always be followed
by "=". But this is not guaranteed, hasmntopt will return this
option even if there's no value. Uninitialized memory could be read.
- some error paths would log, and inconsistently, some would just
return an error code.
Filtering is split out to a separate function and tests are added.
Similar code paths in other places are adjusted to use the new function.
-n is only allowed for root. /etc/mtab is nowadays almost always a link to /proc/,
so in practice this does not really matter too much, but should allow .mount units
to work in --user mode.
https://bugs.freedesktop.org/show_bug.cgi?id=87602
Containers do not really support .device, .automount or .swap units;
Systems compiled without support for swap do not support .swap units;
Systems without kdbus do not support .busname units.
With this change attempts to start a unsupported unit types will result
in an immediate "unsupported" job result, which is a lot more
descriptive then before. Also, attempts to start device units in
containers will now immediately fail instead of causing jobs to be
enqueued that never go away.
When creating a new mount unit after an event on /proc/self/mountinfo,
check the mount options as well as the fstype to determine if this is a
remote mount that requires network access.
This is an attempt to add it the remote-fs dependencies to a mount unit
if the options change, like when the utab options are picked up after
mountinfo has already been processed. It just adds the remote-fs
dependencies, leaving the local-fs ones in place.
With this change I always get mount units with proper remote-fs
dependencies when mounted with the _netdev option.
Parsing the mount table with libmount races against the mount command,
which will handle the actual mounting before updating utab. This means
the poll event on /proc/self/mountinfo can kick of a reparse in systemd
before the utab information is available.
This change adds in an additional event source using inotify to watch
for changes to utab. It only watches for IN_MOVED_TO events, matching
libmount behavior of always overwriting this file using rename(2).
This does add a second pass through the mount table parsing when utab is
updated.
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
- Rename log_meta() → log_internal(), to follow naming scheme of most
other log functions that are usually invoked through macros, but never
directly.
- Rename log_info_object() to log_object_info(), simply because the
object should be before any other parameters, to follow OO-style
programming style.
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.
For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.
This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
This way, the list of arguments to that function gets more comprehensive,
and we can get around passing lots of NULL and 0 arguments from socket.c,
swap.c and mount.c.
It also allows for splitting up the code in exec_spawn().
While at it, make ExecContext const in execute.c.
Instead of adjusting job timeouts in the core, let fstab-generator
write out a dropin snippet with the appropriate JobTimeout.
x-systemd-device.timeout option is removed from Options= line
in the generated unit.
The functions to write dropins are moved from core/unit.c to
shared/dropin.c, to make them available outside of core.
generator.c is moved to libsystemd-label, because it now uses
functions defined in dropin.c, which are in libsystemd-label.
/etc/mtab should die die die. It's sad enough util-linux still contains
support for it, but we don't have to partake in that charade, so let's
turn this off.
This is in-line with the fact that since years we already have been
"tainting" systemd if we detect /etc/mtab not being a symlink...
Of course, util-linux is currently broken, and still touches /etc/mtab,
weven if we pass "--no-mtab" to it:
https://bugzilla.redhat.com/show_bug.cgi?id=1109367
But hey, let's hope that gets fixed quickly, even if total removal of
/etc/mtab support from util-linux might not happen so quickly...
Let's automatically initialize the kill, exec and cgroup contexts of the
various unit types when the object is constructed, instead of
invididually in type-specific code.
Also, when PrivateDevices= is set, set DevicePolicy= to closed.
Previously the returned object of constructor functions where sometimes
returned as last, sometimes as first and sometimes as second parameter.
Let's clean this up a bit. Here are the new rules:
1. The object the new object is derived from is put first, if there is any
2. The object we are creating will be returned in the next arguments
3. This is followed by any additional arguments
Rationale:
For functions that operate on an object we always put that object first.
Constructors should probably not be too different in this regard. Also,
if the additional parameters might want to use varargs which suggests to
put them last.
Note that this new scheme only applies to constructor functions, not to
all other functions. We do give a lot of freedom for those.
Note that this commit only changes the order of the new functions we
added, for old ones we accept the wrong order and leave it like that.
Given that we now have KillMode=mixed where SIGTERM might kill a smaller
set than SIGKILL we need to make sure to always go explicitly throught
the SIGKILL state to get the right end result.
It is nicer to predefine patterns using configure time check instead of
using casts everywhere.
Since we do not need to use any flags, include "%" in the format instead
of excluding it like PRI* macros.
Also, introduce a new environment variable named $WATCHDOG_PID which
cotnains the PID of the process that is supposed to send the keep-alive
events. This is similar how $LISTEN_FDS and $LISTEN_PID work together,
and protects against confusing processes further down the process tree
due to inherited environment.
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
We now treat passno as boleans in the generators, and don't need this any more. fsck itself
is able to sequentialize checks on the same local media, so in the common case the ordering
is redundant.
It is still possible to force an order by using .d fragments, in case that is desired.
For cifs mount like //server/share, we would get
RequiresMountsFor=/server/share, which probably isn't
harmful, but quite confusing.
Unfortunately a bunch of static functions had to be moved
up, but patch is really one line.
Usually the network is stopped before filesystems are umounted.
Ordering network filesystems before remote-fs.target means that their
unmounting will be performed earlier, and can terminate sucessfully.
https://bugs.freedesktop.org/show_bug.cgi?id=70002
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.
This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.
With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.
https://bugs.freedesktop.org/show_bug.cgi?id=69740
Previously we did operations like attach, trim or migrate only on the
controllers that were enabled for a specific unit. With this changes we
will now do them for all supproted controllers, and fall back to all
possible prefix paths if the specified paths do not exist.
This fixes issues if a controller is being disabled for a unit where it
was previously enabled, and makes sure that all processes stay as "far
down" the tree as groups exist.
These mounts should be kept around and unmounted in the shutdown ramfs.
Currently, we will still attempt to umount these in the final kill spree, but
we should consider avoiding that too. Also, the should_umount function should
be generalised and put into util.c or something like that, but we are still
discussing precisely how.
This makes mount units work like swap units: when the backing device appears
the mount unit will be started.
v2: the device should want the mount unconditionally, not only for DefaultDependencies=yes
Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).
Transient units require a unique name, that previously had no
configuration file on disk.
A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:
$ systemd-run /bin/ping www.heise.de
Will cause systemd to create a new transient service and run ping in it.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
Internally we store all time values in usec_t, however parse_usec()
actually was used mostly to parse values in seconds (unless explicit
units were specified to define a different unit). Hence, be clear about
this and name the function about what we pass into it, not what we get
out of it.
Previously it was necessary to pull in remote-fs-pre.target to order the
mount units against network.target since the ordering was done
transitively via remote-fs-pre.target.
As network implementations shouldn't need to know about the specific
use-case of network mounts we instead now simply order network.target
against all mounts too. This should make it unnecessary for network
managing services to import remote-fs-pre.target explicitly, as
network.target will now suffice.
This introduces remote-fs-setup.target independently of
remote-fs-pre.target. The former is only for pulling things in, the
latter only for ordering.
The new semantics:
remote-fs-setup.target: is pulled in automatically by all remote mounts.
Shall be used to pull in other units that want to run when at least one
remote mount is set up. Is not ordered against the actual mount units,
in order to allow activation of its dependencies even 'a posteriori',
i.e. when a mount is established outside of systemd and is only picked
up by it.
remote-fs-pre.target: needs to be pulled in automatically by the
implementing service, is otherwise not part of the initial transaction.
This is ordered before all remote mount units.
A service that wants to be pulled in and run before all remote mounts
should hence have:
a) WantedBy=remote-fs-setup.target -- so that it is pulled in
b) Wants=remote-fs-pre.target + Before=remote-fs-pre.target -- so that
it is ordered before the mount point, normally.
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
.mount units coming from /proc/self/mountinfo file are
unmounted after local-fs.target is reached during shutdown.
Problem: .mount units popping up in mountinfo file are
added to systemd without any dependency. For that reason,
they are the first one to be unmounted during shutdown.
Whichever program mounted the file system deserves a
chance to also unmount it. This patch ensures that
/proc/self/mountinfo units will be unmounted after
local-fs.target during shutdown (if they haven't been
unmounted already)
There are very few differences in the implementations of the kill method in the
unit types that have one. Let's unify them.
This does not yet unify unit_kill() with unit_kill_context().
If you enter unit_add_exec_dependencies with m->where = NULL, you'll
very likely end up aborting somewhere under socket_needs_mount.
(When systemd goes to check to see if the journald socket requires your
mount, it'll do path_startswith(path, m->where)... *kaboom*)
This patch should ensure that:
a) both branches in mount_add_one() set m->where, and
b) mount_add_extras() calls unit_add_exec_dependencies() *after*
setting m->where.
Under some circumstances this could lead to a segfault since we we
half-initialized a mount unit, then tried to hook it into the network of
things and while doing that recursively ended up looking at our
half-initialized mount unit again assuming it was fully initialized.
Note: I did s/MANAGER/SYSTEMD/ everywhere, even though it makes the
patch quite verbose. Nevertheless, keeping MANAGER prefix in some
places, and SYSTEMD prefix in others would just lead to confusion down
the road. Better to rip off the band-aid now.
In some cases, like wrong configuration, restarting after error
does not help, so administrator can specify statuses by RestartPreventExitStatus
which will not cause restart of a service.
Sometimes you have non-standart exit status, so this can be specified
by SuccessfulExitStatus.
It made no sense, and since we are documenting the bus calls now and
want to include them in our stability promise we really should get it
cleaned up sooner, not later.
If accessing an automount point triggers more changes to
/proc/self/mountinfo than just to add the directly wanted mount, these
changes can lead to spurious -ENODEV notifications on the automount unit
causing the request to fail when in fact the mount will be setup right
afterwards.
Having information from /proc/self/mountinfo is sufficient to consider a
mount unit loaded.
When there's no mountinfo, the loading of the fragment for the mount
unit is not optional. No extra dependency links must be added when the
loading fails.
https://bugzilla.redhat.com/show_bug.cgi?id=835848
The rule is that units that encapsulate our own code are prefixed with
"systemd-". Since the fsck units invoke our own code, hence add the
missing prefix. Since a long long time the fsck units didn't invoke the
naked fsck binaries anymore, and it is unlikely that this well ever
change. On the opposite: the code in systemd-fsck will probably get more
complex over time to handle fsck progress to plymouth forwarding.
Same for quotacheck (but not quotaon!)
As described in
https://bugs.freedesktop.org/show_bug.cgi?id=50184
the journal currently doesn't set fields such as _SYSTEMD_UNIT
properly for messages coming from processes that have already
terminated. This means among other things that "systemctl status" may
not show some of the output of services that wrote messages just
before they exited.
This patch fixes this by having processes that log to the journal
write their unit identifier to journald when the connection to
/run/systemd/journal/stdout is opened. Journald stores the unit ID
and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained
normally (i.e. from the cgroup). To prevent impersonating another
unit, this information is only used when the caller is root.
This doesn't fix the general problem of getting metadata about
messages from terminated processes (which requires some kernel
support), but it allows "systemctl status" and similar queries to do
the Right Thing for units that log via stdout/stderr.
Instead of generic "Starting..." and "Started" messages for all unit use
type-dependent messages. For example, mounts will announce "Mounting..."
and "Mounted".
Add status messages to units of types that used to be entirely silent
(automounts, sockets, targets, devices). For unit types whose jobs are
instantaneous, report only the job completion, not the starting event.
Socket units with non-instantaneous jobs are rare (Exec*= is not used
often in socket units), so I chose not to print the starting messages
for them either.
This will hopefully give people better understanding of the boot.
The kernel will only notify us of cgroups running empty if no subcgroups
exist anymore. Hence make sure we don't leave our own control/ subcgroup
around longer than necessary.
https://bugzilla.redhat.com/show_bug.cgi?id=818381
RequiresMountsFor= is a shortcut for adding requires and after
dependencies to all mount units neeed for the specified paths.
This solves a couple of issues regarding dep loop cycles for encrypted
swap.
Type=idle is much like Type=simple, however between the fork() and the
exec() in the child we wait until PID 1 informs us that no jobs are
left.
This is mostly a cosmetic fix to make gettys appear only after all boot
output is finished and complete.
Note that this does not impact the normal job logic as we do not delay
the completion of any jobs. We just delay the invocation of the actual
binary, and only for services that otherwise would be of Type=simple.
The ability to set MountAuto=no and SwapAuto=no was useful during the
adoption phase of systemd, so that distributions could stick to their
classic mount scripts a bit longer. It is about time to get rid of it
now.
Previously, we were brutally and onconditionally killing all processes
in a service's cgroup before starting the service anew, in order to
ensure that StartPre lines cannot be misused to spawn long-running
processes.
On logind-less systems this has the effect that restarting sshd
necessarily calls all active ssh sessions, which is usually not
desirable.
With this patch control processes for a service are placed in a
sub-cgroup called "control/". When starting a service anew we simply
kill this cgroup, but not the main cgroup, in order to avoid killing any
long-running non-control processes from previous runs.
https://bugzilla.redhat.com/show_bug.cgi?id=805942
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.