This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
When a service exits succesfully and has RemainAfterExit set, its hold
on the console (in m->n_on_console) wasn't released since the unit state
didn't change.
We now treat passno as boleans in the generators, and don't need this any more. fsck itself
is able to sequentialize checks on the same local media, so in the common case the ordering
is redundant.
It is still possible to force an order by using .d fragments, in case that is desired.
The Service type's forbid_restart field was not preserved by
serialization/deserialization, so the fact that the service should not
be restarted after stopping was lost.
If a systemctl stop foo command has been given, but the foo service
has not yet stopped, and then the systemctl --system daemon-reload was
given, then when the foo service eventually stopped, systemd would
restart it.
https://bugs.freedesktop.org/show_bug.cgi?id=69800
Previously we did operations like attach, trim or migrate only on the
controllers that were enabled for a specific unit. With this changes we
will now do them for all supproted controllers, and fall back to all
possible prefix paths if the specified paths do not exist.
This fixes issues if a controller is being disabled for a unit where it
was previously enabled, and makes sure that all processes stay as "far
down" the tree as groups exist.
Previously the specifier calls could only indicate OOM by returning
NULL. With this change they will return negative errno-style error codes
like everything else.
"Scope" units are very much like service units, however with the
difference that they are created from pre-existing processes, rather
than processes that systemd itself forks off. This means they are
generated programmatically via the bus API as transient units rather
than from static configuration read from disk. Also, they do not provide
execution-time parameters, as at the time systemd adds the processes to
the scope unit they already exist and the parameters cannot be applied
anymore.
The primary benefit of this new unit type is to create arbitrary cgroups
for worker-processes forked off an existing service.
This commit also adds a a new mode to "systemd-run" to run the specified
processes in a scope rather then a transient service.
Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).
Transient units require a unique name, that previously had no
configuration file on disk.
A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:
$ systemd-run /bin/ping www.heise.de
Will cause systemd to create a new transient service and run ping in it.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
- This changes all logind cgroup objects to use slice objects rather
than fixed croup locations.
- logind can now collect minimal information about running
VMs/containers. As fixed cgroup locations can no longer be used we
need an entity that keeps track of machine cgroups in whatever slice
they might be located. Since logind already keeps track of users,
sessions and seats this is a trivial addition.
- nspawn will now register with logind and pass various bits of metadata
along. A new option "--slice=" has been added to place the container
in a specific slice.
- loginctl gained commands to list, introspect and terminate machines.
- user.slice and machine.slice will now be pulled in by logind.service,
since only logind.service requires this slice.
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
When a sigchld is received from an alien child, main_pid is set to
0 then service_enter_running calls main_pid_good to check if the
child is running. This incorrectly returned true because
kill(main_pid, 0) would return >= 0.
This fixes an error where a service would die and the cgroup would
become empty but the service would still report as active (running).
We can assume that a service for which a watchdog timeout was triggered
is unresponsive to a clean shutdown. However, it still makes sense to
execute the post-stop cleanup commands that can be configured with
ExecStopPost=. Hence, when the timeout is hit enter STOP_SIGKILL rather
than FINAL_SIGKILL.
Just calling service_enter_dead() does not kill any processes.
As a result, the old process may still be running when the new one is
started.
After a watchdog failure the service is in an undefined state.
Using the normal shutdown mechanism makes no sense. Instead all processes
are just killed and the service can try to restart.
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
Let's say you have two initscripts, A and B:
A contains in its LSB header:
Required-Start: C
and B contains in its LSB header:
Provides: C
When systemd is parsing /etc/rc.d/, depending on the file order, you
can end up with either:
- B is parsed first. An unit "C.service" will be "created" and will be
added as additional name to B.service, with unit_add_name. No bug.
- A is parsed first. An unit "C.service" is created for the
"Required-Start" dependency (it will have no file attached, since
nothing provides this dependency yet). Then B is parsed and when trying
to handle "Provides: C", unit_add_name is called but will fail, because
"C.service" already exists in manager->units. Therefore, a merge should
occur for that case.
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
There are very few differences in the implementations of the kill method in the
unit types that have one. Let's unify them.
This does not yet unify unit_kill() with unit_kill_context().
Dropping the distribution specific #ifdefs in
88516c0c95 broke the .sh suffix stripping
since we now always used the else clause of the rc. check.
We eventually want to drop the rc. prefix stripping, but for now we
assume that no sysv init script uses both an rc. prefix and .sh suffix,
so make the check for the .sh suffix and rc. prefix mutually exclusive.
When watches are installed from the bottom, it is always possible
to race, and miss a file creation event. The race can be avoided
if a watch is first established for a parent directory, and then for
the file in the directory. If the file is created in the time between,
the watch on the parent directory will fire.
Some messages (mostly at debug level) are added to help diagnose
pidfile issues.
Should fix https://bugzilla.redhat.com/show_bug.cgi?id=917075.
Now, actually check if the environment variable names and values used
are valid, before accepting them. With this in place are at some places
more rigid than POSIX, and less rigid at others. For example, this code
allows lower-case environment variables (which POSIX suggests not to
use), but it will not allow non-UTF8 variable values.
All in all this should be a good middle ground of what to allow and what
not to allow as environment variables.
(This also splits out all environment related calls into env-util.[ch])
A watchdog notification may be handled after the watchdog timer was stopped
while stopping the service. As a result the timer is restarted and the
service may be restarted as well.
The watchdog timestamp is initially set during startup in
service_enter_start_post() and cleared when the timer is stopped. Therefore
it can be used as an indication if the timer should be reset.
For services without ExecStop= the state SERVICE_STOP is never entered. as
a result the watchdog timer is not stopped and the service is restarted (if
it is configuered to restart).
Stopping the watchdog timer for SERVICE_STOP_SIGTERM as well fixes this.
This makes sure that a service is not indefinitely restarted in a tight
loop if it fails before it is able to process its socket.
This corrects the breakage introduced with
8d1b002a2e. Shame on me.
We no longer allow early-boot init scripts, however in late boot the
syslog socket and local mounts are established anyway, so let's simplify
our dep graph a bit.
If $syslog doesn't resolve to syslog.target anymore there's no reason to
keep syslog.target around anymore. Let's remove it.
Note that many 3rd party service unit files order themselves after
syslog.target. These will be dangling dependencies now, which should be
unproblematic, however.
Systemd should not introduce any new facilities. Distributions which still
need to support their non-standard/legacy facilities should add them as
patches to their packaging.
The following facilities are no longer recognized:
$x-display-manager
$mail-transfer-agent
$mail-transport-agent
$mail-transfer-agent
$smtp
$null
This target is no longer available:
mail-transfer-agent.target
TARGET_UBUNTU is effectively the same as TARGET_DEBIAN. Given the Ubuntu
is unlikely to use systemd anytime soon there's no point in keeping this
separate.
This remove distro-specific support for early-boot SysV init scripts.
(And leaves support for normal SysV scripts untouched).
If distributions wish to continue to allow early-boot SysV scripts in
their distribution-specific way they should either maintain this patch
downstream manually, or write a generator for them, or simply ship all
those scripts with a .service wrapper.
As it turns out reboot() doesn't actually imply a file system sync, but
only a disk sync. Accordingly, readd explicit sync() invocations
immediately before we invoke reboot().
This is much less dramatic than it might sounds as we umount all
disks/read-only remount them anyway before going down.
This was premarily intended to support the LSB facility $httpd which is
only known by Fedora, and a bad idea since it lacks any real-life
usecase.
Similar, drop support for some other old Fedora-specific facilities.
Also, document the rules for introduction of new facilities, to clarify
the situation for the future.
This commit checks for a usage line which contains [{|]reload[|}"] (to
not errnously match force-reload).
Heuristics like this suck, but it solves a real problem and there
appears to be no better way...
Use cases:
* iptables.service – atomically reload rules without having to flush
them beforehand (which may leave the system insecure if reload fails)
* rpc-nfsd.service – reexport filesystems after /etc/exports update
without completely stopping and restarting nfsd
(In both cases, the actual service is provided by a kernel module and
does not have any associated user-space processes, thus Type=oneshot.)
Instead of doing hand optimized fd bisect arrays just use plain old
hashmaps. Now I can understand my own code again. Yay!
As a side effect this should fix some bad memory accesses caused by
accesses after mmap(), introduced in 189.
Note: I did s/MANAGER/SYSTEMD/ everywhere, even though it makes the
patch quite verbose. Nevertheless, keeping MANAGER prefix in some
places, and SYSTEMD prefix in others would just lead to confusion down
the road. Better to rip off the band-aid now.
In many cases this might have a negative effect since we drop escaping
from strings where we better shouldn't have dropped it.
If unescaping makes sense for some settings we can readd it later again,
on a per-case basis.
https://bugs.freedesktop.org/show_bug.cgi?id=54522
In some cases, like wrong configuration, restarting after error
does not help, so administrator can specify statuses by RestartPreventExitStatus
which will not cause restart of a service.
Sometimes you have non-standart exit status, so this can be specified
by SuccessfulExitStatus.
There is no point in clearing the bits of a "struct stat" when the very
next statement just calls stat or fstat to fill in that same memory.
[zj: two more places]
It made no sense, and since we are documenting the bus calls now and
want to include them in our stability promise we really should get it
cleaned up sooner, not later.
When an automatic restart is already queued, then make subsequent start
jobs wait until the restart can be handled (i.e. after the holdhoff
time), instead of simply fail.
all other dependencies are in 3rd person. Change BindTo= accordingly to
BindsTo=.
Of course, the dependency is widely used, hence we parse the old name
too for compatibility.
With misconfigured mysql, which uses Restart=always, the following two
messages would loop indefinitely and the "systemctl start" would never
finish:
Job pending for unit, delaying automatic restart.
mysqld.service holdoff time over, scheduling restart.
In service_enter_dead() always set the state to SERVICE_FAILED/DEAD first
before setting SERVICE_AUTO_RESTART. This is to allow running jobs to
complete. OnFailure will be also triggered at this point, so there's no
need to do it again from service_stop() (where it was added in commit
f0c7b229).
Note that OnFailure units should better trigger only after giving up
auto-restarting, but that's for another patch to solve.
https://bugzilla.redhat.com/show_bug.cgi?id=832039
This option never made much sense. It was originally intended to make
sure that the usual startup output of sysv scripts goes to the terminal.
However, since SysV scripts started from a terminal would not output to
that terminal, but rather /dev/console this effect was more often than
not actually taking place. Nowadays systemd has much nicer boot time
status output than SysV which makes the sysv output redundant. Finally,
all output of services goes to the journal anyway, and is not lost.
Hence, let's drop this option, and simplify things a bit.
As described in
https://bugs.freedesktop.org/show_bug.cgi?id=50184
the journal currently doesn't set fields such as _SYSTEMD_UNIT
properly for messages coming from processes that have already
terminated. This means among other things that "systemctl status" may
not show some of the output of services that wrote messages just
before they exited.
This patch fixes this by having processes that log to the journal
write their unit identifier to journald when the connection to
/run/systemd/journal/stdout is opened. Journald stores the unit ID
and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained
normally (i.e. from the cgroup). To prevent impersonating another
unit, this information is only used when the caller is root.
This doesn't fix the general problem of getting metadata about
messages from terminated processes (which requires some kernel
support), but it allows "systemctl status" and similar queries to do
the Right Thing for units that log via stdout/stderr.
We want to avoid a deadlock when a service has ExecStartPre= programs
that wait for the job queue to run empty because of Type=idle, but which
themselves keep the queue non-empty because START_PRE was considered
ACTIVATING and hence the job not complete. With this patch we alter the
state translation table so that it is impossible ever to wait for
Type=idle unit, hence removing the deadlock.
UnitPath= is also writable via native units and may be used by generators
to clarify from which file a unit is generated. This patch also hooks up
the cryptsetup and fstab generators to set UnitPath= accordingly.
When service_stop() handles a service in the SERVICE_AUTO_RESTART state,
it calls service_set_state() to transition it to the SERVICE_DEAD state.
However if the service failed, it should transition it to SERVICE_FAILED
instead, which will trigger its OnFailure units. To achieve this, we now
call service_enter_dead() in place of service_set_state(), which will
transition the service to either SERVICE_DEAD or SERVICE_FAILED as is
appropriate.
Also, some misleading comments are adjusted: service_stop() is not only
called on a user request, but also during an automatic restart in order
to handle dependencies.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=45511
Instead of generic "Starting..." and "Started" messages for all unit use
type-dependent messages. For example, mounts will announce "Mounting..."
and "Mounted".
Add status messages to units of types that used to be entirely silent
(automounts, sockets, targets, devices). For unit types whose jobs are
instantaneous, report only the job completion, not the starting event.
Socket units with non-instantaneous jobs are rare (Exec*= is not used
often in socket units), so I chose not to print the starting messages
for them either.
This will hopefully give people better understanding of the boot.
The kernel will only notify us of cgroups running empty if no subcgroups
exist anymore. Hence make sure we don't leave our own control/ subcgroup
around longer than necessary.
https://bugzilla.redhat.com/show_bug.cgi?id=818381
Type=idle is much like Type=simple, however between the fork() and the
exec() in the child we wait until PID 1 informs us that no jobs are
left.
This is mostly a cosmetic fix to make gettys appear only after all boot
output is finished and complete.
Note that this does not impact the normal job logic as we do not delay
the completion of any jobs. We just delay the invocation of the actual
binary, and only for services that otherwise would be of Type=simple.
Previously, we were brutally and onconditionally killing all processes
in a service's cgroup before starting the service anew, in order to
ensure that StartPre lines cannot be misused to spawn long-running
processes.
On logind-less systems this has the effect that restarting sshd
necessarily calls all active ssh sessions, which is usually not
desirable.
With this patch control processes for a service are placed in a
sub-cgroup called "control/". When starting a service anew we simply
kill this cgroup, but not the main cgroup, in order to avoid killing any
long-running non-control processes from previous runs.
https://bugzilla.redhat.com/show_bug.cgi?id=805942
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.