Also add a bit of debugging output to help diagnose problems,
add missing units, and simplify cppflags.
Move test-engine to normal tests from manual tests, it should now
work without destroying the system.
As Zbigniew pointed out a new ConditionFirstBoot= appears like the nicer
way to hook in systemd-firstboot.service on first boots (those with /etc
unpopulated), so let's do this, and get rid of the generator again.
A new tool "systemd-firstboot" can be used either interactively on boot,
where it will query basic locale, timezone, hostname, root password
information and set it. Or it can be used non-interactively from the
command line when prepareing disk images for booting. When used
non-inertactively the tool can either copy settings from the host, or
take settings on the command line.
$ systemd-firstboot --root=/path/to/my/new/root --copy-locale --copy-root-password --hostname=waldi
The tool will be automatically invoked (interactively) now on first boot
if /etc is found unpopulated.
This also creates the infrastructure for generators to be notified via
an environment variable whether they are running on the first boot, or
not.
Only accept cpu quota values in percentages, get rid of period
definition.
It's not clear whether the CFS period controllable per-cgroup even has a
future in the kernel, hence let's simplify all this, hardcode the period
to 100ms and only accept percentage based quota values.
Similar to CPUShares= and BlockIOWeight= respectively. However only
assign the specified weight during startup. Each control group
attribute is re-assigned as weight by CPUShares=weight and
BlockIOWeight=weight after startup. If not CPUShares= or
BlockIOWeight= be specified, then the attribute is re-assigned to each
default attribute value. (default cpu.shares=1024, blkio.weight=1000)
If only CPUShares=weight or BlockIOWeight=weight be specified, then
that implies StartupCPUShares=weight and StartupBlockIOWeight=weight.
Running systemctl enable/disable/set-default/... with the --root
option under strace reveals that it accessed various files and
directories in the main fs, and not underneath the specified root.
This can lead to correct results only when the layout and
configuration in the container are identical, which often is not the
case. Fix this by adding the specified root to all file access
operations.
This patch does not handle some corner cases: symlinks which point
outside of the specified root might be interpreted differently than
they would be by the kernel if the specified root was the real root.
But systemctl does not create such symlinks by itself, and I think
this is enough of a corner case not to be worth the additional
complexity of reimplementing link chasing in systemd.
Also, simplify the code in a few places and remove an hypothetical
memory leak on error.
safe_close_pair() is more like safe_close(), except that it handles
pairs of fds, and doesn't make and misleading allusion, as it works
similarly well for socketpairs() as for pipe()s...
A service with PrivateNetwork= cannot access abstract namespace sockets
of the host anymore, hence let's better not use abstract namespace
sockets for this, since we want to make sure that PrivateNetwork=
is useful and doesn't break sd_notify().
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
The system state knows the states starting →
running/degraded/maintenance → stopping, where:
starting = system startup
running = normal operation
degraded = at least one unit is currently in failed state
maintenance = rescue/emergency mode is active or queued
stopping = system shutdown
When the manager receives a SIGUSR2 signal, it opens a memory stream
with open_memstream(), uses the returned file handle for logging, and
dumps the logged content with log_dump().
However, the char* buffer is only safe to use after the file handle has
been flushed with fflush, as the man pages states:
When the stream is closed (fclose(3)) or flushed (fflush(3)), the
locations pointed to by ptr and sizeloc are updated to contain,
respectively, a pointer to the buffer and the current size of the
buffer.
These values remain valid only as long as the caller performs no
further output on the stream. If further output is performed, then the
stream must again be flushed before trying to access these variables.
Without that call, dump remains NULL and the daemon crashes in
log_dump().
This is primarily useful for services that need to track clients which
reference certain objects they maintain, or which explicitly want to
subscribe to certain events. Something like this is done in a large
number of services, and not trivial to do. Hence, let's unify this at
one place.
This also ports over PID 1 to use this to ensure that subscriptions to
job and manager events are correctly tracked. As a side-effect this
makes sure we properly serialize and restore the track list across
daemon reexec/reload, which didn't work correctly before.
This also simplifies how we distribute messages to broadcast to the
direct busses: we only track subscriptions for the API bus and
implicitly assume that all direct busses are subscribed. This should be
a pretty OK simplification since clients connected via direct bus
connections are shortlived anyway.
Previously the returned object of constructor functions where sometimes
returned as last, sometimes as first and sometimes as second parameter.
Let's clean this up a bit. Here are the new rules:
1. The object the new object is derived from is put first, if there is any
2. The object we are creating will be returned in the next arguments
3. This is followed by any additional arguments
Rationale:
For functions that operate on an object we always put that object first.
Constructors should probably not be too different in this regard. Also,
if the additional parameters might want to use varargs which suggests to
put them last.
Note that this new scheme only applies to constructor functions, not to
all other functions. We do give a lot of freedom for those.
Note that this commit only changes the order of the new functions we
added, for old ones we accept the wrong order and leave it like that.
In some cases it is interesting to map a PID to two units at the same
time. For example, when a user logs in via a getty, which is reexeced to
/sbin/login that binary will be explicitly referenced as main pid of the
getty service, as well as implicitly referenced as part of the session
scope.
When a process dies that we can associate with a specific unit, start
watching all other processes of that unit, so that we can associate
those processes with the unit too.
Also, for service units start doing this as soon as we get the first
SIGCHLD for either control or main process, so that we can follow the
processes of the service from one to the other, as long as process that
remain are processes of the ones we watched that died and got reassigned
to us as parent.
Similar, for scope units start doing this as soon as the scope
controller abandons the unit, and thus management entirely reverts to
systemd. To abandon a unit introduce a new Abandon() scope unit method
call.
This reverts commit 28c758de94
but makes job_coldplug smarter.
In (v1) I changed the job start timestamp to be always set, so the
start time can be reported in the cylon eye message. The bug was that
when deserializing jobs, they would be ignored if their start
timestamp was unset which was synonymous with no timeout. But after
the change, jobs would have a start timestamp set despite having no
timeout. After deserialization they would be considered immediately
expired. Fix this by checking if the timeout is not zero when
considering jobs for expiration.
When set to auto, status will shown when the first ephemeral message
is shown (a job has been running for five seconds). Then until the
boot or shutdown ends, status messages will be shown.
No indication about the switch is done: I think it should be clear
for the user that first the cylon eye and the ephemeral messages appear,
and afterwards messages are displayed.
The initial arming of the event source was still wrong, but now should
really be fixed.
It would fire just once.
Also fix units from sec to usec as appropriate.
Decrease the switching interval to 1/3 s, so that when the time
remaining is displayed with 1s precision, it doesn't jump by 2s every
once in a while. Also, the system is feels noticably faster when the
status changes couple of times per second instead of every few
seconds.
Produces output like:
[ *** ] (1 of 2) A start job is running for slow.service (33s / 1min 30s)
The first nubmer is the time since job start, the second is the job timeout.
It is nicer to predefine patterns using configure time check instead of
using casts everywhere.
Since we do not need to use any flags, include "%" in the format instead
of excluding it like PRI* macros.
Information about signals which are not routinely received by systemd
are printed at info level. This should make it easier to see what is
happening in the system.
The only problem is that libgen.h #defines basename to point to it's
own broken implementation instead of the GNU one. This can be fixed
by #undefining basename.
Since we want to retain the ability to break kernel ←→ userspace ABI
after the next release, let's not make use by default of kdbus, so that
people with future kernels will not suddenly break with current systemd
versions.
kdbus support is left in all builds but must now be explicitly requested
at runtime (for example via setting $DBUS_SESSION_BUS). Via a configure
switch the old behaviour can be restored. In fact, we change autogen.sh
to do this, so that git builds (which run autogen.sh) get kdbus by
default, but tarball builds (which ue the configure defaults) do not get
it, and hence this stays out of the distros by default.
This way, we can avoid executing two /bin/swapon jobs to be dispatched
for the same swap device if it is configured for two different paths.
Previously we were just tracking the device nodes of active swap
devices, which would not allow us to recognize the identity of two swap
devices before they are active.
https://bugs.freedesktop.org/show_bug.cgi?id=69835
Given that plymouth listens on an abstract namespace socket and if
CLONE_NEWNET is not used the abstract namespace is shared with the host
we might actually end up send plymouth data to the host.
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
$ touch src/core/dbus.c; make CFLAGS=-O0
make --no-print-directory all-recursive
Making all in .
CC src/core/libsystemd_core_la-dbus.lo
CCLD libsystemd-core.la
$ touch src/core/dbus.c; make CFLAGS=-Og
make --no-print-directory all-recursive
Making all in .
CC src/core/libsystemd_core_la-dbus.lo
src/core/dbus.c: In function 'init_registered_system_bus':
src/core/dbus.c:798:18: warning: 'id' may be used uninitialized in this function [-Wmaybe-uninitialized]
dbus_free(id);
^
CCLD libsystemd-core.la
-Og Optimize debugging experience. -Og enables optimizations that do
not interfere with debugging. It should be the optimization level of
choice for the standard edit-compile-debug cycle, offering a
reasonable level of optimization while maintaining fast compilation
and a good debugging experience.
Also, we need to use proper strv_env_xyz() calls when putting together
the environment array, since otherwise settings won't be properly
overriden.
And let's get rid of strv_appendf(), is overkill and there was only one
user.
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.
This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.
With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.
https://bugs.freedesktop.org/show_bug.cgi?id=69740
Prefer firmware-provided performance data over loader-exported ones; if
ACPI data is available, always use it, otherwise try to read the loader
data.
The firmware-provided variables start at the time the first EFI image
is executed and end when the operating system exits the boot services;
the (loader) time calculated in systemd-analyze increases.
Stop importing non-sensical kernel-exported variables. All
parameters in the kernel command line are exported to the
initial environment of PID1, but suppressed if they are
recognized by kernel built-in code. The EFI booted kernel
will add further kernel-internal things which do not belong
into userspace.
The passed original environ data of the process is not touched
and preserved across re-execution, to allow external reading of
/proc/self/environ for process properties like container*=.
Make Type=idle communication bidirectional: when bootup is finished,
the manager, as before, signals idling Type=idle jobs to continue.
However, if the boot takes too long, idling jobs signal the manager
that they have had enough, wait a tiny bit more, and continue, taking
ownership of the console. The manager, when signalled that Type=idle
jobs are done, makes a note and will not write to the console anymore.
This is a cosmetic issue, but quite noticable, so let's just fix it.
Based on Harald Hoyer's patch.
https://bugs.freedesktop.org/show_bug.cgi?id=54247http://unix.stackexchange.com/questions/51805/systemd-messages-after-starting-login/
Since we'll unload all units/job during a reload, and then readd them it
is really useful for clients to be aware of this phase hence sent a
signal out before and after. This signal is called "Reloading" (despite
the fact that it is also sent out during reexecution, which we consider
a special case in this context) and has one boolean parameter which is
true for the signal sent before the reload, and false for the signal
after the reload. The UnitRemoved/JobRremoved and UnitNew/JobNew due to
the reloading are guranteed to be between the pair of Reloading
messages.
Since we should allow registering/unregistering transient units with the
same name in a tight-loop, we need to make the GC more aggressive, so
that dead units are cleaned up immediately instead of later.
hence, execute the GC sweep on every event loop iteration and clean up
units. This of course, means we need to be careful with adding units to
the GC queue, which we already are since we execute check_gc() of each
unit type already when adding something to the queue.
Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).
Transient units require a unique name, that previously had no
configuration file on disk.
A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:
$ systemd-run /bin/ping www.heise.de
Will cause systemd to create a new transient service and run ping in it.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
This complements existing functionality of setting variables
through 'systemctl set-environment', the kernel command line,
and through normal environment variables for systemd in session
mode.
This will add another color to the legend called "Loading unit files"
Like the generators it will mark a part of the systemd bar indicating
the time spent while loading unit files.
This partially reverts commit c3a170f3, which moved
efi_get_boot_timestamps too early in main(), before
/sys is assured to be mounted
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64371
[tomegun: in particular /sys/firmware/efi/efivars needs to be
mounted, which is not a problem if a systemd-initramfs containing
the correct module is being used. But not everyone uses an
initramfs...]
When a trigger unit wants to know if a stop is queued for it, we should
just check precisely that and do not check whether it is actually
stopped already. This is because we use these checks usually from state
change calls where the state variables are not updated yet.
This change splits unit_pending_inactive() into two calls
unit_inactive_or_pending() and unit_stop_pending(). The former checks
state and pending jobs, the latter only pending jobs.
The time for systemd initialization and selinux policy loading
is accounted to the initrd or the kernel, which is wrong.
Instead of:
Startup finished in 5.559s (firmware) + 36ms (loader) + 665ms (kernel) +
975ms (initrd) + 1.410s (userspace) = 8.647s
the more correct output is:
Startup finished in 5.559s (firmware) + 36ms (loader) + 665ms (kernel) +
475ms (initrd) + 1.910s (userspace) = 8.647s
They are irrelevant and misleading.
E.g. systemd-analyze:
Startup finished in 6d 4h 15min 32.330s (kernel) + 49ms 914us (userspace) = 6d 4h 15min 32.380s
becomes
Startup finished in 53.735ms (userspace) = 53.735ms
which looks much better :)
When switching root, i.e. LANG can be set to the locale of the initramfs
or "C", if it was unset. When systemd deserializes LANG in the real root
this would overwrite the setting previously gathered by locale_set().
To reproduce, boot with an initramfs without locale.conf or change
/etc/locale.conf to a different language than the initramfs and check a
daemon started by systemd:
$ tr "$\000" '\n' </proc/$(pidof sshd)/environ | grep LANG
LANG=C
To prevent that, serialization of environment variables is skipped, when
serializing for switching root.
https://bugzilla.redhat.com/show_bug.cgi?id=949525
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
Instead of outputting "5h 55s 50ms 3us" we'll now output "5h
55.050003s". Also, while outputting the accuracy is configurable.
Basically we now try use "dot notation" for all time values > 1min. For
>= 1s we use 's' as unit, otherwise for >= 1ms we use 'ms' as unit, and
finally 'us'.
This should give reasonably values in most cases.
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
Harald encountered division by zero in manager_print_jobs_in_progress.
Clearly we had the watch enabled when we shouldn't - there were no
running jobs in m->jobs, only waiting ones. This is either a deadlock,
or maybe some of them would be detected as runnable in the next dispatch
of the run queue. In any case we mustn't crash.
Fix it by starting and stopping the watch based on n_running_jobs
instead of the number of all jobs.
All active units will call unit_notify() during coldplug, so we just
make sure we're counting from zero again and get the correct result for
n_on_console.
For n_running_jobs we likewise reset it to zero and then count
the running jobs as we encounter them in deserialization.
Sometimes the boot gets stuck until a timeout hits. The usual timeouts
are on the order of minutes, so users may lose patience.
Print animated status messages telling the names of units with running
jobs to make it easy to see what systemd is waiting for.
The animation looks cooler with a shorter interval, but 1 s is OK and
should not be too hard on slow serial console users.
unit_status_printf() checks the state of the manager, not of the unit
as such. Move it to manager.c and rename it to manager_status_printf().
Temporarily keep unit_status_printf as a wrapper macro.
Add a new job mode: replace-irreversibly. Jobs enqueued using this mode
cannot be implicitly canceled by later enqueued conflicting jobs.
They can however still be canceled with an explicit "systemctl cancel"
call.
The ability to start a new unit with 'systemctl start ...' should not
depend on whether there are other units in the directory. Previously,
an additional 'systemctl daemon-reload' would be necessary to tell
systemd to update the list of unit lookup paths.
This allows us to print simple performance data of all parts of the boot now:
- firmware
- boot loader
- kernel
- initrd
- userspace
This only works for bootloaders which support passing TSC data via EFI
variables. As of now that's only gummiboot.
The MESSAGE_ID=... stanza will appear in countless number of places.
It is just too long to write it out in full each time.
Incidentally, this also fixes a typo of MESSSAGE is three places.
As audit is pretty much just a special kind of logging we should treat
it similar, and manage the audit fd in a static variable.
This simplifies the audit fd sharing with the SELinux access checking
code quite a bit.
Note: I did s/MANAGER/SYSTEMD/ everywhere, even though it makes the
patch quite verbose. Nevertheless, keeping MANAGER prefix in some
places, and SYSTEMD prefix in others would just lead to confusion down
the road. Better to rip off the band-aid now.
This only adds the fields to the D-Bus interfaces but doesn't fill them
in with anything useful yet. Gummiboot exposes the necessary bits of
information to use however and as soon as I get my fingers on a proper
UEFI laptop I'll hook up the remaining bits.
Since we want to stabilize the D-Bus interface soon and include it in
the stability promise we should get the last fixes in, hence this change
now.
also a number of minor fixups and bug fixes: spelling, oom errors
that didn't print errors, not properly forwarding error codes,
few more consistency issues, et cetera
glibc/glib both use "out of memory" consistantly so maybe we should
consider that instead of this.
Eliminates one string out of a number of binaries. Also fixes extra newline
in udev/scsi_id
sd_notify() should work for daemons that chroot() as part of their
initilization, hence it's a good idea to use an abstract namespace
socket which is not affected by chroot.
This adds a timeout if the TTY cannot be acquired and makes sure we
always output the question to the console, never to the TTY of the
respective service.
Previously generated units were always placed at the end of the search
path. With this change there will be three unit dirs instead of one, to
place generated entries at the beginning, in the middle and at the end
of the search path:
beginning: for units that need to override all configuration, regardless
of user or vendor. Example use: system-update-generator uses this to
temporarily redirect default.target.
middle: for units that need to override vendor configuration, but not
vendor configuration. Example use: /etc/fstab should override vendor
supplied configuration (think /tmp), but should not override native user
configuration.
end: does not override anything but is available as well. Possible usage
might be to convert D-Bus bus service files to native units but allowing
vendor supplied native units to win.
We need to be able to show the properties even of inactive units.
systemctl loads the unit before getting its properties, but this is racy
as the garbage collector may kick in right after the loading.
Fix it by always loading the unit before handling a message for it.
https://bugzilla.redhat.com/show_bug.cgi?id=814966#c6
Two of our current job types are special:
JOB_TRY_RESTART, JOB_RELOAD_OR_START.
They differ from other job types by being sensitive to the unit active state.
They perform some action when the unit is active and some other action
otherwise. This raises a question: when exactly should the unit state be
checked to make the decision?
Currently the unit state is checked when the job becomes runnable. It's more
sensible to check the state immediately when the job is added by the user.
When the user types "systemctl try-restart foo.service", he really intends
to restart the service if it's running right now. If it isn't running right
now, the restart is pointless.
Consider the example (from Bugzilla[1]):
sleep.service takes some time to start.
hello.service has After=sleep.service.
Both services get started. Two jobs will appear:
hello.service/start waiting
sleep.service/start running
Then someone runs "systemctl try-restart hello.service".
Currently the try-restart operation will block and wait for
sleep.service/start to complete.
The correct result is to complete the try-restart operation immediately
with success, because hello.service is not running. The two original
jobs must not be disturbed by this.
To fix this we introduce two new concepts:
- a new job type: JOB_NOP
A JOB_NOP job does not do anything to the unit. It does not pull in any
dependencies. It is always immediately runnable. When installed to a unit,
it sits in a special slot (u->nop_job) where it never conflicts with
the installed job (u->job) of a different type. It never merges with jobs
of other types, but it can merge into an already installed JOB_NOP job.
- "collapsing" of job types
When a job of one of the two special types is added, the state of the unit
is checked immediately and the job type changes:
JOB_TRY_RESTART -> JOB_RESTART or JOB_NOP
JOB_RELOAD_OR_START -> JOB_RELOAD or JOB_START
Should a job type JOB_RELOAD_OR_START appear later during job merging, it
collapses immediately afterwards.
Collapsing actually makes some things simpler, because there are now fewer
job types that are allowed in the transaction.
[1] Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=753586
Type=idle is much like Type=simple, however between the fork() and the
exec() in the child we wait until PID 1 informs us that no jobs are
left.
This is mostly a cosmetic fix to make gettys appear only after all boot
output is finished and complete.
Note that this does not impact the normal job logic as we do not delay
the completion of any jobs. We just delay the invocation of the actual
binary, and only for services that otherwise would be of Type=simple.
manager.c takes care of the main loop, unit management, signal handling, ...
transaction.c computes transactions.
After split:
manager.c: 65 KB
transaction.c: 40 KB
This makes it obvious that transactions are short-lived. They are created in
manager_add_job() and destroyed after the application of jobs.
It also prepares for a split of the transaction code to a new source.
Split the uninstallation of the job from job_free() into a separate function.
Adjust the callers.
job_free() now only works on unlinked and uninstalled jobs. This enforces clear
thinking about job lifetimes.
job_free() is IMO too helpful when it unlinks the job from the transaction.
The callers should ensure the job is already unlinked before freeing.
The added assertions check if anyone gets it wrong.
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.