Commit graph

101 commits

Author SHA1 Message Date
Lennart Poettering efdb02375b core: unified cgroup hierarchy support
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.

A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).

It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.

The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.

This patch also removes cg_delete() which is unused now.

On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.

This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.

This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.

The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.

To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.

This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.

When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
2015-09-01 23:52:27 +02:00
Lennart Poettering d79200e26e unit: unify how we assing slices to units
This adds a new call unit_set_slice(), and simplifies
unit_add_default_slice(). THis should make our code a bit more robust
and simpler.
2015-08-31 13:20:43 +02:00
Lennart Poettering 35b7ff80e2 unit: add new macros to test for unit contexts 2015-08-31 13:20:43 +02:00
Lennart Poettering 21b735e798 core: add unit_dbus_interface_from_type() to unit-name.h
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.

Also ports over machinectl to make use of this.
2015-08-28 02:10:10 +02:00
Daniel Mack bbc2908635 core: dbus: track bus names per unit
Currently, PID1 installs an unfiltered NameOwnerChanged signal match, and
dispatches the signals itself. This does not scale, as right now, PID1
wakes up every time a bus client connects.

To fix this, install individual matches once they are requested by
unit_watch_bus_name(), and remove the watches again through their slot in
unit_unwatch_bus_name().

If the bus is not available during unit_watch_bus_name(), just store
name in the 'watch_bus' hashmap, and let bus_setup_api() do the installing
later.
2015-08-06 10:14:41 +02:00
Michal Schmidt d1a34ae9c2 core: fix confusing logging of instantaneous jobs
For instantaneous jobs (e.g. starting of targets, sockets, slices, or
Type=simple services) the log shows the job completion
before starting:

        systemd[1]: Created slice -.slice.
        systemd[1]: Starting -.slice.
        systemd[1]: Created slice System Slice.
        systemd[1]: Starting System Slice.
        systemd[1]: Listening on Journal Audit Socket.
        systemd[1]: Starting Journal Audit Socket.
        systemd[1]: Reached target Timers.
        systemd[1]: Starting Timers.
        ...

The reason is that the job completes before the ->start() method returns
and only then does unit_start() print the "Starting ..." message.
The same thing happens when stopping units.

Rather than fixing the order of the messages, let's just not emit the
Starting/Stopping message at all when the job completes instantaneously.
The job completion message is sufficient in this case.
2015-07-21 15:09:12 +02:00
Lennart Poettering ed10fa8ce2 unit: drop support for pre-v44 job serialization
No distro ships that old systemd versions anyway, hence let's drop
support for live-upgrades for them. Offline updates are still supported.
And live-upgrades will only lose the job queue, hence basically still
work...
2015-05-19 16:41:14 +02:00
Lennart Poettering 67bfdc9771 core: also enforce ratelimiter if we stop a unit due to BindsTo=
This extends on bea355dac9, and extends
the ratelimiter to not only be used for StopWhenUnneeded=1 units but
also for units that have BindsTo= on a unit that is dead.

http://lists.freedesktop.org/archives/systemd-devel/2015-April/030224.html
2015-05-19 16:23:14 +02:00
Lennart Poettering f8a30ce524 core: use bitfield where possible 2015-05-19 16:03:01 +02:00
Lennart Poettering bea355dac9 core: enforce a ratelimiter when stopping units due to StopWhenUnneeded=1
Otherwise we might end up in an endless stop loop.

http://lists.freedesktop.org/archives/systemd-devel/2015-April/030224.html
2015-05-19 16:00:24 +02:00
Lennart Poettering 8b4305c735 unit: move unit_warn_if_dir_nonempty() and friend to unit.c
The call is only used by the mount and automount unit types, but that's
already enough to consider it generic unit functionality, hence move it
out of mount.c and into unit.c.
2015-05-11 22:28:52 +02:00
Lennart Poettering f2341e0a87 core,network: major per-object logging rework
This changes log_unit_info() (and friends) to take a real Unit* object
insted of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed romat strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.

Also contains a couple of other logging improvements:

- Drops a couple of strerror() invocations in favour of using %m.

- Not only .mount units now warn if a symlinks exist for the mount
  point already, .automount units do that too, now.

- A few invocations of log_struct() that didn't actually pass any
  additional structured data have been replaced by simpler invocations
  of log_unit_info() and friends.

- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
  that works like LOG_MESSAGE() but prefixes the message with the unit
  name. Similar, there's now LOG_LINK_MESSAGE() and
  LOG_NETDEV_MESSAGE().

- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
  LOG_NETDEV_INTERFACE() macros have been added that generate the
  necessary per object fields. The old log_unit_struct() call has been
  removed in favour of these new macros used in raw log_struct()
  invocations. In addition to removing one more function call this
  allows generated structured log messages that contain two object
  fields, as necessary for example for network interfaces that are
  joined into another network interface, and whose messages shall be
  indexed by both.

- The LOG_ERRNO() macro has been removed, in favour of
  log_struct_errno(). The latter has the benefit of ensuring that %m in
  format strings is properly resolved to the specified error number.

- A number of logging messages have been converted to use
  log_unit_info() instead of log_info()

- The client code in sysv-generator no longer #includes core code from
  src/core/.

- log_unit_full_errno() has been removed, log_unit_full() instead takes
  an errno now, too.

- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
  avoid double evaluation of their parameters
2015-05-11 22:24:45 +02:00
Lennart Poettering 1c2e9646e4 core: simplify unit type detection logic
Introduce a new call unit_type_supported() and make use of it
everywhere.

Also, drop Manager parameter from per-type supported method prototype.
2015-04-30 01:29:00 +02:00
Lennart Poettering f78f265f40 core: always coldplug units that are triggered by other units before those
Let's make sure that we don't enqueue triggering jobs for units before
those units are actually fully loaded.

http://lists.freedesktop.org/archives/systemd-devel/2015-April/031176.html
https://bugs.freedesktop.org/show_bug.cgi?id=88401
2015-04-24 16:14:46 +02:00
Lennart Poettering be847e82cf Revert "core: do not spawn jobs or touch other units during coldplugging"
This reverts commit 6e392c9c45.

We really shouldn't invent external state keeping hashmaps, if we can
keep this state in the units themselves.
2015-04-24 15:51:10 +02:00
Lennart Poettering 4940c0b0b6 service: make kill operation mapping explicit 2015-04-21 02:17:01 +02:00
Ivan Shapovalov 6e392c9c45 core: do not spawn jobs or touch other units during coldplugging
Because the order of coldplugging is not defined, we can reference a
not-yet-coldplugged unit and read its state while it has not yet been
set to a meaningful value.

This way, already active units may get started again.

We fix this by deferring such actions until all units have been at
least somehow coldplugged.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401
2015-03-07 08:44:57 -05:00
Lennart Poettering 5ad096b3f1 core: expose consumed CPU time per unit
This adds support for showing the accumulated consumed CPU time per-unit
in the "systemctl status" output. The property is also readable via the
bus.
2015-03-02 12:15:25 +01:00
Thomas Hindoe Paaboel Andersen 2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Thomas Hindoe Paaboel Andersen c1ff5570f4 Add missing includes in header files
This fixes various issues found by globally reordering the include
sections of all .c files.
2015-02-12 20:44:32 +01:00
Lennart Poettering a354329f72 core: add new logic for services to store file descriptors in PID 1
With this change it is possible to send file descriptors to PID 1, via
sd_pid_notify_with_fds() which PID 1 will store individually for each
service, and pass via the usual fd passing logic on next invocation.
This is useful for enable daemon reload schemes where daemons serialize
their state to /run, push their fds into PID 1 and terminate, restoring
their state on next start from the data in /run and passed in from PID
1.

The fds are kept by PID 1 as long as no POLLHUP or POLLERR is seen on
them, and the service they belong to are either not dead or failed, or
have a job queued.
2015-01-06 03:16:39 +01:00
Lennart Poettering 0faacd470d unit: handle nicely of certain unit types are not supported on specific systems
Containers do not really support .device, .automount or .swap units;
Systems compiled without support for swap do not support .swap units;
Systems without kdbus do not support .busname units.

With this change attempts to start a unsupported unit types will result
in an immediate "unsupported" job result, which is a lot more
descriptive then before. Also, attempts to start device units in
containers will now immediately fail instead of causing jobs to be
enqueued that never go away.
2014-12-15 19:02:17 +01:00
Torstein Husebø ee33e53a70 core: correct spacing near eol in code comments 2014-12-11 15:09:51 +01:00
Lennart Poettering d2dc52dbc4 systemctl: show unit file preset state in "systemctl status" output" 2014-12-02 13:23:04 +01:00
Michal Schmidt b2dc4e44c5 core: add log_unit_*_errno() macros 2014-11-28 13:29:21 +01:00
Lennart Poettering e2cc6eca73 log: fix order of log_unit_struct() to match other logging calls
Also, while we are at it, introduce some syntactic sugar for creating
ERRNO= and MESSAGE= structured logging fields.
2014-11-28 02:18:46 +01:00
Lennart Poettering 79008bddf6 log: rearrange log function naming
- Rename log_meta() → log_internal(), to follow naming scheme of most
  other log functions that are usually invoked through macros, but never
  directly.

- Rename log_info_object() to log_object_info(), simply because the
  object should be before any other parameters, to follow OO-style
  programming style.
2014-11-27 22:05:24 +01:00
Lennart Poettering 086891e5c1 log: add an "error" parameter to all low-level logging calls and intrdouce log_error_errno() as log calls that take error numbers
This change has two benefits:

- The format string %m will now resolve to the specified error (or to
  errno if the specified error is 0. This allows getting rid of a ton of
  strerror() invocations, a function that is not thread-safe.

- The specified error can be passed to the journal in the ERRNO= field.

Now of course, we just need somebody to convert all cases of this:

        log_error("Something happened: %s", strerror(-r));

into thus:

        log_error_errno(-r, "Something happened: %m");
2014-11-27 22:05:23 +01:00
Lennart Poettering 134e56dcc5 shared: rename condition-util.[ch] to condition.[ch]
Now that we only have one file with condition implementations around, we
can drop the -util suffix and simplify things a bit.
2014-11-06 14:21:11 +01:00
Lennart Poettering 493657337a core: get rid of condition.c and move the remaining call into util.c
That way only one file with condition code remaining, in src/shared/,
rather than src/core/.

Next step: dropping the "-util" suffix from condition-util.[ch].
2014-11-06 14:21:11 +01:00
Lennart Poettering 59fccdc587 core: introduce the concept of AssertXYZ= similar to ConditionXYZ=, but fatal for a start job if not met 2014-11-06 14:21:11 +01:00
Umut Tezduyar Lindskog db2cb23b5b core: send sigabrt on watchdog timeout to get the stacktrace
if sigabrt doesn't do the job, follow regular shutdown
routine, sigterm > sigkill.
2014-10-28 17:37:39 +01:00
Lennart Poettering f189ab18de job: optionally, when a job timeout is hit, also execute a failure action 2014-10-28 02:19:55 +01:00
Zbigniew Jędrzejewski-Szmek 7c52a17b1a Rearrange Unit to make pahole happy
After all we have lots of those.
2014-10-25 15:34:48 -04:00
Lukas Nykryn cb87a73b45 unit: move UnitDependency to unit-name 2014-10-08 12:44:00 +02:00
Lennart Poettering 598459ceba core: rework context initialization/destruction logic
Let's automatically initialize the kill, exec and cgroup contexts of the
various unit types when the object is constructed, instead of
invididually in type-specific code.

Also, when PrivateDevices= is set, set DevicePolicy= to closed.
2014-03-19 21:06:53 +01:00
Lennart Poettering 085afe36cb core: add global settings for enabling CPUAccounting=, MemoryAccounting=, BlockIOAccounting= for all units at once 2014-02-24 23:50:10 +01:00
Lennart Poettering bc432dc7eb core: rework cgroup mask propagation
Previously a cgroup setting down tree would result in cgroup membership
additions being propagated up the tree and to the siblings, however a
unit could never lose cgroup memberships again. With this change we'll
make sure that both cgroup additions and removals propagate properly.
2014-02-17 15:49:21 +01:00
Lennart Poettering a911bb9ab2 core: watch SIGCHLD more closely to track processes of units with no reliable cgroup empty notifier
When a process dies that we can associate with a specific unit, start
watching all other processes of that unit, so that we can associate
those processes with the unit too.

Also, for service units start doing this as soon as we get the first
SIGCHLD for either control or main process, so that we can follow the
processes of the service from one to the other, as long as process that
remain are processes of the ones we watched that died and got reassigned
to us as parent.

Similar, for scope units start doing this as soon as the scope
controller abandons the unit, and thus management entirely reverts to
systemd. To abandon a unit introduce a new Abandon() scope unit method
call.
2014-02-07 15:14:36 +01:00
Zbigniew Jędrzejewski-Szmek 68db7a3bd9 core: add function to tell when job will time out
Things will continue when either the job timeout
or the unit timeout is reached. Add functionality to
access that info.
2014-01-27 01:23:16 -05:00
Lennart Poettering aec8de63b1 core: no need to list properties for PropertiesChanged messages anymore
Since the vtable includes this information anyway, let's just use that
2013-12-22 03:50:52 +01:00
Lennart Poettering e821075a23 bus: add .busname unit type to implement kdbus-style bus activation 2013-12-02 23:32:34 +01:00
Lennart Poettering 613b411c94 service: add the ability for units to join other unit's PrivateNetwork= and PrivateTmp= namespaces 2013-11-27 20:28:48 +01:00
Lennart Poettering d420282b28 core: replace OnFailureIsolate= setting by a more generic OnFailureJobMode= setting and make use of it where applicable 2013-11-26 02:26:31 +01:00
Lennart Poettering eeaedb7c26 core: include following set data in dump 2013-11-25 22:10:22 +01:00
David Strauss 6414b7c981 cgroups: Cache controller masks and optimize queues. 2013-11-22 11:22:47 +10:00
Lennart Poettering 718db96199 core: convert PID 1 to libsystemd-bus
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.

This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:

- Synthesizing of "Disconnected" messages when bus connections are
  severed.

- Support for attaching multiple vtables for the same interface on the
  same path.

This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.

As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
2013-11-20 20:52:36 +01:00
Lennart Poettering 9588bc3209 Remove dead code and unexport some calls
"make check-api-unused" informs us about code that is not used anymore
or that is exported but only used internally. Fix these all over the
place.
2013-11-08 18:12:45 +01:00
Lennart Poettering 44b601bc79 macro: clean up usage of gcc attributes
Always use our own macros, and name all our own macros the same style.
2013-10-16 06:14:59 +02:00
Lennart Poettering a57f7e2c82 core: rework how we match mount units against each other
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.

This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.

This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.

With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.

https://bugs.freedesktop.org/show_bug.cgi?id=69740
2013-09-26 20:20:30 +02:00