
273 commits

Author SHA1 Message Date
Michal Koutný deb4e7080d service: Don't stop unneeded units needed by restarted service (#7526)
An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes.
If A stops before B's restart timeout expires, it'll be started again as part
of B's dependent jobs. However, if stopping takes longer than the timeout, A's
running stop job collides with the start job, which also cancels B's start job.
The result is that neither A nor B is active.

Currently, when a service with automatic restarting fails, it transitions
through the following states:
        1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure,
        2) SERVICE_AUTO_RESTART while restart timer is running.

The StopWhenUnneeded= check takes place in service_enter_dead between the two
states mentioned above. We temporarily store the auto restart flag to query it
during the check. Because we don't return control to the main event loop, this
new service unit flag needn't be serialized.
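A minimal sketch of that check in C, assuming simplified, hypothetical types: `struct sketch_unit` and its `will_auto_restart` field stand in for systemd's real `Unit`/`Service` structures and the temporary auto-restart flag. A unit with StopWhenUnneeded=yes is only stopped if no dependent is active or about to be auto-restarted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a unit; not systemd's real struct. */
struct sketch_unit {
        const char *name;
        bool active;
        bool stop_when_unneeded;
        /* Set only while the unit passes through the dead/failed state
         * on its way to an automatic restart. */
        bool will_auto_restart;
        /* Units that depend on this one. */
        struct sketch_unit **dependents;
        size_t n_dependents;
};

/* A unit is still needed if any dependent is active or about to restart. */
static bool sketch_unit_is_needed(const struct sketch_unit *u) {
        for (size_t i = 0; i < u->n_dependents; i++) {
                const struct sketch_unit *d = u->dependents[i];
                if (d->active || d->will_auto_restart)
                        return true;
        }
        return false;
}

/* Should StopWhenUnneeded= enqueue a stop job for this unit? */
static bool sketch_should_stop_unneeded(const struct sketch_unit *u) {
        return u->stop_when_unneeded && u->active && !sketch_unit_is_needed(u);
}
```

Since the flag is only set between entering the dead/failed state and entering SERVICE_AUTO_RESTART, without returning to the event loop, it never has to survive serialization.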

This patch prevents the pathological situation in which a service with Restart=
won't restart automatically. As a side effect, it also avoids restarting the
dependency unit with StopWhenUnneeded=yes.

Fixes: #7377
2017-12-05 16:51:19 +01:00
Lennart Poettering 2e59b241ca core: add proper escaping to writing of drop-ins/transient unit files
This majorly refactors the transient unit file and drop-in writing
logic, so that we properly C-escape and specifier-escape (% → %%)
everything we write out, so that when we read it back again, no specifiers
are parsed that aren't supposed to be parsed.
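The specifier half of that escaping can be sketched as below. `sketch_specifier_escape()` is a hypothetical helper, not the function this commit adds, and it ignores C-style escaping entirely. Doubling every `%` means a later specifier-expansion pass turns `%%` back into a literal `%` instead of expanding something like `%n`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: double every '%' so that specifier expansion on
 * read-back reproduces the original string literally.
 * Returns a newly allocated string, or NULL on allocation failure. */
static char *sketch_specifier_escape(const char *s) {
        size_t n = 0;
        for (const char *p = s; *p; p++)
                n += (*p == '%') ? 2 : 1;

        char *out = malloc(n + 1);
        if (!out)
                return NULL;

        char *q = out;
        for (const char *p = s; *p; p++) {
                if (*p == '%')
                        *q++ = '%';   /* emit the extra '%' first */
                *q++ = *p;
        }
        *q = '\0';
        return out;
}
```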

This renames unit_write_drop_in() and friends to unit_write_setting().
The name change is supposed to clarify that the functions are not only
used to write drop-in files, but also transient unit files.

The previous "mode" parameter to this function is replaced by a more
generic "flags", which supports additional flags for implicit C-style and
specifier escaping before writing things out. This can cover most
properties where either form of escaping is defined. For the cases where
this isn't sufficient, we add helpers unit_escape_setting() and
unit_concat_strv() for escaping individual strings or strvs properly.

While we are at it, we also prettify generation of transient unit files:
we try to reduce the number of section headers written out: previously
we'd write the right section header out for each setting. With this
change we do so only if the setting lives in a different section than
the one before.

(This should also be considered preparation for when we add proper APIs
to systemd to write normal, persistent unit files through the bus API.)
2017-11-29 12:34:12 +01:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Lennart Poettering 3c7416b6ca core: unify common code for preparing for forking off unit processes
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.

This allows us to neatly unify a bit of code between unit types and
shorten our code.
2017-11-21 11:54:08 +01:00
Lennart Poettering e7dfbb4e74 core: introduce SuccessAction= as unit file property
SuccessAction= is similar to FailureAction= but declares what to do on
success of a unit, rather than on failure. This is useful for running
commands in qemu/nspawn images that shall power down on completion. We
frequently see "ExecStopPost=/usr/bin/systemctl poweroff" or so in unit
files like this. Offer a simple, more declarative alternative for this.

While we are at it, hook up failure action with unit_dump() and
transient units too.
2017-11-20 16:37:22 +01:00
Lennart Poettering 53c35a766f core: generalize FailureAction=, move it from service to unit
All kinds of units can fail, hence it makes sense to offer this as
a generic concept for all unit types.
2017-11-20 16:37:22 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering 5afe510c89 core: add a new unit file setting CollectMode= for tweaking the GC logic
Right now, the option only takes one of two possible values "inactive"
or "inactive-or-failed", the former being the default, and exhibiting the same
behaviour as the status quo ante. If set to "inactive-or-failed", units
may be collected by the GC logic when in the "failed" state too.

This logic should be a nicer alternative to using the "-" modifier for
ExecStart= and friends, as the exit data is collected and logged about
and only removed when the GC comes along. This should be useful in
particular for per-connection socket-activated services, as well as
"systemd-run" command lines that shall leave no artifacts in the
system.

I was thinking about whether to expose this as a boolean, but opted for
an enum instead, as I have the suspicion other tweaks like this might be
added later on, in which case we can extend this setting instead of having
to add yet another one.
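A sketch of how the two CollectMode= values could feed into a GC decision, using hypothetical enum and function names and ignoring every other GC condition (references, pending jobs, and so on). Modelling the setting as an enum means a future mode only adds one enumerator and one `case`.

```c
#include <stdbool.h>

/* Hypothetical mirror of the two CollectMode= values. */
typedef enum SketchCollectMode {
        SKETCH_COLLECT_INACTIVE,            /* "inactive" (default) */
        SKETCH_COLLECT_INACTIVE_OR_FAILED,  /* "inactive-or-failed" */
} SketchCollectMode;

typedef enum SketchActiveState {
        SKETCH_STATE_ACTIVE,
        SKETCH_STATE_INACTIVE,
        SKETCH_STATE_FAILED,
} SketchActiveState;

/* May the GC collect a unit in this state under this mode? */
static bool sketch_may_collect(SketchCollectMode mode, SketchActiveState state) {
        switch (mode) {
        case SKETCH_COLLECT_INACTIVE:
                return state == SKETCH_STATE_INACTIVE;
        case SKETCH_COLLECT_INACTIVE_OR_FAILED:
                return state == SKETCH_STATE_INACTIVE || state == SKETCH_STATE_FAILED;
        }
        return false;
}
```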

Also, let's add some documentation for the GC logic.
2017-11-16 14:38:36 +01:00
Lennart Poettering 7eb2a8a125 unit: rework a bit how we keep the service fdstore from being destroyed during service restart
When preparing for a restart we quickly go through the DEAD/INACTIVE
service state before entering AUTO_RESTART. When doing this, we need to
make sure we don't destroy the FD store. Previously this was done by
checking the failure state of the unit, and keeping the FD store around
when the unit failed, under the assumption that the restart logic will
then get into action.

This is not entirely correct, however, as there might be failure states
that will not result in restarts.

With this commit we slightly alter the logic: a ref counter for the fd
store is added, that is increased right before we handle the restart
logic, and decreased again right after.

This should ensure that the fdstore lives exactly as long as it needs.
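A minimal sketch of the ref-counting idea, assuming a hypothetical `struct sketch_service` with an `n_keep_fd_store` counter; the real service code and state machine are more involved. The restart path pins the store before passing through the dead state and unpins it right after, so only a release with no pins outstanding actually closes the fds.

```c
#include <unistd.h>

/* Hypothetical, simplified service with an fd store. */
struct sketch_service {
        int fds[8];
        unsigned n_fds;
        unsigned n_keep_fd_store;  /* the fd store is pinned while this is > 0 */
};

/* Close the stored fds, unless somebody currently pins the store. */
static void sketch_release_fd_store(struct sketch_service *s) {
        if (s->n_keep_fd_store > 0)
                return;
        for (unsigned i = 0; i < s->n_fds; i++)
                close(s->fds[i]);
        s->n_fds = 0;
}

/* Entering the DEAD/INACTIVE state normally releases resources. */
static void sketch_enter_dead(struct sketch_service *s) {
        sketch_release_fd_store(s);
}

/* The restart path pins the store across that transition. */
static void sketch_enter_dead_for_restart(struct sketch_service *s) {
        s->n_keep_fd_store++;
        sketch_enter_dead(s);   /* fds survive: the store is pinned */
        s->n_keep_fd_store--;
        /* The service would now enter AUTO_RESTART with its fds intact. */
}
```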

Follow-up for f0bfbfac43.
2017-11-16 14:37:33 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one existing unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).
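The LogLevelMax= filtering from item 1 boils down to a numeric comparison, because syslog priorities run from LOG_EMERG (0) to LOG_DEBUG (7), with lower values being more important. A sketch with a hypothetical helper, using only the standard `<syslog.h>` constants and the `LOG_PRI()` macro to strip facility bits:

```c
#include <stdbool.h>
#include <syslog.h>

/* Hypothetical filter for LogLevelMax=: keep a record only if its
 * priority is at least as important as the configured maximum, i.e.
 * numerically less than or equal to it. */
static bool sketch_log_record_allowed(int priority, int log_level_max) {
        /* A full syslog priority may also carry facility bits; LOG_PRI()
         * keeps only the severity part. */
        return LOG_PRI(priority) <= log_level_max;
}
```

With LogLevelMax=notice, `log_level_max` would be `LOG_NOTICE`, so warnings and errors pass while info and debug records are dropped.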

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, we shouldn't introduce new interfaces today that (mis-)use
a file system as an IPC framework, and should instead just use an IPC system,
but this is very hard to do between the journal and PID 1, as long as the IPC
system is something PID 1 manages, and is itself a client of the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Lennart Poettering c999cf385a core: add internal API to remove dependencies again, based on dependency mask
Let's make use of the dependency mask, and add an internal API to remove
dependencies again, based on bits in the dependency mask.
2017-11-10 19:45:29 +01:00
Lennart Poettering eef85c4a3f core: track why unit dependencies came to be
This replaces the dependency Set* objects with Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.

The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.

Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.
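A sketch of how a reason bitmask can live inside a hashmap's `void *` value slot without any extra allocation, and how dropping reasons, in the spirit of the removal API from c999cf385a, can leave a dependency with no reasons left. The bit names and helpers are hypothetical.

```c
#include <stdint.h>

/* Hypothetical reasons a dependency exists (bit values are illustrative). */
enum {
        SKETCH_DEP_FILE     = 1u << 0,  /* explicitly configured in a unit file */
        SKETCH_DEP_IMPLICIT = 1u << 1,  /* added implicitly */
        SKETCH_DEP_UDEV     = 1u << 2,  /* from udev SYSTEMD_WANTS= */
};

/* Encode a small bitmask directly in a void* value slot. */
static inline void *sketch_mask_to_ptr(unsigned mask) {
        return (void *) (uintptr_t) mask;
}

static inline unsigned sketch_ptr_to_mask(const void *p) {
        return (unsigned) (uintptr_t) p;
}

/* Drop the given reasons from an entry and return what is left.
 * A caller would remove the dependency once this reaches 0. */
static unsigned sketch_remove_reasons(void **value, unsigned reasons) {
        unsigned m = sketch_ptr_to_mask(*value) & ~reasons;
        *value = sketch_mask_to_ptr(m);
        return m;
}
```

A mask of 0 encodes as NULL, which a lookup cannot tell apart from "no entry", so entries should be removed rather than kept at zero.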

Why all this? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:

1. We can fix UDEV_WANTS dependency generation: so far we kept adding
   dependencies configured that way, but if a device lost such a
   dependency we couldn't remove it again, as there was no scheme in place
   for removing dependencies.

2. We can implement "pin-pointed" reload of unit files. If we know what
   dependencies were created as result of configuration in a unit file,
   then we know what to flush out when we want to reload it.

3. It's useful for debugging: "systemd-analyze dump" now shows
   this information, helping substantially with understanding how
   systemd's dependency tree came to be the way it came to be.
2017-11-10 19:45:29 +01:00
Lennart Poettering eae51da36e unit: when JobTimeoutSec= is turned off, implicitly turn off JobRunningTimeoutSec= too
We added JobRunningTimeoutSec= late, and Dracut configured only
JobTimeoutSec= to turn off root device timeouts before. With this change
we'll propagate a reset of JobTimeoutSec= into JobRunningTimeoutSec=,
but only if the latter wasn't set explicitly.

This should restore compatibility with older systemd versions.

Fixes: #6402
2017-10-05 13:06:44 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933, this handles cases that could be
matched with the included cocci file.
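A simplified, int-only sketch of the IN_SET() idea, using a compound-literal array and a helper function; systemd's actual macro is defined differently, and `SKETCH_IN_SET` is a hypothetical name. It turns chains like `x == A || x == B || x == C` into a single call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Linear scan over the candidate values. */
static inline bool sketch_in_set_impl(int x, const int *vals, size_t n) {
        for (size_t i = 0; i < n; i++)
                if (x == vals[i])
                        return true;
        return false;
}

/* Build a temporary array from the arguments and count its elements. */
#define SKETCH_IN_SET(x, ...)                                           \
        sketch_in_set_impl((x), (const int[]) { __VA_ARGS__ },         \
                           sizeof((const int[]) { __VA_ARGS__ }) / sizeof(int))
```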
2017-10-02 13:09:54 +02:00
Lennart Poettering 09e2465407 cgroup: after determining that a cgroup is empty, asynchronously dispatch this
This makes sure that if we learn via inotify or another event source
that a cgroup is empty, and we checked that this is indeed the case (as
we might get spurious notifications through inotify, as the inotify
logic through "cgroup.events" is pretty unspecific and might be
triggered for a variety of reasons), then we'll enqueue a defer event for
it, at a priority lower than SIGCHLD handling, so that we know for sure
that if there's waitid() data for a process, we used it before
considering the cgroup empty notification.

Fixes: #6608
2017-09-27 18:26:18 +02:00
Lennart Poettering 91a6073ef7 core: rename cgroup_queue → cgroup_realize_queue
We are about to add a second cgroup-related queue, called
"cgroup_empty_queue", hence let's rename "cgroup_queue" to
"cgroup_realize_queue" (as that is its purpose) to minimize confusion
about the two queues.

Just a rename, no functional changes.
2017-09-27 17:59:25 +02:00
Lennart Poettering 6d330fef4d unit: remove unused fields from Unit structure 2017-09-27 17:59:25 +02:00
Lennart Poettering f1c50becda core: make sure to log invocation ID of units also when doing structured logging 2017-09-22 15:24:55 +02:00
Lennart Poettering 6b659ed87e core: serialize/deserialize IP accounting across daemon reload/reexec
Make sure the current IP accounting counters aren't lost during
reload/reexec.

Note that we destroy all BPF file objects during a reload: the BPF
programs, the access and the accounting maps. The former two need to be
regenerated anyway with the newly loaded configuration data, but the
latter one needs to survive reloads/reexec. In this implementation I
opted to only save/restore the accounting map content instead of the map
itself. While this opens a (theoretical) window where IP traffic is still
accounted to the old map after we read it out, and we thus miss a few
bytes, this has the benefit that we can alter the map layout between
versions should the need arise.
2017-09-22 15:24:55 +02:00
Lennart Poettering a79279c7fd core: when creating the socket fds for a socket unit, join socket's cgroup first
Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny=
settings are in effect on all socket fds associated with it. In order to
make this happen, we need to make sure the cgroup the fds are associated
with is the socket unit's cgroup. The only way to do that is invoking
socket()+accept() in them. Since we really don't want to migrate PID 1
around we do this by forking off a helper process, which invokes
socket()/accept() and sends the newly created fd to PID 1. Ugly, but
works, and there's apparently no better way right now.
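The fork-and-pass-back pattern can be sketched with a UNIX socket pair and SCM_RIGHTS. The helper names are hypothetical, only socket() is shown (not accept()), and the step that moves the child into the socket unit's cgroup is left as a comment because it needs privileges and a real cgroup path.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send one fd over a UNIX socket as SCM_RIGHTS ancillary data. */
static int sketch_send_fd(int sock, int fd) {
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } control;
        memset(&control, 0, sizeof(control));
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = control.buf, .msg_controllen = sizeof(control.buf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive one fd sent by sketch_send_fd(); returns it, or -1. */
static int sketch_recv_fd(int sock) {
        char byte;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } control;
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = control.buf, .msg_controllen = sizeof(control.buf),
        };
        if (recvmsg(sock, &msg, 0) <= 0)
                return -1;      /* error, or the helper exited without sending */
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
                return -1;
        int fd;
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
}

/* Fork a helper that creates a socket and hands it back to the parent. */
static int sketch_socket_via_helper(int domain, int type) {
        int pair[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
                return -1;

        pid_t pid = fork();
        if (pid < 0) {
                close(pair[0]);
                close(pair[1]);
                return -1;
        }
        if (pid == 0) {
                close(pair[0]);
                /* (A real helper would join the unit's cgroup here.) */
                int fd = socket(domain, type, 0);
                _exit(fd < 0 || sketch_send_fd(pair[1], fd) < 0 ? 1 : 0);
        }

        close(pair[1]);
        int fd = sketch_recv_fd(pair[0]);
        close(pair[0]);
        int status;
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                if (fd >= 0)
                        close(fd);
                return -1;
        }
        return fd;
}
```

A stream socket pair is used so that the parent's recvmsg() sees end-of-file if the helper dies before sending anything.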

This generalizes forking off per-unit helper processes in a new function
unit_fork_helper_process(), which is then also used by the NSS chown()
code of socket units.
2017-09-22 15:24:55 +02:00
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Daniel Mack 6a48d82f02 cgroup: add fields to accommodate eBPF related details
Add pointers for compiled eBPF programs as well as list heads for allowed
and denied hosts for both directions.
2017-09-22 15:24:54 +02:00
Lennart Poettering f0d477979e core: introduce unit_set_exec_params()
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
2017-08-10 15:02:50 +02:00
Zbigniew Jędrzejewski-Szmek 0742986650 core: properly handle deserialization of unknown unit types (#6476)
Previously we would just abort startup without printing any error. Make sure
we always print something, and when we cannot deserialize some unit, just
ignore it and continue.

Fixup for 4bc5d27b94. Without this, we would hang
in daemon-reexec after upgrade.
2017-07-31 08:05:35 +02:00
Zbigniew Jędrzejewski-Szmek 4bc5d27b94 Drop busname unit type
Since busname units are only useful with kdbus, they weren't actively
used. This was dead code, only compile-tested. If busname units are
ever added back, it'll be cleaner to start from scratch (possibly reverting
parts of this patch).
2017-07-23 09:29:02 -04:00
Michal Koutný a2df3ea4ae job: add JobRunningTimeoutSec for JOB_RUNNING state
Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The
introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts
running (e.g. it covers all Exec* commands of a service, or the period spent
waiting for a device).
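A sketch of how the two limits could combine into one deadline, assuming both apply at once and the earlier one wins; the `sketch_` names and the `UINT64_MAX`-as-infinity convention are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed convention for this sketch: UINT64_MAX means "no timeout". */
#define SKETCH_USEC_INFINITY UINT64_MAX

/* start + timeout, saturating to infinity instead of overflowing. */
static uint64_t sketch_deadline(uint64_t start, uint64_t timeout) {
        if (timeout == SKETCH_USEC_INFINITY || start > SKETCH_USEC_INFINITY - timeout)
                return SKETCH_USEC_INFINITY;
        return start + timeout;
}

/* JobTimeoutSec= counts from enqueueing the job, JobRunningTimeoutSec=
 * only from the moment it starts running (running_since is infinity
 * while the job is still waiting). The earlier deadline wins. */
static uint64_t sketch_job_deadline(uint64_t enqueued_at, uint64_t job_timeout,
                                    uint64_t running_since, uint64_t job_running_timeout) {
        uint64_t a = sketch_deadline(enqueued_at, job_timeout);
        uint64_t b = running_since == SKETCH_USEC_INFINITY
                ? SKETCH_USEC_INFINITY
                : sketch_deadline(running_since, job_running_timeout);
        return a < b ? a : b;
}
```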

Unit.JobRunningTimeoutSec is intended to be used by default instead of
Unit.JobTimeoutSec for device units where such behavior causes less confusion
(consider a job for a _netdev mount device, with this change the timeout will
start ticking only after the network is ready).
2017-04-25 18:00:29 +02:00
Lennart Poettering 2e6dbc0fcd Merge pull request #4538 from fbuihuu/confirm-spawn-fixes
Confirm spawn fixes/enhancements
2016-11-18 11:08:06 +01:00
Franck Bui c891efaf8a core: confirm_spawn: always accept units with same_pgrp set for now
For some reason, units remaining in the same process group as PID 1
(same_pgrp=true) fail to acquire the console even if it's not taken by anyone.

So always accept units with same_pgrp set for now.
2016-11-17 18:16:51 +01:00
Lennart Poettering c5a97ed132 core: GC redundant device jobs from the run queue
In contrast to all other unit types, device units, when queued, just track
external state; they cannot effect state changes on their own. Hence, unless a
client or other job waits for them there's no reason to keep them in the job
queue. This adds a concept of GC'ing jobs of this type as soon as no client or
other job waits for them anymore.

To ensure this works correctly we need to track which clients actually
reference a job (i.e. which ones enqueued it). Unfortunately that's pretty
nasty to do for direct connections, as sd_bus_track doesn't work for
them. For now, work around this, by simply remembering in a boolean that a job
was requested by a direct connection, and reset it when we notice the direct
connection is gone. This means the GC logic works fine, except that jobs are
not immediately removed when direct connections disconnect.
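A sketch of the GC predicate for such jobs, with hypothetical fields standing in for the sd_bus_track client list, the boolean for direct connections, and jobs waiting on this one:

```c
#include <stdbool.h>

/* Hypothetical job state relevant to the GC decision. */
struct sketch_job {
        bool is_device_job;       /* job only tracks external state */
        unsigned n_bus_clients;   /* clients tracked via sd_bus_track */
        bool ref_by_private_bus;  /* requested over a direct connection */
        unsigned n_waiting_jobs;  /* other jobs waiting for this one */
};

static bool sketch_job_may_gc(const struct sketch_job *j) {
        if (!j->is_device_job)
                return false;  /* other unit types effect real state changes */
        if (j->n_bus_clients > 0 || j->ref_by_private_bus)
                return false;  /* a client is still waiting for the result */
        return j->n_waiting_jobs == 0;
}
```

Because `ref_by_private_bus` is only cleared once the disconnect is noticed, a job requested over a direct connection lingers until then, matching the limitation described above.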

In the longer term, a rework of the bus logic should fix this properly. For now
this should be good enough, as GC works fine for all cases except this one, and
thus is a clear improvement over the previous behaviour.

Fixes: #1921
2016-11-16 15:03:26 +01:00
Zbigniew Jędrzejewski-Szmek 7fa6328cc4 Merge pull request #4481 from poettering/perpetual
Add "perpetual" unit concept, sysctl fixes, networkd fixes, systemctl color fixes, nspawn discard.
2016-11-02 21:03:26 -04:00
Lennart Poettering a581e45ae8 unit: unify some code with new unit_new_for_name() call 2016-11-02 11:29:59 -06:00
Lennart Poettering f5869324e3 core: rework the "no_gc" unit flag to become a more generic "perpetual" flag
So far "no_gc" was set on -.slice and init.scope, units that are always
running, cannot be stopped and never exist in an "inactive" state. Since these
units are the only users of this flag, let's remodel it and rename it
"perpetual" and let's derive more functionality off it. Specifically, refuse
enqueuing stop jobs for these units, and report that they are "unstoppable" in
the CanStop bus property.
2016-11-02 11:29:59 -06:00
Zbigniew Jędrzejewski-Szmek f0bfbfac43 core: when restarting services, don't close fds
We would close all the stored fds in service_release_resources(), which of
course broke the whole concept of storing fds over service restart.

Fixes #4408.
2016-11-01 21:20:21 -04:00
Lukas Nykryn 87a47f99bc failure-action: generalize failure action to emergency action 2016-10-21 15:13:50 +02:00
Lennart Poettering 4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128-bit ID is
generated each time a unit moves from an inactive to an activating or active
state.

The primary use case for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
has already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence it is the better
choice for the journal.
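From the service's side, the environment variable is the easy path. A sketch that reads `INVOCATION_ID` and checks it has the 32-hex-digit form of a 128-bit ID; the helper names are hypothetical, and code linking against libsystemd would use sd_id128_get_invocation() instead.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Does s look like a 128-bit ID written as 32 hex digits? */
static bool sketch_is_id128_string(const char *s) {
        if (!s || strlen(s) != 32)
                return false;
        for (size_t i = 0; i < 32; i++)
                if (!isxdigit((unsigned char) s[i]))
                        return false;
        return true;
}

/* The invocation ID passed in the environment, or NULL if unset/invalid. */
static const char *sketch_get_invocation_id(void) {
        const char *v = getenv("INVOCATION_ID");
        return sketch_is_id128_string(v) ? v : NULL;
}
```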

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycle of it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation ID race-freely from the
messages.
2016-10-07 20:14:38 +02:00
Evgeny Vereshchagin 6afe14ff5b Merge pull request #3984 from poettering/refcnt
permit bus clients to pin units to avoid automatic GC
2016-08-26 16:17:05 +03:00
Felipe Sateler 8dec4a9d2d core,network: Use const qualifiers for block-local variables in macro functions (#4019)
Prevents discard-qualifiers warnings when the passed variable was const
2016-08-23 12:29:30 +03:00
Lennart Poettering fe700f46ec core: cache last CPU usage counter, before destroying a cgroup
It is useful for clients to be able to read the last CPU usage counter value of
a unit even if the unit is already terminated. Hence, before destroying a
unit's cgroup, cache the last CPU usage counter and return it if the cgroup is
gone.
2016-08-22 16:14:21 +02:00
Lennart Poettering 05a98afd3e core: add Ref()/Unref() bus calls for units
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface.
The two calls may be used by clients to pin a unit into memory, so that various
runtime properties aren't flushed out by the automatic GC. This is necessary
to permit clients to race-freely acquire runtime results (such as process exit
status/code or accumulated CPU time) on successful service termination.

Ref() and Unref() are fully recursive, hence act like the usual reference
counting concept in C. Taking a reference is a privileged operation, as this
allows pinning units into memory which consumes resources.

Transient units may also gain a reference at the time of creation, via the new
AddRef property (that is only defined for transient units at the time of
creation).
2016-08-22 16:14:21 +02:00
Lennart Poettering 00d9ef8560 core: add RemoveIPC= setting
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). If turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.

This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.

In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks; however, we need to know the UID/GID used in
order to clean up IPC owned by it when the unit shuts down.
2016-08-19 00:37:25 +02:00
Lennart Poettering b4c990e91b unit: remove orphaned cgroup_netclass_id field 2016-08-18 22:49:48 +02:00
Tejun Heo 66ebf6c0a1 core: add cgroup CPU controller support on the unified hierarchy
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches.  The situation is explained in
the following documentation.

 https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu

While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2, such as
buffered write control, making cgroup v2 essential for a lot of workloads.  This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.

On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us".  [Startup]CPUWeight config
options are added with the usual compat translation.  CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
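The compat translation can be sketched as a linear mapping that takes the cgroup v1 default of 1024 shares to the v2 default weight of 100, clamped to the 1..10000 range that cpu.weight accepts, plus formatting of the `cpu.max` value, which combines quota and period in one file. The helper names are hypothetical and systemd's exact rounding may differ.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_WEIGHT_MIN     1ULL
#define SKETCH_WEIGHT_DEFAULT 100ULL
#define SKETCH_WEIGHT_MAX     10000ULL
#define SKETCH_SHARES_DEFAULT 1024ULL

/* Linear mapping: 1024 shares -> weight 100, clamped to cpu.weight's range. */
static uint64_t sketch_shares_to_weight(uint64_t shares) {
        uint64_t w = shares * SKETCH_WEIGHT_DEFAULT / SKETCH_SHARES_DEFAULT;
        if (w < SKETCH_WEIGHT_MIN)
                return SKETCH_WEIGHT_MIN;
        if (w > SKETCH_WEIGHT_MAX)
                return SKETCH_WEIGHT_MAX;
        return w;
}

/* cpu.max holds "<quota> <period>" in microseconds, or "max <period>" for
 * no limit; on v1 these were cpu.cfs_quota_us and cpu.cfs_period_us. */
static int sketch_format_cpu_max(char *buf, size_t size, uint64_t quota_usec, uint64_t period_usec) {
        if (quota_usec == UINT64_MAX)
                return snprintf(buf, size, "max %llu", (unsigned long long) period_usec);
        return snprintf(buf, size, "%llu %llu",
                        (unsigned long long) quota_usec, (unsigned long long) period_usec);
}
```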

v2: - Error in man page corrected.
    - CPU config application in cgroup_context_apply() refactored.
    - CPU accounting now works on unified hierarchy.
2016-08-07 09:45:39 -04:00
Lennart Poettering 29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.
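A sketch of allocation and release from the 61184..65519 range, assuming a hypothetical in-memory table and a name hash (FNV-1a) used only to pick a stable first candidate. It does not check /etc/passwd or any other owner of the UID, which real code must do.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#define SKETCH_DYNAMIC_UID_MIN 61184u
#define SKETCH_DYNAMIC_UID_MAX 65519u
#define SKETCH_DYNAMIC_UID_N   (SKETCH_DYNAMIC_UID_MAX - SKETCH_DYNAMIC_UID_MIN + 1)

/* Hypothetical in-memory allocation table for the dynamic range. */
static bool sketch_uid_taken[SKETCH_DYNAMIC_UID_N];

/* FNV-1a, used only to pick a stable first candidate per unit name. */
static uint32_t sketch_hash(const char *s) {
        uint32_t h = 2166136261u;
        for (; *s; s++) {
                h ^= (unsigned char) *s;
                h *= 16777619u;
        }
        return h;
}

/* Allocate a UID from the dynamic range, or (uid_t) -1 if it is full. */
static uid_t sketch_allocate_dynamic_uid(const char *unit_name) {
        uint32_t start = sketch_hash(unit_name) % SKETCH_DYNAMIC_UID_N;
        for (uint32_t i = 0; i < SKETCH_DYNAMIC_UID_N; i++) {
                uint32_t idx = (start + i) % SKETCH_DYNAMIC_UID_N;
                if (!sketch_uid_taken[idx]) {
                        sketch_uid_taken[idx] = true;
                        return (uid_t) (SKETCH_DYNAMIC_UID_MIN + idx);
                }
        }
        return (uid_t) -1;
}

/* Release the UID again when the unit stops. */
static void sketch_release_dynamic_uid(uid_t uid) {
        if (uid >= SKETCH_DYNAMIC_UID_MIN && uid <= SKETCH_DYNAMIC_UID_MAX)
                sketch_uid_taken[uid - SKETCH_DYNAMIC_UID_MIN] = false;
}
```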

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00
Lennart Poettering 1d98fef17d core: when forcibly killing/aborting left-over unit processes log about it
Let's log at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.

This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter list shouldn't get too
long.

Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of KILL_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
2016-07-20 14:35:15 +02:00
Kyle Walker 36f20ae3b2 manager: Only invoke a single sigchld per unit within a cleanup cycle
By default, each iteration of manager_dispatch_sigchld() results in a unit level
sigchld event being invoked. For scope units, this results in a scope_sigchld_event()
which can seemingly stall for workloads that have a large number of PIDs within the
scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry
as a result of pid_is_unwaited().

v2:
This patch resolves this condition by only paying the cost of a sigchld in the underlying
scope unit once per sigchld iteration. A new "sigchldgen" member resides within the
Unit struct. The Manager's iteration counter is incremented by the sd-event loop and read via
sd_event_get_iteration(), and the Unit member is set to the same value as the manager's each
time that a sigchld event is invoked. If the Manager iteration value and Unit member
match, the sigchld event is not invoked for that iteration.
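A sketch of the generation check, assuming a hypothetical `struct sketch_sigchld_unit` and that iteration numbers start at 1, so a zero-initialized unit is dispatched the first time:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unit with the per-unit generation counter. */
struct sketch_sigchld_unit {
        uint64_t sigchldgen;   /* event-loop iteration of the last dispatch */
        unsigned n_dispatched; /* for illustration only */
};

/* Called for every reaped child belonging to u during one dispatch pass;
 * the expensive per-unit handler runs at most once per iteration. */
static bool sketch_maybe_dispatch_sigchld(struct sketch_sigchld_unit *u, uint64_t iteration) {
        if (u->sigchldgen == iteration)
                return false;   /* already handled in this iteration */
        u->sigchldgen = iteration;
        u->n_dispatched++;      /* stand-in for the unit's sigchld handler */
        return true;
}
```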
2016-06-30 15:16:47 -04:00
Zbigniew Jędrzejewski-Szmek 74ad38ff0e Merge pull request #3160 from htejun/cgroup-fixes-rev2
Cgroup fixes.
2016-05-07 15:08:57 -04:00
Lennart Poettering 1ed7ebcfca Merge pull request #3170 from poettering/v230-preparation-fixes
make virtualization detection quieter, rework unit start limit logic, detect unit file drop-in changes correctly, fix autofs state propagation
2016-05-04 10:46:13 +02:00
Lennart Poettering 072993504e core: move enforcement of the start limit into per-unit-type code again
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in calls like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right after everything else has been
verified to be in order.

The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.

Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.

Note that this restores the "start-limit-hit" result code that existed before
6bf0f408e4 already in the service code. However,
it's not introduced for all units that have a result code concept.

Fixes #3166.
2016-05-02 13:08:00 +02:00
Zbigniew Jędrzejewski-Szmek ce99c68a33 Move no_instances information to shared/
This way it can be used in install.c in a subsequent commit.
2016-05-01 19:58:59 -04:00
Zbigniew Jędrzejewski-Szmek 8a993b61d1 Move no_alias information to shared/
This way it can be used in install.c in a subsequent commit.
2016-05-01 19:40:51 -04:00
Tejun Heo ccf78df1fc core: make unit_has_mask_realized() consider controller enable state
unit_has_mask_realized() determines whether the specified unit has its cgroups
set up properly given the desired target_mask; however, on the unified
hierarchy, controllers need to be enabled explicitly for children and the mask
of enabled controllers can deviate from target_mask.  Only considering
target_mask in unit_has_mask_realized() can lead to false positives and
skipping enabling the requested controllers.

This patch adds unit->cgroup_enabled_mask to track which controllers are
enabled and updates unit_has_mask_realized() to also consider enable_mask.

Signed-off-by: Tejun Heo <htejun@fb.com>
2016-04-30 16:12:54 -04:00
Lennart Poettering 291d565a04 core,systemctl: add bus API to retrieve processes of a unit
This adds a new GetProcesses() bus call to the Unit object which returns an
array consisting of all PIDs, their process names, as well as their full cgroup
paths. This is then used by "systemctl status" to show the per-unit process
tree.

This has the benefit that the client-side no longer needs to access the
cgroupfs directly to show the process tree of a unit. Instead, it now uses this
new API, which means it also works correctly if -H or -M are used, as the
information from the specific host is used, and not the one from the local
system.

Fixes: #2945
2016-04-22 16:06:20 +02:00
Lennart Poettering 4f4afc88ec core: rework how transient unit files and property drop-ins work
With this change the logic for placing transient unit files and drop-ins
generated via "systemctl set-property" is reworked.

The latter are now placed in the newly introduced "control" unit file
directory. The former are now placed in the "transient" unit file directory.

Note that the properties originally set when a transient unit was created will
be written to and stay in the transient unit file directory, while later
changes are done via drop-ins.

This is preparation for a later "systemctl revert" addition, where existing
drop-ins are flushed out, but the original transient definition is restored.
2016-04-12 13:43:32 +02:00
Martin Pitt 16a798deb3 Merge pull request #2569 from zonque/removals
Remove some old cruft
2016-02-10 14:01:46 +01:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so there is no need to
carry that stuff in every file.
2016-02-10 13:41:57 +01:00
Lennart Poettering 6bf0f408e4 core: make the StartLimitXYZ= settings generic and apply to any kind of unit, not just services
This moves the StartLimitBurst=, StartLimitInterval=, StartLimitAction=, RebootArgument= from the [Service] section
into the [Unit] section of unit files, and thus support it in all unit types, not just in services.

This way we can enforce the start limit much earlier, in particular before testing the unit conditions, so that
repeated start-up failure due to failed conditions is also considered for the start limit logic.

For compatibility the four options may also be configured in the [Service] section still, but we only document them in
their new section [Unit].

This also renames the socket unit failure code "service-failed-permanent" to "service-start-limit-hit" to express
more clearly what it is about, after all it's only triggered through the start limit being hit.

Finally, the code in busname_trigger_notify() and socket_trigger_notify() is altered to become more alike.

Fixes: #2467
2016-02-10 13:26:56 +01:00
Lennart Poettering 7a7821c878 core: rework job_get_timeout() to use usec_t and handle USEC_INFINITY time events correctly 2016-02-04 00:35:43 +01:00
Lennart Poettering a483fb59a8 core: store for each unit when the last low-level unit state change took place
This adds a new timestamp field to the Unit struct, storing when the last low-level state change took place, and makes
sure this is restored after a daemon reload. This new field is useful to allow restarting of per-state timers exactly
where they originally started.
2016-02-01 22:18:16 +01:00
Harald Hoyer 9d06297e26 core: Do not bind a mount unit to a device, if it was from mountinfo
If a mount unit is bound to a device, systemd tries to umount the
mount point if it thinks the device has gone away.

Due to the uevent queue and inotify of /proc/self/mountinfo being two
different sources, systemd can never get the ordering reliably correct.

It can happen that ADD,REMOVE,ADD is queued in the uevent queue
and an inotify of mountinfo (or libmount event) happened with the
device in question.

systemd cannot know at which point in time the mount happened in the
ADD,REMOVE,ADD sequence.

The real ordering might have been ADD,REMOVE,ADD,mount
and systemd might think ADD,mount,REMOVE,ADD and would umount the
mountpoint.

A test script which triggered this behaviour is:
rm -f test-efi-disk.img
dd if=/dev/null of=test-efi-disk.img bs=1M seek=512 count=1
parted --script test-efi-disk.img \
  "mklabel gpt" \
  "mkpart ESP fat32 1MiB 511MiB" \
  "set 1 boot on"
LOOP=$(losetup --show -f -P test-efi-disk.img)
udevadm settle
mkfs.vfat -F32 ${LOOP}p1
mkdir -p mnt
mount ${LOOP}p1 mnt
# ... do stuff with mnt ...

Without the "udevadm settle" systemd unmounted mnt while the script was
operating on mnt.

Of course, the question remains why there was a REMOVE in the first
place, but that is outside the scope of this patch.
2015-11-24 14:08:50 +01:00
Thomas Hindoe Paaboel Andersen 71d35b6b55 tree-wide: sort includes in *.h
This is a continuation of the previous include sort patch, which
only sorted for .c files.
2015-11-18 23:09:02 +01:00
Lennart Poettering 0f13f3bd79 core: move check whether a unit is suitable to become transient into unit.c
Let's introduce unit_is_pristine() that verifies whether a unit is
suitable to become a transient unit, by checking that it is not
referenced yet and has no data on disk assigned.
2015-11-17 17:32:49 +01:00
Lennart Poettering 702d4e6f14 core: now that .snapshot units are gone, we don't need the per-type .no_gc bool anymore 2015-11-13 19:50:52 +01:00
Tom Gundersen 7042fc14ff Merge pull request #1837 from poettering/grabbag2
variety of fixes
2015-11-11 02:31:29 +01:00
Zbigniew Jędrzejewski-Szmek 36b4a7ba55 Remove snapshot unit type
Snapshots were never useful or used for anything. Many systemd
developers that I spoke to at systemd.conf 2015 didn't even know they
existed, so it is fairly safe to assume that this type can be deleted
without harm.

The fundamental problem with snapshots is that the state of the system
is dynamic, devices come and go, users log in and out, timers fire...
and restoring all units to some state from the past would "undo"
those changes, which isn't really possible.

Tested by creating a snapshot, running the new binary, and checking
that the transition did not cause errors, and the snapshot is gone,
and snapshots cannot be created anymore.

New systemctl says:
Unknown operation snapshot.
Old systemctl says:
Failed to create snapshot: Support for snapshots has been removed.

IgnoreOnSnapshot= settings are warned about and ignored:
Support for option IgnoreOnSnapshot= has been removed and it is ignored

http://lists.freedesktop.org/archives/systemd-devel/2015-November/034872.html
2015-11-10 19:33:06 -05:00
Lennart Poettering 9ff1a6f1d6 core: change type of distribute_fds() prototype to return void
We can't handle errors of this call sanely anyway, and we never actually
return any errors from the unit type that implements the call.  Hence,
let's make this void, in order to simplify things.
2015-11-10 21:03:49 +01:00
Lennart Poettering ba64af90ec core: change return value of the unit's enumerate() call to void
We cannot handle enumeration failures in a sensible way, hence let's try
hard to continue without making such failures fatal, and log about it
with precise error messages.
2015-11-10 21:03:49 +01:00
Thomas Hindoe Paaboel Andersen b250ea2fd6 tree-wide: remove unused functions 2015-10-19 21:46:01 +02:00
Lennart Poettering 9806e87da2 unit: allocate bus name match string on the stack
Let's use strjoina() rather than strjoin() to construct D-Bus match
strings.

Also, while we are at it, fix parameter ordering, so that our functions
always put the object first, as is customary in OO-style
programming.
2015-10-17 16:48:21 +02:00
Lennart Poettering a34ceba66f core: add support for setting stdin/stdout/stderr for transient services
When starting a transient service, allow setting stdin/stdout/stderr fds
for it, by passing them in via the bus.

This also simplifies some of the serialization code for units.
2015-10-08 12:55:15 +02:00
Zbigniew Jędrzejewski-Szmek 978c8b6347 Move UnitActiveState to basic/
Preparation to allow systemctl to query the list of unit states.
2015-09-28 15:09:34 -04:00
Daniel Mack 32ee7d3309 cgroup: add support for net_cls controllers
Add a new config directive called NetClass= to CGroup enabled units.
Allowed values are positive numbers for fixed assignments and "auto" for
picking a free value automatically, for which we need to keep track of
dynamically assigned net class IDs of units. Introduce a hash table for
this, and also record the last ID that was given out, so the allocator
can start its search for the next 'hole' from there. This could
eventually be optimized with something like an irb.

The class IDs up to 65536 are considered reserved and won't be
assigned automatically by systemd. This barrier can be made a config
directive in the future.

Values set in unit files are stored in the CGroupContext of the
unit and considered read-only. The actually assigned number (which
may have been chosen dynamically) is stored in the unit itself and
is guaranteed to remain stable as long as the unit is active.

In the CGroup controller, set the configured CGroup net class to
net_cls.classid. Multiple units may share the same net class ID,
and those which do are linked together.
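The allocation strategy described above (remember the last ID handed out, resume the search for a free "hole" right after it, and never auto-assign a reserved ID) can be sketched roughly as follows. This is a hypothetical illustration, not systemd's code: NetClassAllocator, net_class_alloc() and net_class_free() are invented names, a small linear array stands in for the hash table the commit mentions, and every ID up to and including 65536 is treated as reserved, which is one reading of "up to 65536".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define NET_CLASS_RESERVED 65536u /* IDs <= this are never auto-assigned (assumption) */
#define MAX_TRACKED 64            /* sketch-only capacity limit */

typedef struct {
        uint32_t used[MAX_TRACKED]; /* IDs currently assigned to units */
        size_t n_used;
        uint32_t last;              /* last ID handed out; the search resumes after it */
} NetClassAllocator;

static bool id_in_use(const NetClassAllocator *a, uint32_t id) {
        for (size_t i = 0; i < a->n_used; i++)
                if (a->used[i] == id)
                        return true;
        return false;
}

/* Returns a free ID above the reserved range, or 0 if the table is full. */
static uint32_t net_class_alloc(NetClassAllocator *a) {
        assert(a);
        if (a->n_used >= MAX_TRACKED)
                return 0;

        uint32_t id = a->last > NET_CLASS_RESERVED ? a->last : NET_CLASS_RESERVED;
        for (;;) {
                id++;
                if (id == 0) /* wrapped past UINT32_MAX: restart above the reserved range */
                        id = NET_CLASS_RESERVED + 1;
                if (!id_in_use(a, id))
                        break;
        }

        a->used[a->n_used++] = id;
        a->last = id;
        return id;
}

/* Releases an ID so a later search may pick it up again. */
static void net_class_free(NetClassAllocator *a, uint32_t id) {
        for (size_t i = 0; i < a->n_used; i++)
                if (a->used[i] == id) {
                        a->used[i] = a->used[--a->n_used];
                        return;
                }
}
```

For example, after allocating 65537 and 65538 and then freeing 65537, the next call returns 65539: resuming from the last assigned ID keeps a just-released ID from being handed straight back out.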
2015-09-16 00:21:55 +02:00
Lennart Poettering efdb02375b core: unified cgroup hierarchy support
This patch set adds full support for the new unified cgroup hierarchy logic
of modern kernels.

A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).

It is possible to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.

The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and add each cgroup to it.

This patch also removes cg_delete() which is unused now.

On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified hierarchy mode.

This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.

This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicely, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.

The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.

To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.
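The statfs() check above can be sketched like this. It assumes a kernel that reports CGROUP2_SUPER_MAGIC for the unified hierarchy (the magic values are copied from <linux/magic.h> so the sketch does not need kernel headers installed); Hierarchy, classify_cgroup_fs() and detect_hierarchy() are hypothetical names, not systemd's API.

```c
#include <errno.h>
#include <sys/vfs.h>

/* Values as defined in <linux/magic.h>. */
#define SKETCH_TMPFS_MAGIC         0x01021994UL
#define SKETCH_CGROUP2_SUPER_MAGIC 0x63677270UL

typedef enum {
        HIERARCHY_LEGACY,  /* /sys/fs/cgroup is a tmpfs holding per-controller mounts */
        HIERARCHY_UNIFIED, /* /sys/fs/cgroup is the unified cgroup file system itself */
        HIERARCHY_UNKNOWN,
} Hierarchy;

/* Maps a statfs() f_type value to a hierarchy mode. */
static Hierarchy classify_cgroup_fs(unsigned long f_type) {
        if (f_type == SKETCH_TMPFS_MAGIC)
                return HIERARCHY_LEGACY;
        if (f_type == SKETCH_CGROUP2_SUPER_MAGIC)
                return HIERARCHY_UNIFIED;
        return HIERARCHY_UNKNOWN;
}

/* Returns 0 and stores the detected mode in *ret, or a negative errno. */
static int detect_hierarchy(const char *path, Hierarchy *ret) {
        struct statfs fs;

        if (statfs(path, &fs) < 0)
                return -errno;
        *ret = classify_cgroup_fs((unsigned long) fs.f_type);
        return 0;
}
```

Keeping the classification separate from the statfs() call makes the mapping easy to check without depending on how the running system is booted.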

This patch set carefully makes sure that cgls and cgtop continue to work
as desired.

When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
2015-09-01 23:52:27 +02:00
Lennart Poettering d79200e26e unit: unify how we assign slices to units
This adds a new call unit_set_slice(), and simplifies
unit_add_default_slice(). This should make our code a bit more robust
and simpler.
2015-08-31 13:20:43 +02:00
Lennart Poettering 35b7ff80e2 unit: add new macros to test for unit contexts 2015-08-31 13:20:43 +02:00
Lennart Poettering 21b735e798 core: add unit_dbus_interface_from_type() to unit-name.h
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.

Also ports over machinectl to make use of this.
2015-08-28 02:10:10 +02:00
Daniel Mack bbc2908635 core: dbus: track bus names per unit
Currently, PID1 installs an unfiltered NameOwnerChanged signal match, and
dispatches the signals itself. This does not scale, as right now, PID1
wakes up every time a bus client connects.

To fix this, install individual matches once they are requested by
unit_watch_bus_name(), and remove the watches again through their slot in
unit_unwatch_bus_name().

If the bus is not available during unit_watch_bus_name(), just store the
name in the 'watch_bus' hashmap, and let bus_setup_api() do the installing
later.
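A per-name match of this kind can be expressed with standard D-Bus match-rule syntax, filtering NameOwnerChanged on its first argument. The sketch below is an illustration under assumptions: build_name_owner_match() is a hypothetical helper and the exact rule PID 1 installs may differ. No escaping is done because valid bus names cannot contain a single quote.

```c
#include <stdio.h>
#include <string.h>

#define NAME_OWNER_MATCH_PREFIX                 \
        "type='signal',"                        \
        "sender='org.freedesktop.DBus',"        \
        "path='/org/freedesktop/DBus',"         \
        "interface='org.freedesktop.DBus',"     \
        "member='NameOwnerChanged',"            \
        "arg0='"

/* Writes a match rule for one bus name into buf.
 * Returns 0 on success, or -1 if buf is too small. */
static int build_name_owner_match(char *buf, size_t size, const char *name) {
        int n = snprintf(buf, size, NAME_OWNER_MATCH_PREFIX "%s'", name);

        if (n < 0 || (size_t) n >= size)
                return -1;
        return 0;
}
```

With one such match per watched name, the bus daemon only wakes PID 1 for owner changes it actually cares about, instead of for every client that connects.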
2015-08-06 10:14:41 +02:00
Michal Schmidt d1a34ae9c2 core: fix confusing logging of instantaneous jobs
For instantaneous jobs (e.g. starting of targets, sockets, slices, or
Type=simple services) the log shows the job completion
before starting:

        systemd[1]: Created slice -.slice.
        systemd[1]: Starting -.slice.
        systemd[1]: Created slice System Slice.
        systemd[1]: Starting System Slice.
        systemd[1]: Listening on Journal Audit Socket.
        systemd[1]: Starting Journal Audit Socket.
        systemd[1]: Reached target Timers.
        systemd[1]: Starting Timers.
        ...

The reason is that the job completes before the ->start() method returns
and only then does unit_start() print the "Starting ..." message.
The same thing happens when stopping units.

Rather than fixing the order of the messages, let's just not emit the
Starting/Stopping message at all when the job completes instantaneously.
The job completion message is sufficient in this case.
2015-07-21 15:09:12 +02:00
Lennart Poettering ed10fa8ce2 unit: drop support for pre-v44 job serialization
No distro ships systemd versions that old anyway, hence let's drop
support for live-upgrades for them. Offline updates are still supported.
And live-upgrades will only lose the job queue, hence basically still
work...
2015-05-19 16:41:14 +02:00
Lennart Poettering 67bfdc9771 core: also enforce ratelimiter if we stop a unit due to BindsTo=
This builds on bea355dac9 and extends
the ratelimiter to not only be used for StopWhenUnneeded=1 units but
also for units that have BindsTo= on a unit that is dead.

http://lists.freedesktop.org/archives/systemd-devel/2015-April/030224.html
2015-05-19 16:23:14 +02:00
Lennart Poettering f8a30ce524 core: use bitfield where possible 2015-05-19 16:03:01 +02:00
Lennart Poettering bea355dac9 core: enforce a ratelimiter when stopping units due to StopWhenUnneeded=1
Otherwise we might end up in an endless stop loop.
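A ratelimiter here means allowing at most a fixed burst of StopWhenUnneeded=-triggered stops per time window, so a start/stop cycle cannot spin forever. The sketch below is a generic interval/burst limiter with invented names (StopRateLimit, stop_ratelimit_allow()); it is not systemd's own ratelimit code, and the interval and burst values are left to the caller because the commit does not state them.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

typedef struct {
        usec_t interval; /* window length, in microseconds */
        unsigned burst;  /* events allowed per window */
        usec_t begin;    /* start of the current window */
        unsigned num;    /* events counted in the current window */
        bool started;    /* false until the first event opens a window */
} StopRateLimit;

/* Returns true if another stop may be issued at time `now`. */
static bool stop_ratelimit_allow(StopRateLimit *r, usec_t now) {
        if (!r->started || now < r->begin || now - r->begin >= r->interval) {
                /* Open a fresh window. */
                r->started = true;
                r->begin = now;
                r->num = 0;
        }

        if (r->num >= r->burst)
                return false; /* limit hit: skip this stop */

        r->num++;
        return true;
}
```

Passing the timestamp in explicitly, rather than reading a clock inside the function, keeps the limiter deterministic and easy to reason about.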

http://lists.freedesktop.org/archives/systemd-devel/2015-April/030224.html
2015-05-19 16:00:24 +02:00
Lennart Poettering 8b4305c735 unit: move unit_warn_if_dir_nonempty() and friend to unit.c
The call is only used by the mount and automount unit types, but that's
already enough to consider it generic unit functionality, hence move it
out of mount.c and into unit.c.
2015-05-11 22:28:52 +02:00
Lennart Poettering f2341e0a87 core,network: major per-object logging rework
This changes log_unit_info() (and friends) to take a real Unit* object
instead of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed format strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.

Also contains a couple of other logging improvements:

- Drops a couple of strerror() invocations in favour of using %m.

- Not only .mount units but now also .automount units warn if a symlink
  already exists for the mount point.

- A few invocations of log_struct() that didn't actually pass any
  additional structured data have been replaced by simpler invocations
  of log_unit_info() and friends.

- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
  that works like LOG_MESSAGE() but prefixes the message with the unit
  name. Similarly, there's now LOG_LINK_MESSAGE() and
  LOG_NETDEV_MESSAGE().

- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
  LOG_NETDEV_INTERFACE() macros have been added that generate the
  necessary per object fields. The old log_unit_struct() call has been
  removed in favour of these new macros used in raw log_struct()
  invocations. In addition to removing one more function call this
  allows generating structured log messages that contain two object
  fields, as necessary for example for network interfaces that are
  joined into another network interface, and whose messages shall be
  indexed by both.

- The LOG_ERRNO() macro has been removed, in favour of
  log_struct_errno(). The latter has the benefit of ensuring that %m in
  format strings is properly resolved to the specified error number.

- A number of logging messages have been converted to use
  log_unit_info() instead of log_info().

- The client code in sysv-generator no longer #includes core code from
  src/core/.

- log_unit_full_errno() has been removed, log_unit_full() instead takes
  an errno now, too.

- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
  avoid double evaluation of their parameters
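The core idea of the per-unit logging rework, prefixing every message with the unit's name so callers no longer repeat it in their format strings, can be sketched as follows. UnitSketch and unit_format_message() are hypothetical stand-ins, not systemd's log_unit_info() or LOG_UNIT_MESSAGE(), and the "<id>: " prefix format is illustrative.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct {
        const char *id; /* e.g. "foo.mount" */
} UnitSketch;

/* Formats "<unit id>: <message>" into buf.
 * Returns the total length that would have been written, like snprintf(),
 * or a negative value on formatting error. */
static int unit_format_message(char *buf, size_t size, const UnitSketch *u, const char *fmt, ...) {
        int prefix = snprintf(buf, size, "%s: ", u->id);
        if (prefix < 0 || (size_t) prefix >= size)
                return prefix; /* error, or no room left after the prefix */

        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(buf + prefix, size - (size_t) prefix, fmt, ap);
        va_end(ap);

        return n < 0 ? n : prefix + n;
}
```

Being a function rather than a macro, this evaluates each argument exactly once, which is the property the last bullet above asks of the real logging macros.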
2015-05-11 22:24:45 +02:00
Lennart Poettering 1c2e9646e4 core: simplify unit type detection logic
Introduce a new call unit_type_supported() and make use of it
everywhere.

Also, drop Manager parameter from per-type supported method prototype.
2015-04-30 01:29:00 +02:00
Lennart Poettering f78f265f40 core: always coldplug units that are triggered by other units before those
Let's make sure that we don't enqueue triggering jobs for units before
those units are actually fully loaded.

http://lists.freedesktop.org/archives/systemd-devel/2015-April/031176.html
https://bugs.freedesktop.org/show_bug.cgi?id=88401
2015-04-24 16:14:46 +02:00
Lennart Poettering be847e82cf Revert "core: do not spawn jobs or touch other units during coldplugging"
This reverts commit 6e392c9c45.

We really shouldn't invent external state-keeping hashmaps if we can
keep this state in the units themselves.
2015-04-24 15:51:10 +02:00
Lennart Poettering 4940c0b0b6 service: make kill operation mapping explicit 2015-04-21 02:17:01 +02:00
Ivan Shapovalov 6e392c9c45 core: do not spawn jobs or touch other units during coldplugging
Because the order of coldplugging is not defined, we can reference a
not-yet-coldplugged unit and read its state while it has not yet been
set to a meaningful value.

This way, already active units may get started again.

We fix this by deferring such actions until all units have been at
least somehow coldplugged.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401
2015-03-07 08:44:57 -05:00
Lennart Poettering 5ad096b3f1 core: expose consumed CPU time per unit
This adds support for showing the accumulated consumed CPU time per-unit
in the "systemctl status" output. The property is also readable via the
bus.
2015-03-02 12:15:25 +01:00
Thomas Hindoe Paaboel Andersen 2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Thomas Hindoe Paaboel Andersen c1ff5570f4 Add missing includes in header files
This fixes various issues found by globally reordering the include
sections of all .c files.
2015-02-12 20:44:32 +01:00
Lennart Poettering a354329f72 core: add new logic for services to store file descriptors in PID 1
With this change it is possible to send file descriptors to PID 1, via
sd_pid_notify_with_fds() which PID 1 will store individually for each
service, and pass via the usual fd passing logic on next invocation.
This is useful to enable daemon reload schemes where daemons serialize
their state to /run, push their fds into PID 1 and terminate, restoring
their state on next start from the data in /run and passed in from PID
1.

The fds are kept by PID 1 as long as no POLLHUP or POLLERR is seen on
them, and the service they belong to is neither dead nor failed, or
has a job queued.
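The retention rule in the previous paragraph boils down to a small predicate. The sketch below is hypothetical (SvcStateSketch and fdstore_keep() are invented names) and only models the condition as stated; it does not show how PID 1 actually polls the stored fds or how services hand them over via sd_pid_notify_with_fds().

```c
#include <poll.h>
#include <stdbool.h>

typedef enum {
        SVC_RUNNING, /* stands in for any state other than dead or failed */
        SVC_DEAD,
        SVC_FAILED,
} SvcStateSketch;

/* Decides whether a stored fd should be kept, given the poll() revents
 * last seen on it and the state of the owning service. */
static bool fdstore_keep(short revents, SvcStateSketch state, bool job_queued) {
        if (revents & (POLLHUP | POLLERR))
                return false; /* the other side went away: drop the fd */

        if (state != SVC_DEAD && state != SVC_FAILED)
                return true;  /* service still around */

        return job_queued;    /* dead or failed, but about to be restarted */
}
```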
2015-01-06 03:16:39 +01:00
Lennart Poettering 0faacd470d unit: handle nicely if certain unit types are not supported on specific systems
Containers do not really support .device, .automount or .swap units;
Systems compiled without support for swap do not support .swap units;
Systems without kdbus do not support .busname units.

With this change, attempts to start units of unsupported types will result
in an immediate "unsupported" job result, which is a lot more
descriptive than before. Also, attempts to start device units in
containers will now immediately fail instead of causing jobs to be
enqueued that never go away.
2014-12-15 19:02:17 +01:00
Torstein Husebø ee33e53a70 core: correct spacing near eol in code comments 2014-12-11 15:09:51 +01:00
Lennart Poettering d2dc52dbc4 systemctl: show unit file preset state in "systemctl status" output 2014-12-02 13:23:04 +01:00
Michal Schmidt b2dc4e44c5 core: add log_unit_*_errno() macros 2014-11-28 13:29:21 +01:00
Lennart Poettering e2cc6eca73 log: fix order of log_unit_struct() to match other logging calls
Also, while we are at it, introduce some syntactic sugar for creating
ERRNO= and MESSAGE= structured logging fields.
2014-11-28 02:18:46 +01:00
Lennart Poettering 79008bddf6 log: rearrange log function naming
- Rename log_meta() → log_internal(), to follow naming scheme of most
  other log functions that are usually invoked through macros, but never
  directly.

- Rename log_info_object() to log_object_info(), simply because the
  object should come before any other parameters, following OO-style
  programming conventions.
2014-11-27 22:05:24 +01:00
Lennart Poettering 086891e5c1 log: add an "error" parameter to all low-level logging calls and introduce log_error_errno() as log calls that take error numbers
This change has two benefits:

- The format string %m will now resolve to the specified error (or to
  errno if the specified error is 0). This allows getting rid of a ton of
  strerror() invocations, a function that is not thread-safe.

- The specified error can be passed to the journal in the ERRNO= field.

Now of course, we just need somebody to convert all cases of this:

        log_error("Something happened: %s", strerror(-r));

into this:

        log_error_errno(-r, "Something happened: %m");
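A minimal sketch of how an explicit error can drive %m, assuming a C library whose printf family implements the %m extension (glibc does; it expands to strerror(errno)). The helper format_with_errno() is hypothetical: it temporarily sets errno from the explicit error argument, accepting both negative and positive values, leaves errno alone when the error is 0 so %m falls back to the current errno as described above, and restores errno afterwards.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats fmt into buf; %m resolves to `error` if non-zero, else to errno. */
static int format_with_errno(char *buf, size_t size, int error, const char *fmt, ...) {
        int saved_errno = errno;
        va_list ap;

        if (error != 0)
                errno = abs(error); /* accept both -ENOENT and ENOENT */

        va_start(ap, fmt);
        int n = vsnprintf(buf, size, fmt, ap);
        va_end(ap);

        errno = saved_errno; /* callers should not see errno change */
        return n;
}
```

This avoids calling strerror() at every call site, which matters because strerror() is not guaranteed to be thread-safe.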
2014-11-27 22:05:23 +01:00
Lennart Poettering 134e56dcc5 shared: rename condition-util.[ch] to condition.[ch]
Now that we only have one file with condition implementations around, we
can drop the -util suffix and simplify things a bit.
2014-11-06 14:21:11 +01:00
Lennart Poettering 493657337a core: get rid of condition.c and move the remaining call into util.c
That way only one file with condition code remains, in src/shared/,
rather than src/core/.

Next step: dropping the "-util" suffix from condition-util.[ch].
2014-11-06 14:21:11 +01:00
Lennart Poettering 59fccdc587 core: introduce the concept of AssertXYZ= similar to ConditionXYZ=, but fatal for a start job if not met 2014-11-06 14:21:11 +01:00
Umut Tezduyar Lindskog db2cb23b5b core: send sigabrt on watchdog timeout to get the stacktrace
If SIGABRT doesn't do the job, follow the regular shutdown
routine: SIGTERM, then SIGKILL.
2014-10-28 17:37:39 +01:00
Lennart Poettering f189ab18de job: optionally, when a job timeout is hit, also execute a failure action 2014-10-28 02:19:55 +01:00
Zbigniew Jędrzejewski-Szmek 7c52a17b1a Rearrange Unit to make pahole happy
After all we have lots of those.
2014-10-25 15:34:48 -04:00
Lukas Nykryn cb87a73b45 unit: move UnitDependency to unit-name 2014-10-08 12:44:00 +02:00
Lennart Poettering 598459ceba core: rework context initialization/destruction logic
Let's automatically initialize the kill, exec and cgroup contexts of the
various unit types when the object is constructed, instead of
individually in type-specific code.

Also, when PrivateDevices= is set, set DevicePolicy= to closed.
2014-03-19 21:06:53 +01:00
Lennart Poettering 085afe36cb core: add global settings for enabling CPUAccounting=, MemoryAccounting=, BlockIOAccounting= for all units at once 2014-02-24 23:50:10 +01:00
Lennart Poettering bc432dc7eb core: rework cgroup mask propagation
Previously, a cgroup setting further down the tree would result in cgroup
membership additions being propagated up the tree and to the siblings;
however, a unit could never lose cgroup memberships again. With this change we'll
make sure that both cgroup additions and removals propagate properly.
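One way to make removals propagate as well as additions is to recompute a subtree's controller mask from scratch instead of only ever OR-ing new bits into the ancestors. The sketch below illustrates that idea for the upward direction only (it ignores the sibling propagation mentioned above); MaskNode and subtree_mask() are hypothetical names, not systemd's data structures.

```c
#include <stddef.h>

typedef unsigned CGroupMaskSketch; /* one bit per controller */

typedef struct MaskNode {
        CGroupMaskSketch own;       /* controllers this unit itself needs */
        struct MaskNode **children;
        size_t n_children;
} MaskNode;

/* Recomputes the mask needed by the subtree rooted at n. Because nothing
 * is cached, a controller a child stops needing also disappears from
 * every ancestor on the next recomputation. */
static CGroupMaskSketch subtree_mask(const MaskNode *n) {
        CGroupMaskSketch m = n->own;

        for (size_t i = 0; i < n->n_children; i++)
                m |= subtree_mask(n->children[i]);

        return m;
}
```

A real implementation would cache the result and invalidate it when a child changes, but the recomputed value is what the cached one has to match.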
2014-02-17 15:49:21 +01:00
Lennart Poettering a911bb9ab2 core: watch SIGCHLD more closely to track processes of units with no reliable cgroup empty notifier
When a process dies that we can associate with a specific unit, start
watching all other processes of that unit, so that we can associate
those processes with the unit too.

Also, for service units start doing this as soon as we get the first
SIGCHLD for either control or main process, so that we can follow the
processes of the service from one to the other, as long as the processes
that remain are children of the ones we watched that died, and were
reassigned to us as their parent.

Similarly, for scope units start doing this as soon as the scope
controller abandons the unit, and thus management entirely reverts to
systemd. To abandon a unit introduce a new Abandon() scope unit method
call.
2014-02-07 15:14:36 +01:00
Zbigniew Jędrzejewski-Szmek 68db7a3bd9 core: add function to tell when job will time out
Things will continue when either the job timeout
or the unit timeout is reached. Add functionality to
access that info.
2014-01-27 01:23:16 -05:00
Lennart Poettering aec8de63b1 core: no need to list properties for PropertiesChanged messages anymore
Since the vtable includes this information anyway, let's just use that.
2013-12-22 03:50:52 +01:00
Lennart Poettering e821075a23 bus: add .busname unit type to implement kdbus-style bus activation 2013-12-02 23:32:34 +01:00
Lennart Poettering 613b411c94 service: add the ability for units to join other unit's PrivateNetwork= and PrivateTmp= namespaces 2013-11-27 20:28:48 +01:00
Lennart Poettering d420282b28 core: replace OnFailureIsolate= setting by a more generic OnFailureJobMode= setting and make use of it where applicable 2013-11-26 02:26:31 +01:00
Lennart Poettering eeaedb7c26 core: include following set data in dump 2013-11-25 22:10:22 +01:00
David Strauss 6414b7c981 cgroups: Cache controller masks and optimize queues. 2013-11-22 11:22:47 +10:00
Lennart Poettering 718db96199 core: convert PID 1 to libsystemd-bus
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.

This patch also adds a couple of things to libsystemd-bus, that are
necessary to make the port work:

- Synthesizing of "Disconnected" messages when bus connections are
  severed.

- Support for attaching multiple vtables for the same interface on the
  same path.

This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.

As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
2013-11-20 20:52:36 +01:00
Lennart Poettering 9588bc3209 Remove dead code and unexport some calls
"make check-api-unused" informs us about code that is not used anymore
or that is exported but only used internally. Fix these all over the
place.
2013-11-08 18:12:45 +01:00
Lennart Poettering 44b601bc79 macro: clean up usage of gcc attributes
Always use our own macros, and name all our own macros the same style.
2013-10-16 06:14:59 +02:00
Lennart Poettering a57f7e2c82 core: rework how we match mount units against each other
Previously, to automatically create dependencies between mount units, we
matched every mount unit against all others, resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.

This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
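The prefix-keyed lookup relies on enumerating every prefix of a path. Roughly: a unit that needs /a/b/c would be registered under "/a/b/c", "/a/b", "/a" and "/", so a newly appearing mount unit for /a only has to look up its own path to find it. The sketch below shows just the enumeration step, assuming absolute, already-normalized paths; path_foreach_prefix(), PrefixList and collect_prefix() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

typedef void (*prefix_cb)(const char *prefix, void *userdata);

/* Calls cb for every prefix of an absolute, normalized path, from the
 * path itself down to "/". Relative or overly long paths are ignored. */
static void path_foreach_prefix(const char *path, prefix_cb cb, void *userdata) {
        char buf[4096];
        size_t len = strlen(path);

        if (len == 0 || len >= sizeof buf || path[0] != '/')
                return;
        memcpy(buf, path, len + 1);
        while (len > 1 && buf[len - 1] == '/') /* drop trailing slashes */
                buf[--len] = '\0';

        for (;;) {
                cb(buf, userdata);
                if (len == 1) /* just reported "/" */
                        break;
                char *slash = strrchr(buf, '/');
                if (slash == buf)
                        slash++; /* keep the root slash */
                *slash = '\0';
                len = (size_t) (slash - buf);
        }
}

/* Example callback: collects prefixes into a space-separated string. */
typedef struct {
        char out[256];
} PrefixList;

static void collect_prefix(const char *prefix, void *userdata) {
        PrefixList *l = userdata;
        size_t used = strlen(l->out);

        snprintf(l->out + used, sizeof l->out - used, "%s%s", used > 0 ? " " : "", prefix);
}
```

Enumerating "/a/b/c" this way yields "/a/b/c /a/b /a /", one hash-table key per prefix.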

This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.

With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.

https://bugs.freedesktop.org/show_bug.cgi?id=69740
2013-09-26 20:20:30 +02:00
Lennart Poettering b9ec935936 core: simplify drop-in writing logic a bit
let's make use of some format string magic!
2013-07-11 21:29:33 +02:00
Lennart Poettering 6c12b52e19 core: add new "scope" unit type for making a unit of pre-existing processes
"Scope" units are very much like service units, however with the
difference that they are created from pre-existing processes, rather
than processes that systemd itself forks off. This means they are
generated programmatically via the bus API as transient units rather
than from static configuration read from disk. Also, they do not provide
execution-time parameters, as at the time systemd adds the processes to
the scope unit they already exist and the parameters cannot be applied
anymore.

The primary benefit of this new unit type is to create arbitrary cgroups
for worker-processes forked off an existing service.

This commit also adds a new mode to "systemd-run" to run the specified
processes in a scope rather than a transient service.
2013-07-01 00:18:00 +02:00
Lennart Poettering c2756a6840 core: add transient units
Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).

Transient units require a unique name for which no configuration file
previously existed on disk.

A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:

$ systemd-run /bin/ping www.heise.de

Will cause systemd to create a new transient service and run ping in it.
2013-06-28 04:12:58 +02:00
Lennart Poettering b42defe3b8 dbus: make more cgroup attributes runtime settable 2013-06-27 21:50:35 +02:00
Lennart Poettering 8e2af47840 dbus: add infrastructure for changing multiple properties at once on units and hook some cgroup attributes up to it
This introduces two bus calls to make runtime changes to selected bus
properties, optionally with persistence.

This currently hooks this up only for three cgroup attributes, but this
brings the infrastructure to add more changeable attributes.

This allows setting multiple attributes at once, and takes an array
rather than a dictionary of properties, in order to implement simple
resetting of lists using the same approach as when they are sourced from
unit files. This means that list properties are appended to by this
call, unless they are first reset via assigning the empty list.
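The append-unless-reset semantics can be sketched as follows, assuming each assignment is represented as a string and the empty string stands for "assign the empty list". ListProp and list_apply() are invented names for illustration; this is not the bus call's actual data model.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 16

typedef struct {
        const char *items[MAX_ITEMS];
        size_t n;
} ListProp;

/* Applies assignments in order: "" resets the list, anything else is
 * appended. Returns 0, or -1 if the list would overflow. */
static int list_apply(ListProp *l, const char *const *assignments, size_t n) {
        for (size_t i = 0; i < n; i++) {
                if (assignments[i][0] == '\0') {
                        l->n = 0; /* reset, as with an empty assignment in a unit file */
                        continue;
                }
                if (l->n >= MAX_ITEMS)
                        return -1;
                l->items[l->n++] = assignments[i];
        }
        return 0;
}
```

Processing the array in order is what lets a single call both clear a list and repopulate it.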
2013-06-27 21:14:56 +02:00
Lennart Poettering 4ad490007b core: general cgroup rework
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).

This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgroup properties.

This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
2013-06-27 04:17:34 +02:00
Lennart Poettering 9444b1f20e logind: add infrastructure to keep track of machines, and move to slices
- This changes all logind cgroup objects to use slice objects rather
  than fixed cgroup locations.

- logind can now collect minimal information about running
  VMs/containers. As fixed cgroup locations can no longer be used we
  need an entity that keeps track of machine cgroups in whatever slice
  they might be located. Since logind already keeps track of users,
  sessions and seats this is a trivial addition.

- nspawn will now register with logind and pass various bits of metadata
  along. A new option "--slice=" has been added to place the container
  in a specific slice.

- loginctl gained commands to list, introspect and terminate machines.

- user.slice and machine.slice will now be pulled in by logind.service,
  since only logind.service requires these slices.
2013-06-20 03:49:59 +02:00
Lennart Poettering a016b9228f core: add new .slice unit type for partitioning systems
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchically by the user.

Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.

Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
2013-06-17 21:36:51 +02:00
Zbigniew Jędrzejewski-Szmek 44a6b1b680 Add __attribute__((const, pure, format)) in various places
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
2013-05-02 22:52:09 -04:00
Cristian Rodríguez b1e2b33c52 Add some extra __attribute__ ((format)) s 2013-04-25 21:50:48 -04:00
Lennart Poettering 31afa0a44c unit: rework stop pending logic
When a trigger unit wants to know if a stop is queued for it, we should
check precisely that, and not whether it is actually
stopped already. This is because we use these checks usually from state
change calls where the state variables are not updated yet.

This change splits unit_pending_inactive() into two calls
unit_inactive_or_pending() and unit_stop_pending(). The former checks
state and pending jobs, the latter only pending jobs.
2013-04-25 22:01:49 -03:00
Lennart Poettering 78edb35ab4 cgroup: always validate cgroup controller names
Let's better be safe than sorry.
2013-04-24 19:02:13 -03:00
Lennart Poettering 3ecaa09bcc unit: rework trigger dependency logic
Instead of having explicit type-specific callbacks that inform the
triggering unit when a triggered unit changes state, make this generic
so that state changes are forwarded between any triggered and triggering
unit.

Also, get rid of UnitRef references from automount, timer and path units
to the units they trigger, and rely exclusively on UNIT_TRIGGER type
dependencies.
2013-04-23 16:00:32 -03:00
Zbigniew Jędrzejewski-Szmek e8e581bf25 Report about syntax errors with metadata
The information about the unit for which files are being parsed
is passed all the way down. This way messages land in the journal
with proper UNIT=... or USER_UNIT=... attribution.

'systemctl status' and 'journalctl -u' not displaying those messages
has been a source of confusion for users, since the journal entry for
a misspelt setting was often logged quite a bit earlier than the
failure to start a unit.

Based-on-a-patch-by: Oleksii Shevchuk <alxchk@gmail.com>
2013-04-17 00:09:16 -04:00
Oleksii Shevchuk ae7a7182da Introspect and monitor dropin configuration 2013-04-01 23:43:49 -04:00
Michal Schmidt 814cc56212 core: single unit_kill implementation for all unit types
There are very few differences in the implementations of the kill method in the
unit types that have one. Let's unify them.

This does not yet unify unit_kill() with unit_kill_context().
2013-03-13 17:21:53 +01:00
Michal Schmidt 49b1d37726 core: redefine unit_status_printf()
Take advantage of the fact that almost all callers want to pass unit
description as the last parameter. Those who don't can use the more
flexible manager_status_printf().
2013-02-28 02:23:21 +01:00
Michal Schmidt 25cee55076 core: add manager_status_printf()
unit_status_printf() checks the state of the manager, not of the unit
as such. Move it to manager.c and rename it to manager_status_printf().

Temporarily keep unit_status_printf as a wrapper macro.
2013-02-28 00:14:40 +01:00
Lennart Poettering 26d04f86a3 unit: rework resource management API
This introduces a new static list of known attributes and their special
semantics. This means that cgroup attribute values can now be
automatically translated from user to kernel notation for settings made
on the command line, too.

This also adds proper support for multi-line attributes.
2013-02-27 18:50:41 +01:00
Lennart Poettering cd2086fe65 core: unify kill code of mount, service, socket, swap units 2013-01-26 05:53:30 +01:00
Lennart Poettering 71645acac2 unit: optionally allow making cgroup attribute changes persistent 2013-01-19 01:02:30 +01:00
Mirco Tischler bbc9006e6b core: log USER_UNIT instead of UNIT if in user session 2013-01-18 11:14:00 -05:00
Lennart Poettering 246aa6dd9d core: add bus API and systemctl commands for altering cgroup parameters during runtime 2013-01-14 21:24:57 +01:00
Zbigniew Jędrzejewski-Szmek fdf9f9bbe4 journal: new logging macros to include UNIT=
Adding UNIT= to log lines allows them to be shown
in 'systemctl status' output, etc.

A new set of macros and functions is added. This allows for less
verbose notation than using log_struct() explicitly.

The set of logging functions is expanded to take a pair of arguments
(e.g. "UNIT=" and the RHS) which add an extra line to the structured
log entry. This can be used to add macros which add a different
identifier later on.
2013-01-06 13:52:48 -05:00
Lennart Poettering 01e10de3c2 socket: support socket activation of containers 2012-12-22 22:17:58 +01:00
Lennart Poettering 8742514c1a timer: recalculate next elapse for calendar timer units when the system clock is changed 2012-11-25 00:33:59 +01:00
Lennart Poettering 36697dc019 timer: implement calendar time events 2012-11-23 21:37:58 +01:00
Lennart Poettering 3ef63c3174 unit-printf: before resolving exec context specifiers check whether the object actually has an exec context 2012-09-18 11:40:01 +02:00