Commit graph

221 commits

Author SHA1 Message Date
Tejun Heo 6ae4283cb1 core: add IODeviceLatencyTargetSec
This adds support for the following proposed latency based IO control
mechanism.

  https://lkml.org/lkml/2018/6/5/428
2018-08-22 16:46:18 +02:00
Jon Ringle fbb48d4c66 Make final kill signal configurable
Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which
should create a core dump useful for debugging why the service didn't stop
with the SIGTERM
2018-07-23 13:44:54 +02:00
Tejun Heo 4842263577 core: add MemoryMin
The kernel added support for a new cgroup memory controller knob memory.min in
bf8d5d52ffe8 ("memcg: introduce memory.min") which was merged during v4.18
merge window.

Add MemoryMin to support memory.min.
2018-07-12 08:21:43 +02:00
Lennart Poettering 228af36fff core: add new PrivateMounts= unit setting
This new setting is supposed to be useful in most cases where
"MountFlags=slave" is currently used, i.e. as an explicit way to run a
service in its own mount namespace and decouple propagation from all
mounts of the new mount namespace towards the host.

The effect of MountFlags=slave and PrivateMounts=yes is mostly the same,
as both cause a CLONE_NEWNS namespace to be opened, and both will result
in all mounts within it to be mounted MS_SLAVE. The difference is mostly
on the conceptual/philosophical level: configuring the propagation mode
is nothing people should have to think about, in particular as the
matter is not precisely easyto grok. Moreover, MountFlags= allows configuration
of "private" and "slave" modes which don't really make much sense to use
in real-life and are quite confusing. In particular PrivateMounts=private means
mounts made on the host stay pinned for good by the service which is
particularly nasty for removable media mount. And PrivateMounts=shared
is in most ways a NOP when used a alone...

The main technical difference between setting only MountFlags=slave or
only PrivateMounts=yes in a unit file is that the former remounts all
mounts to MS_SLAVE and leaves them there, while that latter remounts
them to MS_SHARED again right after. The latter is generally a nicer
approach, since it disables propagation, while MS_SHARED is afterwards
in effect, which is really nice as that means further namespacing down
the tree will get MS_SHARED logic by default and we unify how
applications see our mounts as we always pass them as MS_SHARED
regardless whether any mount namespacing is used or not.

The effect of PrivateMounts=yes was implied already by all the other
mount namespacing options. With this new option we add an explicit knob
for it, to request it without any other option used as well.

See: #4393
2018-06-12 16:12:10 +02:00
Yu Watanabe 984faf29da load-fragment: use DEFINE_CONFIG_PARSE_*() macros 2018-05-31 11:09:41 +09:00
Yu Watanabe 00463fbf0d load-fragment: make SocketProtocol= accept the empty string 2018-05-31 11:09:41 +09:00
Yu Watanabe fa65c28176 namespace: rename parse_protect_{home,system}_or_bool() to protect_{home,system}_or_bool_to_string()
Hence, we can define config_parse_protect_{home,system}() by using
DEFINE_CONFIG_PARSE_ENUM() macro.
2018-05-31 11:09:41 +09:00
Yu Watanabe b54e98ef8e socket-util: rename parse_socket_address_bind_ipv6_only_or_bool() to socket_address_bind_ipv6_only_or_bool_from_string()
Hence, we can define config_parse_socket_bind() by using
DEFINE_CONFIG_PARSE_ENUM() macro.
2018-05-31 11:09:41 +09:00
Yu Watanabe 0a9e363870 load-fragment: drop config_parse_no_new_privileges() and use config_parse_bool() instead 2018-05-31 11:09:41 +09:00
Yu Watanabe 6c58305ac3 load-fragment: use config_parse_sec_fix_0() for TimeoutStopSec= 2018-05-31 11:09:41 +09:00
Lennart Poettering 4f424df760 core: move config_parse_limit() to the generic conf-parser.[ch]
That way we can use it in nspawn.

Also, while we are at it, let's rename the call config_parse_rlimit(),
i.e. insert the "r", to clarify what kind of limit this is about.
2018-05-17 20:36:52 +02:00
Felipe Sateler 57b7a260c2 core: undo the dependency inversion between unit.h and all unit types 2018-05-15 14:24:34 -04:00
Yu Watanabe 2abd4e388a core: add new setting TemporaryFileSystem=
This introduces a new setting TemporaryFileSystem=. This is useful
to hide files not relevant to the processes invoked by unit, while
necessary files or directories can be still accessed by combining
with Bind{,ReadOnly}Paths=.
2018-02-21 09:17:52 +09:00
Yu Watanabe 8ab3934766 load-fragment: obsolete OnFailureIsolate= 2018-01-02 02:23:17 +09:00
Lennart Poettering 5022f08a23 core,udev,networkd: add ConditionKernelVersion=
This adds a simple condition/assert/match to the service manager, to
udev's .link handling and to networkd, for matching the kernel version
string.

In this version we only do fnmatch() based globbing, but we might want
to extend that to version comparisons later on, if we like, by slightly
extending the syntax with ">=", "<=", ">", "<" and "==" expressions.
2017-12-26 17:39:44 +01:00
Chris Down e16647c39d condition: Create AssertControlGroupController (#7630)
Up until now, the behaviour in systemd has (mostly) been to silently
ignore failures to action unit directives that refer to an unavailble
controller. The addition of AssertControlGroupController and its
conditional counterpart allow explicit specification of the desired
behaviour when such a situation occurs.

As for how this can happen, it is possible that a particular controller
is not available in the cgroup hierarchy. One possible reason for this
is that, in the running kernel, the controller simply doesn't exist --
for example, the CPU controller in cgroup v2 has only recently been
merged and was out of tree until then. Another possibility is that the
controller exists, but has been forcibly disabled by `cgroup_disable=`
on the kernel command line.

In future this will also support whatever comes out of issue #7624,
`DefaultXAccounting=never`, or similar.
2017-12-18 08:53:29 +01:00
Lennart Poettering b238be1e0d core: enable specifier expansion for What=/Where=/Type=/SourcePath= too
Using specifiers in these settings isn't particularly useful by itself,
but it unifies behaviour a bit. It's kinda surprising that What= in
mount units resolves specifies, but Where= does not. Hence let's add
that too. Also, it's surprising Where=/What= in mount units behaves
differently than in automount and swap units, hence resolve specifiers
there too. Then, Type= in mount units is nowadays an arbitrary,
sometimes non-trivial string (think fuse!), hence let's also expand
specifiers there, to match the rest of the mount settings.

This has the benefit that when writing code that generates unit files,
less care has to be taken to check whether escaping of specifiers is
necessary or not: broadly everything that takes arbitrary user strings
now does specifier expansion, while enums/numerics/booleans do not.
2017-11-29 12:32:57 +01:00
Lennart Poettering 613613f1ee core: use config_parse_unit_string_printf() for decoding RebootArgument=
All other cases where we accept a reboot argument are decoded with
config_parse_unit_string_printf() rather than
config_parse_unit_path_printf(), and that's really the only thing what
makes sense here, hence adjust this here, too.
2017-11-29 12:32:56 +01:00
Zbigniew Jędrzejewski-Szmek 82a27ba821
Merge pull request #7389 from shawnl/warning
tree-wide: adjust fall through comments so that gcc is happy
2017-11-22 07:38:51 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Lennart Poettering e7dfbb4e74 core: introduce SuccessAction= as unit file property
SuccessAction= is similar to FailureAction= but declares what to do on
success of a unit, rather than on failure. This is useful for running
commands in qemu/nspawn images, that shall power down on completion. We
frequently see "ExecStopPost=/usr/bin/systemctl poweroff" or so in unit
files like this. Offer a simple, more declarative alternative for this.

While we are at it, hook up failure action with unit_dump() and
transient units too.
2017-11-20 16:37:22 +01:00
Lennart Poettering 53c35a766f core: generalize FailureAction= move it from service to unit
All kinds of units can fail, hence it makes sense to offer this as
generic concept for all unit types.
2017-11-20 16:37:22 +01:00
Lennart Poettering 08f3be7a38 core: add two new unit file settings: StandardInputData= + StandardInputText=
Both permit configuring data to pass through STDIN to an invoked
process. StandardInputText= accepts a line of text (possibly with
embedded C-style escapes as well as unit specifiers), which is appended
to the buffer to pass as stdin, followed by a single newline.
StandardInputData= is similar, but accepts arbitrary base64 encoded
data, and will not resolve specifiers or C-style escapes, nor append
newlines.

This may be used to pass input/configuration data to services, directly
in-line from unit files, either in a cooked or in a more raw format.
2017-11-17 11:13:44 +01:00
Lennart Poettering 5afe510c89 core: add a new unit file setting CollectMode= for tweaking the GC logic
Right now, the option only takes one of two possible values "inactive"
or "inactive-or-failed", the former being the default, and exposing same
behaviour as the status quo ante. If set to "inactive-or-failed" units
may be collected by the GC logic when in the "failed" state too.

This logic should be a nicer alternative to using the "-" modifier for
ExecStart= and friends, as the exit data is collected and logged about
and only removed when the GC comes along. This should be useful in
particular for per-connection socket-activated services, as well as
"systemd-run" command lines that shall leave no artifacts in the
system.

I was thinking about whether to expose this as a boolean, but opted for
an enum instead, as I have the suspicion other tweaks like this might be
a added later on, in which case we extend this setting instead of having
to add yet another one.

Also, let's add some documentation for the GC logic.
2017-11-16 14:38:36 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one exisiting unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Lennart Poettering 0263828039 core: rework the Delegate= unit file setting to take a list of controller names
Previously it was not possible to select which controllers to enable for
a unit where Delegate=yes was set, as all controllers were enabled. With
this change, this is made configurable, and thus delegation units can
pick specifically what they want to manage themselves, and what they
don't care about.
2017-11-13 10:49:15 +01:00
Yu Watanabe c54515b1e4 core/load-fragment: add RemoveIPC= (#7288)
PR #3865 introduced RemoveIPC= but the option is not listed in
load-fragment-gperf.gperf. So, the option could be used only via d-bus.
This adds RemoveIPC= in load-fragment-gperf.gperf. Then, now we can
set the option in unit files.

Fixes #7281.
2017-11-10 10:15:55 +01:00
Yu Watanabe 2bf13bd51e core/load-fragment: fix alignment 2017-11-08 15:49:22 +09:00
Lennart Poettering eae51da36e unit: when JobTimeoutSec= is turned off, implicitly turn off JobRunningTimeoutSec= too
We added JobRunningTimeoutSec= late, and Dracut configured only
JobTimeoutSec= to turn of root device timeouts before. With this change
we'll propagate a reset of JobTimeoutSec= into JobRunningTimeoutSec=,
but only if the latter wasn't set explicitly.

This should restore compatibility with older systemd versions.

Fixes: #6402
2017-10-05 13:06:44 +02:00
Zbigniew Jędrzejewski-Szmek f9fa32f09c build-sys: s/HAVE_SMACK/ENABLE_SMACK/
Same justification as for HAVE_UTMP.
2017-10-04 12:09:50 +02:00
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Lennart Poettering b1edf4456e core: add new per-unit setting KeyringMode= for controlling kernel keyring setup
Usually, it's a good thing that we isolate the kernel session keyring
for the various services and disconnect them from the user keyring.
However, in case of the cryptsetup key caching we actually want that
multiple instances of the cryptsetup service can share the keys in the
root user's user keyring, hence we need to be able to disable this logic
for them.

This adds KeyringMode=inherit|private|shared:

    inherit: don't do any keyring magic (this is the default in systemd --user)
    private: a private keyring as before (default in systemd --system)
    shared: the new setting
2017-09-15 16:53:35 +02:00
Lennart Poettering 00819cc151 core: add new UnsetEnvironment= setting for unit files
With this setting we can explicitly unset specific variables for
processes of a unit, as last step of assembling the environment block
for them. This is useful to fix #6407.

While we are at it, greatly expand the documentation on how the
environment block for forked off processes is assembled.
2017-09-14 15:17:40 +02:00
Topi Miettinen 78e864e5b3 seccomp: LockPersonality boolean (#6193)
Add LockPersonality boolean to allow locking down personality(2)
system call so that the execution domain can't be changed.
This may be useful to improve security because odd emulations
may be poorly tested and source of vulnerabilities, while
system services shouldn't need any weird personalities.
2017-08-29 15:54:50 +02:00
Zbigniew Jędrzejewski-Szmek 4bc5d27b94 Drop busname unit type
Since busname units are only useful with kdbus, they weren't actively
used. This was dead code, only compile-tested. If busname units are
ever added back, it'll be cleaner to start from scratch (possibly reverting
parts of this patch).
2017-07-23 09:29:02 -04:00
Yu Watanabe 8ae12e733c core: fix typo (#6417) 2017-07-21 10:36:39 +02:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Lennart Poettering e758bc9132 Merge pull request #6387 from keszybz/fix-timeout-0
Fix x-systemd.timeout=0 in fstab
2017-07-18 00:04:24 +02:00
Zbigniew Jędrzejewski-Szmek 4a06cbf838 Use config_parse_sec_fix_0() also for JobRunningTimeoutSec
2d79a0bbb9 did that for TimeoutSec=,
89beff89ed did that for JobTimeoutSec=,
and 0004f698df did that for
x-systemd.device-timeout=. But after parsing x-systemd.device-timeout=xxx
we write it out as JobRunningTimeoutSec=xxx. Two options:
- write out JobRunningTimeoutSec=<a very big number>,
- change JobRunningTimeoutSec= to behave like the other options.

I think it would be confusing for JobRunningTimeoutSec= to have different
syntax then TimeoutSec= and JobTimeoutSec=, so this patch implements the
second option.

Fixes #6264, https://bugzilla.redhat.com/show_bug.cgi?id=1462378.
2017-07-17 16:03:49 -04:00
Yu Watanabe 53f47dfc7b core: allow preserving contents of RuntimeDirectory= over process restart
This introduces RuntimeDirectoryPreserve= option which takes a boolean
argument or 'restart'.

Closes #6087.
2017-07-17 16:22:25 +09:00
Zbigniew Jędrzejewski-Szmek 2c75fb7330 core/load-fragment: refuse units with errors in RootDirectory/RootImage/DynamicUser
Behaviour of the service is completely different with the option off, so the
service would probably mess up state on disk and do unexpected things.
2017-07-11 13:38:13 -04:00
Lennart Poettering defdbbb6dc Merge pull request #5926 from fsateler/condition-uid
core: add ConditionUID and ConditionGID
2017-05-29 15:18:38 +02:00
Felipe Sateler c465a29f24 core: add ConditionUser and ConditionGroup
This adds two options that are useful for user units. In particular, it
is useful to check ConditionUser=!0 to not start for the root user.

Closes: #5187
2017-05-26 09:42:44 -04:00
NeilBrown 2d79a0bbb9 Allow TimeoutSec=0 to work as documented in mount units and elsewhere (#6013)
Since commit 36c16a7cdd ("core: rework unit timeout handling, and add
new setting RuntimeMaxSec=") TimeoutSec=0 in mount units has
cause the mount to timeout immediately instead of never as documented.

There is a similar problem with Socket.TimeoutSec and Swap.TimeoutSec.

These are easily fixed using config_parse_sec_fix_0().

Automount.TimeoutIdleSec looks like it could have the same problem,
but doesn't because the kernel treats '0' as 'no timeout'.
It handle USEC_INFINITY correctly only because that constant has
the value '-1', and when round up, it becomes zero.
To avoid possible confusion, use config_parse_sec_fix_0() as well, and
explicitly handle USEC_INFINITY.
2017-05-23 09:42:26 +02:00
Michal Koutný a2df3ea4ae job: add JobRunningTimeoutSec for JOB_RUNNING state
Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The
introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts
running (e.g. it groups all Exec* commands of a service or spans waiting for a
device period.)

Unit.JobRunningTimeoutSec is intended to be used by default instead of
Unit.JobTimeoutSec for device units where such behavior causes less confusion
(consider a job for a _netdev mount device, with this change the timeout will
start ticking only after the network is ready).
2017-04-25 18:00:29 +02:00
Lennart Poettering 915e6d1676 core: add RootImage= setting for using a specific image file as root directory for a service
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.

This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
2017-02-07 12:19:42 +01:00
Lennart Poettering 5d997827e2 core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory=
This adds a boolean unit file setting MountAPIVFS=. If set, the three
main API VFS mounts will be mounted for the service. This only has an
effect on RootDirectory=, which it makes a ton times more useful.

(This is basically the /dev + /proc + /sys mounting code posted in the
original #4727, but rebased on current git, and with the automatic logic
replaced by explicit logic controlled by a unit file setting)
2017-02-07 11:22:05 +01:00
Lennart Poettering d2d6c096f6 core: add ability to define arbitrary bind mounts for services
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.

The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).

Fixes: #3439
2016-12-14 00:54:10 +01:00
Lennart Poettering d107589cd2 core: turn on specifier expansion for more unit file settings
Let's permit specifier expansion at a numbre of additional fields, where
arbitrary strings might be passed where this might be useful one day. (Or at
least where there's no clear reason where it wouldn't make sense to have.)
2016-12-07 18:47:32 +01:00
Djalal Harouni d6299d613f core:gperf: pass the exec_context struct directly to parse restrict namespaces
The RestrictNamespaces= takes yes, no or a list of namespaces types,
therefor config_parse_restrict_namespaces() is a bit complex and it
operates on the ExecContext, fix this by passing the offset of
ExecContext directly otherwise restricting namespaces won't work.
2016-11-15 15:04:43 +01:00