Systemd

Author	SHA1	Message	Date
Anita Zhang	31cd5f63ce	core: ExecCondition= for services Closes #10596	2019-07-17 11:35:02 +02:00
Lennart Poettering	4c2f584230	core: hook up service unit type with the new clean operation The implementation is pretty straightforward: when we get a request to clean some type of resources we fork off a process doing that, and while it is running we are in the "cleaning" state.	2019-07-11 12:18:51 +02:00
Zbigniew Jędrzejewski-Szmek	edfea9fe0d	analyze: add 'condition' verb We didn't have a straightforward way to parse and evaluate those strings. Prompted by #12881.	2019-06-27 10:54:37 +02:00
Kai Lüke	fab347489f	bpf-firewall: custom BPF programs through IP(Ingress|Egress)FilterPath= Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be specified multiple times. An empty assignment resets all previous filters. Closes https://github.com/systemd/systemd/issues/10227	2019-06-25 09:56:16 +02:00
Michal Sekletar	b070c7c0e1	core: introduce NUMAPolicy and NUMAMask options Make it possible to set the NUMA allocation policy for the manager. The manager's policy is by default inherited by all forked off processes. However, it is possible to override the policy on a per-service basis. Currently we support these policies: default, prefer, bind, interleave, local. See man 2 set_mempolicy for details on each policy. The overall NUMA policy actually consists of two parts: the policy itself and a bitmask representing the NUMA nodes where the policy is effective. The node mask can be specified using the related option, NUMAMask. The default mask can be overwritten on a per-service level.	2019-06-24 16:58:54 +02:00
Chris Down	7e7223b3d5	cgroup: Readd some plumbing for DefaultMemoryMin Somehow these got lost in the previous PR, rendering DefaultMemoryMin not very useful.	2019-05-08 12:06:32 +01:00
Jan Klötzke	dc653bf487	service: handle abort stops with dedicated timeout When shooting down a service with SIGABRT the user might want to have a much longer stop timeout than on regular stops/shutdowns. Especially in the face of short stop timeouts the time might not be sufficient to write huge core dumps before the service is killed. This commit adds a dedicated (Default)TimeoutAbortSec= timer that is used when stopping a service via SIGABRT. In all other cases the existing TimeoutStopSec= is used. The timer value is unset by default to skip the special handling and use TimeoutStopSec= for state 'stop-watchdog' to keep the old behaviour. If the service is in state 'stop-watchdog' and the service should be stopped explicitly we still go to 'stop-sigterm' and re-apply the usual TimeoutStopSec= timeout.	2019-04-12 17:32:52 +02:00
Chris Down	c52db42b78	cgroup: Implement default propagation of MemoryLow with DefaultMemoryLow In cgroup v2 we have protection tunables -- currently MemoryLow and MemoryMin (there will be more in future for other resources, too). The design of these protection tunables requires not only intermediate cgroups to propagate protections, but also the units at the leaf of that resource's operation to accept it (by setting MemoryLow or MemoryMin). This makes sense from a low-level API design perspective, but it's a good idea to also have a higher-level abstraction that can, by default, propagate these resources to children recursively. In this patch, this happens by having descendants set memory.low to N if their ancestor has DefaultMemoryLow=N -- assuming they don't set a separate MemoryLow value. Any affected unit can opt out of this propagation by manually setting `MemoryLow` to some value in its unit configuration. A unit can also stop further propagation by setting `DefaultMemoryLow=` with no argument. This removes further propagation in the subtree, but has no effect on the unit itself (for that, use `MemoryLow=0`). Our use case in production is simplifying the configuration of machines which heavily rely on memory protection tunables, but currently require tweaking a huge number of unit files to make that a reality. This directive makes that significantly less fragile, and decreases the risk of misconfiguration. After this patch is merged, I will implement DefaultMemoryMin= using the same principles.	2019-04-12 17:23:58 +02:00
Lennart Poettering	afcfaa695c	core: implement OOMPolicy= and watch cgroups for OOM killings This adds a new per-service OOMPolicy= (along with a global DefaultOOMPolicy=) that controls what to do if a process of the service is killed by the kernel's OOM killer. It has three different values: "continue" (old behaviour), "stop" (terminate the service), "kill" (let the kernel kill all the service's processes). On top of that, track OOM killer events per unit: generate a per-unit structured, recognizable log message when we see an OOM killer event, and put the service in a failure state if an OOM killer event was seen and the selected policy was not "continue". A new "result" is defined for this case: "oom-kill". All of this relies on new cgroupv2 kernel functionality: the "memory.events" notification interface and the "memory.oom.group" attribute (which makes the kernel kill all cgroup processes automatically).	2019-04-09 11:17:58 +02:00
Davide Cavalca	639dd43a36	core: fix build failure if seccomp is disabled	2019-04-03 13:46:32 +09:00
Lennart Poettering	f69567cbe2	core: expose SUID/SGID restriction as new unit setting RestrictSUIDSGID=	2019-04-02 16:56:48 +02:00
Lennart Poettering	efebb613c7	core: optionally, trigger .timer units on timezone and clock changes Fixes: #6228	2019-04-02 08:20:10 +02:00
Lennart Poettering	25a04ae55e	core: simplify timer expression parsing by using ".ltype" field of conf-parser logic No change of behaviour. Let's just not parse the lvalue all the time with timer_base_from_string() if we can already pass it in parsed.	2019-04-01 18:25:43 +02:00
Lennart Poettering	a8d08f39d1	core: add new setting NetworkNamespacePath= for configuring a netns by path for a service Fixes: #2741	2019-03-07 16:55:23 +01:00
Lennart Poettering	eb5149ba74	Merge pull request #11682 from topimiettinen/private-utsname core: ProtectHostname feature	2019-02-20 14:12:15 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and a ro bind mounts on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00
Filipe Brandenburger	10f2864111	core: add CPUQuotaPeriodSec= This new setting allows configuration of CFS period on the CPU cgroup, instead of using a hardcoded default of 100ms. Tested: - Legacy cgroup + Unified cgroup - systemctl set-property - systemctl show - Confirmed that the cgroup settings (such as cpu.cfs_period_ns) were set appropriately, including updating the CPU quota (cpu.cfs_quota_ns) when CPUQuotaPeriodSec= is updated. - Checked that clamping works properly when either period or (quota * period) are below the resolution of 1ms, or if period is above the max of 1s.	2019-02-14 11:04:42 -08:00
Chris Down	c72703e26d	cgroup: Add DisableControllers= directive to disable controller in subtree Some controllers (like the CPU controller) have a performance cost that is non-trivial on certain workloads. While this can be mitigated and improved to an extent, there will for some controllers always be some overheads associated with the benefits gained from the controller. Inside Facebook, the fix applied has been to disable the CPU controller forcibly with `cgroup_disable=cpu` on the kernel command line. This presents a problem: to disable or reenable the controller, a reboot is required, but this is quite cumbersome and slow to do for many thousands of machines, especially machines where disabling/enabling a stateful service on a machine is a matter of several minutes. Currently systemd provides some configuration knobs for these in the form of `[Default]CPUAccounting`, `[Default]MemoryAccounting`, and the like. The limitation of these is that Default*Accounting is overrideable by individual services, of which any one could decide to reenable a controller within the hierarchy at any point just by using a controller feature implicitly (eg. `CPUWeight`), even if the use of that CPU feature could just be opportunistic. Since many services are provided by the distribution, or by upstream teams at a particular organisation, it's not a sustainable solution to simply try to find and remove offending directives from these units. This commit presents a more direct solution -- a DisableControllers= directive that forcibly disallows a controller from being enabled within a subtree.	2018-12-03 15:40:31 +00:00
Lennart Poettering	7af67e9a8b	core: allow to set exit status when using SuccessAction=/FailureAction=exit in units This adds SuccessActionExitStatus= and FailureActionExitStatus= that may be used to configure the exit status to propagate in when SuccessAction=exit or FailureAction=exit is used. When not specified let's also propagate the exit status of the main process we fork off for the unit.	2018-11-27 09:44:40 +01:00
Lennart Poettering	a9353a5c5b	core: log about /var/run/ prefix used in PIDFile=, patch it to be /run instead In a way this is a follow-up for `a2d1fb882c`, but adds a similar warning for PIDFile=. There's a much stronger case for doing this kind of notification in tmpfiles.d (since it helps relating lines to each other for the purpose of merging them). Doing this for PIDFile= is mostly about being systematic and copying tmpfiles.d/ behaviour here. While we are at it, let's also support relative filenames in PIDFile= now, and prefix them with /run, to make them absolute. Fixes: #10657	2018-11-10 19:17:00 +01:00
Anita Zhang	90fc172e19	core: implement per unit journal rate limiting Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for services. If provided, these values get passed to the journald client context, and those values are used in the rate limiting function in the journal over the journald.conf values. Part of #10230	2018-10-18 09:56:20 +02:00
Anita Zhang	c87700a133	Make Watchdog Signal Configurable Allows configuring the watchdog signal (with a default of SIGABRT). This allows an alternative to SIGABRT when coredumps are not desirable. Appropriate references to SIGABRT or aborting were renamed to reflect more liberal watchdog signals. Closes #8658	2018-09-26 16:14:29 +02:00
Tejun Heo	6ae4283cb1	core: add IODeviceLatencyTargetSec This adds support for the following proposed latency based IO control mechanism. https://lkml.org/lkml/2018/6/5/428	2018-08-22 16:46:18 +02:00
Jon Ringle	fbb48d4c66	Make final kill signal configurable Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which should create a core dump useful for debugging why the service didn't stop with the SIGTERM	2018-07-23 13:44:54 +02:00
Tejun Heo	4842263577	core: add MemoryMin The kernel added support for a new cgroup memory controller knob memory.min in bf8d5d52ffe8 ("memcg: introduce memory.min") which was merged during v4.18 merge window. Add MemoryMin to support memory.min.	2018-07-12 08:21:43 +02:00
Lennart Poettering	228af36fff	core: add new PrivateMounts= unit setting This new setting is supposed to be useful in most cases where "MountFlags=slave" is currently used, i.e. as an explicit way to run a service in its own mount namespace and decouple propagation from all mounts of the new mount namespace towards the host. The effect of MountFlags=slave and PrivateMounts=yes is mostly the same, as both cause a CLONE_NEWNS namespace to be opened, and both will result in all mounts within it to be mounted MS_SLAVE. The difference is mostly on the conceptual/philosophical level: configuring the propagation mode is nothing people should have to think about, in particular as the matter is not precisely easy to grok. Moreover, MountFlags= allows configuration of "private" and "shared" modes which don't really make much sense to use in real-life and are quite confusing. In particular MountFlags=private means mounts made on the host stay pinned for good by the service which is particularly nasty for removable media mounts. And MountFlags=shared is in most ways a NOP when used alone... The main technical difference between setting only MountFlags=slave or only PrivateMounts=yes in a unit file is that the former remounts all mounts to MS_SLAVE and leaves them there, while the latter remounts them to MS_SHARED again right after. The latter is generally a nicer approach, since it disables propagation, while MS_SHARED is afterwards in effect, which is really nice as that means further namespacing down the tree will get MS_SHARED logic by default and we unify how applications see our mounts as we always pass them as MS_SHARED regardless whether any mount namespacing is used or not. The effect of PrivateMounts=yes was implied already by all the other mount namespacing options. With this new option we add an explicit knob for it, to request it without any other option used as well. See: #4393	2018-06-12 16:12:10 +02:00
Yu Watanabe	984faf29da	load-fragment: use DEFINE_CONFIG_PARSE_*() macros	2018-05-31 11:09:41 +09:00
Yu Watanabe	00463fbf0d	load-fragment: make SocketProtocol= accept the empty string	2018-05-31 11:09:41 +09:00
Yu Watanabe	fa65c28176	namespace: rename parse_protect_{home,system}_or_bool() to protect_{home,system}_or_bool_to_string() Hence, we can define config_parse_protect_{home,system}() by using DEFINE_CONFIG_PARSE_ENUM() macro.	2018-05-31 11:09:41 +09:00
Yu Watanabe	b54e98ef8e	socket-util: rename parse_socket_address_bind_ipv6_only_or_bool() to socket_address_bind_ipv6_only_or_bool_from_string() Hence, we can define config_parse_socket_bind() by using DEFINE_CONFIG_PARSE_ENUM() macro.	2018-05-31 11:09:41 +09:00
Yu Watanabe	0a9e363870	load-fragment: drop config_parse_no_new_privileges() and use config_parse_bool() instead	2018-05-31 11:09:41 +09:00
Yu Watanabe	6c58305ac3	load-fragment: use config_parse_sec_fix_0() for TimeoutStopSec=	2018-05-31 11:09:41 +09:00
Lennart Poettering	4f424df760	core: move config_parse_limit() to the generic conf-parser.[ch] That way we can use it in nspawn. Also, while we are at it, let's rename the call config_parse_rlimit(), i.e. insert the "r", to clarify what kind of limit this is about.	2018-05-17 20:36:52 +02:00
Felipe Sateler	57b7a260c2	core: undo the dependency inversion between unit.h and all unit types	2018-05-15 14:24:34 -04:00
Yu Watanabe	2abd4e388a	core: add new setting TemporaryFileSystem= This introduces a new setting TemporaryFileSystem=. This is useful to hide files not relevant to the processes invoked by unit, while necessary files or directories can be still accessed by combining with Bind{,ReadOnly}Paths=.	2018-02-21 09:17:52 +09:00
Yu Watanabe	8ab3934766	load-fragment: obsolete OnFailureIsolate=	2018-01-02 02:23:17 +09:00
Lennart Poettering	5022f08a23	core,udev,networkd: add ConditionKernelVersion= This adds a simple condition/assert/match to the service manager, to udev's .link handling and to networkd, for matching the kernel version string. In this version we only do fnmatch() based globbing, but we might want to extend that to version comparisons later on, if we like, by slightly extending the syntax with ">=", "<=", ">", "<" and "==" expressions.	2017-12-26 17:39:44 +01:00
Chris Down	e16647c39d	condition: Create AssertControlGroupController (#7630) Up until now, the behaviour in systemd has (mostly) been to silently ignore failures to action unit directives that refer to an unavailable controller. The addition of AssertControlGroupController and its conditional counterpart allow explicit specification of the desired behaviour when such a situation occurs. As for how this can happen, it is possible that a particular controller is not available in the cgroup hierarchy. One possible reason for this is that, in the running kernel, the controller simply doesn't exist -- for example, the CPU controller in cgroup v2 has only recently been merged and was out of tree until then. Another possibility is that the controller exists, but has been forcibly disabled by `cgroup_disable=` on the kernel command line. In future this will also support whatever comes out of issue #7624, `DefaultXAccounting=never`, or similar.	2017-12-18 08:53:29 +01:00
Lennart Poettering	b238be1e0d	core: enable specifier expansion for What=/Where=/Type=/SourcePath= too Using specifiers in these settings isn't particularly useful by itself, but it unifies behaviour a bit. It's kinda surprising that What= in mount units resolves specifiers, but Where= does not. Hence let's add that too. Also, it's surprising Where=/What= in mount units behaves differently than in automount and swap units, hence resolve specifiers there too. Then, Type= in mount units is nowadays an arbitrary, sometimes non-trivial string (think fuse!), hence let's also expand specifiers there, to match the rest of the mount settings. This has the benefit that when writing code that generates unit files, less care has to be taken to check whether escaping of specifiers is necessary or not: broadly everything that takes arbitrary user strings now does specifier expansion, while enums/numerics/booleans do not.	2017-11-29 12:32:57 +01:00
Lennart Poettering	613613f1ee	core: use config_parse_unit_string_printf() for decoding RebootArgument= All other cases where we accept a reboot argument are decoded with config_parse_unit_string_printf() rather than config_parse_unit_path_printf(), and that's really the only thing that makes sense here, hence adjust this here, too.	2017-11-29 12:32:56 +01:00
Zbigniew Jędrzejewski-Szmek	82a27ba821	Merge pull request #7389 from shawnl/warning tree-wide: adjust fall through comments so that gcc is happy	2017-11-22 07:38:51 +01:00
Shawn Landden	4831981d89	tree-wide: adjust fall through comments so that gcc is happy Distcc removes comments, making the comment silencing not work. I know there was a decision against a macro in commit `ec251fe7d5`	2017-11-20 13:06:25 -08:00
Lennart Poettering	e7dfbb4e74	core: introduce SuccessAction= as unit file property SuccessAction= is similar to FailureAction= but declares what to do on success of a unit, rather than on failure. This is useful for running commands in qemu/nspawn images, that shall power down on completion. We frequently see "ExecStopPost=/usr/bin/systemctl poweroff" or so in unit files like this. Offer a simple, more declarative alternative for this. While we are at it, hook up failure action with unit_dump() and transient units too.	2017-11-20 16:37:22 +01:00
Lennart Poettering	53c35a766f	core: generalize FailureAction= move it from service to unit All kinds of units can fail, hence it makes sense to offer this as generic concept for all unit types.	2017-11-20 16:37:22 +01:00
Lennart Poettering	08f3be7a38	core: add two new unit file settings: StandardInputData= + StandardInputText= Both permit configuring data to pass through STDIN to an invoked process. StandardInputText= accepts a line of text (possibly with embedded C-style escapes as well as unit specifiers), which is appended to the buffer to pass as stdin, followed by a single newline. StandardInputData= is similar, but accepts arbitrary base64 encoded data, and will not resolve specifiers or C-style escapes, nor append newlines. This may be used to pass input/configuration data to services, directly in-line from unit files, either in a cooked or in a more raw format.	2017-11-17 11:13:44 +01:00
Lennart Poettering	5afe510c89	core: add a new unit file setting CollectMode= for tweaking the GC logic Right now, the option only takes one of two possible values "inactive" or "inactive-or-failed", the former being the default, and exposing same behaviour as the status quo ante. If set to "inactive-or-failed" units may be collected by the GC logic when in the "failed" state too. This logic should be a nicer alternative to using the "-" modifier for ExecStart= and friends, as the exit data is collected and logged about and only removed when the GC comes along. This should be useful in particular for per-connection socket-activated services, as well as "systemd-run" command lines that shall leave no artifacts in the system. I was thinking about whether to expose this as a boolean, but opted for an enum instead, as I have the suspicion other tweaks like this might be added later on, in which case we extend this setting instead of having to add yet another one. Also, let's add some documentation for the GC logic.	2017-11-16 14:38:36 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Lennart Poettering	0263828039	core: rework the Delegate= unit file setting to take a list of controller names Previously it was not possible to select which controllers to enable for a unit where Delegate=yes was set, as all controllers were enabled. With this change, this is made configurable, and thus delegation units can pick specifically what they want to manage themselves, and what they don't care about.	2017-11-13 10:49:15 +01:00
Yu Watanabe	c54515b1e4	core/load-fragment: add RemoveIPC= (#7288) PR #3865 introduced RemoveIPC= but the option is not listed in load-fragment-gperf.gperf. So, the option could be used only via d-bus. This adds RemoveIPC= in load-fragment-gperf.gperf. Then, now we can set the option in unit files. Fixes #7281.	2017-11-10 10:15:55 +01:00
Yu Watanabe	2bf13bd51e	core/load-fragment: fix alignment	2017-11-08 15:49:22 +09:00


243 commits
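
Several of the commits above add settings that are configured in unit files. As a rough illustration only, a service unit combining a few of them might look like the sketch below. The unit description, the binary and file paths, and all numeric values are hypothetical and not taken from any commit; only the setting names and the enumerated values (such as the OOMPolicy= choices) come from the commit messages.

```ini
# Hypothetical example unit; paths and values are illustrative assumptions.
[Unit]
Description=Example daemon (hypothetical)

[Service]
# ExecCondition= (31cd5f63ce): the commit only names the setting; the
# check command used here is an assumption.
ExecCondition=/usr/bin/test -e /etc/example/enabled
ExecStart=/usr/bin/example-daemon

# OOMPolicy= (afcfaa695c): one of "continue", "stop" or "kill".
OOMPolicy=stop

# TimeoutAbortSec= (dc653bf487): dedicated timeout for stops via SIGABRT,
# e.g. to leave time for writing large core dumps.
TimeoutStopSec=30s
TimeoutAbortSec=5min

# Sandboxing knobs: ProtectHostname= (aecd5ac621),
# RestrictSUIDSGID= (f69567cbe2), PrivateMounts= (228af36fff).
ProtectHostname=yes
RestrictSUIDSGID=yes
PrivateMounts=yes

# Per-unit logging controls: LogLevelMax= (d3070fbdf6) and
# LogRateLimitIntervalSec=/LogRateLimitBurst= (90fc172e19).
LogLevelMax=notice
LogRateLimitIntervalSec=30s
LogRateLimitBurst=1000

# MemoryMin= (4842263577) maps to the cgroup v2 memory.min knob.
MemoryMin=64M
```

Which of these settings are actually available depends on the installed systemd version and, for OOMPolicy= and MemoryMin=, on cgroup v2 kernel support, as the respective commit messages note.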