Systemd

Author	SHA1	Message	Date
Luca Boccassi	18d7370587	service: add new RootImageOptions feature Allows to specify mount options for RootImage. In case of multi-partition images, the partition number can be prefixed followed by colon. Eg: RootImageOptions=1:ro,dev 2:nosuid nodev In absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
Luca Boccassi	d4d55b0d13	core: add RootHashSignature service parameter Allow to explicitly pass root hash signature as a unit option. Takes precedence over implicit checks.	2020-06-25 08:45:21 +01:00
Luca Boccassi	0389f4fa81	core: add RootHash and RootVerity service parameters Allow to explicitly pass root hash (explicitly or as a file) and verity device/file as unit options. Take precedence over implicit checks.	2020-06-23 10:50:09 +02:00
Jan Klötzke	bf76080180	core: let user define start-/stop-timeout behaviour The usual behaviour when a timeout expires is to terminate/kill the service. This is what users usually want in production systems. To debug services that fail to start/stop (especially sporadic failures) it might be necessary to trigger the watchdog machinery and write core dumps, though. Likewise, it is usually just a waste of time to gracefully stop a stuck service. Instead it might save time to go directly into kill mode. This commit adds two new options to services: TimeoutStartFailureMode= and TimeoutStopFailureMode=. Both take the same values and tweak the behavior of systemd when a start/stop timeout expires: * 'terminate': is the default behaviour as it has always been, * 'abort': triggers the watchdog machinery and will send SIGABRT (unless WatchdogSignal was changed) and * 'kill' will directly send SIGKILL. To handle the stop failure mode in stop-post state too, a new final-watchdog state needs to be introduced.	2020-06-09 10:04:57 +02:00
Lennart Poettering	a3d19f5d99	core: add new PassPacketInfo= socket unit property	2020-05-27 22:40:38 +02:00
Martin Hundebøll	c600357ba6	mount: add ReadWriteOnly property to fail on read-only mounts Systems where a mount point is expected to be read-write need a way to fail mount units that fall back to read-only. Add a property to allow setting the -w option when calling mount(8).	2020-05-01 13:23:30 +02:00
Zbigniew Jędrzejewski-Szmek	ad21e542b2	manager: add CoredumpFilter= setting Fixes #6685.	2020-04-09 14:08:48 +02:00
Lennart Poettering	91dd5f7cbe	core: add new LogNamespace= execution setting	2020-01-31 15:01:43 +01:00
Kevin Kuehler	fc64760dda	core: shared: Add ProtectClock= to systemd.exec	2020-01-26 12:23:33 -08:00
Lennart Poettering	eb34a981d6	core: initialize priority_set when parsing swap unit files Fixes: #14524	2020-01-09 17:08:31 +01:00
Zbigniew Jędrzejewski-Szmek	0b8d307587	pid1: fix the names of AllowedCPUs= and AllowedMemoryNodes= The original PR was submitted with CPUSetCpus and CPUSetMems, which was later changed to AllowedCPUs and AllowedMemoryNodes everywhere (including the parser used by systemd-run), but not in the parser for unit files. Since we already released -rc1, let's keep support for the old names. I think we can remove it in a release or two if anyone remembers to do that. Fixes #14126. Follow-up for `047f5d63d7`.	2019-11-25 14:02:14 +01:00
Kevin Kuehler	8470304018	core: Add ProtectKernelLogs If seccomp is enabled, load the SYSCALL_FILTER_SET_SYSLOG into the seccomp filter set. Drop the CAP_SYSLOG capability.	2019-11-11 12:12:02 -08:00
Yu Watanabe	f5947a5e92	tree-wide: drop missing.h	2019-10-31 17:57:03 +09:00
Zbigniew Jędrzejewski-Szmek	a5f6f346d3	Merge pull request #13423 from pwithnall/12035-session-time-limits Add `RuntimeMaxSec=` support to scope units (time-limited login sessions)	2019-10-28 14:57:00 +01:00
Philip Withnall	9ed7de605d	scope: Support RuntimeMaxSec= directive in scope units Just as `RuntimeMaxSec=` is supported for service units, add support for it to scope units. This will gracefully kill a scope after the timeout expires from the moment the scope enters the running state. This could be used for time-limited login sessions, for example. Signed-off-by: Philip Withnall <withnall@endlessm.com> Fixes: #12035	2019-10-28 09:44:31 +01:00
Zbigniew Jędrzejewski-Szmek	a232ebcc2c	core: add support for RestartKillSignal= to override signal used for restart jobs v2: - if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary to preserve backwards compatibility (and keep KillSignal= generally useful).	2019-10-02 14:01:25 +02:00
Pavel Hrdina	047f5d63d7	cgroup: introduce support for cgroup v2 CPUSET controller Introduce support for configuring cpus and mems for processes using cgroup v2 CPUSET controller. This allows users to limit which cpus and memory NUMA nodes can be used by processes to better utilize system resources. The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems where the requested configuration is written. However, it doesn't mean that the requested configuration will be actually used as parent cgroup may limit the cpus or mems as well. In order to reflect the real configuration cgroup v2 provides read-only files cpuset.cpus.effective and cpuset.mems.effective which are exported to users as well.	2019-09-24 15:16:07 +02:00
Zbigniew Jędrzejewski-Szmek	5ac1530eca	tree-wide: say "ratelimit" not "rate_limit" "ratelimit" is a real word, so we don't need to use the other form anywhere. We had both forms in various places, let's standardize on the shorter and more correct one.	2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek	7bf081a1e5	pid1: rename start_limit to start_ratelimit This way it is clearer what the type is. We also have auto_stop_ratelimit adjacent, and it feels ugly to have a different suffix for those two.	2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek	6b4f7fb08c	Merge pull request #13385 from yuwata/core-remove-private-directories-13355 core: also remove private directories by systemctl clean	2019-08-31 09:28:39 +02:00
Yu Watanabe	12213aed12	core: move timeout_clean_usec from Service to ExecContext	2019-08-28 23:09:54 +09:00
Zbigniew Jędrzejewski-Szmek	ae480f0b09	shared/user-util: allow usernames with dots in specific fields People do have usernames with dots, and it makes them very unhappy that systemd doesn't like that. It seems that there is no actual problem with allowing dots in the username. In particular chown declares ":" as the official separator, and internally in systemd we never rely on "." as the separator between user and group (nor do we call chown directly). Using dots in the name is probably not a very good idea, but we don't need to care. Debian tools (adduser) do not allow users with dots to be created. This patch allows existing names with dots to be used in User, Group, SupplementaryGroups, SocketUser, SocketGroup fields, both in unit files and on the command line. DynamicUsers and sysusers still follow the strict policy. user@.service and tmpfiles already allowed arbitrary user names, and this remains unchanged. Fixes #12754.	2019-08-19 21:19:13 +02:00
Anita Zhang	31cd5f63ce	core: ExecCondition= for services Closes #10596	2019-07-17 11:35:02 +02:00
Lennart Poettering	4c2f584230	core: hook up service unit type with the new clean operation The implementation is pretty straightforward: when we get a request to clean some type of resources we fork off a process doing that, and while it is running we are in the "cleaning" state.	2019-07-11 12:18:51 +02:00
Zbigniew Jędrzejewski-Szmek	edfea9fe0d	analyze: add 'condition' verb We didn't have a straightforward way to parse and evaluate those strings. Prompted by #12881.	2019-06-27 10:54:37 +02:00
Kai Lüke	fab347489f	bpf-firewall: custom BPF programs through IP(Ingress\|Egress)FilterPath= Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be specified multiple times. An empty assignment resets all previous filters. Closes https://github.com/systemd/systemd/issues/10227	2019-06-25 09:56:16 +02:00
Michal Sekletar	b070c7c0e1	core: introduce NUMAPolicy and NUMAMask options Make it possible to set the NUMA allocation policy for the manager. The manager's policy is by default inherited by all forked-off processes. However, it is possible to override the policy on a per-service basis. Currently we support these policies: default, prefer, bind, interleave, local. See man 2 set_mempolicy for details on each policy. The overall NUMA policy actually consists of two parts: the policy itself and a bitmask representing the NUMA nodes where the policy is effective. The node mask can be specified using the related option, NUMAMask. The default mask can be overridden on a per-service level.	2019-06-24 16:58:54 +02:00
Chris Down	7e7223b3d5	cgroup: Readd some plumbing for DefaultMemoryMin Somehow these got lost in the previous PR, rendering DefaultMemoryMin not very useful.	2019-05-08 12:06:32 +01:00
Jan Klötzke	dc653bf487	service: handle abort stops with dedicated timeout When shooting down a service with SIGABRT the user might want to have a much longer stop timeout than on regular stops/shutdowns. Especially in the face of short stop timeouts the time might not be sufficient to write huge core dumps before the service is killed. This commit adds a dedicated (Default)TimeoutAbortSec= timer that is used when stopping a service via SIGABRT. In all other cases the existing TimeoutStopSec= is used. The timer value is unset by default to skip the special handling and use TimeoutStopSec= for state 'stop-watchdog' to keep the old behaviour. If the service is in state 'stop-watchdog' and the service should be stopped explicitly we still go to 'stop-sigterm' and re-apply the usual TimeoutStopSec= timeout.	2019-04-12 17:32:52 +02:00
Chris Down	c52db42b78	cgroup: Implement default propagation of MemoryLow with DefaultMemoryLow In cgroup v2 we have protection tunables -- currently MemoryLow and MemoryMin (there will be more in future for other resources, too). The design of these protection tunables requires not only intermediate cgroups to propagate protections, but also the units at the leaf of that resource's operation to accept it (by setting MemoryLow or MemoryMin). This makes sense from a low-level API design perspective, but it's a good idea to also have a higher-level abstraction that can, by default, propagate these resources to children recursively. In this patch, this happens by having descendants set memory.low to N if their ancestor has DefaultMemoryLow=N -- assuming they don't set a separate MemoryLow value. Any affected unit can opt out of this propagation by manually setting `MemoryLow` to some value in its unit configuration. A unit can also stop further propagation by setting `DefaultMemoryLow=` with no argument. This removes further propagation in the subtree, but has no effect on the unit itself (for that, use `MemoryLow=0`). Our use case in production is simplifying the configuration of machines which heavily rely on memory protection tunables, but currently require tweaking a huge number of unit files to make that a reality. This directive makes that significantly less fragile, and decreases the risk of misconfiguration. After this patch is merged, I will implement DefaultMemoryMin= using the same principles.	2019-04-12 17:23:58 +02:00
Lennart Poettering	afcfaa695c	core: implement OOMPolicy= and watch cgroups for OOM killings This adds a new per-service OOMPolicy= (along with a global DefaultOOMPolicy=) that controls what to do if a process of the service is killed by the kernel's OOM killer. It has three different values: "continue" (old behaviour), "stop" (terminate the service), "kill" (let the kernel kill all the service's processes). On top of that, track OOM killer events per unit: generate a per-unit structured, recognizable log message when we see an OOM killer event, and put the service in a failure state if an OOM killer event was seen and the selected policy was not "continue". A new "result" is defined for this case: "oom-kill". All of this relies on new cgroupv2 kernel functionality: the "memory.events" notification interface and the "memory.oom.group" attribute (which makes the kernel kill all cgroup processes automatically).	2019-04-09 11:17:58 +02:00
Davide Cavalca	639dd43a36	core: fix build failure if seccomp is disabled	2019-04-03 13:46:32 +09:00
Lennart Poettering	f69567cbe2	core: expose SUID/SGID restriction as new unit setting RestrictSUIDSGID=	2019-04-02 16:56:48 +02:00
Lennart Poettering	efebb613c7	core: optionally, trigger .timer units on timezone and clock changes Fixes: #6228	2019-04-02 08:20:10 +02:00
Lennart Poettering	25a04ae55e	core: simplify timer expression parsing by using ".ltype" field of conf-parser logic No change of behaviour. Let's just not parse the lvalue all the time with timer_base_from_string() if we can already pass it in parsed.	2019-04-01 18:25:43 +02:00
Lennart Poettering	a8d08f39d1	core: add new setting NetworkNamespacePath= for configuring a netns by path for a service Fixes: #2741	2019-03-07 16:55:23 +01:00
Lennart Poettering	eb5149ba74	Merge pull request #11682 from topimiettinen/private-utsname core: ProtectHostname feature	2019-02-20 14:12:15 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and a ro bind mounts on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00
Filipe Brandenburger	10f2864111	core: add CPUQuotaPeriodSec= This new setting allows configuration of CFS period on the CPU cgroup, instead of using a hardcoded default of 100ms. Tested: - Legacy cgroup + Unified cgroup - systemctl set-property - systemctl show - Confirmed that the cgroup settings (such as cpu.cfs_period_ns) were set appropriately, including updating the CPU quota (cpu.cfs_quota_ns) when CPUQuotaPeriodSec= is updated. - Checked that clamping works properly when either period or (quota * period) are below the resolution of 1ms, or if period is above the max of 1s.	2019-02-14 11:04:42 -08:00
Chris Down	c72703e26d	cgroup: Add DisableControllers= directive to disable controller in subtree Some controllers (like the CPU controller) have a performance cost that is non-trivial on certain workloads. While this can be mitigated and improved to an extent, there will for some controllers always be some overheads associated with the benefits gained from the controller. Inside Facebook, the fix applied has been to disable the CPU controller forcibly with `cgroup_disable=cpu` on the kernel command line. This presents a problem: to disable or reenable the controller, a reboot is required, but this is quite cumbersome and slow to do for many thousands of machines, especially machines where disabling/enabling a stateful service on a machine is a matter of several minutes. Currently systemd provides some configuration knobs for these in the form of `[Default]CPUAccounting`, `[Default]MemoryAccounting`, and the like. The limitation of these is that Default*Accounting is overrideable by individual services, of which any one could decide to reenable a controller within the hierarchy at any point just by using a controller feature implicitly (eg. `CPUWeight`), even if the use of that CPU feature could just be opportunistic. Since many services are provided by the distribution, or by upstream teams at a particular organisation, it's not a sustainable solution to simply try to find and remove offending directives from these units. This commit presents a more direct solution -- a DisableControllers= directive that forcibly disallows a controller from being enabled within a subtree.	2018-12-03 15:40:31 +00:00
Lennart Poettering	7af67e9a8b	core: allow to set exit status when using SuccessAction=/FailureAction=exit in units This adds SuccessActionExitStatus= and FailureActionExitStatus= that may be used to configure the exit status to propagate in when SuccessAction=exit or FailureAction=exit is used. When not specified let's also propagate the exit status of the main process we fork off for the unit.	2018-11-27 09:44:40 +01:00
Lennart Poettering	a9353a5c5b	core: log about /var/run/ prefix used in PIDFile=, patch it to be /run instead In a way this is a follow-up for `a2d1fb882c`, but adds a similar warning for PIDFile=. There's a much stronger case for doing this kind of notification in tmpfiles.d (since it helps relating lines to each other for the purpose of merging them). Doing this for PIDFile= is mostly about being systematic and copying tmpfiles.d/ behaviour here. While we are at it, let's also support relative filenames in PIDFile= now, and prefix them with /run, to make them absolute. Fixes: #10657	2018-11-10 19:17:00 +01:00
Anita Zhang	90fc172e19	core: implement per unit journal rate limiting Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for services. If provided, these values get passed to the journald client context, and those values are used in the rate limiting function in the journal over the journald.conf values. Part of #10230	2018-10-18 09:56:20 +02:00
Anita Zhang	c87700a133	Make Watchdog Signal Configurable Allows configuring the watchdog signal (with a default of SIGABRT). This allows an alternative to SIGABRT when coredumps are not desirable. Appropriate references to SIGABRT or aborting were renamed to reflect more liberal watchdog signals. Closes #8658	2018-09-26 16:14:29 +02:00
Tejun Heo	6ae4283cb1	core: add IODeviceLatencyTargetSec This adds support for the following proposed latency based IO control mechanism. https://lkml.org/lkml/2018/6/5/428	2018-08-22 16:46:18 +02:00
Jon Ringle	fbb48d4c66	Make final kill signal configurable Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which should create a core dump useful for debugging why the service didn't stop with the SIGTERM	2018-07-23 13:44:54 +02:00
Tejun Heo	4842263577	core: add MemoryMin The kernel added support for a new cgroup memory controller knob memory.min in bf8d5d52ffe8 ("memcg: introduce memory.min") which was merged during v4.18 merge window. Add MemoryMin to support memory.min.	2018-07-12 08:21:43 +02:00
Lennart Poettering	228af36fff	core: add new PrivateMounts= unit setting This new setting is supposed to be useful in most cases where "MountFlags=slave" is currently used, i.e. as an explicit way to run a service in its own mount namespace and decouple propagation from all mounts of the new mount namespace towards the host. The effect of MountFlags=slave and PrivateMounts=yes is mostly the same, as both cause a CLONE_NEWNS namespace to be opened, and both will result in all mounts within it to be mounted MS_SLAVE. The difference is mostly on the conceptual/philosophical level: configuring the propagation mode is nothing people should have to think about, in particular as the matter is not precisely easy to grok. Moreover, MountFlags= allows configuration of "private" and "shared" modes which don't really make much sense to use in real-life and are quite confusing. In particular MountFlags=private means mounts made on the host stay pinned for good by the service which is particularly nasty for removable media mounts. And MountFlags=shared is in most ways a NOP when used alone... The main technical difference between setting only MountFlags=slave or only PrivateMounts=yes in a unit file is that the former remounts all mounts to MS_SLAVE and leaves them there, while the latter remounts them to MS_SHARED again right after. The latter is generally a nicer approach, since it disables propagation, while MS_SHARED is afterwards in effect, which is really nice as that means further namespacing down the tree will get MS_SHARED logic by default and we unify how applications see our mounts as we always pass them as MS_SHARED regardless whether any mount namespacing is used or not. The effect of PrivateMounts=yes was implied already by all the other mount namespacing options. With this new option we add an explicit knob for it, to request it without any other option used as well. See: #4393	2018-06-12 16:12:10 +02:00
Yu Watanabe	984faf29da	load-fragment: use DEFINE_CONFIG_PARSE_*() macros	2018-05-31 11:09:41 +09:00
Yu Watanabe	00463fbf0d	load-fragment: make SocketProtocol= accept the empty string	2018-05-31 11:09:41 +09:00

265 commits