Systemd

Author	SHA1	Message	Date
Anita Zhang	4d824a4e0b	core: add ManagedOOM*= properties to configure systemd-oomd on the unit This adds the hook ups so it can be read with the usual systemd utilities. Used in later commits by sytemd-oomd.	2020-10-07 16:17:23 -07:00
Zbigniew Jędrzejewski-Szmek	5e98086d16	core: remember when we set ExecContext.mount_apivfs No functional change intended so far.	2020-09-24 10:03:18 +02:00
Topi Miettinen	9df2cdd8ec	exec: SystemCallLog= directive With new directive SystemCallLog= it's possible to list system calls to be logged. This can be used for auditing or temporarily when constructing system call filters. --- v5: drop intermediary, update HASHMAP_FOREACH_KEY() use v4: skip useless debug messages, actually parse directive v3: don't declare unused variables with old libseccomp v2: fix build without seccomp or old libseccomp	2020-09-15 12:54:17 +03:00
Lennart Poettering	bb0c0d6f29	core: add credentials logic Fixes: #15778 #16060	2020-08-25 19:45:35 +02:00
Lennart Poettering	4e39995371	core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options Kernel 5.8 gained a hidepid= implementation that is truly per procfs, which allows us to mount a distinct once into every unit, with individual hidepid= settings. Let's expose this via two new settings: ProtectProc= (wrapping hidpid=) and ProcSubset= (wrapping subset=). Replaces: #11670	2020-08-24 20:11:02 +02:00
Lennart Poettering	476cfe626d	core: remove support for ConditionNull= The concept is flawed, and mostly useless. Let's finally remove it. It has been deprecated since `90a2ec10f2` (6 years ago) and we started to warn since `55dadc5c57` (1.5 years ago). Let's get rid of it altogether.	2020-08-20 14:01:25 +02:00
Luca Boccassi	b3d133148e	core: new feature MountImages Follows the same pattern and features as RootImage, but allows an arbitrary mount point under / to be specified by the user, and multiple values - like BindPaths. Original implementation by @topimiettinen at: https://github.com/systemd/systemd/pull/14451 Reworked to use dissect's logic instead of bare libmount() calls and other review comments. Thanks Topi for the initial work to come up with and implement this useful feature.	2020-08-05 21:34:55 +01:00
Luca Boccassi	18d7370587	service: add new RootImageOptions feature Allows to specify mount options for RootImage. In case of multi-partition images, the partition number can be prefixed followed by colon. Eg: RootImageOptions=1:ro,dev 2:nosuid nodev In absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
Luca Boccassi	d4d55b0d13	core: add RootHashSignature service parameter Allow to explicitly pass root hash signature as a unit option. Takes precedence over implicit checks.	2020-06-25 08:45:21 +01:00
Luca Boccassi	0389f4fa81	core: add RootHash and RootVerity service parameters Allow to explicitly pass root hash (explicitly or as a file) and verity device/file as unit options. Take precedence over implicit checks.	2020-06-23 10:50:09 +02:00
Jan Klötzke	bf76080180	core: let user define start-/stop-timeout behaviour The usual behaviour when a timeout expires is to terminate/kill the service. This is what user usually want in production systems. To debug services that fail to start/stop (especially sporadic failures) it might be necessary to trigger the watchdog machinery and write core dumps, though. Likewise, it is usually just a waste of time to gracefully stop a stuck service. Instead it might save time to go directly into kill mode. This commit adds two new options to services: TimeoutStartFailureMode= and TimeoutStopFailureMode=. Both take the same values and tweak the behavior of systemd when a start/stop timeout expires: * 'terminate': is the default behaviour as it has always been, * 'abort': triggers the watchdog machinery and will send SIGABRT (unless WatchdogSignal was changed) and * 'kill' will directly send SIGKILL. To handle the stop failure mode in stop-post state too a new final-watchdog state needs to be introduced.	2020-06-09 10:04:57 +02:00
Zbigniew Jędrzejewski-Szmek	ad21e542b2	manager: add CoredumpFilter= setting Fixes #6685.	2020-04-09 14:08:48 +02:00
Lennart Poettering	91dd5f7cbe	core: add new LogNamespace= execution setting	2020-01-31 15:01:43 +01:00
Lennart Poettering	eb34a981d6	core: initialize priority_set when parsing swap unit files Fixes: #14524	2020-01-09 17:08:31 +01:00
Zbigniew Jędrzejewski-Szmek	0b8d307587	pid1: fix the names of AllowedCPUs= and AllowedMemoryNodes= The original PR was submitted with CPUSetCpus and CPUSetMems, which was later changed to AllowedCPUs and AllowedMemmoryNodes everywhere (including the parser used by systemd-run), but not in the parser for unit files. Since we already released -rc1, let's keep support for the old names. I think we can remove it in a release or two if anyone remembers to do that. Fixes #14126. Follow-up for `047f5d63d7`.	2019-11-25 14:02:14 +01:00
Zbigniew Jędrzejewski-Szmek	fa036b6114	core: remove unused prototypes	2019-10-01 14:25:10 +02:00
Pavel Hrdina	047f5d63d7	cgroup: introduce support for cgroup v2 CPUSET controller Introduce support for configuring cpus and mems for processes using cgroup v2 CPUSET controller. This allows users to limit which cpus and memory NUMA nodes can be used by processes to better utilize system resources. The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems where the requested configuration is written. However, it doesn't mean that the requested configuration will be actually used as parent cgroup may limit the cpus or mems as well. In order to reflect the real configuration cgroup v2 provides read-only files cpuset.cpus.effective and cpuset.mems.effective which are exported to users as well.	2019-09-24 15:16:07 +02:00
Maciej Stanczew	6327aa9f6c	core: Fix setting StatusUnitFormat from config files	2019-09-17 15:21:21 +09:00
Zbigniew Jędrzejewski-Szmek	ae480f0b09	shared/user-util: allow usernames with dots in specific fields People do have usernames with dots, and it makes them very unhappy that systemd doesn't like their that. It seems that there is no actual problem with allowing dots in the username. In particular chown declares ":" as the official separator, and internally in systemd we never rely on "." as the seperator between user and group (nor do we call chown directly). Using dots in the name is probably not a very good idea, but we don't need to care. Debian tools (adduser) do not allow users with dots to be created. This patch allows existing names with dots to be used in User, Group, SupplementaryGroups, SocketUser, SocketGroup fields, both in unit files and on the command line. DynamicUsers and sysusers still follow the strict policy. user@.service and tmpfiles already allowed arbitrary user names, and this remains unchanged. Fixes #12754.	2019-08-19 21:19:13 +02:00
Frantisek Sumsal	a07a7324ad	core: move config_parse_* functions to a shared module Apart from making the code a little bit more clean, it should allow us to write a fuzzer around the config-parsing functions in the future	2019-06-25 22:35:02 +09:00
Kai Lüke	fab347489f	bpf-firewall: custom BPF programs through IP(Ingress\|Egress)FilterPath= Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be specified multiple times. An empty assignment resets all previous filters. Closes https://github.com/systemd/systemd/issues/10227	2019-06-25 09:56:16 +02:00
Michal Sekletar	b070c7c0e1	core: introduce NUMAPolicy and NUMAMask options Make possible to set NUMA allocation policy for manager. Manager's policy is by default inherited to all forked off processes. However, it is possible to override the policy on per-service basis. Currently we support, these policies: default, prefer, bind, interleave, local. See man 2 set_mempolicy for details on each policy. Overall NUMA policy actually consists of two parts. Policy itself and bitmask representing NUMA nodes where is policy effective. Node mask can be specified using related option, NUMAMask. Default mask can be overwritten on per-service level.	2019-06-24 16:58:54 +02:00
Jan Klötzke	dc653bf487	service: handle abort stops with dedicated timeout When shooting down a service with SIGABRT the user might want to have a much longer stop timeout than on regular stops/shutdowns. Especially in the face of short stop timeouts the time might not be sufficient to write huge core dumps before the service is killed. This commit adds a dedicated (Default)TimeoutAbortSec= timer that is used when stopping a service via SIGABRT. In all other cases the existing TimeoutStopSec= is used. The timer value is unset by default to skip the special handling and use TimeoutStopSec= for state 'stop-watchdog' to keep the old behaviour. If the service is in state 'stop-watchdog' and the service should be stopped explicitly we still go to 'stop-sigterm' and re-apply the usual TimeoutStopSec= timeout.	2019-04-12 17:32:52 +02:00
Lennart Poettering	afcfaa695c	core: implement OOMPolicy= and watch cgroups for OOM killings This adds a new per-service OOMPolicy= (along with a global DefaultOOMPolicy=) that controls what to do if a process of the service is killed by the kernel's OOM killer. It has three different values: "continue" (old behaviour), "stop" (terminate the service), "kill" (let the kernel kill all the service's processes). On top of that, track OOM killer events per unit: generate a per-unit structured, recognizable log message when we see an OOM killer event, and put the service in a failure state if an OOM killer event was seen and the selected policy was not "continue". A new "result" is defined for this case: "oom-kill". All of this relies on new cgroupv2 kernel functionality: the "memory.events" notification interface and the "memory.oom.group" attribute (which makes the kernel kill all cgroup processes automatically).	2019-04-09 11:17:58 +02:00
Chris Down	c72703e26d	cgroup: Add DisableControllers= directive to disable controller in subtree Some controllers (like the CPU controller) have a performance cost that is non-trivial on certain workloads. While this can be mitigated and improved to an extent, there will for some controllers always be some overheads associated with the benefits gained from the controller. Inside Facebook, the fix applied has been to disable the CPU controller forcibly with `cgroup_disable=cpu` on the kernel command line. This presents a problem: to disable or reenable the controller, a reboot is required, but this is quite cumbersome and slow to do for many thousands of machines, especially machines where disabling/enabling a stateful service on a machine is a matter of several minutes. Currently systemd provides some configuration knobs for these in the form of `[Default]CPUAccounting`, `[Default]MemoryAccounting`, and the like. The limitation of these is that Default*Accounting is overrideable by individual services, of which any one could decide to reenable a controller within the hierarchy at any point just by using a controller feature implicitly (eg. `CPUWeight`), even if the use of that CPU feature could just be opportunistic. Since many services are provided by the distribution, or by upstream teams at a particular organisation, it's not a sustainable solution to simply try to find and remove offending directives from these units. This commit presents a more direct solution -- a DisableControllers= directive that forcibly disallows a controller from being enabled within a subtree.	2018-12-03 15:40:31 +00:00
Lennart Poettering	7af67e9a8b	core: allow to set exit status when using SuccessAction=/FailureAction=exit in units This adds SuccessActionExitStatus= and FailureActionExitStatus= that may be used to configure the exit status to propagate in when SuccessAction=exit or FailureAction=exit is used. When not specified let's also propagate the exit status of the main process we fork off for the unit.	2018-11-27 09:44:40 +01:00
Lennart Poettering	46f2d09f31	conf-parse: drop unused prototype	2018-11-17 08:47:27 +01:00
Lennart Poettering	a9353a5c5b	core: log about /var/run/ prefix used in PIDFile=, patch it to be /run instead In a way this is a follow-up for `a2d1fb882c`, but adds a similar warning for PIDFile=. There's a much stronger case for doing this kind of notification in tmpfiles.d (since it helps relating lines to each other for the purpose of merging them). Doing this for PIDFile= is mostly about being systematic and copying tmpfiles.d/ behaviour here. While we are at it, let's also support relative filenames in PIDFile= now, and prefix them with /run, to make them absolute. Fixes: #10657	2018-11-10 19:17:00 +01:00
Tejun Heo	6ae4283cb1	core: add IODeviceLatencyTargetSec This adds support for the following proposed latency based IO control mechanism. https://lkml.org/lkml/2018/6/5/428	2018-08-22 16:46:18 +02:00
Jon Ringle	fbb48d4c66	Make final kill signal configurable Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which should create a core dump useful for debugging why the service didn't stop with the SIGTERM	2018-07-23 13:44:54 +02:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Yu Watanabe	984faf29da	load-fragment: use DEFINE_CONFIG_PARSE_*() macros	2018-05-31 11:09:41 +09:00
Yu Watanabe	0a9e363870	load-fragment: drop config_parse_no_new_privileges() and use config_parse_bool() instead	2018-05-31 11:09:41 +09:00
Yu Watanabe	47544ea1cb	load-fragment: drop unused function config_parse_sysv_priority()	2018-05-31 11:09:41 +09:00
Lennart Poettering	a210692525	tree-wide: port over all code to the new CONFIG_PARSER_PROTOTYPE() macro This makes most header files easier to look at. Also Emacs gets really slow when browsing through large sections of overly long prototypes, which is much improved by this macro. We should probably not do something similar with too many other cases, as macros like this might help readability for some, but make it worse for others. But I think given the complexity of this specific prototype and how often we use it, it's worth doing.	2018-05-22 13:18:44 +02:00
Lennart Poettering	4f424df760	core: move config_parse_limit() to the generic conf-parser.[ch] That way we can use it in nspawn. Also, while we are at it, let's rename the call config_parse_rlimit(), i.e. insert the "r", to clarify what kind of limit this is about.	2018-05-17 20:36:52 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Rosen Penev	1e35c5ab27	systemd-link: Remove UDP Fragmentation Offload support. (#8183 ) Support was killed in kernel 4.15 as well as ethtool 4.13. Justification was lack of use by drivers and too much of a maintenance burden. https://www.spinics.net/lists/netdev/msg443815.html Also moved config_parse_warn_compat to conf-parser.[ch] to fix compile errors.	2018-03-18 14:28:14 +01:00
Yu Watanabe	2abd4e388a	core: add new setting TemporaryFileSystem= This introduces a new setting TemporaryFileSystem=. This is useful to hide files not relevant to the processes invoked by unit, while necessary files or directories can be still accessed by combining with Bind{,ReadOnly}Paths=.	2018-02-21 09:17:52 +09:00
Lennart Poettering	0133d5553a	Merge pull request #7198 from poettering/stdin-stdout Add StandardInput=data, StandardInput=file:... and more	2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	5db9818772	core: don't allow DefaultStandardOutput= be set to socket/fd:/file: These three settings only make sense within the context of actual unit files, hence filter this out when applied to the per-manager default, and generate a log message about it.	2017-11-17 11:13:44 +01:00
Lennart Poettering	08f3be7a38	core: add two new unit file settings: StandardInputData= + StandardInputText= Both permit configuring data to pass through STDIN to an invoked process. StandardInputText= accepts a line of text (possibly with embedded C-style escapes as well as unit specifiers), which is appended to the buffer to pass as stdin, followed by a single newline. StandardInputData= is similar, but accepts arbitrary base64 encoded data, and will not resolve specifiers or C-style escapes, nor append newlines. This may be used to pass input/configuration data to services, directly in-line from unit files, either in a cooked or in a more raw format.	2017-11-17 11:13:44 +01:00
Lennart Poettering	7fa0dcb6e3	core: drop config_parse_input() as it is unused	2017-11-17 11:13:44 +01:00
Lennart Poettering	5afe510c89	core: add a new unit file setting CollectMode= for tweaking the GC logic Right now, the option only takes one of two possible values "inactive" or "inactive-or-failed", the former being the default, and exposing same behaviour as the status quo ante. If set to "inactive-or-failed" units may be collected by the GC logic when in the "failed" state too. This logic should be a nicer alternative to using the "-" modifier for ExecStart= and friends, as the exit data is collected and logged about and only removed when the GC comes along. This should be useful in particular for per-connection socket-activated services, as well as "systemd-run" command lines that shall leave no artifacts in the system. I was thinking about whether to expose this as a boolean, but opted for an enum instead, as I have the suspicion other tweaks like this might be a added later on, in which case we extend this setting instead of having to add yet another one. Also, let's add some documentation for the GC logic.	2017-11-16 14:38:36 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one exisiting unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Lennart Poettering	0263828039	core: rework the Delegate= unit file setting to take a list of controller names Previously it was not possible to select which controllers to enable for a unit where Delegate=yes was set, as all controllers were enabled. With this change, this is made configurable, and thus delegation units can pick specifically what they want to manage themselves, and what they don't care about.	2017-11-13 10:49:15 +01:00
Lennart Poettering	eae51da36e	unit: when JobTimeoutSec= is turned off, implicitly turn off JobRunningTimeoutSec= too We added JobRunningTimeoutSec= late, and Dracut configured only JobTimeoutSec= to turn of root device timeouts before. With this change we'll propagate a reset of JobTimeoutSec= into JobRunningTimeoutSec=, but only if the latter wasn't set explicitly. This should restore compatibility with older systemd versions. Fixes: #6402	2017-10-05 13:06:44 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit\|private\|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00

1 2 3

126 commits