Systemd

Author	SHA1	Message	Date
Lennart Poettering	eb5149ba74	Merge pull request #11682 from topimiettinen/private-utsname core: ProtectHostname feature	2019-02-20 14:12:15 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and read-only bind mounts are placed on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00
Filipe Brandenburger	10f2864111	core: add CPUQuotaPeriodSec= This new setting allows configuration of the CFS period on the CPU cgroup, instead of using a hardcoded default of 100ms. Tested: - Legacy cgroup + Unified cgroup - systemctl set-property - systemctl show - Confirmed that the cgroup settings (such as cpu.cfs_period_us) were set appropriately, including updating the CPU quota (cpu.cfs_quota_us) when CPUQuotaPeriodSec= is updated. - Checked that clamping works properly when either period or (quota * period) are below the resolution of 1ms, or if period is above the max of 1s.	2019-02-14 11:04:42 -08:00
Chris Down	c72703e26d	cgroup: Add DisableControllers= directive to disable controller in subtree Some controllers (like the CPU controller) have a performance cost that is non-trivial on certain workloads. While this can be mitigated and improved to an extent, there will for some controllers always be some overheads associated with the benefits gained from the controller. Inside Facebook, the fix applied has been to disable the CPU controller forcibly with `cgroup_disable=cpu` on the kernel command line. This presents a problem: to disable or reenable the controller, a reboot is required, but this is quite cumbersome and slow to do for many thousands of machines, especially machines where disabling/enabling a stateful service on a machine is a matter of several minutes. Currently systemd provides some configuration knobs for these in the form of `[Default]CPUAccounting`, `[Default]MemoryAccounting`, and the like. The limitation of these is that Default*Accounting is overrideable by individual services, of which any one could decide to reenable a controller within the hierarchy at any point just by using a controller feature implicitly (eg. `CPUWeight`), even if the use of that CPU feature could just be opportunistic. Since many services are provided by the distribution, or by upstream teams at a particular organisation, it's not a sustainable solution to simply try to find and remove offending directives from these units. This commit presents a more direct solution -- a DisableControllers= directive that forcibly disallows a controller from being enabled within a subtree.	2018-12-03 15:40:31 +00:00
Lennart Poettering	7af67e9a8b	core: allow to set exit status when using SuccessAction=/FailureAction=exit in units This adds SuccessActionExitStatus= and FailureActionExitStatus= that may be used to configure the exit status to propagate in when SuccessAction=exit or FailureAction=exit is used. When not specified let's also propagate the exit status of the main process we fork off for the unit.	2018-11-27 09:44:40 +01:00
Lennart Poettering	a9353a5c5b	core: log about /var/run/ prefix used in PIDFile=, patch it to be /run instead In a way this is a follow-up for `a2d1fb882c`, but adds a similar warning for PIDFile=. There's a much stronger case for doing this kind of notification in tmpfiles.d (since it helps relating lines to each other for the purpose of merging them). Doing this for PIDFile= is mostly about being systematic and copying tmpfiles.d/ behaviour here. While we are at it, let's also support relative filenames in PIDFile= now, and prefix them with /run, to make them absolute. Fixes: #10657	2018-11-10 19:17:00 +01:00
Anita Zhang	90fc172e19	core: implement per unit journal rate limiting Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for services. If provided, these values get passed to the journald client context, and those values are used in the rate limiting function in the journal instead of the journald.conf values. Part of #10230	2018-10-18 09:56:20 +02:00
Anita Zhang	c87700a133	Make Watchdog Signal Configurable Allows configuring the watchdog signal (with a default of SIGABRT). This allows an alternative to SIGABRT when coredumps are not desirable. Appropriate references to SIGABRT or aborting were renamed to reflect more liberal watchdog signals. Closes #8658	2018-09-26 16:14:29 +02:00
Tejun Heo	6ae4283cb1	core: add IODeviceLatencyTargetSec This adds support for the following proposed latency based IO control mechanism. https://lkml.org/lkml/2018/6/5/428	2018-08-22 16:46:18 +02:00
Jon Ringle	fbb48d4c66	Make final kill signal configurable The use case is to allow changing the final kill signal from SIGKILL to SIGQUIT, which should create a core dump useful for debugging why the service didn't stop with SIGTERM.	2018-07-23 13:44:54 +02:00
Tejun Heo	4842263577	core: add MemoryMin The kernel added support for a new cgroup memory controller knob memory.min in bf8d5d52ffe8 ("memcg: introduce memory.min") which was merged during v4.18 merge window. Add MemoryMin to support memory.min.	2018-07-12 08:21:43 +02:00
Lennart Poettering	228af36fff	core: add new PrivateMounts= unit setting This new setting is supposed to be useful in most cases where "MountFlags=slave" is currently used, i.e. as an explicit way to run a service in its own mount namespace and decouple propagation from all mounts of the new mount namespace towards the host. The effect of MountFlags=slave and PrivateMounts=yes is mostly the same, as both cause a CLONE_NEWNS namespace to be opened, and both result in all mounts within it being mounted MS_SLAVE. The difference is mostly on the conceptual/philosophical level: configuring the propagation mode is nothing people should have to think about, in particular as the matter is not precisely easy to grok. Moreover, MountFlags= allows configuration of "private" and "shared" modes which don't really make much sense to use in real life and are quite confusing. In particular MountFlags=private means mounts made on the host stay pinned for good by the service, which is particularly nasty for removable media mounts. And MountFlags=shared is in most ways a NOP when used alone... The main technical difference between setting only MountFlags=slave or only PrivateMounts=yes in a unit file is that the former remounts all mounts to MS_SLAVE and leaves them there, while the latter remounts them to MS_SHARED again right after. The latter is generally a nicer approach, since it disables propagation, while MS_SHARED is afterwards in effect, which is really nice as that means further namespacing down the tree will get MS_SHARED logic by default and we unify how applications see our mounts as we always pass them as MS_SHARED regardless of whether any mount namespacing is used or not. The effect of PrivateMounts=yes was implied already by all the other mount namespacing options. With this new option we add an explicit knob for it, to request it without any other option used as well. See: #4393	2018-06-12 16:12:10 +02:00
Yu Watanabe	984faf29da	load-fragment: use DEFINE_CONFIG_PARSE_*() macros	2018-05-31 11:09:41 +09:00
Yu Watanabe	00463fbf0d	load-fragment: make SocketProtocol= accept the empty string	2018-05-31 11:09:41 +09:00
Yu Watanabe	fa65c28176	namespace: rename parse_protect_{home,system}_or_bool() to protect_{home,system}_or_bool_to_string() Hence, we can define config_parse_protect_{home,system}() by using DEFINE_CONFIG_PARSE_ENUM() macro.	2018-05-31 11:09:41 +09:00
Yu Watanabe	b54e98ef8e	socket-util: rename parse_socket_address_bind_ipv6_only_or_bool() to socket_address_bind_ipv6_only_or_bool_from_string() Hence, we can define config_parse_socket_bind() by using DEFINE_CONFIG_PARSE_ENUM() macro.	2018-05-31 11:09:41 +09:00
Yu Watanabe	0a9e363870	load-fragment: drop config_parse_no_new_privileges() and use config_parse_bool() instead	2018-05-31 11:09:41 +09:00
Yu Watanabe	6c58305ac3	load-fragment: use config_parse_sec_fix_0() for TimeoutStopSec=	2018-05-31 11:09:41 +09:00
Lennart Poettering	4f424df760	core: move config_parse_limit() to the generic conf-parser.[ch] That way we can use it in nspawn. Also, while we are at it, let's rename the call config_parse_rlimit(), i.e. insert the "r", to clarify what kind of limit this is about.	2018-05-17 20:36:52 +02:00
Felipe Sateler	57b7a260c2	core: undo the dependency inversion between unit.h and all unit types	2018-05-15 14:24:34 -04:00
Yu Watanabe	2abd4e388a	core: add new setting TemporaryFileSystem= This introduces a new setting TemporaryFileSystem=. This is useful to hide files not relevant to the processes invoked by the unit, while necessary files or directories can still be accessed by combining it with Bind{,ReadOnly}Paths=.	2018-02-21 09:17:52 +09:00
Yu Watanabe	8ab3934766	load-fragment: obsolete OnFailureIsolate=	2018-01-02 02:23:17 +09:00
Lennart Poettering	5022f08a23	core,udev,networkd: add ConditionKernelVersion= This adds a simple condition/assert/match to the service manager, to udev's .link handling and to networkd, for matching the kernel version string. In this version we only do fnmatch() based globbing, but we might want to extend that to version comparisons later on, if we like, by slightly extending the syntax with ">=", "<=", ">", "<" and "==" expressions.	2017-12-26 17:39:44 +01:00
Chris Down	e16647c39d	condition: Create AssertControlGroupController (#7630) Up until now, the behaviour in systemd has (mostly) been to silently ignore failures to action unit directives that refer to an unavailable controller. The addition of AssertControlGroupController and its conditional counterpart allows explicit specification of the desired behaviour when such a situation occurs. As for how this can happen, it is possible that a particular controller is not available in the cgroup hierarchy. One possible reason for this is that, in the running kernel, the controller simply doesn't exist -- for example, the CPU controller in cgroup v2 has only recently been merged and was out of tree until then. Another possibility is that the controller exists, but has been forcibly disabled by `cgroup_disable=` on the kernel command line. In future this will also support whatever comes out of issue #7624, `DefaultXAccounting=never`, or similar.	2017-12-18 08:53:29 +01:00
Lennart Poettering	b238be1e0d	core: enable specifier expansion for What=/Where=/Type=/SourcePath= too Using specifiers in these settings isn't particularly useful by itself, but it unifies behaviour a bit. It's kinda surprising that What= in mount units resolves specifiers, but Where= does not. Hence let's add that too. Also, it's surprising Where=/What= in mount units behaves differently than in automount and swap units, hence resolve specifiers there too. Then, Type= in mount units is nowadays an arbitrary, sometimes non-trivial string (think fuse!), hence let's also expand specifiers there, to match the rest of the mount settings. This has the benefit that when writing code that generates unit files, less care has to be taken to check whether escaping of specifiers is necessary or not: broadly everything that takes arbitrary user strings now does specifier expansion, while enums/numerics/booleans do not.	2017-11-29 12:32:57 +01:00
Lennart Poettering	613613f1ee	core: use config_parse_unit_string_printf() for decoding RebootArgument= All other cases where we accept a reboot argument are decoded with config_parse_unit_string_printf() rather than config_parse_unit_path_printf(), and that's really the only thing that makes sense here, hence adjust this here, too.	2017-11-29 12:32:56 +01:00
Zbigniew Jędrzejewski-Szmek	82a27ba821	Merge pull request #7389 from shawnl/warning tree-wide: adjust fall through comments so that gcc is happy	2017-11-22 07:38:51 +01:00
Shawn Landden	4831981d89	tree-wide: adjust fall through comments so that gcc is happy Distcc removes comments, making the comment silencing not work. I know there was a decision against a macro in commit `ec251fe7d5`	2017-11-20 13:06:25 -08:00
Lennart Poettering	e7dfbb4e74	core: introduce SuccessAction= as unit file property SuccessAction= is similar to FailureAction= but declares what to do on success of a unit, rather than on failure. This is useful for running commands in qemu/nspawn images, that shall power down on completion. We frequently see "ExecStopPost=/usr/bin/systemctl poweroff" or so in unit files like this. Offer a simple, more declarative alternative for this. While we are at it, hook up failure action with unit_dump() and transient units too.	2017-11-20 16:37:22 +01:00
Lennart Poettering	53c35a766f	core: generalize FailureAction= move it from service to unit All kinds of units can fail, hence it makes sense to offer this as generic concept for all unit types.	2017-11-20 16:37:22 +01:00
Lennart Poettering	08f3be7a38	core: add two new unit file settings: StandardInputData= + StandardInputText= Both permit configuring data to pass through STDIN to an invoked process. StandardInputText= accepts a line of text (possibly with embedded C-style escapes as well as unit specifiers), which is appended to the buffer to pass as stdin, followed by a single newline. StandardInputData= is similar, but accepts arbitrary base64 encoded data, and will not resolve specifiers or C-style escapes, nor append newlines. This may be used to pass input/configuration data to services, directly in-line from unit files, either in a cooked or in a more raw format.	2017-11-17 11:13:44 +01:00
Lennart Poettering	5afe510c89	core: add a new unit file setting CollectMode= for tweaking the GC logic Right now, the option only takes one of two possible values "inactive" or "inactive-or-failed", the former being the default, and exposing the same behaviour as the status quo ante. If set to "inactive-or-failed", units may be collected by the GC logic when in the "failed" state too. This logic should be a nicer alternative to using the "-" modifier for ExecStart= and friends, as the exit data is collected and logged about and only removed when the GC comes along. This should be useful in particular for per-connection socket-activated services, as well as "systemd-run" command lines that shall leave no artifacts in the system. I was thinking about whether to expose this as a boolean, but opted for an enum instead, as I have the suspicion other tweaks like this might be added later on, in which case we extend this setting instead of having to add yet another one. Also, let's add some documentation for the GC logic.	2017-11-16 14:38:36 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and should instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Lennart Poettering	0263828039	core: rework the Delegate= unit file setting to take a list of controller names Previously it was not possible to select which controllers to enable for a unit where Delegate=yes was set, as all controllers were enabled. With this change, this is made configurable, and thus delegation units can pick specifically what they want to manage themselves, and what they don't care about.	2017-11-13 10:49:15 +01:00
Yu Watanabe	c54515b1e4	core/load-fragment: add RemoveIPC= (#7288) PR #3865 introduced RemoveIPC= but the option is not listed in load-fragment-gperf.gperf. So, the option could be used only via D-Bus. This adds RemoveIPC= to load-fragment-gperf.gperf, so the option can now be set in unit files. Fixes #7281.	2017-11-10 10:15:55 +01:00
Yu Watanabe	2bf13bd51e	core/load-fragment: fix alignment	2017-11-08 15:49:22 +09:00
Lennart Poettering	eae51da36e	unit: when JobTimeoutSec= is turned off, implicitly turn off JobRunningTimeoutSec= too We added JobRunningTimeoutSec= late, and Dracut configured only JobTimeoutSec= to turn off root device timeouts before. With this change we'll propagate a reset of JobTimeoutSec= into JobRunningTimeoutSec=, but only if the latter wasn't set explicitly. This should restore compatibility with older systemd versions. Fixes: #6402	2017-10-05 13:06:44 +02:00
Zbigniew Jędrzejewski-Szmek	f9fa32f09c	build-sys: s/HAVE_SMACK/ENABLE_SMACK/ Same justification as for HAVE_UTMP.	2017-10-04 12:09:50 +02:00
Daniel Mack	906c06f64a	cgroup, unit, fragment parser: make use of new firewall functions	2017-09-22 15:24:55 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit|private|shared. inherit: don't do any keyring magic (this is the default in systemd --user); private: a private keyring as before (default in systemd --system); shared: the new setting.	2017-09-15 16:53:35 +02:00
Lennart Poettering	00819cc151	core: add new UnsetEnvironment= setting for unit files With this setting we can explicitly unset specific variables for processes of a unit, as last step of assembling the environment block for them. This is useful to fix #6407. While we are at it, greatly expand the documentation on how the environment block for forked off processes is assembled.	2017-09-14 15:17:40 +02:00
Topi Miettinen	78e864e5b3	seccomp: LockPersonality boolean (#6193) Add LockPersonality boolean to allow locking down personality(2) system call so that the execution domain can't be changed. This may be useful to improve security because odd emulations may be poorly tested and source of vulnerabilities, while system services shouldn't need any weird personalities.	2017-08-29 15:54:50 +02:00
Zbigniew Jędrzejewski-Szmek	4bc5d27b94	Drop busname unit type Since busname units are only useful with kdbus, they weren't actively used. This was dead code, only compile-tested. If busname units are ever added back, it'll be cleaner to start from scratch (possibly reverting parts of this patch).	2017-07-23 09:29:02 -04:00
Yu Watanabe	8ae12e733c	core: fix typo (#6417)	2017-07-21 10:36:39 +02:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384) This introduces {State,Cache,Log,Configuration}Directory=, which are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Lennart Poettering	e758bc9132	Merge pull request #6387 from keszybz/fix-timeout-0 Fix x-systemd.timeout=0 in fstab	2017-07-18 00:04:24 +02:00
Zbigniew Jędrzejewski-Szmek	4a06cbf838	Use config_parse_sec_fix_0() also for JobRunningTimeoutSec `2d79a0bbb9` did that for TimeoutSec=, `89beff89ed` did that for JobTimeoutSec=, and `0004f698df` did that for x-systemd.device-timeout=. But after parsing x-systemd.device-timeout=xxx we write it out as JobRunningTimeoutSec=xxx. Two options: - write out JobRunningTimeoutSec=<a very big number>, - change JobRunningTimeoutSec= to behave like the other options. I think it would be confusing for JobRunningTimeoutSec= to have different syntax than TimeoutSec= and JobTimeoutSec=, so this patch implements the second option. Fixes #6264, https://bugzilla.redhat.com/show_bug.cgi?id=1462378.	2017-07-17 16:03:49 -04:00
Yu Watanabe	53f47dfc7b	core: allow preserving contents of RuntimeDirectory= over process restart This introduces RuntimeDirectoryPreserve= option which takes a boolean argument or 'restart'. Closes #6087.	2017-07-17 16:22:25 +09:00
Zbigniew Jędrzejewski-Szmek	2c75fb7330	core/load-fragment: refuse units with errors in RootDirectory/RootImage/DynamicUser Behaviour of the service is completely different with the option off, so the service would probably mess up state on disk and do unexpected things.	2017-07-11 13:38:13 -04:00
Lennart Poettering	defdbbb6dc	Merge pull request #5926 from fsateler/condition-uid core: add ConditionUID and ConditionGID	2017-05-29 15:18:38 +02:00
Felipe Sateler	c465a29f24	core: add ConditionUser and ConditionGroup This adds two options that are useful for user units. In particular, it is useful to check ConditionUser=!0 to not start for the root user. Closes: #5187	2017-05-26 09:42:44 -04:00
NeilBrown	2d79a0bbb9	Allow TimeoutSec=0 to work as documented in mount units and elsewhere (#6013) Since commit `36c16a7cdd` ("core: rework unit timeout handling, and add new setting RuntimeMaxSec=") TimeoutSec=0 in mount units has caused the mount to time out immediately instead of never as documented. There is a similar problem with Socket.TimeoutSec and Swap.TimeoutSec. These are easily fixed using config_parse_sec_fix_0(). Automount.TimeoutIdleSec looks like it could have the same problem, but doesn't because the kernel treats '0' as 'no timeout'. It handles USEC_INFINITY correctly only because that constant has the value '-1', and when rounded up, it becomes zero. To avoid possible confusion, use config_parse_sec_fix_0() as well, and explicitly handle USEC_INFINITY.	2017-05-23 09:42:26 +02:00
Michal Koutný	a2df3ea4ae	job: add JobRunningTimeoutSec for JOB_RUNNING state Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts running (e.g. it groups all Exec* commands of a service or spans waiting for a device period.) Unit.JobRunningTimeoutSec is intended to be used by default instead of Unit.JobTimeoutSec for device units where such behavior causes less confusion (consider a job for a _netdev mount device, with this change the timeout will start ticking only after the network is ready).	2017-04-25 18:00:29 +02:00
Lennart Poettering	915e6d1676	core: add RootImage= setting for using a specific image file as root directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.	2017-02-07 12:19:42 +01:00
Lennart Poettering	5d997827e2	core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect in conjunction with RootDirectory=, which it makes many times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)	2017-02-07 11:22:05 +01:00
Lennart Poettering	d2d6c096f6	core: add ability to define arbitrary bind mounts for services This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possesses in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439	2016-12-14 00:54:10 +01:00
Lennart Poettering	d107589cd2	core: turn on specifier expansion for more unit file settings Let's permit specifier expansion in a number of additional fields, where arbitrary strings might be passed and where this might be useful one day. (Or at least where there's no clear reason why it wouldn't make sense to have it.)	2016-12-07 18:47:32 +01:00
Djalal Harouni	d6299d613f	core:gperf: pass the exec_context struct directly to parse restrict namespaces RestrictNamespaces= takes yes, no or a list of namespace types, therefore config_parse_restrict_namespaces() is a bit complex and operates on the ExecContext. Fix this by passing the offset of ExecContext directly; otherwise restricting namespaces won't work.	2016-11-15 15:04:43 +01:00
Lennart Poettering	add005357d	core: add new RestrictNamespaces= unit file setting This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namespace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.	2016-11-04 07:40:13 -06:00
Lukas Nykryn	87a47f99bc	failure-action: generalize failure action to emergency action	2016-10-21 15:13:50 +02:00
Luca Bruno	52c239d770	core/exec: add a named-descriptor option ("fd") for streams (#4179) This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.	2016-10-17 20:05:49 -04:00
Djalal Harouni	502d704e5e	core:sandbox: Add ProtectKernelModules= option This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.	2016-10-12 13:31:21 +02:00
Lennart Poettering	59eeb84ba6	core: add two new service settings ProtectKernelTunables= and ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /sys/fs/cgroup.	2016-09-25 10:18:48 +02:00
Lennart Poettering	cf08b48642	core: introduce MemorySwapMax= (#3659) Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls the "memory.swap.max" attribute on the unified cgroup hierarchy.	2016-08-31 12:28:54 +02:00
WaLyong Cho	96e131ea09	core: introduce MemorySwapMax= Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls the "memory.swap.max" attribute on the unified cgroup hierarchy.	2016-08-30 11:11:45 +09:00
Barron Rulon	4f8d40a9dc	mount: add new ForceUnmount= setting for mount units, mapping to umount(8)'s "-f" switch	2016-08-27 10:46:52 -04:00
brulon	e520950a03	mount: add new LazyUnmount= setting for mount units, mapping to umount(8)'s "-l" switch (#3827)	2016-08-26 17:57:22 +02:00
Zbigniew Jędrzejewski-Szmek	5f9a610ad2	Merge pull request #3905 from htejun/cgroup-v2-cpu core: add cgroup CPU controller support on the unified hierarchy (zj: merging not squashing to make it clear against which upstream this patch was developed.)	2016-08-14 18:03:35 -04:00
Tejun Heo	66ebf6c0a1	core: add cgroup CPU controller support on the unified hierarchy Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.	2016-08-07 09:45:39 -04:00
Lennart Poettering	d251207d55	core: add new PrivateUsers= option to service execution This setting adds minimal user namespacing support to a service. When set the invoked processes will run in their own user namespace. Only a trivial mapping will be set up: the root user/group is mapped to root, and the user/group of the service will be mapped to itself, everything else is mapped to nobody. If this setting is used the service runs with no capabilities on the host, but configurable capabilities within the service. This setting is particularly useful in conjunction with RootDirectory= as the need to synchronize /etc/passwd and /etc/group between the host and the service OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the user of the service itself. But even outside the RootDirectory= case this setting is useful to substantially reduce the attack surface of a service. Example command to test this: systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh This runs a shell as user "foobar". When typing "ps" only processes owned by "root", by "foobar", and by "nobody" should be visible.	2016-08-03 20:42:04 +02:00
Susant Sahani	9d56542764	socket: add support to control no. of connections from one source (#3607) Introduce MaxConnectionsPerSource=, which is the number of concurrent connections allowed per IP. RFE: 1939	2016-08-02 13:48:23 -04:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	66dccd8d85	core: be stricter when parsing User=/Group= fields Let's verify the validity of the syntax of the user/group names set.	2016-07-22 15:53:45 +02:00
Alessandro Puccetti	2a624c36e6	doc,core: Read{Write,Only}Paths= and InaccessiblePaths= This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=; the previous names are kept as aliases but are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths`, `read_only_dirs` --> `read_only_paths`, `inaccessible_dirs` --> `inaccessible_paths`	2016-07-19 17:22:02 +02:00
Lennart Poettering	f4170c671b	execute: add a new easy-to-use RestrictRealtime= option to units It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and SCHED_DEADLINE is blocked, since these scheduling policies may be used to lock up the system.	2016-06-23 01:45:45 +02:00
Topi Miettinen	f3e4363593	core: Restrict mmap and mprotect with PAGE_WRITE|PAGE_EXEC (#3319) (#3379) New exec boolean MemoryDenyWriteExecute, when set, installs a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC and mprotect(2) with PAGE_EXEC.	2016-06-03 17:58:18 +02:00
Tejun Heo	da4d897e75	core: add cgroup memory controller support on the unified hierarchy (#3315) On the unified hierarchy, the memory controller implements three control knobs, low, high and max, which enable more usable and versatile control over memory usage. This patch implements support for the three control knobs. * MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and memory.max, respectively. * As all absolute limits on the unified hierarchy use "max" for no limit, make memory limit parse functions accept "max" in addition to "infinity" and document "max" for the new knobs. * Implement compatibility translation between MemoryMax and MemoryLimit. v2: - Fixed missing else's in config_parse_memory_limit(). - Fixed missing newline when writing out drop-ins. - Coding style updates to use "val > 0" instead of "val". - Minor updates to documentation.	2016-05-27 18:10:18 +02:00
Tejun Heo	ac06a0cf8a	core: add support for IOReadIOPSMax and IOWriteIOPSMax cgroup IO controller supports maximum limits for both bandwidth and IOPS but systemd resource control currently only supports bandwidth limits. This patch adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy is in use. It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy hierarchies but IO control on legacy hierarchies is half-broken anyway, so let's leave it alone for now.	2016-05-18 13:50:56 -07:00
Tejun Heo	13c31542cc	core: add io controller support on the unified hierarchy On the unified hierarchy, blkio controller is renamed to io and the interface is changed significantly. * blkio.weight and blkio.weight_device are consolidated into io.weight which uses the standardized weight range [1, 10000] with 100 as the default value. * blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max. Expansion of throttling features is being worked on to support work-conserving absolute limits (io.low and io.high). * All stats are consolidated into io.stats. This patchset adds support for the new interface. As the interface has been revamped and new features are expected to be added, it seems best to treat it as a separate controller rather than trying to expand the blkio settings although we might add automatic translation if only blkio settings are specified. * io.weight handling is mostly identical to blkio.weight[_device] handling except that the weight range is different. * Both read and write bandwidth settings are consolidated into CGroupIODeviceLimit which describes all limits applicable to the device. This makes it less painful to add new limits. * "max" can be used to specify the maximum limit which is equivalent to no config for max limits and treated as such. If a given CGroupIODeviceLimit doesn't contain any non-default configs, the config struct is discarded once the no limit config is applied to cgroup. * lookup_blkio_device() is renamed to lookup_block_device(). Signed-off-by: Tejun Heo <htejun@fb.com>	2016-05-05 16:43:06 -04:00
Lennart Poettering	f0367da7d1	core: rename StartLimitInterval= to StartLimitIntervalSec= We generally follow the rule that for time settings we suffix the setting name with "Sec" to indicate the default unit if none is specified. The only exception was the rate limiting interval settings. Fix this, and keep the old names for compatibility. Do the same for journald's RateLimitInterval= setting	2016-04-29 16:27:48 +02:00
Lennart Poettering	7629ec4642	core: move start ratelimiting check after condition checks With #2564 unit start rate limiting was moved from after the condition checks to before them, in an attempt to fix #2467. This however resulted in #2684. Since then, a previous commit has added a concept of per-socket-unit trigger rate limiting to fix #2467 more comprehensively, hence the start limit can be moved after the condition checks again, thus fixing #2684. Fixes: #2684	2016-04-29 16:27:48 +02:00
Lennart Poettering	8b26cdbd2a	core: introduce activation rate limiting for socket units This adds two new settings TriggerLimitIntervalSec= and TriggerLimitBurst= that define a rate limit for activation of socket units. When the limit is hit, the socket is put into a failure mode. This is an alternative fix for #2467, since the original fix resulted in issue #2684. In a later commit the StartLimitInterval=/StartLimitBurst= rate limiter will be changed to be applied after any start condition checks are made. This way, two separate rate limiters are enforced: one at triggering time, before any jobs are queued (added with this patch), and the start limit, which is moved back to run immediately before the unit is activated. Condition checks are done in between the two, and thus no longer affect the start limit.	2016-04-29 16:27:48 +02:00
Lennart Poettering	479050b363	core: drop Capabilities= setting The setting is hardly useful (since its effect is generally reduced to zero due to file system caps), and with the advent of ambient caps an actually useful replacement exists, hence let's get rid of this. I am pretty sure this was unused and our man page already recommended against its use, hence this should be a safe thing to remove.	2016-02-13 11:59:34 +01:00
Daniel Mack	9ca6ff50ab	Remove kdbus custom endpoint support This feature will not be used anytime soon, so remove a bit of cruft. The BusPolicy= config directive will stay around as compat noop.	2016-02-11 22:12:04 +01:00
Lennart Poettering	926db6521b	Merge pull request #2574 from zonque/netclass-remove cgroup: remove support for NetClass= directive	2016-02-10 17:03:00 +01:00
Daniel Mack	50f48ad37a	cgroup: remove support for NetClass= directive Support for net_cls.class_id through the NetClass= configuration directive was added in v227 in preparation for a per-unit packet filter mechanism. However, it turns out the kernel people have decided to deprecate the net_cls and net_prio controllers in v2. Tejun provides a comprehensive justification for this in his commit, which landed during the merge window for kernel v4.5: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671 As we're aiming for full support for the v2 cgroup hierarchy, we can no longer support this feature. Userspace tools such as nftables are moving over to setting rules that are specific to the full cgroup path of a task, which obsoletes these controllers anyway. This commit removes support for tweaking details in the net_cls controller, but keeps the NetClass= directive around for legacy compatibility reasons.	2016-02-10 16:38:56 +01:00
Lennart Poettering	89beff89ed	core: treat JobTimeout=0 as equivalent to JobTimeout=infinity Corrects an incompatibility introduced with `36c16a7cdd`. Fixes: #2537	2016-02-10 16:09:24 +01:00
Lennart Poettering	aad41f0814	core: simplify how we parse TimeoutSec=, TimeoutStartSec= and TimeoutStopSec= Let's make things more obvious by placing the parse_usec() invocation directly in config_parse_service_timeout().	2016-02-10 16:09:24 +01:00
Lennart Poettering	6bf0f408e4	core: make the StartLimitXYZ= settings generic and apply to any kind of unit, not just services This moves the StartLimitBurst=, StartLimitInterval=, StartLimitAction=, RebootArgument= from the [Service] section into the [Unit] section of unit files, and thus supports them in all unit types, not just in services. This way we can enforce the start limit much earlier, in particular before testing the unit conditions, so that repeated start-up failure due to failed conditions is also considered for the start limit logic. For compatibility, the four options may still be configured in the [Service] section, but we only document them in their new section [Unit]. This also renames the socket unit failure code "service-failed-permanent" to "service-start-limit-hit" to express more clearly what it is about; after all, it's only triggered by the start limit being hit. Finally, the code in busname_trigger_notify() and socket_trigger_notify() is altered to become more alike. Fixes: #2467	2016-02-10 13:26:56 +01:00
Lennart Poettering	36c16a7cdd	core: rework unit timeout handling, and add new setting RuntimeMaxSec= This cleans up timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as indication for a disabled timeout, use USEC_INFINITY which is in line with how we do this in the rest of our code (following the logic that 0 means "no", and USEC_INFINITY means "never"). This also replaces all usec_t additions with invocations of usec_add(), so that USEC_INFINITY is properly propagated, and sd-event considers it as an indication for turning off the event source. This also alters the deserialization of the units to restart timeouts from the time they were originally started from. Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to artificially prolonged timeouts if a daemon reload took place. Finally, a new RuntimeMaxSec= setting is introduced for service units, which specifies a maximum runtime after which a specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs. This also simplifies the various xyz_spawn() calls of the various types in that explicit destruction of the timers is removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn() calls fail. Fixes: #2249	2016-02-01 22:18:16 +01:00
Lennart Poettering	d0a7c5f692	core: move parsing of rlimits into rlimit-util.[ch] This way we can reuse it for parsing rlimit settings in "systemctl set-property" and related commands.	2016-02-01 22:18:16 +01:00
Ismo Puustinen	755d4b67a4	capabilities: added support for ambient capabilities. This patch adds support for ambient capabilities in service files. The idea with ambient capabilities is that the exec'd processes can run as a non-root user and get some inherited capabilities, without any need to add the capabilities to the executable file. You need at least Linux 4.3 to use ambient capabilities. SecureBit keep-caps is automatically added when you use ambient capabilities and wish to change the user. An example system service file might look like this: [Unit] Description=Service for testing caps [Service] ExecStart=/usr/bin/sleep 10000 User=nobody AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW After starting the service it has these capabilities: CapInh: 0000000000003000 CapPrm: 0000000000003000 CapEff: 0000000000003000 CapBnd: 0000003fffffffff CapAmb: 0000000000003000	2016-01-12 12:14:50 +02:00
Ismo Puustinen	a103496ca5	capabilities: keep bounding set in non-inverted format. Change the capability bounding set parser and logic so that the bounding set is kept as a positive set internally. This means that the set reflects those capabilities that we want to keep instead of drop.	2016-01-12 12:14:50 +02:00
Zbigniew Jędrzejewski-Szmek	6f5d79986a	core: rename Random* to RandomizedDelay* The name RandomSec is too generic: "Sec" just specifies the default unit type, and "Random" by itself is not enough. Rename to something that should give the user a general idea of what the setting does without looking at the documentation.	2015-11-26 16:32:41 -05:00
Lennart Poettering	744c769375	core: add new RandomSec= setting for time units This allows configuration of a random time on top of the elapse events, in order to spread time events in a network evenly across a range.	2015-11-18 17:07:11 +01:00
Lennart Poettering	edf1e71381	Merge pull request #1889 from ssahani/socket-proto socket: Add support for socket protocol	2015-11-18 11:30:06 +01:00
Susant Sahani	74bb646ee5	socket: Add support for socket protocol Until now, socket protocols such as sctp and udplite were not supported. This patch adds a new config parameter SocketProtocol= (udplite/sctp). With this, the protocol can now be configured as udplite = IPPROTO_UDPLITE, sctp = IPPROTO_SCTP. Tested with nspawn:	2015-11-18 09:34:18 +05:30
Lennart Poettering	3e0c30ac56	core: add RemainAfterElapse= setting to timer units Previously, after a timer unit elapsed we'd leave it around for good, which has the nice benefit that starting a timer that shall trigger at a specific point in time multiple times will only result in one trigger instead of possibly many. With this change a new option RemainAfterElapse= is added. It defaults to "true", to mimic the old behaviour. If set to "false" timer units will be unloaded after they elapsed. This is specifically useful for transient timer units.	2015-11-17 20:48:23 +01:00
Lennart Poettering	0af20ea2ee	core: add new DefaultTasksMax= setting for system.conf This allows initializing the TasksMax= setting of all units by default to some fixed value, instead of leaving it at infinity as before.	2015-11-13 19:50:52 +01:00
Lennart Poettering	f32b43bda4	core: remove support for RequiresOverridable= and RequisiteOverridable= As discussed at systemd.conf 2015 and also raised on the ML: http://lists.freedesktop.org/archives/systemd-devel/2015-November/034880.html This removes the two XyzOverridable= unit dependencies, which were basically never used and do not enhance user experience in any way. Most folks looking for the functionality this provides probably opt for the "ignore-dependencies" job mode, and that's probably a good idea. Hence, let's simplify systemd's dependency engine and remove these two dependency types (and their inverses). The unit file parser and the dbus property parser will now redirect the settings/properties to result in an equivalent non-overridable dependency. In the case of the unit file parser we generate a warning, to inform the user. The dbus properties for this unit type stay available on the unit objects, but they are now hidden from usual introspection and will always return the empty list when queried. This should provide enough compatibility for the few unit files that actually ever made use of this.	2015-11-12 19:27:24 +01:00
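The capability sets quoted in the ambient-capabilities commit (755d4b67a4) are hexadecimal bitmasks, as reported in /proc/<pid>/status, where bit N is set when capability number N is present. The sketch below is a rough illustration that assumes bash and the capability numbers from linux/capability.h (CAP_NET_ADMIN = 12, CAP_NET_RAW = 13). The helper decode_caps is hypothetical and is not part of systemd. It lists the set bits, which shows why CapAmb: 0000000000003000 matches AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bit numbers set in a hex capability mask
# such as the CapAmb/CapBnd fields of /proc/<pid>/status.
decode_caps() {
  local mask=$((16#$1)) bit out=()
  for ((bit = 0; bit < 64; bit++)); do
    # Bit N set means capability number N is in the set.
    if (( (mask >> bit) & 1 )); then
      out+=("$bit")
    fi
  done
  echo "${out[*]}"
}

# 0x3000 = 2^12 + 2^13, i.e. CAP_NET_ADMIN (12) and CAP_NET_RAW (13).
decode_caps 0000000000003000   # prints: 12 13
```

Applied to the CapBnd value 0000003fffffffff from the same commit, the helper prints the bit numbers 0 through 37. That means the bounding set was left unrestricted up to the highest capability bit set in that mask.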


279 commits