Systemd

Author	SHA1	Message	Date
Lennart Poettering	0133d5553a	Merge pull request #7198 from poettering/stdin-stdout Add StandardInput=data, StandardInput=file:... and more	2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	2038c3f584	core: add support for StandardInputFile= and friends These new settings permit specifiying arbitrary paths as stdin/stdout/stderr locations. We try to open/create them as necessary. Some special magic is applied: 1) if the same path is specified for both input and output/stderr, we'll open it only once O_RDWR, and duplicate them fd instead. 2) If we an AF_UNIX socket path is specified, we'll connect() to it, rather than open() it. This allows invoking systemd services with stdin/stdout/stderr connected to arbitrary foreign service sockets. Fixes: #3991	2017-11-17 11:13:44 +01:00
Lennart Poettering	08f3be7a38	core: add two new unit file settings: StandardInputData= + StandardInputText= Both permit configuring data to pass through STDIN to an invoked process. StandardInputText= accepts a line of text (possibly with embedded C-style escapes as well as unit specifiers), which is appended to the buffer to pass as stdin, followed by a single newline. StandardInputData= is similar, but accepts arbitrary base64 encoded data, and will not resolve specifiers or C-style escapes, nor append newlines. This may be used to pass input/configuration data to services, directly in-line from unit files, either in a cooked or in a more raw format.	2017-11-17 11:13:44 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one exisiting unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Yu Watanabe	8cfa775f4f	core: add support to specify errno in SystemCallFilter= This makes each system call in SystemCallFilter= blacklist optionally takes errno name or number after a colon. The errno takes precedence over the one given by SystemCallErrorNumber=. C.f. #7173. Closes #7169.	2017-11-11 21:54:12 +09:00
Lennart Poettering	72fd17682d	core: usually our enum's _INVALID and _MAX special values are named after the full type In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter), do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.	2017-10-02 17:41:43 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit\|private\|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00
Lennart Poettering	00819cc151	core: add new UnsetEnvironment= setting for unit files With this setting we can explicitly unset specific variables for processes of a unit, as last step of assembling the environment block for them. This is useful to fix #6407. While we are at it, greatly expand the documentation on how the environment block for forked off processes is assembled.	2017-09-14 15:17:40 +02:00
Topi Miettinen	78e864e5b3	seccomp: LockPersonality boolean (#6193 ) Add LockPersonality boolean to allow locking down personality(2) system call so that the execution domain can't be changed. This may be useful to improve security because odd emulations may be poorly tested and source of vulnerabilities, while system services shouldn't need any weird personalities.	2017-08-29 15:54:50 +02:00
Lennart Poettering	165a31c0db	core: add two new special ExecStart= character prefixes This patch adds two new special character prefixes to ExecStart= and friends, in addition to the existing "-", "@" and "+": "!" → much like "+", except with a much reduced effect as it only disables the actual setresuid()/setresgid()/setgroups() calls, but leaves all other security features on, including namespace options. This is very useful in combination with RuntimeDirectory= or DynamicUser= and similar option, as a user is still allocated and used for the runtime directory, but the actual UID/GID dropping is left to the daemon process itself. This should make RuntimeDirectory= a lot more useful for daemons which insist on doing their own privilege dropping. "!!" → Similar to "!", but on systems supporting ambient caps this becomes a NOP. This makes it relatively straightforward to write unit files that make use of ambient capabilities to let systemd drop all privs while retaining compatibility with systems that lack ambient caps, where priv dropping is the left to the daemon codes themselves. This is an alternative approach to #6564 and related PRs.	2017-08-10 15:04:32 +02:00
Lennart Poettering	1703fa41a7	core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING "Permissions" was a bit of a misnomer, as it suggests that UNIX file permission bits are adjusted, which aren't really changed here. Instead, this is about UNIX credentials such as users or groups, as well as namespacing, hence let's use a more generic term here, without any misleading reference to UNIX file permissions: "sandboxing", which shall refer to all kinds of sandboxing technologies, including UID/GID dropping, selinux relabelling, namespacing, seccomp, and so on.	2017-08-10 15:02:50 +02:00
Lennart Poettering	584b8688d1	execute: also fold the cgroup delegate bit into ExecFlags	2017-08-10 15:02:50 +02:00
Lennart Poettering	ac6479781e	execute: also control the SYSTEMD_NSS_BYPASS_BUS through an ExecFlags field Also, correct the logic while we are at it: the variable is only required for system services, not user services.	2017-08-10 15:02:49 +02:00
Lennart Poettering	8679efde21	execute: add one more ExecFlags flag, for controlling unconditional directory chowning Let's decouple the Manager object from the execution logic a bit more here too, and simply pass along the fact whether we should unconditionally chown the runtime/... directories via the ExecFlags field too.	2017-08-10 14:44:58 +02:00
Lennart Poettering	af635cf377	execute: let's decouple execute.c a bit from the unit logic Let's try to decouple the execution engine a bit from the Unit/Manager concept, and hence pass one more flag as part of the ExecParameters flags field.	2017-08-10 14:44:58 +02:00
Lennart Poettering	3ed0cd26ea	execute: replace command flag bools by a flags field This way, we can extend it later on in an easier way, and can pass it along nicely.	2017-08-10 14:44:58 +02:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384 ) This introduces {State,Cache,Log,Configuration}Directory= those are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Yu Watanabe	53f47dfc7b	core: allow preserving contents of RuntimeDirectory= over process restart This introduces RuntimeDirectoryPreserve= option which takes a boolean argument or 'restart'. Closes #6087.	2017-07-17 16:22:25 +09:00
Lennart Poettering	7f452159b8	core: make IOSchedulingClass= and IOSchedulingPriority= settable for transient units This patch is a bit more complex thant I hoped. In particular the single IOScheduling= property exposed on the bus is split up into IOSchedulingClass= and IOSchedulingPriority= (though compat is retained). Otherwise the asymmetry between setting props and getting them is a bit too nasty. Fixes #5613	2017-06-26 17:43:18 +02:00
Franck Bui	4c47affcf1	core: remove the redundancy of 'n_fds' and 'n_storage_fds' in ExecParameters struct 'n_fds' field in the ExecParameters structure was counting the total number of file descriptors to be passed to a unit. This counter also includes the number of passed socket fds which is counted by 'n_socket_fds' already. This patch removes that redundancy by replacing 'n_fds' with 'n_storage_fds'. The new field only counts the fds passed via the storage store mechanism. That way each fd is counted at one place only. Subsequently the patch makes sure to fix code that used 'n_fds' and also wanted to iterate through all of them by explicitly adding 'n_socket_fds' + 'n_storage_fds'. Suggested by Lennart.	2017-06-08 16:21:35 +02:00
Franck Bui	9b1419111a	core: only apply NonBlocking= to fds passed via socket activation Make sure to only apply the O_NONBLOCK flag to the fds passed via socket activation. Previously the flag was also applied to the fds which came from the fd store but this was incorrect since services, after being restarted, expect that these passed fds have their flags unchanged and can be reused as before. The documentation was a bit unclear about this so clarify it.	2017-06-06 22:42:50 +02:00
Lennart Poettering	915e6d1676	core: add RootImage= setting for using a specific image file as root directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.	2017-02-07 12:19:42 +01:00
Lennart Poettering	5d997827e2	core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect on RootDirectory=, which it makes a ton times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)	2017-02-07 11:22:05 +01:00
Zbigniew Jędrzejewski-Szmek	4014818d53	Merge pull request #4806 from poettering/keyring-init set up a per-service session kernel keyring, and store the invocation ID in it	2016-12-13 23:24:42 -05:00
Lennart Poettering	d2d6c096f6	core: add ability to define arbitrary bind mounts for services This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possess in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439	2016-12-14 00:54:10 +01:00
Lennart Poettering	74dd6b515f	core: run each system service with a fresh session keyring This patch ensures that each system service gets its own session kernel keyring automatically, and implicitly. Without this a keyring is allocated for it on-demand, but is then linked with the user's kernel keyring, which is OK behaviour for logged in users, but not so much for system services. With this change each service gets a session keyring that is specific to the service and ceases to exist when the service is shut down. The session keyring is not linked up with the user keyring and keys hence only search within the session boundaries by default. (This is useful in a later commit to store per-service material in the keyring, for example the invocation ID) (With input from David Howells)	2016-12-13 20:59:10 +01:00
Lennart Poettering	2e6dbc0fcd	Merge pull request #4538 from fbuihuu/confirm-spawn-fixes Confirm spawn fixes/enhancements	2016-11-18 11:08:06 +01:00
Franck Bui	7d5ceb6416	core: allow to redirect confirmation messages to a different console It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.	2016-11-17 18:16:16 +01:00
Djalal Harouni	c92e8afebd	core: improve the logic that implies no new privileges The no_new_privileged_set variable is not used any more since commit `9b232d3241` that fixed another thing. So remove it. Also no need to check if we are under user manager, remove that part too.	2016-11-15 15:04:31 +01:00
Lennart Poettering	add005357d	core: add new RestrictNamespaces= unit file setting This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.	2016-11-04 07:40:13 -06:00
Luca Bruno	52c239d770	core/exec: add a named-descriptor option ("fd") for streams (#4179 ) This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.	2016-10-17 20:05:49 -04:00
Djalal Harouni	502d704e5e	core:sandbox: Add ProtectKernelModules= option This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.	2016-10-12 13:31:21 +02:00
Lennart Poettering	59eeb84ba6	core: add two new service settings ProtectKernelTunables= and ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.	2016-09-25 10:18:48 +02:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Zbigniew Jędrzejewski-Szmek	d87a2ef782	Merge pull request #3884 from poettering/private-users	2016-08-06 17:04:45 -04:00
Lennart Poettering	b08af3b127	core: only set the watchdog variables in ExecStart= lines	2016-08-04 23:08:05 +02:00
Lennart Poettering	136dc4c435	core: set $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS in ExecStop=/ExecStopPost= commands This should simplify monitoring tools for services, by passing the most basic information about service result/exit information via environment variables, thus making it unnecessary to retrieve them explicitly via the bus.	2016-08-04 23:08:05 +02:00
Lennart Poettering	9c1a61adba	core: move masking of chroot/permission masking into service_spawn() Let's fix up the flags fields in service_spawn() rather than its callers, in order to simplify things a bit.	2016-08-04 16:27:07 +02:00
Lennart Poettering	c39f1ce24d	core: turn various execution flags into a proper flags parameter The ExecParameters structure contains a number of bit-flags, that were so far exposed as bool:1, change this to a proper, single binary bit flag field. This makes things a bit more expressive, and is helpful as we add more flags, since these booleans are passed around in various callers, for example service_spawn(), whose signature can be made much shorter now. Not all bit booleans from ExecParameters are moved into the flags field for now, but this can be added later.	2016-08-04 16:27:07 +02:00
Lennart Poettering	d251207d55	core: add new PrivateUsers= option to service execution This setting adds minimal user namespacing support to a service. When set the invoked processes will run in their own user namespace. Only a trivial mapping will be set up: the root user/group is mapped to root, and the user/group of the service will be mapped to itself, everything else is mapped to nobody. If this setting is used the service runs with no capabilities on the host, but configurable capabilities within the service. This setting is particularly useful in conjunction with RootDirectory= as the need to synchronize /etc/passwd and /etc/group between the host and the service OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the user of the service itself. But even outside the RootDirectory= case this setting is useful to substantially reduce the attack surface of a service. Example command to test this: systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh This runs a shell as user "foobar". When typing "ps" only processes owned by "root", by "foobar", and by "nobody" should be visible.	2016-08-03 20:42:04 +02:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	9ce9347880	core: normalize header inclusion in execute.h a bit We don't actually need any functionality from cgroup.h in execute.h, hence don't include that. However, we do need the Unit structure from unit.h, hence include that, and move it as late as possible, since it needs the definitions from execute.h.	2016-07-20 14:53:15 +02:00
Alessandro Puccetti	2a624c36e6	doc,core: Read{Write,Only}Paths= and InaccessiblePaths= This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept as aliases but they are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths` `read_only_dirs` --> `read_only_paths` `inaccessible_dirs` --> `inaccessible_paths`	2016-07-19 17:22:02 +02:00
Torstein Husebø	61233823aa	treewide: fix typos and remove accidental repetition of words	2016-07-11 16:18:43 +02:00
Lennart Poettering	f4170c671b	execute: add a new easy-to-use RestrictRealtime= option to units It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and SCHED_DEADLINE is blocked, which my be used to lock up the system.	2016-06-23 01:45:45 +02:00
Alessandro Puccetti	cf677fe686	core/execute: add the magic character '!' to allow privileged execution (#3493 ) This patch implements the new magic character '!'. By putting '!' in front of a command, systemd executes it with full privileges ignoring paramters such as User, Group, SupplementaryGroups, CapabilityBoundingSet, AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext, AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies. Fixes partially https://github.com/systemd/systemd/issues/3414 Related to https://github.com/coreos/rkt/issues/2482 Testing: 1. Create a user 'bob' 2. Create the unit file /etc/systemd/system/exec-perm.service (You can use the example below) 3. sudo systemctl start ext-perm.service 4. Verify that the commands starting with '!' were not executed as bob, 4.1 Looking to the output of ls -l /tmp/exec-perm 4.2 Each file contains the result of the id command. ````````````````````````````````````````````````````````````````` [Unit] Description=ext-perm [Service] Type=oneshot TimeoutStartSec=0 User=bob ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ; /usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre" ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ; !/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2" ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post" ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload" ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop" ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post" [Install] WantedBy=multi-user.target] `````````````````````````````````````````````````````````````````	2016-06-10 18:19:54 +02:00
Topi Miettinen	f3e4363593	core: Restrict mmap and mprotect with PAGE_WRITE\|PAGE_EXEC (#3319 ) (#3379 ) New exec boolean MemoryDenyWriteExecute, when set, installs a seccomp filter to reject mmap(2) with PAGE_WRITE\|PAGE_EXEC and mprotect(2) with PAGE_EXEC.	2016-06-03 17:58:18 +02:00
Lennart Poettering	479050b363	core: drop Capabilities= setting The setting is hardly useful (since its effect is generally reduced to zero due to file system caps), and with the advent of ambient caps an actually useful replacement exists, hence let's get rid of this. I am pretty sure this was unused and our man page already recommended against its use, hence this should be a safe thing to remove.	2016-02-13 11:59:34 +01:00
Daniel Mack	9ca6ff50ab	Remove kdbus custom endpoint support This feature will not be used anytime soon, so remove a bit of cruft. The BusPolicy= config directive will stay around as compat noop.	2016-02-11 22:12:04 +01:00

1 2 3

111 commits