Systemd

Commit Graph

Author	SHA1	Message	Date
Lennart Poettering	065b47749d	tree-wide: use ERRNO_IS_PRIVILEGE() wherever appropriate	2020-09-22 16:25:22 +02:00
Topi Miettinen	9df2cdd8ec	exec: SystemCallLog= directive With new directive SystemCallLog= it's possible to list system calls to be logged. This can be used for auditing or temporarily when constructing system call filters. --- v5: drop intermediary, update HASHMAP_FOREACH_KEY() use v4: skip useless debug messages, actually parse directive v3: don't declare unused variables with old libseccomp v2: fix build without seccomp or old libseccomp	2020-09-15 12:54:17 +03:00
Topi Miettinen	005bfaf118	exec: Add kill action to system call filters Define explicit action "kill" for SystemCallErrorNumber=. In addition to errno code, allow specifying "kill" as action for SystemCallFilter=. --- v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes, init syscall_errno v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit parsing without seccomp v4: fix build without seccomp v3: drop log action v2: action -> number	2020-09-15 12:54:17 +03:00
Frantisek Sumsal	69e3234db7	tree-wide: fix typos found by codespell Reported by Fossies.org	2020-09-14 15:32:37 +02:00
Tobias Kaufmann	198dc17845	core: fix set keep caps for ambient capabilities The securebit keep-caps retains the capabilities in the permitted set over an UID change (ambient capabilities are cleared though). Setting the keep-caps securebit after the uid change and before execve doesn't make sense as it is cleared during execve and there is no additional user ID change after this point. Although the documentation (man 7 capabilities) is ambiguous, keep-caps is reset during execve although keep-caps-locked is set. After execve only keep-caps-locked is set and keep-caps is cleared.	2020-09-09 11:17:42 +02:00
Tobias Kaufmann	16fcb1918a	core: fix comments on ambient capabilities The comments on the code for ambient capabilities was wrong/outdated.	2020-09-09 11:17:42 +02:00
Lennart Poettering	f3f4abad29	Merge pull request #16979 from keszybz/return-log-debug Fix 'return log_error()' and 'return log_warning()' patterns	2020-09-08 19:54:38 +02:00
Lennart Poettering	c6552f7cd5	Merge pull request #16955 from keszybz/test-execute-cleanup One patch for test-execute and assorted cleanups	2020-09-08 18:33:12 +02:00
Zbigniew Jędrzejewski-Szmek	c413bb28df	tree-wide: correct cases where return log_{error,warning} is used without value In various cases, we would say 'return log_warning()' or 'return log_error()'. Those functions return 0 if no error is passed in. For log_warning or log_error this doesn't make sense, and we generally want to propagate the error. In the few cases where the error should be ignored, I think it's better to split it in two, and call 'return 0' on a separate line.	2020-09-08 17:40:46 +02:00
Zbigniew Jędrzejewski-Szmek	90e74a66e6	tree-wide: define iterator inside of the macro	2020-09-08 12:14:05 +02:00
Zbigniew Jędrzejewski-Szmek	5b10116e49	core/{execute, manager}: reduce scope of iterator variables a bit	2020-09-04 18:14:26 +02:00
Lennart Poettering	7cc60ea414	Merge pull request #16821 from cgzones/selinux_status selinux: use SELinux status page	2020-09-03 14:55:08 +02:00
Tobias Kaufmann	dbdc4098f6	core: fix securebits setting Desired functionality: Set securebits for services started as non-root user. Failure: The starting of the service fails if no ambient capability shall be raised. ... systemd[217941]: ...: Failed to set process secure bits: Operation not permitted ... systemd[217941]: ...: Failed at step SECUREBITS spawning /usr/bin/abc.service: Operation not permitted ... systemd[1]: abc.service: Failed with result 'exit-code'. Reason: For setting securebits the capability CAP_SETPCAP is required. However the securebits (if no ambient capability shall be raised) are set after setresuid. When setresuid is invoked all capabilities are dropped from the permitted, effective and ambient capability set. If the securebit SECBIT_KEEP_CAPS is set the permitted capability set is retained, but the effective and the ambient set are cleared. If ambient capabilities shall be set, the securebit SECBIT_KEEP_CAPS is added to the securebits configured in the service file and set together with the securebits from the service file before setresuid is executed (in enforce_user). Before setresuid is executed the capabilities are the same as for pid1. This means that all capabilities in the effective, permitted and bounding set are set. Thus the capability CAP_SETPCAP is in the effective set and the prctl(PR_SET_SECUREBITS, ...) succeeds. However, if the secure bits aren't set before setresuid is invoked they shall be set shortly after the uid change in enforce_user. This fails as SECBIT_KEEP_CAPS wasn't set before setresuid and in consequence the effective and permitted set was cleared, hence CAP_SETPCAP is not set in the effective set (and cannot be raised any longer) and prctl(PR_SET_SECUREBITS, ...) fails with EPERM. Proposed solution: The proposed solution consists of three parts 1. Check in enforce_user, if securebits are configured in the service file. If securebits are configured, set SECBIT_KEEP_CAPS before invoking setresuid. 2. Don't set any other securebits than SECBIT_KEEP_CAPS in enforce_user, but set all requested ones after enforce_user. This has the advantage that securebits are set at the same place for root and non-root services. 3. Raise CAP_SETPCAP to the effective set (if not already set) before setting the securebits to avoid EPERM during the prctl syscall. For gaining CAP_SETPCAP the function capability_bounding_set_drop is split into two functions: - The first one raises CAP_SETPCAP (required for dropping bounding capabilities) - The second drops the bounding capabilities Why are ambient capabilities not affected by this change? Ambient capabilities get cleared during setresuid, no matter if SECBIT_KEEP_CAPS is set or not. For raising ambient capabilities for a user different to root, the requested capability has to be raised in the inheritable set first. Then the SECBIT_KEEP_CAPS securebit needs to be set before setresuid is invoked. Afterwards the ambient capability can be raised, because it is in the inheritable and permitted set. Security considerations: Although the manpage is ambiguous SECBIT_KEEP_CAPS is cleared during execve no matter if SECBIT_KEEP_CAPS_LOCKED is set or not. If both are set only SECBIT_KEEP_CAPS_LOCKED is set after execve. Setting SECBIT_KEEP_CAPS in enforce_user for being able to set securebits is no security risk, as the effective and permitted set are set to the value of the ambient set during execve (if the executed file has no file capabilities. For details check man 7 capabilities). Remark: In capability-util.c is a comment complaining about the missing capability CAP_SETPCAP in the effective set, after the kernel executed /sbin/init. Thus it is checked there if this capability has to be raised in the effective set before dropping capabilities from the bounding set. If this were true all the time, ambient capabilities couldn't be set without dropping at least one capability from the bounding set, as the capability CAP_SETPCAP would miss and setting SECBIT_KEEP_CAPS would fail with EPERM.	2020-09-01 10:53:26 +02:00
Lennart Poettering	b519529104	Merge pull request #16841 from keszybz/acl-util-bitmask Use a bitmask in fd_add_uid_acl_permission()	2020-08-31 16:45:13 +02:00
Yu Watanabe	8062e643e6	core: clear bind mounts on error Follow-up for `bbb4e7f39f`. Fixes CID#1431998.	2020-08-27 18:20:34 +09:00
Christian Göttsche	2df2152c20	selinux: fork label-aware children with up-to-date label database The parent process may not perform any label operation, so the database might not get updated on a SELinux policy change on its own. Reload the label database once on a policy change, instead of n times in every started child.	2020-08-27 10:28:53 +02:00
Zbigniew Jędrzejewski-Szmek	567aeb5801	shared/acl-util: convert rd,wr,ex to a bitmask I find this version much more readable. Add replacement defines so that when acl/libacl.h is not available, the ACL_{READ,WRITE,EXECUTE} constants are also defined. Those constants were declared in the kernel headers already in 1da177e4c3f41524e886b7f1b8a0c1f, so they should be the same pretty much everywhere.	2020-08-27 10:20:12 +02:00
PhoenixDiscord	e8607daf7d	Replace gendered pronouns with gender neutral ones. (#16844 )	2020-08-27 11:52:48 +09:00
Lennart Poettering	bbb4e7f39f	core: hide /run/credentials whenever namespacing is requested Ideally we would like to hide all other service's credentials for all services. That would imply for us to enable mount namespacing for all services, which is something we cannot do, both due to compatibility with the status quo ante, and because a number of services legitimately should be able to install mounts in the host hierarchy. Hence we do the second best thing, we hide the credentials automatically for all services that opt into mount namespacing otherwise. This is quite different from other mount sandboxing options: usually you have to explicitly opt into each. However, given that the credentials logic is a brand new concept we invented right here and now, and particularly security sensitive it's OK to reverse this, and by default hide credentials whenever we can (i.e. whenever mount namespacing is otherwise opt-ed in to). Long story short: if you want to hide other service's credentials, the most basic options is to just turn on PrivateMounts= and there you go, they should all be gone.	2020-08-25 19:45:38 +02:00
Lennart Poettering	bb0c0d6f29	core: add credentials logic Fixes: #15778 #16060	2020-08-25 19:45:35 +02:00
Lennart Poettering	4e39995371	core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options Kernel 5.8 gained a hidepid= implementation that is truly per procfs, which allows us to mount a distinct one into every unit, with individual hidepid= settings. Let's expose this via two new settings: ProtectProc= (wrapping hidepid=) and ProcSubset= (wrapping subset=). Replaces: #11670	2020-08-24 20:11:02 +02:00
Lennart Poettering	52b3d6523f	namespace: move protect_{home|system} into NamespaceInfo it's not entirely clear what shall be passed via parameter and what via struct, but these two definitely fit well with the other protect_xyz fields, hence let's move them over. We probably should move a lot more fields into the structure actually (most? all even?).	2020-08-24 20:10:30 +02:00
Luca Boccassi	427353f668	core: add mount options support for MountImages Follow the same model established for RootImage and RootImageOptions, and allow to either append a single list of options or tuples of partition_number:options.	2020-08-20 14:45:40 +01:00
Luca Boccassi	9ece644435	core: change RootImageOptions to use names instead of partition numbers Follow the designations from the Discoverable Partitions Specification	2020-08-20 13:58:02 +01:00
Luca Boccassi	b3d133148e	core: new feature MountImages Follows the same pattern and features as RootImage, but allows an arbitrary mount point under / to be specified by the user, and multiple values - like BindPaths. Original implementation by @topimiettinen at: https://github.com/systemd/systemd/pull/14451 Reworked to use dissect's logic instead of bare libmount() calls and other review comments. Thanks Topi for the initial work to come up with and implement this useful feature.	2020-08-05 21:34:55 +01:00
Luca Boccassi	18d7370587	service: add new RootImageOptions feature Allows to specify mount options for RootImage. In case of multi-partition images, the partition number can be prefixed followed by colon. Eg: RootImageOptions=1:ro,dev 2:nosuid nodev In absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
Lennart Poettering	c3f8a065e9	execute: take ownership of more fields in ExecParameters Let's simplify things a bit, and take ownership of more fields in ExecParameters, so that they are automatically freed when the structure is released.	2020-07-23 08:37:21 +02:00
Lennart Poettering	f63ef93703	execute: fix if check Fixes: coverity 1430459	2020-07-16 08:35:18 +02:00
Lennart Poettering	8d5bb13d78	core: fix invalid assertion We miscounted here, and would hit an assert once too early.	2020-07-16 09:13:04 +09:00
Zbigniew Jędrzejewski-Szmek	56a13a495c	pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only Read-only /var/tmp is more likely, because it's backed by a real device. /tmp is (by default) backed by tmpfs, but it doesn't have to be. In both cases the same consideration applies. If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail because we cannot create the subdir under /var/tmp to mount the private directory. But many services actually don't require /var/tmp (either because they only use it occasionally, or because they only use /tmp, or even because they don't use the temporary directories at all, and PrivateTmp=yes is used to isolate them from the rest of the system). To handle both cases let's create a read-only directory under /run/systemd and mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into dumping too much data in /run.) $ sudo systemd-run -t -p PrivateTmp=yes bash Running as unit: run-u14.service Press ^] three times within 1s to disconnect TTY. [root@workstation /]# ls -l /tmp/ total 0 [root@workstation /]# ls -l /var/tmp/ total 0 [root@workstation /]# touch /tmp/f [root@workstation /]# touch /var/tmp/f touch: cannot touch '/var/tmp/f': Read-only file system This commit has more changes than I like to put in one commit, but it's touching all the same paths so it's hard to split. exec_runtime_make() was using the wrong cleanup function, so the directory would be left behind on error.	2020-07-14 19:47:15 +02:00
Zbigniew Jędrzejewski-Szmek	37b22b3b47	tree: wide "the the" and other trivial grammar fixes	2020-07-02 09:51:38 +02:00
Luca Boccassi	d4d55b0d13	core: add RootHashSignature service parameter Allow to explicitly pass root hash signature as a unit option. Takes precedence over implicit checks.	2020-06-25 08:45:21 +01:00
Lennart Poettering	6b000af4f2	tree-wide: avoid some loaded terms https://tools.ietf.org/html/draft-knodel-terminology-02 https://lwn.net/Articles/823224/ This gets rid of most but not all occasions of these loaded terms: 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594) 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI. 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first. Regarding the replacements: when whitelist/blacklist is used as noun we replace it with allow list/deny list, and when used as verb with allow-list/deny-list.	2020-06-25 09:00:19 +02:00
Luca Boccassi	0389f4fa81	core: add RootHash and RootVerity service parameters Allow to explicitly pass root hash (explicitly or as a file) and verity device/file as unit options. Take precedence over implicit checks.	2020-06-23 10:50:09 +02:00
Lennart Poettering	f3dc6af20f	core: automatically update StandardOutput=syslog to =journal (and similar for StandardError=) Let's go one step further and upgrade implicitly. Usually =syslog assignments are historic artifacts only. Let's upgrade the lines automatically, and politely suggest people update their unit files/configuration (and drop the lines altogether, without replacement). Fixes: #15807	2020-05-15 00:05:46 +02:00
Zbigniew Jędrzejewski-Szmek	8f3e342fa9	core: fix unused variable warning when !HAVE_SECCOMP	2020-04-23 14:42:09 +02:00
Lennart Poettering	e8cf09b2a2	core: make sure we don't get confused when setting TERM for a tty fd Fixes: #15344	2020-04-22 22:59:41 +02:00
Frantisek Sumsal	86b52a3958	tree-wide: fix spelling errors Based on a report from Fossies.org using Codespell. Followup to #15436	2020-04-21 23:21:08 +02:00
Lennart Poettering	daf8f72b4e	core: make sure ProtectHostname= is handled gracefully in containers lacking seccomp Fixes: #15408	2020-04-13 17:32:27 +02:00
Zbigniew Jędrzejewski-Szmek	ad21e542b2	manager: add CoredumpFilter= setting Fixes #6685.	2020-04-09 14:08:48 +02:00
Michal Sekletár	e2b2fb7f56	core: add support for setting CPUAffinity= to special "numa" value systemd will automatically derive CPU affinity mask from NUMA node mask. Fixes #13248	2020-03-16 08:57:28 +01:00
Topi Miettinen	efa2f3a18b	execute: don't create /tmp and /var/tmp if both are inaccessible If both /tmp and either /var/tmp or whole /var are inaccessible, there's no need to create the temporary directories.	2020-03-10 16:51:29 +02:00
Zbigniew Jędrzejewski-Szmek	908055f61f	Merge pull request #15033 from yuwata/state-directory-migrate-issue execute: Fix migration from DynamicUser=yes to no	2020-03-09 17:34:55 +01:00
Yu Watanabe	578dc69f2a	execute: Fix migration from DynamicUser=yes to no Closes #12131.	2020-03-06 21:02:26 +09:00
Zbigniew Jędrzejewski-Szmek	44e5d00603	pid1: remove unnecessary terminator We specify the number of items as the first argument already.	2020-03-05 08:13:49 +01:00
Yu Watanabe	dd0395b565	make namespace_flags_to_string() not return empty string This improves the following debug log. Before: systemd[1162]: Restricting namespace to: . After: systemd[1162]: Restricting namespace to: n/a.	2020-03-03 21:17:38 +01:00
Zbigniew Jędrzejewski-Szmek	86fca584c3	core/execute: use return value from sockaddr_un_set_path(), remove duplicate check	2020-03-02 15:56:30 +01:00
Zbigniew Jędrzejewski-Szmek	f36a9d5909	tree-wide: use the return value from sockaddr_un_set_path() It fully initializes the address structure, so no need for pre-initialization, and also returns the length of the address, so no need to recalculate using SOCKADDR_UN_LEN(). socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but seems cleaner and more portable to not assume anything about the type.)	2020-03-02 15:55:44 +01:00
Zbigniew Jędrzejewski-Szmek	ee00d1e95e	pid1: do not fail if we get EPERM while setting up network name In a user namespace container: Feb 28 12:45:53 0b2420135953 systemd[1]: Starting Home Manager... Feb 28 12:45:53 0b2420135953 systemd[21]: systemd-homed.service: Failed to set up network namespacing: Operation not permitted Feb 28 12:45:53 0b2420135953 systemd[21]: systemd-homed.service: Failed at step NETWORK spawning /usr/lib/systemd/systemd-homed: Operation not permitted Feb 28 12:45:53 0b2420135953 systemd[1]: systemd-homed.service: Main process exited, code=exited, status=225/NETWORK Feb 28 12:45:53 0b2420135953 systemd[1]: systemd-homed.service: Failed with result 'exit-code'. Feb 28 12:45:53 0b2420135953 systemd[1]: Failed to start Home Manager. We should treat this similarly to the case where network namespace are not supported at all. https://bugzilla.redhat.com/show_bug.cgi?id=1807465	2020-02-29 19:33:19 +09:00
Nate Jones	ecf63c9102	execute: Make '+' exec prefix ignore PrivateTmp=yes The man pages state that the '+' prefix in Exec* directives should ignore filesystem namespacing options such as PrivateTmp. Now it does. This is very similar to #8842, just with PrivateTmp instead of PrivateDevices.	2020-02-29 19:32:01 +09:00

667 Commits