systemd

Author	SHA1	Message	Date
Lennart Poettering	6dd16814a5	Merge pull request #17079 from keszybz/late-exec-resolution: Resolve executable paths before execution, use fexecve()	2020-12-03 14:58:20 +01:00
Lennart Poettering	986311c2da	fileio: teach read_full_file_full() to read from offset/with maximum size	2020-12-01 14:17:47 +01:00
Yu Watanabe	d85ff94477	core: use SYNTHETIC_ERRNO() macro	2020-11-27 14:35:20 +09:00
Yu Watanabe	db9ecf0501	license: LGPL-2.1+ -> LGPL-2.1-or-later	2020-11-09 13:23:58 +09:00
Zbigniew Jędrzejewski-Szmek	a6d9111c67	core/execute: fall back to execve() for scripts. fexecve() fails with ENOENT and we need a fallback. Add appropriate test.	2020-11-06 15:14:13 +01:00
Zbigniew Jędrzejewski-Szmek	b83d505087	core: use fexecve() to spawn children. We base the smack/selinux setup on the executable. Let's open the file once and use the same fd for that setup and the subsequent execve.	2020-11-06 15:13:01 +01:00
Zbigniew Jędrzejewski-Szmek	5ca9139ace	basic/path-util: let find_executable_full() optionally return an fd	2020-11-06 15:12:54 +01:00
Lennart Poettering	d3dcf4e3b9	fileio: beef up READ_FULL_FILE_CONNECT_SOCKET to allow setting sender socket name. This beefs up the READ_FULL_FILE_CONNECT_SOCKET logic of read_full_file_full() a bit: when used, a sender socket name may be specified. If specified as NULL, behaviour is as before: the client socket name is picked by the kernel. But if specified as non-NULL, the client can pick a socket name to use when connecting. This is useful to communicate a minimal amount of metainformation from client to server, outside of the transport payload. Specifically, this beefs up the service credential logic to pass an abstract AF_UNIX socket name as client socket name when connecting via READ_FULL_FILE_CONNECT_SOCKET, that includes the requesting unit name and the eventual credential name. This allows servers implementing the trivial credential socket logic to distinguish clients: via a simple getpeername() it can be determined which unit is requesting a credential, and which credential specifically. Example: with this patch in place, in a unit file "waldo.service" a configuration line like the following: LoadCredential=foo:/run/quux/creds.sock will result in a connection to the AF_UNIX socket /run/quux/creds.sock, originating from an abstract namespace AF_UNIX socket: @$RANDOM/unit/waldo.service/foo (The $RANDOM is replaced by some randomized string. This is included in the socket name in order to avoid namespace squatting issues: the abstract socket namespace is open to unprivileged users after all, and care needs to be taken not to use guessable names.) The services listening on the /run/quux/creds.sock socket may thus easily retrieve the name of the unit the credential is requested for plus the credential name, via a simple getpeername(), discarding the random prefix and the /unit/ string. This logic uses "/" as separator between the fields, since both unit names and credential names appear in the file system, and thus are designed to use "/" as outer separators. Hence it's a good, safe choice to use it as separator here too, to avoid any conflicts. This is a minimal patch only: the new logic is used only for the unit file credential logic. For other places where we use READ_FULL_FILE_CONNECT_SOCKET it is probably a good idea to use this scheme too, but this should be done carefully in later patches, since the socket names become API that way, and we should determine the right amount of info to pass over.	2020-11-03 09:48:04 +01:00
Zbigniew Jędrzejewski-Szmek	1da37e58ff	core/execute: refactor creation of array with fds to keep during execution. We close fds in two phases, first some and then some more. When passing a list of fds to exclude from closing to the closing function, we would pass some in an array and the rest as separate arguments. For the fds which should be excluded in both closing phases, let's always create the array and put the relevant fds there. This has the advantage that if more fds to exclude in both phases are added later, we don't need to add more positional arguments. The list passed to setup_pam() is not changed. I think we could pass more fds to close there, but I'm leaving that unchanged. The setting of FD_CLOEXEC on an already open fd is dropped. The fd is opened in service_allocate_exec_fd() and there is no reason to suspect that it might have been opened incorrectly. If some rogue code is unsetting our FD_CLOEXEC bits, then it might flip any fd, so there is no reason to single this one out.	2020-10-14 18:29:25 +02:00
Lennart Poettering	74aaf59b1a	execute: make sure some more functions follow coding style. Initialize all return values on success, as our usual coding style suggests.	2020-10-14 16:41:37 +02:00
Lennart Poettering	f5fa352f1e	execute: fix single character typo. Corrects: `c413bb28df`. Fixes: #17313	2020-10-14 16:41:37 +02:00
Frantisek Sumsal	d46b79bbe0	tree-wide: drop if braces around single line expressions as well	2020-10-09 15:11:55 +02:00
Lennart Poettering	14eb3285ab	execute: use empty_to_root() a bit more	2020-10-01 11:02:11 +02:00
Lennart Poettering	74e1252072	execute: add helper for checking if root_directory/root_image are set in ExecContext	2020-10-01 11:02:11 +02:00
Lennart Poettering	36296ae2ad	Merge pull request #17152 from keszybz/make-mountapivfs-default: Make MountAPIVFS=yes default	2020-10-01 11:00:02 +02:00
Zbigniew Jędrzejewski-Szmek	48904c8bfd	core/execute: escape the separator in exported paths. Our paths shouldn't even contain ":", but let's escape it if one somehow sneaks in.	2020-09-25 13:36:34 +02:00
Zbigniew Jędrzejewski-Szmek	d4d9f034b1	basic/strv: allow escaping the separator in strv_join(). The new parameter is false everywhere except for tests, so no functional change is expected.	2020-09-25 13:36:34 +02:00
Zbigniew Jędrzejewski-Szmek	6119878480	core: turn on MountAPIVFS=true when RootImage or RootDirectory are specified. Lennart wanted to do this back in `01c33c1eff`. For better or worse, this wasn't done because I thought that turning on MountAPIVFS is a compat break for RootDirectory and people might be negatively surprised by it. Without this, search for binaries doesn't work (access_fd() requires /proc). Let's turn it on, but still allow overriding to "no". When RootDirectory=/, MountAPIVFS=1 doesn't work. This might be a buglet on its own, but this patch doesn't change the situation.	2020-09-24 10:03:18 +02:00
Zbigniew Jędrzejewski-Szmek	5e98086d16	core: remember when we set ExecContext.mount_apivfs. No functional change intended so far.	2020-09-24 10:03:18 +02:00
Lennart Poettering	bcaf20dc38	Merge pull request #17143 from keszybz/late-exec-resolution-alt: Late exec resolution (subset)	2020-09-24 09:38:36 +02:00
Lennart Poettering	21935150a0	tree-wide: switch remaining mount() invocations over to mount_nofollow_verbose(). (Well, at least the ones where that makes sense. It doesn't make sense for the ones invoked on the root path, which cannot possibly be a symlink.)	2020-09-23 18:57:37 +02:00
Lennart Poettering	065b47749d	tree-wide: use ERRNO_IS_PRIVILEGE() wherever appropriate	2020-09-22 16:25:22 +02:00
Zbigniew Jędrzejewski-Szmek	0af07108e4	core/execute: reduce indentation level a bit	2020-09-18 15:28:48 +02:00
Zbigniew Jędrzejewski-Szmek	9f71ba8d95	core: resolve binary names immediately before execution. This has two advantages: we save a bit of IO in early boot because we don't look for executables which we might never call; and if the executable was specified as a non-absolute path, it is OK if it is found in a different place. This should solve the case where paths are different in the initramfs. Since the executable path is only available quite late, the call to mac_selinux_get_child_mls_label() which uses the path needs to be moved down too. Fixes #16076.	2020-09-18 15:28:48 +02:00
Zbigniew Jędrzejewski-Szmek	0706c01259	Add CLOSE_AND_REPLACE helper. Similar to free_and_replace. I think this should be uppercase to make it clear that this is a macro. free_and_replace should probably be uppercased too.	2020-09-18 15:28:48 +02:00
Topi Miettinen	9df2cdd8ec	exec: SystemCallLog= directive. With the new directive SystemCallLog= it's possible to list system calls to be logged. This can be used for auditing or temporarily when constructing system call filters. --- v5: drop intermediary, update HASHMAP_FOREACH_KEY() use; v4: skip useless debug messages, actually parse directive; v3: don't declare unused variables with old libseccomp; v2: fix build without seccomp or old libseccomp	2020-09-15 12:54:17 +03:00
Topi Miettinen	005bfaf118	exec: Add kill action to system call filters. Define explicit action "kill" for SystemCallErrorNumber=. In addition to an errno code, allow specifying "kill" as action for SystemCallFilter=. --- v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP; v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes, init syscall_errno; v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit parsing without seccomp; v4: fix build without seccomp; v3: drop log action; v2: action -> number	2020-09-15 12:54:17 +03:00
Frantisek Sumsal	69e3234db7	tree-wide: fix typos found by codespell. Reported by Fossies.org	2020-09-14 15:32:37 +02:00
Tobias Kaufmann	198dc17845	core: fix set keep caps for ambient capabilities. The securebit keep-caps retains the capabilities in the permitted set over a UID change (ambient capabilities are cleared though). Setting the keep-caps securebit after the UID change and before execve doesn't make sense, as it is cleared during execve and there is no additional user ID change after this point. Although the documentation (man 7 capabilities) is ambiguous, keep-caps is reset during execve even if keep-caps-locked is set. After execve only keep-caps-locked is set and keep-caps is cleared.	2020-09-09 11:17:42 +02:00
Tobias Kaufmann	16fcb1918a	core: fix comments on ambient capabilities. The comments on the code for ambient capabilities were wrong/outdated.	2020-09-09 11:17:42 +02:00
Lennart Poettering	f3f4abad29	Merge pull request #16979 from keszybz/return-log-debug: Fix 'return log_error()' and 'return log_warning()' patterns	2020-09-08 19:54:38 +02:00
Lennart Poettering	c6552f7cd5	Merge pull request #16955 from keszybz/test-execute-cleanup: One patch for test-execute and assorted cleanups	2020-09-08 18:33:12 +02:00
Zbigniew Jędrzejewski-Szmek	c413bb28df	tree-wide: correct cases where return log_{error,warning} is used without value. In various cases, we would say 'return log_warning()' or 'return log_error()'. Those functions return 0 if no error is passed in. For log_warning or log_error this doesn't make sense, and we generally want to propagate the error. In the few cases where the error should be ignored, I think it's better to split it in two, and call 'return 0' on a separate line.	2020-09-08 17:40:46 +02:00
Zbigniew Jędrzejewski-Szmek	90e74a66e6	tree-wide: define iterator inside of the macro	2020-09-08 12:14:05 +02:00
Zbigniew Jędrzejewski-Szmek	5b10116e49	core/{execute, manager}: reduce scope of iterator variables a bit	2020-09-04 18:14:26 +02:00
Lennart Poettering	7cc60ea414	Merge pull request #16821 from cgzones/selinux_status: selinux: use SELinux status page	2020-09-03 14:55:08 +02:00
Tobias Kaufmann	dbdc4098f6	core: fix securebits setting. Desired functionality: Set securebits for services started as non-root user. Failure: The starting of the service fails if no ambient capability shall be raised. ... systemd[217941]: ...: Failed to set process secure bits: Operation not permitted ... systemd[217941]: ...: Failed at step SECUREBITS spawning /usr/bin/abc.service: Operation not permitted ... systemd[1]: abc.service: Failed with result 'exit-code'. Reason: For setting securebits, the capability CAP_SETPCAP is required. However, the securebits (if no ambient capability shall be raised) are set after setresuid. When setresuid is invoked, all capabilities are dropped from the permitted, effective and ambient capability sets. If the securebit SECBIT_KEEP_CAPS is set, the permitted capability set is retained, but the effective and the ambient sets are cleared. If ambient capabilities shall be set, the securebit SECBIT_KEEP_CAPS is added to the securebits configured in the service file and set together with them before setresuid is executed (in enforce_user). Before setresuid is executed, the capabilities are the same as for PID 1. This means that all capabilities in the effective, permitted and bounding sets are set. Thus the capability CAP_SETPCAP is in the effective set and the prctl(PR_SET_SECUREBITS, ...) succeeds. However, if the securebits aren't set before setresuid is invoked, they shall be set shortly after the UID change in enforce_user. This fails because SECBIT_KEEP_CAPS wasn't set before setresuid and consequently the effective and permitted sets were cleared; hence CAP_SETPCAP is not in the effective set (and cannot be raised any longer) and prctl(PR_SET_SECUREBITS, ...) fails with EPERM. Proposed solution: The proposed solution consists of three parts: 1. Check in enforce_user whether securebits are configured in the service file. If securebits are configured, set SECBIT_KEEP_CAPS before invoking setresuid. 2. Don't set any securebits other than SECBIT_KEEP_CAPS in enforce_user, but set all requested ones after enforce_user. This has the advantage that securebits are set at the same place for root and non-root services. 3. Raise CAP_SETPCAP to the effective set (if not already set) before setting the securebits, to avoid EPERM during the prctl syscall. To gain CAP_SETPCAP, the function capability_bounding_set_drop is split into two functions: the first one raises CAP_SETPCAP (required for dropping bounding capabilities); the second drops the bounding capabilities. Why are ambient capabilities not affected by this change? Ambient capabilities get cleared during setresuid, no matter whether SECBIT_KEEP_CAPS is set or not. For raising ambient capabilities for a user other than root, the requested capability has to be raised in the inheritable set first. Then the SECBIT_KEEP_CAPS securebit needs to be set before setresuid is invoked. Afterwards the ambient capability can be raised, because it is in the inheritable and permitted sets. Security considerations: Although the manpage is ambiguous, SECBIT_KEEP_CAPS is cleared during execve no matter whether SECBIT_KEEP_CAPS_LOCKED is set or not. If both are set, only SECBIT_KEEP_CAPS_LOCKED is set after execve. Setting SECBIT_KEEP_CAPS in enforce_user in order to be able to set securebits is no security risk, as the effective and permitted sets are set to the value of the ambient set during execve (if the executed file has no file capabilities; for details check man 7 capabilities). Remark: In capability-util.c there is a comment complaining about the missing capability CAP_SETPCAP in the effective set after the kernel executed /sbin/init. Thus it is checked there whether this capability has to be raised in the effective set before dropping capabilities from the bounding set. If this were true all the time, ambient capabilities couldn't be set without dropping at least one capability from the bounding set, as the capability CAP_SETPCAP would be missing and setting SECBIT_KEEP_CAPS would fail with EPERM.	2020-09-01 10:53:26 +02:00
Lennart Poettering	b519529104	Merge pull request #16841 from keszybz/acl-util-bitmask: Use a bitmask in fd_add_uid_acl_permission()	2020-08-31 16:45:13 +02:00
Yu Watanabe	8062e643e6	core: clear bind mounts on error. Follow-up for `bbb4e7f39f`. Fixes CID#1431998.	2020-08-27 18:20:34 +09:00
Christian Göttsche	2df2152c20	selinux: fork label-aware children with up-to-date label database. The parent process may not perform any label operation, so the database might not get updated on a SELinux policy change on its own. Reload the label database once on a policy change, instead of n times, once in every started child.	2020-08-27 10:28:53 +02:00
Zbigniew Jędrzejewski-Szmek	567aeb5801	shared/acl-util: convert rd,wr,ex to a bitmask. I find this version much more readable. Add replacement defines so that when acl/libacl.h is not available, the ACL_{READ,WRITE,EXECUTE} constants are also defined. Those constants were declared in the kernel headers already in 1da177e4c3f41524e886b7f1b8a0c1f, so they should be the same pretty much everywhere.	2020-08-27 10:20:12 +02:00
PhoenixDiscord	e8607daf7d	Replace gendered pronouns with gender-neutral ones (#16844)	2020-08-27 11:52:48 +09:00
Lennart Poettering	bbb4e7f39f	core: hide /run/credentials whenever namespacing is requested. Ideally we would like to hide all other services' credentials for all services. That would require us to enable mount namespacing for all services, which is something we cannot do, both due to compatibility with the status quo ante, and because a number of services legitimately should be able to install mounts in the host hierarchy. Hence we do the second best thing: we hide the credentials automatically for all services that opt into mount namespacing otherwise. This is quite different from other mount sandboxing options: usually you have to explicitly opt into each. However, given that the credentials logic is a brand new concept we invented right here and now, and particularly security sensitive, it's OK to reverse this, and by default hide credentials whenever we can (i.e. whenever mount namespacing is otherwise opted into). Long story short: if you want to hide other services' credentials, the most basic option is to just turn on PrivateMounts= and there you go, they should all be gone.	2020-08-25 19:45:38 +02:00
Lennart Poettering	bb0c0d6f29	core: add credentials logic. Fixes: #15778 #16060	2020-08-25 19:45:35 +02:00
Lennart Poettering	4e39995371	core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options. Kernel 5.8 gained a hidepid= implementation that is truly per procfs instance, which allows us to mount a distinct one into every unit, with individual hidepid= settings. Let's expose this via two new settings: ProtectProc= (wrapping hidepid=) and ProcSubset= (wrapping subset=). Replaces: #11670	2020-08-24 20:11:02 +02:00
Lennart Poettering	52b3d6523f	namespace: move protect_{home|system} into NamespaceInfo. It's not entirely clear what shall be passed via parameter and what via struct, but these two definitely fit well with the other protect_xyz fields, hence let's move them over. We probably should move a lot more fields into the structure, actually (most? all even?).	2020-08-24 20:10:30 +02:00
Luca Boccassi	427353f668	core: add mount options support for MountImages. Follow the same model established for RootImage and RootImageOptions, and allow either appending a single list of options or tuples of partition_number:options.	2020-08-20 14:45:40 +01:00
Luca Boccassi	9ece644435	core: change RootImageOptions to use names instead of partition numbers. Follow the designations from the Discoverable Partitions Specification.	2020-08-20 13:58:02 +01:00
Luca Boccassi	b3d133148e	core: new feature MountImages. Follows the same pattern and features as RootImage, but allows an arbitrary mount point under / to be specified by the user, and multiple values, like BindPaths. Original implementation by @topimiettinen at: https://github.com/systemd/systemd/pull/14451 Reworked to use dissect's logic instead of bare libmount() calls, and to address other review comments. Thanks Topi for the initial work to come up with and implement this useful feature.	2020-08-05 21:34:55 +01:00
Luca Boccassi	18d7370587	service: add new RootImageOptions feature. Allows specifying mount options for RootImage. In case of multi-partition images, the partition number can be prefixed, followed by a colon (e.g. RootImageOptions=1:ro,dev 2:nosuid nodev). In the absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
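
Commits b83d505087 and a6d9111c67 above open the executable once, exec it through that fd with fexecve(), and fall back to a plain execve() by path when fexecve() fails with ENOENT for scripts. The following is a minimal Python sketch of that control flow, not systemd's C code; the helper name exec_fd_or_path() is hypothetical. It assumes Linux, where os.execve() accepts an open file descriptor and then calls fexecve(). The ENOENT case arises because a "#!" script executed through a close-on-exec fd would have its interpreter handed a /dev/fd/N path that is already closed once the exec happens, so the kernel refuses up front.

```python
import errno
import os


def exec_fd_or_path(path: str, argv: list, env: dict) -> None:
    """Hypothetical helper: exec via an opened fd, falling back to the path.

    On success this function never returns (the process image is replaced).
    """
    # Open the file once; in systemd the same fd also feeds the SMACK/SELinux setup.
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        # With an int as first argument, CPython's os.execve() uses fexecve().
        os.execve(fd, argv, env)
    except OSError as e:
        # "#!" scripts behind a close-on-exec fd fail here with ENOENT;
        # any other error is propagated unchanged.
        if e.errno != errno.ENOENT:
            raise
    finally:
        # Only reached when the fd-based exec failed.
        os.close(fd)
    # Fallback: classic execve() by path name.
    os.execve(path, argv, env)
```

Call it in a freshly forked child. Note that in the fallback branch the file is looked up by path again, so the fd-based guarantee only holds for the fexecve() branch.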

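Commit d3dcf4e3b9 above has the client bind an abstract AF_UNIX name of the form @$RANDOM/unit/<unit>/<credential> before connecting, so that a credential server can identify the requester with getpeername(). The following Python sketch shows both sides of that naming scheme under stated assumptions: all helper names are hypothetical, the random prefix is assumed to be 16 hex characters (the commit only says "some randomized string"), and "@" stands for the leading NUL byte that selects the Linux abstract namespace.

```python
import secrets
import socket
from typing import Optional, Tuple


def client_socket_name(unit: str, credential: str) -> bytes:
    """Hypothetical helper: build "@<random>/unit/<unit>/<credential>".

    The leading NUL byte selects the abstract namespace. The random prefix
    (assumed format: 16 hex chars) keeps the name unguessable, because any
    unprivileged user may bind names in that namespace.
    """
    prefix = secrets.token_hex(8)
    return b"\0" + f"{prefix}/unit/{unit}/{credential}".encode()


def parse_peer_name(peer) -> Optional[Tuple[str, str]]:
    """Hypothetical server-side helper: recover (unit, credential).

    Discards the random prefix and the "unit" field. Returns None for peers
    that do not follow the scheme (unbound or kernel-autobound sockets).
    """
    if not isinstance(peer, bytes) or not peer.startswith(b"\0"):
        return None
    fields = peer[1:].decode("utf-8", errors="replace").split("/")
    # Unit and credential names cannot contain "/", so expect four fields.
    if len(fields) != 4 or fields[1] != "unit":
        return None
    return fields[2], fields[3]


def connect_as(listen_path: str, unit: str, credential: str) -> socket.socket:
    """Hypothetical client-side helper: bind the descriptive name, then connect."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(client_socket_name(unit, credential))
    sock.connect(listen_path)
    return sock
```

A server would accept() on its listening socket and pass conn.getpeername() to parse_peer_name(); Python returns abstract-namespace addresses as bytes with the initial NUL byte intact.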

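Commits d4d9f034b1 and 48904c8bfd above add an option to escape the separator when joining a string list, so that a ":" inside one exported path cannot be mistaken for a list boundary. The following is a hypothetical Python illustration of that idea, not systemd's strv_join() code. Escaping with a backslash, and also escaping the backslash itself so the joined string can be split back unambiguously, are assumptions of this sketch.

```python
from typing import List


def join_escaped(items: List[str], sep: str = ":") -> str:
    """Hypothetical helper: join with a one-character sep, escaping sep and backslash."""
    def esc(s: str) -> str:
        out = []
        for ch in s:
            if ch == sep or ch == "\\":
                out.append("\\")  # prefix the special character with a backslash
            out.append(ch)
        return "".join(out)
    return sep.join(esc(s) for s in items)


def split_escaped(joined: str, sep: str = ":") -> List[str]:
    """Hypothetical inverse of join_escaped(): split on unescaped sep, then unescape."""
    items, cur, i = [], [], 0
    while i < len(joined):
        ch = joined[i]
        if ch == "\\" and i + 1 < len(joined):
            cur.append(joined[i + 1])  # take the escaped character literally
            i += 2
            continue
        if ch == sep:
            items.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    items.append("".join(cur))
    return items
```

Escaping the backslash as well is what makes the round trip lossless; the one exception is the empty list, which joins to "" and splits back to [""].
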
691 commits
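
Commits dbdc4098f6 and 198dc17845 above hinge on the order in which securebits and UIDs change: SECBIT_KEEP_CAPS must be set before setresuid() so that the permitted set survives the UID change, and execve() clears that bit again. Actually performing that sequence needs privileges. The following Python sketch therefore only inspects and toggles the calling thread's securebits via prctl() through ctypes, which needs no privileges unless the bit is locked. Bit positions follow <linux/securebits.h> and option numbers follow <linux/prctl.h>. The helper names are hypothetical, and the sketch is Linux-only.

```python
import ctypes
import ctypes.util
import os
from typing import List

# Option numbers from <linux/prctl.h>.
PR_SET_KEEPCAPS = 8
PR_GET_SECUREBITS = 27

# Bit numbers from <linux/securebits.h>; SECBIT_X == 1 << SECURE_X.
SECUREBIT_NAMES = {
    0: "SECBIT_NOROOT",
    1: "SECBIT_NOROOT_LOCKED",
    2: "SECBIT_NO_SETUID_FIXUP",
    3: "SECBIT_NO_SETUID_FIXUP_LOCKED",
    4: "SECBIT_KEEP_CAPS",
    5: "SECBIT_KEEP_CAPS_LOCKED",
    6: "SECBIT_NO_CAP_AMBIENT_RAISE",
    7: "SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED",
}
SECBIT_KEEP_CAPS = 1 << 4

# find_library() may return None; CDLL(None) then resolves symbols from the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def _prctl(option: int, arg2: int = 0) -> int:
    """Call prctl(2), raising OSError on failure."""
    zero = ctypes.c_ulong(0)
    ret = _libc.prctl(option, ctypes.c_ulong(arg2), zero, zero, zero)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def decode_securebits(mask: int) -> List[str]:
    """Hypothetical helper: names of the bits set in a securebits mask, lowest first."""
    return [name for bit, name in sorted(SECUREBIT_NAMES.items()) if mask & (1 << bit)]


def current_securebits() -> int:
    """Hypothetical helper: read the calling thread's securebits."""
    return _prctl(PR_GET_SECUREBITS)


def set_keep_caps(enabled: bool) -> None:
    """Hypothetical helper: toggle SECBIT_KEEP_CAPS (EPERM if it is locked).

    Per the commits, this must happen before setresuid() to keep the
    permitted set across the UID change; execve() clears it again.
    """
    _prctl(PR_SET_KEEPCAPS, 1 if enabled else 0)
```

Raising CAP_SETPCAP and calling setresuid() are deliberately left out, because they require privileges that a sketch cannot assume.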