Systemd

Author	SHA1	Message	Date
Yu Watanabe	c4837f4567	Revert "core/namespace: ignore ENOENT for /proc/sys/kernel/domainname and hostname" This reverts commit `0ebc9f23fa`. With the previous commit, these files should always exist. Closes #17979.	2020-12-15 02:38:35 +09:00
Yu Watanabe	ad74f28a13	core/namespace: do not ignore non-EPERM mount error Follow-up for `61f8a7bd3e`.	2020-12-15 02:37:03 +09:00
Yu Watanabe	61f8a7bd3e	core/namespace: use existing /proc when not enough priviledge Fixes #17860.	2020-12-14 16:12:43 +01:00
Yu Watanabe	0ebc9f23fa	core/namespace: ignore ENOENT for /proc/sys/kernel/domainname and hostname If they do not exist, hostname or domainname cannot be modified. So, it is ok. Fixes #17866, especially https://github.com/systemd/systemd/issues/17866#issuecomment-744118614.	2020-12-14 14:15:28 +00:00
Daan De Meyer	77f16dbd6d	Don't assume /run/systemd exists when creating unit-root When running tests in a mkosi container, /run/systemd might not exist yet in the container which causes test-execute to fail. Fixes #17842.	2020-12-05 11:11:58 +00:00
Yu Watanabe	db9ecf0501	license: LGPL-2.1+ -> LGPL-2.1-or-later	2020-11-09 13:23:58 +09:00
Frantisek Sumsal	d46b79bbe0	tree-wide: drop if braces around single line expressions as well	2020-10-09 15:11:55 +02:00
bauen1	19cd4e1967	core: ensure that namespace tmp directories always get the correct label If a namespace with PrivateTmp=true is constructed we need to restore the context of the namespaces /tmp directory (i.e. /tmp/systemd-private-XXXXX/tmp) to the (default) context of /tmp . Otherwise filetransitions might result in the namespaces tmp directory having the wrong context.	2020-09-28 12:36:07 +02:00
Lennart Poettering	21935150a0	tree-wide: switch remaining mount() invocations over to mount_nofollow_verbose() (Well, at least the ones where that makes sense. Where it does't make sense are the ones that re invoked on the root path, which cannot possibly be a symlink.)	2020-09-23 18:57:37 +02:00
Lennart Poettering	aee36b4ea2	dissect-image: process /usr/ GPT partition type	2020-09-19 21:19:51 +02:00
Lennart Poettering	89e62e0bd3	dissect: wrap verity settings in new VeritySettings structure Just some refactoring: let's place the various verity related parameters in a common structure, and pass that around instead of the individual parameters. Also, let's load the PKCS#7 signature data when finding metadata right-away, instead of delaying this until we need it. In all cases we call this there's not much time difference between the metdata finding and the loading, hence this simplifies things and makes sure root hash data and its signature is now always acquired together.	2020-09-17 20:36:23 +09:00
Lennart Poettering	bbb4e7f39f	core: hide /run/credentials whenever namespacing is requested Ideally we would like to hide all other service's credentials for all services. That would imply for us to enable mount namespacing for all services, which is something we cannot do, both due to compatibility with the status quo ante, and because a number of services legitimately should be able to install mounts in the host hierarchy. Hence we do the second best thing, we hide the credentials automatically for all services that opt into mount namespacing otherwise. This is quite different from other mount sandboxing options: usually you have to explicitly opt into each. However, given that the credentials logic is a brand new concept we invented right here and now, and particularly security sensitive it's OK to reverse this, and by default hide credentials whenever we can (i.e. whenever mount namespacing is otherwise opt-ed in to). Long story short: if you want to hide other service's credentials, the most basic options is to just turn on PrivateMounts= and there you go, they should all be gone.	2020-08-25 19:45:38 +02:00
Lennart Poettering	4e39995371	core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options Kernel 5.8 gained a hidepid= implementation that is truly per procfs, which allows us to mount a distinct once into every unit, with individual hidepid= settings. Let's expose this via two new settings: ProtectProc= (wrapping hidpid=) and ProcSubset= (wrapping subset=). Replaces: #11670	2020-08-24 20:11:02 +02:00
Lennart Poettering	df6b900a1b	namespace: assert() first, use second	2020-08-24 20:10:58 +02:00
Lennart Poettering	52b3d6523f	namespace: move protect_{home|system} into NamespaceInfo it's not entirely clear what shall be passed via parameter and what via struct, but these two definitely fit well with the other protect_xyz fields, hence let's move them over. We probably should move a lot more more fields into the structure actuall (most? all even?).	2020-08-24 20:10:30 +02:00
Lennart Poettering	9aab8d7a98	Merge pull request #16804 from keszybz/conditionals-and-spelling-fixes Conditionals and spelling fixes	2020-08-21 13:36:30 +02:00
Zbigniew Jędrzejewski-Szmek	2aed63f427	tree-wide: fix spelling of "fallback" Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back" is the verb. (This is pretty clear when we construct a sentence in the present continous: "we are falling back" not "we are fallbacking").	2020-08-20 17:45:32 +02:00
Luca Boccassi	427353f668	core: add mount options support for MountImages Follow the same model established for RootImage and RootImageOptions, and allow to either append a single list of options or tuples of partition_number:options.	2020-08-20 14:45:40 +01:00
Luca Boccassi	c20acbb2bd	core: cleanup unused variables Leftovers from previous implementation of MountImages feature, unused now	2020-08-20 13:24:32 +01:00
Lennart Poettering	3f181262f4	namespace: fix minor memory leak	2020-08-14 15:33:04 +02:00
Luca Boccassi	b3d133148e	core: new feature MountImages Follows the same pattern and features as RootImage, but allows an arbitrary mount point under / to be specified by the user, and multiple values - like BindPaths. Original implementation by @topimiettinen at: https://github.com/systemd/systemd/pull/14451 Reworked to use dissect's logic instead of bare libmount() calls and other review comments. Thanks Topi for the initial work to come up with and implement this useful feature.	2020-08-05 21:34:55 +01:00
Zbigniew Jędrzejewski-Szmek	7e62257219	Merge pull request #16308 from bluca/root_image_options service: add new RootImageOptions feature	2020-08-03 10:04:36 +02:00
Zbigniew Jędrzejewski-Szmek	b67ec8e5b2	pid1: stop limiting size of /dev/shm The explicit limit is dropped, which means that we return to the kernel default of 50% of RAM. See `362a55fc14` for a discussion why that is not as much as it seems. It turns out various applications need more space in /dev/shm and we would break them by imposing a low limit. While at it, rename the define and use a single macro for various tmpfs mounts. We don't really care what the purpose of the given tmpfs is, so it seems reasonable to use a single macro. This effectively reverts part of `7d85383edb`. Fixes #16617.	2020-07-30 18:48:35 +02:00
Luca Boccassi	18d7370587	service: add new RootImageOptions feature Allows to specify mount options for RootImage. In case of multi-partition images, the partition number can be prefixed followed by colon. Eg: RootImageOptions=1:ro,dev 2:nosuid nodev In absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
Zbigniew Jędrzejewski-Szmek	6cdc429454	Merge pull request #16340 from keszybz/var-tmp-readonly Create ro private /var/tmp dir when /var/tmp is read-only	2020-07-14 19:59:48 +02:00
Zbigniew Jędrzejewski-Szmek	56a13a495c	pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only Read-only /var/tmp is more likely, because it's backed by a real device. /tmp is (by default) backed by tmpfs, but it doesn't have to be. In both cases the same consideration applies. If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail because we cannot create the subdir under /var/tmp to mount the private directory. But many services actually don't require /var/tmp (either because they only use it occasionally, or because they only use /tmp, or even because they don't use the temporary directories at all, and PrivateTmp=yes is used to isolate them from the rest of the system). To handle both cases let's create a read-only directory under /run/systemd and mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into dumping too much data in /run.) $ sudo systemd-run -t -p PrivateTmp=yes bash Running as unit: run-u14.service Press ^] three times within 1s to disconnect TTY. [root@workstation /]# ls -l /tmp/ total 0 [root@workstation /]# ls -l /var/tmp/ total 0 [root@workstation /]# touch /tmp/f [root@workstation /]# touch /var/tmp/f touch: cannot touch '/var/tmp/f': Read-only file system This commit has more changes than I like to put in one commit, but it's touching all the same paths so it's hard to split. exec_runtime_make() was using the wrong cleanup function, so the directory would be left behind on error.	2020-07-14 19:47:15 +02:00
Christian Göttsche	f2df56bfea	namespace: unify logging in mount_tmpfs Fixes: `abad72be4d` Follow up: #16426	2020-07-11 21:25:39 +02:00
Christian Göttsche	abad72be4d	namespace: fix MAC labels of TemporaryFileSystem= Reproducible with: systemd-run -p TemporaryFileSystem=/root -t /bin/bash ls -dZ /root Prior: root:object_r:tmpfs_t:s0 /root Past: root:object_r:user_home_dir_t:s0 /root	2020-07-11 00:09:05 +02:00
Zbigniew Jędrzejewski-Szmek	cbc056c819	core: wrap some long lines and other formatting changes	2020-07-08 16:37:23 +02:00
Alan Perry	5dc60faae5	add error message when bind mount src missing	2020-07-07 20:04:19 +02:00
Zbigniew Jędrzejewski-Szmek	37b22b3b47	tree: wide "the the" and other trivial grammar fixes	2020-07-02 09:51:38 +02:00
Luca Boccassi	d4d55b0d13	core: add RootHashSignature service parameter Allow to explicitly pass root hash signature as a unit option. Takes precedence over implicit checks.	2020-06-25 08:45:21 +01:00
Luca Boccassi	c2923fdcd7	dissect/nspawn: add support for dm-verity root hash signature Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a pkcs7 signature, with the public key in the kernel keyring, is available. Use it if libcryptsetup supports it.	2020-06-25 08:45:21 +01:00
Lennart Poettering	6b000af4f2	tree-wide: avoid some loaded terms https://tools.ietf.org/html/draft-knodel-terminology-02 https://lwn.net/Articles/823224/ This gets rid of most but not occasions of these loaded terms: 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594) 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI. 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first. Regarding the replacements: when whitelist/blacklist is used as noun we replace with with allow list/deny list, and when used as verb with allow-list/deny-list.	2020-06-25 09:00:19 +02:00
Luca Boccassi	0389f4fa81	core: add RootHash and RootVerity service parameters Allow to explicitly pass root hash (explicitly or as a file) and verity device/file as unit options. Take precedence over implicit checks.	2020-06-23 10:50:09 +02:00
Zbigniew Jędrzejewski-Szmek	9664be199a	Merge pull request #16118 from poettering/inaccessible-fixlets move $XDG_RUNTIME_DIR/inaccessible/ to $XDG_RUNTIME_DIR/systemd/inaccessible	2020-06-10 10:23:13 +02:00
Lennart Poettering	48b747fa03	inaccessible: move inaccessible file nodes to /systemd/ subdir in runtime dir always Let's make sure $XDG_RUNTIME_DIR for the user instance and /run for the system instance is always organized the same way: the "inaccessible" device nodes should be placed in a subdir of either called "systemd" and a subdir of that called "inaccessible". This way we can emphasize the common behaviour, and only differ where really necessary. Follow-up for #13823	2020-06-09 16:23:56 +02:00
Luca Boccassi	e7cbe5cb9e	dissect: support single-filesystem verity images with external verity hash dm-verity support in dissect-image at the moment is restricted to GPT volumes. If the image a single-filesystem type without a partition table (eg: squashfs) and a roothash/verity file are passed, set the verity flag and mark as read-only.	2020-06-09 12:19:21 +01:00
Topi Miettinen	7d85383edb	tree-wide: add size limits for tmpfs mounts Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and /sys/fs/cgroup since no (or very few) regular files are expected to be used. In addition, since directories, symbolic links, device specials and xattrs are not counted towards the size= limit, number of inodes is also limited correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes. Because nr_inodes option can't use ratios like size option, there's an unfortunate side effect that with small memory systems the limit may be on the too large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough for re-exec of PID1 because 16MB of free space is required.	2020-05-13 00:37:18 +02:00
Lennart Poettering	0cd41757d0	sd-bus: work around ubsan warning ubsan complains that we add an offset to a NULL ptr here in some cases. Which isn't really a bug though, since we only use it as the end condition for a for loop, but we can still fix it... Fixes: #15522	2020-04-23 08:54:30 +02:00
Topi Miettinen	c3151977d7	namespace: fix MAC labels of /dev when PrivateDevices=yes Without changing the SELinux label for private /dev of a service, it will take a generic file system label: system_u:object_r:tmpfs_t:s0 After this change it is the same as without `PrivateDevices=yes`: system_u:object_r:device_t:s0 This helps writing SELinux policies, as the same rules for `/dev` will apply despite any `PrivateDevices=yes` setting.	2020-03-12 08:23:27 +00:00
Topi Miettinen	de46b2be07	namespace: ignore prefix chars when comparing paths Other callers of path_strv_contains() or PATH_IN_SET() don't seem to handle paths prefixed with -+.	2020-03-10 16:48:34 +02:00
Zbigniew Jędrzejewski-Szmek	105a1a36cd	tree-wide: fix spelling of lookup and setup verbs "set up" and "look up" are the verbs, "setup" and "lookup" are the nouns.	2020-03-03 15:02:53 +01:00
Topi Miettinen	aeac9dd647	Revert "namespace: fix MAC labels of /dev when PrivateDevices=yes" This reverts commit `e6e81ec0a5`.	2020-02-29 23:35:43 +09:00
Topi Miettinen	e6e81ec0a5	namespace: fix MAC labels of /dev when PrivateDevices=yes Without changing the SELinux label for private /dev of a service, it will take a generic file system label: system_u:object_r:tmpfs_t:s0 After this change it is the same as without `PrivateDevices=yes`: system_u:object_r:device_t:s0 This helps writing SELinux policies, as the same rules for `/dev` will apply despite any `PrivateDevices=yes` setting.	2020-02-28 14:17:48 +00:00
Christian Göttsche	1acf344dfa	core: do not prepare a SELinux context for dummy files for devicenode bind-mounting Let systemd create the dummy file where a device node will be mounted on with the default label for the parent directory (e.g. /tmp/namespace-dev-yTMwAe/dev/). Fixes: #13762	2020-02-06 10:20:14 +01:00
Lennart Poettering	91dd5f7cbe	core: add new LogNamespace= execution setting	2020-01-31 15:01:43 +01:00
Lennart Poettering	575a915a74	Merge pull request #14532 from poettering/namespace-dynamic-user-fix Make DynamicUser=1 work in a userns container	2020-01-13 16:47:15 +01:00
Lennart Poettering	7cce68e1e0	core: make sure we use the correct mount flag when re-mounting bind mounts When in a userns environment we cannot take away per-mount point flags set on a mount point that was passed to us. Hence we need to be careful to always check the actual mount flags in place and manipulate only those flags of them that we actually want to change and not reset more as side-effect. We mostly got this right already in bind_remount_recursive_with_mountinfo(), but didn't in the simpler bind_remount_one_with_mountinfo(). Catch up. (The old code assumed that the MountEntry.flags field contained the right flag settings, but it actually doesn't for new mounts we just established as for those mount() establishes the initial flags for us, and we have to read them back to figure out which ones the kernel picked.) Fixes: #13622	2020-01-09 15:18:08 +01:00
Lennart Poettering	b0a94268f8	core: when we cannot open an image file for write, try read-only Closes: #14442	2020-01-09 11:18:06 +01:00
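
The inode figures quoted in commit `7d85383edb` (1k, 400k and 1M) can be checked with a few lines of arithmetic. The sketch below is only an illustration of that calculation, not code from systemd: it assumes, as the commit message does, one inode per 4k of tmpfs size and a 16GB RAM baseline, and the names `inode_limit` and `ram_fraction` are hypothetical.

```python
# Sketch: reproduce the inode estimates from the message of commit 7d85383edb.
# Assumptions taken from that message: 4 KiB per inode, 16 GiB RAM baseline.
# All names here are illustrative and do not appear in the systemd source.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BYTES_PER_INODE = 4 * KIB   # "assuming 4k each"
BASELINE_RAM = 16 * GIB     # "using 16GB of RAM as baseline"


def inode_limit(size_bytes: int) -> int:
    """Number of 4 KiB inodes that fit into a tmpfs size limit."""
    return size_bytes // BYTES_PER_INODE


def ram_fraction(percent: int, ram_bytes: int = BASELINE_RAM) -> int:
    """Size in bytes of the given percentage of RAM."""
    return ram_bytes * percent // 100


limits = {
    "4MB fixed size": inode_limit(4 * MIB),
    "10% of RAM": inode_limit(ram_fraction(10)),
    "25% of RAM": inode_limit(ram_fraction(25)),
}

for label, count in limits.items():
    print(f"{label}: {count} inodes")
```

Under these assumptions the exact values are 1024, 419430 and 1048576 inodes, which the commit message rounds to 1k, 400k and 1M. Because the inode count is derived from a fixed baseline rather than from the actual RAM size, it does not shrink on small machines, which is the side effect the commit message points out.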

278 Commits