Systemd

Author	SHA1	Message	Date
Topi Miettinen	7d85383edb	tree-wide: add size limits for tmpfs mounts Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and /sys/fs/cgroup since no (or very few) regular files are expected to be used. In addition, since directories, symbolic links, device specials and xattrs are not counted towards the size= limit, number of inodes is also limited correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes. Because nr_inodes option can't use ratios like size option, there's an unfortunate side effect that with small memory systems the limit may be on the too large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough for re-exec of PID1 because 16MB of free space is required.	2020-05-13 00:37:18 +02:00
Lennart Poettering	0cd41757d0	sd-bus: work around ubsan warning ubsan complains that we add an offset to a NULL ptr here in some cases. Which isn't really a bug though, since we only use it as the end condition for a for loop, but we can still fix it... Fixes: #15522	2020-04-23 08:54:30 +02:00
Topi Miettinen	c3151977d7	namespace: fix MAC labels of /dev when PrivateDevices=yes Without changing the SELinux label for private /dev of a service, it will take a generic file system label: system_u:object_r:tmpfs_t:s0 After this change it is the same as without `PrivateDevices=yes`: system_u:object_r:device_t:s0 This helps writing SELinux policies, as the same rules for `/dev` will apply despite any `PrivateDevices=yes` setting.	2020-03-12 08:23:27 +00:00
Topi Miettinen	de46b2be07	namespace: ignore prefix chars when comparing paths Other callers of path_strv_contains() or PATH_IN_SET() don't seem to handle paths prefixed with -+.	2020-03-10 16:48:34 +02:00
Zbigniew Jędrzejewski-Szmek	105a1a36cd	tree-wide: fix spelling of lookup and setup verbs "set up" and "look up" are the verbs, "setup" and "lookup" are the nouns.	2020-03-03 15:02:53 +01:00
Topi Miettinen	aeac9dd647	Revert "namespace: fix MAC labels of /dev when PrivateDevices=yes" This reverts commit `e6e81ec0a5`.	2020-02-29 23:35:43 +09:00
Topi Miettinen	e6e81ec0a5	namespace: fix MAC labels of /dev when PrivateDevices=yes Without changing the SELinux label for private /dev of a service, it will take a generic file system label: system_u:object_r:tmpfs_t:s0 After this change it is the same as without `PrivateDevices=yes`: system_u:object_r:device_t:s0 This helps writing SELinux policies, as the same rules for `/dev` will apply despite any `PrivateDevices=yes` setting.	2020-02-28 14:17:48 +00:00
Christian Göttsche	1acf344dfa	core: do not prepare a SELinux context for dummy files for devicenode bind-mounting Let systemd create the dummy file where a device node will be mounted on with the default label for the parent directory (e.g. /tmp/namespace-dev-yTMwAe/dev/). Fixes: #13762	2020-02-06 10:20:14 +01:00
Lennart Poettering	91dd5f7cbe	core: add new LogNamespace= execution setting	2020-01-31 15:01:43 +01:00
Lennart Poettering	575a915a74	Merge pull request #14532 from poettering/namespace-dynamic-user-fix Make DynamicUser=1 work in a userns container	2020-01-13 16:47:15 +01:00
Lennart Poettering	7cce68e1e0	core: make sure we use the correct mount flag when re-mounting bind mounts When in a userns environment we cannot take away per-mount point flags set on a mount point that was passed to us. Hence we need to be careful to always check the actual mount flags in place and manipulate only those flags of them that we actually want to change and not reset more as side-effect. We mostly got this right already in bind_remount_recursive_with_mountinfo(), but didn't in the simpler bind_remount_one_with_mountinfo(). Catch up. (The old code assumed that the MountEntry.flags field contained the right flag settings, but it actually doesn't for new mounts we just established as for those mount() establishes the initial flags for us, and we have to read them back to figure out which ones the kernel picked.) Fixes: #13622	2020-01-09 15:18:08 +01:00
Lennart Poettering	b0a94268f8	core: when we cannot open an image file for write, try read-only Closes: #14442	2020-01-09 11:18:06 +01:00
Lennart Poettering	c8c535d589	namespace: tweak checks whether we can mount image read-only So far we set up a loopback file read-only iff ProtectSystem= and ProtectHome= both were set to values that mark these dirs read-only. Let's extend that and also be happy if /home and the root dir are marked read-only by some other means. Fixes: #14442	2020-01-09 11:18:02 +01:00
Anita Zhang	e5f10cafe0	core: create inaccessible nodes for users when making runtime dirs To support ProtectHome=y in a user namespace (which mounts the inaccessible nodes), the nodes need to be accessible by the user. Create these paths and devices in the user runtime directory so they can be used later if needed.	2019-12-18 11:09:30 -08:00
Anita Zhang	adae5eb977	Merge pull request #14219 from poettering/homed-preparatory-loop preparatory /dev/loopN support split out of homed PR	2019-12-04 16:07:41 -08:00
Jérémy Rosen	a652f050a7	Create parent directories when creating systemd-private subdirs This is needed when systemd is compiled without systemd-tmpfiles	2019-12-04 09:22:52 +01:00
Lennart Poettering	e08f94acf5	loop-util: accept loopback flags when creating loopback device This way callers can choose if they want partition scanning or not.	2019-12-02 10:05:09 +01:00
Kevin Kuehler	94a7b2759d	core: ProtectKernelLogs= mask kmsg in proc and sys Block access to /dev/kmsg and /proc/kmsg when ProtectKernelLogs is set.	2019-11-14 12:58:43 -08:00
Yu Watanabe	e30e8b5073	tree-wide: drop stat.h or statfs.h when stat-util.h is included	2019-11-04 00:30:32 +09:00
Yu Watanabe	455fa9610c	tree-wide: drop string.h when string-util.h or friends are included	2019-11-04 00:30:32 +09:00
Yu Watanabe	f5947a5e92	tree-wide: drop missing.h	2019-10-31 17:57:03 +09:00
Zbigniew Jędrzejewski-Szmek	a5648b8094	basic/fs-util: change CHASE_OPEN flag into a separate output parameter chase_symlinks() would return negative on error, and either a non-negative status or a non-negative fd when CHASE_OPEN was given. This made the interface quite complicated, because depending on the flags used, we would get two different "types" of return object. Coverity was always confused by this, and flagged every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it would think that an fd is returned). This patch uses a separate output parameter, so there is no confusion. (I think it is OK to have functions which return either an error or an fd. It's only returning either an fd or a non-fd that is confusing.)	2019-10-24 22:44:24 +09:00
Lennart Poettering	2caa38e99f	tree-wide: some more [static] related fixes let's add [static] where it was missing so far Drop [static] on parameters that can be NULL. Add an assert() around parameters that have [static] and can't be NULL hence. Add some "const" where it was forgotten.	2019-07-12 16:40:10 +02:00
Lennart Poettering	c6134d3e2f	path-util: get rid of prefix_root() prefix_root() is equivalent to path_join() in almost all ways, hence let's remove it. There are subtle differences though: prefix_root() will try to shorten multiple "/" before and after the prefix. path_join() doesn't do that. This means prefix_root() might return a string shorter than both its inputs combined, while path_join() never does that. I like the path_join() semantics better, hence I think dropping prefix_root() is totally OK. In the end the strings generated by both functions should always be identical in terms of path_equal() if not streq(). This leaves prefix_roota() in place. Ideally we'd have path_joina(), but I don't think we can reasonably implement that as a macro. Or maybe we can? (if so, sounds like something for a later PR) Also add in a few missing OOM checks	2019-06-21 08:42:55 +09:00
Zbigniew Jędrzejewski-Szmek	7cc5ef5f18	pid1: improve message when setting up namespace fails I covered the most obvious paths: those where there's a clear problem with a path specified by the user. Prints something like this (at error level): May 21 20:00:01.040418 systemd[125871]: bad-workdir.service: Failed to set up mount namespacing: /run/systemd/unit-root/etc/tomcat9/Catalina: No such file or directory May 21 20:00:01.040456 systemd[125871]: bad-workdir.service: Failed at step NAMESPACE spawning /bin/true: No such file or directory Fixes #10972.	2019-05-22 16:28:02 +02:00
Lennart Poettering	6990fb6bc6	tree-wide: (void)ify a few unlink() and rmdir() Let's be helpful to static analyzers which care about whether we knowingly ignore return values. We do in these cases, since they are usually part of error paths.	2019-03-27 18:09:56 +01:00
Lennart Poettering	9ce4e4b0f6	namespace: when DynamicUser=1 is set, mount StateDirectory= bind mounts "nosuid" Add even more suid/sgid protection to DynamicUser= environments: the state directories we bind mount from the host will now have the nosuid flag set, to disable the effect of suid/sgid on them.	2019-03-25 19:57:15 +01:00
Lennart Poettering	64e82c1976	mount-util: beef up bind_remount_recursive() to be able to toggle more than MS_RDONLY The function is otherwise generic enough to toggle other bind mount flags beyond MS_RDONLY (for example: MS_NOSUID or MS_NODEV), hence let's beef it up slightly to support that too.	2019-03-25 19:33:55 +01:00
Lennart Poettering	867189b545	namespace: get rid of {} around single-line if blocks	2019-03-25 19:33:55 +01:00
Lennart Poettering	39e91a2777	namespace: get rid of local variable	2019-03-25 19:33:55 +01:00
Lennart Poettering	1019a48f40	namespace: (void)ify a number of syscalls	2019-03-25 19:33:55 +01:00
Lennart Poettering	5f7a690aaa	namespace: replace one case of stack allocation with heap allocation The list of mounts might grow quite large, let's avoid the stack for this. Better safe than sorry.	2019-03-25 19:33:55 +01:00
Lennart Poettering	d8b4d14df4	util: split out nulstr related stuff to nulstr-util.[ch]	2019-03-14 13:25:52 +01:00
Lennart Poettering	760877e90c	util: split out sorting related calls to new sort-util.[ch]	2019-03-13 12:16:43 +01:00
Lennart Poettering	0cb8e3d118	util: split out namespace related stuff into a new namespace-util.[ch] pair Just some minor reorganization.	2019-03-13 12:16:38 +01:00
Yu Watanabe	5beb8688e0	core/namespace: logs mount mode when the entry is dropped	2019-03-13 11:53:22 +09:00
Yu Watanabe	1e05071d27	core/namespace: introduce new mount mode READWRITE_IMPLICIT ProtectSystem=strict or ProtectKernelTunable=yes create implicit read-write mounts, but they are not overridable by TemporaryFileSystem=. This makes such implicit read-write mounts use the new mount mode. So, they can be overridden by TemporaryFileSystem= now. A typical use case is that ProtectSystem=strict and ProtectHome=tmpfs. Fixes #11276.	2019-03-13 11:51:09 +09:00
Lennart Poettering	51af7fb230	core: add open_netns_path() helper The new call allows us to open a netns from the file system, and store it in a "storage fd pair". It's supposed to work with setup_netns() and allows pre-population of the netns used with one opened from the file system.	2019-03-07 16:55:23 +01:00
Lennart Poettering	44ffcbaea4	execute: (void)ify more	2019-03-07 16:53:45 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and a ro bind mounts on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00
Zbigniew Jędrzejewski-Szmek	3042bbebdd	tree-wide: use c99 static for array size declarations https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html This only works with clang, unfortunately gcc doesn't seem to implement the check (tested with gcc-8.2.1-5.fc29.x86_64). Simulated error: [2/3] Compiling C object 'systemd-nspawn@exe/src_nspawn_nspawn.c.o'. ../src/nspawn/nspawn.c:3179:45: warning: array argument is too small; contains 15 elements, callee requires at least 16 [-Warray-bounds] candidate = (uid_t) siphash24(arg_machine, strlen(arg_machine), hash_key); ^ ~~~~~~~~ ../src/basic/siphash24.h:24:64: note: callee declares array parameter as static here uint64_t siphash24(const void *in, size_t inlen, const uint8_t k[static 16]); ^~~~~~~~~~~~	2019-01-04 12:37:25 +01:00
Zbigniew Jędrzejewski-Szmek	049af8ad0c	Split out part of mount-util.c into mountpoint-util.c The idea is that anything which is related to actually manipulating mounts is in mount-util.c, but functions for mountpoint introspection are moved to the new file. Anything which requires libmount must be in mount-util.c. This was supposed to be a preparation for further changes, with no functional difference, but it results in a significant change in linkage: $ ldd build/libnss_*.so.2 (before) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff77bf5000) librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000) libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000) /lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffc19cc0000) librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000) libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000) /lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe8ef8e000) librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000) 
libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000) libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000) /lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffda6d17000) librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000) libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000) /lib64/ld-linux-x86-64.so.2 (0x00007f610b995000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000) (after) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff0b5e2000) librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000) libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000) /lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffdc30a7000) librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000) libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000) /lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe95747000) librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000) libcap.so.2 => /lib64/libcap.so.2 
(0x00007fa56a808000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000) libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000) /lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffe67b51000) librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000) libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000) libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000) /lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000) I don't quite understand what is going on here, but let's not be too picky.	2018-11-29 21:03:44 +01:00
Zbigniew Jędrzejewski-Szmek	baaa35ad70	coccinelle: make use of SYNTHETIC_ERRNO Ideally, coccinelle would strip unnecessary braces too. But I do not see any option in coccinelle for this, so instead, I edited the patch text using search&replace to remove the braces. Unfortunately this is not fully automatic, in particular it didn't deal well with if-else-if-else blocks and ifdefs, so there is an increased likelihood of some bugs in such spots. I also removed part of the patch that coccinelle generated for udev, where we return -1 for failure. This should be fixed independently.	2018-11-22 10:54:38 +01:00
Yu Watanabe	93bab28895	tree-wide: use typesafe_qsort()	2018-09-19 08:02:52 +09:00
Yu Watanabe	2e4a4faea8	core/namespace: add more log messages	2018-09-18 14:31:09 +09:00
Alan Jenkins	fcac12d150	namespace: remove redundant .has_prefix=false The MountEntry's added for EMPTY_DIR work very similarly to the TMPFS ones. In both cases, .has_prefix is false. In fact, .has_prefix is false in all the MountEntry's we add except for the access mounts (READONLY etc). But EMPTY_DIR stuck out by explicitly setting .has_prefix = false. Let's remove that.	2018-09-01 17:23:01 +09:00
Alan Jenkins	4a756839e6	namespace: we always use a root_directory now We changed to always set up the new namespace in a separate directory (commit `0722b35`)	2018-09-01 17:23:01 +09:00
Alan Jenkins	ad8e66dcc4	namespace: fix mode for TemporaryFileSystem= ... when no mount options are passed. Change the code, to avoid the following failure in the newly added tests: exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c '[ "$(stat -c %a /var)" == 755 ]' ++ stat -c %a /var + '[' 1777 == 755 ']' Received SIGCHLD from PID 30364 (sh). Child 30364 (sh) died (code=exited, status=1/FAILURE) (And I spotted an opportunity to use TAKE_PTR() at the end).	2018-09-01 17:22:14 +09:00
Alan Jenkins	69338c3dfb	namespace: don't try to remount superblocks We can't remount the underlying superblocks, if we are inside a user namespace and running Linux <= 4.17. We can only change the per-mount flags (MS_REMOUNT \| MS_BIND). This type of mount() call can only change the per-mount flags, so we don't have to worry about passing the right string options now. Fixes #9914 ("Since `1beab8b` was merged, systemd has been failing to start systemd-resolved inside unprivileged containers" ... "Failed to re-mount '/run/systemd/unit-root/dev' read-only: Operation not permitted"). > It's basically my fault :-). I pointed out we could remount read-only > without MS_BIND when reviewing the PR that added TemporaryFilesystem=, > and poettering suggested to change PrivateDevices= at the same time. > I think it's safe to change back, and I don't expect anyone will notice > a difference in behaviour. > > It just surprised me to realize that > `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the > superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does. > Maybe a comment could note the kernel version (v4.18), that lets you > remount without MS_BIND inside a user namespace. This makes the code longer and I guess this function is still ugly, sorry. One obstacle to cleaning it up is the interaction between `PrivateDevices=yes` and `ReadOnlyPaths=/dev`. I've added a test for the existing behaviour, which I think is now the correct behaviour.	2018-08-30 11:17:16 +01:00
Yu Watanabe	52e4d62550	Merge pull request #9852 from poettering/namespace-errno namespace: be more careful when handling namespacing failures	2018-08-22 11:16:29 +09:00
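
The inode arithmetic quoted in commit `7d85383edb` (4MB to 1k inodes, 10% of RAM to 400k, 25% to 1M) can be checked with a short sketch. This is only an illustration of the numbers in the commit message, not code from systemd: the function name `inode_limit` and the constants are hypothetical, and the 4k-per-inode figure and the 16GB baseline are the assumptions stated in the message itself.

```python
# Assumptions taken from the commit message: each inode is budgeted at 4k,
# and ratio-based limits are computed against a 16GB RAM baseline.
BYTES_PER_INODE = 4 * 1024
MIB = 1024 ** 2
GIB = 1024 ** 3
BASELINE_RAM = 16 * GIB


def inode_limit(size_bytes: int, bytes_per_inode: int = BYTES_PER_INODE) -> int:
    """Return how many inodes fit into size_bytes at bytes_per_inode each."""
    return size_bytes // bytes_per_inode


# Hypothetical labels; the three cases mirror the ones named in the message.
limits = {
    "size=4M": inode_limit(4 * MIB),
    "size=10% (16GB baseline)": inode_limit(BASELINE_RAM // 10),
    "size=25% (16GB baseline)": inode_limit(BASELINE_RAM // 4),
}

if __name__ == "__main__":
    for label, count in limits.items():
        print(f"{label}: about {count} inodes")
```

Because the inode counts are fixed numbers derived from the 16GB baseline rather than ratios, they stay the same on a machine with far less RAM, which is the "too large" side effect the commit message mentions.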

240 commits