Systemd

Author	SHA1	Message	Date
Topi Miettinen	7d85383edb	tree-wide: add size limits for tmpfs mounts Limit the size of various tmpfs mounts to 10% of RAM, except volatile root and /var, which get 25%. Another exception is made for /dev (also /devs for PrivateDevices) and /sys/fs/cgroup, since no (or very few) regular files are expected to be used there. In addition, since directories, symbolic links, device specials and xattrs are not counted towards the size= limit, the number of inodes is also limited correspondingly: a 4MB size translates to 1k inodes (assuming 4k each), 10% of RAM (using 16GB of RAM as the baseline) translates to 400k inodes and 25% to 1M inodes. Because the nr_inodes option can't use ratios like the size option can, there's an unfortunate side effect: on systems with little memory the inode limit may be on the large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough for a re-exec of PID1, because 16MB of free space is required.	2020-05-13 00:37:18 +02:00
Wen Yang	f74349d88b	mount-setup: change the system mount propagation to shared by default only at bootup Commit `b3ac5f8cb9` changed the system mount propagation to shared by default. According to the following patch, https://github.com/opencontainers/runc/pull/208, when a container is started, the pouch daemon calls runc to execute make-private. However, if systemctl daemon-reexec is executed after the container has been started, the system mount propagation is changed back to shared, and the make-private operation above has no chance to execute again.	2020-04-09 10:14:20 +02:00
Topi Miettinen	3b5b6826aa	mount-setup: make /dev noexec /dev used to be mounted with the "exec" flag because of the /dev/MAKEDEV script, but that's history and the script is now located in /sbin. mmap() with a file descriptor to "/dev/zero" (instead of the modern mmap(,,,MAP_ANON...)) will still work.	2020-03-09 19:08:42 +01:00
Anita Zhang	e5f10cafe0	core: create inaccessible nodes for users when making runtime dirs To support ProtectHome=y in a user namespace (which mounts the inaccessible nodes), the nodes need to be accessible by the user. Create these paths and devices in the user runtime directory so they can be used later if needed.	2019-12-18 11:09:30 -08:00
Yu Watanabe	f5947a5e92	tree-wide: drop missing.h	2019-10-31 17:57:03 +09:00
Zbigniew Jędrzejewski-Szmek	86e94d95d0	Merge pull request #13246 from keszybz/add-SystemdOptions-efi-variable Add efi variable to augment /proc/cmdline	2019-10-03 12:19:44 +02:00
Zbigniew Jędrzejewski-Szmek	90b059b608	pid1: do not warn if /run/systemd/relabel-extra.d/ doesn't exist After all, that is the expected state.	2019-09-19 18:01:40 +02:00
Zbigniew Jędrzejewski-Szmek	0bb2f0f10e	util-lib: split shared/efivars into basic/efivars and shared/efi-loader I want to use efivars.[ch] in proc-cmdline.c, but most of the efivars stuff is not needed in basic/. Move the file from shared/ to basic/, but then move back most of the higher-level functions to the new shared/efi-loader.c file.	2019-09-16 18:08:53 +02:00
Zbigniew Jędrzejewski-Szmek	fdb3decaa7	util-lib: move some functions from basic/cgroup-util to shared/cgroup-setup This way less stuff needs to be in basic. Initially, I wanted to move all the parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because efivars.[ch] is in shared/. Later on, I decided to split efivars.[ch], so the move done in this patch is not necessary anymore. Nevertheless, it is still valid on its own. If at some point we want to expose libbasic, it is better not to have stuff that belongs in libshared there.	2019-09-16 18:08:00 +02:00
Yu Watanabe	f39fc2d88b	Merge pull request #13354 from keszybz/two-refactoring-patches Two or more refactoring patches	2019-09-16 21:24:13 +09:00
Zbigniew Jędrzejewski-Szmek	36b12282e1	basic/conf-files: make conf_files_list() take just a single directory This function had two users (apart from tests), and both only used one argument. And it seems likely that if we need to pass more directories, either the _nulstr() or the _strv() form would be used. Let's simplify the code.	2019-09-16 09:15:05 +02:00
Zbigniew Jędrzejewski-Szmek	48da02ec6f	core/mount-setup: use conf_files_list_strv() for relabel-extra.d/	2019-09-16 09:15:05 +02:00
Benjamin Gilbert	71de68476c	mount-setup: relabel items mentioned directly in relabel-extra.d relabel_extra() relabels the descendants of directories listed in relabel-extra.d, but doesn't relabel the files or directories explicitly named there. This makes it impossible to use relabel-extra.d to relabel the root of a filesystem. Fix by relabeling the named items too.	2019-09-16 09:04:22 +02:00
Lennart Poettering	b910cc72c0	tree-wide: get rid of strappend() It's a special case of strjoin(), so there's no need to keep both, in particular as typing strjoin() is even shorter than strappend().	2019-07-12 14:31:12 +09:00
Lennart Poettering	d8b4d14df4	util: split out nulstr related stuff to nulstr-util.[ch]	2019-03-14 13:25:52 +01:00
Lennart Poettering	70a74ec645	mount-setup: don't consider it reason to fail if we can't relabel cgroupfs We usually don't care much about relabel failures, let's not do that here either.	2018-12-12 20:46:07 +01:00
Lennart Poettering	c4217b43d1	mount-setup: use FOREACH_STRING where appropriate	2018-12-12 20:46:07 +01:00
Lennart Poettering	65e183d789	mount-setup: optionally, relabel a configured set of files/dirs after loading policy Fixes: #10466	2018-12-12 20:46:07 +01:00
Zbigniew Jędrzejewski-Szmek	b2ac2b01c8	Merge pull request #10996 from poettering/oci-prep Preparation for the nspawn-OCI work	2018-11-30 10:09:00 +01:00
Zbigniew Jędrzejewski-Szmek	049af8ad0c	Split out part of mount-util.c into mountpoint-util.c The idea is that anything which is related to actually manipulating mounts is in mount-util.c, but functions for mountpoint introspection are moved to the new file. Anything which requires libmount must be in mount-util.c. This was supposed to be a preparation for further changes, with no functional difference, but it results in a significant change in linkage: $ ldd build/libnss_*.so.2 (before) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff77bf5000) librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000) libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000) /lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffc19cc0000) librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000) libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000) /lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe8ef8e000) librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000) libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000) libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000) /lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffda6d17000) librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000) libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000) /lib64/ld-linux-x86-64.so.2 (0x00007f610b995000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000) (after) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff0b5e2000) librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000) libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000) /lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffdc30a7000) librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000) libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000) /lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe95747000) librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fa56a808000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000) libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000) /lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffe67b51000) librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000) libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000) libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000) /lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000) I don't quite understand what is going on here, but let's not be too picky.	2018-11-29 21:03:44 +01:00
Lennart Poettering	30874dda3a	dev-setup: generalize logic we use to create "inaccessible" device nodes Let's generalize this, so that we can use this in nspawn later on, which is pretty useful as we need to be able to mask files from the inner child of nspawn too, where the host's /run/systemd/inaccessible directory is not visible anymore. Moreover, if nspawn can create these nodes on its own before the payload this means the payload can run with fewer privileges.	2018-11-29 20:21:40 +01:00
Lennart Poettering	143fadf369	core: remove JoinControllers= configuration setting This removes the ability to configure which cgroup controllers to mount together. Instead, we'll now hardcode that "cpu" and "cpuacct" are mounted together, as well as "net_cls" and "net_prio". The concept of mounting controllers together has no future, as it does not exist in cgroupsv2. Moreover, the current logic is systematically broken, as revealed by the discussions in #10507. Also, we surveyed Red Hat customers and couldn't find a single user of the concept (which isn't particularly surprising, as it is broken...) This simplifies the (already way too complex) cgroup handling for us, since whenever we make a change to a cgroup for one controller, we now know which other controllers it applies to.	2018-11-16 14:54:13 +01:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and the git repository, much more accurate and fine-grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. Hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Lennart Poettering	ef31828d06	tree-wide: unify how we define bit mask enums Let's always write "1 << 0", "1 << 1" and so on, except where we need more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force 64bit values.	2018-06-12 21:44:00 +02:00
Zbigniew Jędrzejewski-Szmek	6978efcffb	core/mount-setup: remove part of check which is always true `f1470e424b` removed one check, but missed a similar one a few lines down. CID #1390949.	2018-05-14 08:50:00 +02:00
Zbigniew Jędrzejewski-Szmek	f1470e424b	core/mount-setup: remove part of check which is always true k was set to join_controllers earlier and is only incremented, so it cannot be NULL at this point. CID #1390949.	2018-05-10 02:03:23 +02:00
Lennart Poettering	fe80fcc7e8	mount-setup: add a comment that the character/block device nodes are "optional" (#8893) If we lack privs to create device nodes that's fine, and creating /run/systemd/inaccessible/chr or /run/systemd/inaccessible/blk won't work then. Document this in longer comments. Fixes: #4484	2018-05-03 23:10:35 +09:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc.) are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Lennart Poettering	771b7ead84	machine-image,mount-setup: minor coding style fixes	2018-03-28 22:04:58 +02:00
Krzysztof Nowicki	6f7729c176	core: don't remount /sys/fs/cgroup for relabel if not needed (#8595) The initial fix for relabelling the cgroup filesystem for SELinux, delivered in commit `8739f23e3`, was based on the assumption that the cgroup filesystem is already populated once mount_setup() is executed, which was true for my system. What I wasn't aware of is that this is the case only when another instance of systemd was running before this one, which can happen if systemd is used in the initrd (e.g. by dracut). In case of a clean systemd start-up, the cgroup filesystem is actually populated after mount_setup() and does not need relabelling, as at that moment the SELinux policy is already loaded. However, since the root cgroup filesystem was remounted read-only in the meantime, this operation will now fail. To fix this, check the filesystem mount flags before relabelling, only remount ro->rw->ro if necessary, and leave the filesystem read-write otherwise. Fixes #7901.	2018-03-28 13:36:33 +02:00
Lennart Poettering	08c849815c	label: rework label_fix() implementations (#8583) This reworks the SELinux and SMACK label fixing calls in a number of ways: 1. The two separate boolean arguments of these functions are converted into a flags type LabelFixFlags. 2. The operations are now implemented based on O_PATH. This should resolve TOCTOU races between determining the label for the file system object and applying it, as it allows us to pin the object while we are operating on it. 3. When changing a label fails, we'll query the label previously set, and if it matches what we want to set anyway, we'll suppress the error. Also, all calls to label_fix() are now (void)ified, when we ignore the return values. Fixes: #8566	2018-03-27 07:38:26 +02:00
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Yu Watanabe	5cbaad2f67	core: do not free heap-allocated strings (#8391) Fixes #8387.	2018-03-08 14:21:54 +01:00
Lennart Poettering	39f305a901	mount-setup: change bpf mount mode to 0700 (#8334) After discussing with the kernel folks, we agreed to default to 0700 for this. Better safe than sorry.	2018-03-02 12:55:24 +01:00
Lennart Poettering	6590080851	mount-setup: always use the same source as fstype for the API VFS we mount So far, for all our API VFS mounts we used the fstype also as mount source, let's do that for the cgroupsv2 mounts too. The kernel doesn't really care about the source for API VFS, but it's visible to the user, hence let's clean this up and follow the rule we otherwise follow.	2018-02-21 16:43:36 +01:00
Lennart Poettering	43b7f24b5e	bpf: mount bpffs by default on boot We make heavy use of BPF functionality these days, hence expose the BPF file system too by default now. (Note, however, that we don't actually make use of bpf file system objects yet, but we might later on.)	2018-02-21 16:43:36 +01:00
Zbigniew Jędrzejewski-Szmek	56c8d7444a	pid1: do not initialize join_controllers by default We're moving towards unified cgroup hierarchy where this is not necessary. This makes main.c a bit simpler.	2018-02-19 15:18:54 +01:00
Lennart Poettering	713a88757a	mount-setup: fix MNT_CHECK_WRITABLE error handling, and log about the issue Let's correct the error handling (the error is in errno, not r), and let's add logging like the rest of the function has it.	2017-12-15 20:52:28 +01:00
Krzysztof Nowicki	8739f23e3c	Fix SELinux labels in cgroup filesystem root directory (#7496) When using SELinux with legacy cgroups, the tmpfs on /sys/fs/cgroup is by default labelled as tmpfs_t. This label is also inherited by the "cpu" and "cpuacct" symbolic links. Unfortunately, the policy expects them to be labelled as cgroup_t, which is used for all the actual cgroup filesystems. Failure to do so results in a stream of denials. This state cannot be fixed reliably when the cgroup filesystem structure is set up, as the SELinux policy is not yet loaded at this moment. It also cannot be fixed later, as the root of the cgroup filesystem is remounted read-only. To fix it, the root of the cgroup filesystem needs to be temporarily remounted read-write, relabelled and remounted back read-only.	2017-11-30 11:59:29 +01:00
Christian Brauner	1ff654e28b	core: remove empty cgroups (#7457) When we skip an unwritable cgroup also remove the empty mountpoint.	2017-11-24 21:05:16 +01:00
Christian Brauner	2d56b80a18	cgroup: test whether pure unified hierarchy is writable If it is not writable we should not mount it.	2017-11-22 17:35:21 +01:00
Christian Brauner	e07aefbd67	cgroup: check whether unified hierarchy is writable When systemd is running inside a container employing user namespaces it currently mounts the unified cgroup hierarchy without being able to write to it. This causes systemd to freeze during boot. This patch checks whether the unified cgroup hierarchy is writable. If it is not it will not mount it. This solution is based on a patch by Evgeny Vereshchagin. Closes #6408. Closes https://github.com/lxc/lxc/issues/1678.	2017-11-22 17:34:25 +01:00
Lennart Poettering	6925a0de4e	cgroup-util: move Set* allocation into cg_kernel_controllers() Previously, callers had to do this on their own. Let's make the call do that instead, making the caller code a bit shorter.	2017-11-21 11:54:08 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek	f9fa32f09c	build-sys: s/HAVE_SMACK/ENABLE_SMACK/ Same justification as for HAVE_UTMP.	2017-10-04 12:09:50 +02:00
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that if the name is misspelt, cpp will warn us. $ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]+)\)/\1/g' $ git grep -Ee 'if.defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]+)\)/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
vliaskov	6c24adfd46	Revert "mount-setup: mount xenfs filesystem (#6491)" (#6662) This reverts commit `b305bd3aab`.	2017-08-28 18:46:01 +02:00
vliaskov	b305bd3aab	mount-setup: mount xenfs filesystem (#6491)	2017-07-31 15:59:02 +02:00
Tejun Heo	4095205ecc	core: support "nsdelegate" cgroup v2 mount option (#6294) cgroup namespace wasn't useful for delegation because it allowed resource control interface files (e.g. memory.high) to be written from inside the namespace; this allowed the namespace parent's resource distribution to be disturbed by its namespace-scoped children. A new mount option, "nsdelegate", was added to cgroup v2 to address this issue. The flag is meaningful only when mounting cgroup v2 in the init namespace and makes a cgroup namespace a delegation boundary. The kernel feature is pending for v4.13. This should have been the default behavior on cgroup namespaces, and this commit makes systemd try "nsdelegate" first when mounting cgroup v2 and fall back if the option is not supported. Note that this risks breaking usages which depend on modifying the parent's resource settings from the namespace root; that isn't a valid thing to do, but such usages may still exist.	2017-07-14 19:27:13 +02:00

135 commits
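The inode figures in 7d85383edb (tmpfs size limits) follow from dividing the byte limit by an assumed 4k per file. Below is a minimal sketch of that arithmetic. It assumes the commit's 16GB baseline. The helper names `tmpfs_size_for_percent()` and `tmpfs_nr_inodes_for_size()` are hypothetical and are not systemd functions.

```c
#include <stdint.h>

/* Average bytes per regular file assumed by the commit message (4k). */
#define ASSUMED_BYTES_PER_FILE UINT64_C(4096)

/* Bytes a tmpfs may use when limited to `percent` of `ram_bytes` (like size=N%). */
static uint64_t tmpfs_size_for_percent(uint64_t ram_bytes, unsigned percent) {
        return ram_bytes * percent / 100;
}

/* nr_inodes= value matching a byte limit; needed because nr_inodes= cannot be a ratio. */
static uint64_t tmpfs_nr_inodes_for_size(uint64_t size_bytes) {
        return size_bytes / ASSUMED_BYTES_PER_FILE;
}
```

With a 16 GiB baseline, 10% gives 419430 inodes (the "400k") and 25% gives 1048576 (the "1M"). A 4 MiB limit gives 1024. Because nr_inodes= is absolute, the inode count stays the same on a 1 GiB machine. There, 10% of RAM is only about 100 MiB, which is the "too large side" effect the commit mentions.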
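The change in f74349d88b amounts to making the root mount tree shared only on the first start of PID 1, and not on every `systemctl daemon-reexec`. Below is a sketch that assumes a hypothetical `first_boot` flag supplied by the caller. How systemd actually tells the two cases apart is not shown. The mount(2) call is the recursive "make shared" operation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h>

/* Make the whole mount tree shared, but only on the initial boot.
 * `first_boot` is a hypothetical parameter: on a re-exec it is false, so a
 * make-private done by a container runtime in the meantime is preserved.
 * Returns 0 or -errno; the mount(2) call needs CAP_SYS_ADMIN. */
static int setup_root_propagation(bool first_boot) {
        if (!first_boot)
                return 0;

        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0)
                return -errno;

        return 0;
}
```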
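b910cc72c0 treats strappend(a, b) as the two-argument case of strjoin(). Below is a sketch of a NULL-terminated variadic join that illustrates this relationship. `strjoin_sketch()` and `strappend_sketch()` are hypothetical names, not systemd's definitions.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate a NULL-terminated list of strings into a newly malloc()ed buffer.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *strjoin_sketch(const char *first, ...) {
        va_list ap;
        size_t len = 0;

        /* First pass: total length. */
        va_start(ap, first);
        for (const char *s = first; s; s = va_arg(ap, const char *))
                len += strlen(s);
        va_end(ap);

        char *buf = malloc(len + 1);
        if (!buf)
                return NULL;

        /* Second pass: copy the pieces. */
        char *p = buf;
        va_start(ap, first);
        for (const char *s = first; s; s = va_arg(ap, const char *)) {
                size_t n = strlen(s);
                memcpy(p, s, n);
                p += n;
        }
        va_end(ap);
        *p = '\0';
        return buf;
}

/* strappend(a, b) is then just the two-argument case. */
#define strappend_sketch(a, b) strjoin_sketch((a), (b), (const char *) NULL)
```

The sentinel is written as `(const char *) NULL` because a bare `NULL` in a variadic argument list is not guaranteed to have pointer type.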
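Below is the convention from ef31828d06, shown with made-up flag names. The sketch uses the standard `UINT64_C()` macro from <stdint.h> to produce the 64-bit one.

```c
#include <stdint.h>

/* Up to 31 flags: shift a plain int 1. Flag names are hypothetical. */
typedef enum ExampleFlags {
        EXAMPLE_READ_ONLY = 1 << 0,
        EXAMPLE_NO_EXEC   = 1 << 1,
        EXAMPLE_NO_SUID   = 1 << 2,
} ExampleFlags;

/* Past bit 30, shift a 64-bit one instead, since 1 << 31 and above would
 * overflow int. Enumerators outside the int range rely on a GCC/Clang
 * extension before C23. */
typedef enum ExampleWideFlags {
        EXAMPLE_WIDE_LOW  = UINT64_C(1) << 0,
        EXAMPLE_WIDE_HIGH = UINT64_C(1) << 40,
} ExampleWideFlags;
```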
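6f7729c176 checks the mount flags before relabelling and does the ro->rw->ro sequence only when needed. One way to read those flags is statvfs(). The helper name below is hypothetical. The remount and relabel steps themselves are left out; the remount would use mount(2) with MS_REMOUNT, which needs privileges.

```c
#include <errno.h>
#include <sys/statvfs.h>

/* Hypothetical helper: 1 if the file system containing `path` is mounted
 * read-only (so relabelling needs a temporary rw remount), 0 if it is
 * writable, or a negative errno value if statvfs() fails. */
static int mount_is_read_only(const char *path) {
        struct statvfs sv;

        if (statvfs(path, &sv) < 0)
                return -errno;

        return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```

A caller would relabel directly when the helper returns 0. Only when it returns 1 would the caller bracket the relabel with a read-write remount before it and a read-only remount after it.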
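08c849815c bases label fixing on O_PATH, so the object is resolved once and then pinned. Below is a sketch of that pattern with hypothetical helper names. Routing path-based calls through /proc/self/fd/N is a common way to act on an O_PATH descriptor. It is an assumption here, not a description of label_fix() itself.

```c
#define _GNU_SOURCE   /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Pin the file system object at `path` without following a trailing symlink.
 * The O_PATH descriptor keeps referring to the same inode even if `path` is
 * renamed or replaced afterwards. Returns the fd or -errno. */
static int pin_path(const char *path) {
        int fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
        return fd < 0 ? -errno : fd;
}

/* Build "/proc/self/fd/N", a path that resolves to the pinned inode and can be
 * handed to path-based calls that do not accept O_PATH descriptors.
 * Returns 0, or -ENAMETOOLONG if `buf` is too small. */
static int fd_to_proc_path(int fd, char *buf, size_t size) {
        int n = snprintf(buf, size, "/proc/self/fd/%d", fd);
        return (n < 0 || (size_t) n >= size) ? -ENAMETOOLONG : 0;
}
```

Checks such as fstat() can run directly on the pinned descriptor. Any later operation on the /proc path then hits the same inode that was inspected.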
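Below is a sketch of the idea behind the TAKE_PTR() macro from ae2a15bc14. It is written with the GNU C statement-expression and `__typeof__` extensions, which GCC and Clang support. `TAKE_PTR_SKETCH`, `cleanup_free_sketch()` and `make_greeting()` are hypothetical names. The cleanup attribute shows why taking the pointer matters when a variable is freed automatically on scope exit.

```c
#include <stdlib.h>
#include <string.h>

/* Read a pointer, reset the variable to NULL, and yield the old value:
 * ownership moves to whoever receives the result. */
#define TAKE_PTR_SKETCH(ptr)                    \
        ({                                      \
                __typeof__(ptr) _p_ = (ptr);    \
                (ptr) = NULL;                   \
                _p_;                            \
        })

/* Cleanup handler: receives the address of the annotated variable. */
static void cleanup_free_sketch(void *p) {
        free(*(void **) p);
}

static char *make_greeting(void) {
        __attribute__((cleanup(cleanup_free_sketch))) char *buf = malloc(6);
        if (!buf)
                return NULL;
        memcpy(buf, "hello", 6);
        /* Without TAKE_PTR the cleanup handler would free buf on return;
         * with it, the handler sees NULL and the caller owns the memory. */
        return TAKE_PTR_SKETCH(buf);
}
```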
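Taken together, 6590080851, 43b7f24b5e and 39f305a901 describe API file systems mounted with the fstype doubling as the mount source, including bpffs with mode 0700. Below is a table-driven sketch of that convention. The struct, the entries and the MS_* flags chosen are assumptions for illustration, not copied from mount-setup.c.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical table entry for an API file system. */
typedef struct ApiVfs {
        const char *type;     /* file system type, also passed as the mount source */
        const char *where;
        const char *options;  /* data argument for mount(2); NULL for none */
        unsigned long flags;
} ApiVfs;

#define API_VFS_FLAGS (MS_NOSUID | MS_NOEXEC | MS_NODEV)

static const ApiVfs api_vfs_table[] = {
        { "proc",    "/proc",          NULL,       API_VFS_FLAGS },
        { "sysfs",   "/sys",           NULL,       API_VFS_FLAGS },
        { "cgroup2", "/sys/fs/cgroup", NULL,       API_VFS_FLAGS },
        { "bpf",     "/sys/fs/bpf",    "mode=700", API_VFS_FLAGS },
};

static const ApiVfs *api_vfs_find(const char *type) {
        for (size_t i = 0; i < sizeof api_vfs_table / sizeof api_vfs_table[0]; i++)
                if (strcmp(api_vfs_table[i].type, type) == 0)
                        return &api_vfs_table[i];
        return NULL;
}

/* Source and fstype are the same string. Per the commit, the kernel does not
 * care about the source, but it is visible to users. Needs CAP_SYS_ADMIN.
 * Returns 0 or -errno. */
static int api_vfs_mount(const ApiVfs *e) {
        if (mount(e->type, e->where, e->type, e->flags, e->options) < 0)
                return -errno;
        return 0;
}
```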
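Several commits above touch on this step:

- 2d56b80a18 and e07aefbd67 skip a cgroup hierarchy that is not writable.
- 1ff654e28b removes the then-empty mountpoint.
- 713a88757a points out that the error is in errno.

Below is a simplified sketch that combines these under stated assumptions. It checks a directory with access(W_OK) and omits the unmount step that would precede rmdir() in a real setup. It logs with fprintf() instead of systemd's logging helpers. The helper name is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `where` is writable, 0 if it is not (after
 * trying to remove it, which only works for an empty directory), or -errno for
 * other failures. */
static int check_mountpoint_writable(const char *where) {
        if (access(where, W_OK) == 0)
                return 1;

        /* access() signals failure with -1 and puts the reason into errno;
         * save it before any other libc call can overwrite it. */
        int err = errno;

        if (err != EACCES && err != EPERM && err != EROFS) {
                fprintf(stderr, "Failed to check whether %s is writable: %s\n", where, strerror(err));
                return -err;
        }

        fprintf(stderr, "%s is not writable, skipping it: %s\n", where, strerror(err));
        (void) rmdir(where);   /* succeeds only if the directory is empty */
        return 0;
}
```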
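Here is the reasoning behind 349cc4a507. Meson's `conf.set10()` defines every feature macro to 0 or 1, so `#ifdef` would treat a disabled feature as present, and the code has to use `#if`. An undefined (e.g. misspelt) identifier in `#if` silently evaluates to 0. GCC and Clang warn about that when `-Wundef` is enabled. The macro names below are hypothetical.

```c
/* Stand-ins for a generated config.h: with conf.set10() every feature macro
 * is always defined, to either 1 or 0. */
#define HAVE_EXAMPLE_FEATURE 1
#define ENABLE_EXAMPLE_OPTION 0

static int example_feature_enabled(void) {
#if HAVE_EXAMPLE_FEATURE
        return 1;
#else
        return 0;
#endif
}

static int example_option_enabled(void) {
        /* `#ifdef ENABLE_EXAMPLE_OPTION` would be true here even though the
         * option is off, because the macro is defined (to 0). */
#if ! ENABLE_EXAMPLE_OPTION
        return 0;
#else
        return 1;
#endif
}
```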
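4095205ecc tries "nsdelegate" first and falls back when the kernel does not support it. Below is a sketch of that fallback. It uses an injectable mount function, so the logic can be exercised without privileges. `mount_fn`, `mount_cgroup2()` and the fake kernel function are hypothetical. Treating EINVAL as "option not supported" is an assumption.

```c
#include <errno.h>
#include <string.h>
#include <sys/mount.h>

/* Same shape as mount(2), so the real mount can be passed in. */
typedef int (*mount_fn)(const char *source, const char *target, const char *fstype,
                        unsigned long flags, const void *data);

/* Mount cgroup2 at `where`, preferring "nsdelegate" and retrying without it
 * when the option is rejected. Returns 0 or -errno. */
static int mount_cgroup2(mount_fn do_mount, const char *where) {
        const unsigned long flags = MS_NOSUID | MS_NOEXEC | MS_NODEV;

        if (do_mount("cgroup2", where, "cgroup2", flags, "nsdelegate") == 0)
                return 0;
        if (errno != EINVAL)
                return -errno;

        if (do_mount("cgroup2", where, "cgroup2", flags, NULL) == 0)
                return 0;
        return -errno;
}

/* Fake kernel without nsdelegate support: rejects that option, accepts the rest. */
static int fake_old_kernel_mount(const char *source, const char *target, const char *fstype,
                                 unsigned long flags, const void *data) {
        (void) source; (void) target; (void) fstype; (void) flags;
        if (data && strcmp((const char *) data, "nsdelegate") == 0) {
                errno = EINVAL;
                return -1;
        }
        return 0;
}
```

On a real system, the first argument would be mount itself, e.g. `mount_cgroup2(mount, "/sys/fs/cgroup")`.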