Systemd

Author	SHA1	Message	Date
Yu Watanabe	db9ecf0501	license: LGPL-2.1+ -> LGPL-2.1-or-later	2020-11-09 13:23:58 +09:00
Lennart Poettering	30f5d10421	mount-util: rework umount_verbose() to take log level and flags arg Let's make umount_verbose() more like mount_verbose_xyz(), i.e. take log level and flags param. In particular the latter matters, since we typically don't actually want to follow symlinks when unmounting.	2020-09-23 18:57:36 +02:00
Lennart Poettering	511a8cfe30	mount-util: switch most mount_verbose() code over to not follow symlinks	2020-09-23 18:57:36 +02:00
Zbigniew Jędrzejewski-Szmek	90e30d767a	Rename strv_split_extract() to strv_split_full() Now that _full() is gone, we can rename _extract() to have the usual suffix we use for the more featureful version.	2020-09-09 09:34:55 +02:00
Zbigniew Jędrzejewski-Szmek	b67ec8e5b2	pid1: stop limiting size of /dev/shm The explicit limit is dropped, which means that we return to the kernel default of 50% of RAM. See `362a55fc14` for a discussion why that is not as much as it seems. It turns out various applications need more space in /dev/shm and we would break them by imposing a low limit. While at it, rename the define and use a single macro for various tmpfs mounts. We don't really care what the purpose of the given tmpfs is, so it seems reasonable to use a single macro. This effectively reverts part of `7d85383edb`. Fixes #16617.	2020-07-30 18:48:35 +02:00
Lennart Poettering	d64e32c245	nspawn: rework how /run/host/ is set up Let's find the right os-release file on the host side, and only mount the one that matters, i.e. /etc/os-release if it exists and /usr/lib/os-release otherwise. Use the fixed path /run/host/os-release for that. Let's also mount /run/host as a bind mount on itself before we set up /run/host, and let's mount it MS_RDONLY after we are done, so that it remains immutable as a whole.	2020-07-23 18:47:38 +02:00
Luca Boccassi	14f1c47a0c	nspawn: mount os-release in two steps to make it read-only The kernel interface requires setting up read-only bind-mounts in two steps, the bind first and then a read-only remount. Fix nspawn-mount, and cover this case in the integration test. Fixes #16484	2020-07-16 09:59:59 +01:00
Luca Boccassi	eafc7d6056	nspawn: use access/F_OK instead of stat to check for file existence	2020-07-16 09:59:59 +01:00
Luca Boccassi	e1bb4b0d1d	nspawn: implement container host os-release interface	2020-06-23 12:58:21 +01:00
Luca Boccassi	b3b1a08a56	nspawn: use mkdir_p_safe instead of homegrown version	2020-06-23 12:57:05 +01:00
Lennart Poettering	6fe01ced0e	nspawn: mkdir selinux mount point once, but not twice Since #15533 we didn't create the mount point for selinuxfs anymore. Before it we created it twice because we mount selinuxfs twice: once the superblock, and once we remount its bind mount read-only. The second mkdir would mean we'd chown() the host version of selinuxfs (since there's only one selinuxfs superblock kernel-wide). The right time to create the mount point is once: before we mount the selinuxfs. But not a second time for the remount. Fixes: #16032	2020-06-23 10:17:36 +02:00
Lennart Poettering	48b747fa03	inaccessible: move inaccessible file nodes to /systemd/ subdir in runtime dir always Let's make sure $XDG_RUNTIME_DIR for the user instance and /run for the system instance is always organized the same way: the "inaccessible" device nodes should be placed in a subdir of either called "systemd" and a subdir of that called "inaccessible". This way we can emphasize the common behaviour, and only differ where really necessary. Follow-up for #13823	2020-06-09 16:23:56 +02:00
Topi Miettinen	7d85383edb	tree-wide: add size limits for tmpfs mounts Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and /sys/fs/cgroup since no (or very few) regular files are expected to be used. In addition, since directories, symbolic links, device specials and xattrs are not counted towards the size= limit, number of inodes is also limited correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes. Because nr_inodes option can't use ratios like size option, there's an unfortunate side effect that with small memory systems the limit may be on the too large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough for re-exec of PID1 because 16MB of free space is required.	2020-05-13 00:37:18 +02:00
Lennart Poettering	dcff2fa5d1	nspawn: be more careful with creating/chowning directories to overmount We should never re-chown selinuxfs. Fixes: #15475	2020-04-28 19:40:46 +02:00
Yu Watanabe	9610210d32	nspawn: voidify umount_verbose() Fixes CID#1415122.	2020-01-31 23:10:29 +09:00
Daan De Meyer	bbd407ea2b	nspawn: Don't mount read-only if we have a custom mount on root.	2020-01-03 14:06:38 +01:00
Anita Zhang	e5f10cafe0	core: create inaccessible nodes for users when making runtime dirs To support ProtectHome=y in a user namespace (which mounts the inaccessible nodes), the nodes need to be accessible by the user. Create these paths and devices in the user runtime directory so they can be used later if needed.	2019-12-18 11:09:30 -08:00
Lennart Poettering	d0556c55e7	nspawn: fix overlay with automatic temporary tree This makes --overlay=+/foobar::/foobar work again, i.e. where the middle parameter is left out. According to the documentation this is supposed to generate a temporary writable work place in the middle. But it apparently never did. Weird.	2019-12-13 15:11:38 +01:00
Daan De Meyer	bd6609eb11	nspawn-mount: Use FLAGS_SET to check flags.	2019-12-12 20:18:37 +01:00
Daan De Meyer	e091a5dfd1	nspawn-mount: Remove unused parameters	2019-12-12 20:15:10 +01:00
Daan De Meyer	5f0a6347ac	nspawn: Enable specifying root as the mount target directory. Fixes #3847.	2019-12-12 20:15:03 +01:00
Zbigniew Jędrzejewski-Szmek	a5648b8094	basic/fs-util: change CHASE_OPEN flag into a separate output parameter chase_symlinks() would return negative on error, and either a non-negative status or a non-negative fd when CHASE_OPEN was given. This made the interface quite complicated, because depending on the flags used, we would get two different "types" of return object. Coverity was always confused by this, and flagged every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it would think that an fd is returned). This patch uses a separate output parameter, so there is no confusion. (I think it is OK to have functions which return either an error or an fd. It's only returning either an fd or a non-fd that is confusing.)	2019-10-24 22:44:24 +09:00
Frantisek Sumsal	38288f0bb8	tree-wide: various code-formatting improvements Reported/found by Coccinelle	2019-09-22 07:17:27 +02:00
Lennart Poettering	07b9f3f03c	nspawn: print an explanatory error when people try to use --volatile=yes on distros that are not /usr-merged	2019-07-29 11:30:47 +02:00
Iago López Galeiras	a11fd4067b	Revert "nspawn: remove unnecessary mount option parsing logic" This reverts commit `72d967df3e`. Revert this because it broke the `norbind` option of the bind flags because it does bind-mounts unconditionally recursive. Let's bring the old logic back. Fixes: #13170	2019-07-24 17:17:42 +02:00
Lennart Poettering	cee97d5768	Merge pull request #12836 from yuwata/tree-wide-replace-strjoin tree-wide: replace strjoin() with path_join()	2019-06-22 20:02:46 +02:00
Lennart Poettering	c6134d3e2f	path-util: get rid of prefix_root() prefix_root() is equivalent to path_join() in almost all ways, hence let's remove it. There are subtle differences though: prefix_root() will try to shorten multiple "/" before and after the prefix. path_join() doesn't do that. This means prefix_root() might return a string shorter than both its inputs combined, while path_join() never does that. I like the path_join() semantics better, hence I think dropping prefix_root() is totally OK. In the end the strings generated by both functions should always be identical in terms of path_equal() if not streq(). This leaves prefix_roota() in place. Ideally we'd have path_joina(), but I don't think we can reasonably implement that as a macro. Or maybe we can? (If so, sounds like something for a later PR.) Also add in a few missing OOM checks	2019-06-21 08:42:55 +09:00
Yu Watanabe	657ee2d82b	tree-wide: replace strjoin() with path_join()	2019-06-21 03:26:16 +09:00
Zbigniew Jędrzejewski-Szmek	ca78ad1de9	headers: remove unneeded includes from util.h This means we need to include many more headers in various files that simply included util.h before, but it seems cleaner to do it this way.	2019-03-27 11:53:12 +01:00
Zbigniew Jędrzejewski-Szmek	e1af3bc62a	Merge pull request #12106 from poettering/nosuidns add "nosuid" flag to exec directory mounts of DynamicUser=1 services	2019-03-26 08:58:00 +01:00
Lennart Poettering	849b9b85b8	nspawn: mount mqueue with nodev,noexec,nosuid, too The host mounts it like that, nspawn hence should do too. Moreover, mount the file system after doing CLONE_NEWIPC so that it actually reflects the right mqueues. Finally, mount it without considering it fatal, since POSIX mqueue support is little used and it should be fine not to support it in the kernel.	2019-03-25 19:53:05 +01:00
Lennart Poettering	64e82c1976	mount-util: beef up bind_remount_recursive() to be able to toggle more than MS_RDONLY The function is otherwise generic enough to toggle other bind mount flags beyond MS_RDONLY (for example: MS_NOSUID or MS_NODEV), hence let's beef it up slightly to support that too.	2019-03-25 19:33:55 +01:00
Lennart Poettering	2c9b7a7e62	mount: when we fail to establish an inaccessible mount gracefully, undo the mount	2019-03-21 12:41:02 +01:00
Zbigniew Jędrzejewski-Szmek	d0b6a10c00	Merge pull request #9762 from poettering/nspawn-oci OCI runtime support for nspawn	2019-03-21 11:01:53 +01:00
Yu Watanabe	1d0c1146ea	nspawn: fix memleak Fixes oss-fuzz#13691.	2019-03-15 23:53:05 +09:00
Lennart Poettering	de40a3037a	nspawn: add support for executing OCI runtime bundles with nspawn This is a pretty large patch, and adds support for OCI runtime bundles to nspawn. A new switch --oci-bundle= is added that takes a path to an OCI bundle. The JSON file included therein is read similarly to a .nspawn settings file, however with a different feature set. Implementation-wise this mostly extends the pre-existing Settings object to carry additional properties for OCI. However, OCI supports some concepts .nspawn files did not support yet, which this patch also adds: 1. Support for "masking" files and directories. This functionality is now also available via the new --inaccessible= cmdline command, and Inaccessible= in .nspawn files. 2. Support for mounting arbitrary file systems. (not exposed through nspawn cmdline nor .nspawn files, because probably not a good idea) 3. Ability to configure the console settings for a container. This functionality is now also available on the nspawn cmdline in the new --console= switch (not added to .nspawn for now, as it is something specific to the invocation really, not a property of the container) 4. Console width/height configuration. Not exposed through .nspawn/cmdline, but this may be controlled through $COLUMNS and $LINES like in most other UNIX tools. 5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on the cmdline, since containers likely have different user tables, and the existing --user= switch appears to be the better option) 6. OCI hook commands (not exposed in .nspawn/cmdline, as very specific to OCI) 7. Creation of additional device nodes in /dev. Most likely not a good idea, hence not exposed in .nspawn/cmdline. There's already --bind= to achieve the same, which is the better alternative. 8. Explicit syscall filters. This is not a good idea, due to the skewed arch support, hence not exposed through .nspawn/cmdline. 9. Configuration of some sysctls on a whitelist. Questionable, not supported in .nspawn/cmdline for now. 10. Configuration of all 5 types of capabilities. Not a useful concept, since the kernel will reduce the caps on execve() anyway. Not exposed through .nspawn/cmdline as this is not very useful hence. Note that this only implements the OCI runtime logic itself. It does not provide a runc-compatible command line tool. This is left for a later PR. Only with that in place tools such as "buildah" can use the OCI support in nspawn as drop-in replacement. Currently still missing is OCI hook support, but it's already parsed and everything, and should be easy to add. Other than that, OCI is implemented pretty comprehensively. There's a list of incompatibilities in the nspawn-oci.c file. In a later PR I'd like to convert this into proper markdown and add it to the documentation directory.	2019-03-15 15:41:28 +01:00
Lennart Poettering	760877e90c	util: split out sorting related calls to new sort-util.[ch]	2019-03-13 12:16:43 +01:00
Zbigniew Jędrzejewski-Szmek	0e636bf51a	nspawn: fix memleak uncovered by fuzzer Also use TAKE_PTR as appropriate.	2019-03-11 14:29:30 +01:00
Lennart Poettering	6c610acaaa	nspawn: add --volatile=overlay support Fixes: #11054 #3847	2019-03-01 14:11:06 +01:00
Lennart Poettering	c55d0ae764	nspawn: fix an error path	2019-03-01 14:11:06 +01:00
Lennart Poettering	e5b43a04b6	nspawn: add volatile mode multiplexer call setup_volatile_mode() Just some refactoring, no change in behaviour.	2019-03-01 14:11:06 +01:00
Lennart Poettering	0646d3c3dd	nspawn: explicitly refuse mounts over / Previously this would fail later on, but let's filter this out at the time of parsing.	2019-03-01 14:11:06 +01:00
Lennart Poettering	e4de72876e	util-lib: split out all temporary file related calls into tmpfiles-util.c This splits out a bunch of functions from fileio.c that have to do with temporary files. Simply to make the header files a bit shorter, and to group things more nicely. No code changes, just some rearranging of source files.	2018-12-02 13:22:29 +01:00
Zbigniew Jędrzejewski-Szmek	b2ac2b01c8	Merge pull request #10996 from poettering/oci-prep Preparation for the nspawn-OCI work	2018-11-30 10:09:00 +01:00
Zbigniew Jędrzejewski-Szmek	049af8ad0c	Split out part of mount-util.c into mountpoint-util.c The idea is that anything which is related to actually manipulating mounts is in mount-util.c, but functions for mountpoint introspection are moved to the new file. Anything which requires libmount must be in mount-util.c. This was supposed to be a preparation for further changes, with no functional difference, but it results in a significant change in linkage: $ ldd build/libnss_*.so.2 (before) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff77bf5000) librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000) libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000) /lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffc19cc0000) librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000) libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000) /lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe8ef8e000) librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000) 
libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000) libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000) /lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffda6d17000) librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000) libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000) /lib64/ld-linux-x86-64.so.2 (0x00007f610b995000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000) (after) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff0b5e2000) librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000) libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000) /lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffdc30a7000) librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000) libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000) /lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe95747000) librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000) libcap.so.2 => /lib64/libcap.so.2 
(0x00007fa56a808000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000) libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000) /lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffe67b51000) librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000) libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000) libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000) /lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000) I don't quite understand what is going on here, but let's not be too picky.	2018-11-29 21:03:44 +01:00
Lennart Poettering	17c58ba97b	nspawn: let's also pre-mount /dev/mqueue	2018-11-29 20:21:40 +01:00
Zbigniew Jędrzejewski-Szmek	baaa35ad70	coccinelle: make use of SYNTHETIC_ERRNO Ideally, coccinelle would strip unnecessary braces too. But I do not see any option in coccinelle for this, so instead, I edited the patch text using search&replace to remove the braces. Unfortunately this is not fully automatic, in particular it didn't deal well with if-else-if-else blocks and ifdefs, so there is an increased likelihood of bugs in such spots. I also removed part of the patch that coccinelle generated for udev, where we return -1 for failure. This should be fixed independently.	2018-11-22 10:54:38 +01:00
Lennart Poettering	1099ceebce	nspawn: optionally don't mount a tmpfs over /tmp (#10294) nspawn: optionally, don't mount a tmpfs on /tmp Fixes: #10260	2018-10-08 18:32:03 +02:00
Yu Watanabe	93bab28895	tree-wide: use typesafe_qsort()	2018-09-19 08:02:52 +09:00
Franck Bui	03d0f4b58e	nspawn: always use mode 555 for /sys When a network namespace is needed, /sys is mounted as tmpfs (see commit `d8fc6a000f` for details). But in this case mode 755 was used as initial permissions for /sys whereas the default mode for sysfs is 555. In practice using 755 doesn't have any impact because /sys is mounted read-only too but for consistency, let's use the correct mode. Fixes: #10050	2018-09-11 00:34:00 +02:00


137 commits