Systemd

Author	SHA1	Message	Date
Topi Miettinen	7d85383edb	tree-wide: add size limits for tmpfs mounts Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and /sys/fs/cgroup since no (or very few) regular files are expected to be used. In addition, since directories, symbolic links, device specials and xattrs are not counted towards the size= limit, number of inodes is also limited correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes. Because nr_inodes option can't use ratios like size option, there's an unfortunate side effect that with small memory systems the limit may be on the too large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough for re-exec of PID1 because 16MB of free space is required.	2020-05-13 00:37:18 +02:00
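The inode figures quoted in the commit message above follow from simple division. A minimal sketch of that arithmetic, assuming one inode per 4 KiB of tmpfs size as the message states; the helper names `ram_fraction()` and `inode_budget()` are hypothetical and do not come from the systemd tree:

```c
#include <stdint.h>

/* Assumption taken from the commit message: budget one inode per 4 KiB. */
#define BYTES_PER_INODE UINT64_C(4096)

/* Hypothetical helper: `percent` percent of total RAM, in bytes. */
static uint64_t ram_fraction(uint64_t ram_bytes, unsigned percent) {
        return ram_bytes * percent / 100;
}

/* Hypothetical helper: inode budget matching a tmpfs size limit. */
static uint64_t inode_budget(uint64_t size_bytes) {
        return size_bytes / BYTES_PER_INODE;
}
```

With a 16 GiB baseline, 10% of RAM gives roughly 400k inodes and 25% gives 1M, matching the numbers in the message. Because nr_inodes takes an absolute count rather than a ratio, these values stay fixed even on machines with far less RAM, which is the side effect the message mentions.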
Lennart Poettering	dcff2fa5d1	nspawn: be more careful with creating/chowning directories to overmount We should never re-chown selinuxfs. Fixes: #15475	2020-04-28 19:40:46 +02:00
Yu Watanabe	9610210d32	nspawn: voidify umount_verbose() Fixes CID#1415122.	2020-01-31 23:10:29 +09:00
Daan De Meyer	bbd407ea2b	nspawn: Don't mount read-only if we have a custom mount on root.	2020-01-03 14:06:38 +01:00
Anita Zhang	e5f10cafe0	core: create inaccessible nodes for users when making runtime dirs To support ProtectHome=y in a user namespace (which mounts the inaccessible nodes), the nodes need to be accessible by the user. Create these paths and devices in the user runtime directory so they can be used later if needed.	2019-12-18 11:09:30 -08:00
Lennart Poettering	d0556c55e7	nspawn: fix overlay with automatic temporary tree This makes --overlay=+/foobar::/foobar work again, i.e. where the middle parameter is left out. According to the documentation this is supposed to generate a temporary writable work place in the middle. But it apparently never did. Weird.	2019-12-13 15:11:38 +01:00
Daan De Meyer	bd6609eb11	nspawn-mount: Use FLAGS_SET to check flags.	2019-12-12 20:18:37 +01:00
Daan De Meyer	e091a5dfd1	nspawn-mount: Remove unused parameters	2019-12-12 20:15:10 +01:00
Daan De Meyer	5f0a6347ac	nspawn: Enable specifying root as the mount target directory. Fixes #3847.	2019-12-12 20:15:03 +01:00
Zbigniew Jędrzejewski-Szmek	a5648b8094	basic/fs-util: change CHASE_OPEN flag into a separate output parameter chase_symlinks() would return negative on error, and either a non-negative status or a non-negative fd when CHASE_OPEN was given. This made the interface quite complicated, because depending on the flags used, we would get two different "types" of return object. Coverity was always confused by this, and flagged every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it would think that an fd is returned). This patch uses a separate output parameter, so there is no confusion. (I think it is OK to have functions which return either an error or an fd. It's only returning either an fd or a non-fd that is confusing.)	2019-10-24 22:44:24 +09:00
Frantisek Sumsal	38288f0bb8	tree-wide: various code-formatting improvements Reported/found by Coccinelle	2019-09-22 07:17:27 +02:00
Lennart Poettering	07b9f3f03c	nspawn: print an explanatory error when people try to use --volatile=yes on distros that are not /usr-merged	2019-07-29 11:30:47 +02:00
Iago López Galeiras	a11fd4067b	Revert "nspawn: remove unnecessary mount option parsing logic" This reverts commit `72d967df3e`. Revert this because it broke the `norbind` option of the bind flags because it does bind-mounts unconditionally recursive. Let's bring the old logic back. Fixes: #13170	2019-07-24 17:17:42 +02:00
Lennart Poettering	cee97d5768	Merge pull request #12836 from yuwata/tree-wide-replace-strjoin tree-wide: replace strjoin() with path_join()	2019-06-22 20:02:46 +02:00
Lennart Poettering	c6134d3e2f	path-util: get rid of prefix_root() prefix_root() is equivalent to path_join() in almost all ways, hence let's remove it. There are subtle differences though: prefix_root() will try to shorten multiple "/" before and after the prefix. path_join() doesn't do that. This means prefix_root() might return a string shorter than both its inputs combined, while path_join() never does that. I like the path_join() semantics better, hence I think dropping prefix_root() is totally OK. In the end the strings generated by both functions should always be identical in terms of path_equal() if not streq(). This leaves prefix_roota() in place. Ideally we'd have path_joina(), but I don't think we can reasonably implement that as a macro. Or maybe we can? (If so, sounds like something for a later PR.) Also add in a few missing OOM checks	2019-06-21 08:42:55 +09:00
Yu Watanabe	657ee2d82b	tree-wide: replace strjoin() with path_join()	2019-06-21 03:26:16 +09:00
Zbigniew Jędrzejewski-Szmek	ca78ad1de9	headers: remove unneeded includes from util.h This means we need to include many more headers in various files that simply included util.h before, but it seems cleaner to do it this way.	2019-03-27 11:53:12 +01:00
Zbigniew Jędrzejewski-Szmek	e1af3bc62a	Merge pull request #12106 from poettering/nosuidns add "nosuid" flag to exec directory mounts of DynamicUser=1 services	2019-03-26 08:58:00 +01:00
Lennart Poettering	849b9b85b8	nspawn: mount mqueue with nodev,noexec,nosuid, too The host mounts it like that, nspawn hence should do too. Moreover, mount the file system after doing CLONE_NEWIPC so that it actually reflects the right mqueues. Finally, mount it without considering it fatal, since POSIX mqueue support is little used and it should be fine not to support it in the kernel.	2019-03-25 19:53:05 +01:00
Lennart Poettering	64e82c1976	mount-util: beef up bind_remount_recursive() to be able to toggle more than MS_RDONLY The function is otherwise generic enough to toggle other bind mount flags beyond MS_RDONLY (for example: MS_NOSUID or MS_NODEV), hence let's beef it up slightly to support that too.	2019-03-25 19:33:55 +01:00
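Toggling an arbitrary subset of mount flags, as the message above describes, comes down to a mask operation. A minimal sketch, assuming the flags to set and the mask of flags to touch are passed separately; `apply_flag_mask()` is a hypothetical helper and not the body of bind_remount_recursive():

```c
#include <sys/mount.h>

/* Hypothetical helper: keep every flag outside flags_mask as it is, and
 * take the flags inside flags_mask from new_flags. */
static unsigned long apply_flag_mask(unsigned long current,
                                     unsigned long new_flags,
                                     unsigned long flags_mask) {
        return (current & ~flags_mask) | (new_flags & flags_mask);
}
```

Passing MS_RDONLY as both `new_flags` and `flags_mask` gives the old read-only-only behaviour; adding MS_NOSUID or MS_NODEV to both extends it to those flags without disturbing anything else set on the mount.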
Lennart Poettering	2c9b7a7e62	mount: when we fail to establish an inaccessible mount gracefully, undo the mount	2019-03-21 12:41:02 +01:00
Zbigniew Jędrzejewski-Szmek	d0b6a10c00	Merge pull request #9762 from poettering/nspawn-oci OCI runtime support for nspawn	2019-03-21 11:01:53 +01:00
Yu Watanabe	1d0c1146ea	nspawn: fix memleak Fixes oss-fuzz#13691.	2019-03-15 23:53:05 +09:00
Lennart Poettering	de40a3037a	nspawn: add support for executing OCI runtime bundles with nspawn This is a pretty large patch, and adds support for OCI runtime bundles to nspawn. A new switch --oci-bundle= is added that takes a path to an OCI bundle. The JSON file included therein is read similar to a .nspawn settings file, however with a different feature set. Implementation-wise this mostly extends the pre-existing Settings object to carry additional properties for OCI. However, OCI supports some concepts .nspawn files did not support yet, which this patch also adds: 1. Support for "masking" files and directories. This functionality is now also available via the new --inaccessible= cmdline switch, and Inaccessible= in .nspawn files. 2. Support for mounting arbitrary file systems. (not exposed through nspawn cmdline nor .nspawn files, because probably not a good idea) 3. Ability to configure the console settings for a container. This functionality is now also available on the nspawn cmdline in the new --console= switch (not added to .nspawn for now, as it is something specific to the invocation really, not a property of the container) 4. Console width/height configuration. Not exposed through .nspawn/cmdline, but this may be controlled through $COLUMNS and $LINES like in most other UNIX tools. 5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on the cmdline, since containers likely have different user tables, and the existing --user= switch appears to be the better option) 6. OCI hook commands (not exposed in .nspawn/cmdline, as very specific to OCI) 7. Creation of additional device nodes in /dev. Most likely not a good idea, hence not exposed in .nspawn/cmdline. There's already --bind= to achieve the same, which is the better alternative. 8. Explicit syscall filters. This is not a good idea, due to the skewed arch support, hence not exposed through .nspawn/cmdline. 9. Configuration of some sysctls on a whitelist. Questionable, not supported in .nspawn/cmdline for now. 10. Configuration of all 5 types of capabilities. Not a useful concept, since the kernel will reduce the caps on execve() anyway. Not exposed through .nspawn/cmdline as this is not very useful hence. Note that this only implements the OCI runtime logic itself. It does not provide a runc-compatible command line tool. This is left for a later PR. Only with that in place tools such as "buildah" can use the OCI support in nspawn as drop-in replacement. Currently still missing is OCI hook support, but it's already parsed and everything, and should be easy to add. Other than that OCI is implemented pretty comprehensively. There's a list of incompatibilities in the nspawn-oci.c file. In a later PR I'd like to convert this into proper markdown and add it to the documentation directory.	2019-03-15 15:41:28 +01:00
Lennart Poettering	760877e90c	util: split out sorting related calls to new sort-util.[ch]	2019-03-13 12:16:43 +01:00
Zbigniew Jędrzejewski-Szmek	0e636bf51a	nspawn: fix memleak uncovered by fuzzer Also use TAKE_PTR as appropriate.	2019-03-11 14:29:30 +01:00
Lennart Poettering	6c610acaaa	nspawn: add --volatile=overlay support Fixes: #11054 #3847	2019-03-01 14:11:06 +01:00
Lennart Poettering	c55d0ae764	nspawn: fix an error path	2019-03-01 14:11:06 +01:00
Lennart Poettering	e5b43a04b6	nspawn: add volatile mode multiplexer call setup_volatile_mode() Just some refactoring, no change in behaviour.	2019-03-01 14:11:06 +01:00
Lennart Poettering	0646d3c3dd	nspawn: explicitly refuse mounts over / Previously this would fail later on, but let's filter this out at the time of parsing.	2019-03-01 14:11:06 +01:00
Lennart Poettering	e4de72876e	util-lib: split out all temporary file related calls into tmpfiles-util.c This splits out a bunch of functions from fileio.c that have to do with temporary files. Simply to make the header files a bit shorter, and to group things more nicely. No code changes, just some rearranging of source files.	2018-12-02 13:22:29 +01:00
Zbigniew Jędrzejewski-Szmek	b2ac2b01c8	Merge pull request #10996 from poettering/oci-prep Preparation for the nspawn-OCI work	2018-11-30 10:09:00 +01:00
Zbigniew Jędrzejewski-Szmek	049af8ad0c	Split out part of mount-util.c into mountpoint-util.c The idea is that anything which is related to actually manipulating mounts is in mount-util.c, but functions for mountpoint introspection are moved to the new file. Anything which requires libmount must be in mount-util.c. This was supposed to be a preparation for further changes, with no functional difference, but it results in a significant change in linkage: $ ldd build/libnss_*.so.2 (before) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff77bf5000) librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000) libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000) /lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffc19cc0000) librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000) libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000) libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000) /lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe8ef8e000) librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000) libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000) 
libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000) libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000) /lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffda6d17000) librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000) libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000) libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000) /lib64/ld-linux-x86-64.so.2 (0x00007f610b995000) libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000) (after) build/libnss_myhostname.so.2: linux-vdso.so.1 (0x00007fff0b5e2000) librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000) libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000) /lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000) build/libnss_mymachines.so.2: linux-vdso.so.1 (0x00007ffdc30a7000) librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000) libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000) libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000) /lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000) build/libnss_resolve.so.2: linux-vdso.so.1 (0x00007ffe95747000) librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000) libcap.so.2 => /lib64/libcap.so.2 
(0x00007fa56a808000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000) libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000) /lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000) build/libnss_systemd.so.2: linux-vdso.so.1 (0x00007ffe67b51000) librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000) libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000) libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000) /lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000) I don't quite understand what is going on here, but let's not be too picky.	2018-11-29 21:03:44 +01:00
Lennart Poettering	17c58ba97b	nspawn: let's also pre-mount /dev/mqueue	2018-11-29 20:21:40 +01:00
Zbigniew Jędrzejewski-Szmek	baaa35ad70	coccinelle: make use of SYNTHETIC_ERRNO Ideally, coccinelle would strip unnecessary braces too. But I do not see any option in coccinelle for this, so instead, I edited the patch text using search&replace to remove the braces. Unfortunately this is not fully automatic, in particular it didn't deal well with if-else-if-else blocks and ifdefs, so there is an increased likelihood of some bugs in such spots. I also removed part of the patch that coccinelle generated for udev, where we return -1 for failure. This should be fixed independently.	2018-11-22 10:54:38 +01:00
Lennart Poettering	1099ceebce	nspawn: optionally don't mount a tmpfs over /tmp (#10294) nspawn: optionally, don't mount a tmpfs on /tmp Fixes: #10260	2018-10-08 18:32:03 +02:00
Yu Watanabe	93bab28895	tree-wide: use typesafe_qsort()	2018-09-19 08:02:52 +09:00
Franck Bui	03d0f4b58e	nspawn: always use mode 555 for /sys When a network namespace is needed, /sys is mounted as tmpfs (see commit `d8fc6a000f` for details). But in this case mode 755 was used as initial permissions for /sys whereas the default mode for sysfs is 555. In practice using 755 doesn't have any impact because /sys is mounted read-only too but for consistency, let's use the correct mode. Fixes: #10050	2018-09-11 00:34:00 +02:00
Luke Shumaker	677a72cd3e	nspawn: mount_sysfs(): Unconditionally mkdir /sys/fs/cgroup Currently, mount_sysfs() only creates /sys/fs/cgroup if cg_ns_supported(). The comment explains that we need to "Create mountpoint for cgroups. Otherwise we are not allowed since we remount /sys read-only."; that is: that we need to do it now, rather than later. However, the comment doesn't do anything to explain why we only need to do this if cg_ns_supported(); shouldn't we _always_ need to do it? The answer is that if !use_cgns, then this was already done by the outer child, so mount_sysfs() only needs to do it if use_cgns. Now, mount_sysfs() doesn't know whether use_cgns, but !cg_ns_supported() implies !use_cgns, so we can "optimize" the case where we _know_ !use_cgns, and deal with a no-op mkdir_p() in the false-positive where cg_ns_supported() but !use_cgns. But is it really much of an optimization? We're potentially spending an access(2) (cg_ns_supported() could be cached from a previous call) to potentially save an lstat(2) and mkdir(2); and all of them are on virtual filesystems, so they should all be pretty cheap. So, simplify and drop the conditional. It's a dubious optimization that requires more text to explain than it's worth.	2018-07-20 12:12:03 -04:00
Luke Shumaker	0402948206	nspawn: Move cgroup mount stuff from nspawn-mount.c to nspawn-cgroup.c	2018-07-20 12:12:02 -04:00
Luke Shumaker	2fa017f169	nspawn: Simplify tmpfs_patch_options() usage, and trickle that up One of the things that tmpfs_patch_options does is take an (optional) UID, and insert "uid=${UID},gid=${UID}" into the options string. So we need a uid_t argument, and a way of telling if we should use it. Fortunately, that is built in to the uid_t type by having UID_INVALID as a possible value. So this is really a feature that requires one argument. Yet, it is somehow taking 4! That is absurd. Simplify it to only take one argument, and have that trickle all the way up to mount_all()'s usage. Now, in many of the uses, the argument becomes uid_shift == 0 ? UID_INVALID : uid_shift because it used to treat uid_shift=0 as invalid unless the patch_ids flag was also set. This keeps the behavior the same. Note that in all cases where it is invoked, if !use_userns (sometimes called !userns), then uid_shift is 0; we don't have to add any checks for that. That said, I'm pretty sure that "uid=0" and not setting "uid=" are the same, but Christian Brauner seemed to not think so when implementing the cgns support. https://github.com/systemd/systemd/pull/3589	2018-07-20 12:12:02 -04:00
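The single-argument design described above relies on a sentinel value inside the uid_t type. A minimal sketch of the idea, assuming UID_INVALID is defined as `(uid_t) -1`; the helpers `uid_for_options()` and `patch_options()` are hypothetical and only mirror the behaviour the message describes, not the actual tmpfs_patch_options() code:

```c
#include <stdio.h>
#include <sys/types.h>

/* Assumption: the sentinel is the all-ones uid_t value. */
#define UID_INVALID ((uid_t) -1)

/* The caller-side expression quoted in the commit message: a shift of 0
 * means "do not patch in a UID". */
static uid_t uid_for_options(uid_t uid_shift) {
        return uid_shift == 0 ? UID_INVALID : uid_shift;
}

/* Hypothetical helper: copy `options` into buf, appending "uid=,gid=" only
 * when a valid UID was passed. Returns the length snprintf() reports. */
static int patch_options(char *buf, size_t size, const char *options, uid_t uid) {
        if (uid == UID_INVALID)
                return snprintf(buf, size, "%s", options);

        return snprintf(buf, size, "%s,uid=%lu,gid=%lu", options,
                        (unsigned long) uid, (unsigned long) uid);
}
```

One uid_t argument thus carries both the value and the decision whether to use it, which is what makes the extra flag arguments unnecessary.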
Luke Shumaker	9c0fad5fb5	nspawn: Simplify mkdir_userns() usage, and trickle that up One of the things that mkdir_userns{,_p}() does is take an (optional) UID, and chown the directory to that. So we need a uid_t argument, and a way of telling if we should use that uid_t argument. Fortunately, that is built in to the uid_t type by having UID_INVALID as a possible value. However, currently mkdir_userns() also takes a MountSettingsMask and checks a couple of bits in it to decide if it should perform the chown. Drop the mask argument, and instead have the caller pass UID_INVALID if it shouldn't chown.	2018-07-20 12:12:02 -04:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Lennart Poettering	d4b653c589	nspawn: lock down a few things in /proc by default This tightens security on /proc: a couple of files exposed there are now made inaccessible. These files might potentially leak kernel internals or expose non-virtualized concepts, hence lock them down by default. Moreover, a couple of dirs in /proc that expose stuff also exposed in /sys are now marked read-only, similar to how we handle /sys. The list is taken from what docker/runc based container managers generally apply, but slightly extended.	2018-05-03 17:45:42 +02:00
Lennart Poettering	10af01a5ff	nspawn: use free_and_replace() at more places	2018-05-03 17:19:46 +02:00
Lennart Poettering	88614c8a28	nspawn: size_t more stuff A follow-up for #8840	2018-05-03 17:19:46 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Yu Watanabe	1cc6c93a95	tree-wide: use TAKE_PTR() and TAKE_FD() macros	2018-04-05 14:26:26 +09:00
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
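The behaviour the message describes (read a pointer, return it, set the original to NULL) can be sketched as a macro. This is an illustrative version that assumes GCC/Clang statement expressions and `__typeof__`; it is not copied from systemd's macro.h, and `dup_label()` is a hypothetical caller:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a TAKE_PTR()-style macro; relies on GNU C extensions. */
#define TAKE_PTR(ptr)                           \
        __extension__ ({                        \
                __typeof__(ptr) _taken = (ptr); \
                (ptr) = NULL;                   \
                _taken;                         \
        })

/* Hypothetical caller: builds a copy of s and hands ownership to the
 * caller in a single, explicit step. */
static char *dup_label(const char *s) {
        size_t n = strlen(s) + 1;
        char *copy = malloc(n);
        if (!copy)
                return NULL;

        memcpy(copy, s, n);

        /* After this, `copy` is NULL, so any cleanup handler attached to it
         * would have nothing left to free. */
        return TAKE_PTR(copy);
}
```

The macro replaces the three-line "save, clear, return" idiom with one expression, which is where the line savings mentioned in the message come from.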


125 commits