Systemd

Author	SHA1	Message	Date
Lennart Poettering	5b14956385	Merge pull request #16543 from poettering/nspawn-run-host nspawn: /run/host/ tweaks	2020-08-20 16:20:05 +02:00
Lennart Poettering	0f48ba7b84	nspawn: provide $container and $container_uuid in /run/host too This has the major benefit that the entire payload of the container can access these files there. Previously, we'd set them only as env vars, but that meant only PID 1 could read them directly or other privileged payload code with access to /proc/1/environ.	2020-08-20 10:17:55 +02:00
Lennart Poettering	9fac502920	nspawn,pid1: pass "inaccessible" nodes from cntr mgr to pid1 payload via /run/host Let's make /run/host the sole place we pass stuff from host to container in and place the "inaccessible" nodes in /run/host too. In contrast to the previous two commits this is a minor compat break, but not a relevant one I think. Previously the container manager would place these nodes in /run/systemd/inaccessible/ and that's where PID 1 in the container would try to add them too when missing. Container manager and PID 1 in the container would thus manage the same dir together. With this change the container manager now passes an immutable directory to the container and leaves /run/systemd entirely untouched, and managed exclusively by PID 1 inside the container, which is nice to have clear separation on who manages what. In order to make sure systemd then uses the /run/host/inaccessible/ nodes this commit changes PID 1 to look for that dir and if it exists will symlink it to /run/systemd/inaccessible. Now, this will work fine if new nspawn and new pid 1 in the container work together, as then the symlink is created and the difference between the two dirs won't matter. For the case where an old nspawn invokes a new PID 1: in this case things work as they always worked: the dir is managed together. For the case where a different container manager invokes a new PID 1: in this case the nodes aren't typically passed in, and PID 1 in the container will try to create them and will likely fail partially (though gracefully) when trying to create char/block device nodes. This is fine though as there are fallbacks in place for that case. For the case where a new nspawn invokes an old PID1: this is where the (minor) incompatibility happens: in this case new nspawn will place the nodes in the /run/host/inaccessible/ subdir, but the PID 1 in the container won't look for them there. Since the nodes are also not pre-created in /run/systemd/inaccessible/ PID 1 will try to create them there as if a different container manager sets them up. This is of course not sexy, but is not a total loss, since as mentioned fallbacks are in place anyway. Hence I think it's OK to accept this minor incompatibility.	2020-08-20 10:17:52 +02:00
Lennart Poettering	e96ceabac9	nspawn: move $NOTIFY_SOCKET into /run/host/ too The sd_notify() socket that nspawn binds that the payload can use to talk to it was previously stored in /run/systemd/nspawn/notify, which is weird (as in the previous commit) since this makes /run/systemd something that is cooperatively maintained by systemd inside the container and nspawn outside of it. We now have a better place where container managers can put the stuff they want to pass to the payload: /run/host/, hence let's make use of that. This is not a compat breakage, since the sd_notify() protocol is based on the $NOTIFY_SOCKET env var, where we place the new socket path.	2020-08-20 10:17:48 +02:00
Lennart Poettering	5a27b39518	nspawn/machine: move mount propagation dir to /run/host/incoming Previously we'd use a directory /run/systemd/nspawn/incoming for accepting mounts to propagate from the host. This is a bit weird, since we have a shared namespace: /run/systemd/ contains both stuff managed by the surrounding nspawn as well as from the systemd inside. We now have the /run/host/ hierarchy that has special stuff we want to pass from host to container. Let's make use of that here, and move this directory there too. This is not a compat breakage, since the payload never interfaces with that directory natively: it's only nspawn and machined that need to agree on it.	2020-08-20 10:17:25 +02:00
Zbigniew Jędrzejewski-Szmek	b4eaa6cc99	shared/seccomp: use _cleanup_ in one more place (cherry picked from commit 27605d6a836d85563faf41db9f7a72883d44c0ff)	2020-08-19 10:57:30 +02:00
Lennart Poettering	af187ab237	dissect: introduce new helper dissected_image_mount_and_warn() and use it everywhere	2020-08-11 22:26:48 +02:00
Zbigniew Jędrzejewski-Szmek	7e62257219	Merge pull request #16308 from bluca/root_image_options service: add new RootImageOptions feature	2020-08-03 10:04:36 +02:00
Daan De Meyer	6f646e0175	nspawn: Fix incorrect usage of putenv strv_env_get only returns the environment variable value. putenv expects KEY=VALUE format strings. Use setenv instead to fix the use.	2020-08-03 09:58:05 +02:00
Zbigniew Jędrzejewski-Szmek	b67ec8e5b2	pid1: stop limiting size of /dev/shm The explicit limit is dropped, which means that we return to the kernel default of 50% of RAM. See `362a55fc14` for a discussion why that is not as much as it seems. It turns out various applications need more space in /dev/shm and we would break them by imposing a low limit. While at it, rename the define and use a single macro for various tmpfs mounts. We don't really care what the purpose of the given tmpfs is, so it seems reasonable to use a single macro. This effectively reverts part of `7d85383edb`. Fixes #16617.	2020-07-30 18:48:35 +02:00
Luca Boccassi	18d7370587	service: add new RootImageOptions feature Allows to specify mount options for RootImage. In case of multi-partition images, the partition number can be prefixed followed by colon. Eg: RootImageOptions=1:ro,dev 2:nosuid nodev In absence of a partition number, 0 is assumed.	2020-07-29 17:17:32 +01:00
Lennart Poettering	d64e32c245	nspawn: rework how /run/host/ is set up Let's find the right os-release file on the host side, and only mount the one that matters, i.e. /etc/os-release if it exists and /usr/lib/os-release otherwise. Use the fixed path /run/host/os-release for that. Let's also mount /run/host as a bind mount on itself before we set up /run/host, and let's mount it MS_RDONLY after we are done, so that it remains immutable as a whole.	2020-07-23 18:47:38 +02:00
Lennart Poettering	d130181fd8	nspawn: add missing spdx header	2020-07-23 18:47:38 +02:00
Lennart Poettering	2a2e78e969	nspawn: fix MS_SHARED mount propagation for userns containers We want our OS trees to be MS_SHARED by default, so that our service namespacing logic can work correctly. Thus in nspawn we mount everything MS_SHARED when organizing our tree. We do this early on, before changing the user namespace (if that's requested). However CLONE_NEWUSER actually resets MS_SHARED to MS_SLAVE for all mounts (so that less privileged environments can't affect the more privileged ones). Hence, when invoking it we have to reset things to MS_SHARED afterwards again. This won't reestablish propagation, but it will make sure we get a new set of mount peer groups everywhere that then are honoured for the mount namespaces/propagated mounts set up inside the container further down.	2020-07-23 17:08:39 +02:00
Luca Boccassi	ed4512d009	nspawn: set container_host env vars before user arguments Allows users on the command line to seamlessly override $container_host_* just like they can override $container_id and $container	2020-07-20 07:28:22 +02:00
Luca Boccassi	14f1c47a0c	nspawn: mount os-release in two steps to make it read-only The kernel interface requires setting up read-only bind-mounts in two steps, the bind first and then a read-only remount. Fix nspawn-mount, and cover this case in the integration test. Fixes #16484	2020-07-16 09:59:59 +01:00
Luca Boccassi	eafc7d6056	nspawn: use access/F_OK instead of stat to check for file existence	2020-07-16 09:59:59 +01:00
Lennart Poettering	38ccb55731	nss-mymachines: drop support for UID/GID resolving Now that we make the user/group name resolving available via userdb and thus nss-systemd, we do not need the UID/GID resolving support in nss-mymachines anymore. Let's drop it hence. We keep the module around, since besides UID/GID resolving it also does hostname resolving, which we care about. (One of these days we should replace that by some Varlink logic between nss-resolve/systemd-resolved.service too) The hooks are kept in the NSS module, but they do not resolve anything anymore, in order to keep compat at a maximum.	2020-07-14 17:08:12 +02:00
Zbigniew Jędrzejewski-Szmek	55aacd502b	Merge pull request #15891 from bluca/host_os_release Container Interface: expose the host's os-release metadata to nspawn and portable guests	2020-07-08 23:52:13 +02:00
Lennart Poettering	9b71e4ab90	shared: actually move all BusLocator related calls to bus-locator.c	2020-06-30 15:09:19 +02:00
Luca Boccassi	c2923fdcd7	dissect/nspawn: add support for dm-verity root hash signature Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a pkcs7 signature, with the public key in the kernel keyring, is available. Use it if libcryptsetup supports it.	2020-06-25 08:45:21 +01:00
Lennart Poettering	6b000af4f2	tree-wide: avoid some loaded terms https://tools.ietf.org/html/draft-knodel-terminology-02 https://lwn.net/Articles/823224/ This gets rid of most but not all occasions of these loaded terms: 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594) 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI. 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first. Regarding the replacements: when whitelist/blacklist is used as noun we replace it with allow list/deny list, and when used as verb with allow-list/deny-list.	2020-06-25 09:00:19 +02:00
Luca Boccassi	e1bb4b0d1d	nspawn: implement container host os-release interface	2020-06-23 12:58:21 +01:00
Luca Boccassi	b3b1a08a56	nspawn: use mkdir_p_safe instead of homegrown version	2020-06-23 12:57:05 +01:00
Luca Boccassi	0389f4fa81	core: add RootHash and RootVerity service parameters Allow to explicitly pass root hash (explicitly or as a file) and verity device/file as unit options. Take precedence over implicit checks.	2020-06-23 10:50:09 +02:00
Lennart Poettering	6fe01ced0e	nspawn: mkdir selinux mount point once, but not twice Since #15533 we didn't create the mount point for selinuxfs anymore. Before it we created it twice because we mount selinuxfs twice: once the superblock, and once we remount its bind mount read-only. The second mkdir would mean we'd chown() the host version of selinuxfs (since there's only one selinuxfs superblock kernel-wide). The right time to create the mount point is once: before we mount the selinuxfs. But not a second time for the remount. Fixes: #16032	2020-06-23 10:17:36 +02:00
Zbigniew Jędrzejewski-Szmek	9664be199a	Merge pull request #16118 from poettering/inaccessible-fixlets move $XDG_RUNTIME_DIR/inaccessible/ to $XDG_RUNTIME_DIR/systemd/inaccessible	2020-06-10 10:23:13 +02:00
Lennart Poettering	48b747fa03	inaccessible: move inaccessible file nodes to /systemd/ subdir in runtime dir always Let's make sure $XDG_RUNTIME_DIR for the user instance and /run for the system instance is always organized the same way: the "inaccessible" device nodes should be placed in a subdir of either called "systemd" and a subdir of that called "inaccessible". This way we can emphasize the common behaviour, and only differ where really necessary. Follow-up for #13823	2020-06-09 16:23:56 +02:00
Luca Boccassi	e7cbe5cb9e	dissect: support single-filesystem verity images with external verity hash dm-verity support in dissect-image at the moment is restricted to GPT volumes. If the image is a single-filesystem type without a partition table (eg: squashfs) and a roothash/verity file are passed, set the verity flag and mark as read-only.	2020-06-09 12:19:21 +01:00
Lennart Poettering	4f9ff96a55	conf-parser: return mtime in config_parse() and friends This is a follow-up for `9f83091e3c`. Instead of reading the mtime off the configuration files after reading, let's do so before reading, but with the fd we read the data from. This is not only cleaner (as it allows us to save one stat()), but also has the benefit that we'll detect changes that happen while we read the files. This also reworks unit file drop-ins to use the common code for determining drop-in mtime, instead of reading system clock for that.	2020-06-02 19:32:20 +02:00
Tobias Hunger	129635333d	repart: Add UUID option to config files Add an option to provide a UUID for the partition that will get created and document that.	2020-05-25 15:48:59 +02:00
Topi Miettinen	7d85383edb	tree-wide: add size limits for tmpfs mounts Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and /sys/fs/cgroup since no (or very few) regular files are expected to be used. In addition, since directories, symbolic links, device specials and xattrs are not counted towards the size= limit, number of inodes is also limited correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes. Because nr_inodes option can't use ratios like size option, there's an unfortunate side effect that with small memory systems the limit may be on the too large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough for re-exec of PID1 because 16MB of free space is required.	2020-05-13 00:37:18 +02:00
Zbigniew Jędrzejewski-Szmek	8acb7780df	Merge pull request #15623 from poettering/cmsg-cleanup various CMSG_xyz clean-ups, split out of #15571	2020-05-08 11:05:06 +02:00
Vito Caputo	5e55340ad4	Merge pull request #15681 from vcaputo/buslocator *: switch to BusLocator-oriented helpers	2020-05-07 09:46:01 -07:00
Vito Caputo	1ecaac5c30	nspawn: switch to BusLocator-oriented helpers Mechanical substitution reducing some verbosity	2020-05-07 08:46:44 -07:00
Lennart Poettering	fb29cdbef2	tree-wide: make sure our control buffers are properly aligned We always need to make them unions with a "struct cmsghdr" in them, so that things are properly aligned. Otherwise we might end up at an unaligned address and the counting goes all wrong, possibly making the kernel refuse our buffers. Also, let's make sure we initialize the control buffers to zero when sending, but leave them uninitialized when reading. Both the alignment and the initialization thing are mentioned in the cmsg(3) man page.	2020-05-07 14:39:44 +02:00
Zbigniew Jędrzejewski-Szmek	be32732168	basic/set: let set_put_strdup() create the set with string hash ops If we're using a set with _put_strdup(), most of the time we want to use string hash ops on the set, and free the strings when done. This defines an appropriate new string_hash_ops_free structure to automatically free the keys when removing the set, and makes set_put_strdup() and set_put_strdupv() instantiate the set with those hash ops. hashmap_put_strdup() was already doing something similar. (It is OK to instantiate the set earlier, possibly with a different hash ops structure. set_put_strdup() will then use the existing set. It is also OK to call set_free_free() instead of set_free() on a set with string_hash_ops_free, the effect is the same, we're just overriding the override of the cleanup function.) No functional change intended.	2020-05-06 16:54:06 +02:00
Motiejus Jakštys	5c4deb9a5c	nspawn: mount custom paths before writing to /etc Consider such configuration: $ systemd-nspawn --read-only --timezone=copy --resolv-conf=copy-host \ --overlay="+/etc::/etc" <...> Assuming one wants `/` to be read-only, DNS and `/etc/localtime` to work. One way to do it is to create an overlay filesystem in `/etc/`. However, systemd-nspawn tries to create `/etc/resolv.conf` and `/etc/localtime` before mounting the custom paths, while `/` (and, by extension, `/etc`) is read-only. Thus it fails to create those files. Mounting custom paths before modifying anything in `/etc/` makes this possible. Full example: ``` $ debootstrap buster /var/lib/machines/t1 http://deb.debian.org/debian $ systemd-nspawn --private-users=false --timezone=copy --resolv-conf=copy-host --read-only --tmpfs=/var --tmpfs=/run --overlay="+/etc::/etc" -D /var/lib/machines/t1 ping -c 1 example.com Spawning container t1 on /var/lib/machines/t1. Press ^] three times within 1s to kill container. ping: example.com: Temporary failure in name resolution Container t1 failed with error code 130. ``` With the patch: ``` $ sudo ./build/systemd-nspawn --private-users=false --timezone=copy --resolv-conf=copy-host --read-only --tmpfs=/var --tmpfs=/run --overlay="+/etc::/etc" -D /var/lib/machines/t1 ping -qc 1 example.com Spawning container t1 on /var/lib/machines/t1. Press ^] three times within 1s to kill container. PING example.com (93.184.216.34) 56(84) bytes of data. --- example.org ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 110.912/110.912/110.912/0.000 ms Container t1 exited successfully. ```	2020-05-05 09:02:57 +02:00
Lennart Poettering	dcff2fa5d1	nspawn: be more careful with creating/chowning directories to overmount We should never re-chown selinuxfs. Fixes: #15475	2020-04-28 19:40:46 +02:00
Lennart Poettering	371d72e05b	socket-util: introduce type-safe, dereferencing wrapper CMSG_FIND_DATA around cmsg_find() Let's take this one step further, and add type-safety to cmsg_find(), and imply the CMSG_DATA() macro for finding the cmsg payload.	2020-04-23 19:41:15 +02:00
Lennart Poettering	0f4a141744	Merge pull request #15504 from poettering/cmsg-find-pure just the recvmsg_safe() stuff from #15457	2020-04-23 17:28:19 +02:00
Lennart Poettering	3691bcf3c5	tree-wide: use recvmsg_safe() at various places Let's be extra careful whenever we return from recvmsg() and see MSG_CTRUNC set. This generally means we ran into a programming error, as we didn't size the control buffer large enough. It's an error condition we should at least log about, or propagate up. Hence do that. This is particularly important when receiving fds, since for those the control data can be of any size. In particular on stream sockets that's nasty, because if we miss an fd because of control data truncation we cannot recover, we might not even realize that we are one off. (Also, when failing early, if there's any chance the socket might be AF_UNIX let's close all received fds, all the time. We got this right most of the time, but there were a few cases missing. God, UNIX is hard to use)	2020-04-23 09:41:47 +02:00
Lennart Poettering	287b737693	nspawn: refuse politely when we are run in the non-host netns in combination with --image= Strictly speaking this doesn't really fix #15079, but it at least means we won't hang anymore. Fixes: #15079	2020-04-23 09:18:43 +02:00
Lennart Poettering	1433e0f212	nspawn: minor simplification	2020-04-23 09:18:05 +02:00
Zbigniew Jędrzejewski-Szmek	4ee40eefce	Merge pull request #15516 from poettering/nspawn-resolv-conf beef up --resolv-conf= options of systemd-nspawn	2020-04-23 08:01:46 +02:00
Lennart Poettering	81d2fe53fc	nspawn: some minor modernizations	2020-04-23 07:59:26 +02:00
Lennart Poettering	86775e3524	nspawn: beef up --resolv-conf= modes Let's add flavours for copying stub/uplink resolv.conf versions. Let's add a more brutal "replace" mode, where we'll replace any existing destination file. Let's also change what "auto" means: instead of copying the static file, let's use the stub file, so that DNS search info is copied over. Fixes: #15340	2020-04-22 19:38:04 +02:00
Frantisek Sumsal	86b52a3958	tree-wide: fix spelling errors Based on a report from Fossies.org using Codespell. Followup to #15436	2020-04-21 23:21:08 +02:00
Zbigniew Jędrzejewski-Szmek	162392b75a	tree-wide: spellcheck using codespell Fixes #15436.	2020-04-16 18:00:40 +02:00
Vito Caputo	9f81a592c1	*: convert amenable fdopendir() calls to take_fdopendir() Some fdopendir() calls remain where safe_close() is manually performed, those could be simplified as well by converting to use the _cleanup_close_ machinery, but makes things less trivial to review so left for a future cleanup.	2020-03-31 06:48:03 -07:00
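
Commit 6f646e0175 in the list above replaces a putenv() call with setenv() because only the variable's value was at hand, while putenv() expects a single "KEY=VALUE" string. The following is a minimal sketch of that distinction, not the actual nspawn code; the helper name export_value() is hypothetical and exists only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not taken from systemd): export `name` with `value`.
 *
 * putenv() takes one "NAME=value" string and keeps a pointer to it, so
 * handing it a bare value, as the commit message describes, does not set
 * the intended variable. setenv() takes name and value separately and
 * copies both, which matches what the caller has when it only looked up
 * the value. */
int export_value(const char *name, const char *value) {
        assert(name && value);

        /* Third argument 1: overwrite the variable if it already exists. */
        return setenv(name, value, 1);
}
```

With this shape, a caller that has looked up only the value of a variable can pass the name and value as two separate arguments instead of assembling a "KEY=VALUE" string by hand.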


1077 commits
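
Commits fb29cdbef2 and 3691bcf3c5 in the list above describe two rules for ancillary data on sockets: declare the control buffer as a union containing a struct cmsghdr so it is properly aligned (zeroed when sending, left uninitialized when reading), and treat MSG_CTRUNC after recvmsg() as an error. The sketch below illustrates both rules for passing a single file descriptor over an AF_UNIX socket. It is not systemd's implementation: the names fd_control_t, send_fd() and recv_fd() are hypothetical, and the -EIO error code is an arbitrary choice for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one fd. The struct cmsghdr member forces the
 * alignment that the CMSG_*() macros rely on. */
typedef union {
        struct cmsghdr hdr;
        unsigned char buf[CMSG_SPACE(sizeof(int))];
} fd_control_t;

/* Send `fd` over the AF_UNIX socket `sock`. Returns 0 or a negative errno. */
int send_fd(int sock, int fd) {
        fd_control_t control;
        char dummy = 'x'; /* at least one byte of regular payload */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr mh;
        struct cmsghdr *cmsg;

        assert(sock >= 0 && fd >= 0);

        memset(&control, 0, sizeof(control)); /* zero the buffer when sending */
        memset(&mh, 0, sizeof(mh));
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
        mh.msg_control = control.buf;
        mh.msg_controllen = sizeof(control);

        cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &mh, 0) < 0 ? -errno : 0;
}

/* Receive one fd from `sock`. Returns the fd or a negative errno. */
int recv_fd(int sock) {
        fd_control_t control; /* deliberately not initialized when reading */
        char dummy;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr mh;
        struct cmsghdr *cmsg;
        int fd;

        memset(&mh, 0, sizeof(mh));
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
        mh.msg_control = control.buf;
        mh.msg_controllen = sizeof(control);

        if (recvmsg(sock, &mh, 0) < 0)
                return -errno;

        /* Truncated control data means the buffer was too small; fail
         * instead of silently losing descriptors. */
        if (mh.msg_flags & MSG_CTRUNC)
                return -EIO;

        cmsg = CMSG_FIRSTHDR(&mh);
        if (!cmsg ||
            cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS ||
            cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
                return -EIO;

        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
}
```

A production version would also close any descriptors that did arrive before returning an error, as the second commit message points out; that cleanup is omitted here to keep the sketch short.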