Systemd

Author	SHA1	Message	Date
Lennart Poettering	09d423e921	nspawn: add greater control over how /etc/resolv.conf is handled Fixes: #8014 #1781	2018-05-22 16:19:26 +02:00
Lennart Poettering	a5201ed6ce	tree-wide: fix a couple of TABs	2018-05-22 16:13:45 +02:00
Arnaud Rebillout	c9fe05e07d	nspawn: support pivot-root option during directory validation Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>	2018-05-22 14:42:10 +02:00
Lennart Poettering	5c828e66b5	tree-wide: port various bits of the tree over to the new DUMP_STRING_TABLE() macro	2018-05-22 13:14:18 +02:00
Lennart Poettering	919f5ae0c7	nspawn: voidify more things	2018-05-17 20:48:55 +02:00
Lennart Poettering	5d9614077d	nspawn: split out merging of settings object Let's separate the loading of the settings object and the merging into our arg_xyz fields into two. This will become particularly useful when we eventually are able to load settings from OCI runtime files in addition to .nspawn files.	2018-05-17 20:48:55 +02:00
Lennart Poettering	d107bb7d63	nspawn: add a new --cpu-affinity= switch Similar to the other options added before, this is primarily useful to provide comprehensive OCI runtime compatibility, but might be useful otherwise, too.	2018-05-17 20:48:54 +02:00
Lennart Poettering	50ebcf6cb7	nspawn: show --help text in a pager The text is long enough now, and we do auto-paging for systemctl already, hence let's do it here too.	2018-05-17 20:48:13 +02:00
Lennart Poettering	81f345dfed	nspawn: add a new --oom-score-adjust= command line switch This is primarily useful in order to provide comprehensive OCI runtime compatibility with nspawn, but might have uses outside of it.	2018-05-17 20:48:12 +02:00
Lennart Poettering	c818eef1cd	nspawn: properly handle and log about hostname setting errors	2018-05-17 20:47:21 +02:00
Lennart Poettering	66edd96310	nspawn: add a new --no-new-privileges= cmdline option to nspawn This simply controls the PR_SET_NO_NEW_PRIVS flag for the container. This too is primarily relevant to provide OCI runtime compatibility, but might have other uses too, in particular as it nicely complements the existing --capability= and --drop-capability= flags.	2018-05-17 20:47:20 +02:00
Lennart Poettering	3a9530e5f1	nspawn: make the hostname of the container explicitly configurable with a new --hostname= switch Previously, the container's hostname was exclusively initialized from the machine name configured with --machine=, i.e. the internal name and the external name used for and by the container was synchronized. This adds a new option --hostname= that optionally allows the internal name to deviate from the external name. This new option is mainly useful to ultimately implement the OCI runtime spec directly in nspawn, but it might be useful on its own for some other usecases too.	2018-05-17 20:46:45 +02:00
Lennart Poettering	bf428efb07	nspawn: add new --rlimit= switch, and always set resource limits explicitly for our container payloads This ensures we set the various resource limits of our container explicitly on each invocation so that we inherit less from our callers into the payload. By default resource limits are now set to the same values Linux generally passes to the host PID 1, thus minimizing needless differences between host and container environments. The limits are now also configurable using a new --rlimit= switch. This is preparation for teaching nspawn native OCI runtime support as OCI permits setting resource limits for container payloads, and it hence probably makes sense if we do too.	2018-05-17 20:45:54 +02:00
Yu Watanabe	130d3d22e9	tree-wide: use strv_free_and_replace() macro	2018-05-10 00:57:34 +09:00
Lennart Poettering	720f0a2f3c	nspawn: move nspawn cgroup hierarchy one level down unconditionally We need to do this in all cases, including on cgroupsv1 in order to ensure the host systemd and any systemd in the payload won't fight for the cgroup attributes of the top-level cgroup of the payload. This is because systemd for Delegate=yes units will only delegate the right to create children as well as their attributes. However, nspawn expects that the cgroup delegated covers both the right to create children and the attributes of the cgroup itself. Hence, to clear this up, let's unconditionally insert an intermediary cgroup, on cgroupsv1 as well as cgroupsv2. This is also nice as it reduces the differences in the various setups and exposes very close behaviour everywhere.	2018-05-03 17:45:42 +02:00
Lennart Poettering	9ec5a93c98	nspawn: don't make /proc/kmsg node too special Similar to the previous commit, let's just let our regular calls for managing temporary nodes take care of this.	2018-05-03 17:45:42 +02:00
Lennart Poettering	cdde6ba6b6	nspawn: mount boot ID from temporary file in /tmp Let's not make /run too special and let's make sure the source file is not guessable: let's use our regular temporary file helper calls to create the source node.	2018-05-03 17:45:42 +02:00
Lennart Poettering	88614c8a28	nspawn: size_t more stuff A follow-up for #8840	2018-05-03 17:19:46 +02:00
Yu Watanabe	29a3db75fd	util: rename signal_from_string_try_harder() to signal_from_string() Also this makes the new `signal_from_string()` function reject e.g, `SIG3` or `SIG+5`.	2018-05-03 16:52:49 +09:00
Yu Watanabe	1e4f1671c2	nspawn: fix warning by -Wnonnull (#8877 )	2018-05-02 10:03:31 +02:00
Lennart Poettering	8e766630f0	tree-wide: drop redundant _cleanup_ macros (#8810 ) This drops a good number of type-specific _cleanup_ macros, and patches all users to just use the generic ones. In most recent code we abstained from defining type-specific macros, and this basically removes all those added already, with the exception of the really low-level ones. Having explicit macros for this is not too useful, as the expression without the extra macro is generally just 2ch wider. We should generally emphasize generic code, unless there are really good reasons for specific code, hence let's follow this in this case too. Note that _cleanup_free_ and similar really low-level, libc'ish, Linux API'ish macros continue to be defined, only the really high-level OO ones are dropped. From now on this should really be the rule: for really low-level stuff, such as memory allocation, fd handling and so on, go ahead and define explicit per-type macros, but for high-level, specific program code, just use the generic _cleanup_() macro directly, in order to keep things simple and as readable as possible for the uninitiated. Note that before this patch some of the APIs (notably libudev ones) were already used with the high-level macros at some places and with the generic _cleanup_ macro at others. With this patch we hence unify on the latter.	2018-04-25 12:31:45 +02:00
Lennart Poettering	0c300adfa4	nspawn: when running nspawn, set a $PATH including both bin + sbin by default (#8756 ) We don't know what the container payload needs, hence default to a PATH with both bin and sbin included, as well as / and /usr. Follow-up for #8324 Fixes: #8698	2018-04-20 11:36:25 +02:00
Lennart Poettering	5d13a15b1d	tree-wide: drop spurious newlines (#8764 ) Double newlines (i.e. one empty line) are great to structure code. But let's avoid triple newlines (i.e. two empty lines), quadruple newlines, quintuple newlines, …, that's just spurious whitespace. It's an easy way to drop 121 lines of code, and keeps the coding style of our sources a bit tighter.	2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Philip Sequeira	7511655807	nspawn: wait for network namespace creation before interface setup (#8633 ) Otherwise, network interfaces can be "moved" into the container's namespace while it's still the same as the host namespace, in which case e.g. host0 for a veth ends up on the host side instead of inside the container. Regression introduced in `0441378080`. Fixes #8599.	2018-04-05 07:04:27 -07:00
Yu Watanabe	1cc6c93a95	tree-wide: use TAKE_PTR() and TAKE_FD() macros	2018-04-05 14:26:26 +09:00
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Lennart Poettering	4526113f57	dissect: add dissect_image_and_warn() that unifies error message generation for dissect_image() (#8517 )	2018-03-21 12:10:01 +01:00
Zbigniew Jędrzejewski-Szmek	0441378080	nspawn: move network namespace creation to a separate step (#8430 ) Fixes #8427. Unsharing the namespace in a separate step changes the ownership of /proc/net/ip_tables_names (and related files) from nobody:nobody to root:root. See [1] and [2] for all the details. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f13f2aeed154da8e48f90b85e720f8ba39b1e881 [2] https://bugzilla.netfilter.org/show_bug.cgi?id=1064#c9	2018-03-20 18:07:17 +01:00
Lennart Poettering	2b33ab0957	tree-wide: port various places over to use new rearrange_stdio()	2018-03-02 11:42:10 +01:00
Zbigniew Jędrzejewski-Szmek	8405dcf752	nspawn: make sure we don't leak the fd in chase_symlinks_and_update No callers use CHASE_OPEN right now, but let's be defensive.	2018-02-15 10:18:25 +01:00
Lennart Poettering	d72495759b	tree-wide: port all code to use safe_getcwd()	2018-01-17 11:17:38 +01:00
Lennart Poettering	75152a4d6a	tree-wide: install matches asynchronously Let's remove a number of synchronization points from our service startups: let's drop synchronous match installation, and let's opt for asynchronous instead. Also, let's use sd_bus_match_signal() instead of sd_bus_add_match() where we can.	2018-01-05 13:58:32 +01:00
Lennart Poettering	d2e0ac3d1e	tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork()	2018-01-04 13:27:27 +01:00
Lennart Poettering	7d4904fe7a	process-util: rework wait_for_terminate_and_warn() to take a flags parameter This renames wait_for_terminate_and_warn() to wait_for_terminate_and_check(), and adds a flags parameter, that controls how much to log: there's one flag that means we log about abnormal stuff, and another one that controls whether we log about non-zero exit codes. Finally, there's a shortcut flag value for logging in both cases, as that's what we usually use. All callers are accordingly updated. At three occasions duplicate logging is removed, i.e. where the old function was called but logged in the caller, too.	2018-01-04 13:27:27 +01:00
Zbigniew Jędrzejewski-Szmek	dae8b82eb9	Add mkdir_errno_wrapper() and use instead of mkdir() in various places We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir returns the error in errno while mkdir_label returns the error directly.	2017-12-16 13:28:22 +01:00
Zbigniew Jędrzejewski-Szmek	bdd2bbc445	Merge pull request #7469 from kinvolk/dongsu/nspawn-netns nspawn: introduce an option for specifying network namespace path	2017-12-14 22:47:57 +01:00
Lennart Poettering	fbd0b64f44	tree-wide: make use of new STRLEN() macro everywhere (#7639 ) Let's employ coccinelle to do this for us. Follow-up for #7625.	2017-12-14 19:02:29 +01:00
Dongsu Park	d7bea6b629	nspawn: introduce an option for specifying network namespace path Add a new option `--network-namespace-path` to systemd-nspawn to allow users to specify an arbitrary network namespace, e.g. `/run/netns/foo`. Then systemd-nspawn will open the netns file, pass the fd to outer_child, and enter the namespace represented by the fd before running inner_child. ``` $ sudo ip netns add foo $ mount | grep /run/netns/foo nsfs on /run/netns/foo type nsfs (rw) ... $ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \ /bin/readlink -f /proc/self/ns/net /proc/1/ns/net:[4026532009] ``` Note that the option `--network-namespace-path=` cannot be used together with other network-related options such as `--private-network` so that the options do not conflict with each other. Fixes https://github.com/systemd/systemd/issues/7361	2017-12-13 10:21:06 +00:00
Lennart Poettering	fba868fa71	tree-wide: unify logging of "Must be root" message Let's unify this in one call, generalizing must_be_root() from bootctl.c.	2017-12-11 23:19:45 +01:00
Lennart Poettering	8fd010bb1b	nspawn: turn on watchdog logic for nspawn too It's a long-running daemon, and it's easy to enable, hence do it.	2017-12-07 12:34:46 +01:00
Lennart Poettering	87d5e4f286	build-sys: make the dynamic UID range, and the container UID range configurable Also, export these ranges in our pkg-config files.	2017-12-06 12:55:37 +01:00
Lennart Poettering	de54e02d5e	nspawn: when in hybrid mode, chown() both the legacy and the unified hierarchy to the root in the container If user namespacing is used, let's make sure that the root user in the container gets access to both /sys/fs/cgroup/systemd and /sys/fs/cgroup/unified. This matches similar logic in cg_set_access().	2017-12-05 13:49:13 +01:00
Lennart Poettering	2d3a5a73e0	nspawn: make sure images containing an ESP are compatible with userns -U mode In -U mode we might need to re-chown() all files and directories to match the UID shift we want for the image. That's problematic on fat partitions, such as the ESP (and which is generated by mkosi's --bootable switch), because fat of course knows no UID/GID file ownership natively. With this change we take benefit of the uid= and gid= mount options FAT knows: instead of chown()ing all files and directories we can just specify the right UID/GID to use at mount time. This beefs up the image dissection logic in two ways: 1. First of all support for mounting relevant file systems with uid=/gid= is added: when a UID is specified during mount it is used for all applicable file systems. 2. Secondly, two new mount flags are added: DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY. If one is specified the mount routine will either only mount the root partition of an image, or all partitions except the root partition. This is used by nspawn: first the root partition is mounted, so that we can determine the UID shift in use so far, based on ownership of the image's root directory. Then, we mount the remaining partitions in a second go, this time with the right UID/GID information.	2017-12-05 13:49:12 +01:00
Lennart Poettering	8199d554c1	nspawn: figure out cgroup mode after mounting image If we operate on a disk image (i.e. --image=) then it's pointless to look into the mount directory before it is actually mounted to see which systemd version is running inside... Unfortunately we only mount the disk image in the child process, but the parent needs to know the cgroup mode, hence add some IPC for this purpose and communicate the cgroup mode determined from the image back to the parent.	2017-12-05 13:49:12 +01:00
Yu Watanabe	62b1e758d3	nspawn: adjust path to static resolv.conf to support split usr Fixes #7302.	2017-11-25 21:11:07 +09:00
Lennart Poettering	d381c8a6bf	nspawn: hash the machine name, when looking for a suitable UID base (#7437 ) When "-U" is used we look for a UID range we can use for our container. We start with the UID the tree is already assigned to, and if that didn't work we'd pick random ranges so far. With this change we'll first try to hash a suitable range from the container name, and use that if it works, in order to make UID assignments more likely to be stable. This follows a similar logic PID 1 follows when using DynamicUser=1.	2017-11-24 20:57:19 +01:00
Lennart Poettering	abdb9b08f6	nspawn: make use of the RequestStop logic of scope units Since time began, scope units had a concept of "Controllers", a bus peer that would be notified when somebody requested a unit to stop. None of our code used that facility so far, let's change that. This way, nspawn can print a nice message when somebody invokes "systemctl stop" on the container's scope unit, and then react with the right action to shut it down.	2017-11-23 21:47:48 +01:00
Shawn Landden	4831981d89	tree-wide: adjust fall through comments so that gcc is happy Distcc removes comments, making the comment silencing not work. I know there was a decision against a macro in commit `ec251fe7d5`	2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	3603efdea5	nspawn: make recursive chown()ing logic safe for being aborted in the middle We currently use the ownership of the top-level directory as a hint whether we need to descend into the whole tree to chown() it recursively or not. This is problematic with the previous chown()ing algorithm, as when descending into the tree we'd first chown() and then descend further down, which meant that the top-level directory would be chowned first, and an aborted recursive chowning would appear on the next invocation as successful, even though it was not. Let's reshuffle things a bit, to make the re-chown()ing safe regarding interruptions: a) We chown() the dir we are looking at last, and descend into all its children first. That way we know that if the top-level dir is properly owned everything inside of it is properly owned too. b) Before starting a chown()ing operation, we mark the top-level directory as owned by a special "busy" UID range, which we can use to recognize whether a tree was fully chowned: if it is marked as busy, it's definitely not fully chowned, as the busy ownership will only be fixed as final step of the chowning. Fixes: #6292	2017-11-17 11:12:33 +01:00
Lennart Poettering	0986658d51	Merge pull request #6866 from sourcejedi/set-linger2 logind: fix `loginctl enable-linger`	2017-11-15 11:15:15 +01:00
Lennart Poettering	759aaedc5c	dissect: when we invoke dissection on a loop device with partscan help the user This adds some simple detection logic for cases where dissection is invoked on an externally created loop device, and partitions have been detected on it, but partition scanning so far was off. If this is detected we now print a brief message indicating what the issue is, instead of failing with a useless EINVAL message the kernel passed to us.	2017-10-26 17:54:56 +02:00
Lennart Poettering	eb38edce88	machine-image: add partial discovery of block devices as images This adds some basic discovery of block device images for nspawn and friends. Note that this doesn't add searching for block devices using udev, but instead expects users to symlink relevant block devices into /var/lib/machines. Discovery is hence done exactly like for dir/subvol/raw file images, except that what is found may be a (symlink to) a block device. For now, we do not support cloning these images, but removal, renaming and read-only flags are supported to the point where that makes sense. Fixes: #6990	2017-10-26 17:54:56 +02:00
Alan Jenkins	8d9c2bca41	nspawn: comment to acknowledge lying about "user session"	2017-10-18 09:47:10 +01:00
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that if the name is misspelled, cpp will warn us. $ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g' $ git grep -Ee 'if.defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
Andreas Rammhold	3742095b27	tree-wide: use IN_SET where possible In addition to the changes from #6933 this handles cases that could be matched with the included cocci file.	2017-10-02 13:09:54 +02:00
Lennart Poettering	8e5430c4bd	nspawn: set up a new session keyring for the container process keyring material should not leak into the container. So far we relied on seccomp to deny access to the keyring, but given that we now made the seccomp configurable, and access to keyctl() and friends may optionally be permitted to containers now let's make sure we disconnect the callers keyring from the keyring of PID 1 in the container.	2017-09-22 15:28:04 +02:00
Lennart Poettering	960e4569e1	nspawn: implement configurable syscall whitelisting/blacklisting Now that we have ported nspawn's seccomp code to the generic code in seccomp-util, let's extend it to support whitelisting and blacklisting of specific additional syscalls. This uses similar syntax as PID1's support for system call filtering, but in contrast to that always implements a blacklist (and not a whitelist), as we prepopulate the filter with a blacklist, and the unit's system call filter logic does not come with anything prepopulated. (Later on we might actually want to invert the logic here, and whitelist rather than blacklist things, but at this point let's not do that. In case we switch this over later, the syscall add/remove logic of this commit should be compatible conceptually.) Fixes: #5163 Replaces: #5944	2017-09-12 14:06:21 +02:00
Lennart Poettering	21022b9dde	util-lib: wrap personality() to fix up broken glibc error handling (#6766 ) glibc appears to propagate different errors in different ways, let's fix this up, so that our own code doesn't get confused by this. See #6752 + #6737 for details. Fixes: #6755	2017-09-08 17:16:29 +03:00
Lennart Poettering	8cb5743079	nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416 ) Given that we set NOTIFY_SOCKET unconditionally it's not surprising that processes way down the process tree think it's smart to send us a notification message. It's still useful to keep this message, for debugging things, but it shouldn't be generated by default.	2017-07-20 14:46:58 -04:00
Lennart Poettering	cd2dfc6fae	nspawn: register a scope for the unit if --register=no is specified (#6166 ) Previously, only when --register=yes was set (the default) the invoked container would get its own scope, created by machined on behalf of nspawn. With this change if --register=no is set nspawn will still get its own scope (which is a good thing, so that --slice= and --property= take effect), but this is not done through machined but by registering a scope unit directly in PID 1. Summary: --register=yes → allocate a new scope through machined (the default) --register=yes --keep-unit → use the unit we are already running in and register with machined --register=no → allocate a new scope directly, but no machined --register=no --keep-unit → do not allocate nor register anything Fixes: #5823	2017-06-28 13:22:46 -04:00
Zbigniew Jędrzejewski-Szmek	35bca925f9	tree-wide: fix incorrect uses of %m In those cases errno was not set, so we would be logging some unrelated error or "Success".	2017-05-13 15:42:26 -04:00
Zbigniew Jędrzejewski-Szmek	ab8ee0f259	tree-wide: use SET_FLAG in more places (#5892 )	2017-05-07 07:03:28 -04:00
Zbigniew Jędrzejewski-Szmek	399e391fa6	nspawn: check cgroups after parsing options Same justification as in previous commit.	2017-04-25 08:54:00 -04:00
Lennart Poettering	948a3241de	Merge pull request #5708 from vcatechnology/arm-cross-compile ARM32 cross-compile fixes	2017-04-17 15:49:06 +02:00
Matt Clarkson	6b5cf3ea62	build-sys: correct blkid.h includes When using pkg-config to determine the include flags for blkid the flags are returned as: $ pkg-config blkid --cflags -I/usr/include/blkid -I/usr/include/uuid We use the <blkid/blkid.h> include which would be correct when using the default compiler /usr/include header search path. However, when cross-compiling the blkid.h will not be installed at /usr/include and highly likely in a temporary system root. It is further compounded if the cross-compile packages are split up and the blkid package is not available in the same sysroot as the compiler. Regardless of the compilation setup, the correct include path should be <blkid.h> if using the pkg-config returned CFLAGS.	2017-04-06 14:33:02 +01:00
David Michael	7357272ed1	nspawn: check if the DNS stub is listening for requests	2017-03-31 11:34:32 -07:00
Zbigniew Jędrzejewski-Szmek	78e4f19ebc	Merge pull request #5444 from poettering/cgroups-revert-no-error Revert "core: simplify cg_[all_]unified()" and more.	2017-02-24 18:48:57 -05:00
AsciiWolf	13e785f7a0	Fix missing space in comments (#5439 )	2017-02-24 18:14:02 +01:00
Lennart Poettering	c22800e40e	cgroup: rename cg_unified() → cg_unified_controller() cg_unified() is a bit generic a name, let's make clear that it checks whether a specified controller is in unified mode.	2017-02-24 18:00:04 +01:00
Lennart Poettering	b4cccbc13a	cgroup: change cg_unified() to possibly return errors again We use our cgroup APIs in various contexts, including from our libraries sd-login, sd-bus. As we don't control those environments we can't rely that the unified cgroup setup logic succeeds, and hence really shouldn't assert on it. This more or less reverts `415fc41cea`.	2017-02-24 17:52:58 +01:00
Tejun Heo	2977724b09	core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd hierarchy Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1 name=systemd hierarchy. While this works fine for systemd itself, it breaks tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd. This patch updates the hybrid mode so that it mounts v2 hierarchy on /sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on /sys/fs/cgroup/systemd for compatibility. systemd itself doesn't depend on the "name=systemd" hierarchy at all. All operations take place on the v2 hierarchy as before but the v1 hierarchy is kept in sync so that any tools which expect it to be there can keep doing so. This allows systemd to take advantage of cgroup v2 process management without requiring other tools to be aware of the hybrid mode. The hybrid mode is implemented by mapping the special systemd controller to /sys/fs/cgroup/unified and making the basic cgroup utility operations - cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the /sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated. While a bit messy, this will allow dropping complications from using cgroup v1 for process management a lot sooner than otherwise possible which should make it a net gain in terms of maintainability. v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount point to /sys/fs/cgroup/unified as suggested by @brauner. v3: chown the compat hierarchy too on delegation. Suggested by @evverx. v4: [zj] - drop the change to default, full "legacy" is still the default.	2017-02-20 12:28:35 -05:00
Tejun Heo	415fc41cea	core: simplify cg_[all_]unified() cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchronously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().	2017-02-18 17:51:13 -05:00
Tejun Heo	bd15ab41a1	nspawn: fix cgroup mode detection cgroup mode detection is broken in two different ways. * detect_unified_cgroup_hierarchy() is called too nested in outer_child(). sync_cgroup() which is used by run() also needs to know the requested cgroup mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN. This makes it skip syncing the inner cgroup hierarchy on some config combinations. $ cat /proc/self/cgroup | grep systemd 1:name=systemd:/user.slice/user-0.slice/session-c1.scope $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container ... [root@container ~]# cat /proc/self/cgroup | grep systemd 1:name=systemd:/machine.slice/machine-container.x86_64.scope $ exit $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container [root@container ~]# cat /proc/self/cgroup | grep 0:: 0::/ $ exit Note how the unified hierarchy case's path is not synchronized with the host. This for example can cause issues when there are multiple such containers. Fixed by moving detect_unified_cgroup_hierarchy() invocation to main(). * inner_child() was invoking cg_unified_flush(). inner_child() executes fully scoped and can't determine which cgroup mode the host was in. It doesn't make sense to keep flushing the detected mode when the host mode can't change. Fixed by replacing cg_unified_flush() invocations in outer_child() and inner_child() with one in main().	2017-02-18 17:49:06 -05:00
Zbigniew Jędrzejewski-Szmek	581a07f9f0	Merge pull request #5369 from poettering/nspawn-resolved fixes for running nspawn+resolved in combination	2017-02-18 11:54:34 -05:00
Lennart Poettering	b053cd5f8e	nspawn: tweak check whether resolved is around a bit Let's check D-Bus instead of files in /run to see if resolved is running. This is a bit nicer as bus names are automatically cleaned up when resolved dies, which is not the case for files in /run. See: #4649	2017-02-17 16:06:31 -05:00
Lennart Poettering	1c876927e4	copy: change the various copy_xyz() calls to take a unified flags parameter This adds a unified "copy_flags" parameter to all copy_xyz() function calls, replacing the various boolean flags so far used. This should make many invocations more readable as it is clear what behaviour is precisely requested. This also prepares ground for adding support for more modes later on.	2017-02-17 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek	fc6149a6ce	Merge pull request #4962 from poettering/root-directory-2 Add new MountAPIVFS= boolean unit file setting + RootImage=	2017-02-08 23:05:05 -05:00
Philip Withnall	b53ede699c	nspawn: Add support for sysroot pivoting (#5258 ) Add a new --pivot-root argument to systemd-nspawn, which specifies a directory to pivot to / inside the container; while the original / is pivoted to another specified directory (if provided). This adds support for booting container images which may contain several bootable sysroots, as is common with OSTree disk images. When these disk images are booted on real hardware, ostree-prepare-root is run in conjunction with sysroot.mount in the initramfs to achieve the same results.	2017-02-08 16:54:31 +01:00
Lennart Poettering	78ebe98061	core,nspawn,dissect: make nspawn's .roothash file search reusable This makes nspawn's logic of automatically discovering the root hash of an image file generic, and then reuses it in systemd-dissect and in PID1's RootImage= logic, so that verity is automatically set up whenever we can.	2017-02-07 12:21:28 +01:00
Lennart Poettering	ced58da749	nspawn: shown exec() command is misleading There's no point in updating exec_target for each binary we try to execute, if we override it right-away anyway... Let's just do this once, and include all binaries we try each time. Follow-up for `1a68e1e543`.	2017-02-02 20:10:28 +01:00
Philip Withnall	1a68e1e543	nspawn: Print attempted execv() path on failure (#5199 ) The failure message is typically currently: execv() failed: No such file or directory which is not very useful because it doesn’t tell you which file or directory it was trying to exec.	2017-02-01 08:36:16 -05:00
Zbigniew Jędrzejewski-Szmek	ec251fe7d5	tree-wide: adjust fall through comments so that gcc is happy gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways we could deal with that. After we take into account the need to stay compatible with older versions of the compiler (and other compilers), I don't think adding __attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks out too much, a comment is just as good. But gcc has some very specific requirements how the comment should look. Adjust it to the specific form that it likes. I don't think the extra stuff we had in those comments was adding much value. (Note: the documentation seems to be wrong, and seems to describe a different pattern from the one that is actually used. I guess either the docs or the code will have to change before gcc 7 is finalized.)	2017-01-31 14:04:55 -05:00
Zbigniew Jędrzejewski-Szmek	9ce6d1b319	nspawn: fix clobbering of selinux context arg First bug fixed by gcc 7. Yikes.	2017-01-31 14:04:55 -05:00
Evgeny Vereshchagin	adc7d9f0da	nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root Fixes #4944	2017-01-17 08:40:05 +00:00
Zbigniew Jędrzejewski-Szmek	e0489532fd	nspawn: fix memleak CID #1368262: fn is allocated with new, so it should be freed.	2017-01-15 16:57:57 -05:00
Zbigniew Jędrzejewski-Szmek	6b3d378331	Merge pull request #4879 from poettering/systemd	2017-01-14 21:29:27 -05:00
Lennart Poettering	8dbf71ec58	nspawn: reword notice when /dev is pre-mounted and populated (#4971 ) Fixes: #4676	2016-12-29 11:02:39 +01:00
Lennart Poettering	87447ae459	nspawn: tweaks to /etc/resolv.conf management Handle properly if /etc is a symlink (i.e. make sure we don't follow the symlink outside the image). Also follow /etc/resolv.conf if it is a symlink, and use the resolved path when creating a mount point and mounting (as both of these operations follow symlinks and really shouldn't). Handle more types of read-only errors as debug-level issues.	2016-12-21 19:09:32 +01:00
Lennart Poettering	8ccf7e9e96	nspawn: don't complain when we can't fix the timezone of read-only containers There's nothing we can do about it, hence don't complain.	2016-12-21 19:09:32 +01:00
Lennart Poettering	e0f9e7bd03	dissect: make using a generic partition as root partition optional In preparation for reusing the image dissector in the GPT auto-discovery logic, only optionally fail the dissection when we can't identify a root partition. In the GPT auto-discovery we are completely fine with any kind of root, given that we run when it is already mounted and all we do is find some additional auxiliary partitions on the same disk.	2016-12-21 19:09:30 +01:00
Lennart Poettering	4ad14eff19	nspawn: restore --volatile=yes support This was broken by `19caffac75` which remounted the root directory to MS_SHARED before applying the volatile mount logic. This broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we need MS_MOVE in the volatile mount logic to rearrange the directory tree. Simply swap the order here, apply the volatile logic before we switch to MS_SHARED.	2016-12-21 19:09:28 +01:00
Evgeny Vereshchagin	5773024d7f	nspawn: unref the notify event source (#4941 ) Fixes: ``` sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b ... ==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15 ==21224== at 0x4C2FA50: calloc (vg_replace_malloc.c:711) ==21224== by 0x4F6F565: sd_event_new (sd-event.c:431) ==21224== by 0x1210BE: run (nspawn.c:3351) ==21224== by 0x123908: main (nspawn.c:3826) ==21224== ==21224== LEAK SUMMARY: ==21224== definitely lost: 656 bytes in 1 blocks ==21224== indirectly lost: 1,788 bytes in 11 blocks ==21224== possibly lost: 0 bytes in 0 blocks ==21224== still reachable: 8,344 bytes in 3 blocks ==21224== suppressed: 0 bytes in 0 blocks ``` Closes #4934	2016-12-21 18:36:15 +01:00
Lennart Poettering	9b6deb03fc	dissect: optionally, only look for GPT partition tables, nothing else This is useful for reusing the dissector logic in the gpt-auto-discovery logic: there we really don't want to use MBR or naked file systems as root device.	2016-12-20 20:00:09 +01:00
Lennart Poettering	75bf701f5c	nspawn: flush out environment block of the -a stub init process The container detection code in virt.c we ship checks for /proc/1/environ, looking for "container=" in it. Let's make sure our "-a" init stub exposes that correctly. Without this "systemd-detect-virt" run in a "-a" container won't detect that it is being run in a container.	2016-12-14 18:29:30 +01:00
Andrey Ulanov	6916b16464	nspawn: when getting SIGCHLD make sure it's from the first child (#4855) When getting SIGCHLD we should not assume that it was the first child forked from systemd-nspawn that has died as it may also be coming from an orphan process. This change adds a signal handler that ignores SIGCHLD unless it came from the first containerized child - the real child. Before this change the problem can be reproduced as follows: $ sudo systemd-nspawn --directory=/container-root --share-system Press ^] three times within 1s to kill container. [root@andreyu-coreos ~]# { true & } & [1] 22201 [root@andreyu-coreos ~]# Container root-fedora-latest terminated by signal KILL	2016-12-13 02:38:18 +01:00
Zbigniew Jędrzejewski-Szmek	4a5567d5d6	Merge pull request #4795 from poettering/dissect Generalize image dissection logic of nspawn, and make it useful for other tools.	2016-12-10 01:08:13 -05:00
Wim de With	2e1f244efd	nspawn: add missing -E to getopt_long (#4860 )	2016-12-10 07:33:58 +03:00
Franck Bui	5367354dae	nspawn: resolv.conf might not be created initially (#4799) It might happen that resolv.conf is missing in a minimal rootfs, and in this case the following warning is emitted: Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory This patch fixes this case.	2016-12-07 21:36:39 +01:00
Lennart Poettering	4623e8e6ac	nspawn/dissect: automatically discover dm-verity verity partitions This adds support for discovering and making use of properly tagged dm-verity data integrity partitions. This extends both systemd-nspawn and systemd-dissect with a new --root-hash= switch that takes the root hash to use for the root partition, and is otherwise fully automatic. Verity partitions are discovered automatically by GPT table type UUIDs, as listed in https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/ (which I updated prior to this change, to include new UUIDs for this purpose). mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images that carry the necessary integrity data. With that PR and this commit, the following simple lines suffice to boot up an integrity-protected container image: ``` # mkdir test # cd test # mkosi --verity # systemd-nspawn -i ./image.raw -bn ``` Note that mkosi writes the image file to "image.raw" next to a file "image.roothash" that contains the root hash. systemd-nspawn will look for that file and use it if it exists, in case --root-hash= is not specified explicitly.	2016-12-07 18:38:41 +01:00
Lennart Poettering	4827ab4854	nspawn: when generating a machine name from an image name, truncate .raw suffix Let's prettify the machine name we generate for image-based containers: let's chop off the .raw suffix before using it as machine name.	2016-12-07 18:38:41 +01:00
Lennart Poettering	18b5886e56	dissect: add support for encrypted images This adds support to the image dissector to deal with encrypted images (only LUKS). Given that we now have a neatly isolated image dissector codebase, let's add a new feature to it: support for automatically dealing with encrypted images. This is then exposed in systemd-dissect and nspawn. It's pretty basic: only support for passphrase-based encryption. In order to ensure that "systemd-dissect --mount" results in mount points whose backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libcryptsetup at the moment doesn't provide a proper API for this. Thankfully, the ioctl() API is pretty easy to use.	2016-12-07 18:38:41 +01:00
Lennart Poettering	2d8457851b	nspawn: port nspawn to new generalized image dissection code Let's make use of the new internal API. This mostly doesn't change anything for the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new code can handle disk images with no partition tables, and make any detected images directly the root.	2016-12-07 18:38:40 +01:00
Lennart Poettering	cb638b5e96	util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENT As suggested by @keszybz	2016-12-01 12:49:55 +01:00
Lennart Poettering	86c0dd4a71	nspawn: permit prefixing of source paths in --bind= and --overlay= with "+" If a source path is prefixed with "+" it is taken relative to the container's root directory instead of the host. This permits easily establishing bind and overlay mounts based on data from the container rather than the host. This also reworks custom_mounts_prepare(), and turns it into two functions: one custom_mount_check_all() that remains in nspawn.c but purely verifies the validity of the custom mounts configured. And one called custom_mount_prepare_all() that actually does the preparation step, sorts the custom mounts, resolves relative paths, and allocates temporary directories as necessary.	2016-12-01 12:41:18 +01:00
Lennart Poettering	e28c7cd066	tree-wide: set SA_RESTART for signal handlers we install We already set it in most cases, but make sure to set it in all others too, and document that that's a good idea.	2016-12-01 12:41:17 +01:00
Lennart Poettering	ad85779a50	nspawn: split out overlayfs argument parsing into a function of its own Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and bind_mount_parse().	2016-12-01 00:25:51 +01:00
Lennart Poettering	8d4aa2bb32	nspawn: make use of CHASE_NON_EXISTING when locking image If --template= is used on an image, then the image might not exist initially. We can use CHASE_NON_EXISTING to properly lock the image already before it exists. Let's do so.	2016-12-01 00:25:51 +01:00
Lennart Poettering	c4f4fce79e	fs-util: add flags parameter to chase_symlinks() Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of chase_symlinks_prefix().	2016-12-01 00:25:51 +01:00
Lennart Poettering	8cd328d82e	nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory= As suggested in PR #3667. This PR simply ensures that --template= can be used as alternative to --directory= when --ephemeral is used, following the logic that for ephemeral options the source directory is actually a template. This does not deprecate usage of --directory= with --ephemeral, as I am not convinced the old logic wouldn't make sense. Fixes: #3667	2016-12-01 00:25:51 +01:00
Lennart Poettering	3f342ec4b0	nspawn: properly handle image/directory paths that are symlinks This resolves any paths specified on --directory=, --template=, and --image= before using them. This makes sure nspawn can be used correctly on symlinked images and directory trees. Fixes: #2001	2016-12-01 00:25:51 +01:00
Lennart Poettering	e187369587	tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead Let's use chase_symlinks() everywhere, and stop using GNU canonicalize_file_name() everywhere. For most cases this should not change behaviour, however increase exposure of our function to get better tested. Most importantly in a few cases (most notably nspawn) it can take the correct root directory into account when chasing symlinks.	2016-12-01 00:25:51 +01:00
Lennart Poettering	17cbb288fa	nspawn: add fallback to normal copy/reflink when we cannot btrfs snapshot Given that other file systems (notably: xfs) support reflinks these days, let's extend the file system snapshotting logic to fall back to plain copies or reflinks when full btrfs subvolume snapshots are not available. This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn --template=" available on non-btrfs subvolumes. Of course, both operations will still be slower on non-btrfs than on btrfs (simply because reflinking each file individually in a directory tree is still slower than doing this in one step for a whole subvolume), but it's probably good enough for many cases, and we should provide the users with the tools, they have to figure out what's good for them. Note that "machinectl clone" already had a fallback like this in place, this patch generalizes this, and adds similar support to our other cases.	2016-11-22 13:35:09 +01:00
Lennart Poettering	c67b008273	nspawn: remove temporary root directory on exit When mounting a loopback image, we need a temporary root directory we can mount stuff to. Make sure to actually remove it when exiting, so that we don't leave stuff around in /tmp unnecessarily. See: #4664	2016-11-22 13:35:09 +01:00
Lennart Poettering	6a0f896b97	nspawn: try to wait for the container PID 1 to exit, before we exit Let's make the shutdown logic synchronous, so that there's a better chance to detach the loopback device after use.	2016-11-22 13:35:09 +01:00
Lennart Poettering	0f3be6ca4d	nspawn: support ephemeral boots from images Previously --ephemeral was only supported with container trees in btrfs subvolumes (i.e. in combination with --directory=). This adds support for --ephemeral in conjunction with disk images (i.e. --image=) too. As side effect this fixes that --ephemeral was accepted but ignored when using -M on a container that turned out to be an image. Fixes: #4664	2016-11-22 13:35:09 +01:00
Lennart Poettering	f4ff4aa800	Merge pull request #4395 from s-urbaniak/rw-support nspawn: R/W support for /sysfs, /proc, and /proc/sys/net	2016-11-18 12:36:46 +01:00
Sergiusz Urbaniak	4f086aab52	nspawn: R/W support for /sys, and /proc/sys This commit adds the possibility to leave /sys, and /proc/sys read-write. It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE to enable this feature. If set to "yes", /sys, and /proc/sys will be read-write. If set to "no", /sys, and /proc/sys will be read-only. If set to "network" /proc/sys/net will be read-write. This is useful in use-cases, where systemd-nspawn is used in an external network namespace. This adds the possibility to start privileged containers which need more control over settings in the /proc, and /sys filesystem. This is also a follow-up on the discussion from https://github.com/systemd/systemd/pull/4018#r76971862 where an introduction of a simple env var to enable R/W support for those directories was already discussed.	2016-11-18 09:50:40 +01:00
Zbigniew Jędrzejewski-Szmek	2a49b6120f	nspawn: restart the whole systemd-nspawn@.service unit on container reboot (#4613 ) Since 133 is now used in a few places, add a #define for it. Also make the status message a bit informative. Another issue introduced in `b006762`. The logic was borked, we were supposed to return 0 to break the loop, and 133 to restart the container, not the other way around. But this doesn't seem to work, reboot fails with: Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists So actually the version before this patch worked better, since 133 > 0 and we'd at least loop internally.	2016-11-14 11:49:49 +01:00
Christian Hesse	7debb05dbe	nspawn: fix condition for mounting resolv.conf (#4622) The file /usr/lib/systemd/resolv.conf can be stale; it does not tell us whether systemd-resolved is running. So check for /run/systemd/resolve/resolv.conf as well, which is created at runtime and hence is a better indication.	2016-11-08 22:01:26 -05:00
Zbigniew Jędrzejewski-Szmek	a809cee582	Merge pull request #4612 from keszybz/format-strings Format string tweaks (and a small fix on 32bit)	2016-11-08 08:09:40 -05:00
Martin Pitt	cfed63f60d	nspawn: fix exit code for --help and --version (#4609 ) Commit `b006762` inverted the initial exit code which is relevant for --help and --version without a particular reason. For these special options, parse_argv() returns 0 so that our main() immediately skips to the end without adjusting "ret". Otherwise, if an actual container is being started, ret is set on error in run(), which still provides the "non-zero exit on error" behaviour. Fixes #4605.	2016-11-07 23:31:55 -05:00
Zbigniew Jędrzejewski-Szmek	f97b34a629	Rename formats-util.h to format-util.h We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.	2016-11-07 10:15:08 -05:00
Lennart Poettering	493fd52f1a	Merge pull request #4510 from keszybz/tree-wide-cleanups Tree wide cleanups	2016-11-03 13:59:20 -06:00
Lennart Poettering	2bce2acce8	nspawn: if we set up a loopback device, try to mount it with "discard" Let's make sure that our loopback files remain sparse, hence let's set "discard" as mount option on file systems that support it if the backing device is a loopback.	2016-11-02 11:39:49 -06:00
Evgeny Vereshchagin	6d66bd3b2a	nspawn: become a new root early `036d523641` > vfs: Don't create inodes with a uid or gid unknown to the vfs It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955 $ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide. Press ^] three times within 1s to kill container. Child died too early. Selected user namespace base 1073283072 and range 65536. Failed to mount to /sys/fs/cgroup/systemd: No such file or directory Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519 Fixes: #4352	2016-10-23 23:23:42 -04:00
Zbigniew Jędrzejewski-Szmek	605405c6cc	tree-wide: drop NULL sentinel from strjoin This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/.c) git grep -e '\bstrjoin\b.NULL' -l\|xargs sed -i -r 's/strjoin$(.*), NULL$/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.	2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek	24597ee0e6	nspawn, NEWS: add missing "s" in --private-users-chown (#4438 )	2016-10-21 06:03:26 +03:00
Evgeny Vereshchagin	f0bef277a4	nspawn: cleanup and chown the synced cgroup hierarchy (#4223 ) Fixes: #4181	2016-10-13 09:50:46 -04:00
Zbigniew Jędrzejewski-Szmek	60e76d4897	nspawn,mount-util: add [u]mount_verbose and use it in nspawn This makes it easier to debug failed nspawn invocations: Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")... Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")... Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")... Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")... Bind-mounting /proc/sys on /proc/sys (MS_BIND "")... Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")... Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")... Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")... Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")... Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")... Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory	2016-10-11 16:50:07 -04:00
Zbigniew Jędrzejewski-Szmek	ada5412039	nspawn: simplify arg_us_cgns passing We would check the condition cg_ns_supported() twice. No functional change.	2016-10-11 16:46:58 -04:00
Lennart Poettering	6dca2fe325	Merge pull request #4332 from keszybz/nspawn-arguments-3 nspawn --private-users parsing, v2	2016-10-10 19:51:51 +02:00
Evgeny Vereshchagin	a0f72a24e0	Merge pull request #4310 from keszybz/nspawn-autodetect Autodetect systemd version in containers started by systemd-nspawn	2016-10-10 20:47:25 +03:00
Zbigniew Jędrzejewski-Szmek	be7157316c	nspawn: better error messages for parsing errors In particular, the check for arg_uid_range <= 0 is moved to the end, so that "foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".	2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek	ae209204d8	nspawn,man: fix parsing of numeric args for --private-users, accept any boolean This is like the previous reverted commit, but any boolean is still accepted, not just "yes" and "no". Man page is adjusted to match the code.	2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek	6c2058b35e	Revert "nspawn: fix parsing of numeric arguments for --private-users" This reverts commit `bfd292ec35`.	2016-10-10 11:17:40 -04:00
Zbigniew Jędrzejewski-Szmek	bfd292ec35	nspawn: fix parsing of numeric arguments for --private-users The documentation lists "yes", "no", "pick", and numeric arguments. But parse_boolean was attempted first, so various numeric arguments were misinterpreted. In particular, this fixes --private-users=0 to mean the same thing as --private-users=0:65536. While at it, use strndupa to avoid some error handling. Also give a better error for an empty UID range. I think it's likely that people will use --private-users=0:0 thinking that the argument means UID:GID.	2016-10-09 11:52:35 -04:00
Zbigniew Jędrzejewski-Szmek	27eb8e9028	nspawn: reindent table	2016-10-09 11:51:18 -04:00
Zbigniew Jędrzejewski-Szmek	a8725a06e6	nspawn: also fall back to legacy cgroup hierarchy for old containers Current systemd version detection routine cannot detect systemd 230, only systemd >= 231. This means that we'll still use the legacy hierarchy in some cases where we wouldn't have to. If somebody figures out a nice way to detect systemd 230 this can be later improved.	2016-10-08 19:03:53 -04:00
Zbigniew Jędrzejewski-Szmek	0fd9563fde	nspawn: use mixed cgroup hierarchy only when container has new systemd systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy. So make an educated guess, and use the mixed hierarchy in that case. Tested by running the host with mixed hierarchy (i.e. simply using a recent kernel with systemd from git), and booting first a container with older systemd, and then one with a newer systemd. Fixes #4008.	2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek	27e29a1e43	nspawn: fix spurious reboot if container process returns 133	2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek	b006762524	nspawn: move the main loop body out to a new function The new function has 416 lines by itself! "return log_error_errno" is used to nicely reduce the volume of error handling code. A few minor issues are fixed on the way: - positive value was used as error value (EIO), causing systemd-nspawn to return success, even though it shouldn't. - In two places random values were used as error status, when the actual value was in an unusual place (etc_password_lock, notify_socket). Those are the only functional changes. There is another potential issue, which is marked with a comment, and left unresolved: the container can also return 133 by itself, causing a spurious reboot.	2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek	98afd6af3a	nspawn: check env var first, detect second If we are going to use the env var to override the detection result anyway, there is no point in doing the detection, especially since it can fail.	2016-10-08 14:48:41 -04:00
Lennart Poettering	7429b2eb83	tree-wide: drop some misleading compiler warnings gcc at some optimization levels thinks these variables were used without initialization. It's wrong, but let's make the message go away anyway.	2016-10-06 19:04:10 +02:00
Djalal Harouni	41eb436265	nspawn: add log message to let users know that nspawn needs an empty /dev directory (#4226 ) Fixes https://github.com/systemd/systemd/issues/3695 At the same time it adds a protection against userns chown of inodes of a shared mount point.	2016-10-05 06:57:02 +02:00
Alban Crequy	19caffac75	nspawn: set shared propagation mode for the container	2016-10-03 14:19:27 +02:00
Evgeny Vereshchagin	cc238590e4	Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1 core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes	2016-09-28 04:50:30 +03:00
Torstein Husebø	d23a0044a3	treewide: fix typos (#4217 )	2016-09-26 11:32:47 +02:00
Lennart Poettering	6b7c9f8bce	namespace: rework how ReadWritePaths= is applied Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths= specification, then we'd first recursively apply the ReadOnlyPaths= paths, and make everything below read-only, only in order to then flip the read-only bit again for the subdirs listed in ReadWritePaths= below it. This is not only ugly (as for the dirs in question we first turn on the RO bit, only to turn it off again immediately after), but also problematic in containers, where a container manager might have marked a set of dirs read-only and this code will undo this if ReadWritePaths= is set for any. With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be applied to the children listed in ReadWritePaths= in the first place, so that we do not need to turn off the RO bit for those after all. This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the RO bit, but never to turn it off again. Or to say this differently: if some dirs are marked read-only via some external tool, then ReadWritePaths= will not undo it. This is not only the safer option, but also more in-line with what the man page currently claims: "Entries (files or directories) listed in ReadWritePaths= are accessible from within the namespace with the same access rights as from outside." To implement this change bind_remount_recursive() gained a new "blacklist" string list parameter, which when passed may contain subdirs that shall be excluded from the read-only mounting. A number of functions are updated to add more debug logging to make this more digestible.	2016-09-25 10:40:51 +02:00
Luca Bruno	48a8d337a6	nspawn: decouple --boot from CLONE_NEWIPC (#4180 ) This commit is a minor tweak after the split of `--share-system`, decoupling the `--boot` option from IPC namespacing. Historically there has been a single `--share-system` option for sharing IPC/PID/UTS with the host, which was incompatible with boot/pid1 mode. After the split, it is now possible to express the requirements with better granularity. For reference, this is a followup to #4023 which contains references to previous discussions. I realized too late that CLONE_NEWIPC is not strictly needed for boot mode.	2016-09-24 08:30:42 -04:00
Michael Pope	21dc02277d	nspawn: fix comment typo in setup_timezone example (#4183 )	2016-09-20 07:30:48 +02:00
Michael Pope	0b493a0263	nspawn: clarify log warning for /etc/localtime not being a symbolic link (#4163 )	2016-09-17 09:59:28 +02:00
Luca Bruno	0c582db0c6	nspawn: split down SYSTEMD_NSPAWN_SHARE_SYSTEM (#4023 ) This commit follows further on the deprecation path for --share-system, by splitting and gating each share-able namespace behind its own environment flag.	2016-08-26 00:08:26 +02:00
Tejun Heo	5da38d0768	core: use the unified hierarchy for the systemd cgroup controller hierarchy Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller is on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().	2016-08-17 17:44:36 -04:00
Tejun Heo	ca2f6384aa	core: rename cg_unified() to cg_all_unified() A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.	2016-08-15 18:13:36 -04:00
Lennart Poettering	07a1734a13	Merge pull request #3885 from keszybz/help-output Update help for "short-full" and shorten to 80 columns	2016-08-04 16:11:38 +02:00
Zbigniew Jędrzejewski-Szmek	90b4a64d77	nspawn,resolve: shorten --help output to fit within 80 columns make dist-check-help FTW!	2016-08-04 09:03:42 -04:00
Lennart Poettering	f7b7b3df9e	nspawn: if we can't mark the boot ID RO let's fail It's probably better to be safe here.	2016-08-03 14:52:16 +02:00
Lennart Poettering	a6b5216c7c	nspawn: deprecate --share-system support This removes the --share-system switch: from the documentation, the --help text as well as the command line parsing. It's an ugly option, given that it kinda contradicts the whole concept of PID namespaces that nspawn implements. Since it's barely ever used, let's just deprecate it and remove it from the options. It might be useful as a debugging option, hence the functionality is kept around for now, exposed via an undocumented $SYSTEMD_NSPAWN_SHARE_SYSTEM environment variable.	2016-08-03 14:52:16 +02:00
Lennart Poettering	3539724c26	nspawn: try to bind mount resolved's resolv.conf snippet into the container This has the benefit that the container can follow the host's DNS server changes without us having to constantly update the container's resolv.conf settings.	2016-08-03 14:52:16 +02:00
Christian Brauner	5a8ff0e61d	nspawn: add SYSTEMD_NSPAWN_USE_CGNS env variable (#3809) SYSTEMD_NSPAWN_USE_CGNS allows disabling the use of cgroup namespaces.	2016-07-26 16:49:15 +02:00
Zbigniew Jędrzejewski-Szmek	e28973ee18	Merge pull request #3757 from poettering/efi-search	2016-07-25 16:34:18 -04:00
Lennart Poettering	1a0b98c437	Merge pull request #3589 from brauner/cgroup_namespace Cgroup namespace	2016-07-25 22:23:00 +02:00
Zbigniew Jędrzejewski-Szmek	476b8254d9	nspawn: don't skip cleanup on locking error	2016-07-22 21:25:09 -04:00
Lennart Poettering	15b1248a6b	machine-id-setup: port machine_id_commit() to new id128-util.c APIs	2016-07-22 12:59:36 +02:00
Lennart Poettering	317feb4d9f	nspawn: rework /etc/machine-id handling With this change we'll no longer write to /etc/machine-id from nspawn, as that breaks the --volatile= operation, as it ensures the image is never considered in "first boot", since that's bound to the pre-existence of /etc/machine-id. The new logic works like this: - If /etc/machine-id already exists in the container, it is read by nspawn and exposed in "machinectl status" and friends. - If the file doesn't exist yet, but --uuid= is passed on the nspawn cmdline, this UUID is passed in $container_uuid to PID 1, and PID 1 is then expected to persist this to /etc/machine-id for future boots (which systemd already does). - If the file doesn't exist yet, and no --uuid= is passed, a random UUID is generated and passed via $container_uuid. The result is that /etc/machine-id is never initialized by nspawn itself, thus unbreaking the volatile mode. However, the machine ID configured in the machine still always matches nspawn's and thus machined's idea of it. Fixes: #3611	2016-07-22 12:59:36 +02:00
Lennart Poettering	691675ba9f	nspawn: rework machine/boot ID handling code to use new calls from id128-util.[ch]	2016-07-22 12:59:36 +02:00
Lennart Poettering	910fd145f4	sd-id128: split UUID file read/write code into new id128-util.[ch] We currently have code to read and write files containing UUIDs at various places. Unify this in id128-util.[ch], and move some other stuff there too. The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/), because they are actually the backend of sd_id128_get_machine() and sd_id128_get_boot(). In follow-up patches we can use this to reduce the code in nspawn and machine-id-setup by adopting the common implementation.	2016-07-22 12:59:36 +02:00
Lennart Poettering	3bbaff3e08	tree-wide: use sd_id128_is_null() instead of sd_id128_equal where appropriate It's a bit easier to read because shorter. Also, most likely a tiny bit faster.	2016-07-22 12:38:08 +02:00
Lennart Poettering	a6bc7db980	nspawn: if an ESP is part of the disk image to operate on, mount it to /efi or /boot Matching the behaviour of gpt-auto-generator, if we find an ESP while dissecting a container image, mount it to /efi or /boot if those dirs exist and are empty. This should enable us to run "bootctl" inside a container and do the right thing.	2016-07-21 11:10:35 +02:00
Lennart Poettering	1ddc1272e7	nspawn: when netns is on, mount /proc/sys/net writable Normally we make all of /proc/sys read-only in a container, but if we do have netns enabled we can make /proc/sys/net writable, as things are virtualized then.	2016-07-20 14:53:15 +02:00
Lennart Poettering	065d31c360	nspawn: document why the uid shift range is the way it is	2016-07-20 14:53:15 +02:00
Thomas Hindoe Paaboel Andersen	ba19c6e181	treewide: remove unused variables	2016-07-18 22:32:08 +02:00
Zbigniew Jędrzejewski-Szmek	2ed968802c	tree-wide: get rid of selinux_context_t (#3732) `9eb9c93275` deprecated selinux_context_t. Replace with a simple char* everywhere. Alternative fix for #3719.	2016-07-15 18:44:02 +02:00
Michael Biebl	595bfe7df2	Various fixes for typos found by lintian (#3705)	2016-07-12 12:52:11 +02:00
Christian Brauner	0996ef00fb	nspawn: handle cgroup namespaces (NOTE: Cgroup namespaces work with legacy and unified hierarchies: "This is completely backward compatible and will be completely invisible to any existing cgroup users (except for those running inside a cgroup namespace and looking at /proc/pid/cgroup of tasks outside their namespace.)" (https://lists.linuxfoundation.org/pipermail/containers/2016-January/036582.html) So there is no need to special case unified.) If cgroup namespaces are supported we skip mount_cgroups() in the outer_child(). Instead, we unshare(CLONE_NEWCGROUP) in the inner_child() and only then do we call mount_cgroups(). The clean way to handle cgroup namespaces would be to delegate mounting of cgroups completely to the init system in the container. However, this would likely break backward compatibility with the UNIFIED_CGROUP_HIERARCHY flag of systemd-nspawn. Also no cgroupfs would be mounted whenever the user simply requests a shell and no init is available to mount cgroups. Hence, we introduce mount_legacy_cgns_supported(). After calling unshare(CLONE_NEWCGROUP) it parses /proc/self/cgroup to find the mounted controllers and mounts them inside the new cgroup namespace. This should preserve backward compatibility with the UNIFIED_CGROUP_HIERARCHY flag and mount a cgroupfs when no init in the container is running.	2016-07-09 06:34:11 +02:00
Lennart Poettering	50b52222f2	nspawn: order caps to retain alphabetically	2016-06-13 16:25:54 +02:00
Alessandro Puccetti	9c1e04d0fa	nspawn: introduce --notify-ready=[no|yes] (#3474) This patch implements a notification mechanism from the init process in the container to systemd-nspawn. The switch --notify-ready=yes configures systemd-nspawn to wait for the "READY=1" message from the init process in the container before sending its own to systemd. --notify-ready=no is equivalent to the behavior before this patch: systemd-nspawn notifies systemd with a "READY=1" message when the container is created. This notification mechanism uses a socket file with a path relative to the container, "/run/systemd/nspawn/notify". The default value is --notify-ready=no. It is also possible to configure this mechanism from the .nspawn files using NotifyReady. This parameter takes the same options as the command line switch. Before this patch, systemd-nspawn notifies "ready" after the inner child was created, regardless of the status of the service running inside it. Now, with --notify-ready=yes, systemd-nspawn notifies when the service is ready. This is really useful when there are dependencies between different containers. Fixes https://github.com/systemd/systemd/issues/1369 Based on the work from https://github.com/systemd/systemd/pull/3022 Testing: Boot an OS inside a container with systemd-nspawn. Note: modify the commands according to your filesystem. 1. Create a filesystem where you can boot an OS. 2. sudo systemd-nspawn -D ${HOME}/distros/fedora-23/ sh 2.1. Create the unit file /etc/systemd/system/sleep.service inside the container (You can use the example below) 2.2. systemctl enable sleep 2.3 exit 3. sudo systemd-run --service-type=notify --unit=notify-test ${HOME}/systemd/systemd-nspawn --notify-ready=yes -D ${HOME}/distros/fedora-23/ -b 4. In a different shell run "systemctl status notify-test" When using --notify-ready=yes the service status is "activating" for 20 seconds before being set to "active (running)". Instead, using --notify-ready=no the service status is marked "active (running)" quickly, without waiting for the 20 seconds. This patch was also tested with --private-users=yes; you can test it by adding it at the end of the command at point 3. ------ sleep.service ------ [Unit] Description=sleep After=network.target [Service] Type=oneshot ExecStart=/bin/sleep 20 [Install] WantedBy=multi-user.target ------------ end ------------	2016-06-10 13:09:06 +02:00
Michael Karcher	8869a0b40b	util-lib: Add sparc64 support for process creation (#3348) The current raw_clone function takes two arguments, the cloning flags and a pointer to the stack for the cloned child. The raw cloning without passing a "thread main" function does not make sense if a new stack is specified, as it returns in both the parent and the child, which will fail in the child as the stack is virgin. All uses of raw_clone indeed pass NULL for the stack pointer which indicates that both processes should share the stack address (so you had better not pass CLONE_VM). This commit refactors the code to not require the caller to pass the stack address, as NULL is the only sensible option. It also adds the magic code needed to make raw_clone work on sparc64, which does not return 0 in %o0 for the child, but indicates the child process by setting %o1 to non-zero. This refactoring is not purely aesthetic, because non-NULL stack addresses need to get mangled before being passed to the clone syscall (you have to apply STACK_BIAS), whereas NULL must not be mangled. Implementing the conditional mangling of the stack address would needlessly complicate the code. raw_clone is moved to a separate header, because the burden of including the assert machinery and sched.h shouldn't be applied to every user of missing_syscalls.h	2016-05-29 20:03:51 -04:00
Djalal Harouni	520e0d541f	nspawn: rename arg_retain to arg_caps_retain The argument is about capabilities.	2016-05-26 22:43:34 +02:00
Djalal Harouni	f011b0b87a	nspawn: split out seccomp call into nspawn-seccomp.[ch] Split seccomp into nspawn-seccomp.[ch]. Currently there are no changes, but this will make it easy in the future to share or use the seccomp logic from systemd core.	2016-05-26 22:42:29 +02:00
Zbigniew Jędrzejewski-Szmek	b5a2179b10	nspawn: remove unreachable return statement (#3320)	2016-05-22 13:02:41 +02:00
Lennart Poettering	2099b3e993	nspawn: drop spurious newline	2016-05-12 20:14:58 +02:00
Lennart Poettering	7513c5b89f	nspawn: only remove veth links we created ourselves Let's make sure we don't remove veth links that existed before nspawn was invoked. https://github.com/systemd/systemd/pull/3209#discussion_r62439999	2016-05-09 15:45:31 +02:00
Lennart Poettering	22b28dfdc7	nspawn: add new --network-zone= switch for automatically managed bridge devices This adds a new concept of network "zones", which are little more than bridge devices that are automatically managed by nspawn: when the first container referencing a bridge is started, the bridge device is created, when the last container referencing it is removed the bridge device is removed again. Besides this logic --network-zone= is pretty much identical to --network-bridge=. The usecase for this is to make it easy to run multiple related containers (think MySQL in one and Apache in another) in a common, named virtual Ethernet broadcast zone, that only exists as long as one of them is running, and fully automatically managed otherwise.	2016-05-09 15:45:31 +02:00
Lennart Poettering	ef76dff225	util-lib: add new ifname_valid() call that validates interface names Make use of this in nspawn at a couple of places. A later commit should port more code over to this, including networkd.	2016-05-09 15:45:31 +02:00
Zbigniew Jędrzejewski-Szmek	5ab1cef0db	Merge pull request #3111 from poettering/nspawn-remove-veth	2016-05-03 13:53:00 -04:00
Zbigniew Jędrzejewski-Szmek	c29f959b44	Revert "nspawn: explicitly remove veth links after use (#3111)" This reverts commit `d2773e59de`. Merge got squashed by mistake.	2016-05-03 13:53:00 -04:00
Evgeny Vereshchagin	e192a2815e	nspawn: convert uuid to string (#3146) Fixes: cp /etc/machine-id /var/tmp/systemd-test.HccKPa/nspawn-root/etc systemd-nspawn -D /var/tmp/systemd-test.HccKPa/nspawn-root --link-journal host -b ... Host and machine ids are equal (P�S!V): refusing to link journals	2016-04-29 10:38:35 +02:00
Evgeny Vereshchagin	5aa3eba50c	nspawn: initialize the veth_name (#3141) Fixes: $ systemd-nspawn -h ... Failed to remove veth interface ��: Operation not permitted This is a follow-up for `d2773e59de`	2016-04-28 19:48:17 +02:00
Lennart Poettering	d7fe83bbc2	Merge pull request #3093 from poettering/nspawn-userns-magic nspawn automatic user namespaces	2016-04-26 14:57:04 +02:00
Lennart Poettering	d2773e59de	nspawn: explicitly remove veth links after use (#3111) * sd-netlink: permit RTM_DELLINK messages with no ifindex This is useful for removing network interfaces by name. * nspawn: explicitly remove veth links we created after use Sometimes the kernel keeps veth links pinned after the namespace they have been joined to died. Let's hence explicitly remove veth links after use. Fixes: #2173	2016-04-25 17:36:51 +02:00
Lennart Poettering	ef3b2aa7a1	nspawn: explicitly remove veth links we created after use Sometimes the kernel keeps veth links pinned after the namespace they have been joined to died. Let's hence explicitly remove veth links after use. Fixes: #2173	2016-04-25 13:44:24 +02:00
Lennart Poettering	ccabee0d64	nspawn: make -U a tiny bit smarter With this change -U will turn on user namespacing only if the kernel actually supports it and otherwise gracefully degrade to non-userns mode.	2016-04-25 12:16:02 +02:00
Lennart Poettering	0de7accea9	nspawn: allow configuration of user namespaces in .nspawn files In order to implement this we change the bool arg_userns into an enum UserNamespaceMode, which can take one of NO, PICK or FIXED, and replace the arg_uid_range_pick bool with it.	2016-04-25 12:16:02 +02:00
Lennart Poettering	19aac838fc	nspawn: add -U as shortcut for --private-users=pick Given that user namespacing is pretty useful now, let's add a shortcut command line switch for the logic.	2016-04-25 12:16:02 +02:00
Lennart Poettering	0e7ac7515f	nspawn: optionally, automatically allocate a UID/GID range for userns containers This adds the new value "pick" to --private-users=. When specified a new UID/GID range of 65536 users is automatically and randomly allocated from the host range 0x00080000-0xDFFF0000 and used for the container. The setting implies --private-users-chown, so that container directory is recursively chown()ed to the newly allocated UID/GID range, if that's necessary. As an optimization before picking a randomized UID/GID the UID of the container's root directory is used as starting point and used if currently not used otherwise. To protect against using the same UID/GID range multiple times a few mechanisms are in place: - The first and the last UID and GID of the range are checked with getpwuid() and getgrgid(). If an entry already exists a different range is picked. Note that by "last" UID the user 65534 is used, as 65535 is the 16bit (uid_t) -1. - A lock file for the range is taken in /run/systemd/nspawn-uid/. Since the ranges are taken in a non-overlapping fashion, and always start on 64K boundaries this allows us to maintain a single lock file for each range that can be randomly picked. This protects nspawn from picking the same range in two parallel instances. - If possible the /etc/passwd lock file is taken while a new range is selected until the container is up. This means adduser/addgroup should safely avoid the range as long as nss-mymachines is used, since the allocated range will then show up in the user database. The UID/GID range nspawn picks from is compiled in and not configurable at the moment. That should probably stay that way, since we already provide ways how users can pick their own ranges manually if they don't like the automatic logic. The new --private-users=pick logic makes user namespacing pretty useful now, as it relieves the user from managing UID/GID ranges.	2016-04-25 12:16:02 +02:00
Lennart Poettering	7336138eed	nspawn: optionally fix up OS tree uid/gids for userns This adds a new --private-users-chown switch that may be used in combination with --private-users. If it is passed, a recursive chown() operation is run on the OS tree, fixing all file owner UID/GIDs to the right ranges. This should make user namespacing pretty workable, as the OS trees don't need to be prepared manually anymore.	2016-04-25 12:15:57 +02:00
Thomas H. P. Andersen	0f5e13822d	tree-wide: remove unused variables (#3098)	2016-04-22 20:49:07 -04:00
Zbigniew Jędrzejewski-Szmek	ccddd104fc	tree-wide: use mdash instead of two minuses	2016-04-21 23:00:13 -04:00
Zbigniew Jędrzejewski-Szmek	a5f1cb3bad	nspawn: add -E as alias for --setenv v2: - "=" is required, so remove the <optional> tags that v1 added	2016-04-20 09:00:39 -04:00
Lennart Poettering	70a399c43a	Merge pull request #3014 from msekletar/nspawn-empty-machine-id-v3 nspawn: always setup machine id (v3)	2016-04-11 17:27:11 +02:00
Michal Sekletar	e01ff70a77	nspawn: always setup machine id We check /etc/machine-id of the container and if it is already populated we use value from there, possibly ignoring value of --uuid option from the command line. When dealing with R/O image we setup transient machine id. Once we determined machine id of the container, we use this value for registration with systemd-machined and we also export it via container_uuid environment variable. As registration with systemd-machined is done by the main nspawn process we communicate container machine id established by setup_machine_id from outer child to the main process by unix domain socket. Similarly to PID of inner child.	2016-04-11 16:43:16 +02:00
Zbigniew Jędrzejewski-Szmek	d929b0f98b	nspawn: ignore failure to chdir CID #1322380.	2016-04-08 21:09:06 -04:00
Evgeny Vereshchagin	1c1ea21735	nspawn: don't run nspawn --port=... without libiptc support We get $ systemd-nspawn --image /dev/loop1 --port 8080:80 -n -b 3 --port= is not supported, compiled without libiptc support. instead of a ping-nc-iptables debugging session	2016-03-17 21:07:11 +00:00
Dan Walsh	68b020494d	/dev/console must be labeled with SELinux label If the user specifies an selinux_apifs_context, all content created in the container including /dev/console should use this label. Currently when this uses the default label it gets labeled user_devpts_t, which would require us to write a policy allowing container processes to manage user_devpts_t. This means that an escaped process would be allowed to attack all users' terminals as well as other container terminals. Changing the label to match the apifs_context means the processes would only be allowed to manage their specific tty. This change fixes a problem preventing RKT containers from working with systemd-nspawn.	2016-03-09 11:19:45 -05:00
Vito Caputo	9ed794a32d	tree-wide: minor formatting inconsistency cleanups	2016-02-23 14:20:34 -08:00
Vito Caputo	313cefa1d9	tree-wide: make ++/-- usage consistent WRT spacing Throughout the tree there's spurious use of spaces separating ++ and -- operators from their respective operands. Make ++ and -- operator consistent with the majority of existing uses; discard the spaces.	2016-02-22 20:32:04 -08:00
Lennart Poettering	91ba5ac7d0	Merge pull request #2589 from keszybz/resolve-tool-2 Better support of OPENPGPKEY, CAA, TLSA packets and tests	2016-02-13 11:15:41 +01:00
Zbigniew Jędrzejewski-Szmek	75f32f047c	Add memcpy_safe ISO/IEC 9899:1999 §7.21.1/2 says: Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. In base64_append_width memcpy was called as memcpy(x, NULL, 0). GCC 4.9 started making use of this and assumes This worked fine under -O0, but does something strange under -O3. This patch fixes a bug in base64_append_width(), fixes a possible bug in journal_file_append_entry_internal(), and makes use of the new function to simplify the code in other places.	2016-02-11 13:07:02 -05:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Lennart Poettering	2b26a72816	nspawn: make sure --help fits in 79ch	2016-02-03 23:58:25 +01:00
Lennart Poettering	7732f92bad	nspawn: optionally run a stub init process as PID 1 This adds a new switch --as-pid2, which allows running commands as PID 2, while a stub init process is run as PID 1. This is useful in order to run arbitrary commands in a container, as PID1's semantics are different from all other processes regarding reaping of unknown children or signal handling.	2016-02-03 23:58:24 +01:00
Lennart Poettering	5f932eb9af	nspawn: add new --chdir= switch Fixes: #2192	2016-02-03 23:58:24 +01:00
Lennart Poettering	ba8e6c4d0e	nspawn: make sure --link-journal=host may be used twice in a row Fixes #2186 This fixes fall-out from `574edc9006`.	2016-01-28 20:24:28 +01:00
Lennart Poettering	8054d749c4	nspawn: make journal linking non-fatal in try and auto modes Fixes #2091	2016-01-28 20:16:44 +01:00
Michal Sekletar	61e741ed3d	nspawn: fix memory leak	2016-01-25 12:06:38 +01:00
Ismo Puustinen	a103496ca5	capabilities: keep bounding set in non-inverted format. Change the capability bounding set parser and logic so that the bounding set is kept as a positive set internally. This means that the set reflects those capabilities that we want to keep instead of drop.	2016-01-12 12:14:50 +02:00
Lennart Poettering	4afd3348c7	tree-wide: expose "p"-suffix unref calls in public APIs to make gcc cleanup easy GLIB has recently started to officially support the gcc cleanup attribute in its public API, hence let's do the same for our APIs. With this patch we'll define an xyz_unrefp() call for each public xyz_unref() call, to make it easy to use inside a __attribute__((cleanup())) expression. Then, all code is ported over to make use of this. The new calls are also documented in the man pages, with examples how to use them (well, I only added docs where the _unref() call itself already had docs, and the examples, only cover sd_bus_unrefp() and sd_event_unrefp()). This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we tend to call our destructors these days. Note that this defines no public macro that wraps gcc's attribute and makes it easier to use. While I think it's our duty in the library to make our stuff easy to use, I figure it's not our duty to make gcc's own features easy to use on its own. Most likely, client code which wants to make use of this should define its own: #define _cleanup_(function) __attribute__((cleanup(function))) Or similar, to make the gcc feature easier to use. Making this logic public has the benefit that we can remove three header files whose only purpose was to define these functions internally. See #2008.	2015-11-27 19:19:36 +01:00
Lennart Poettering	4a0b58c4a3	tree-wide: use right cast macros for UIDs, GIDs and PIDs	2015-11-17 00:52:10 +01:00
Lennart Poettering	f6d6bad146	nspawn: add new --network-veth-extra= switch for defining additional veth links The new switch operates like --network-veth, but may be specified multiple times (to define multiple link pairs) and allows flexible definition of the interface names. This is an independent reimplementation of #1678, but defines different semantics, keeping the behaviour completely independent of --network-veth. It also comes with full hook-up for .nspawn files, and the matching documentation.	2015-11-12 22:04:49 +01:00
Daniel Mack	b0bc8dbd73	Merge pull request #1820 from michich/errno-v2 [v2] treewide: treatment of errno and other cleanups	2015-11-09 21:56:49 +01:00
Michal Schmidt	e1427b138f	treewide: apply errno.cocci with small manual cleanups for style.	2015-11-09 20:01:06 +01:00
Lennart Poettering	6c9e781eba	Merge pull request #1799 from jengelh/doc doc: typo and ortho fixes	2015-11-09 18:16:21 +01:00
Iago López Galeiras	6aadfa4c52	nspawn: support custom container service name We were hardcoding "systemd-nspawn" as the value of the $container env variable and "nspawn" as the service string in machined registration. This commit allows the user to configure it by setting the $SYSTEMD_NSPAWN_CONTAINER_SERVICE env variable when calling systemd-nspawn. If $SYSTEMD_NSPAWN_CONTAINER_SERVICE is not set, we use the string "systemd-nspawn" for both, fixing the previous inconsistency.	2015-11-09 16:40:05 +01:00
Jan Engelhardt	a8eaaee72a	doc: correct orthography, word forms and missing/extraneous words	2015-11-06 13:45:21 +01:00
Jan Engelhardt	b938cb902c	doc: correct punctuation and improve typography in documentation	2015-11-06 13:00:02 +01:00
Michal Schmidt	35607a8d1c	nspawn: save errno before reopening log after exec failure	2015-11-05 13:44:12 +01:00
Michal Schmidt	070edd97f3	nspawn: no fake errno The S_ISREG test does not set errno, so don't use it in the error message.	2015-11-05 13:44:11 +01:00
Michal Schmidt	4314d33f51	nspawn: simplify error returns Use the "return log_error_errno(...)" idiom to have fewer curly braces. The last hunk also fixes the return value of setup_journal(), but the fix has no practical effect.	2015-11-05 13:44:10 +01:00
Michal Schmidt	709f6e46a3	treewide: use the negative error codes returned by our functions Our functions return negative error codes. Do not rely on errno being set after calling our own functions.	2015-11-05 13:44:06 +01:00
Lennart Poettering	97044145b4	core,nspawn: minor coding style fixes	2015-10-31 19:09:20 +01:00
Susant Sahani	6cbe4ed1e1	nspawn: port to extract_first_word	2015-10-28 22:59:01 +05:30
Lennart Poettering	b5efdb8af4	util-lib: split out allocation calls into alloc-util.[ch]	2015-10-27 13:45:53 +01:00
Lennart Poettering	15a5e95075	util-lib: split out printf() helpers to stdio-util.h	2015-10-27 13:25:57 +01:00
Lennart Poettering	430f0182b7	src/basic: rename audit.[ch] → audit-util.[ch] and capability.[ch] → capability-util.[ch] The files are named too generically, so that they might conflict with the upstream project headers. Hence, let's add a "-util" suffix, to clarify that these are just our utility headers and not any official upstream headers.	2015-10-27 13:25:57 +01:00
Lennart Poettering	affb60b1ef	util-lib: split out umask-related code to umask-util.h	2015-10-27 13:25:56 +01:00
Lennart Poettering	8fcde01280	util-lib: split stat()/statfs()/statvfs() related calls into stat-util.[ch]	2015-10-27 13:25:56 +01:00
Lennart Poettering	f4f15635ec	util-lib: move a number of fs operations into fs-util.[ch]	2015-10-27 13:25:56 +01:00
Lennart Poettering	4349cd7c1d	util-lib: move mount related utility calls to mount-util.[ch]	2015-10-27 13:25:55 +01:00
Lennart Poettering	6bedfcbb29	util-lib: split string parsing related calls from util.[ch] into parse-util.[ch]	2015-10-27 13:25:55 +01:00
Lennart Poettering	2583fbea8e	socket-util: move remaining socket-related calls from util.[ch] to socket-util.[ch]	2015-10-26 01:24:39 +01:00
Lennart Poettering	b1d4f8e154	util-lib: split out user/group/uid/gid calls into user-util.[ch]	2015-10-26 01:24:38 +01:00
Lennart Poettering	3ffd4af220	util-lib: split out fd-related operations into fd-util.[ch] There are more than enough to deserve their own .c file, hence move them over.	2015-10-25 13:19:18 +01:00
Lennart Poettering	07630cea1f	util-lib: split out string related calls from util.[ch] into its own file string-util.[ch] There are more than enough calls doing string manipulations to deserve their own files, hence do something about it. This patch also sorts the #include blocks of all files that needed to be updated, according to the sorting suggestions from CODING_STYLE. Since pretty much every file needs our string manipulation functions this effectively means that most files have sorted #include blocks now. Also touches a few unrelated include files.	2015-10-24 23:05:02 +02:00
Lennart Poettering	0f03c2a4c0	path-util: unify how we process paths specified on the command line Let's introduce a common function that makes relative paths absolute and warns about any errors while doing so.	2015-10-24 23:03:49 +02:00
Lennart Poettering	0f47436510	util-lib: get_current_dir_name() can return errors other than ENOMEM get_current_dir_name() can return a variety of errors, not just ENOMEM, hence don't blindly turn its errors to ENOMEM, but return correct errors in path_make_absolute_cwd(). This trickles down into a couple of other functions, some of which receive unrelated minor fixes too with this commit.	2015-10-24 23:03:49 +02:00
Lennart Poettering	16fb773ee3	nspawn: don't try to resolve passed binary before entering namespace Otherwise we might follow the symlinks on the host, instead of the container. Fixes #1400	2015-10-22 01:59:25 +02:00
Lennart Poettering	0e2656744f	nspawn: rework how we determine private networking settings Make sure we acquire CAP_NET_ADMIN if we require virtual networking. Make sure we imply virtual ethernet correctly when a bridge is requested. Fixes: #1511 Fixes: #1554 Fixes: #1590	2015-10-22 01:59:25 +02:00
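
The range selection described in commit 0e7ac7515f (--private-users=pick) can be illustrated with a small C sketch. This is not nspawn's code: the bounds, the 64K range size and the choice of 65534 as the "last" UID come from the commit message, while every function and macro name below is hypothetical, and treating 0xDFFF0000 as the highest permitted range start is an assumption. Lock files and the /etc/passwd lock are left out.

```c
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Bounds and range size as quoted in the commit message; names are made up. */
#define UID_SHIFT_PICK_MIN UINT32_C(0x00080000)
#define UID_SHIFT_PICK_MAX UINT32_C(0xDFFF0000) /* assumed: highest allowed start */
#define UID_RANGE_SIZE     UINT32_C(0x10000)    /* 65536 users per container */

/* A range start is acceptable if it lies within the bounds and on a 64K boundary. */
static bool uid_shift_is_valid(uint32_t shift) {
        return shift >= UID_SHIFT_PICK_MIN &&
               shift <= UID_SHIFT_PICK_MAX &&
               (shift & (UID_RANGE_SIZE - 1)) == 0;
}

/* Map an arbitrary 32-bit value (e.g. a random number) onto a candidate start. */
static uint32_t uid_shift_candidate(uint32_t seed) {
        uint32_t slots = (UID_SHIFT_PICK_MAX - UID_SHIFT_PICK_MIN) / UID_RANGE_SIZE + 1;

        return UID_SHIFT_PICK_MIN + (seed % slots) * UID_RANGE_SIZE;
}

/* The "last" UID that gets probed is offset 65534, since 65535 is the 16bit -1. */
static uint32_t uid_range_last_probe(uint32_t shift) {
        assert(uid_shift_is_valid(shift));
        return shift + UINT32_C(65534);
}

/* A range looks free if neither end shows up in the user or group database. */
static bool uid_range_looks_free(uint32_t shift) {
        uid_t first = (uid_t) shift, last = (uid_t) uid_range_last_probe(shift);

        return !getpwuid(first) && !getpwuid(last) &&
               !getgrgid((gid_t) first) && !getgrgid((gid_t) last);
}
```

A caller would draw a random seed, compute a candidate with uid_shift_candidate(), and retry with a new seed whenever uid_range_looks_free() returns false.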

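
Commit 4afd3348c7 suggests that client code define its own _cleanup_() wrapper around gcc's cleanup attribute. A minimal sketch of how such a wrapper is typically used follows. The macro is the one quoted in the commit message; buffer_freep(), use_buffer() and the n_freed counter are hypothetical names introduced only for illustration, and the example assumes a compiler that supports the attribute (gcc or clang).

```c
#include <assert.h>
#include <stdlib.h>

/* The wrapper macro quoted in the commit message. */
#define _cleanup_(function) __attribute__((cleanup(function)))

/* Hypothetical counter, only here to make the cleanup call observable. */
static int n_freed = 0;

/* Hypothetical "p"-suffix destructor: it receives a pointer to the variable. */
static void buffer_freep(char **p) {
        assert(p);
        free(*p); /* free(NULL) is a no-op, so a failed allocation is fine too */
        n_freed++;
}

static int use_buffer(void) {
        /* buffer_freep(&buf) runs automatically when buf goes out of scope. */
        _cleanup_(buffer_freep) char *buf = malloc(16);

        if (!buf)
                return -1;

        buf[0] = 'x';
        return 0;
}
```

The public xyz_unrefp() calls mentioned in the commit follow the same pattern: they take a pointer to the object pointer, so they can be named directly inside the attribute.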

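
The problem addressed by commit 75f32f047c ("Add memcpy_safe") is that memcpy(x, NULL, 0) passes an invalid pointer even though nothing is copied. One way to write a wrapper that tolerates this is sketched below. It is an assumption about the shape of such a helper, not a copy of the function the commit added, which may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a memcpy() wrapper that tolerates (NULL, 0) arguments. */
static inline void memcpy_safe(void *dst, const void *src, size_t n) {
        if (n == 0)
                return; /* nothing to copy, so never hand the pointers to memcpy() */

        assert(dst);
        assert(src);
        memcpy(dst, src, n);
}
```

Callers that may legitimately hold a NULL source together with a zero length can then use the wrapper without a separate length check at each call site.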
829 commits