Systemd

Commit Graph

Author	SHA1	Message	Date
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Lennart Poettering	4526113f57	dissect: add dissect_image_and_warn() that unifies error message generation for dissect_image() (#8517)	2018-03-21 12:10:01 +01:00
Zbigniew Jędrzejewski-Szmek	0441378080	nspawn: move network namespace creation to a separate step (#8430) Fixes #8427. Unsharing the namespace in a separate step changes the ownership of /proc/net/ip_tables_names (and related files) from nobody:nobody to root:root. See [1] and [2] for all the details. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f13f2aeed154da8e48f90b85e720f8ba39b1e881 [2] https://bugzilla.netfilter.org/show_bug.cgi?id=1064#c9	2018-03-20 18:07:17 +01:00
Lennart Poettering	2b33ab0957	tree-wide: port various places over to use new rearrange_stdio()	2018-03-02 11:42:10 +01:00
Zbigniew Jędrzejewski-Szmek	8405dcf752	nspawn: make sure we don't leak the fd in chase_symlinks_and_update No callers use CHASE_OPEN right now, but let's be defensive.	2018-02-15 10:18:25 +01:00
Lennart Poettering	d72495759b	tree-wide: port all code to use safe_getcwd()	2018-01-17 11:17:38 +01:00
Lennart Poettering	75152a4d6a	tree-wide: install matches asynchronously Let's remove a number of synchronization points from our service startups: let's drop synchronous match installation, and let's opt for asynchronous instead. Also, let's use sd_bus_match_signal() instead of sd_bus_add_match() where we can.	2018-01-05 13:58:32 +01:00
Lennart Poettering	d2e0ac3d1e	tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork()	2018-01-04 13:27:27 +01:00
Lennart Poettering	7d4904fe7a	process-util: rework wait_for_terminate_and_warn() to take a flags parameter This renames wait_for_terminate_and_warn() to wait_for_terminate_and_check(), and adds a flags parameter, that controls how much to log: there's one flag that means we log about abnormal stuff, and another one that controls whether we log about non-zero exit codes. Finally, there's a shortcut flag value for logging in both cases, as that's what we usually use. All callers are accordingly updated. At three occasions duplicate logging is removed, i.e. where the old function was called but logged in the caller, too.	2018-01-04 13:27:27 +01:00
Zbigniew Jędrzejewski-Szmek	dae8b82eb9	Add mkdir_errno_wrapper() and use instead of mkdir() in various places We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir returns the error in errno while mkdir_label returns the error directly.	2017-12-16 13:28:22 +01:00
Zbigniew Jędrzejewski-Szmek	bdd2bbc445	Merge pull request #7469 from kinvolk/dongsu/nspawn-netns nspawn: introduce an option for specifying network namespace path	2017-12-14 22:47:57 +01:00
Lennart Poettering	fbd0b64f44	tree-wide: make use of new STRLEN() macro everywhere (#7639) Let's employ coccinelle to do this for us. Follow-up for #7625.	2017-12-14 19:02:29 +01:00
Dongsu Park	d7bea6b629	nspawn: introduce an option for specifying network namespace path Add a new option `--network-namespace-path` to systemd-nspawn to allow users to specify an arbitrary network namespace, e.g. `/run/netns/foo`. Then systemd-nspawn will open the netns file, pass the fd to outer_child, and enter the namespace represented by the fd before running inner_child. ``` $ sudo ip netns add foo $ mount | grep /run/netns/foo nsfs on /run/netns/foo type nsfs (rw) ... $ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \ /bin/readlink -f /proc/self/ns/net /proc/1/ns/net:[4026532009] ``` Note that the option `--network-namespace-path=` cannot be used together with other network-related options such as `--private-network` so that the options do not conflict with each other. Fixes https://github.com/systemd/systemd/issues/7361	2017-12-13 10:21:06 +00:00
Lennart Poettering	fba868fa71	tree-wide: unify logging of "Must be root" message Let's unify this in one call, generalizing must_be_root() from bootctl.c.	2017-12-11 23:19:45 +01:00
Lennart Poettering	8fd010bb1b	nspawn: turn on watchdog logic for nspawn too It's a long-running daemon, and it's easy to enable, hence do it.	2017-12-07 12:34:46 +01:00
Lennart Poettering	87d5e4f286	build-sys: make the dynamic UID range, and the container UID range configurable Also, export these ranges in our pkg-config files.	2017-12-06 12:55:37 +01:00
Lennart Poettering	de54e02d5e	nspawn: when in hybrid mode, chown() both the legacy and the unified hierarchy to the root in the container If user namespacing is used, let's make sure that the root user in the container gets access to both /sys/fs/cgroup/systemd and /sys/fs/cgroup/unified. This matches similar logic in cg_set_access().	2017-12-05 13:49:13 +01:00
Lennart Poettering	2d3a5a73e0	nspawn: make sure images containing an ESP are compatible with userns -U mode In -U mode we might need to re-chown() all files and directories to match the UID shift we want for the image. That's problematic on fat partitions, such as the ESP (and which is generated by mkosi's --bootable switch), because fat of course knows no UID/GID file ownership natively. With this change we take benefit of the uid= and gid= mount options FAT knows: instead of chown()ing all files and directories we can just specify the right UID/GID to use at mount time. This beefs up the image dissection logic in two ways: 1. First of all support for mounting relevant file systems with uid=/gid= is added: when a UID is specified during mount it is used for all applicable file systems. 2. Secondly, two new mount flags are added: DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY. If one is specified the mount routine will either only mount the root partition of an image, or all partitions except the root partition. This is used by nspawn: first the root partition is mounted, so that we can determine the UID shift in use so far, based on ownership of the image's root directory. Then, we mount the remaining partitions in a second go, this time with the right UID/GID information.	2017-12-05 13:49:12 +01:00
Lennart Poettering	8199d554c1	nspawn: figure out cgroup mode after mounting image If we operate on a disk image (i.e. --image=) then it's pointless to look into the mount directory before it is actually mounted to see which systemd version is running inside... Unfortunately we only mount the disk image in the child process, but the parent needs to know the cgroup mode, hence add some IPC for this purpose and communicate the cgroup mode determined from the image back to the parent.	2017-12-05 13:49:12 +01:00
Yu Watanabe	62b1e758d3	nspawn: adjust path to static resolv.conf to support split usr Fixes #7302.	2017-11-25 21:11:07 +09:00
Lennart Poettering	d381c8a6bf	nspawn: hash the machine name, when looking for a suitable UID base (#7437) When "-U" is used we look for a UID range we can use for our container. We start with the UID the tree is already assigned to, and if that didn't work we'd pick random ranges so far. With this change we'll first try to hash a suitable range from the container name, and use that if it works, in order to make UID assignments more likely to be stable. This follows a similar logic PID 1 follows when using DynamicUser=1.	2017-11-24 20:57:19 +01:00
Lennart Poettering	abdb9b08f6	nspawn: make use of the RequestStop logic of scope units Since time began, scope units had a concept of "Controllers", a bus peer that would be notified when somebody requested a unit to stop. None of our code used that facility so far, let's change that. This way, nspawn can print a nice message when somebody invokes "systemctl stop" on the container's scope unit, and then react with the right action to shut it down.	2017-11-23 21:47:48 +01:00
Shawn Landden	4831981d89	tree-wide: adjust fall through comments so that gcc is happy Distcc removes comments, making the comment silencing not work. I know there was a decision against a macro in commit `ec251fe7d5`	2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	3603efdea5	nspawn: make recursive chown()ing logic safe for being aborted in the middle We currently use the ownership of the top-level directory as a hint whether we need to descend into the whole tree to chown() it recursively or not. This is problematic with the previous chown()ing algorithm, as when descending into the tree we'd first chown() and then descend further down, which meant that the top-level directory would be chowned first, and an aborted recursive chowning would appear on the next invocation as successful, even though it was not. Let's reshuffle things a bit, to make the re-chown()ing safe regarding interruptions: a) We chown() the dir we are looking at last, and descend into all its children first. That way we know that if the top-level dir is properly owned everything inside of it is properly owned too. b) Before starting a chown()ing operation, we mark the top-level directory as owned by a special "busy" UID range, which we can use to recognize whether a tree was fully chowned: if it is marked as busy, it's definitely not fully chowned, as the busy ownership will only be fixed as final step of the chowning. Fixes: #6292	2017-11-17 11:12:33 +01:00
Lennart Poettering	0986658d51	Merge pull request #6866 from sourcejedi/set-linger2 logind: fix `loginctl enable-linger`	2017-11-15 11:15:15 +01:00
Lennart Poettering	759aaedc5c	dissect: when we invoke dissection on a loop device with partscan help the user This adds some simple detection logic for cases where dissection is invoked on an externally created loop device, and partitions have been detected on it, but partition scanning so far was off. If this is detected we now print a brief message indicating what the issue is, instead of failing with a useless EINVAL message the kernel passed to us.	2017-10-26 17:54:56 +02:00
Lennart Poettering	eb38edce88	machine-image: add partial discovery of block devices as images This adds some basic discovery of block device images for nspawn and friends. Note that this doesn't add searching for block devices using udev, but instead expects users to symlink relevant block devices into /var/lib/machines. Discovery is hence done exactly like for dir/subvol/raw file images, except that what is found may be a (symlink to) a block device. For now, we do not support cloning these images, but removal, renaming and read-only flags are supported to the point where that makes sense. Fixes: #6990	2017-10-26 17:54:56 +02:00
Alan Jenkins	8d9c2bca41	nspawn: comment to acknowledge lying about "user session"	2017-10-18 09:47:10 +01:00
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that if the name is misspelled, cpp will warn us. $ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g' $ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
Andreas Rammhold	3742095b27	tree-wide: use IN_SET where possible In addition to the changes from #6933 this handles cases that could be matched with the included cocci file.	2017-10-02 13:09:54 +02:00
Lennart Poettering	8e5430c4bd	nspawn: set up a new session keyring for the container process keyring material should not leak into the container. So far we relied on seccomp to deny access to the keyring, but given that we now made the seccomp configurable, and access to keyctl() and friends may optionally be permitted to containers now let's make sure we disconnect the caller's keyring from the keyring of PID 1 in the container.	2017-09-22 15:28:04 +02:00
Lennart Poettering	960e4569e1	nspawn: implement configurable syscall whitelisting/blacklisting Now that we have ported nspawn's seccomp code to the generic code in seccomp-util, let's extend it to support whitelisting and blacklisting of specific additional syscalls. This uses similar syntax as PID1's support for system call filtering, but in contrast to that always implements a blacklist (and not a whitelist), as we prepopulate the filter with a blacklist, and the unit's system call filter logic does not come with anything prepopulated. (Later on we might actually want to invert the logic here, and whitelist rather than blacklist things, but at this point let's not do that. In case we switch this over later, the syscall add/remove logic of this commit should be compatible conceptually.) Fixes: #5163 Replaces: #5944	2017-09-12 14:06:21 +02:00
Lennart Poettering	21022b9dde	util-lib: wrap personality() to fix up broken glibc error handling (#6766) glibc appears to propagate different errors in different ways, let's fix this up, so that our own code doesn't get confused by this. See #6752 + #6737 for details. Fixes: #6755	2017-09-08 17:16:29 +03:00
Lennart Poettering	8cb5743079	nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416) Given that we set NOTIFY_SOCKET unconditionally it's not surprising that processes way down the process tree think it's smart to send us a notification message. It's still useful to keep this message, for debugging things, but it shouldn't be generated by default.	2017-07-20 14:46:58 -04:00
Lennart Poettering	cd2dfc6fae	nspawn: register a scope for the unit if --register=no is specified (#6166) Previously, only when --register=yes was set (the default) the invoked container would get its own scope, created by machined on behalf of nspawn. With this change if --register=no is set nspawn will still get its own scope (which is a good thing, so that --slice= and --property= take effect), but this is not done through machined but by registering a scope unit directly in PID 1. Summary: --register=yes → allocate a new scope through machined (the default) --register=yes --keep-unit → use the unit we are already running in and register with machined --register=no → allocate a new scope directly, but no machined --register=no --keep-unit → do not allocate nor register anything Fixes: #5823	2017-06-28 13:22:46 -04:00
Zbigniew Jędrzejewski-Szmek	35bca925f9	tree-wide: fix incorrect uses of %m In those cases errno was not set, so we would be logging some unrelated error or "Success".	2017-05-13 15:42:26 -04:00
Zbigniew Jędrzejewski-Szmek	ab8ee0f259	tree-wide: use SET_FLAG in more places (#5892)	2017-05-07 07:03:28 -04:00
Zbigniew Jędrzejewski-Szmek	399e391fa6	nspawn: check cgroups after parsing options Same justification as in previous commit.	2017-04-25 08:54:00 -04:00
Lennart Poettering	948a3241de	Merge pull request #5708 from vcatechnology/arm-cross-compile ARM32 cross-compile fixes	2017-04-17 15:49:06 +02:00
Matt Clarkson	6b5cf3ea62	build-sys: correct blkid.h includes When using pkg-config to determine the include flags for blkid the flags are returned as: $ pkg-config blkid --cflags -I/usr/include/blkid -I/usr/include/uuid We use the <blkid/blkid.h> include which would be correct when using the default compiler /usr/include header search path. However, when cross-compiling the blkid.h will not be installed at /usr/include and highly likely in a temporary system root. It is further compounded if the cross-compile packages are split up and the blkid package is not available in the same sysroot as the compiler. Regardless of the compilation setup, the correct include path should be <blkid.h> if using the pkg-config returned CFLAGS.	2017-04-06 14:33:02 +01:00
David Michael	7357272ed1	nspawn: check if the DNS stub is listening for requests	2017-03-31 11:34:32 -07:00
Zbigniew Jędrzejewski-Szmek	78e4f19ebc	Merge pull request #5444 from poettering/cgroups-revert-no-error Revert "core: simplify cg_[all_]unified()" and more.	2017-02-24 18:48:57 -05:00
AsciiWolf	13e785f7a0	Fix missing space in comments (#5439)	2017-02-24 18:14:02 +01:00
Lennart Poettering	c22800e40e	cgroup: rename cg_unified() → cg_unified_controller() cg_unified() is a bit generic a name, let's make clear that it checks whether a specified controller is in unified mode.	2017-02-24 18:00:04 +01:00
Lennart Poettering	b4cccbc13a	cgroup: change cg_unified() to possibly return errors again We use our cgroup APIs in various contexts, including from our libraries sd-login, sd-bus. As we don't control those environments we can't rely that the unified cgroup setup logic succeeds, and hence really shouldn't assert on it. This more or less reverts `415fc41cea`.	2017-02-24 17:52:58 +01:00
Tejun Heo	2977724b09	core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd hierarchy Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1 name=systemd hierarchy. While this works fine for systemd itself, it breaks tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd. This patch updates the hybrid mode so that it mounts v2 hierarchy on /sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on /sys/fs/cgroup/systemd for compatibility. systemd itself doesn't depend on the "name=systemd" hierarchy at all. All operations take place on the v2 hierarchy as before but the v1 hierarchy is kept in sync so that any tools which expect it to be there can keep doing so. This allows systemd to take advantage of cgroup v2 process management without requiring other tools to be aware of the hybrid mode. The hybrid mode is implemented by mapping the special systemd controller to /sys/fs/cgroup/unified and making the basic cgroup utility operations - cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the /sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated. While a bit messy, this will allow dropping complications from using cgroup v1 for process management a lot sooner than otherwise possible which should make it a net gain in terms of maintainability. v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount point to /sys/fs/cgroup/unified as suggested by @brauner. v3: chown the compat hierarchy too on delegation. Suggested by @evverx. v4: [zj] - drop the change to default, full "legacy" is still the default.	2017-02-20 12:28:35 -05:00
Tejun Heo	415fc41cea	core: simplify cg_[all_]unified() cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchronously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().	2017-02-18 17:51:13 -05:00
Tejun Heo	bd15ab41a1	nspawn: fix cgroup mode detection cgroup mode detection is broken in two different ways. * detect_unified_cgroup_hierarchy() is called too nested in outer_child(). sync_cgroup() which is used by run() also needs to know the requested cgroup mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN. This makes it skip syncing the inner cgroup hierarchy on some config combinations. $ cat /proc/self/cgroup | grep systemd 1:name=systemd:/user.slice/user-0.slice/session-c1.scope $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container ... [root@container ~]# cat /proc/self/cgroup | grep systemd 1:name=systemd:/machine.slice/machine-container.x86_64.scope $ exit $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container [root@container ~]# cat /proc/self/cgroup | grep 0:: 0::/ $ exit Note how the unified hierarchy case's path is not synchronized with the host. This for example can cause issues when there are multiple such containers. Fixed by moving detect_unified_cgroup_hierarchy() invocation to main(). * inner_child() was invoking cg_unified_flush(). inner_child() executes fully scoped and can't determine which cgroup mode the host was in. It doesn't make sense to keep flushing the detected mode when the host mode can't change. Fixed by replacing cg_unified_flush() invocations in outer_child() and inner_child() with one in main().	2017-02-18 17:49:06 -05:00
Zbigniew Jędrzejewski-Szmek	581a07f9f0	Merge pull request #5369 from poettering/nspawn-resolved fixes for running nspawn+resolved in combination	2017-02-18 11:54:34 -05:00
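
Note on ae2a15bc14 (TAKE_PTR): the commit message describes a macro that reads a pointer, returns it, and sets the original to NULL. A minimal sketch of that idea might look like the following. It assumes a compiler with the GNU C extensions __typeof__ and statement expressions (GCC, Clang) and is not necessarily the exact definition used in the systemd tree; make_greeting() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: uses GNU C statement expressions and __typeof__.
 * Evaluates to the old value of ptr and leaves ptr set to NULL. */
#define TAKE_PTR(ptr)                                   \
        ({                                              \
                __typeof__(ptr) _taken_ = (ptr);        \
                (ptr) = NULL;                           \
                _taken_;                                \
        })

/* Hypothetical helper: allocates a string and passes ownership to the caller. */
static char *make_greeting(void) {
        static const char text[] = "hello";
        char *s, *ret;

        s = malloc(sizeof(text));
        if (!s)
                return NULL;
        memcpy(s, text, sizeof(text));

        ret = TAKE_PTR(s);      /* ownership moves to ret, s becomes NULL */
        assert(s == NULL);

        free(s);                /* free(NULL) is a no-op, so a cleanup step on s stays harmless */
        return ret;
}
```

The point of the pattern is that any later cleanup of the source pointer cannot free memory that has already been handed to someone else.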


603 Commits