Systemd

Commit Graph

Author	SHA1	Message	Date
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Yu Watanabe	5cbaad2f67	core: do not free heap-allocated strings (#8391) Fixes #8387.	2018-03-08 14:21:54 +01:00
Lennart Poettering	39f305a901	mount-setup: change bpf mount mode to 0700 (#8334) After discussing with the kernel folks, we agreed to default to 0700 for this. Better safe than sorry.	2018-03-02 12:55:24 +01:00
Lennart Poettering	6590080851	mount-setup: always use the same source as fstype for the API VFS we mount So far, for all our API VFS mounts we used the fstype also as mount source, let's do that for the cgroupsv2 mounts too. The kernel doesn't really care about the source for API VFS, but it's visible to the user, hence let's clean this up and follow the rule we otherwise follow.	2018-02-21 16:43:36 +01:00
Lennart Poettering	43b7f24b5e	bpf: mount bpffs by default on boot We make heavy use of BPF functionality these days, hence expose the BPF file system too by default now. (Note however, that we don't actually make use of bpf file system objects yet, but we might later on too.)	2018-02-21 16:43:36 +01:00
Zbigniew Jędrzejewski-Szmek	56c8d7444a	pid1: do not initialize join_controllers by default We're moving towards unified cgroup hierarchy where this is not necessary. This makes main.c a bit simpler.	2018-02-19 15:18:54 +01:00
Lennart Poettering	713a88757a	mount-setup: fix MNT_CHECK_WRITABLE error handling, and log about the issue Let's correct the error handling (the error is in errno, not r), and let's add logging like the rest of the function has it.	2017-12-15 20:52:28 +01:00
Krzysztof Nowicki	8739f23e3c	Fix SELinux labels in cgroup filesystem root directory (#7496) When using SELinux with legacy cgroups the tmpfs on /sys/fs/cgroup is by default labelled as tmpfs_t. This label is also inherited by the "cpu" and "cpuacct" symbolic links. Unfortunately the policy expects them to be labelled as cgroup_t, which is used for all the actual cgroup filesystems. Failure to do so results in a stream of denials. This state cannot be fixed reliably when the cgroup filesystem structure is set up as the SELinux policy is not yet loaded at this moment. It also cannot be fixed later as the root of the cgroup filesystem is remounted read-only. In order to fix it the root of the cgroup filesystem needs to be temporarily remounted read-write, relabelled and remounted back read-only.	2017-11-30 11:59:29 +01:00
Christian Brauner	1ff654e28b	core: remove empty cgroups (#7457) When we skip an unwritable cgroup also remove the empty mountpoint.	2017-11-24 21:05:16 +01:00
Christian Brauner	2d56b80a18	cgroup: test whether pure unified hierarchy is writable If it is not writable we should not mount it.	2017-11-22 17:35:21 +01:00
Christian Brauner	e07aefbd67	cgroup: check whether unified hierarchy is writable When systemd is running inside a container employing user namespaces it currently mounts the unified cgroup hierarchy without being able to write to it. This causes systemd to freeze during boot. This patch checks whether the unified cgroup hierarchy is writable. If it is not it will not mount it. This solution is based on a patch by Evgeny Vereshchagin. Closes #6408. Closes https://github.com/lxc/lxc/issues/1678 .	2017-11-22 17:34:25 +01:00
Lennart Poettering	6925a0de4e	cgroup-util: move Set* allocation into cg_kernel_controllers() Previously, callers had to do this on their own. Let's make the call do that instead, making the caller code a bit shorter.	2017-11-21 11:54:08 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek	f9fa32f09c	build-sys: s/HAVE_SMACK/ENABLE_SMACK/ Same justification as for HAVE_UTMP.	2017-10-04 12:09:50 +02:00
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that if the name is misspelt, cpp will warn us. $ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g' $ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
vliaskov	6c24adfd46	Revert "mount-setup: mount xenfs filesystem (#6491)" (#6662) This reverts commit `b305bd3aab`.	2017-08-28 18:46:01 +02:00
vliaskov	b305bd3aab	mount-setup: mount xenfs filesystem (#6491)	2017-07-31 15:59:02 +02:00
Tejun Heo	4095205ecc	core: support "nsdelegate" cgroup v2 mount option (#6294) cgroup namespace wasn't useful for delegation because it allowed resource control interface files (e.g. memory.high) to be written from inside the namespace - this allowed the namespace parent's resource distribution to be disturbed by its namespace-scoped children. A new mount option, "nsdelegate", was added to cgroup v2 to address this issue. The flag is meaningful only when mounting cgroup v2 in the init namespace and makes a cgroup namespace a delegation boundary. The kernel feature is pending for v4.13. This should have been the default behavior on cgroup namespaces and this commit makes systemd try "nsdelegate" first when trying to mount cgroup v2 and fall back if the option is not supported. Note that this has danger of breaking usages which depend on modifying the parent's resource settings from the namespace root, which isn't a valid thing to do, but such usages may still exist.	2017-07-14 19:27:13 +02:00
Zbigniew Jędrzejewski-Szmek	1b59cf04ae	core/mount-setup: if unified hierarchy is not supported, fall back to legacy We need this to gracefully support older or strangely configured kernels. v2: - do not install a callback handler, just embed the right conditions into cg_is_*_wanted() v3: - fix bug in cg_is_legacy_wanted()	2017-02-22 11:52:31 -05:00
Zbigniew Jędrzejewski-Szmek	a4464b9522	Rename cg_is_unified_systemd_controller_wanted to cg_is_hybrid_wanted Less typing and doesn't make the table so incredibly wide.	2017-02-22 11:52:31 -05:00
Tejun Heo	2977724b09	core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd hierarchy Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1 name=systemd hierarchy. While this works fine for systemd itself, it breaks tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd. This patch updates the hybrid mode so that it mounts v2 hierarchy on /sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on /sys/fs/cgroup/systemd for compatibility. systemd itself doesn't depend on the "name=systemd" hierarchy at all. All operations take place on the v2 hierarchy as before but the v1 hierarchy is kept in sync so that any tools which expect it to be there can keep doing so. This allows systemd to take advantage of cgroup v2 process management without requiring other tools to be aware of the hybrid mode. The hybrid mode is implemented by mapping the special systemd controller to /sys/fs/cgroup/unified and making the basic cgroup utility operations - cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the /sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated. While a bit messy, this will allow dropping complications from using cgroup v1 for process management a lot sooner than otherwise possible which should make it a net gain in terms of maintainability. v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount point to /sys/fs/cgroup/unified as suggested by @brauner. v3: chown the compat hierarchy too on delegation. Suggested by @evverx. v4: [zj] - drop the change to default, full "legacy" is still the default.	2017-02-20 12:28:35 -05:00
Lennart Poettering	dee22f3970	core: add comment why we don't bother with MS_SHARED remounting of / in containers	2016-12-20 20:00:08 +01:00
Lennart Poettering	e187369587	tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead Let's use chase_symlinks() everywhere, and stop using GNU canonicalize_file_name() everywhere. For most cases this should not change behaviour, however increase exposure of our function to get better tested. Most importantly in a few cases (most notably nspawn) it can take the correct root directory into account when chasing symlinks.	2016-12-01 00:25:51 +01:00
Tejun Heo	5da38d0768	core: use the unified hierarchy for the systemd cgroup controller hierarchy Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logic and would allow using features which depend on the unified hierarchy membership such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller is on the unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().	2016-08-17 17:44:36 -04:00
Alessandro Puccetti	c4b4170746	namespace: unify limit behavior on non-directory paths Despite the name, `Read{Write,Only}Directories=` already allows for regular file paths to be masked. This commit adds the same behavior to `InaccessibleDirectories=` and makes it explicit in the doc. This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}` {file,device}nodes and mounts on the appropriate one the paths specified in `InaccessibleDirectories=`. Based on Luca's patch from https://github.com/systemd/systemd/pull/3327	2016-07-19 17:22:02 +02:00
Dave Reisner	222953e87f	Ensure kdbus isn't used (#3501) Delete the dbus1 generator and some critical wiring. This prevents kdbus from being loaded or detected. As such, it will never be used, even if the user still has a useful kdbus module loaded on their system. Sort of fixes #3480. Not really, but it's better than the current state.	2016-06-18 17:24:23 -04:00
Harald Hoyer	cacf980ed4	core/mount-setup.c: also relabel /dev/shm for selinux (#3039) Daemons that wish to transition state from the initramfs to the real root might use /dev/shm for their state. As /dev is not relabeled across mount points, /dev/shm has to be relabeled explicitly.	2016-04-14 19:14:29 -04:00
Alban Crequy	099619957a	cgroup2: use new fstype for unified hierarchy Since Linux v4.4-rc1, __DEVEL__sane_behavior does not exist anymore and is replaced by a new fstype "cgroup2". With this patch, systemd no longer supports the old (unstable) way of doing unified hierarchy with __DEVEL__sane_behavior and systemd now requires Linux v4.4 for unified hierarchy. Non-unified hierarchy is still the default and is unchanged by this patch. `67e9c74b8a`	2016-03-26 12:05:29 -04:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Lennart Poettering	1411b09467	core: log about path_is_mount_point() errors We really shouldn't fail silently, but print a log message about these errors. Also make sure to attach error codes to all log messages where that makes sense. (While we are at it, add a couple of (void) casts to functions where we knowingly ignore return values.)	2016-02-03 23:58:53 +01:00
Alexander Kuleshov	400fac0609	mount-setup: introduce mount_points_setup The mount_setup_early() and mount_setup() contain almost the same pieces of code which calls mount_one() for a certain mount point from the mount_table. This patch introduces mount_points_setup() helper to prevent code duplication.	2016-02-03 01:03:12 +06:00
Patrick Ohly	ea2b93a8ee	mount-setup.c: fix handling of symlink Smack labelling in cgroup setup The code introduced in `f8c1a81c51` (= systemd 227) failed for me with: Failed to copy smack label from net_cls to /sys/fs/cgroup/net_cls: No such file or directory There is no need for a symlink in this case because source and target are identical. The symlink() call is allowed to fail when the target already exists. When that happens, copying the Smack label must be skipped. But the code also failed when there is a symlink, like "cpu -> cpu,cpuacct", because mac_smack_copy() got called with src="cpu,cpuacct" which fails to find the entry because the current directory is not inside /sys/fs/cgroup. The absolute path to the existing entry must be used instead.	2016-01-05 12:49:48 +01:00
Thomas Hindoe Paaboel Andersen	cf0fbc49e6	tree-wide: sort includes Sort the includes according to the new coding style.	2015-11-16 22:09:36 +01:00
Lennart Poettering	b5efdb8af4	util-lib: split out allocation calls into alloc-util.[ch]	2015-10-27 13:45:53 +01:00
Lennart Poettering	ee104e11e3	user-util: move UID/GID related macros from macro.h to user-util.h	2015-10-27 13:25:57 +01:00
Lennart Poettering	4349cd7c1d	util-lib: move mount related utility calls to mount-util.[ch]	2015-10-27 13:25:55 +01:00
David Herrmann	7ff307bc4c	mount: propagate error codes correctly Make sure to propagate error codes from mount-loops correctly. Right now, we return the return-code of the first mount that did _something_. This is not what we want. Make sure we return an error if _any_ mount fails (and then make sure to return the first error to not hide proper errors due to consequential errors like -ENOTDIR). Reported by cee1 <fykcee1@gmail.com>.	2015-09-21 20:03:24 +02:00
Sangjung Woo	f8c1a81c51	smack: bugfix the smack label of symlink when '--with-smack-run-label' is set Even though systemd has its own smack label since '--with-smack-run-label' configuration is set, the smack label of each CGROUP root directory should have the star (i.e. *) label. This is mainly because current Linux Kernel set the label in this way. (Refer to smack_d_instantiate() in security/smack/smack_lsm.c) However, if systemd has its own smack label and arg_join_controllers is explicitly set or initialized by initialize_join_controllers() function, current systemd creates the symlink in CGROUP root directory with its own smack label as below. lrwxrwxrwx. 1 root root System 11 Dec 31 16:00 cpu -> cpu,cpuacct dr-xr-xr-x. 4 root root * 0 Dec 31 16:01 cpu,cpuacct lrwxrwxrwx. 1 root root System 11 Dec 31 16:00 cpuacct -> cpu,cpuacct This patch fixes that bug by copying the smack label from the origin.	2015-09-09 20:26:52 +09:00
Lennart Poettering	75f86906c5	basic: rework virtualization detection API Introduce a proper enum, and don't pass around string ids anymore. This simplifies things quite a bit, and makes virtualization detection more similar to architecture detection.	2015-09-07 13:42:47 +02:00
Lennart Poettering	efdb02375b	core: unified cgroup hierarchy support This patch set adds full support for the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possible to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and add each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified hierarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicely, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefully makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.	2015-09-01 23:52:27 +02:00
David Herrmann	6482446281	core: fix missing bus-util.h include Whoopsy, forgot to 'git add' this, sorry.	2015-07-05 12:24:29 +02:00
David Herrmann	1f49dffc0f	core: don't mount kdbusfs if not wanted Just like we conditionalize loading kdbus.ko, we should conditionalize mounting kdbusfs. Otherwise, we might run with kdbus if it is builtin, even though the user didn't want this.	2015-07-05 11:25:38 +02:00
Kay Sievers	1b09f548c7	turn kdbus support into a runtime option ./configure --enable/disable-kdbus can be used to set the default behavior regarding kdbus. If no kdbus kernel support is available, dbus-daemon will be used. With --enable-kdbus, the kernel command line option "kdbus=0" can be used to disable kdbus. With --disable-kdbus, the kernel command line option "kdbus=1" is required to enable kdbus support.	2015-06-17 18:01:49 +02:00
Martin Pitt	e26d6ce517	path-util: Change path_is_mount_point() symlink arg from bool to flags This makes path_is_mount_point() consistent with fd_is_mount_point() wrt. flags.	2015-05-29 17:42:44 +02:00
Lennart Poettering	03cfe0d514	nspawn: finish user namespace support	2015-05-21 16:32:01 +02:00
David Herrmann	64f75d7a28	core: fix mount setup to work with non-existing mount points We must not fail on ENOENT. We properly create the mount-point in mount-setup, so there's really no reason to skip the mount. Make sure we just skip the mount on unexpected failures or if it's already mounted.	2015-04-07 14:03:44 +02:00
Daniel Mack	b604cb9bf6	core: mount-setup: handle non-existing mountpoints gracefully Commit `e792e890f` ("path-util: don't eat up ENOENT in path_is_mount_point()") changed path_is_mount_point() so it doesn't hide -ENOENT from its caller. This causes all boots to fail early in case any of the mount points does not exist (for instance, when kdbus isn't loaded, /sys/fs/kdbus is missing). Fix this by returning 0 from mount_one() if path_is_mount_point() returned -ENOENT.	2015-04-07 00:50:10 +02:00
Thomas Hindoe Paaboel Andersen	2eec67acbb	remove unused includes This patch removes includes that are not used. The removals were found with include-what-you-use which checks if any of the symbols from a header is in use.	2015-02-23 23:53:42 +01:00
Cristian Rodríguez	cb708b1c6d	mount-setup: Do not bother with /proc/bus/usb Current systemd requires kernel >= 3.7 per the README file but CONFIG_USB_DEVICEFS disappeared from the kernel in upstream commit fb28d58b72aa9215b26f1d5478462af394a4d253 (kernel 3.5-rc1)	2015-01-23 19:10:28 +01:00
Lennart Poettering	b4d5b78374	mount-setup: /selinux, /cgroup, /dev/cgroup are sooo old, don't bother with them anymore	2015-01-23 13:47:41 +01:00

103 Commits
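
The TAKE_PTR() entry in the log above (ae2a15bc14) describes a macro that reads a pointer, returns it, and sets the original variable to NULL, so that ownership visibly moves from one pointer to another. A minimal sketch of that idea in C might look as follows. It assumes a GCC- or Clang-compatible compiler (statement expressions and __typeof__ are extensions), and make_greeting() is a hypothetical helper invented for illustration, not a function from the systemd tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of an ownership-transfer macro in the spirit of the commit message:
 * evaluate to the current value of `ptr` and reset `ptr` to NULL.
 * Relies on GNU C statement expressions and __typeof__ (assumption). */
#define TAKE_PTR(ptr)                                  \
        ({                                             \
                __typeof__(ptr) _ptr_ = (ptr);         \
                (ptr) = NULL;                          \
                _ptr_;                                 \
        })

/* Hypothetical helper: allocates a string and hands ownership to *ret.
 * Returns 0 on success, -1 on allocation failure. */
static int make_greeting(char **ret) {
        char *s = malloc(sizeof "hello");
        if (!s)
                return -1;
        memcpy(s, "hello", sizeof "hello");

        /* Ownership moves to the caller; the local pointer becomes NULL. */
        *ret = TAKE_PTR(s);
        assert(s == NULL);

        /* A shared cleanup path can free the local unconditionally:
         * free(NULL) is a no-op, so the caller's string is untouched. */
        free(s);
        return 0;
}
```

After a successful call, the caller owns the string and is responsible for freeing it. Because the local pointer is already NULL, the unconditional cleanup at the end of the helper cannot free the returned memory a second time.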