cgroup: s/cgroups? ?v?([0-9])/cgroup v\1/gI

Nitpicky, but we've used a lot of random spacings and names in the past;
we're now trying to be completely consistent on "cgroup vN".

Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups? ?v?([0-9])/cgroup v\1/gI'`.

I manually ignored places where it's not appropriate to replace (eg.
"cgroup2" fstype and in src/shared/linux).
Chris Down 2019-01-02 20:15:15 +00:00 committed by Yu Watanabe
parent 788291d3b4
commit 4e1dfa45e9
11 changed files with 66 additions and 66 deletions

NEWS

@ -133,13 +133,13 @@ CHANGES WITH 240:
* The new "MemoryMin=" unit file property may now be used to set the
memory usage protection limit of processes invoked by the unit. This
-controls the cgroupsv2 memory.min attribute. Similarly, the new
+controls the cgroup v2 memory.min attribute. Similarly, the new
"IODeviceLatencyTargetSec=" property has been added, wrapping the new
-cgroupsv2 io.latency cgroup property for configuring per-service I/O
+cgroup v2 io.latency cgroup property for configuring per-service I/O
latency.
-* systemd now supports the cgroupsv2 devices BPF logic, as counterpart
-to the cgroupsv1 "devices" cgroup controller.
+* systemd now supports the cgroup v2 devices BPF logic, as counterpart
+to the cgroup v1 "devices" cgroup controller.
* systemd-escape now is able to combine --unescape with --template. It
also learnt a new option --instance for extracting and unescaping the
@ -355,7 +355,7 @@ CHANGES WITH 240:
* The JoinControllers= option in system.conf is no longer supported, as
it didn't work correctly, is hard to support properly, is legacy (as
-the concept only exists on cgroupsv1) and apparently wasn't used.
+the concept only exists on cgroup v1) and apparently wasn't used.
* Journal messages that are generated whenever a unit enters the failed
state are now tagged with a unique MESSAGE_ID. Similarly, messages
@ -992,7 +992,7 @@ CHANGES WITH 238:
instance to migrate processes if it itself gets the request to
migrate processes and the kernel refuses this due to access
restrictions. Thanks to this "systemd-run --scope --user …" works
-again in pure cgroups v2 environments when invoked from the user
+again in pure cgroup v2 environments when invoked from the user
session scope.
* A new TemporaryFileSystem= setting can be used to mask out part of
@ -2708,7 +2708,7 @@ CHANGES WITH 231:
desired options.
* systemd now supports the "memory" cgroup controller also on
-cgroupsv2.
+cgroup v2.
* The systemd-cgtop tool now optionally takes a control group path as
command line argument. If specified, the control group list shown is

TODO

@ -58,7 +58,7 @@ Features:
* when a socket unit is spawned with an AF_UNIX path in /var/run, complain and
patch it to use /run instead
-* set memory.oom.group in cgroupsv2 for all leaf cgroups (kernel v4.19+)
+* set memory.oom.group in cgroup v2 for all leaf cgroups (kernel v4.19+)
* add a new syscall group "@esoteric" for more esoteric stuff such as bpf() and
userfaultfd() and make systemd-analyze check for it.


@ -17,7 +17,7 @@ container managers.
Before you read on, please make sure you read the low-level [kernel
documentation about
-cgroupsv2](https://www.kernel.org/doc/Documentation/cgroup-v2.txt). This
+cgroup v2](https://www.kernel.org/doc/Documentation/cgroup-v2.txt). This
documentation then adds in the higher-level view from systemd.
This document augments the existing documentation we already have:
@ -34,8 +34,8 @@ wiki documentation into this very document, too.)
## Two Key Design Rules
Much of the philosophy behind these concepts is based on a couple of basic
-design ideas of cgroupsv2 (which we however try to adapt as far as we can to
-cgroupsv1 too). Specifically two cgroupsv2 rules are the most relevant:
+design ideas of cgroup v2 (which we however try to adapt as far as we can to
+cgroup v1 too). Specifically two cgroup v2 rules are the most relevant:
1. The **no-processes-in-inner-nodes** rule: this means that it's not permitted
to have processes directly attached to a cgroup that also has child cgroups and
@ -58,45 +58,45 @@ your container manager creates and manages cgroups in the system's root cgroup
you violate rule #2, as the root cgroup is managed by systemd and hence off
limits to everybody else.
-Note that rule #1 is generally enforced by the kernel if cgroupsv2 is used: as
+Note that rule #1 is generally enforced by the kernel if cgroup v2 is used: as
soon as you add a process to a cgroup it is ensured the rule is not
-violated. On cgroupsv1 this rule didn't exist, and hence isn't enforced, even
+violated. On cgroup v1 this rule didn't exist, and hence isn't enforced, even
though it's a good thing to follow it then too. Rule #2 is not enforced on
-either cgroupsv1 nor cgroupsv2 (this is UNIX after all, in the general case
+either cgroup v1 nor cgroup v2 (this is UNIX after all, in the general case
root can do anything, modulo SELinux and friends), but if you ignore it you'll
be in constant pain as various pieces of software will fight over cgroup
ownership.
-Note that cgroupsv1 is currently the most deployed implementation, even though
+Note that cgroup v1 is currently the most deployed implementation, even though
it's semantically broken in many ways, and in many cases doesn't actually do
-what people think it does. cgroupsv2 is where things are going, and most new
-kernel features in this area are only added to cgroupsv2, and not cgroupsv1
-anymore. For example cgroupsv2 provides proper cgroup-empty notifications, has
+what people think it does. cgroup v2 is where things are going, and most new
+kernel features in this area are only added to cgroup v2, and not cgroup v1
+anymore. For example cgroup v2 provides proper cgroup-empty notifications, has
support for all kinds of per-cgroup BPF magic, supports secure delegation of
cgroup trees to less privileged processes and so on, which all are not
-available on cgroupsv1.
+available on cgroup v1.
## Three Different Tree Setups 🌳
systemd supports three different modes how cgroups are set up. Specifically:
-1. **Unified** — this is the simplest mode, and exposes a pure cgroupsv2
+1. **Unified** — this is the simplest mode, and exposes a pure cgroup v2
logic. In this mode `/sys/fs/cgroup` is the only mounted cgroup API file system
and all available controllers are exclusively exposed through it.
-2. **Legacy** — this is the traditional cgroupsv1 mode. In this mode the
+2. **Legacy** — this is the traditional cgroup v1 mode. In this mode the
various controllers each get their own cgroup file system mounted to
`/sys/fs/cgroup/<controller>/`. On top of that systemd manages its own cgroup
hierarchy for managing purposes as `/sys/fs/cgroup/systemd/`.
3. **Hybrid** — this is a hybrid between the unified and legacy mode. It's set
up mostly like legacy, except that there's also an additional hierarchy
-`/sys/fs/cgroup/unified/` that contains the cgroupsv2 hierarchy. (Note that in
+`/sys/fs/cgroup/unified/` that contains the cgroup v2 hierarchy. (Note that in
this mode the unified hierarchy won't have controllers attached, the
controllers are all mounted as separate hierarchies as in legacy mode,
-i.e. `/sys/fs/cgroup/unified/` is purely and exclusively about core cgroupsv2
+i.e. `/sys/fs/cgroup/unified/` is purely and exclusively about core cgroup v2
functionality and not about resource management.) In this mode compatibility
-with cgroupsv1 is retained while some cgroupsv2 features are available
+with cgroup v1 is retained while some cgroup v2 features are available
too. This mode is a stopgap. Don't bother with this too much unless you have
too much free time.
@ -116,7 +116,7 @@ to talk of one specific cgroup and actually mean the same cgroup in all
available controller hierarchies. E.g. if we talk about the cgroup `/foo/bar/`
then we actually mean `/sys/fs/cgroup/cpu/foo/bar/` as well as
`/sys/fs/cgroup/memory/foo/bar/`, `/sys/fs/cgroup/pids/foo/bar/`, and so on.
-Note that in cgroupsv2 the controller hierarchies aren't orthogonal, hence
+Note that in cgroup v2 the controller hierarchies aren't orthogonal, hence
thinking about them as orthogonal won't help you in the long run anyway.
If you wonder how to detect which of these three modes is currently used, use
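
A minimal sketch of such a check (an illustration added here, not part of the diff; it assumes the standard mount points described above and GNU coreutils' stat, which reports "cgroup2fs" as the filesystem type of a cgroup v2 mount):

    if [ "$(stat -fc %T /sys/fs/cgroup/)" = "cgroup2fs" ]; then
            echo unified
    elif [ "$(stat -fc %T /sys/fs/cgroup/unified/ 2>/dev/null)" = "cgroup2fs" ]; then
            echo hybrid
    else
            echo legacy
    fi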
@ -168,7 +168,7 @@ cgroup `/foo.slice/foo-bar.slice/foo-bar-baz.slice/quux.service/`.
By default systemd sets up four slice units:
1. `-.slice` is the root slice. i.e. the parent of everything else. On the host
-system it maps directly to the top-level directory of cgroupsv2.
+system it maps directly to the top-level directory of cgroup v2.
2. `system.slice` is where system services are by default placed, unless
configured otherwise.
@ -187,8 +187,8 @@ above are just the defaults.
Container managers and suchlike often want to control cgroups directly using
the raw kernel APIs. That's entirely fine and supported, as long as proper
-*delegation* is followed. Delegation is a concept we inherited from cgroupsv2,
-but we expose it on cgroupsv1 too. Delegation means that some parts of the
+*delegation* is followed. Delegation is a concept we inherited from cgroup v2,
+but we expose it on cgroup v1 too. Delegation means that some parts of the
cgroup tree may be managed by different managers than others. As long as it is
clear which manager manages which part of the tree each one can do within its
sub-graph of the tree whatever it wants.
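
As a minimal sketch of how a manager typically asks for such a delegated subtree (illustration only, not part of the diff; the unit and payload names are made up, but Delegate= is the unit property that switches delegation on for a service or scope):

    systemd-run --scope --unit=my-manager -p Delegate=yes /usr/bin/my-container-manager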
@ -217,7 +217,7 @@ guarantees:
hierarchy (in unified and hybrid mode) as well as on systemd's own private
hierarchy (in legacy and hybrid mode). It won't pass ownership of the legacy
controller hierarchies. Delegation to less privileged processes is not safe
-in cgroupsv1 (as a limitation of the kernel), hence systemd won't facilitate
+in cgroup v1 (as a limitation of the kernel), hence systemd won't facilitate
access to it.
3. Any BPF IP filter programs systemd installs will be installed with
@ -322,19 +322,19 @@ to work on that, and widen your horizon a bit. You are welcome.
systemd supports a number of controllers (but not all). Specifically, supported
are:
-* on cgroupsv1: `cpu`, `cpuacct`, `blkio`, `memory`, `devices`, `pids`
-* on cgroupsv2: `cpu`, `io`, `memory`, `pids`
+* on cgroup v1: `cpu`, `cpuacct`, `blkio`, `memory`, `devices`, `pids`
+* on cgroup v2: `cpu`, `io`, `memory`, `pids`
-It is our intention to natively support all cgroupsv2 controllers as they are
-added to the kernel. However, regarding cgroupsv1: at this point we will not
+It is our intention to natively support all cgroup v2 controllers as they are
+added to the kernel. However, regarding cgroup v1: at this point we will not
add support for any other controllers anymore. This means systemd currently
-does not and will never manage the following controllers on cgroupsv1:
+does not and will never manage the following controllers on cgroup v1:
`freezer`, `cpuset`, `net_cls`, `perf_event`, `net_prio`, `hugetlb`. Why not?
Depending on the case, either their API semantics or implementations aren't
-really usable, or it's very clear they have no future on cgroupsv2, and we
+really usable, or it's very clear they have no future on cgroup v2, and we
won't add new code for stuff that clearly has no future.
-Effectively this means that all those mentioned cgroupsv1 controllers are up
+Effectively this means that all those mentioned cgroup v1 controllers are up
for grabs: systemd won't manage them, and hence won't delegate them to your
code (however, systemd will still mount their hierarchies, simply because it
mounts all controller hierarchies it finds available in the kernel). If you
@ -355,9 +355,9 @@ cgroups in them — from previous runs, and be extra careful with them as they
might still carry settings that might not be valid anymore.
Note a particular asymmetry here: if your systemd version doesn't support a
-specific controller on cgroupsv1 you can still make use of it for delegation,
+specific controller on cgroup v1 you can still make use of it for delegation,
by directly fiddling with its hierarchy and replicating the cgroup tree there
-as necessary (as suggested above). However, on cgroupsv2 this is different:
+as necessary (as suggested above). However, on cgroup v2 this is different:
separately mounted hierarchies are not available, and delegation has always to
happen through systemd itself. This means: when you update your kernel and it
adds a new, so far unseen controller, and you want to use it for delegation,
@ -417,7 +417,7 @@ unified you (of course, I guess) need to provide only `/sys/fs/cgroup/` itself.
arbitrary naming, you might need to escape some of the names (for example,
you really don't want to create a cgroup named `tasks`, just because the
user created a container by that name, because `tasks` after all is a magic
-attribute in cgroupsv1, and your `mkdir()` will hence fail with `EEXIST`. In
+attribute in cgroup v1, and your `mkdir()` will hence fail with `EEXIST`. In
systemd we do escaping by prefixing names that might collide with a kernel
attribute name with an underscore. You might want to do the same, but this
is really up to you how you do it. Just do it, and be careful.
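
A quick illustration of that collision (not part of the diff; the delegated path is made up, and every cgroup v1 directory already carries a "tasks" attribute file):

    cd /sys/fs/cgroup/systemd/my-manager    # hypothetical delegated subtree
    mkdir tasks      # fails with EEXIST, "tasks" already exists as a kernel attribute
    mkdir _tasks     # escaped name, succeeds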
@ -462,9 +462,9 @@ unified you (of course, I guess) need to provide only `/sys/fs/cgroup/` itself.
to get the cgroup for a unit. The method `GetUnitByControlGroup()` may be
used to get the unit for a cgroup.)
-6. ⚡ Think twice before delegating cgroupsv1 controllers to less privileged
+6. ⚡ Think twice before delegating cgroup v1 controllers to less privileged
containers. It's not safe, you basically allow your containers to freeze the
-system with that and worse. Delegation is a strongpoint of cgroupsv2 though,
+system with that and worse. Delegation is a strongpoint of cgroup v2 though,
and there it's safe to treat delegation boundaries as privilege boundaries.
And that's it for now. If you have further questions, refer to the systemd


@ -872,7 +872,7 @@ int cg_set_access(
bool fatal;
};
-/* cgroupsv1, aka legacy/non-unified */
+/* cgroup v1, aka legacy/non-unified */
static const struct Attribute legacy_attributes[] = {
{ "cgroup.procs", true },
{ "tasks", false },
@ -880,7 +880,7 @@ int cg_set_access(
{},
};
-/* cgroupsv2, aka unified */
+/* cgroup v2, aka unified */
static const struct Attribute unified_attributes[] = {
{ "cgroup.procs", true },
{ "cgroup.subtree_control", true },
@ -2039,7 +2039,7 @@ int cg_get_keyed_attribute(
char **v;
int r;
-/* Reads one or more fields of a cgroupsv2 keyed attribute file. The 'keys' parameter should be an strv with
+/* Reads one or more fields of a cgroup v2 keyed attribute file. The 'keys' parameter should be an strv with
* all keys to retrieve. The 'ret_values' parameter should be passed as string size with the same number of
* entries as 'keys'. On success each entry will be set to the value of the matching key.
*
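
For reference (an added illustration, not part of the diff): a cgroup v2 keyed attribute file is simply a list of "key value" lines, one key per line. cpu.stat is a typical example (values made up):

    $ cat /sys/fs/cgroup/system.slice/cpu.stat
    usage_usec 914548
    user_usec 610516
    system_usec 304032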
@ -2491,7 +2491,7 @@ int cg_kernel_controllers(Set **ret) {
static thread_local CGroupUnified unified_cache = CGROUP_UNIFIED_UNKNOWN;
-/* The hybrid mode was initially implemented in v232 and simply mounted cgroup v2 on /sys/fs/cgroup/systemd. This
+/* The hybrid mode was initially implemented in v232 and simply mounted cgroup2 on /sys/fs/cgroup/systemd. This
* unfortunately broke other tools (such as docker) which expected the v1 "name=systemd" hierarchy on
* /sys/fs/cgroup/systemd. From v233 and on, the hybrid mode mounts v2 on /sys/fs/cgroup/unified and maintains
* "name=systemd" hierarchy on /sys/fs/cgroup/systemd for compatibility with other tools.
@ -2739,13 +2739,13 @@ bool cg_is_legacy_wanted(void) {
if (wanted >= 0)
return wanted;
-/* Check if we have cgroups2 already mounted. */
+/* Check if we have cgroup v2 already mounted. */
if (cg_unified_flush() >= 0 &&
unified_cache == CGROUP_UNIFIED_ALL)
return (wanted = false);
/* Otherwise, assume that at least partial legacy is wanted,
-* since cgroups2 should already be mounted at this point. */
+* since cgroup v2 should already be mounted at this point. */
return (wanted = true);
}


@ -48,13 +48,13 @@ typedef enum CGroupMask {
CGROUP_MASK_BPF_FIREWALL = CGROUP_CONTROLLER_TO_MASK(CGROUP_CONTROLLER_BPF_FIREWALL),
CGROUP_MASK_BPF_DEVICES = CGROUP_CONTROLLER_TO_MASK(CGROUP_CONTROLLER_BPF_DEVICES),
-/* All real cgroupv1 controllers */
+/* All real cgroup v1 controllers */
CGROUP_MASK_V1 = CGROUP_MASK_CPU|CGROUP_MASK_CPUACCT|CGROUP_MASK_BLKIO|CGROUP_MASK_MEMORY|CGROUP_MASK_DEVICES|CGROUP_MASK_PIDS,
-/* All real cgroupv2 controllers */
+/* All real cgroup v2 controllers */
CGROUP_MASK_V2 = CGROUP_MASK_CPU|CGROUP_MASK_IO|CGROUP_MASK_MEMORY|CGROUP_MASK_PIDS,
-/* All cgroupv2 BPF pseudo-controllers */
+/* All cgroup v2 BPF pseudo-controllers */
CGROUP_MASK_BPF = CGROUP_MASK_BPF_FIREWALL|CGROUP_MASK_BPF_DEVICES,
_CGROUP_MASK_ALL = CGROUP_CONTROLLER_TO_MASK(_CGROUP_CONTROLLER_MAX) - 1


@ -104,7 +104,7 @@ static const char *maybe_format_bytes(char *buf, size_t l, bool is_valid, uint64
static bool is_root_cgroup(const char *path) {
-/* Returns true if the specified path belongs to the root cgroup. The root cgroup is special on cgroupsv2 as it
+/* Returns true if the specified path belongs to the root cgroup. The root cgroup is special on cgroup v2 as it
* carries only very few attributes in order not to export multiple truth about system state as most
* information is available elsewhere in /proc anyway. We need to be able to deal with that, and need to get
* our data from different sources in that case.


@ -881,7 +881,7 @@ static void cgroup_context_apply(
/* In fully unified mode these attributes don't exist on the host cgroup root. On legacy the weights exist, but
* setting the weight makes very little sense on the host root cgroup, as there are no other cgroups at this
* level. The quota exists there too, but any attempt to write to it is refused with EINVAL. Inside of
-* containers we want to leave control of these to the container manager (and if cgroupsv2 delegation is used
+* containers we want to leave control of these to the container manager (and if cgroup v2 delegation is used
* we couldn't even write to them if we wanted to). */
if ((apply_mask & CGROUP_MASK_CPU) && !is_local_root) {
@ -925,7 +925,7 @@ static void cgroup_context_apply(
}
}
-/* The 'io' controller attributes are not exported on the host's root cgroup (being a pure cgroupsv2
+/* The 'io' controller attributes are not exported on the host's root cgroup (being a pure cgroup v2
* controller), and in case of containers we want to leave control of these attributes to the container manager
* (and we couldn't access that stuff anyway, even if we tried if proper delegation is used). */
if ((apply_mask & CGROUP_MASK_IO) && !is_local_root) {
@ -1067,7 +1067,7 @@ static void cgroup_context_apply(
/* In unified mode 'memory' attributes do not exist on the root cgroup. In legacy mode 'memory.limit_in_bytes'
* exists on the root cgroup, but any writes to it are refused with EINVAL. And if we run in a container we
-* want to leave control to the container manager (and if proper cgroupsv2 delegation is used we couldn't even
+* want to leave control to the container manager (and if proper cgroup v2 delegation is used we couldn't even
* write to this if we wanted to.) */
if ((apply_mask & CGROUP_MASK_MEMORY) && !is_local_root) {
@ -1109,7 +1109,7 @@ static void cgroup_context_apply(
}
}
-/* On cgroupsv2 we can apply BPF everywhere. On cgroupsv1 we apply it everywhere except for the root of
+/* On cgroup v2 we can apply BPF everywhere. On cgroup v1 we apply it everywhere except for the root of
* containers, where we leave this to the manager */
if ((apply_mask & (CGROUP_MASK_DEVICES | CGROUP_MASK_BPF_DEVICES)) &&
(is_host_root || cg_all_unified() > 0 || !is_local_root)) {
@ -1841,14 +1841,14 @@ static bool unit_has_mask_realized(
/* Returns true if this unit is fully realized. We check four things:
*
* 1. Whether the cgroup was created at all
-* 2. Whether the cgroup was created in all the hierarchies we need it to be created in (in case of cgroupsv1)
-* 3. Whether the cgroup has all the right controllers enabled (in case of cgroupsv2)
+* 2. Whether the cgroup was created in all the hierarchies we need it to be created in (in case of cgroup v1)
+* 3. Whether the cgroup has all the right controllers enabled (in case of cgroup v2)
* 4. Whether the invalidation mask is currently zero
*
* If you wonder why we mask the target realization and enable mask with CGROUP_MASK_V1/CGROUP_MASK_V2: note
-* that there are three sets of bitmasks: CGROUP_MASK_V1 (for real cgroupv1 controllers), CGROUP_MASK_V2 (for
-* real cgroupv2 controllers) and CGROUP_MASK_BPF (for BPF-based pseudo-controllers). Now, cgroup_realized_mask
-* is only matters for cgroupsv1 controllers, and cgroup_enabled_mask only used for cgroupsv2, and if they
+* that there are three sets of bitmasks: CGROUP_MASK_V1 (for real cgroup v1 controllers), CGROUP_MASK_V2 (for
+* real cgroup v2 controllers) and CGROUP_MASK_BPF (for BPF-based pseudo-controllers). Now, cgroup_realized_mask
+* is only matters for cgroup v1 controllers, and cgroup_enabled_mask only used for cgroup v2, and if they
* differ in the others, we don't really care. (After all, the cgroup_enabled_mask tracks which controllers are
* enabled through cgroup.subtree_control, and since the BPF pseudo-controllers don't show up there, they
* simply don't matter. */


@ -3137,9 +3137,9 @@ static int exec_child(
}
}
-/* If delegation is enabled we'll pass ownership of the cgroup to the user of the new process. On cgroupsv1
+/* If delegation is enabled we'll pass ownership of the cgroup to the user of the new process. On cgroup v1
* this is only about systemd's own hierarchy, i.e. not the controller hierarchies, simply because that's not
-* safe. On cgroupsv2 there's only one hierarchy anyway, and delegation is safe there, hence in that case only
+* safe. On cgroup v2 there's only one hierarchy anyway, and delegation is safe there, hence in that case only
* touch a single hierarchy too. */
if (params->cgroup_path && context->user && (params->flags & EXEC_CGROUP_DELEGATE)) {
r = cg_set_access(SYSTEMD_CGROUP_CONTROLLER, params->cgroup_path, uid, gid);


@ -248,8 +248,8 @@ typedef struct Unit {
/* Counterparts in the cgroup filesystem */
char *cgroup_path;
-CGroupMask cgroup_realized_mask; /* In which hierarchies does this unit's cgroup exist? (only relevant on cgroupsv1) */
-CGroupMask cgroup_enabled_mask; /* Which controllers are enabled (or more correctly: enabled for the children) for this unit's cgroup? (only relevant on cgroupsv2) */
+CGroupMask cgroup_realized_mask; /* In which hierarchies does this unit's cgroup exist? (only relevant on cgroup v1) */
+CGroupMask cgroup_enabled_mask; /* Which controllers are enabled (or more correctly: enabled for the children) for this unit's cgroup? (only relevant on cgroup v2) */
CGroupMask cgroup_invalidated_mask; /* A mask specifying controllers which shall be considered invalidated, and require re-realization */
CGroupMask cgroup_members_mask; /* A cache for the controllers required by all children of this cgroup (only relevant for slice units) */
int cgroup_inotify_wd;


@ -257,7 +257,7 @@ static int client_context_read_cgroup(Server *s, ClientContext *c, const char *u
/* We use the unit ID passed in as fallback if we have nothing cached yet and cg_pid_get_path_shifted()
* failed or process is running in a root cgroup. Zombie processes are automatically migrated to root cgroup
-* on cgroupsv1 and we want to be able to map log messages from them too. */
+* on cgroup v1 and we want to be able to map log messages from them too. */
if (unit_id && !c->unit) {
c->unit = strdup(unit_id);
if (c->unit)


@ -33,7 +33,7 @@ if grep -q cgroup2 /proc/filesystems ; then
# And now check again, "io" should have vanished
grep -qv io /sys/fs/cgroup/system.slice/cgroup.controllers
else
echo "Skipping TEST-19-DELEGATE, as the kernel doesn't actually support cgroupsv2" >&2
echo "Skipping TEST-19-DELEGATE, as the kernel doesn't actually support cgroup v2" >&2
fi
echo OK > /testok