From 56f47800d847deef0d3ffbeca5fd774e7819322b Mon Sep 17 00:00:00 2001
From: Benjamin Berg
Date: Thu, 23 Jul 2020 12:56:32 +0200
Subject: [PATCH] mount-setup: Enable memory_recursiveprot for cgroup2

When available, enable memory_recursiveprot. Realistically it always
makes sense to delegate MemoryLow= and MemoryMin= to all children of a
slice/unit.

The kernel option is not enabled by default as it might cause
regressions in some setups. However, it is the better default in
general, and it results in a more flexible and obvious behaviour.

The alternative to using this option would be for users to also set
DefaultMemoryLow= on slices when assigning MemoryLow=. However, this
makes the effect of MemoryLow= on some children less obvious, as it
could result in a lower protection rather than increasing it.

From the kernel documentation:

  memory_recursiveprot

        Recursively apply memory.min and memory.low protection to
        entire subtrees, without requiring explicit downward
        propagation into leaf cgroups.  This allows protecting entire
        subtrees from one another, while retaining free competition
        within those subtrees.  This should have been the default
        behavior but is a mount-option to avoid regressing setups
        relying on the original semantics (e.g. specifying bogusly
        high 'bypass' protection values at higher tree levels).

This was added in kernel commit 8a931f801340c (mm: memcontrol:
recursive memory.low protection), which became available in 5.7 and was
subsequently fixed in kernel 5.7.7 (mm: memcontrol: handle div0 crash
race condition in memory.low).
---
 src/core/mount-setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/core/mount-setup.c b/src/core/mount-setup.c
index feb88f3e6e..ad14fd6aa9 100644
--- a/src/core/mount-setup.c
+++ b/src/core/mount-setup.c
@@ -85,6 +85,8 @@ static const MountPoint mount_table[] = {
 #endif
         { "tmpfs", "/run", "tmpfs", "mode=755" TMPFS_LIMITS_RUN, MS_NOSUID|MS_NODEV|MS_STRICTATIME,
           NULL, MNT_FATAL|MNT_IN_CONTAINER },
+        { "cgroup2", "/sys/fs/cgroup", "cgroup2", "nsdelegate,memory_recursiveprot", MS_NOSUID|MS_NOEXEC|MS_NODEV,
+          cg_is_unified_wanted, MNT_IN_CONTAINER|MNT_CHECK_WRITABLE },
         { "cgroup2", "/sys/fs/cgroup", "cgroup2", "nsdelegate", MS_NOSUID|MS_NOEXEC|MS_NODEV,
           cg_is_unified_wanted, MNT_IN_CONTAINER|MNT_CHECK_WRITABLE },
         { "cgroup2", "/sys/fs/cgroup", "cgroup2", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV,