The implementation is pretty straight-foward: when we get a request to
clean some type of resources we fork off a process doing that, and while
it is running we are in the "cleaning" state.
Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
The CPU_SET_S api is pretty bad. In particular, it has a parameter for the size
of the array, but operations which take two (CPU_EQUAL_S) or even three arrays
(CPU_{AND,OR,XOR}_S) still take just one size. This means that all arrays must
be of the same size, or buffer overruns will occur. This is exactly what our
code would do, if it received an array of unexpected size over the network.
("Unexpected" here means anything different from what cpu_set_malloc() detects
as the "right" size.)
Let's rework this, and store the size in bytes of the allocated storage area.
The code will now parse any number up to 8191, independently of what the current
kernel supports. This matches the kernel maximum setting for any architecture,
to make things more portable.
Fixes#12605.
Only changing ownership back to root is not enough we also need to
change the access mode, otherwise the user might have set 666 first, and
thus allow everyone access before and after the chown().
I covered the most obvious paths: those where there's a clear problem
with a path specified by the user.
Prints something like this (at error level):
May 21 20:00:01.040418 systemd[125871]: bad-workdir.service: Failed to set up mount namespacing: /run/systemd/unit-root/etc/tomcat9/Catalina: No such file or directory
May 21 20:00:01.040456 systemd[125871]: bad-workdir.service: Failed at step NAMESPACE spawning /bin/true: No such file or directory
Fixes#10972.
This adds some extra paranoia: when we recursively chown a directory for
use with DynamicUser=1 services we'll now drop suid/sgid from all files
we chown().
Of course, such files should not exist in the first place, and noone
should get access to those dirs who isn't root anyway, but let's better
be safe than sorry, and drop everything we come across.
Add even more suid/sgid protection to DynamicUser= envionments: the
state directories we bind mount from the host will now have the nosuid
flag set, to disable the effect of nosuid on them.
This makes two changes:
1. Instead of resetting the configured service TTY each time after a
process exited, let's do so only when the service goes back to "dead"
state. This should be preferable in case the started processes leave
background child processes around that still reference the TTY.
2. chmod() and chown() the TTY at the same time. This should make it
safe to run "systemd-run -p DynamicUser=1 -p StandardInput=tty -p
TTYPath=/dev/tty8 /bin/bash" without leaving a TTY owned by a dynamic
user around.
It's probably unexpected if we do a recursive chown() when dynamic users
are used but not on static users.
hence, let's tweak the logic slightly, and recursively chown in both
cases, except when operating on the configuration directory.
Fixes: #11842
Let services use a private UTS namespace. In addition, a seccomp filter is
installed on set{host,domain}name and a ro bind mounts on
/proc/sys/kernel/{host,domain}name.
Since commit 0722b35934, the root mountpoint is
unconditionnally turned to slave which breaks units that are using explicitly
MountFlags=shared (and no other options that would implicitly require a slave
root mountpoint).
Here is a test case:
$ systemctl cat test-shared-mount-flag.service
# /etc/systemd/system/test-shared-mount-flag.service
[Service]
Type=simple
ExecStartPre=/usr/bin/mkdir -p /mnt/tmp
ExecStart=/bin/sh -c "/usr/bin/mount -t tmpfs -o size=10M none /mnt/tmp && sleep infinity"
ExecStop=-/bin/sh -c "/usr/bin/umount /mnt/tmp"
MountFlags=shared
$ systemctl start test-shared-mount-flag.service
$ findmnt /mnt/tmp
$
Mount on /mnt/tmp is not visible from the host although MountFlags=shared was
used.
This patch fixes that and turns the root mountpoint to slave when it's really
required.