It's possible for a zombie process to have live threads. These are not listed
in /sys in "cgroup.procs" for cgroupsv2, but they show up in
"cgroup.threads" (cgroupv2) or "tasks" (cgroupv1) nodes. When killing a
cgroup (v2 only) with SIGKILL, let's also kill threads after killing processes,
so the live threads of a zombie get killed too.
Closes#12262.
prefix_root() is equivalent to path_join() in almost all ways, hence
let's remove it.
There are subtle differences though: prefix_root() will try shorten
multiple "/" before and after the prefix. path_join() doesn't do that.
This means prefix_root() might return a string shorter than both its
inputs combined, while path_join() never does that. I like the
path_join() semantics better, hence I think dropping prefix_root() is
totally OK. In the end the strings generated by both functon should
always be identical in terms of path_equal() if not streq().
This leaves prefix_roota() in place. Ideally we'd have path_joina(), but
I don't think we can reasonably implement that as a macro. or maybe we
can? (if so, sounds like something for a later PR)
Also add in a few missing OOM checks
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).
Fixes#12539
cap_last_cap() returns the last valid cap (instead of the number of
valid caps). to iterate through all known caps we hence need to use a <=
check, and not a < check like for all other cases. We got this right
usually, but in three cases we did not.
Using C11 thread-local storage in destructors causes uninitialized
read. Let's avoid that using a direct comparison instead of using
the cached values. As this code path is taken only when compiled
with -DVALGRIND=1, the performance cost shouldn't matter too much.
Fixes#12814
Allocating a pty is done in a couple of places so let's introduce a new helper
which does the job.
Also the new function, as well as openpt_in_namespace(), returns both pty
master and slave so the callers don't need to know about the pty slave
allocation details.
For the same reasons machine_openpt() prototype has also been changed to return
both pty master and slave so callers don't need to allocate a pty slave which
might be in a different namespace.
Finally openpt_in_namespace() has been renamed into
openpt_allocate_in_namespace().
fstat(2) is fine with O_PATH fds.
For changing owership of a file opened with O_PATH, there's fchownat(2).
Only changing permissions is problematic but we introduced fchmod_opath() for
that purpose.
Only changing ownership back to root is not enough we also need to
change the access mode, otherwise the user might have set 666 first, and
thus allow everyone access before and after the chown().
Inspired by #12431 let's also rework chmod_and_chown() and make sure we
never add more rights to a file not owned by the right user.
Also, let's make chmod_and_chown() just a wrapper arond
fchmod_and_chown().
let's also change strategy: instead of chown()ing first and stating
after on failure and supressing errors, let's avoid the chown in the
firts place, in the interest on keeping things minimal.
This restores show_pid_array() output in legacy locales on the console.
Only one call to get_process_cmdline() is changed, all others retain
utf8-only mode. This affects systemd-cgls, systemctl status, etc, when
working locally.
Calls to get_process_cmdline() that cross a process boundary always use
utf8. It's the callers responsibility to convert this to some encoding that
they use. This means that we always pass utf8 over the bus.
It turns out that the kernel allows comm names higher than our expected limit
of 16.
$ wc -c /proc/*/comm|sort -g|tail -n3
35 /proc/1292317/comm
35 /proc/1293610/comm
36 /proc/1287112/comm
$ cat /proc/1287112/comm
kworker/u9:3-kcryptd/253:0
The functions to retrieve and print process cmdlines were based on the
assumption that they contain printable ASCII, and everything else
should be filtered out. That assumption doesn't hold in today's world,
where people are free to use unicode everywhere.
This replaces the custom cmdline reading code with a more generic approach
using utf8_escape_non_printable_full().
For kernel threads, truncation is done on the parenthesized name, so we'll
get "[worker]", "[worker…]", …, "[w…]", "[…", "…" as we reduce the number of
available columns.
This implementation is most likely slower for very long cmdlines, but I don't
think this is very important. The common case is to have short commandlines,
and should print those properly. Absurdly long cmdlines are the exception,
which needs to be handled correctly and safely, but speed is not too important.
Fixes#12532.
v2:
- use size_t for the number of columns. This change propagates into various
other functions that call get_process_cmdline(), increasing the size of the
patch, but the changes are rather trivial.