Quite often we need to set up a number of fds as stdin/stdout/stderr of
a process we are about to start. Add a generic implementation for a
routine doing that that takes care to do so properly:
1. Can handle the case where stdin/stdout/stderr where previously
closed, and the fds to set as stdin/stdout/stderr hence likely in the
0..2 range. handling this properly is nasty, since we need to first
move the fds out of this range in order to later move them back in, to
make things fully robust.
2. Can optionally open /dev/null in case for one or more of the fds, in
a smart way, sharing the open file if possible between multiple of
the fds.
3. Guarantees that O_CLOEXEC is not set on the three fds, even if the fds
already were in the 0..2 range and hence possibly weren't moved.
This adds some paranoia code that moves some of the fds we allocate for
longer periods of times to fds > 2 if they are allocated below this
boundary. This is a paranoid safety thing, in order to avoid that
external code might end up erroneously use our fds under the assumption
they were valid stdin/stdout/stderr. Think: some app closes
stdin/stdout/stderr and then invokes 'fprintf(stderr, …' which causes
writes on our fds.
This both adds the helper to do the moving as well as ports over a
number of users to this new logic. Since we don't want to litter all our
code with invocations of this I tried to strictly focus on fds we keep
open for long periods of times only and only in code that is frequently
loaded into foreign programs (under the assumptions that in our own
codebase we are smart enough to always keep stdin/stdout/stderr
allocated to avoid this pitfall). Specifically this means all code used
by NSS and our sd-xyz API:
1. our logging APIs
2. sd-event
3. sd-bus
4. sd-resolve
5. sd-netlink
This changed was inspired by this:
https://github.com/systemd/systemd/issues/8075#issuecomment-363689755
This shows that apparently IRL there are programs that do close
stdin/stdout/stderr, and we should accomodate for that.
Note that this won't fix any bugs, this just makes sure that buggy
programs are less likely to interfere with out own code.
Using strlen() to declare a buffer results in a variable-length array,
even if the compiler likely optimizes it to be a compile time constant.
When building with -Wvla, certain versions of gcc complain about such
buffers. Compiling with -Wvla has the advantage of preventing variably
length array, which defeat static asserts that are implemented by
declaring an array of negative length.
All this function does is place some data in an in-memory read-only fd,
that may be read back to get the original data back.
Doing this in a way that works everywhere, given the different kernels
we support as well as different privilege levels is surprisingly
complex.
We are using the same pattern at various places: call dup2() on an fd,
and close the old fd, usually in combination with some O_CLOEXEC
fiddling. Let's add a little helper for this, and port a few obvious
cases over.
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
There are some places in the systemd which are use the same pattern:
fd_cloexec(STDIN_FILENO, false);
fd_cloexec(STDOUT_FILENO, false);
fd_cloexec(STDERR_FILENO, false);
to unset CLOEXEC for standard file descriptors. This patch introduces
the stdio_unset_cloexec() function to hide this and make code cleaner.
In standard linux parlance, "hidden" usually means that the file name starts
with ".", and nothing else. Rename the function to convey what the function does
better to casual readers.
Stop exposing hidden_file_allow_backup which is rather ugly and rewrite
hidden_file to extract the suffix first. Note that hidden_file_allow_backup
excluded files with "~" at the end, which is quite confusing. Let's get
rid of it before it gets used in the wrong place.
This should allow tools like rkt to pre-mount read-only subtrees in the OS
tree, without breaking the patching code.
Note that the code will still fail, if the top-level directory is already
read-only.