Allocating a pty is done in a couple of places so let's introduce a new helper
which does the job.
Also the new function, as well as openpt_in_namespace(), returns both pty
master and slave so the callers don't need to know about the pty slave
allocation details.
For the same reasons machine_openpt() prototype has also been changed to return
both pty master and slave so callers don't need to allocate a pty slave which
might be in a different namespace.
Finally openpt_in_namespace() has been renamed into
openpt_allocate_in_namespace().
This doesn't really change much, but feels more correct to do, as it
ensures that all messages currently queued in the bus connections are
definitely unreffed and thus destryoing of the connection object will
follow immediately.
Strictly speaking this change is entirely unnecessary, since nothing
else could have acquired a ref to the connection and queued a message
in, however, now that we have the new sd_bus_close_unref() helper it
makes a lot of sense to use it here, to ensure that whatever happens
nothing that might have been queued fucks with us.
fdopen doesn't accept "e", it's ignored. Let's not mislead people into
believing that it actually sets O_CLOEXEC.
From `man 3 fdopen`:
> e (since glibc 2.7):
> Open the file with the O_CLOEXEC flag. See open(2) for more information. This flag is ignored for fdopen()
As mentioned by @jlebon in #11131.
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.
No code changes, just some rearranging of source files.
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just "\n\r" always. This nicely
simplifies our code.
This is required for /proc/self/fd/xyz to work, but that's what we need
to convert the O_PATH fd returned by chase_symlinks() back to a regular
file fd. Hence, let's do the joining of the namespaces fully and
correctly, by doing fork()+setns()+fork() with the PID and fs
namespaces.
This makes use of the new namespace_fork() helper we just added.
Fixes: #10549
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
This rearranges chase_symlinks() a bit: if no special flags are
specified it will now revert to behaviour before
b12d25a8d6. However, if the new
CHASE_TRAIL_SLASH flag is specified it will follow the behaviour
introduced by that commit.
I wasn't sure which one to make the beaviour that requires specification
of a flag to enable. I opted to make the "append trailing slash"
behaviour the one to enable by a flag, following the thinking that the
function should primarily be used to generate a normalized path, and I
am pretty sure a path without trailing slash is the more "normalized"
one, as the trailing slash is not really a part of it, but merely a
"decorator" that tells various system calls to generate ENOTDIR if the
path doesn't refer to a path.
Or to say this differently: if the slash was part of normalization then
we really should add it in all cases when the final path is a directory,
not just when the user originally specified it.
Fixes: #8544
Replaces: #8545
Using wait_for_terminate_and_check() instead of wait_for_terminate()
let's us simplify, shorten and unify the return value checking and
logging of waitid(). Hence, let's use it all over the place.
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
Already, path_is_safe() refused paths container the "." dir. Doing that
isn't strictly necessary to be "safe" by most definitions of the word.
But it is necessary in order to consider a path "normalized". Hence,
"path_is_safe()" is slightly misleading a name, but
"path_is_normalize()" is more descriptive, hence let's rename things
accordingly.
No functional changes.
If you use machinectl to bind a file into a container, it responds with a confusing error message about a temporary directory not being a directory.
I just swapped it to error with the source that was passed, rather than the tmpdir.
It would also be nice to be able to bind files, but that's a separate issue (#7195).
Before the change:
root@epona /var/lib/sandbox $ cat bar/foo
Hello world!
root@epona /var/lib/sandbox $ machinectl bind testing /var/lib/sandbox/bar/foo /foo
Failed to bind mount: Failed to overmount /tmp/propagate.W5TNsj/mount: Not a directory
After the change:
root@epona /var/lib/sandbox $ machinectl bind testing /var/lib/sandbox/bar/foo /foo
Failed to bind mount: Failed to overmount /var/lib/sandbox/bar/foo: Not a directory
As the kernel won't map the UIDs this is simply not safe, and hence we
should generate a clean error and refuse it.
We can restore this feature later should a "shiftfs" become available in
the kernel.
This changes the file copy logic of machined to set the UID/GID of all
copied files to 0 if the host and container do not share the same user
namespace.
Fixes: #4078
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
UID/GID mapping with userns can be arbitrarily complex. Let's break this
down to a single admin-friendly parameter: let's expose the UID/GID
shift of a container via a new bus call for each container, and let's
show this as part of "machinectl status" if it is not 0.
This should work for pretty much all real-life full OS container setups
(i.e. the stuff machined is suppose to be useful for). For everything
else we generate a clean error, clarifying that we can't expose the
mapping.
This adds a bus call GetImageOSRelease() to the Manager interface that
retrieves the /etc/os-release file of a machine image. It matches the existing
GetMachineOSRelease() call, however operates on a disk image rather than a
running container.
The backend for this call on .raw images is implemented via the generalized
image dissector, which makes this scheme relatively easy to implement.
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
This is a follow-up to 5d2036b5f3, and also makes
the "machinectl clean" verb asynchronous, after all it's little more than a
series of image removals.
The changes required to make this happen are a bit more comprehensive as we
need to pass information about deleted images back to the client, as well as
information about the image we failed on if we failed on one. Hence, create a
temporary file in /tmp, serialize that data into, and read it from the parent
after the operation is complete.
There's no point in explicitly closing the errno pipe, if we exit right after
anyway. It doesn't hurt doing this either, but let's do this the same way for
all cases where we use the "Operation" object right now, and in all other cases
we do not close the pipe explicitly, hence don't do so here either.
This new call returns a file descriptor for the root directory of a container.
This file descriptor may then be used to access the rest of the container's
file system, via openat() and similar calls. Since the file descriptor returned
is for the file system namespace inside of the container it may be used to
access all files of the container exactly the way the container itself would
see them. This is particularly useful for containers run directly from
loopback media, for example via systemd-nspawn's --image= switch. It also
provides access to directories such as /run of a container that are normally
not accessible to the outside of a container.
This replaces PR #2870.
Fixes: #2870