This extends the --bind= and --overlay= syntax so that an empty string as source/upper
directory is taken as request to automatically allocate a temporary directory
below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination
with the "+" path extension this permits a switch "--overlay=+/var::/var" in
order to use the container's shipped /var, combine it with a writable temporary
directory and mount it to the runtime /var of the container.
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.
This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
As suggested by @keszybz.
chase_symlinks() currently expects a fully qualified, absolute path, relative
to the host's root as first argument. Which is useful in many ways, and similar
to the paths unlink(), rename(), open(), … expect. Sometimes it's however
useful to first prefix the specified path with the specified root directory.
Add a new call chase_symlinks_prefix() for this, that is a simple wrapper.
As suggested in PR #3667.
This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.
This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.
Fixes: #3667
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.
Fixes: #2001
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
We generally try to make our destructors robust regarding NULL pointers, much
in the same way as glibc's free(). Do this also for unit_free().
Follow-up for #4748.
So far systemd-nspawn container has been creating files under
/run/systemd/inaccessible, no matter whether it's running in user
namespace or not. That's fine for regular files, dirs, socks, fifos.
However, it's not for block and character devices, because kernel
doesn't allow them to be created under user namespace. It results
in warnings at booting like that:
====
Couldn't stat device /run/systemd/inaccessible/chr
Couldn't stat device /run/systemd/inaccessible/blk
====
Thus we need to have the cgroups whitelisting handler to silently ignore
a file, when the device path is prefixed with "-". That's exactly the
same convention used in directives like ReadOnlyPaths=. Also insert the
prefix "-" to inaccessible entries.
IMA validates file signatures based on the security.ima xattr. As of
Linux-4.7, instead of copying the IMA policy into the securityfs policy,
the IMA policy pathname can be written, allowing the IMA policy file
signature to be validated.
This patch modifies the existing code to first attempt to write the
pathname, but on failure falls back to copying the IMA policy contents.
We stay in the SERVICE_START while no READY=1 notification message has
been received. When we are in the SERVICE_START_POST state, we have
already received a ready notification. Hence we should not fail when the
cgroup becomes empty in that state.
This will allow us to have several managers sharing an event loop
and running in parallel, as if they were running in separate processes.
The long term-aim is to allow networkd to be split into separate
processes, so restructure the code to make this simpler.
For now we drop the exit-on-idle logic, as this was anyway severely
restricted at the moment. Once split, we will revisit this as it may
then make more sense again.
Since a581e45ae8, there's a few function calls to
unit_new_for_name which will unit_free on failure. Prior to this commit,
a failure would result in calling unit_free with a NULL unit, and hit an
assertion failure, seen at least via device_setup_unit:
Assertion 'u' failed at src/core/unit.c:519, function unit_free(). Aborting.
Fixes#4747https://bugs.archlinux.org/task/51950
strtoul() parses leading whitespace and an optional sign;
check that the first character is a digit to prevent odd
specifications like "00: 00: 00" and "-00:+00/-1".
This is a different way to implement the fix proposed by commit
a4021390fe suggested by Lennart Poettering.
In this patch we instruct PID1 to not kill "systemctl switch-root" command
started by initrd-switch-root service using the "argv[0][0]='@'" trick.
See: https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/ for
more details.
We had to backup argv[0] because argv is modified by dispatch_verb().