Commit graph

532 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 4a5567d5d6 Merge pull request #4795 from poettering/dissect
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10 01:08:13 -05:00
Wim de With 2e1f244efd nspawn: add missing -E to getopt_long (#4860) 2016-12-10 07:33:58 +03:00
Franck Bui 5367354dae nspawn: resolv.conf might not be created initially (#4799)
This might happen that resolv.conf is missing in a minimal rootfs and in this
case the following warning is emitted:

 Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory

This patch fixes this case.
2016-12-07 21:36:39 +01:00
Lennart Poettering 4623e8e6ac nspawn/dissect: automatically discover dm-verity verity partitions
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.

Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.

mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:

```
 # mkdir test
 # cd test
 # mkosi --verity
 # systemd-nspawn -i ./image.raw -bn
```

Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07 18:38:41 +01:00
Lennart Poettering 4827ab4854 nspawn: when generating a machine name from an image name, truncate .raw suffix
Let's prettify the machine name we generate for image-based containers: let's
chop off the .raw suffix before using it as machine name.
2016-12-07 18:38:41 +01:00
Lennart Poettering 18b5886e56 dissect: add support for encrypted images
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.

It's pretty basic: only support for passphrase-based encryption.

In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.
2016-12-07 18:38:41 +01:00
Lennart Poettering 2d8457851b nspawn: port nspawn to new generalized image dissection code
Let's make use of the new internal API. This mostly doesn't change anything for
the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new
code can handle disk images with no partition tables, and make any detected
images directly the root.
2016-12-07 18:38:40 +01:00
Lennart Poettering cb638b5e96 util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENT
As suggested by @keszybz
2016-12-01 12:49:55 +01:00
Lennart Poettering 86c0dd4a71 nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.

This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
2016-12-01 12:41:18 +01:00
Lennart Poettering e28c7cd066 tree-wide: set SA_RESTART for signal handlers we install
We already set it in most cases, but make sure to set it in all others too, and
document that that's a good idea.
2016-12-01 12:41:17 +01:00
Lennart Poettering ad85779a50 nspawn: split out overlayfs argument parsing into a function of its own
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and
bind_mount_parse().
2016-12-01 00:25:51 +01:00
Lennart Poettering 8d4aa2bb32 nspawn: make use of CHASE_NON_EXISTING when locking image
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
2016-12-01 00:25:51 +01:00
Lennart Poettering c4f4fce79e fs-util: add flags parameter to chase_symlinks()
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
2016-12-01 00:25:51 +01:00
Lennart Poettering 8cd328d82e nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory=
As suggested in PR #3667.

This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.

This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.

Fixes: #3667
2016-12-01 00:25:51 +01:00
Lennart Poettering 3f342ec4b0 nspawn: properly handle image/directory paths that are symlinks
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.

Fixes: #2001
2016-12-01 00:25:51 +01:00
Lennart Poettering e187369587 tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
2016-12-01 00:25:51 +01:00
Lennart Poettering 17cbb288fa nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.

This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.

Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
2016-11-22 13:35:09 +01:00
Lennart Poettering c67b008273 nspawn: remove temporary root directory on exit
When mountint a loopback image, we need a temporary root directory we can mount
stuff to. Make sure to actually remove it when exiting, so that we don't leave
stuff around in /tmp unnecessarily.

See: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering 6a0f896b97 nspawn: try to wait for the container PID 1 to exit, before we exit
Let's make the shutdown logic synchronous, so that there's a better chance to
detach the loopback device after use.
2016-11-22 13:35:09 +01:00
Lennart Poettering 0f3be6ca4d nspawn: support ephemeral boots from images
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.

As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.

Fixes: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering f4ff4aa800 Merge pull request #4395 from s-urbaniak/rw-support
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18 12:36:46 +01:00
Sergiusz Urbaniak 4f086aab52
nspawn: R/W support for /sys, and /proc/sys
This commit adds the possibility to leave /sys, and /proc/sys read-write.
It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE
to enable this feature.

If set to "yes", /sys, and /proc/sys will be read-write.
If set to "no", /sys, and /proc/sys will be read-only.
If set to "network" /proc/sys/net will be read-write. This is useful in
use-cases, where systemd-nspawn is used in an external network
namespace.

This adds the possibility to start privileged containers which need more
control over settings in the /proc, and /sys filesystem.

This is also a follow-up on the discussion from
https://github.com/systemd/systemd/pull/4018#r76971862 where an
introduction of a simple env var to enable R/W support for those
directories was already discussed.
2016-11-18 09:50:40 +01:00
Zbigniew Jędrzejewski-Szmek 2a49b6120f nspawn: restart the whole systemd-nspawn@.service unit on container reboot (#4613)
Since 133 is now used in a few places, add a #define for it.
Also make the status message a bit informative.

Another issue introduced in b006762. The logic was borked, we were supposed
to return 0 to break the loop, and 133 to restart the container, not the other
way around.

But this doesn't seem to work, reboot fails with:
Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists
So actually the version before this patch worked better, since 133 > 0 and we'd
at least loop internally.
2016-11-14 11:49:49 +01:00
Christian Hesse 7debb05dbe nspawn: fix condition for mounting resolv.conf (#4622)
The file /usr/lib/systemd/resolv.conf can be stale, it does not tell us
whether or not systemd-resolved is running or not.
So check for /run/systemd/resolve/resolv.conf as well, which is created
at runtime and hence is a better indication.
2016-11-08 22:01:26 -05:00
Zbigniew Jędrzejewski-Szmek a809cee582 Merge pull request #4612 from keszybz/format-strings
Format string tweaks (and a small fix on 32bit)
2016-11-08 08:09:40 -05:00
Martin Pitt cfed63f60d nspawn: fix exit code for --help and --version (#4609)
Commit b006762 inverted the initial exit code which is relevant for --help and
--version without a particular reason.  For these special options, parse_argv()
returns 0 so that our main() immediately skips to the end without adjusting
"ret". Otherwise, if an actual container is being started, ret is set on error
in run(), which still provides the "non-zero exit on error" behaviour.

Fixes #4605.
2016-11-07 23:31:55 -05:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Lennart Poettering 2bce2acce8 nspawn: if we set up a loopback device, try to mount it with "discard"
Let's make sure that our loopback files remain sparse, hence let's set
"discard" as mount option on file systems that support it if the backing device
is a loopback.
2016-11-02 11:39:49 -06:00
Evgeny Vereshchagin 6d66bd3b2a nspawn: become a new root early
036d523641

> vfs: Don't create inodes with a uid or gid unknown to the vfs
  It is expected that filesystems can not represent uids and gids from
  outside of their user namespace.  Keep things simple by not even
  trying to create filesystem nodes with non-sense uids and gids.

So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955

$ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target

Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide.
Press ^] three times within 1s to kill container.
Child died too early.
Selected user namespace base 1073283072 and range 65536.
Failed to mount to /sys/fs/cgroup/systemd: No such file or directory

Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519
Fixes: #4352
2016-10-23 23:23:42 -04:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 24597ee0e6 nspawn, NEWS: add missing "s" in --private-users-chown (#4438) 2016-10-21 06:03:26 +03:00
Evgeny Vereshchagin f0bef277a4 nspawn: cleanup and chown the synced cgroup hierarchy (#4223)
Fixes: #4181
2016-10-13 09:50:46 -04:00
Zbigniew Jędrzejewski-Szmek 60e76d4897 nspawn,mount-util: add [u]mount_verbose and use it in nspawn
This makes it easier to debug failed nspawn invocations:

Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11 16:50:07 -04:00
Zbigniew Jędrzejewski-Szmek ada5412039 nspawn: simplify arg_us_cgns passing
We would check the condition cg_ns_supported() twice. No functional
change.
2016-10-11 16:46:58 -04:00
Lennart Poettering 6dca2fe325 Merge pull request #4332 from keszybz/nspawn-arguments-3
nspawn --private-users parsing, v2
2016-10-10 19:51:51 +02:00
Evgeny Vereshchagin a0f72a24e0 Merge pull request #4310 from keszybz/nspawn-autodetect
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10 20:47:25 +03:00
Zbigniew Jędrzejewski-Szmek be7157316c nspawn: better error messages for parsing errors
In particular, the check for arg_uid_range <= 0 is moved to the end, so that
"foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek ae209204d8 nspawn,man: fix parsing of numeric args for --private-users, accept any boolean
This is like the previous reverted commit, but any boolean is still accepted,
not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek 6c2058b35e Revert "nspawn: fix parsing of numeric arguments for --private-users"
This reverts commit bfd292ec35.
2016-10-10 11:17:40 -04:00
Zbigniew Jędrzejewski-Szmek bfd292ec35 nspawn: fix parsing of numeric arguments for --private-users
The documentation says lists "yes", "no", "pick", and numeric arguments.
But parse_boolean was attempted first, so various numeric arguments were
misinterpreted.

In particular, this fixes --private-users=0 to mean the same thing as
--private-users=0:65536.

While at it, use strndupa to avoid some error handling.
Also give a better error for an empty UID range. I think it's likely that
people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09 11:52:35 -04:00
Zbigniew Jędrzejewski-Szmek 27eb8e9028 nspawn: reindent table 2016-10-09 11:51:18 -04:00
Zbigniew Jędrzejewski-Szmek a8725a06e6 nspawn: also fall back to legacy cgroup hierarchy for old containers
Current systemd version detection routine cannot detect systemd 230,
only systmed >= 231. This means that we'll still use the legacy hierarchy
in some cases where we wouldn't have too. If somebody figures out a nice
way to detect systemd 230 this can be later improved.
2016-10-08 19:03:53 -04:00
Zbigniew Jędrzejewski-Szmek 0fd9563fde nspawn: use mixed cgroup hierarchy only when container has new systemd
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy.
So make an educated guess, and use the mixed hierarchy in that case.

Tested by running the host with mixed hierarchy (i.e. simply using a recent
kernel with systemd from git), and booting first a container with older systemd,
and then one with a newer systemd.

Fixes #4008.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 27e29a1e43 nspawn: fix spurious reboot if container process returns 133 2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek b006762524 nspawn: move the main loop body out to a new function
The new function has 416 lines by itself!

"return log_error_errno" is used to nicely reduce the volume of error
handling code.

A few minor issues are fixed on the way:
- positive value was used as error value (EIO), causing systemd-nspawn
  to return success, even though it shouldn't.
- In two places random values were used as error status, when the
  actual value was in an unusual place (etc_password_lock, notify_socket).

Those are the only functional changes.

There is another potential issue, which is marked with a comment, and left
unresolved: the container can also return 133 by itself, causing a spurious
reboot.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 98afd6af3a nspawn: check env var first, detect second
If we are going to use the env var to override the detection result
anyway, there is not point in doing the detection, especially that
it can fail.
2016-10-08 14:48:41 -04:00
Lennart Poettering 7429b2eb83 tree-wide: drop some misleading compiler warnings
gcc at some optimization levels thinks thes variables were used without
initialization. it's wrong, but let's make the message go anyway.
2016-10-06 19:04:10 +02:00
Djalal Harouni 41eb436265 nspawn: add log message to let users know that nspawn needs an empty /dev directory (#4226)
Fixes https://github.com/systemd/systemd/issues/3695

At the same time it adds a protection against userns chown of inodes of
a shared mount point.
2016-10-05 06:57:02 +02:00
Alban Crequy 19caffac75 nspawn: set shared propagation mode for the container 2016-10-03 14:19:27 +02:00