Commit graph

832 commits

Author SHA1 Message Date
Anita Zhang f66ad46066 nspawn: don't hard fail when setting capabilities
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).

Fixes #12539
2019-06-20 21:46:36 +02:00
Yu Watanabe 657ee2d82b tree-wide: replace strjoin() with path_join() 2019-06-21 03:26:16 +09:00
Franck Bui dc98caea32 nspawn: make use of openpt_allocate() 2019-06-18 09:27:06 +02:00
Franck Bui 3acc84ebd9 nspawn: allocate the pty used for /dev/console within the container
The console tty is now allocated from within the container so it's not
necessary anymore to allocate it from the host and bind mount the pty slave
into the container. The pty master is sent to the host.

/dev/console is now a symlink pointing to the pty slave.

This might also be less confusing for applications running inside the container
and the overall result looks cleaner (we don't need to apply manually the
passed selinux context, if any, to the allocated pty for instance).
2019-06-18 08:17:34 +02:00
Franck Bui ba72801d66 nspawn: use correct error variable when logging errors returned by send_one_fd() 2019-06-18 07:54:51 +02:00
Zbigniew Jędrzejewski-Szmek 0985c7c4e2 Rework cpu affinity parsing
The CPU_SET_S api is pretty bad. In particular, it has a parameter for the size
of the array, but operations which take two (CPU_EQUAL_S) or even three arrays
(CPU_{AND,OR,XOR}_S) still take just one size. This means that all arrays must
be of the same size, or buffer overruns will occur. This is exactly what our
code would do, if it received an array of unexpected size over the network.
("Unexpected" here means anything different from what cpu_set_malloc() detects
as the "right" size.)

Let's rework this, and store the size in bytes of the allocated storage area.

The code will now parse any number up to 8191, independently of what the current
kernel supports. This matches the kernel maximum setting for any architecture,
to make things more portable.

Fixes #12605.
2019-05-29 10:20:42 +02:00
Ben Boeckel 5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Dominick Grift 8f1ed04ad6 nspawn: Fix volatile SELinux label
nspawn should associate the specified nspawn container apifs object label instead of the nspawn container process label with the volatile tmpfs
2019-04-13 12:03:02 +02:00
Anita Zhang 7bc5e0b12b seccomp: check more error codes from seccomp_load()
We noticed in our tests that occasionally SystemCallFilter= would
fail to set and the service would run with no syscall filtering.
Most of the time the same tests would apply the filter and fail
the service as expected. While it's not totally clear why this happens,
we noticed seccomp_load() in the systemd code base would fail open for
all errors except EPERM and EACCES.

ENOMEM, EINVAL, and EFAULT seem like reasonable values to add to the
error set based on what I gather from libseccomp code and man pages:

-ENOMEM: out of memory, failed to allocate space for a libseccomp structure, or would exceed a defined constant
-EINVAL: kernel isn't configured to support the operations, args are invalid (to seccomp_load(), seccomp(), or prctl())
-EFAULT: addresses passed as args are invalid
2019-04-12 10:23:07 +02:00
Lennart Poettering 1eacc47062 nspawn: create boot_id and kmsg files for overmounting in /run, not /tmp
/tmp might not be mounted at all yet (given that we support
SYSTEMD_NSPAWN_TMPFS_TMP=0 to turn this off), and /tmp is a dir systemd
usually tries to unmount during shutdown (unlike /run), and we shouldn't
keep it busy. Hence let's just move these deleted files to /run so that
we don't keep /tmp needlessly busy.
2019-04-07 08:55:31 +02:00
Lennart Poettering c614711386 tree-wide: use SYNTHETIC_ERRNO() where appropriate 2019-04-02 14:54:42 +02:00
Lennart Poettering 8a016c746e util-lib: when copying files make sure to apply some chattrs early, some late
Some chattrs only work sensible if you set them right after opening a
file for create (think: FS_NOCOW_FL). Others only work when they are
applied when the file is fully written (think: FS_IMMUTABLE_FL). Let's
take that into account when copying files and applying a chattr to them.
2019-03-28 18:43:04 +01:00
Zbigniew Jędrzejewski-Szmek e1af3bc62a
Merge pull request #12106 from poettering/nosuidns
add "nosuid" flag to exec directory mounts of DynamicUser=1 services
2019-03-26 08:58:00 +01:00
Lennart Poettering 25e68fd397 nspawn: minor improvements to --help text 2019-03-26 08:06:00 +01:00
Lennart Poettering 64e82c1976 mount-util: beef up bind_remount_recursive() to be able to toggle more than MS_RDONLY
The function is otherwise generic enough to toggle other bind mount
flags beyond MS_RDONLY (for example: MS_NOSUID or MS_NODEV), hence let's
beef it up slightly to support that too.
2019-03-25 19:33:55 +01:00
Lennart Poettering 83276695c6
Merge pull request #12079 from keszybz/fuzz-nspawn-oci
Add fuzzer for nspawn-oci
2019-03-22 21:06:17 +01:00
Lennart Poettering e4077ff6f3 nspawn: don't free "fds" twice
Previously both run() and run_container() would free 'fds'. Let's fix
that, and let run() free it but make run_container() already remove all
fds from it, because that's what we actually want to do.

Fixes: #12073
2019-03-22 18:11:27 +01:00
Zbigniew Jędrzejewski-Szmek b2645747b7 nspawn-oci: fix double free
Also rename function to make it clear that it also frees the array
object itself.
2019-03-22 17:39:12 +01:00
Lennart Poettering 60ffa37a65 main-func: implicitly save argc/argv in DEFINE_MAIN_FUNCTION() functions
Let's remove the risk of forgetting to save argc/argv if
DEFINE_MAIN_FUNCTION() is used.
2019-03-21 18:10:06 +01:00
Lennart Poettering 36fea15565 util: introduce save_argc_argv() helper 2019-03-21 18:08:56 +01:00
Lennart Poettering c82cfae00b
Merge pull request #12062 from poettering/nspawn-main-func
nspawn: port to DEFINE_MAIN_FUNCTION()
2019-03-21 18:08:27 +01:00
Zbigniew Jędrzejewski-Szmek bb068de080 nspawn: add --no-pager switch
It only matters for --help.
2019-03-21 17:42:43 +01:00
Lennart Poettering 04f590a4a4 nspawn: voidify sd_notify() calls 2019-03-21 16:32:46 +01:00
Lennart Poettering 6145bb4f78 nspawn: port to static destructors 2019-03-21 16:32:46 +01:00
Lennart Poettering 44dbef90f1 nspawn: port to main-func.h logic 2019-03-21 16:32:46 +01:00
Lennart Poettering f4e803c809 nspawn: add a few missing flags from --help text 2019-03-21 13:31:09 +01:00
Lennart Poettering 2514865391 nspawn: reorder --help text, and add section
The list is so long, let's add a bit of structure and order things a
bit.
2019-03-21 13:27:19 +01:00
Lennart Poettering a3fc6b55ac nspawn: mask out CAP_NET_ADMIN again if settings file turns off private networking
Fixes: #11755
2019-03-15 15:42:21 +01:00
Lennart Poettering bd4b15f274 nspawn: use right constant for shifting for uint64_t caps 2019-03-15 15:42:20 +01:00
Lennart Poettering de40a3037a nspawn: add support for executing OCI runtime bundles with nspawn
This is a pretty large patch, and adds support for OCI runtime bundles
to nspawn. A new switch --oci-bundle= is added that takes a path to an
OCI bundle. The JSON file included therein is read similar to a .nspawn
settings files, however with a different feature set.

Implementation-wise this mostly extends the pre-existing Settings object
to carry additional properties for OCI. However, OCI supports some
concepts .nspawn files did not support yet, which this patch also adds:

1. Support for "masking" files and directories. This functionatly is now
   also available via the new --inaccesible= cmdline command, and
   Inaccessible= in .nspawn files.

2. Support for mounting arbitrary file systems. (not exposed through
   nspawn cmdline nor .nspawn files, because probably not a good idea)

3. Ability to configure the console settings for a container. This
   functionality is now also available on the nspawn cmdline in the new
   --console= switch (not added to .nspawn for now, as it is something
   specific to the invocation really, not a property of the container)

4. Console width/height configuration. Not exposed through
   .nspawn/cmdline, but this may be controlled through $COLUMNS and
   $LINES like in most other UNIX tools.

5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on
   the cmdline, since containers likely have different user tables, and
   the existing --user= switch appears to be the better option)

6. OCI hook commands (no exposed in .nspawn/cmdline, as very specific to
   OCI)

7. Creation of additional devices nodes in /dev. Most likely not a good
   idea, hence not exposed in .nspawn/cmdline. There's already --bind=
   to achieve the same, which is the better alternative.

8. Explicit syscall filters. This is not a good idea, due to the skewed
   arch support, hence not exposed through .nspawn/cmdline.

9. Configuration of some sysctls on a whitelist. Questionnable, not
   supported in .nspawn/cmdline for now.

10. Configuration of all 5 types of capabilities. Not a useful concept,
    since the kernel will reduce the caps on execve() anyway. Not
    exposed through .nspawn/cmdline as this is not very useful hence.

Note that this only implements the OCI runtime logic itself. It does not
provide a runc-compatible command line tool. This is left for a later
PR. Only with that in place tools such as "buildah" can use the OCI
support in nspawn as drop-in replacement.

Currently still missing is OCI hook support, but it's already parsed and
everything, and should be easy to add. Other than that it's OCI is
implemented pretty comprehensively.

There's a list of incompatibilities in the nspawn-oci.c file. In a later
PR I'd like to convert this into proper markdown and add it to the
documentation directory.
2019-03-15 15:41:28 +01:00
Lennart Poettering d8b4d14df4 util: split out nulstr related stuff to nulstr-util.[ch] 2019-03-14 13:25:52 +01:00
Lennart Poettering 0cb8e3d118 util: split out namespace related stuff into a new namespace-util.[ch] pair
Just some minor reorganiztion.
2019-03-13 12:16:38 +01:00
Lennart Poettering 27da7ef0d0 nspawn: move payload to sub-cgroup first, then sync cgroup trees
if we sync the legacy and unified trees before moving to the right
subcgroup then ultimately the cgroup paths in the hierarchies will be
out-of-sync... Hence, let's move the payload first, and sync then.

Addresses: https://github.com/systemd/systemd/pull/9762#issuecomment-441187979
2019-03-07 11:26:17 +01:00
Lennart Poettering adc6f43b14 copy: don't synthesize a 'user.crtime_usec' xattr on copy unless explicitly requested
Previously, when we'd copy an individual file we'd synthesize a
user.crtime_usec xattr with the source's creation time if we can
determine it. As the creation/birth time was until recently not
queriable form userspace this effectively just propagated the same xattr
on the source to the same xattr on the destination. However, current
kernels now allow to query the birthtime using statx() and we do make
use of that now. Which means that suddenly we started synthesizing these
xattrs much more regularly.

Doing this actually does make sense, but only in very few cases:
not for the typical regular files we copy, but certainly when dealing
with disk images. Hence, let's keep this kind of propagation, but let's
make it a flag and default to off. Then turn it on whenever we deal with
disk images, and leave it off otherwise.

This is particularly relevant as overlayfs combining a real fs, and a
tmpfs on top will result in EOPNOTSUPP when it is attempted to open a
file with xattrs for writing, as tmpfs does not support xattrs, and
hence the copy-up cannot work. Hence, let's avoid synthesizing this
needlessly, to increase compat with overlayfs.
2019-03-01 14:11:07 +01:00
Lennart Poettering e5a4bb0d4e nspawn: rework how arg_read_only is initialized in --volatile= mode
Previously, we'd refuse the combination, and claimed we'd imply it, but
actually didn't. Let's allow the combination and imply read-only from
--volatile=, because that's what's documented, what we claim we do, and
what makes sense.
2019-03-01 14:11:07 +01:00
Lennart Poettering 83205269c0 nspawn: refactor how we determine whether it's OK to write to /etc 2019-03-01 14:11:07 +01:00
Lennart Poettering e50cd82f68 nspawn: no need to make top-level directory a bind mount if we just dissected an image 2019-03-01 14:11:07 +01:00
Lennart Poettering 7d0ecdd62d nspawn: slightly reorder mount logic
Let's first setup the volatile logic, and only then mount secondary
partitions of the image in.
2019-03-01 14:11:07 +01:00
Lennart Poettering 6c610acaaa nspawn: add --volatile=overlay support
Fixes: #11054 #3847
2019-03-01 14:11:06 +01:00
Lennart Poettering e5b43a04b6 nspawn: add volatile mode multiplexer call setup_volatile_mode()
Just some refactoring, no change in behaviour.
2019-03-01 14:11:06 +01:00
Lennart Poettering 2949ff2691 nspawn: ignore SIGPIPE for nspawn itself
Let's not abort due to a dead stdout.

Fixes: #11533
2019-01-26 13:54:44 +01:00
Lennart Poettering b2238e380e test,systemctl,nspawn: use "const char*" instead of "char*" as iterator for FOREACH_STRING()
The macro iterates through literal strings (i.e. constant strings),
hence it's more correct to have the iterator const too.
2019-01-16 12:29:30 +01:00
Zbigniew Jędrzejewski-Szmek 489fae526d nspawn: check cg_ns_supported() just once
cg_ns_supported() caches, so the condition was really checked just once, but
it looks weird to assign the return value to arg_use_cgns (if the variable is not present),
because then the other checks are effectively equivalent to
  if (cg_ns_supported() && cg_ns_supported()) { ...
and later
  if (!cg_ns_supported() || !cg_ns_supported()) { ...
2018-12-11 13:37:41 +00:00
Lennart Poettering 60f1ec13ed nspawn: move most validation checks and configuration mangling into verify_arguments()
That's what the function is for after all, and only if it's done there
we can verify the effect of .nspawn files correctly too: after all we
should not just validate that everything configured on the command line
makes sense, but the stuff configured in the .nspawn files, too.
2018-12-10 12:54:56 +01:00
Lennart Poettering d5455d2f98 nspawn: split out code parsing env vars into a function of its own
This then let's us to ensure it's called after we parsed the cmdline,
and after we loaded the settings file, so that it these env var settings
override everything loaded from there.
2018-12-10 12:54:56 +01:00
Lennart Poettering 5eee829043 nspawn: move cg_unified_flush() invocation out of parse_argv()
It has nothing to do with argument parsing, and hence shouldn't be
there.
2018-12-10 12:54:56 +01:00
Yu Watanabe 503f480f8e missing: move fs or mount related definitions to missing_fs.h
This also fixes errnous definition MS_REC -> MS_SLAVE.
2018-12-06 13:30:43 +01:00
Lennart Poettering e4de72876e util-lib: split out all temporary file related calls into tmpfiles-util.c
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.

No code changes, just some rearranging of source files.
2018-12-02 13:22:29 +01:00
Zbigniew Jędrzejewski-Szmek 049af8ad0c Split out part of mount-util.c into mountpoint-util.c
The idea is that anything which is related to actually manipulating mounts is
in mount-util.c, but functions for mountpoint introspection are moved to the
new file. Anything which requires libmount must be in mount-util.c.

This was supposed to be a preparation for further changes, with no functional
difference, but it results in a significant change in linkage:

$ ldd build/libnss_*.so.2
(before)
build/libnss_myhostname.so.2:
	linux-vdso.so.1 (0x00007fff77bf5000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000)
build/libnss_mymachines.so.2:
	linux-vdso.so.1 (0x00007ffc19cc0000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000)
build/libnss_resolve.so.2:
	linux-vdso.so.1 (0x00007ffe8ef8e000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000)
build/libnss_systemd.so.2:
	linux-vdso.so.1 (0x00007ffda6d17000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f610b995000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000)

(after)
build/libnss_myhostname.so.2:
	linux-vdso.so.1 (0x00007fff0b5e2000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000)
build/libnss_mymachines.so.2:
	linux-vdso.so.1 (0x00007ffdc30a7000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000)
build/libnss_resolve.so.2:
	linux-vdso.so.1 (0x00007ffe95747000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fa56a808000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000)
build/libnss_systemd.so.2:
	linux-vdso.so.1 (0x00007ffe67b51000)
	librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000)

I don't quite understand what is going on here, but let's not be too picky.
2018-11-29 21:03:44 +01:00
Yu Watanabe acf4d15893 util: make *_from_name() returns negative errno on error 2018-11-28 20:20:50 +09:00
Lennart Poettering da9fc98ded tree-wide: port more code over to PATH_STARTSWITH_SET() 2018-11-26 14:08:46 +01:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Zbigniew Jędrzejewski-Szmek 294bf0c34a Split out pretty-print.c and move pager.c and main-func.h to shared/
This is high-level functionality, and fits better in shared/ (which is for
our executables), than in basic/ (which is also for libraries).
2018-11-20 18:40:02 +01:00
Lennart Poettering 042cad5737
Merge pull request #10753 from keszybz/pager-no-interrupt
Add mode in journalctl where ^C is handled by the pager
2018-11-14 20:09:39 +01:00
Zbigniew Jędrzejewski-Szmek 0221d68a13 basic/pager: convert the pager options to a flags argument
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
2018-11-14 16:25:11 +01:00
Zbigniew Jędrzejewski-Szmek bd897e729a nspawn: add a hint to the message we emit when a child dies
From #10526:

$ sudo systemd-nspawn -i image
Spawning container image on /home/zbyszek/src/mkosi/image.
Press ^] three times within 1s to kill container.
Short read while reading cgroup mode.
2018-11-13 11:58:44 +01:00
Lennart Poettering 1d78fea2d6 nspawn: rework how we allocate/kill scopes
Fixes: #6347
2018-11-09 17:08:59 +01:00
Lennart Poettering 11d81e506e nspawn: simplify machine terminate bus call
We have the machine name anyway, let's use TerminateMachine() on
machined's Manager object directly with it. That way it's a single
method call only, instead of two, to terminate the machine.
2018-11-09 17:08:59 +01:00
Lennart Poettering e5a2d8b5b5 nspawn: make use of the new sd_bus_set_close_on_exit() call in nspawn 2018-11-09 17:08:59 +01:00
Yu Watanabe 57512c893e tree-wide: set WRITE_STRING_FILE_DISABLE_BUFFER flag when we write files under /proc or /sys 2018-11-06 21:24:03 +09:00
Lennart Poettering 6619ad889d nspawn: beef up netns checking a bit, for compat with old kernels
Fixes: #10544
2018-10-31 21:42:45 +03:00
Lennart Poettering e2d39e549f nspawn: add proper error message if setns() on network namespace fd fails
Addresses: https://github.com/systemd/systemd/pull/10589#issuecomment-434670595
2018-10-31 18:07:30 +01:00
Jiuyang liu a2f577fca0 add ephemeral to nspawn-settings. 2018-10-24 10:22:20 +02:00
Zbigniew Jędrzejewski-Szmek 369ca6dab1 systemd-nspawn: do not crash on /var/log/journal creation if not required
When running a read-only file system, we might not be able to create
/var/log/journal. Do not fail on this, unless actually requested by the
--link-journal options.

$ systemd-nspawn --image=image.squashfs ...
2018-10-22 15:07:08 +02:00
Yu Watanabe b0b8c9a5a4
Merge pull request #10389 from poettering/nspawn-path-fix
nspawn $PATH execvpe() fix
2018-10-19 08:48:37 +09:00
Lennart Poettering 2ff48e981e tree-wide: introduce setsockopt_int() helper and make use of it everywhere
As suggested by @heftig:

6d5e65f645 (commitcomment-30938667)
2018-10-18 19:50:29 +02:00
Lennart Poettering b6b180b77b nspawn: use container $PATH (not host $PATH) when searching for PID 1 binaries to execute
Fixes: #10377
2018-10-18 16:40:12 +02:00
Lennart Poettering 271f518f35 nspawn: TAKE_FD() is your friend 2018-10-15 19:45:37 +02:00
Lennart Poettering fbda85b078 tree-wide: use sockaddr_un_unlink() at two more places where appropriate 2018-10-15 19:44:34 +02:00
Lennart Poettering 6d5e65f645 tree-wide: add a single version of "static const int one = 1"
All over the place we define local variables for the various sockopts
that take a bool-like "int" value. Sometimes they are const, sometimes
static, sometimes both, sometimes neither.

Let's clean this up, introduce a common const variable "const_int_one"
(as well as one matching "const_int_zero") and use it everywhere, all
acorss the codebase.
2018-10-15 19:40:51 +02:00
Lennart Poettering 44ed5214ad tree-wide: use structured initialization for sockaddr_un 2018-10-15 19:35:00 +02:00
David Tardon f369f47c26 be consistent about sun_path length
Most places use the whole buffer for name, without leaving extra space
for the trailing NUL.
2018-10-12 12:38:49 +02:00
Lennart Poettering b37469d7d1 nspawn: add comments explaining the namespacing situation and the inner/outer children 2018-10-09 10:52:17 +02:00
Lennart Poettering 1099ceebce nspawn: optionally don't mount a tmpfs over /tmp (#10294)
nspawn: optionally, don't mount a tmpfs on /tmp

Fixes: #10260
2018-10-08 18:32:03 +02:00
Lennart Poettering ff6c6cc117 nspawn: when --quiet is passed, simply downgrade log messages to LOG_DEBUG (#10181)
With this change almost all log messages that are suppressed through
--quiet are not actually suppressed anymore, but simply downgraded to
LOG_DEBUG. Previously we did it this way for some log messages and fully
suppressed them for others. With this it's pretty much systematic.

Inspired by #10122.
2018-09-26 23:40:39 +02:00
Yu Watanabe cf37f937ee nspawn: suppress one more log message when --quiet is passed
Fixes #10119.
2018-09-19 08:42:17 +02:00
afg 27b620b7db nspawn: use copy-static if systemd-resolved is up and image is writable 2018-09-12 20:48:21 +02:00
Yu Watanabe f55b0d3fd6 nspawn: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Zbigniew Jędrzejewski-Szmek 7692fed98b
Merge pull request #9783 from poettering/get-user-creds-flags
beef up get_user_creds() a bit and other improvements
2018-08-21 10:09:33 +02:00
Lennart Poettering 8967f29169 nspawn: add two missing OOM checks 2018-08-20 15:58:11 +02:00
Lennart Poettering 8dfce114ab nspawn: make sure to create /dev/char/x:y symlinks in nspawn containers too
On the host udev creates these, but they are useful API, hence create
them in nspawn containers too.
2018-08-20 15:58:11 +02:00
Lennart Poettering 37ec0fdd34 tree-wide: add clickable man page link to all --help texts
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.

I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
2018-08-20 11:33:04 +02:00
Luke Shumaker 2fa017f169 nspawn: Simplify tmpfs_patch_options() usage, and trickle that up
One of the things that tmpfs_patch_options does is take an (optional) UID,
and insert "uid=${UID},gid=${UID}" into the options string.  So we need a
uid_t argument, and a way of telling if we should use it.  Fortunately,
that is built in to the uid_t type by having UID_INVALID as a possible
value.

So this is really a feature that requires one argument.  Yet, it is somehow
taking 4!  That is absurd.  Simplify it to only take one argument, and have
that trickle all the way up to mount_all()'s usage.

Now, in may of the uses, the argument becomes

    uid_shift == 0 ? UID_INVALID : uid_shift

because it used to treat uid_shift=0 as invalid unless the patch_ids flag
was also set.  This keeps the behavior the same.  Note that in all cases
where it is invoked, if !use_userns (sometimes called !userns), then
uid_shift is 0; we don't have to add any checks for that.

That said, I'm pretty sure that "uid=0" and not setting "uid=" are the
same, but Christian Brauner seemed to not think so when implementing the
cgns support.  https://github.com/systemd/systemd/pull/3589
2018-07-20 12:12:02 -04:00
Lennart Poettering a7e2e50d35 summary: update nspawn description string a bit
nspawn as it is now is a generally useful tool, hence let's drop the
comments about it being useful for debug and so on only.

The new wording just makes the first sentence of the main page also the
summary.
2018-06-28 11:55:44 +09:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering df1fac6dea nspawn: free global variables before exiting
This doesn't really matter much, but is prettier for valgrind
2018-06-13 17:51:40 +02:00
Lennart Poettering b8b846d7b4 tree-wide: fix a number of log calls that use %m but have no errno set
This is mostly fall-out from d1a1f0aaf0,
however some cases are older bugs.

There might be more issues lurking, this was a simple grep for "%m"
across the tree, with all lines removed that mention "errno" at all.
2018-06-07 15:29:17 +02:00
Lennart Poettering 669fc4e5c5 tree-wide: some O_NDELAY → O_NONBLOCK fixes
Somehow the coccinelle script misses these, hence fix them manually.
2018-05-31 12:04:39 +02:00
Zbigniew Jędrzejewski-Szmek 83e803a9ef nspawn: reset umask early
Fixes #8911.
2018-05-28 11:01:43 +02:00
Zbigniew Jędrzejewski-Szmek 667c1baff5 nspawn: remove some vertical whitespace
Sometimes an empty line is good for readability, but here I think
they all can be removed without any loss.
2018-05-28 11:01:43 +02:00
Lennart Poettering 3a6ce860ac machine-image: rework error handling
Let's rework error handling a bit in image_find() and friends: when we
can't find an image, return -ENOENT rather than 0. That's better as
before we violated the usual rule in our codebase that return parameters
are initialized when the return value is >= 0 and otherwise not touched.

This also makes enumeration and validation a bit more strict: we'll only
accept ".raw" as suffix for regular files, and filter out this suffix
handling on directories/subvolumes, where it makes no sense.
2018-05-24 17:01:57 +02:00
Lennart Poettering 5ef46e5f65 machine-image: introduce two different classes of images
This distuingishes two different classes of images, one for the purpose
of npsawn-like containers, i.e. "machines", and one for portable
services.

This distinction is mostly about search paths. We look for machine
images in /var/lib/machines and for portable images in
/var/lib/portables.
2018-05-24 17:01:57 +02:00
Lennart Poettering d58ad743f9 os-util: add helpers for finding /etc/os-release
Place this new helpers in a new source file os-util.[ch], and move the
existing and related call path_is_os_tree() to it as well.
2018-05-24 17:01:57 +02:00
Lennart Poettering 03bcb6d408 dissect: optionally, validate that the image we dissect is a valid OS image
We already do this kind of validation in nspawn when we operate on a
plain directory, let's also do this on raw images under the same
condition: that we are about too boot the image. Also, do this when we
are about to read OS metadata from it.
2018-05-24 17:01:57 +02:00
Zbigniew Jędrzejewski-Szmek 17c1b9a93f
Merge pull request #9024 from poettering/nspawn-attrs-more
make even more nspawn concepts configurable
2018-05-24 16:27:27 +02:00
Zbigniew Jędrzejewski-Szmek 7cd92e2e9d
Merge pull request #9068 from poettering/nspawn-pty-deadlock
nspawn logging deadlock fix
2018-05-24 16:25:22 +02:00
Lennart Poettering 17cac366ae nspawn: make sure our container PID 1 keeps logging to the original stderr as long as possible
If we log to the pty that is configured as stdin/stdout/stderr of the
container too early we risk filling it up in full before we start
processing the pty from the parent process, resulting in deadlocks.
Let's hence keep a copy of the original tty we were started on before
setting up stdin/stdout/stderr, so that we can log to it, and keep using
it as long as we can.

Since the kernel's pty internal buffer is pretty small this actually
triggered deadlocks when we debug logged at lot from nspawn's child
processes, see: https://github.com/systemd/systemd/pull/9024#issuecomment-390403674

With this change we won't use the pty at all, only the actual payload we
start will, and hence we won't deadlock on it, ever.
2018-05-22 16:52:50 +02:00
Lennart Poettering 8ca082b49a nspawn: make use of log_set_open_when_needed() in nspawn too
Let's make use of log_set_open_when_needed() in nspawn too, i.e. at the
point where we close logging because we are about to rearrange fds,
let's automatically reopen the logging fds when we need them, the same
way as we do that in the service manager. This makes things simpler and
more robust.
2018-05-22 16:51:28 +02:00
Lennart Poettering 1688841f46 nspawn: similar to the previous patches, also make /etc/localtime handling more configurable
Fixes: #9009
2018-05-22 16:21:26 +02:00
Lennart Poettering 63d1c29ffa nspawn: complain if people still use --share-system 2018-05-22 16:20:08 +02:00
Lennart Poettering 4e1d6aa983 nspawn: make --link-journal= configurable through .nspawn files, too 2018-05-22 16:20:08 +02:00
Lennart Poettering b8ea7a6e12 nspawn: add a bit of debug logging to resolved_listening() 2018-05-22 16:19:26 +02:00
Lennart Poettering 09d423e921 nspawn: add greater control over how /etc/resolv.conf is handled
Fixes: #8014 #1781
2018-05-22 16:19:26 +02:00
Lennart Poettering a5201ed6ce tree-wide: fix a couple of TABs 2018-05-22 16:13:45 +02:00
Arnaud Rebillout c9fe05e07d nspawn: support pivot-root option during directory validation
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
2018-05-22 14:42:10 +02:00
Lennart Poettering 5c828e66b5 tree-wide: port various bits of the tree over to the new DUMP_STRING_TABLE() macro 2018-05-22 13:14:18 +02:00
Lennart Poettering 919f5ae0c7 nspawn: voidify more things 2018-05-17 20:48:55 +02:00
Lennart Poettering 5d9614077d nspawn: split out merging of settings object
Let's separate the loading of the settings object and the merging into
our arg_xyz fields into two.

This will become particularly useful when we eventually are able to load
settings from OCI runtime files in addition to .nspawn files.
2018-05-17 20:48:55 +02:00
Lennart Poettering d107bb7d63 nspawn: add a new --cpu-affinity= switch
Similar as the other options added before, this is primarily useful to
provide comprehensive OCI runtime compatbility, but might be useful
otherwise, too.
2018-05-17 20:48:54 +02:00
Lennart Poettering 50ebcf6cb7 nspawn: show --help text in a pager
The text is long enough now, and we do auto-paging for systemctl
already, hence let's do it here too.
2018-05-17 20:48:13 +02:00
Lennart Poettering 81f345dfed nspawn: add a new --oom-score-adjust= command line switch
This is primarily useful in order to provide comprehensive OCI runtime
compatibility with nspawn, but might have uses outside of it.
2018-05-17 20:48:12 +02:00
Lennart Poettering c818eef1cd nspawn: properly handle and log about hostname setting errors 2018-05-17 20:47:21 +02:00
Lennart Poettering 66edd96310 nspawn: add a new --no-new-privileges= cmdline option to nspawn
This simply controls the PR_SET_NO_NEW_PRIVS flag for the container.
This too is primarily relevant to provide OCI runtime compaitiblity, but
might have other uses too, in particular as it nicely complements the
existing --capability= and --drop-capability= flags.
2018-05-17 20:47:20 +02:00
Lennart Poettering 3a9530e5f1 nspawn: make the hostname of the container explicitly configurable with a new --hostname= switch
Previously, the container's hostname was exclusively initialized from
the machine name configured with --machine=, i.e. the internal name and
the external name used for and by the container was synchronized. This
adds a new option --hostname= that optionally allows the internal name
to deviate from the external name.

This new option is mainly useful to ultimately implement the OCI runtime
spec directly in nspawn, but it might be useful on its own for some
other usecases too.
2018-05-17 20:46:45 +02:00
Lennart Poettering bf428efb07 nspawn: add new --rlimit= switch, and always set resource limits explicitly for our container payloads
This ensures we set the various resource limits of our container
explicitly on each invocation so that we inherit less from our callers
into the payload.

By default resource limits are now set to the same values Linux
generally passes to the host PID 1, thus minimizing needless differences
between host and container environments.

The limits are now also configurable using a new --rlimit= switch. This
is preparation for teaching nspawn native OCI runtime support as OCI
permits setting resource limits for container payloads, and it hence
probably makes sense if we do too.
2018-05-17 20:45:54 +02:00
Yu Watanabe 130d3d22e9 tree-wide: use strv_free_and_replace() macro 2018-05-10 00:57:34 +09:00
Lennart Poettering 720f0a2f3c nspawn: move nspawn cgroup hierarchy one level down unconditionally
We need to do this in all cases, including on cgroupsv1 in order to
ensure the host systemd and any systemd in the payload won't fight for
the cgroup attributes of the top-level cgroup of the payload.

This is because systemd for Delegate=yes units will only delegate the
right to create children as well as their attributes. However, nspawn
expects that the cgroup delegated covers both the right to create
children and the attributes of the cgroup itself. Hence, to clear this
up, let's unconditionally insert a intermediary cgroup, on cgroupsv1 as
well as cgroupsv2, unconditionally.

This is also nice as it reduces the differences in the various setups
and exposes very close behaviour everywhere.
2018-05-03 17:45:42 +02:00
Lennart Poettering 9ec5a93c98 nspawn: don't make /proc/kmsg node too special
Similar to the previous commit, let's just use our regular calls for
managing temporary nodes take care of this.
2018-05-03 17:45:42 +02:00
Lennart Poettering cdde6ba6b6 nspawn: mount boot ID from temporary file in /tmp
Let's not make /run too special and let's make sure the source file is
not guessable: let's use our regular temporary file helper calls to
create the source node.
2018-05-03 17:45:42 +02:00
Lennart Poettering 88614c8a28 nspawn: size_t more stuff
A follow-up for #8840
2018-05-03 17:19:46 +02:00
Yu Watanabe 29a3db75fd util: rename signal_from_string_try_harder() to signal_from_string()
Also this makes the new `signal_from_string()` function reject
e.g, `SIG3` or `SIG+5`.
2018-05-03 16:52:49 +09:00
Yu Watanabe 1e4f1671c2 nspawn: fix warning by -Wnonnull (#8877) 2018-05-02 10:03:31 +02:00
Lennart Poettering 8e766630f0 tree-wide: drop redundant _cleanup_ macros (#8810)
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.

In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.

Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.

Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.

Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
2018-04-25 12:31:45 +02:00
Lennart Poettering 0c300adfa4 nspawn: when running nspawn, set a $PATH including both bin + sbin by default (#8756)
We don't know what the container payload needs, hence default to a PATH
with both bin and sbin included, as well as / and /usr.

Follow-up for #8324

Fixes: #8698
2018-04-20 11:36:25 +02:00
Lennart Poettering 5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Philip Sequeira 7511655807 nspawn: wait for network namespace creation before interface setup (#8633)
Otherwise, network interfaces can be "moved" into the container's
namespace while it's still the same as the host namespace, in which case
e.g. host0 for a veth ends up on the host side instead of inside the
container.

Regression introduced in 0441378080.

Fixes #8599.
2018-04-05 07:04:27 -07:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Lennart Poettering 4526113f57 dissect: add dissect_image_and_warn() that unifies error message generation for dissect_image() (#8517) 2018-03-21 12:10:01 +01:00
Zbigniew Jędrzejewski-Szmek 0441378080 nspawn: move network namespace creation to a separate step (#8430)
Fixes #8427.

Unsharing the namespace in a separate step changes the ownership of
/proc/net/ip_tables_names (and related files) from nobody:nobody to
root:root. See [1] and [2] for all the details.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f13f2aeed154da8e48f90b85e720f8ba39b1e881
[2] https://bugzilla.netfilter.org/show_bug.cgi?id=1064#c9
2018-03-20 18:07:17 +01:00
Lennart Poettering 2b33ab0957 tree-wide: port various places over to use new rearrange_stdio() 2018-03-02 11:42:10 +01:00
Zbigniew Jędrzejewski-Szmek 8405dcf752 nspawn: make sure we don't leak the fd in chase_symlinks_and_update
No callers use CHASE_OPEN right now, but let's be defensive.
2018-02-15 10:18:25 +01:00
Lennart Poettering d72495759b tree-wide: port all code to use safe_getcwd() 2018-01-17 11:17:38 +01:00
Lennart Poettering 75152a4d6a tree-wide: install matches asynchronously
Let's remove a number of synchronization points from our service
startups: let's drop synchronous match installation, and let's opt for
asynchronous instead.

Also, let's use sd_bus_match_signal() instead of sd_bus_add_match()
where we can.
2018-01-05 13:58:32 +01:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Zbigniew Jędrzejewski-Szmek dae8b82eb9 Add mkdir_errno_wrapper() and use instead of mkdir() in various places
We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir
returns the error in errno while mkdir_label returns the error directly.
2017-12-16 13:28:22 +01:00
Zbigniew Jędrzejewski-Szmek bdd2bbc445
Merge pull request #7469 from kinvolk/dongsu/nspawn-netns
nspawn: introduce an option for specifying network namespace path
2017-12-14 22:47:57 +01:00
Lennart Poettering fbd0b64f44
tree-wide: make use of new STRLEN() macro everywhere (#7639)
Let's employ coccinelle to do this for us.

Follow-up for #7625.
2017-12-14 19:02:29 +01:00
Dongsu Park d7bea6b629 nspawn: introduce an option for specifying network namespace path
Add a new option `--network-namespace-path` to systemd-nspawn to allow
users to specify an arbitrary network namespace, e.g. `/run/netns/foo`.
Then systemd-nspawn will open the netns file, pass the fd to
outer_child, and enter the namespace represented by the fd before
running inner_child.

```
$ sudo ip netns add foo
$ mount | grep /run/netns/foo
nsfs on /run/netns/foo type nsfs (rw)
...
$ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \
  /bin/readlink -f /proc/self/ns/net
/proc/1/ns/net:[4026532009]
```

Note that the option `--network-namespace-path=` cannot be used together
with other network-related options such as `--private-network` so that
the options do not conflict with each other.

Fixes https://github.com/systemd/systemd/issues/7361
2017-12-13 10:21:06 +00:00
Lennart Poettering fba868fa71 tree-wide: unify logging of "Must be root" message
Let's unify this in one call, generalizing must_be_root() from
bootctl.c.
2017-12-11 23:19:45 +01:00
Lennart Poettering 8fd010bb1b nspawn: turn on watchdog logic for nspawn too
It's a long-running daemon, and it's easy to enable, hence do it.
2017-12-07 12:34:46 +01:00
Lennart Poettering 87d5e4f286 build-sys: make the dynamic UID range, and the container UID range configurable
Also, export these ranges in our pkg-config files.
2017-12-06 12:55:37 +01:00
Lennart Poettering de54e02d5e nspawn: when in hybrid mode, chown() both the legacy and the unified hierarchy to the root in the container
If user namespacing is used, let's make sure that the root user in the
container gets access to both /sys/fs/cgroup/systemd and
/sys/fs/cgroup/unified.

This matches similar logic in cg_set_access().
2017-12-05 13:49:13 +01:00
Lennart Poettering 2d3a5a73e0 nspawn: make sure images containing an ESP are compatible with userns -U mode
In -U mode we might need to re-chown() all files and directories to
match the UID shift we want for the image. That's problematic on fat
partitions, such as the ESP (and which is generated by mkosi's
--bootable switch), because fat of course knows no UID/GID file
ownership natively.

With this change we take benefit of the uid= and gid= mount options FAT
knows: instead of chown()ing all files and directories we can just
specify the right UID/GID to use at mount time.

This beefs up the image dissection logic in two ways:

1. First of all support for mounting relevant file systems with
   uid=/gid= is added: when a UID is specified during mount it is used for
   all applicable file systems.

2. Secondly, two new mount flags are added:
   DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY.
   If one is specified the mount routine will either only mount the root
   partition of an image, or all partitions except the root partition.
   This is used by nspawn: first the root partition is mounted, so that
   we can determine the UID shift in use so far, based on ownership of
   the image's root directory. Then, we mount the remaining partitions
   in a second go, this time with the right UID/GID information.
2017-12-05 13:49:12 +01:00
Lennart Poettering 8199d554c1 nspawn: figure out cgroup mode *after* mounting image
If we operate on a disk image (i.e. --image=) then it's pointless to
look into the mount directory before it is actually mounted to see which
systemd version is running inside...

Unfortunately we only mount the disk image in the child process, but the
parent needs to know the cgroup mode, hence add some IPC for this
purpose and communicate the cgroup mode determined from the image back
to the parent.
2017-12-05 13:49:12 +01:00
Yu Watanabe 62b1e758d3 nspawn: adjust path to static resolv.conf to support split usr
Fixes #7302.
2017-11-25 21:11:07 +09:00
Lennart Poettering d381c8a6bf nspawn: hash the machine name, when looking for a suitable UID base (#7437)
When "-U" is used we look for a UID range we can use for our container.
We start with the UID the tree is already assigned to, and if that
didn't work we'd pick random ranges so far. With this change we'll first
try to hash a suitable range from the container name, and use that if it
works, in order to make UID assignments more likely to be stable.

This follows a similar logic PID 1 follows when using DynamicUser=1.
2017-11-24 20:57:19 +01:00