Commit graph

1115 commits

Author SHA1 Message Date
Lennart Poettering e4077ff6f3 nspawn: don't free "fds" twice
Previously both run() and run_container() would free 'fds'. Let's fix
that, and let run() free it but make run_container() already remove all
fds from it, because that's what we actually want to do.

Fixes: #12073
2019-03-22 18:11:27 +01:00
Zbigniew Jędrzejewski-Szmek b2645747b7 nspawn-oci: fix double free
Also rename function to make it clear that it also frees the array
object itself.
2019-03-22 17:39:12 +01:00
Zbigniew Jędrzejewski-Szmek 094eecd29d
Merge pull request #12055 from poettering/save-argc-argv
main-func.h and systemctl argc/argv improvements
2019-03-22 16:58:18 +01:00
Zbigniew Jędrzejewski-Szmek b1f13b0e75 nspawn-oci: mount source is optional 2019-03-22 12:04:32 +01:00
Zbigniew Jędrzejewski-Szmek b2e07b1a02 nspawn-oci: use _cleanup_ in one more place 2019-03-22 11:51:21 +01:00
Lennart Poettering ae408d77a9 nspawn: conditionalize libseccomp use
We support compilation without libseccomp, hence don't rely on its
symbols.
2019-03-22 11:07:03 +01:00
Lennart Poettering 60ffa37a65 main-func: implicitly save argc/argv in DEFINE_MAIN_FUNCTION() functions
Let's remove the risk of forgetting to save argc/argv if
DEFINE_MAIN_FUNCTION() is used.
2019-03-21 18:10:06 +01:00
Lennart Poettering 36fea15565 util: introduce save_argc_argv() helper 2019-03-21 18:08:56 +01:00
Lennart Poettering c82cfae00b
Merge pull request #12062 from poettering/nspawn-main-func
nspawn: port to DEFINE_MAIN_FUNCTION()
2019-03-21 18:08:27 +01:00
Zbigniew Jędrzejewski-Szmek bb068de080 nspawn: add --no-pager switch
It only matters for --help.
2019-03-21 17:42:43 +01:00
Lennart Poettering 04f590a4a4 nspawn: voidify sd_notify() calls 2019-03-21 16:32:46 +01:00
Lennart Poettering 6145bb4f78 nspawn: port to static destructors 2019-03-21 16:32:46 +01:00
Lennart Poettering 44dbef90f1 nspawn: port to main-func.h logic 2019-03-21 16:32:46 +01:00
Zbigniew Jędrzejewski-Szmek fa28e4e377
Merge pull request #12059 from poettering/nspawn-typos
some typo and other fixes result of the OCI nspawn merge
2019-03-21 15:14:11 +01:00
Lennart Poettering c3d13d2ad5
Merge pull request #12058 from keszybz/oci-simplifications
Follow-ups for nspawn-oci review
2019-03-21 13:55:09 +01:00
Lennart Poettering f4e803c809 nspawn: add a few missing flags from --help text 2019-03-21 13:31:09 +01:00
Lennart Poettering 2514865391 nspawn: reorder --help text, and add section
The list is so long, let's add a bit of structure and order things a
bit.
2019-03-21 13:27:19 +01:00
Lennart Poettering 2c9b7a7e62 mount: when we fail to establish an inaccessible mount gracefully, undo the mount 2019-03-21 12:41:02 +01:00
Zbigniew Jędrzejewski-Szmek 6757a01356 util-lib: get rid of a helper variable 2019-03-21 11:08:58 +01:00
Zbigniew Jędrzejewski-Szmek f1531db5af nspawn-oci: add helper function for free_and_strdup with oom check 2019-03-21 11:08:58 +01:00
Zbigniew Jędrzejewski-Szmek d0b6a10c00
Merge pull request #9762 from poettering/nspawn-oci
OCI runtime support for nspawn
2019-03-21 11:01:53 +01:00
Zbigniew Jędrzejewski-Szmek 19130626a0 nspawn-oci: use SYNTHETIC_ERRNO 2019-03-21 10:51:43 +01:00
Topi Miettinen ebcf697685 tree-wide: fix false search hits with ppp (typos) 2019-03-18 14:25:56 +01:00
Lennart Poettering 95658673a0
Merge pull request #12016 from yuwata/fix-two-memleaks-found-by-oss-fuzz
Fix two memleaks found by oss fuzz
2019-03-15 17:33:48 +01:00
Yu Watanabe 1d0c1146ea nspawn: fix memleak
Fixes oss-fuzz#13691.
2019-03-15 23:53:05 +09:00
Zbigniew Jędrzejewski-Szmek 7acf581a58 Handle or voidify all calls to close_all_fds()
In activate, it is important that we close the fds. In other cases, meh.
2019-03-15 15:46:41 +01:00
Lennart Poettering a3fc6b55ac nspawn: mask out CAP_NET_ADMIN again if settings file turns off private networking
Fixes: #11755
2019-03-15 15:42:21 +01:00
Lennart Poettering bd4b15f274 nspawn: use right constant for shifting for uint64_t caps 2019-03-15 15:42:20 +01:00
Lennart Poettering de40a3037a nspawn: add support for executing OCI runtime bundles with nspawn
This is a pretty large patch, and adds support for OCI runtime bundles
to nspawn. A new switch --oci-bundle= is added that takes a path to an
OCI bundle. The JSON file included therein is read similar to a .nspawn
settings files, however with a different feature set.

Implementation-wise this mostly extends the pre-existing Settings object
to carry additional properties for OCI. However, OCI supports some
concepts .nspawn files did not support yet, which this patch also adds:

1. Support for "masking" files and directories. This functionatly is now
   also available via the new --inaccesible= cmdline command, and
   Inaccessible= in .nspawn files.

2. Support for mounting arbitrary file systems. (not exposed through
   nspawn cmdline nor .nspawn files, because probably not a good idea)

3. Ability to configure the console settings for a container. This
   functionality is now also available on the nspawn cmdline in the new
   --console= switch (not added to .nspawn for now, as it is something
   specific to the invocation really, not a property of the container)

4. Console width/height configuration. Not exposed through
   .nspawn/cmdline, but this may be controlled through $COLUMNS and
   $LINES like in most other UNIX tools.

5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on
   the cmdline, since containers likely have different user tables, and
   the existing --user= switch appears to be the better option)

6. OCI hook commands (no exposed in .nspawn/cmdline, as very specific to
   OCI)

7. Creation of additional devices nodes in /dev. Most likely not a good
   idea, hence not exposed in .nspawn/cmdline. There's already --bind=
   to achieve the same, which is the better alternative.

8. Explicit syscall filters. This is not a good idea, due to the skewed
   arch support, hence not exposed through .nspawn/cmdline.

9. Configuration of some sysctls on a whitelist. Questionnable, not
   supported in .nspawn/cmdline for now.

10. Configuration of all 5 types of capabilities. Not a useful concept,
    since the kernel will reduce the caps on execve() anyway. Not
    exposed through .nspawn/cmdline as this is not very useful hence.

Note that this only implements the OCI runtime logic itself. It does not
provide a runc-compatible command line tool. This is left for a later
PR. Only with that in place tools such as "buildah" can use the OCI
support in nspawn as drop-in replacement.

Currently still missing is OCI hook support, but it's already parsed and
everything, and should be easy to add. Other than that it's OCI is
implemented pretty comprehensively.

There's a list of incompatibilities in the nspawn-oci.c file. In a later
PR I'd like to convert this into proper markdown and add it to the
documentation directory.
2019-03-15 15:41:28 +01:00
Lennart Poettering 5ef4cb7ad0 nspawn: (void)ify more stuff 2019-03-15 15:33:09 +01:00
Lennart Poettering 61b4443361 nspawn: refactor setuid code a bit
Let's separate out the raw uid_t/gid_t handling from the username
handling. This is useful later on.

Also, let's use the right gid_t type for group types wherever
appropriate.
2019-03-15 15:33:09 +01:00
Lennart Poettering d8b4d14df4 util: split out nulstr related stuff to nulstr-util.[ch] 2019-03-14 13:25:52 +01:00
Lennart Poettering e45c81b8bc shared: split out code to wait for jobs to complet into its own source file
It's complex enough and quite a few functions. Let's hence split this
out.

No code change, just some rearranging of source files.
2019-03-13 17:39:24 +01:00
Lennart Poettering 760877e90c util: split out sorting related calls to new sort-util.[ch] 2019-03-13 12:16:43 +01:00
Lennart Poettering 0cb8e3d118 util: split out namespace related stuff into a new namespace-util.[ch] pair
Just some minor reorganiztion.
2019-03-13 12:16:38 +01:00
Zbigniew Jędrzejewski-Szmek 0e636bf51a nspawn: fix memleak uncovered by fuzzer
Also use TAKE_PTR as appropriate.
2019-03-11 14:29:30 +01:00
Lennart Poettering 27da7ef0d0 nspawn: move payload to sub-cgroup first, then sync cgroup trees
if we sync the legacy and unified trees before moving to the right
subcgroup then ultimately the cgroup paths in the hierarchies will be
out-of-sync... Hence, let's move the payload first, and sync then.

Addresses: https://github.com/systemd/systemd/pull/9762#issuecomment-441187979
2019-03-07 11:26:17 +01:00
Lennart Poettering adc6f43b14 copy: don't synthesize a 'user.crtime_usec' xattr on copy unless explicitly requested
Previously, when we'd copy an individual file we'd synthesize a
user.crtime_usec xattr with the source's creation time if we can
determine it. As the creation/birth time was until recently not
queriable form userspace this effectively just propagated the same xattr
on the source to the same xattr on the destination. However, current
kernels now allow to query the birthtime using statx() and we do make
use of that now. Which means that suddenly we started synthesizing these
xattrs much more regularly.

Doing this actually does make sense, but only in very few cases:
not for the typical regular files we copy, but certainly when dealing
with disk images. Hence, let's keep this kind of propagation, but let's
make it a flag and default to off. Then turn it on whenever we deal with
disk images, and leave it off otherwise.

This is particularly relevant as overlayfs combining a real fs, and a
tmpfs on top will result in EOPNOTSUPP when it is attempted to open a
file with xattrs for writing, as tmpfs does not support xattrs, and
hence the copy-up cannot work. Hence, let's avoid synthesizing this
needlessly, to increase compat with overlayfs.
2019-03-01 14:11:07 +01:00
Lennart Poettering e5a4bb0d4e nspawn: rework how arg_read_only is initialized in --volatile= mode
Previously, we'd refuse the combination, and claimed we'd imply it, but
actually didn't. Let's allow the combination and imply read-only from
--volatile=, because that's what's documented, what we claim we do, and
what makes sense.
2019-03-01 14:11:07 +01:00
Lennart Poettering 83205269c0 nspawn: refactor how we determine whether it's OK to write to /etc 2019-03-01 14:11:07 +01:00
Lennart Poettering e50cd82f68 nspawn: no need to make top-level directory a bind mount if we just dissected an image 2019-03-01 14:11:07 +01:00
Lennart Poettering 7d0ecdd62d nspawn: slightly reorder mount logic
Let's first setup the volatile logic, and only then mount secondary
partitions of the image in.
2019-03-01 14:11:07 +01:00
Lennart Poettering 6c610acaaa nspawn: add --volatile=overlay support
Fixes: #11054 #3847
2019-03-01 14:11:06 +01:00
Lennart Poettering c55d0ae764 nspawn: fix an error path 2019-03-01 14:11:06 +01:00
Lennart Poettering e5b43a04b6 nspawn: add volatile mode multiplexer call setup_volatile_mode()
Just some refactoring, no change in behaviour.
2019-03-01 14:11:06 +01:00
Lennart Poettering 0646d3c3dd nspawn: explicitly refuse mounts over /
Previously this would fail later on, but let's filter this out at the
time of parsing.
2019-03-01 14:11:06 +01:00
Lennart Poettering 6e9417f5b4 tree-wide: use newa() instead of alloca() wherever we can
Typesafety is nice. And this way we can take benefit of the new size
assert() the previous commit added.
2019-01-26 16:17:04 +01:00
Lennart Poettering 2949ff2691 nspawn: ignore SIGPIPE for nspawn itself
Let's not abort due to a dead stdout.

Fixes: #11533
2019-01-26 13:54:44 +01:00
Lennart Poettering b2238e380e test,systemctl,nspawn: use "const char*" instead of "char*" as iterator for FOREACH_STRING()
The macro iterates through literal strings (i.e. constant strings),
hence it's more correct to have the iterator const too.
2019-01-16 12:29:30 +01:00
Chris Down e92aaed30e tree-wide: Remove O_CLOEXEC from fdopen
fdopen doesn't accept "e", it's ignored. Let's not mislead people into
believing that it actually sets O_CLOEXEC.

From `man 3 fdopen`:

> e (since glibc 2.7):
> Open the file with the O_CLOEXEC flag. See open(2) for more information. This flag is ignored for fdopen()

As mentioned by @jlebon in #11131.
2018-12-12 20:47:40 +01:00
Zbigniew Jędrzejewski-Szmek 489fae526d nspawn: check cg_ns_supported() just once
cg_ns_supported() caches, so the condition was really checked just once, but
it looks weird to assign the return value to arg_use_cgns (if the variable is not present),
because then the other checks are effectively equivalent to
  if (cg_ns_supported() && cg_ns_supported()) { ...
and later
  if (!cg_ns_supported() || !cg_ns_supported()) { ...
2018-12-11 13:37:41 +00:00
Lennart Poettering 60f1ec13ed nspawn: move most validation checks and configuration mangling into verify_arguments()
That's what the function is for after all, and only if it's done there
we can verify the effect of .nspawn files correctly too: after all we
should not just validate that everything configured on the command line
makes sense, but the stuff configured in the .nspawn files, too.
2018-12-10 12:54:56 +01:00
Lennart Poettering d5455d2f98 nspawn: split out code parsing env vars into a function of its own
This then let's us to ensure it's called after we parsed the cmdline,
and after we loaded the settings file, so that it these env var settings
override everything loaded from there.
2018-12-10 12:54:56 +01:00
Lennart Poettering 5eee829043 nspawn: move cg_unified_flush() invocation out of parse_argv()
It has nothing to do with argument parsing, and hence shouldn't be
there.
2018-12-10 12:54:56 +01:00
Zbigniew Jędrzejewski-Szmek 871fa294ff Merge pull request #10935 from poettering/rlimit-nofile-safe
Merged by hand to resolve a trivial conflict in TODO.
2018-12-06 17:19:21 +01:00
Yu Watanabe e93672eeac tree-wide: drop missing.h from headers and use relevant missing_*.h 2018-12-06 13:31:16 +01:00
Yu Watanabe 204f52e32d lockfile: drop unnecessary headers from lockfile-util.h 2018-12-06 13:31:16 +01:00
Yu Watanabe 503f480f8e missing: move fs or mount related definitions to missing_fs.h
This also fixes errnous definition MS_REC -> MS_SLAVE.
2018-12-06 13:30:43 +01:00
Yu Watanabe 36dd5ffd5d util: drop missing.h from util.h 2018-12-04 10:00:34 +01:00
Lennart Poettering e4de72876e util-lib: split out all temporary file related calls into tmpfiles-util.c
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.

No code changes, just some rearranging of source files.
2018-12-02 13:22:29 +01:00
Lennart Poettering 5dd9527883 tree-wide: remove various unused functions
All found with "cppcheck --enable=unusedFunction".
2018-12-02 13:35:34 +09:00
Lennart Poettering 595225af7a tree-wide: invoke rlimit_nofile_safe() before various exec{v,ve,l}() invocations
Whenever we invoke external, foreign code from code that has
RLIMIT_NOFILE's soft limit bumped to high values, revert it to 1024
first. This is a safety precaution for compatibility with programs using
select() which cannot operate with fds > 1024.

This commit adds the call to rlimit_nofile_safe() to all invocations of
exec{v,ve,l}() and friends that either are in code that we know runs
with RLIMIT_NOFILE bumped up (which is PID 1 and all journal code for
starters) or that is part of shared code that might end up there.

The calls are placed as early as we can in processes invoking a flavour
of execve(), but after the last time we do fd manipulations, so that we
can still take benefit of the high fd limits for that.
2018-12-01 12:50:45 +01:00
Zbigniew Jędrzejewski-Szmek b2ac2b01c8
Merge pull request #10996 from poettering/oci-prep
Preparation for the nspawn-OCI work
2018-11-30 10:09:00 +01:00
Zbigniew Jędrzejewski-Szmek 049af8ad0c Split out part of mount-util.c into mountpoint-util.c
The idea is that anything which is related to actually manipulating mounts is
in mount-util.c, but functions for mountpoint introspection are moved to the
new file. Anything which requires libmount must be in mount-util.c.

This was supposed to be a preparation for further changes, with no functional
difference, but it results in a significant change in linkage:

$ ldd build/libnss_*.so.2
(before)
build/libnss_myhostname.so.2:
	linux-vdso.so.1 (0x00007fff77bf5000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000)
build/libnss_mymachines.so.2:
	linux-vdso.so.1 (0x00007ffc19cc0000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000)
build/libnss_resolve.so.2:
	linux-vdso.so.1 (0x00007ffe8ef8e000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000)
build/libnss_systemd.so.2:
	linux-vdso.so.1 (0x00007ffda6d17000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f610b995000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000)

(after)
build/libnss_myhostname.so.2:
	linux-vdso.so.1 (0x00007fff0b5e2000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000)
build/libnss_mymachines.so.2:
	linux-vdso.so.1 (0x00007ffdc30a7000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000)
build/libnss_resolve.so.2:
	linux-vdso.so.1 (0x00007ffe95747000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fa56a808000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000)
build/libnss_systemd.so.2:
	linux-vdso.so.1 (0x00007ffe67b51000)
	librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000)

I don't quite understand what is going on here, but let's not be too picky.
2018-11-29 21:03:44 +01:00
Lennart Poettering 17c58ba97b nspawn: let's also pre-mount /dev/mqueue 2018-11-29 20:21:40 +01:00
Yu Watanabe acf4d15893 util: make *_from_name() returns negative errno on error 2018-11-28 20:20:50 +09:00
Yu Watanabe 938dbb292a
Merge pull request #10901 from poettering/startswith-list
add new STARTSWITH_SET() macro
2018-11-26 22:40:51 +09:00
Lennart Poettering da9fc98ded tree-wide: port more code over to PATH_STARTSWITH_SET() 2018-11-26 14:08:46 +01:00
Lennart Poettering 27adcc9737 cgroup: be more careful with which controllers we can enable/disable on a cgroup
This changes cg_enable_everywhere() to return which controllers are
enabled for the specified cgroup. This information is then used to
correctly track the enablement mask currently in effect for a unit.
Moreover, when we try to turn off a controller, and this works, then
this is indicates that the parent unit might succesfully turn it off
now, too as our unit might have kept it busy.

So far, when realizing cgroups, i.e. when syncing up the kernel
representation of relevant cgroups with our own idea we would strictly
work from the root to the leaves. This is generally a good approach, as
when controllers are enabled this has to happen in root-to-leaves order.
However, when controllers are disabled this has to happen in the
opposite order: in leaves-to-root order (this is because controllers can
only be enabled in a child if it is already enabled in the parent, and
if it shall be disabled in the parent then it has to be disabled in the
child first, otherwise it is considered busy when it is attempted to
remove it in the parent).

To make things complicated when invalidating a unit's cgroup membershup
systemd can actually turn off some controllers previously turned on at
the very same time as it turns on other controllers previously turned
off. In such a case we have to work up leaves-to-root *and*
root-to-leaves right after each other. With this patch this is
implemented: we still generally operate root-to-leaves, but as soon as
we noticed we successfully turned off a controller previously turned on
for a cgroup we'll re-enqueue the cgroup realization for all parents of
a unit, thus implementing leaves-to-root where necessary.
2018-11-23 13:41:37 +01:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Lennart Poettering 818623aca5
Merge pull request #10860 from keszybz/more-cleanup-2
Do more stuff from main macros
2018-11-21 11:07:31 +01:00
Zbigniew Jędrzejewski-Szmek 294bf0c34a Split out pretty-print.c and move pager.c and main-func.h to shared/
This is high-level functionality, and fits better in shared/ (which is for
our executables), than in basic/ (which is also for libraries).
2018-11-20 18:40:02 +01:00
Lennart Poettering f2fb2ec942 nspawn: use EXIT_EXCEPTION where appropriate 2018-11-20 17:04:07 +01:00
Lennart Poettering 042cad5737
Merge pull request #10753 from keszybz/pager-no-interrupt
Add mode in journalctl where ^C is handled by the pager
2018-11-14 20:09:39 +01:00
Zbigniew Jędrzejewski-Szmek 0221d68a13 basic/pager: convert the pager options to a flags argument
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
2018-11-14 16:25:11 +01:00
Zbigniew Jędrzejewski-Szmek bd897e729a nspawn: add a hint to the message we emit when a child dies
From #10526:

$ sudo systemd-nspawn -i image
Spawning container image on /home/zbyszek/src/mkosi/image.
Press ^] three times within 1s to kill container.
Short read while reading cgroup mode.
2018-11-13 11:58:44 +01:00
Lennart Poettering 1d78fea2d6 nspawn: rework how we allocate/kill scopes
Fixes: #6347
2018-11-09 17:08:59 +01:00
Lennart Poettering df61bc5e4a nspawn: merge two variable declaration lines 2018-11-09 17:08:59 +01:00
Lennart Poettering 11d81e506e nspawn: simplify machine terminate bus call
We have the machine name anyway, let's use TerminateMachine() on
machined's Manager object directly with it. That way it's a single
method call only, instead of two, to terminate the machine.
2018-11-09 17:08:59 +01:00
Lennart Poettering e5a2d8b5b5 nspawn: make use of the new sd_bus_set_close_on_exit() call in nspawn 2018-11-09 17:08:59 +01:00
Yu Watanabe 57512c893e tree-wide: set WRITE_STRING_FILE_DISABLE_BUFFER flag when we write files under /proc or /sys 2018-11-06 21:24:03 +09:00
Lennart Poettering 6619ad889d nspawn: beef up netns checking a bit, for compat with old kernels
Fixes: #10544
2018-10-31 21:42:45 +03:00
Lennart Poettering e2d39e549f nspawn: add proper error message if setns() on network namespace fd fails
Addresses: https://github.com/systemd/systemd/pull/10589#issuecomment-434670595
2018-10-31 18:07:30 +01:00
Yu Watanabe 5a937ea2f6 sd-device: make sd_device_get_is_initialized() returns is_initialized by return value 2018-10-29 17:33:33 +09:00
Jiuyang liu a2f577fca0 add ephemeral to nspawn-settings. 2018-10-24 10:22:20 +02:00
Zbigniew Jędrzejewski-Szmek 369ca6dab1 systemd-nspawn: do not crash on /var/log/journal creation if not required
When running a read-only file system, we might not be able to create
/var/log/journal. Do not fail on this, unless actually requested by the
--link-journal options.

$ systemd-nspawn --image=image.squashfs ...
2018-10-22 15:07:08 +02:00
Yu Watanabe c65ac075ef nspawn: do not include '%m' in log message if errno is zero 2018-10-20 02:01:15 +09:00
Yu Watanabe b0b8c9a5a4
Merge pull request #10389 from poettering/nspawn-path-fix
nspawn $PATH execvpe() fix
2018-10-19 08:48:37 +09:00
Lennart Poettering 2ff48e981e tree-wide: introduce setsockopt_int() helper and make use of it everywhere
As suggested by @heftig:

6d5e65f645 (commitcomment-30938667)
2018-10-18 19:50:29 +02:00
Lennart Poettering c0815ca93d
Merge pull request #10407 from yuwata/netlink-slot
sd-netlink: introduce sd_netlink_slot object and relevant functions
2018-10-18 18:05:58 +02:00
Lennart Poettering b6b180b77b nspawn: use container $PATH (not host $PATH) when searching for PID 1 binaries to execute
Fixes: #10377
2018-10-18 16:40:12 +02:00
Yu Watanabe 8190a388a6 sd-netlink: make sd_netlink_slot take its description 2018-10-16 18:42:23 +09:00
Lennart Poettering 271f518f35 nspawn: TAKE_FD() is your friend 2018-10-15 19:45:37 +02:00
Lennart Poettering fbda85b078 tree-wide: use sockaddr_un_unlink() at two more places where appropriate 2018-10-15 19:44:34 +02:00
Lennart Poettering 6d5e65f645 tree-wide: add a single version of "static const int one = 1"
All over the place we define local variables for the various sockopts
that take a bool-like "int" value. Sometimes they are const, sometimes
static, sometimes both, sometimes neither.

Let's clean this up, introduce a common const variable "const_int_one"
(as well as one matching "const_int_zero") and use it everywhere, all
acorss the codebase.
2018-10-15 19:40:51 +02:00
Lennart Poettering 44ed5214ad tree-wide: use structured initialization for sockaddr_un 2018-10-15 19:35:00 +02:00
Yu Watanabe ee38400bba sd-netlink: introduce sd_netlink_slot 2018-10-15 18:10:04 +09:00
David Tardon f369f47c26 be consistent about sun_path length
Most places use the whole buffer for name, without leaving extra space
for the trailing NUL.
2018-10-12 12:38:49 +02:00
Lennart Poettering b37469d7d1 nspawn: add comments explaining the namespacing situation and the inner/outer children 2018-10-09 10:52:17 +02:00
Lennart Poettering 1099ceebce nspawn: optionally don't mount a tmpfs over /tmp (#10294)
nspawn: optionally, don't mount a tmpfs on /tmp

Fixes: #10260
2018-10-08 18:32:03 +02:00
Lennart Poettering ff6c6cc117 nspawn: when --quiet is passed, simply downgrade log messages to LOG_DEBUG (#10181)
With this change almost all log messages that are suppressed through
--quiet are not actually suppressed anymore, but simply downgraded to
LOG_DEBUG. Previously we did it this way for some log messages and fully
suppressed them for others. With this it's pretty much systematic.

Inspired by #10122.
2018-09-26 23:40:39 +02:00
Evgeny Vereshchagin 89f180201c nspawn: chown() the legacy hierarchy when it's used in a container
This is a follow-up to 720f0a2f3c.

Closes https://github.com/systemd/systemd/issues/10026
Closes https://github.com/systemd/systemd/issues/9563
2018-09-26 17:29:17 +02:00
Lennart Poettering ee8d493cbd
Merge pull request #10158 from keszybz/seccomp-log-tightening
Seccomp log tightening
2018-09-26 15:56:32 +02:00
Yu Watanabe 6c9c51e5e2 fs-util: make symlink_idempotent() optionally create relative link 2018-09-24 18:52:53 +03:00
Zbigniew Jędrzejewski-Szmek 7e86bd73a4 seccomp: tighten checking of seccomp filter creation
In seccomp code, the code is changed to propagate errors which are about
anything other than unknown/unimplemented syscalls. I *think* such errors
should not happen in normal usage, but so far we would summarilly ignore all
errors, so that part is uncertain. If it turns out that other errors occur and
should be ignored, this should be added later.

In nspawn, we would count the number of added filters, but didn't use this for
anything. Drop that part.

The comments suggested that seccomp_add_syscall_filter_item() returned negative
if the syscall is unknown, but this wasn't true: it returns 0.

The error at this point can only be if the syscall was known but couldn't be
added. If the error comes from our internal whitelist in nspawn, treat this as
error, because it means that our internal table is wrong. If the error comes
from user arguments, warn and ignore. (If some syscall is not known at current
architecture, it is still silently ignored.)
2018-09-24 17:21:09 +02:00
Zbigniew Jędrzejewski-Szmek b54f36c604 seccomp: reduce logging about failure to add syscall to seccomp
Our logs are full of:
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain
...
This is pointless and makes debug logs hard to read. Let's keep the logs
in test code, but disable it in nspawn and pid1. This is done through a function
parameter because those functions operate recursively and it's not possible to
make the caller to log meaningfully.


There should be no functional change, except the skipped debug logs.
2018-09-24 17:21:09 +02:00
Yu Watanabe cf37f937ee nspawn: suppress one more log message when --quiet is passed
Fixes #10119.
2018-09-19 08:42:17 +02:00
Yu Watanabe 93bab28895 tree-wide: use typesafe_qsort() 2018-09-19 08:02:52 +09:00
Zbigniew Jędrzejewski-Szmek 6d7c403324 tests: use a helper function to parse environment and open logging
The advantages are that we save a few lines, and that we can override
logging using environment variables in more test executables.
2018-09-14 09:29:57 +02:00
afg 27b620b7db nspawn: use copy-static if systemd-resolved is up and image is writable 2018-09-12 20:48:21 +02:00
Franck Bui 03d0f4b58e nspawn: always use mode 555 for /sys
When a network namespace is needed, /sys is mounted as tmpfs (see commit
d8fc6a000f for details).

But in this case mode 755 was used as initial permissions for /sys whereas the
default mode for sysfs is 555.

In practice using 755 doesn't have any impact because /sys is mounted read-only
too but for consistency, let's use the correct mode.

Fixes: #10050
2018-09-11 00:34:00 +02:00
Yu Watanabe f55b0d3fd6 nspawn: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Zbigniew Jędrzejewski-Szmek 7692fed98b
Merge pull request #9783 from poettering/get-user-creds-flags
beef up get_user_creds() a bit and other improvements
2018-08-21 10:09:33 +02:00
Lennart Poettering 8967f29169 nspawn: add two missing OOM checks 2018-08-20 15:58:11 +02:00
Lennart Poettering 8dfce114ab nspawn: make sure to create /dev/char/x:y symlinks in nspawn containers too
On the host udev creates these, but they are useful API, hence create
them in nspawn containers too.
2018-08-20 15:58:11 +02:00
Lennart Poettering 37ec0fdd34 tree-wide: add clickable man page link to all --help texts
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.

I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
2018-08-20 11:33:04 +02:00
Yu Watanabe 4ae25393f3 tree-wide: shorten error logging a bit
Continuation of 4027f96aa0.
2018-08-07 10:14:33 +09:00
Luke Shumaker 677a72cd3e nspawn: mount_sysfs(): Unconditionally mkdir /sys/fs/cgroup
Currently, mount_sysfs() only creates /sys/fs/cgroup if cg_ns_supported().
The comment explains that we need to "Create mountpoint for
cgroups. Otherwise we are not allowed since we remount /sys read-only.";
that is: that we need to do it now, rather than later.  However, the
comment doesn't do anything to explain why we only need to do this if
cg_ns_supported(); shouldn't we _always_ need to do it?

The answer is that if !use_cgns, then this was already done by the outer
child, so mount_sysfs() only needs to do it if use_cgns.  Now,
mount_sysfs() doesn't know whether use_cgns, but !cg_ns_supported() implies
!use_cgns, so we can optimize" the case where we _know_ !use_cgns, and deal
with a no-op mkdir_p() in the false-positive where cgns_supported() but
!use_cgns.

But is it really much of an optimization?  We're potentially spending an
access(2) (cg_ns_supported() could be cached from a previous call) to
potentially save an lstat(2) and mkdir(2); and all of them are on virtual
fileystems, so they should all be pretty cheap.

So, simplify and drop the conditional.  It's a dubious optimization that
requires more text to explain than it's worth.
2018-07-20 12:12:03 -04:00
Luke Shumaker 93dbdf6cb1 nspawn: sync_cgroup(): Rename arg_uid_shift -> uid_shift
Naming it arg_uid_shift is confusing because of the global arg_uid_shift in
nspawn.c
2018-07-20 12:12:02 -04:00
Luke Shumaker 0402948206 nspawn: Move cgroup mount stuff from nspawn-mount.c to nspawn-cgroup.c 2018-07-20 12:12:02 -04:00
Luke Shumaker 2fa017f169 nspawn: Simplify tmpfs_patch_options() usage, and trickle that up
One of the things that tmpfs_patch_options does is take an (optional) UID,
and insert "uid=${UID},gid=${UID}" into the options string.  So we need a
uid_t argument, and a way of telling if we should use it.  Fortunately,
that is built in to the uid_t type by having UID_INVALID as a possible
value.

So this is really a feature that requires one argument.  Yet, it is somehow
taking 4!  That is absurd.  Simplify it to only take one argument, and have
that trickle all the way up to mount_all()'s usage.

Now, in may of the uses, the argument becomes

    uid_shift == 0 ? UID_INVALID : uid_shift

because it used to treat uid_shift=0 as invalid unless the patch_ids flag
was also set.  This keeps the behavior the same.  Note that in all cases
where it is invoked, if !use_userns (sometimes called !userns), then
uid_shift is 0; we don't have to add any checks for that.

That said, I'm pretty sure that "uid=0" and not setting "uid=" are the
same, but Christian Brauner seemed to not think so when implementing the
cgns support.  https://github.com/systemd/systemd/pull/3589
2018-07-20 12:12:02 -04:00
Luke Shumaker 9c0fad5fb5 nspawn: Simplify mkdir_userns() usage, and trickle that up
One of the things that mkdir_userns{,_p}() does is take an (optional) UID,
and chown the directory to that.  So we need a uid_t argument, and a way of
telling if we should use that uid_t argument.  Fortunately, that is built
in to the uid_t type by having UID_INVALID as a possible value.

However, currently mkdir_userns() also takes a MountSettingsMask and checks
a couple of bits in it to decide if it should perform the chown.

Drop the mask argument, and instead have the caller pass UID_INVALID if it
shouldn't chown.
2018-07-20 12:12:02 -04:00
Lennart Poettering a7e2e50d35 summary: update nspawn description string a bit
nspawn as it is now is a generally useful tool, hence let's drop the
comments about it being useful for debug and so on only.

The new wording just makes the first sentence of the main page also the
summary.
2018-06-28 11:55:44 +09:00
Zbigniew Jędrzejewski-Szmek 0cd41d4dff Drop my copyright headers
perl -i -0pe 's/\s*Copyright © .... Zbigniew Jędrzejewski.*?\n/\n/gms' man/*xml
git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/(#\n)?# +Copyright © [0-9, -]+ Zbigniew Jędrzejewski.*?\n//gms'
git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/\s*\/\*\*\*\s+Copyright © [0-9, -]+ Zbigniew Jędrzejewski[^\n]*?\s*\*\*\*\/\s*/\n\n/gms'
git grep -e 'Copyright.*Jędrzejewski' -l | xargs perl -i -0pe 's/\s+Copyright © [0-9, -]+ Zbigniew Jędrzejewski[^\n]*//gms'
2018-06-14 13:03:20 +02:00
Lennart Poettering 96b2fb93c5 tree-wide: beautify remaining copyright statements
Let's unify an beautify our remaining copyright statements, with a
unicode ©. This means our copyright statements are now always formatted
the same way. Yay.
2018-06-14 10:20:21 +02:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering df1fac6dea nspawn: free global variables before exiting
This doesn't really matter much, but is prettier for valgrind
2018-06-13 17:51:40 +02:00
Lennart Poettering 2f14e52f08 nspawn: drop unused parameter from one call 2018-06-13 17:42:16 +02:00
Lennart Poettering ef31828d06 tree-wide: unify how we define bit mak enums
Let's always write "1 << 0", "1 << 1" and so on, except where we need
more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force
64bit values.
2018-06-12 21:44:00 +02:00
Lennart Poettering b8b846d7b4 tree-wide: fix a number of log calls that use %m but have no errno set
This is mostly fall-out from d1a1f0aaf0,
however some cases are older bugs.

There might be more issues lurking, this was a simple grep for "%m"
across the tree, with all lines removed that mention "errno" at all.
2018-06-07 15:29:17 +02:00
Lennart Poettering 669fc4e5c5 tree-wide: some O_NDELAY → O_NONBLOCK fixes
Somehow the coccinelle script misses these, hence fix them manually.
2018-05-31 12:04:39 +02:00
Lennart Poettering d32d473d66
Merge pull request #9103 from keszybz/more-tables-tests
More tables tests
2018-05-28 14:24:19 +02:00
Zbigniew Jędrzejewski-Szmek 83e803a9ef nspawn: reset umask early
Fixes #8911.
2018-05-28 11:01:43 +02:00
Zbigniew Jędrzejewski-Szmek 667c1baff5 nspawn: remove some vertical whitespace
Sometimes an empty line is good for readability, but here I think
they all can be removed without any loss.
2018-05-28 11:01:43 +02:00
Zbigniew Jędrzejewski-Szmek 8514095fe6 test-nspawn-tables: add another "tables" test 2018-05-28 10:40:00 +02:00
Zbigniew Jędrzejewski-Szmek 97d9061563 meson: use a convenience static library for nspawn core
This makes it easier to link the nspawn implementation to the tests.
Right now this just means that nspawn-patch-uid.c is not compiled
twice, which is nice, but results in test-patch-uid being slightly bigger,
which is not nice. But in general, we should use convenience libs to
compile everything just once, as far as possible. Otherwise, once we
start compiling a few files here twice, and a few file there thrice, we
soon end up in a state where we are doing hundreds of extra compilations.
So let's do the "right" thing, even if is might not be more efficient.
2018-05-28 10:40:00 +02:00
Lennart Poettering 3a6ce860ac machine-image: rework error handling
Let's rework error handling a bit in image_find() and friends: when we
can't find an image, return -ENOENT rather than 0. That's better as
before we violated the usual rule in our codebase that return parameters
are initialized when the return value is >= 0 and otherwise not touched.

This also makes enumeration and validation a bit more strict: we'll only
accept ".raw" as suffix for regular files, and filter out this suffix
handling on directories/subvolumes, where it makes no sense.
2018-05-24 17:01:57 +02:00
Lennart Poettering 5ef46e5f65 machine-image: introduce two different classes of images
This distuingishes two different classes of images, one for the purpose
of npsawn-like containers, i.e. "machines", and one for portable
services.

This distinction is mostly about search paths. We look for machine
images in /var/lib/machines and for portable images in
/var/lib/portables.
2018-05-24 17:01:57 +02:00
Lennart Poettering d58ad743f9 os-util: add helpers for finding /etc/os-release
Place this new helpers in a new source file os-util.[ch], and move the
existing and related call path_is_os_tree() to it as well.
2018-05-24 17:01:57 +02:00
Lennart Poettering 03bcb6d408 dissect: optionally, validate that the image we dissect is a valid OS image
We already do this kind of validation in nspawn when we operate on a
plain directory, let's also do this on raw images under the same
condition: that we are about too boot the image. Also, do this when we
are about to read OS metadata from it.
2018-05-24 17:01:57 +02:00
Zbigniew Jędrzejewski-Szmek 17c1b9a93f
Merge pull request #9024 from poettering/nspawn-attrs-more
make even more nspawn concepts configurable
2018-05-24 16:27:27 +02:00
Zbigniew Jędrzejewski-Szmek 7cd92e2e9d
Merge pull request #9068 from poettering/nspawn-pty-deadlock
nspawn logging deadlock fix
2018-05-24 16:25:22 +02:00
Zbigniew Jędrzejewski-Szmek 14d0afb94d
Merge pull request #9065 from poettering/fixup-tab-double-newline
tree-wide: fix some TABs and double newlines
2018-05-22 17:14:48 +02:00
Lennart Poettering 17cac366ae nspawn: make sure our container PID 1 keeps logging to the original stderr as long as possible
If we log to the pty that is configured as stdin/stdout/stderr of the
container too early we risk filling it up in full before we start
processing the pty from the parent process, resulting in deadlocks.
Let's hence keep a copy of the original tty we were started on before
setting up stdin/stdout/stderr, so that we can log to it, and keep using
it as long as we can.

Since the kernel's pty internal buffer is pretty small this actually
triggered deadlocks when we debug logged at lot from nspawn's child
processes, see: https://github.com/systemd/systemd/pull/9024#issuecomment-390403674

With this change we won't use the pty at all, only the actual payload we
start will, and hence we won't deadlock on it, ever.
2018-05-22 16:52:50 +02:00
Lennart Poettering 8ca082b49a nspawn: make use of log_set_open_when_needed() in nspawn too
Let's make use of log_set_open_when_needed() in nspawn too, i.e. at the
point where we close logging because we are about to rearrange fds,
let's automatically reopen the logging fds when we need them, the same
way as we do that in the service manager. This makes things simpler and
more robust.
2018-05-22 16:51:28 +02:00
Lennart Poettering f728ab1724 nspawn: let's rename _FORCE_ENUM_WIDTH → _SETTING_FORCE_ENUM_WIDTH
Just some preparation in case we need a similar hack in another enum one
day.
2018-05-22 16:21:26 +02:00
Lennart Poettering 1688841f46 nspawn: similar to the previous patches, also make /etc/localtime handling more configurable
Fixes: #9009
2018-05-22 16:21:26 +02:00
Lennart Poettering 63d1c29ffa nspawn: complain if people still use --share-system 2018-05-22 16:20:08 +02:00
Lennart Poettering 4e1d6aa983 nspawn: make --link-journal= configurable through .nspawn files, too 2018-05-22 16:20:08 +02:00
Lennart Poettering b8ea7a6e12 nspawn: add a bit of debug logging to resolved_listening() 2018-05-22 16:19:26 +02:00
Lennart Poettering 09d423e921 nspawn: add greater control over how /etc/resolv.conf is handled
Fixes: #8014 #1781
2018-05-22 16:19:26 +02:00
Lennart Poettering 8904ab86b0
Merge pull request #9062 from poettering/parse-conf-macro
add new CONFIG_PARSER_PROTOTYPE() macro
2018-05-22 16:14:49 +02:00
Lennart Poettering a5201ed6ce tree-wide: fix a couple of TABs 2018-05-22 16:13:45 +02:00
Arnaud Rebillout c9fe05e07d nspawn: support pivot-root option during directory validation
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
2018-05-22 14:42:10 +02:00
Lennart Poettering c0d7a4f0cd
Merge pull request #9061 from poettering/dump-string-table
add new DUMP_STRING_TABLE() macro and make use of it everywhere
2018-05-22 14:28:38 +02:00
Lennart Poettering a210692525 tree-wide: port over all code to the new CONFIG_PARSER_PROTOTYPE() macro
This makes most header files easier to look at. Also Emacs gets really
slow when browsing through large sections of overly long prototypes,
which is much improved by this macro.

We should probably not do something similar with too many other cases,
as macros like this might help readability for some, but make it worse
for others. But I think given the complexity of this specific prototype
and how often we use it, it's worth doing.
2018-05-22 13:18:44 +02:00
Lennart Poettering 5c828e66b5 tree-wide: port various bits of the tree over to the new DUMP_STRING_TABLE() macro 2018-05-22 13:14:18 +02:00
Zbigniew Jędrzejewski-Szmek b49c6ca089 systemd-nspawn: make SettingsMask 64 bit wide
The use of UINT64_C() in the SettingsMask enum definition is misleading:
it does not mean that individual fields have this width. E.g., with
enum {
   FOO = UINT64_C(1)
}
sizeof(FOO) gives 4. It only means that the shift is done properly. So
1 << 35 is undefined, but UINT64_C(1) << 35 is the expected 64 bit
constant. Thus, the use UINT64_C() is useful, because we know that the shifts
are done properly, no matter what the value of _RLIMIT_MAX is, but when those
fields are used in expressions, we don't know what size they will be
(probably 4). Let's add a define which "hides" the enum definition behind a
define which gives the same value but is actually 64 bit. I think this is a
nicer solution than requiring all users to cast SETTING_RLIMIT_FIRST before
use.

Fixes #9035.
2018-05-22 10:51:49 +02:00
Lennart Poettering 919f5ae0c7 nspawn: voidify more things 2018-05-17 20:48:55 +02:00
Lennart Poettering 5d9614077d nspawn: split out merging of settings object
Let's separate the loading of the settings object and the merging into
our arg_xyz fields into two.

This will become particularly useful when we eventually are able to load
settings from OCI runtime files in addition to .nspawn files.
2018-05-17 20:48:55 +02:00
Lennart Poettering d107bb7d63 nspawn: add a new --cpu-affinity= switch
Similar as the other options added before, this is primarily useful to
provide comprehensive OCI runtime compatbility, but might be useful
otherwise, too.
2018-05-17 20:48:54 +02:00
Lennart Poettering 50ebcf6cb7 nspawn: show --help text in a pager
The text is long enough now, and we do auto-paging for systemctl
already, hence let's do it here too.
2018-05-17 20:48:13 +02:00
Lennart Poettering 81f345dfed nspawn: add a new --oom-score-adjust= command line switch
This is primarily useful in order to provide comprehensive OCI runtime
compatibility with nspawn, but might have uses outside of it.
2018-05-17 20:48:12 +02:00
Lennart Poettering c818eef1cd nspawn: properly handle and log about hostname setting errors 2018-05-17 20:47:21 +02:00
Lennart Poettering 66edd96310 nspawn: add a new --no-new-privileges= cmdline option to nspawn
This simply controls the PR_SET_NO_NEW_PRIVS flag for the container.
This too is primarily relevant to provide OCI runtime compaitiblity, but
might have other uses too, in particular as it nicely complements the
existing --capability= and --drop-capability= flags.
2018-05-17 20:47:20 +02:00
Lennart Poettering 3a9530e5f1 nspawn: make the hostname of the container explicitly configurable with a new --hostname= switch
Previously, the container's hostname was exclusively initialized from
the machine name configured with --machine=, i.e. the internal name and
the external name used for and by the container was synchronized. This
adds a new option --hostname= that optionally allows the internal name
to deviate from the external name.

This new option is mainly useful to ultimately implement the OCI runtime
spec directly in nspawn, but it might be useful on its own for some
other usecases too.
2018-05-17 20:46:45 +02:00
Lennart Poettering bf428efb07 nspawn: add new --rlimit= switch, and always set resource limits explicitly for our container payloads
This ensures we set the various resource limits of our container
explicitly on each invocation so that we inherit less from our callers
into the payload.

By default resource limits are now set to the same values Linux
generally passes to the host PID 1, thus minimizing needless differences
between host and container environments.

The limits are now also configurable using a new --rlimit= switch. This
is preparation for teaching nspawn native OCI runtime support as OCI
permits setting resource limits for container payloads, and it hence
probably makes sense if we do too.
2018-05-17 20:45:54 +02:00
Yu Watanabe 130d3d22e9 tree-wide: use strv_free_and_replace() macro 2018-05-10 00:57:34 +09:00
Lennart Poettering 720f0a2f3c nspawn: move nspawn cgroup hierarchy one level down unconditionally
We need to do this in all cases, including on cgroupsv1 in order to
ensure the host systemd and any systemd in the payload won't fight for
the cgroup attributes of the top-level cgroup of the payload.

This is because systemd for Delegate=yes units will only delegate the
right to create children as well as their attributes. However, nspawn
expects that the cgroup delegated covers both the right to create
children and the attributes of the cgroup itself. Hence, to clear this
up, let's unconditionally insert a intermediary cgroup, on cgroupsv1 as
well as cgroupsv2, unconditionally.

This is also nice as it reduces the differences in the various setups
and exposes very close behaviour everywhere.
2018-05-03 17:45:42 +02:00
Lennart Poettering 910384c821 nspawn: let's make use of SPECIAL_MACHINE_SLICE macro, after all we already set it 2018-05-03 17:45:42 +02:00
Lennart Poettering 9ec5a93c98 nspawn: don't make /proc/kmsg node too special
Similar to the previous commit, let's just use our regular calls for
managing temporary nodes take care of this.
2018-05-03 17:45:42 +02:00
Lennart Poettering cdde6ba6b6 nspawn: mount boot ID from temporary file in /tmp
Let's not make /run too special and let's make sure the source file is
not guessable: let's use our regular temporary file helper calls to
create the source node.
2018-05-03 17:45:42 +02:00
Lennart Poettering d4b653c589 nspawn: lock down a few things in /proc by default
This tightens security on /proc: a couple of files exposed there are now
made inaccessible. These files might potentially leak kernel internals
or expose non-virtualized concepts, hence lock them down by default.
Moreover, a couple of dirs in /proc that expose stuff also exposed in
/sys are now marked read-only, similar to how we handle /sys.

The list is taken from what docker/runc based container managers
generally apply, but slightly extended.
2018-05-03 17:45:42 +02:00
Lennart Poettering 10af01a5ff nspawn: use free_and_replace() at more places 2018-05-03 17:19:46 +02:00
Lennart Poettering 88614c8a28 nspawn: size_t more stuff
A follow-up for #8840
2018-05-03 17:19:46 +02:00
Yu Watanabe 29a3db75fd util: rename signal_from_string_try_harder() to signal_from_string()
Also this makes the new `signal_from_string()` function reject
e.g, `SIG3` or `SIG+5`.
2018-05-03 16:52:49 +09:00
Yu Watanabe 1e4f1671c2 nspawn: fix warning by -Wnonnull (#8877) 2018-05-02 10:03:31 +02:00
Lennart Poettering 8e766630f0 tree-wide: drop redundant _cleanup_ macros (#8810)
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.

In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.

Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.

Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.

Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
2018-04-25 12:31:45 +02:00
Lennart Poettering 0c300adfa4 nspawn: when running nspawn, set a $PATH including both bin + sbin by default (#8756)
We don't know what the container payload needs, hence default to a PATH
with both bin and sbin included, as well as / and /usr.

Follow-up for #8324

Fixes: #8698
2018-04-20 11:36:25 +02:00
Lennart Poettering 5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Philip Sequeira 7511655807 nspawn: wait for network namespace creation before interface setup (#8633)
Otherwise, network interfaces can be "moved" into the container's
namespace while it's still the same as the host namespace, in which case
e.g. host0 for a veth ends up on the host side instead of inside the
container.

Regression introduced in 0441378080.

Fixes #8599.
2018-04-05 07:04:27 -07:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Lennart Poettering 959071cac2
Merge pull request #8552 from keszybz/test-improvements
Test and diagnostics improvements
2018-03-23 15:26:54 +01:00
Zbigniew Jędrzejewski-Szmek 37c1d5e97d tree-wide: warn when a directory path already exists but has bad mode/owner/type
When we are attempting to create directory somewhere in the bowels of /var/lib
and get an error that it already exists, it can be quite hard to diagnose what
is wrong (especially for a user who is not aware that the directory must have
the specified owner, and permissions not looser than what was requested). Let's
print a warning in most cases. A warning is appropriate, because such state is
usually a sign of borked installation and needs to be resolved by the adminstrator.

$ build/test-fs-util

Path "/tmp/test-readlink_and_make_absolute" already exists and is not a directory, refusing.
   (or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but has mode 0775 that is too permissive (0755 was requested), refusing.
   (or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but is owned by 1001:1000 (1000:1000 was requested), refusing.

Assertion 'mkdir_safe(tempdir, 0755, getuid(), getgid(), MKDIR_WARN_MODE) >= 0' failed at ../src/test/test-fs-util.c:320, function test_readlink_and_make_absolute(). Aborting.

No functional change except for the new log lines.
2018-03-23 10:26:38 +01:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Zbigniew Jędrzejewski-Szmek d50b5839b0 basic/mkdir: convert bool flag to enum
In preparation for subsequent changes...
2018-03-22 15:57:56 +01:00
Zbigniew Jędrzejewski-Szmek 37cbc1d579 When mangling names, optionally emit a warning (#8400)
The warning is not emitted for absolute paths like /dev/sda or /home, which are
converted to .device and .mount unit names without any fuss.

Most of the time it's unlikely that users use invalid unit names on purpose,
so let's warn them. Warnings are silenced when --quiet is used.

$ build/systemctl show -p Id hello@foo-bar/baz
Invalid unit name "hello@foo-bar/baz" was escaped as "hello@foo-bar-baz" (maybe you should use systemd-escape?)
Id=hello@foo-bar-baz.service

$ build/systemd-run --user --slice foo-bar/baz --unit foo-bar/foo true
Invalid unit name "foo-bar/foo" was escaped as "foo-bar-foo" (maybe you should use systemd-escape?)
Invalid unit name "foo-bar/baz" was escaped as "foo-bar-baz" (maybe you should use systemd-escape?)
Running as unit: foo-bar-foo.service

Fixes #8302.
2018-03-21 15:26:47 +01:00
Lennart Poettering 4526113f57 dissect: add dissect_image_and_warn() that unifies error message generation for dissect_image() (#8517) 2018-03-21 12:10:01 +01:00
Zbigniew Jędrzejewski-Szmek 0441378080 nspawn: move network namespace creation to a separate step (#8430)
Fixes #8427.

Unsharing the namespace in a separate step changes the ownership of
/proc/net/ip_tables_names (and related files) from nobody:nobody to
root:root. See [1] and [2] for all the details.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f13f2aeed154da8e48f90b85e720f8ba39b1e881
[2] https://bugzilla.netfilter.org/show_bug.cgi?id=1064#c9
2018-03-20 18:07:17 +01:00
Lennart Poettering 2b33ab0957 tree-wide: port various places over to use new rearrange_stdio() 2018-03-02 11:42:10 +01:00
Lennart Poettering 05a8b3305f nspawn: close pipe on error 2018-02-28 10:01:16 +01:00
Lennart Poettering e7685a77b4 util: add new safe_close_above_stdio() wrapper
At various places we only want to close fds if they are not
stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call
for that, and port everything over.
2018-02-28 10:00:50 +01:00
Lennart Poettering c7f9a8d270 nspawn: propagate original error. No need to make up -EIO 2018-02-28 10:00:50 +01:00
Lennart Poettering 5018c0c9e8 nspawn: use STR_IN_SET() where we can 2018-02-28 10:00:50 +01:00
Lennart Poettering c5b82d86b5 nspawn: port some code to use read_line()
This shortens our code a bit. Which is always nice.
2018-02-28 10:00:50 +01:00
Zbigniew Jędrzejewski-Szmek aa484f3561 tree-wide: use reallocarray instead of our home-grown realloc_multiply (#8279)
There isn't much difference, but in general we prefer to use the standard
functions. glibc provides reallocarray since version 2.26.

I moved explicit_bzero is configure test to the bottom, so that the two stdlib
functions are at the bottom.
2018-02-26 21:20:00 +01:00
Yu Watanabe 72d967df3e nspawn: remove unnecessary mount option parsing logic 2018-02-21 09:06:55 +09:00
Yu Watanabe 30ffb010ff nspawn: fix indentation 2018-02-21 09:05:33 +09:00
Zbigniew Jędrzejewski-Szmek 8405dcf752 nspawn: make sure we don't leak the fd in chase_symlinks_and_update
No callers use CHASE_OPEN right now, but let's be defensive.
2018-02-15 10:18:25 +01:00
Lennart Poettering d72495759b tree-wide: port all code to use safe_getcwd() 2018-01-17 11:17:38 +01:00
Lennart Poettering dccca82b1a log: minimize includes in log.h
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.

Let's hence drop inclusion of:

1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
   declaration
4. process-util.h which was needed for getpid_cached() which we now hide
   in a funciton log_emergency_level() instead, which nicely abstracts
   the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
   forward declaration suffices for that too.

Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.

(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
2018-01-11 14:44:31 +01:00
Lennart Poettering 75152a4d6a tree-wide: install matches asynchronously
Let's remove a number of synchronization points from our service
startups: let's drop synchronous match installation, and let's opt for
asynchronous instead.

Also, let's use sd_bus_match_signal() instead of sd_bus_add_match()
where we can.
2018-01-05 13:58:32 +01:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Lennart Poettering b6e1fff13d process-util: add another fork_safe() flag for enabling LOG_ERR/LOG_WARN logging 2018-01-04 13:27:26 +01:00
Lennart Poettering 4c253ed1ca tree-wide: introduce new safe_fork() helper and port everything over
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:

1. Optionally resets signal handlers/mask in the child

2. Sets a name on all processes we fork off right after forking off (and
   the patch assigns useful names for all processes we fork off now,
   following a systematic naming scheme: always enclosed in () – in order
   to indicate that these are not proper, exec()ed processes, but only
   forked off children, and if the process is long-running with only our
   own code, without execve()'ing something else, it gets am "sd-" prefix.)

3. Optionally closes all file descriptors in the child

4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
   way so that the parent dying before this happens being handled
   safely.

5. Optionally reopens the logs

6. Optionally connects stdin/stdout/stderr to /dev/null

7. Debug logs about the forked off processes.
2017-12-25 11:48:21 +01:00
Lennart Poettering ebe6ff658d
Merge pull request #7663 from keszybz/mkdir-return-value
util-lib: fix return value in mkdir_parents()
2017-12-24 11:59:58 +01:00
Yu Watanabe 89ada3ba08 bus-unit-util: add socket unit related options
Also, split bus_append_unit_property_assignment().
2017-12-23 18:48:16 +09:00
Henrik Grindal Bakken cacc0d7a78 nspawn: Include missing.h 2017-12-18 14:15:17 +01:00
Zbigniew Jędrzejewski-Szmek dae8b82eb9 Add mkdir_errno_wrapper() and use instead of mkdir() in various places
We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir
returns the error in errno while mkdir_label returns the error directly.
2017-12-16 13:28:22 +01:00
Zbigniew Jędrzejewski-Szmek bdd2bbc445
Merge pull request #7469 from kinvolk/dongsu/nspawn-netns
nspawn: introduce an option for specifying network namespace path
2017-12-14 22:47:57 +01:00
Lennart Poettering fbd0b64f44
tree-wide: make use of new STRLEN() macro everywhere (#7639)
Let's employ coccinelle to do this for us.

Follow-up for #7625.
2017-12-14 19:02:29 +01:00
Dongsu Park d7bea6b629 nspawn: introduce an option for specifying network namespace path
Add a new option `--network-namespace-path` to systemd-nspawn to allow
users to specify an arbitrary network namespace, e.g. `/run/netns/foo`.
Then systemd-nspawn will open the netns file, pass the fd to
outer_child, and enter the namespace represented by the fd before
running inner_child.

```
$ sudo ip netns add foo
$ mount | grep /run/netns/foo
nsfs on /run/netns/foo type nsfs (rw)
...
$ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \
  /bin/readlink -f /proc/self/ns/net
/proc/1/ns/net:[4026532009]
```

Note that the option `--network-namespace-path=` cannot be used together
with other network-related options such as `--private-network` so that
the options do not conflict with each other.

Fixes https://github.com/systemd/systemd/issues/7361
2017-12-13 10:21:06 +00:00
Lennart Poettering fba868fa71 tree-wide: unify logging of "Must be root" message
Let's unify this in one call, generalizing must_be_root() from
bootctl.c.
2017-12-11 23:19:45 +01:00
Lennart Poettering 8fd010bb1b nspawn: turn on watchdog logic for nspawn too
It's a long-running daemon, and it's easy to enable, hence do it.
2017-12-07 12:34:46 +01:00
Lennart Poettering 87d5e4f286 build-sys: make the dynamic UID range, and the container UID range configurable
Also, export these ranges in our pkg-config files.
2017-12-06 12:55:37 +01:00
Lennart Poettering de54e02d5e nspawn: when in hybrid mode, chown() both the legacy and the unified hierarchy to the root in the container
If user namespacing is used, let's make sure that the root user in the
container gets access to both /sys/fs/cgroup/systemd and
/sys/fs/cgroup/unified.

This matches similar logic in cg_set_access().
2017-12-05 13:49:13 +01:00
Lennart Poettering 2d3a5a73e0 nspawn: make sure images containing an ESP are compatible with userns -U mode
In -U mode we might need to re-chown() all files and directories to
match the UID shift we want for the image. That's problematic on fat
partitions, such as the ESP (and which is generated by mkosi's
--bootable switch), because fat of course knows no UID/GID file
ownership natively.

With this change we take benefit of the uid= and gid= mount options FAT
knows: instead of chown()ing all files and directories we can just
specify the right UID/GID to use at mount time.

This beefs up the image dissection logic in two ways:

1. First of all support for mounting relevant file systems with
   uid=/gid= is added: when a UID is specified during mount it is used for
   all applicable file systems.

2. Secondly, two new mount flags are added:
   DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY.
   If one is specified the mount routine will either only mount the root
   partition of an image, or all partitions except the root partition.
   This is used by nspawn: first the root partition is mounted, so that
   we can determine the UID shift in use so far, based on ownership of
   the image's root directory. Then, we mount the remaining partitions
   in a second go, this time with the right UID/GID information.
2017-12-05 13:49:12 +01:00
Lennart Poettering 1cfdbe293f cgroup: also include "cgroups.threads" in the list of files to chown
Also, add "cgroups.stat". It's read-only anyway, hence its UID/GID
ownership matters little, but it's probably a good idea to keep it
ownership in sync with the other read-only files such as
"cgroups.controllers".

Also, order the list of files alphabetically.
2017-12-05 13:49:12 +01:00
Lennart Poettering 8199d554c1 nspawn: figure out cgroup mode *after* mounting image
If we operate on a disk image (i.e. --image=) then it's pointless to
look into the mount directory before it is actually mounted to see which
systemd version is running inside...

Unfortunately we only mount the disk image in the child process, but the
parent needs to know the cgroup mode, hence add some IPC for this
purpose and communicate the cgroup mode determined from the image back
to the parent.
2017-12-05 13:49:12 +01:00
Zbigniew Jędrzejewski-Szmek 40fd52f28d util-lib: rename path_check_fstype to path_is_fs_type 2017-11-30 20:43:25 +01:00
Yu Watanabe 62b1e758d3 nspawn: adjust path to static resolv.conf to support split usr
Fixes #7302.
2017-11-25 21:11:07 +09:00
Lennart Poettering d381c8a6bf nspawn: hash the machine name, when looking for a suitable UID base (#7437)
When "-U" is used we look for a UID range we can use for our container.
We start with the UID the tree is already assigned to, and if that
didn't work we'd pick random ranges so far. With this change we'll first
try to hash a suitable range from the container name, and use that if it
works, in order to make UID assignments more likely to be stable.

This follows a similar logic PID 1 follows when using DynamicUser=1.
2017-11-24 20:57:19 +01:00
Lennart Poettering a8027a18f1
Merge pull request #7442 from poettering/scope-fixes
some fixes to the scope unit type
2017-11-24 17:15:09 +01:00
Lennart Poettering f170504825
Merge pull request #7453 from neosilky/coccinelle-fixes
Applied fixes from Coccinelle
2017-11-24 13:29:48 +01:00
Daniel Lockyer f9ecfd3bbe Replace free and reassignment with free_and_replace 2017-11-24 10:33:41 +00:00
Daniel Lockyer 87e4e28dcf Replace empty ternary with helper method 2017-11-24 09:31:08 +00:00
Lennart Poettering abdb9b08f6 nspawn: make use of the RequestStop logic of scope units
Since time began, scope units had a concept of "Controllers", a bus peer
that would be notified when somebody requested a unit to stop. None of
our code used that facility so far, let's change that.

This way, nspawn can print a nice message when somebody invokes
"systemctl stop" on the container's scope unit, and then react with the
right action to shut it down.
2017-11-23 21:47:48 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Lennart Poettering 6925a0de4e cgroup-util: move Set* allocation into cg_kernel_controllers()
Previously, callers had to do this on their own. Let's make the call do
that instead, making the caller code a bit shorter.
2017-11-21 11:54:08 +01:00
Lennart Poettering bf516294c8 nspawn: minor optimization
no need to prepare the target path if we quite the loop anyway one step
later.
2017-11-21 11:54:08 +01:00
Lennart Poettering d7c9693a3e nspawn-mount: rework get_controllers() a bit
Let's rename get_controllers() → get_process_controllers(), in order to
underline the difference to cg_kernel_controllers(). After all, one
returns the controllers available to the process, the other the
controllers enabled in the kernel at all).

Let's also update the code to use read_line() and set_put_strdup() to
shorten the code a bit, and make it more robust.
2017-11-21 11:54:08 +01:00
Lennart Poettering ea9053c5f8 nspawn: rework mount_systemd_cgroup_writable() a bit
We shouldn't call alloca() as part of function calls, that's not really
defined in C. Hence, let's first do our stack allocations, and then
invoke functions.

Also, some coding style fixes, and minor shuffling around.

No functional changes.
2017-11-21 11:54:08 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek 3a726fcd08 Add license headers and SPDX identifiers to meson.build files
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering 3603efdea5 nspawn: make recursive chown()ing logic safe for being aborted in the middle
We currently use the ownership of the top-level directory as a hint
whether we need to descent into the whole tree to chown() it recursively
or not. This is problematic with the previous chown()ing algorithm, as
when descending into the tree we'd first chown() and then descend
further down, which meant that the top-level directory would be chowned
first, and an aborted recursive chowning would appear on the next
invocation as successful, even though it was not. Let's reshuffle things
a bit, to make the re-chown()ing safe regarding interruptions:

a) We chown() the dir we are looking at last, and descent into all its
   children first. That way we know that if the top-level dir is
   properly owned everything inside of it is properly owned too.

b) Before starting a chown()ing operation, we mark the top-level
   directory as owned by a special "busy" UID range, which we can use to
   recognize whether a tree was fully chowned: if it is marked as busy,
   it's definitely not fully chowned, as the busy ownership will only be
   fixed as final step of the chowning.

Fixes: #6292
2017-11-17 11:12:33 +01:00
Lennart Poettering 14f8ccc755 nspawn: add missing #pragma once to header file 2017-11-17 11:12:33 +01:00
Lennart Poettering 0986658d51
Merge pull request #6866 from sourcejedi/set-linger2
logind: fix `loginctl enable-linger`
2017-11-15 11:15:15 +01:00
Lennart Poettering bcde742e78 conf-parser: turn three bool function params into a flags fields
This makes things more readable and fixes some issues with incorrect
flag propagation between the various flavours of config_parse().
2017-11-13 10:24:03 +01:00
Lennart Poettering 759aaedc5c dissect: when we invoke dissection on a loop device with partscan help the user
This adds some simply detection logic for cases where dissection is
invoked on an externally created loop device, and partitions have been
detected on it, but partition scanning so far was off. If this is
detected we now print a brief message indicating what the issue is,
instead of failing with a useless EINVAL message the kernel passed to
us.
2017-10-26 17:54:56 +02:00
Lennart Poettering eb38edce88 machine-image: add partial discovery of block devices as images
This adds some basic discovery of block device images for nspawn and
friends. Note that this doesn't add searching for block devices using
udev, but instead expects users to symlink relevant block devices into
/var/lib/machines. Discovery is hence done exactly like for
dir/subvol/raw file images, except that what is found may be a (symlink
to) a block device.

For now, we do not support cloning these images, but removal, renaming
and read-only flags are supported to the point where that makes sense.

Fixe: #6990
2017-10-26 17:54:56 +02:00
Lauri Tirkkonen 4f13e53428 nspawn: EROFS for chowning mount points is not fatal (#7122)
This fixes --read-only with --private-users. mkdir_userns_p may return
-EROFS if either mkdir or lchown fails; lchown failing is fine as the
mount point will just be overmounted, and if mkdir fails then the
following mount() will also fail (with ENOENT).
2017-10-24 19:40:50 +02:00
myrkr 1898e5f9a3 nspawn: Fix calculation of capabilities for configuration file (#7087)
The current code shifting an integer 1 failed for capabilities like
CAP_MAC_ADMIN (numerical value 33). This caused issues when specifying
them in the nspawn configuration file. Using an uint64_t 1 instead.

The similar code for processing the --capability command line option
was already correctly working.
2017-10-24 09:56:40 +02:00
Alan Jenkins 8d9c2bca41 nspawn: comment to acknowledge lying about "user session" 2017-10-18 09:47:10 +01:00
Yu Watanabe c31ad02403 mkdir: introduce follow_symlink flag to mkdir_safe{,_label}() 2017-10-06 16:03:33 +09:00
Lennart Poettering 44898c5358 seccomp: add three more seccomp groups
@aio → asynchronous IO calls
@sync → msync/fsync/... and friends
@chown → changing file ownership

(Also, change @privileged to reference @chown now, instead of the
individual syscalls it contains)
2017-10-05 15:42:48 +02:00
Lennart Poettering 4c3a917617 seccomp: include prlimit64 and ugetrlimit in @default
Also, move prlimit64() out of @resources.

prlimit64() may be used both for getting and setting resource limits, and
is implicitly called by glibc at various places, on some archs, the same
was as getrlimit(). SImilar, igetrlimit() is an arch-specific
replacement for getrlimit(), and hence should be whitelisted at the same
place as getrlimit() and prlimit64().

Also see: https://lists.freedesktop.org/archives/systemd-devel/2017-September/039543.html
2017-10-05 11:27:34 +02:00