Commit Graph

832 Commits

Author SHA1 Message Date
Lennart Poettering da9fc98ded tree-wide: port more code over to PATH_STARTSWITH_SET() 2018-11-26 14:08:46 +01:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Zbigniew Jędrzejewski-Szmek 294bf0c34a Split out pretty-print.c and move pager.c and main-func.h to shared/
This is high-level functionality, and fits better in shared/ (which is for
our executables), than in basic/ (which is also for libraries).
2018-11-20 18:40:02 +01:00
Lennart Poettering 042cad5737
Merge pull request #10753 from keszybz/pager-no-interrupt
Add mode in journalctl where ^C is handled by the pager
2018-11-14 20:09:39 +01:00
Zbigniew Jędrzejewski-Szmek 0221d68a13 basic/pager: convert the pager options to a flags argument
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
2018-11-14 16:25:11 +01:00
Zbigniew Jędrzejewski-Szmek bd897e729a nspawn: add a hint to the message we emit when a child dies
From #10526:

$ sudo systemd-nspawn -i image
Spawning container image on /home/zbyszek/src/mkosi/image.
Press ^] three times within 1s to kill container.
Short read while reading cgroup mode.
2018-11-13 11:58:44 +01:00
Lennart Poettering 1d78fea2d6 nspawn: rework how we allocate/kill scopes
Fixes: #6347
2018-11-09 17:08:59 +01:00
Lennart Poettering 11d81e506e nspawn: simplify machine terminate bus call
We have the machine name anyway, let's use TerminateMachine() on
machined's Manager object directly with it. That way it's a single
method call only, instead of two, to terminate the machine.
2018-11-09 17:08:59 +01:00
Lennart Poettering e5a2d8b5b5 nspawn: make use of the new sd_bus_set_close_on_exit() call in nspawn 2018-11-09 17:08:59 +01:00
Yu Watanabe 57512c893e tree-wide: set WRITE_STRING_FILE_DISABLE_BUFFER flag when we write files under /proc or /sys 2018-11-06 21:24:03 +09:00
Lennart Poettering 6619ad889d nspawn: beef up netns checking a bit, for compat with old kernels
Fixes: #10544
2018-10-31 21:42:45 +03:00
Lennart Poettering e2d39e549f nspawn: add proper error message if setns() on network namespace fd fails
Addresses: https://github.com/systemd/systemd/pull/10589#issuecomment-434670595
2018-10-31 18:07:30 +01:00
Jiuyang liu a2f577fca0 add ephemeral to nspawn-settings. 2018-10-24 10:22:20 +02:00
Zbigniew Jędrzejewski-Szmek 369ca6dab1 systemd-nspawn: do not crash on /var/log/journal creation if not required
When running a read-only file system, we might not be able to create
/var/log/journal. Do not fail on this, unless actually requested by the
--link-journal options.

$ systemd-nspawn --image=image.squashfs ...
2018-10-22 15:07:08 +02:00
Yu Watanabe b0b8c9a5a4
Merge pull request #10389 from poettering/nspawn-path-fix
nspawn $PATH execvpe() fix
2018-10-19 08:48:37 +09:00
Lennart Poettering 2ff48e981e tree-wide: introduce setsockopt_int() helper and make use of it everywhere
As suggested by @heftig:

6d5e65f645 (commitcomment-30938667)
2018-10-18 19:50:29 +02:00
Lennart Poettering b6b180b77b nspawn: use container $PATH (not host $PATH) when searching for PID 1 binaries to execute
Fixes: #10377
2018-10-18 16:40:12 +02:00
Lennart Poettering 271f518f35 nspawn: TAKE_FD() is your friend 2018-10-15 19:45:37 +02:00
Lennart Poettering fbda85b078 tree-wide: use sockaddr_un_unlink() at two more places where appropriate 2018-10-15 19:44:34 +02:00
Lennart Poettering 6d5e65f645 tree-wide: add a single version of "static const int one = 1"
All over the place we define local variables for the various sockopts
that take a bool-like "int" value. Sometimes they are const, sometimes
static, sometimes both, sometimes neither.

Let's clean this up, introduce a common const variable "const_int_one"
(as well as one matching "const_int_zero") and use it everywhere, all
acorss the codebase.
2018-10-15 19:40:51 +02:00
Lennart Poettering 44ed5214ad tree-wide: use structured initialization for sockaddr_un 2018-10-15 19:35:00 +02:00
David Tardon f369f47c26 be consistent about sun_path length
Most places use the whole buffer for name, without leaving extra space
for the trailing NUL.
2018-10-12 12:38:49 +02:00
Lennart Poettering b37469d7d1 nspawn: add comments explaining the namespacing situation and the inner/outer children 2018-10-09 10:52:17 +02:00
Lennart Poettering 1099ceebce nspawn: optionally don't mount a tmpfs over /tmp (#10294)
nspawn: optionally, don't mount a tmpfs on /tmp

Fixes: #10260
2018-10-08 18:32:03 +02:00
Lennart Poettering ff6c6cc117 nspawn: when --quiet is passed, simply downgrade log messages to LOG_DEBUG (#10181)
With this change almost all log messages that are suppressed through
--quiet are not actually suppressed anymore, but simply downgraded to
LOG_DEBUG. Previously we did it this way for some log messages and fully
suppressed them for others. With this it's pretty much systematic.

Inspired by #10122.
2018-09-26 23:40:39 +02:00
Yu Watanabe cf37f937ee nspawn: suppress one more log message when --quiet is passed
Fixes #10119.
2018-09-19 08:42:17 +02:00
afg 27b620b7db nspawn: use copy-static if systemd-resolved is up and image is writable 2018-09-12 20:48:21 +02:00
Yu Watanabe f55b0d3fd6 nspawn: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Zbigniew Jędrzejewski-Szmek 7692fed98b
Merge pull request #9783 from poettering/get-user-creds-flags
beef up get_user_creds() a bit and other improvements
2018-08-21 10:09:33 +02:00
Lennart Poettering 8967f29169 nspawn: add two missing OOM checks 2018-08-20 15:58:11 +02:00
Lennart Poettering 8dfce114ab nspawn: make sure to create /dev/char/x:y symlinks in nspawn containers too
On the host udev creates these, but they are useful API, hence create
them in nspawn containers too.
2018-08-20 15:58:11 +02:00
Lennart Poettering 37ec0fdd34 tree-wide: add clickable man page link to all --help texts
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.

I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
2018-08-20 11:33:04 +02:00
Luke Shumaker 2fa017f169 nspawn: Simplify tmpfs_patch_options() usage, and trickle that up
One of the things that tmpfs_patch_options does is take an (optional) UID,
and insert "uid=${UID},gid=${UID}" into the options string.  So we need a
uid_t argument, and a way of telling if we should use it.  Fortunately,
that is built in to the uid_t type by having UID_INVALID as a possible
value.

So this is really a feature that requires one argument.  Yet, it is somehow
taking 4!  That is absurd.  Simplify it to only take one argument, and have
that trickle all the way up to mount_all()'s usage.

Now, in may of the uses, the argument becomes

    uid_shift == 0 ? UID_INVALID : uid_shift

because it used to treat uid_shift=0 as invalid unless the patch_ids flag
was also set.  This keeps the behavior the same.  Note that in all cases
where it is invoked, if !use_userns (sometimes called !userns), then
uid_shift is 0; we don't have to add any checks for that.

That said, I'm pretty sure that "uid=0" and not setting "uid=" are the
same, but Christian Brauner seemed to not think so when implementing the
cgns support.  https://github.com/systemd/systemd/pull/3589
2018-07-20 12:12:02 -04:00
Lennart Poettering a7e2e50d35 summary: update nspawn description string a bit
nspawn as it is now is a generally useful tool, hence let's drop the
comments about it being useful for debug and so on only.

The new wording just makes the first sentence of the main page also the
summary.
2018-06-28 11:55:44 +09:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering df1fac6dea nspawn: free global variables before exiting
This doesn't really matter much, but is prettier for valgrind
2018-06-13 17:51:40 +02:00
Lennart Poettering b8b846d7b4 tree-wide: fix a number of log calls that use %m but have no errno set
This is mostly fall-out from d1a1f0aaf0,
however some cases are older bugs.

There might be more issues lurking, this was a simple grep for "%m"
across the tree, with all lines removed that mention "errno" at all.
2018-06-07 15:29:17 +02:00
Lennart Poettering 669fc4e5c5 tree-wide: some O_NDELAY → O_NONBLOCK fixes
Somehow the coccinelle script misses these, hence fix them manually.
2018-05-31 12:04:39 +02:00
Zbigniew Jędrzejewski-Szmek 83e803a9ef nspawn: reset umask early
Fixes #8911.
2018-05-28 11:01:43 +02:00
Zbigniew Jędrzejewski-Szmek 667c1baff5 nspawn: remove some vertical whitespace
Sometimes an empty line is good for readability, but here I think
they all can be removed without any loss.
2018-05-28 11:01:43 +02:00
Lennart Poettering 3a6ce860ac machine-image: rework error handling
Let's rework error handling a bit in image_find() and friends: when we
can't find an image, return -ENOENT rather than 0. That's better as
before we violated the usual rule in our codebase that return parameters
are initialized when the return value is >= 0 and otherwise not touched.

This also makes enumeration and validation a bit more strict: we'll only
accept ".raw" as suffix for regular files, and filter out this suffix
handling on directories/subvolumes, where it makes no sense.
2018-05-24 17:01:57 +02:00
Lennart Poettering 5ef46e5f65 machine-image: introduce two different classes of images
This distuingishes two different classes of images, one for the purpose
of npsawn-like containers, i.e. "machines", and one for portable
services.

This distinction is mostly about search paths. We look for machine
images in /var/lib/machines and for portable images in
/var/lib/portables.
2018-05-24 17:01:57 +02:00
Lennart Poettering d58ad743f9 os-util: add helpers for finding /etc/os-release
Place this new helpers in a new source file os-util.[ch], and move the
existing and related call path_is_os_tree() to it as well.
2018-05-24 17:01:57 +02:00
Lennart Poettering 03bcb6d408 dissect: optionally, validate that the image we dissect is a valid OS image
We already do this kind of validation in nspawn when we operate on a
plain directory, let's also do this on raw images under the same
condition: that we are about too boot the image. Also, do this when we
are about to read OS metadata from it.
2018-05-24 17:01:57 +02:00
Zbigniew Jędrzejewski-Szmek 17c1b9a93f
Merge pull request #9024 from poettering/nspawn-attrs-more
make even more nspawn concepts configurable
2018-05-24 16:27:27 +02:00
Zbigniew Jędrzejewski-Szmek 7cd92e2e9d
Merge pull request #9068 from poettering/nspawn-pty-deadlock
nspawn logging deadlock fix
2018-05-24 16:25:22 +02:00
Lennart Poettering 17cac366ae nspawn: make sure our container PID 1 keeps logging to the original stderr as long as possible
If we log to the pty that is configured as stdin/stdout/stderr of the
container too early we risk filling it up in full before we start
processing the pty from the parent process, resulting in deadlocks.
Let's hence keep a copy of the original tty we were started on before
setting up stdin/stdout/stderr, so that we can log to it, and keep using
it as long as we can.

Since the kernel's pty internal buffer is pretty small this actually
triggered deadlocks when we debug logged at lot from nspawn's child
processes, see: https://github.com/systemd/systemd/pull/9024#issuecomment-390403674

With this change we won't use the pty at all, only the actual payload we
start will, and hence we won't deadlock on it, ever.
2018-05-22 16:52:50 +02:00
Lennart Poettering 8ca082b49a nspawn: make use of log_set_open_when_needed() in nspawn too
Let's make use of log_set_open_when_needed() in nspawn too, i.e. at the
point where we close logging because we are about to rearrange fds,
let's automatically reopen the logging fds when we need them, the same
way as we do that in the service manager. This makes things simpler and
more robust.
2018-05-22 16:51:28 +02:00
Lennart Poettering 1688841f46 nspawn: similar to the previous patches, also make /etc/localtime handling more configurable
Fixes: #9009
2018-05-22 16:21:26 +02:00
Lennart Poettering 63d1c29ffa nspawn: complain if people still use --share-system 2018-05-22 16:20:08 +02:00
Lennart Poettering 4e1d6aa983 nspawn: make --link-journal= configurable through .nspawn files, too 2018-05-22 16:20:08 +02:00
Lennart Poettering b8ea7a6e12 nspawn: add a bit of debug logging to resolved_listening() 2018-05-22 16:19:26 +02:00
Lennart Poettering 09d423e921 nspawn: add greater control over how /etc/resolv.conf is handled
Fixes: #8014 #1781
2018-05-22 16:19:26 +02:00
Lennart Poettering a5201ed6ce tree-wide: fix a couple of TABs 2018-05-22 16:13:45 +02:00
Arnaud Rebillout c9fe05e07d nspawn: support pivot-root option during directory validation
Signed-off-by: Arnaud Rebillout <arnaud.rebillout@collabora.com>
2018-05-22 14:42:10 +02:00
Lennart Poettering 5c828e66b5 tree-wide: port various bits of the tree over to the new DUMP_STRING_TABLE() macro 2018-05-22 13:14:18 +02:00
Lennart Poettering 919f5ae0c7 nspawn: voidify more things 2018-05-17 20:48:55 +02:00
Lennart Poettering 5d9614077d nspawn: split out merging of settings object
Let's separate the loading of the settings object and the merging into
our arg_xyz fields into two.

This will become particularly useful when we eventually are able to load
settings from OCI runtime files in addition to .nspawn files.
2018-05-17 20:48:55 +02:00
Lennart Poettering d107bb7d63 nspawn: add a new --cpu-affinity= switch
Similar as the other options added before, this is primarily useful to
provide comprehensive OCI runtime compatbility, but might be useful
otherwise, too.
2018-05-17 20:48:54 +02:00
Lennart Poettering 50ebcf6cb7 nspawn: show --help text in a pager
The text is long enough now, and we do auto-paging for systemctl
already, hence let's do it here too.
2018-05-17 20:48:13 +02:00
Lennart Poettering 81f345dfed nspawn: add a new --oom-score-adjust= command line switch
This is primarily useful in order to provide comprehensive OCI runtime
compatibility with nspawn, but might have uses outside of it.
2018-05-17 20:48:12 +02:00
Lennart Poettering c818eef1cd nspawn: properly handle and log about hostname setting errors 2018-05-17 20:47:21 +02:00
Lennart Poettering 66edd96310 nspawn: add a new --no-new-privileges= cmdline option to nspawn
This simply controls the PR_SET_NO_NEW_PRIVS flag for the container.
This too is primarily relevant to provide OCI runtime compaitiblity, but
might have other uses too, in particular as it nicely complements the
existing --capability= and --drop-capability= flags.
2018-05-17 20:47:20 +02:00
Lennart Poettering 3a9530e5f1 nspawn: make the hostname of the container explicitly configurable with a new --hostname= switch
Previously, the container's hostname was exclusively initialized from
the machine name configured with --machine=, i.e. the internal name and
the external name used for and by the container was synchronized. This
adds a new option --hostname= that optionally allows the internal name
to deviate from the external name.

This new option is mainly useful to ultimately implement the OCI runtime
spec directly in nspawn, but it might be useful on its own for some
other usecases too.
2018-05-17 20:46:45 +02:00
Lennart Poettering bf428efb07 nspawn: add new --rlimit= switch, and always set resource limits explicitly for our container payloads
This ensures we set the various resource limits of our container
explicitly on each invocation so that we inherit less from our callers
into the payload.

By default resource limits are now set to the same values Linux
generally passes to the host PID 1, thus minimizing needless differences
between host and container environments.

The limits are now also configurable using a new --rlimit= switch. This
is preparation for teaching nspawn native OCI runtime support as OCI
permits setting resource limits for container payloads, and it hence
probably makes sense if we do too.
2018-05-17 20:45:54 +02:00
Yu Watanabe 130d3d22e9 tree-wide: use strv_free_and_replace() macro 2018-05-10 00:57:34 +09:00
Lennart Poettering 720f0a2f3c nspawn: move nspawn cgroup hierarchy one level down unconditionally
We need to do this in all cases, including on cgroupsv1 in order to
ensure the host systemd and any systemd in the payload won't fight for
the cgroup attributes of the top-level cgroup of the payload.

This is because systemd for Delegate=yes units will only delegate the
right to create children as well as their attributes. However, nspawn
expects that the cgroup delegated covers both the right to create
children and the attributes of the cgroup itself. Hence, to clear this
up, let's unconditionally insert a intermediary cgroup, on cgroupsv1 as
well as cgroupsv2, unconditionally.

This is also nice as it reduces the differences in the various setups
and exposes very close behaviour everywhere.
2018-05-03 17:45:42 +02:00
Lennart Poettering 9ec5a93c98 nspawn: don't make /proc/kmsg node too special
Similar to the previous commit, let's just use our regular calls for
managing temporary nodes take care of this.
2018-05-03 17:45:42 +02:00
Lennart Poettering cdde6ba6b6 nspawn: mount boot ID from temporary file in /tmp
Let's not make /run too special and let's make sure the source file is
not guessable: let's use our regular temporary file helper calls to
create the source node.
2018-05-03 17:45:42 +02:00
Lennart Poettering 88614c8a28 nspawn: size_t more stuff
A follow-up for #8840
2018-05-03 17:19:46 +02:00
Yu Watanabe 29a3db75fd util: rename signal_from_string_try_harder() to signal_from_string()
Also this makes the new `signal_from_string()` function reject
e.g, `SIG3` or `SIG+5`.
2018-05-03 16:52:49 +09:00
Yu Watanabe 1e4f1671c2 nspawn: fix warning by -Wnonnull (#8877) 2018-05-02 10:03:31 +02:00
Lennart Poettering 8e766630f0 tree-wide: drop redundant _cleanup_ macros (#8810)
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.

In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.

Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.

Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.

Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
2018-04-25 12:31:45 +02:00
Lennart Poettering 0c300adfa4 nspawn: when running nspawn, set a $PATH including both bin + sbin by default (#8756)
We don't know what the container payload needs, hence default to a PATH
with both bin and sbin included, as well as / and /usr.

Follow-up for #8324

Fixes: #8698
2018-04-20 11:36:25 +02:00
Lennart Poettering 5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Philip Sequeira 7511655807 nspawn: wait for network namespace creation before interface setup (#8633)
Otherwise, network interfaces can be "moved" into the container's
namespace while it's still the same as the host namespace, in which case
e.g. host0 for a veth ends up on the host side instead of inside the
container.

Regression introduced in 0441378080.

Fixes #8599.
2018-04-05 07:04:27 -07:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Lennart Poettering 4526113f57 dissect: add dissect_image_and_warn() that unifies error message generation for dissect_image() (#8517) 2018-03-21 12:10:01 +01:00
Zbigniew Jędrzejewski-Szmek 0441378080 nspawn: move network namespace creation to a separate step (#8430)
Fixes #8427.

Unsharing the namespace in a separate step changes the ownership of
/proc/net/ip_tables_names (and related files) from nobody:nobody to
root:root. See [1] and [2] for all the details.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f13f2aeed154da8e48f90b85e720f8ba39b1e881
[2] https://bugzilla.netfilter.org/show_bug.cgi?id=1064#c9
2018-03-20 18:07:17 +01:00
Lennart Poettering 2b33ab0957 tree-wide: port various places over to use new rearrange_stdio() 2018-03-02 11:42:10 +01:00
Zbigniew Jędrzejewski-Szmek 8405dcf752 nspawn: make sure we don't leak the fd in chase_symlinks_and_update
No callers use CHASE_OPEN right now, but let's be defensive.
2018-02-15 10:18:25 +01:00
Lennart Poettering d72495759b tree-wide: port all code to use safe_getcwd() 2018-01-17 11:17:38 +01:00
Lennart Poettering 75152a4d6a tree-wide: install matches asynchronously
Let's remove a number of synchronization points from our service
startups: let's drop synchronous match installation, and let's opt for
asynchronous instead.

Also, let's use sd_bus_match_signal() instead of sd_bus_add_match()
where we can.
2018-01-05 13:58:32 +01:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Zbigniew Jędrzejewski-Szmek dae8b82eb9 Add mkdir_errno_wrapper() and use instead of mkdir() in various places
We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir
returns the error in errno while mkdir_label returns the error directly.
2017-12-16 13:28:22 +01:00
Zbigniew Jędrzejewski-Szmek bdd2bbc445
Merge pull request #7469 from kinvolk/dongsu/nspawn-netns
nspawn: introduce an option for specifying network namespace path
2017-12-14 22:47:57 +01:00
Lennart Poettering fbd0b64f44
tree-wide: make use of new STRLEN() macro everywhere (#7639)
Let's employ coccinelle to do this for us.

Follow-up for #7625.
2017-12-14 19:02:29 +01:00
Dongsu Park d7bea6b629 nspawn: introduce an option for specifying network namespace path
Add a new option `--network-namespace-path` to systemd-nspawn to allow
users to specify an arbitrary network namespace, e.g. `/run/netns/foo`.
Then systemd-nspawn will open the netns file, pass the fd to
outer_child, and enter the namespace represented by the fd before
running inner_child.

```
$ sudo ip netns add foo
$ mount | grep /run/netns/foo
nsfs on /run/netns/foo type nsfs (rw)
...
$ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \
  /bin/readlink -f /proc/self/ns/net
/proc/1/ns/net:[4026532009]
```

Note that the option `--network-namespace-path=` cannot be used together
with other network-related options such as `--private-network` so that
the options do not conflict with each other.

Fixes https://github.com/systemd/systemd/issues/7361
2017-12-13 10:21:06 +00:00
Lennart Poettering fba868fa71 tree-wide: unify logging of "Must be root" message
Let's unify this in one call, generalizing must_be_root() from
bootctl.c.
2017-12-11 23:19:45 +01:00
Lennart Poettering 8fd010bb1b nspawn: turn on watchdog logic for nspawn too
It's a long-running daemon, and it's easy to enable, hence do it.
2017-12-07 12:34:46 +01:00
Lennart Poettering 87d5e4f286 build-sys: make the dynamic UID range, and the container UID range configurable
Also, export these ranges in our pkg-config files.
2017-12-06 12:55:37 +01:00
Lennart Poettering de54e02d5e nspawn: when in hybrid mode, chown() both the legacy and the unified hierarchy to the root in the container
If user namespacing is used, let's make sure that the root user in the
container gets access to both /sys/fs/cgroup/systemd and
/sys/fs/cgroup/unified.

This matches similar logic in cg_set_access().
2017-12-05 13:49:13 +01:00
Lennart Poettering 2d3a5a73e0 nspawn: make sure images containing an ESP are compatible with userns -U mode
In -U mode we might need to re-chown() all files and directories to
match the UID shift we want for the image. That's problematic on fat
partitions, such as the ESP (and which is generated by mkosi's
--bootable switch), because fat of course knows no UID/GID file
ownership natively.

With this change we take benefit of the uid= and gid= mount options FAT
knows: instead of chown()ing all files and directories we can just
specify the right UID/GID to use at mount time.

This beefs up the image dissection logic in two ways:

1. First of all support for mounting relevant file systems with
   uid=/gid= is added: when a UID is specified during mount it is used for
   all applicable file systems.

2. Secondly, two new mount flags are added:
   DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY.
   If one is specified the mount routine will either only mount the root
   partition of an image, or all partitions except the root partition.
   This is used by nspawn: first the root partition is mounted, so that
   we can determine the UID shift in use so far, based on ownership of
   the image's root directory. Then, we mount the remaining partitions
   in a second go, this time with the right UID/GID information.
2017-12-05 13:49:12 +01:00
Lennart Poettering 8199d554c1 nspawn: figure out cgroup mode *after* mounting image
If we operate on a disk image (i.e. --image=) then it's pointless to
look into the mount directory before it is actually mounted to see which
systemd version is running inside...

Unfortunately we only mount the disk image in the child process, but the
parent needs to know the cgroup mode, hence add some IPC for this
purpose and communicate the cgroup mode determined from the image back
to the parent.
2017-12-05 13:49:12 +01:00
Yu Watanabe 62b1e758d3 nspawn: adjust path to static resolv.conf to support split usr
Fixes #7302.
2017-11-25 21:11:07 +09:00
Lennart Poettering d381c8a6bf nspawn: hash the machine name, when looking for a suitable UID base (#7437)
When "-U" is used we look for a UID range we can use for our container.
We start with the UID the tree is already assigned to, and if that
didn't work we'd pick random ranges so far. With this change we'll first
try to hash a suitable range from the container name, and use that if it
works, in order to make UID assignments more likely to be stable.

This follows a similar logic PID 1 follows when using DynamicUser=1.
2017-11-24 20:57:19 +01:00
Lennart Poettering abdb9b08f6 nspawn: make use of the RequestStop logic of scope units
Since time began, scope units had a concept of "Controllers", a bus peer
that would be notified when somebody requested a unit to stop. None of
our code used that facility so far, let's change that.

This way, nspawn can print a nice message when somebody invokes
"systemctl stop" on the container's scope unit, and then react with the
right action to shut it down.
2017-11-23 21:47:48 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering 3603efdea5 nspawn: make recursive chown()ing logic safe for being aborted in the middle
We currently use the ownership of the top-level directory as a hint
whether we need to descent into the whole tree to chown() it recursively
or not. This is problematic with the previous chown()ing algorithm, as
when descending into the tree we'd first chown() and then descend
further down, which meant that the top-level directory would be chowned
first, and an aborted recursive chowning would appear on the next
invocation as successful, even though it was not. Let's reshuffle things
a bit, to make the re-chown()ing safe regarding interruptions:

a) We chown() the dir we are looking at last, and descent into all its
   children first. That way we know that if the top-level dir is
   properly owned everything inside of it is properly owned too.

b) Before starting a chown()ing operation, we mark the top-level
   directory as owned by a special "busy" UID range, which we can use to
   recognize whether a tree was fully chowned: if it is marked as busy,
   it's definitely not fully chowned, as the busy ownership will only be
   fixed as final step of the chowning.

Fixes: #6292
2017-11-17 11:12:33 +01:00
Lennart Poettering 0986658d51
Merge pull request #6866 from sourcejedi/set-linger2
logind: fix `loginctl enable-linger`
2017-11-15 11:15:15 +01:00
Lennart Poettering 759aaedc5c dissect: when we invoke dissection on a loop device with partscan help the user
This adds some simply detection logic for cases where dissection is
invoked on an externally created loop device, and partitions have been
detected on it, but partition scanning so far was off. If this is
detected we now print a brief message indicating what the issue is,
instead of failing with a useless EINVAL message the kernel passed to
us.
2017-10-26 17:54:56 +02:00
Lennart Poettering eb38edce88 machine-image: add partial discovery of block devices as images
This adds some basic discovery of block device images for nspawn and
friends. Note that this doesn't add searching for block devices using
udev, but instead expects users to symlink relevant block devices into
/var/lib/machines. Discovery is hence done exactly like for
dir/subvol/raw file images, except that what is found may be a (symlink
to) a block device.

For now, we do not support cloning these images, but removal, renaming
and read-only flags are supported to the point where that makes sense.

Fixe: #6990
2017-10-26 17:54:56 +02:00
Alan Jenkins 8d9c2bca41 nspawn: comment to acknowledge lying about "user session" 2017-10-18 09:47:10 +01:00
Zbigniew Jędrzejewski-Szmek 349cc4a507 build-sys: use #if Y instead of #ifdef Y everywhere
The advantage is that is the name is mispellt, cpp will warn us.

$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build

squash! build-sys: use #if Y instead of #ifdef Y everywhere

v2:
- fix incorrect setting of HAVE_LIBIDN2
2017-10-04 12:09:29 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
Lennart Poettering 8e5430c4bd nspawn: set up a new session keyring for the container process
keyring material should not leak into the container. So far we relied on
seccomp to deny access to the keyring, but given that we now made the
seccomp configurable, and access to keyctl() and friends may optionally
be permitted to containers now let's make sure we disconnect the callers
keyring from the keyring of PID 1 in the container.
2017-09-22 15:28:04 +02:00
Lennart Poettering 960e4569e1 nspawn: implement configurable syscall whitelisting/blacklisting
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.

This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.

(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)

Fixes: #5163

Replaces: #5944
2017-09-12 14:06:21 +02:00
Lennart Poettering 21022b9dde util-lib: wrap personality() to fix up broken glibc error handling (#6766)
glibc appears to propagate different errors in different ways, let's fix
this up, so that our own code doesn't get confused by this.

See #6752 + #6737 for details.

Fixes: #6755
2017-09-08 17:16:29 +03:00
Lennart Poettering 8cb5743079 nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416)
Given that we set NOTIFY_SOCKET unconditionally it's not surprising that
processes way down the process tree think it's smart to send us a
notification message.

It's still useful to keep this message, for debugging things, but it
shouldn't be generated by default.
2017-07-20 14:46:58 -04:00
Lennart Poettering cd2dfc6fae nspawn: register a scope for the unit if --register=no is specified (#6166)
Previously, only when --register=yes was set (the default) the invoked
container would get its own scope, created by machined on behalf of
nspawn. With this change if --register=no is set nspawn will still get
its own scope (which is a good thing, so that --slice= and --property=
take effect), but this is not done through machined but by registering a
scope unit directly in PID 1.

Summary:

--register=yes             → allocate a new scope through machined (the default)
--register=yes --keep-unit → use the unit we are already running in an register with machined
--register=no              → allocate a new scope directly, but no machined
--register=no --keep-unit  → do not allocate nor register anything

Fixes: #5823
2017-06-28 13:22:46 -04:00
Zbigniew Jędrzejewski-Szmek 35bca925f9 tree-wide: fix incorrect uses of %m
In those cases errno was not set, so we would be logging some unrelated error
or "Success".
2017-05-13 15:42:26 -04:00
Zbigniew Jędrzejewski-Szmek ab8ee0f259 tree-wide: use SET_FLAG in more places (#5892) 2017-05-07 07:03:28 -04:00
Zbigniew Jędrzejewski-Szmek 399e391fa6 nspawn: check cgroups after parsing options
Same justification as in previous commit.
2017-04-25 08:54:00 -04:00
Lennart Poettering 948a3241de Merge pull request #5708 from vcatechnology/arm-cross-compile
ARM32 cross-compile fixes
2017-04-17 15:49:06 +02:00
Matt Clarkson 6b5cf3ea62 build-sys: correct blkid.h includes
When using pkg-config to determine the include flags for blkid the
flags are returned as:

    $ pkg-config blkid --cflags
    -I/usr/include/blkid -I/usr/include/uuid

We use the <blkid/blkid.h> include which would be correct when using
the default compiler /usr/include header search path. However, when
cross-compiling the blkid.h will not be installed at /usr/include and
highly likely in a temporary system root. It is futher compounded if
the cross-compile packages are split up and the blkid package is not
available in the same sysroot as the compiler.

Regardless of the compilation setup, the correct include path should be
<blkid.h> if using the pkg-config returned CFLAGS.
2017-04-06 14:33:02 +01:00
David Michael 7357272ed1 nspawn: check if the DNS stub is listening for requests 2017-03-31 11:34:32 -07:00
Zbigniew Jędrzejewski-Szmek 78e4f19ebc Merge pull request #5444 from poettering/cgroups-revert-no-error
Revert "core: simplify cg_[all_]unified()" and more.
2017-02-24 18:48:57 -05:00
AsciiWolf 13e785f7a0 Fix missing space in comments (#5439) 2017-02-24 18:14:02 +01:00
Lennart Poettering c22800e40e cgroup: rename cg_unified() → cg_unified_controller()
cg_unified() is a bit generic a name, let's make clear that it checks
whether a specified controller is in unified mode.
2017-02-24 18:00:04 +01:00
Lennart Poettering b4cccbc13a cgroup: change cg_unified() to possibly return errors again
We use our cgroup APIs in various contexts, including from our libraries
sd-login, sd-bus. As we don#t control those environments we can't rely
that the unified cgroup setup logic succeeds, and hence really shouldn't
assert on it.

This more or less reverts 415fc41cea.
2017-02-24 17:52:58 +01:00
Tejun Heo 2977724b09 core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd hierarchy
Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1
name=systemd hierarchy.  While this works fine for systemd itself, it breaks
tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd.

This patch updates the hybrid mode so that it mounts v2 hierarchy on
/sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on
/sys/fs/cgroup/systemd for compatibility.  systemd itself doesn't depend on the
"name=systemd" hierarchy at all.  All operations take place on the v2 hierarchy
as before but the v1 hierarchy is kept in sync so that any tools which expect
it to be there can keep doing so.  This allows systemd to take advantage of
cgroup v2 process management without requiring other tools to be aware of the
hybrid mode.

The hybrid mode is implemented by mapping the special systemd controller to
/sys/fs/cgroup/unified and making the basic cgroup utility operations -
cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the
/sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated.

While a bit messy, this will allow dropping complications from using cgroup v1
for process management a lot sooner than otherwise possible which should make
it a net gain in terms of maintainability.

v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount
    point to /sys/fs/cgroup/unified as suggested by @brauner.

v3: chown the compat hierarchy too on delegation.  Suggested by @evverx.

v4: [zj]
- drop the change to default, full "legacy" is still the default.
2017-02-20 12:28:35 -05:00
Tejun Heo 415fc41cea core: simplify cg_[all_]unified()
cg_[all_]unified() test whether a specific controller or all controllers are on
the unified hierarchy.  While what's being asked is a simple binary question,
the callers must assume that the functions may fail any time, which
unnecessarily complicates their usages.  This complication is unnecessary.
Internally, the test result is cached anyway and there are only a few places
where the test actually needs to be performed.

This patch simplifies cg_[all_]unified().

* cg_[all_]unified() are updated to return bool.  If the result can't be
  decided, assertion failure is triggered.  Error handlings from their callers
  are dropped.

* cg_unified_flush() is updated to calculate the new result synchrnously and
  return whether it succeeded or not.  Places which need to flush the test
  result are updated to test for failure.  This ensures that all the following
  cg_[all_]unified() tests succeed.

* Places which expected possible cg_[all_]unified() failures are updated to
  call and test cg_unified_flush() before calling cg_[all_]unified().  This
  includes functions used while setting up mounts during boot and
  manager_setup_cgroup().
2017-02-18 17:51:13 -05:00
Tejun Heo bd15ab41a1 nspawn: fix cgroup mode detection
cgroup mode detection is broken in two different ways.

* detect_unified_cgroup_hierarchy() is called too nested in outer_child().
  sync_cgroup() which is used by run() also needs to know the requested cgroup
  mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN.  This makes it
  skip syncing the inner cgroup hierarchy on some config combinations.

   $ cat /proc/self/cgroup | grep systemd
   1:name=systemd:/user.slice/user-0.slice/session-c1.scope

   $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   ...
   [root@container ~]# cat /proc/self/cgroup | grep systemd
   1:name=systemd:/machine.slice/machine-container.x86_64.scope
   $ exit

   $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   [root@container ~]# cat /proc/self/cgroup | grep 0::
   0::/
   $ exit

  Note how the unified hierarchy case's path is not synchronized with the host.
  This for example can cause issues when there are multiple such containers.

  Fixed by moving detect_unified_cgroup_hierarchy() invocation to main().

* inner_child() was invoking cg_unified_flush().  inner_child() executes fully
  scoped and can't determine which cgroup mode the host was in.  It doesn't
  make sense to keep flushing the detected mode when the host mode can't
  change.

  Fixed by replacing cg_unified_flush() invocations in outer_child() and
  inner_child() with one in main().
2017-02-18 17:49:06 -05:00
Zbigniew Jędrzejewski-Szmek 581a07f9f0 Merge pull request #5369 from poettering/nspawn-resolved
fixes for running nspawn+resolved in combination
2017-02-18 11:54:34 -05:00
Lennart Poettering b053cd5f8e nspawn: tweak check whether resolved is around a bit
Let's check D-Bus instead of files in /run to see if resolved is
running. This is a bit nicer as bus names are automatically cleaned up
when resolved dies, which is not the case for files in /run.

See: #4649
2017-02-17 16:06:31 -05:00
Lennart Poettering 1c876927e4 copy: change the various copy_xyz() calls to take a unified flags parameter
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
2017-02-17 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek fc6149a6ce Merge pull request #4962 from poettering/root-directory-2
Add new MountAPIVFS= boolean unit file setting + RootImage=
2017-02-08 23:05:05 -05:00
Philip Withnall b53ede699c nspawn: Add support for sysroot pivoting (#5258)
Add a new --pivot-root argument to systemd-nspawn, which specifies a
directory to pivot to / inside the container; while the original / is
pivoted to another specified directory (if provided). This adds
support for booting container images which may contain several bootable
sysroots, as is common with OSTree disk images. When these disk images
are booted on real hardware, ostree-prepare-root is run in conjunction
with sysroot.mount in the initramfs to achieve the same results.
2017-02-08 16:54:31 +01:00
Lennart Poettering 78ebe98061 core,nspawn,dissect: make nspawn's .roothash file search reusable
This makes nspawn's logic of automatically discovering the root hash of
an image file generic, and then reuses it in systemd-dissect and in
PID1's RootImage= logic, so that verity is automatically set up whenever
we can.
2017-02-07 12:21:28 +01:00
Lennart Poettering ced58da749 nspawn: shown exec() command is misleading
There's no point in updating exec_target for each binary we try to
execute, if we override it right-away anyway... Let's just do this once,
and include all binaries we try each time.

Follow-up for 1a68e1e543.
2017-02-02 20:10:28 +01:00
Philip Withnall 1a68e1e543 nspawn: Print attempted execv() path on failure (#5199)
The failure message is typically currently:
   execv() failed: No such file or directory
which is not very useful because it doesn’t tell you which file or
directory it was trying to exec.
2017-02-01 08:36:16 -05:00
Zbigniew Jędrzejewski-Szmek ec251fe7d5 tree-wide: adjust fall through comments so that gcc is happy
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.

(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
2017-01-31 14:04:55 -05:00
Zbigniew Jędrzejewski-Szmek 9ce6d1b319 nspawn: fix clobbering of selinux context arg
First bug fixed by gcc 7. Yikes.
2017-01-31 14:04:55 -05:00
Evgeny Vereshchagin adc7d9f0da nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
Fixes #4944
2017-01-17 08:40:05 +00:00
Zbigniew Jędrzejewski-Szmek e0489532fd nspawn: fix memleak
CID #1368262: fn is allocated with new, so it should be freed.
2017-01-15 16:57:57 -05:00
Zbigniew Jędrzejewski-Szmek 6b3d378331 Merge pull request #4879 from poettering/systemd 2017-01-14 21:29:27 -05:00
Lennart Poettering 8dbf71ec58 nspawn: reword notice when /dev is pre-mounted and populated (#4971)
Fixes: #4676
2016-12-29 11:02:39 +01:00
Lennart Poettering 87447ae459 nspawn: tweaks to /etc/resolv.conf management
Handle properly if /etc is a symlink (i.e. make sure we don't follow the
symlink outside the image). Also follow /etc/resolv.conf if it is a
symlink, and use the resolved path when creating a mount point and
mounting (as both of these operations follow symlinks and rally
shouldn't).

Handle more types of read-only errors as debug-level issues.
2016-12-21 19:09:32 +01:00
Lennart Poettering 8ccf7e9e96 nspawn: don't complain when we can't fix the timezone of read-only containers
There's nothing we can do about it, hence don't complain.
2016-12-21 19:09:32 +01:00
Lennart Poettering e0f9e7bd03 dissect: make using a generic partition as root partition optional
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.

In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
2016-12-21 19:09:30 +01:00
Lennart Poettering 4ad14eff19 nspawn: restore --volatile=yes support
This was broken by 19caffac75 which remounted the
root directory to MS_SHARED before applying the volatile mount logic. This
broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we
need MS_MOVE in the volatile mount logic to rearrange the directory tree.
Simply swap the order here, apply the volatile logic before we switch to
MS_SHARED.
2016-12-21 19:09:28 +01:00
Evgeny Vereshchagin 5773024d7f nspawn: unref the notify event source (#4941)
Fixes:
```
sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b
...
==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15
==21224==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==21224==    by 0x4F6F565: sd_event_new (sd-event.c:431)
==21224==    by 0x1210BE: run (nspawn.c:3351)
==21224==    by 0x123908: main (nspawn.c:3826)
==21224==
==21224== LEAK SUMMARY:
==21224==    definitely lost: 656 bytes in 1 blocks
==21224==    indirectly lost: 1,788 bytes in 11 blocks
==21224==      possibly lost: 0 bytes in 0 blocks
==21224==    still reachable: 8,344 bytes in 3 blocks
==21224==         suppressed: 0 bytes in 0 blocks
```
Closes #4934
2016-12-21 18:36:15 +01:00
Lennart Poettering 9b6deb03fc dissect: optionally, only look for GPT partition tables, nothing else
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
2016-12-20 20:00:09 +01:00
Lennart Poettering 75bf701f5c nspawn: flush out environment block of the -a stub init process
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.

Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
2016-12-14 18:29:30 +01:00
Andrey Ulanov 6916b16464 nspawn: when getting SIGCHLD make sure it's from the first child (#4855)
When getting SIGCHLD we should not assume that it was the first
child forked from system-nspawn that has died as it may also be coming
from an orphan process. This change adds a signal handler that ignores
SIGCHLD unless it came from the first containerized child - the real
child.

Before this change the problem can be reproduced as follows:

$ sudo systemd-nspawn --directory=/container-root --share-system
Press ^] three times within 1s to kill container.
[root@andreyu-coreos ~]# { true & } &
[1] 22201
[root@andreyu-coreos ~]#
Container root-fedora-latest terminated by signal KILL
2016-12-13 02:38:18 +01:00
Zbigniew Jędrzejewski-Szmek 4a5567d5d6 Merge pull request #4795 from poettering/dissect
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10 01:08:13 -05:00
Wim de With 2e1f244efd nspawn: add missing -E to getopt_long (#4860) 2016-12-10 07:33:58 +03:00
Franck Bui 5367354dae nspawn: resolv.conf might not be created initially (#4799)
This might happen that resolv.conf is missing in a minimal rootfs and in this
case the following warning is emitted:

 Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory

This patch fixes this case.
2016-12-07 21:36:39 +01:00
Lennart Poettering 4623e8e6ac nspawn/dissect: automatically discover dm-verity verity partitions
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.

Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.

mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:

```
 # mkdir test
 # cd test
 # mkosi --verity
 # systemd-nspawn -i ./image.raw -bn
```

Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07 18:38:41 +01:00
Lennart Poettering 4827ab4854 nspawn: when generating a machine name from an image name, truncate .raw suffix
Let's prettify the machine name we generate for image-based containers: let's
chop off the .raw suffix before using it as machine name.
2016-12-07 18:38:41 +01:00
Lennart Poettering 18b5886e56 dissect: add support for encrypted images
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.

It's pretty basic: only support for passphrase-based encryption.

In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.
2016-12-07 18:38:41 +01:00
Lennart Poettering 2d8457851b nspawn: port nspawn to new generalized image dissection code
Let's make use of the new internal API. This mostly doesn't change anything for
the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new
code can handle disk images with no partition tables, and make any detected
images directly the root.
2016-12-07 18:38:40 +01:00
Lennart Poettering cb638b5e96 util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENT
As suggested by @keszybz
2016-12-01 12:49:55 +01:00
Lennart Poettering 86c0dd4a71 nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.

This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
2016-12-01 12:41:18 +01:00
Lennart Poettering e28c7cd066 tree-wide: set SA_RESTART for signal handlers we install
We already set it in most cases, but make sure to set it in all others too, and
document that that's a good idea.
2016-12-01 12:41:17 +01:00
Lennart Poettering ad85779a50 nspawn: split out overlayfs argument parsing into a function of its own
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and
bind_mount_parse().
2016-12-01 00:25:51 +01:00
Lennart Poettering 8d4aa2bb32 nspawn: make use of CHASE_NON_EXISTING when locking image
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
2016-12-01 00:25:51 +01:00
Lennart Poettering c4f4fce79e fs-util: add flags parameter to chase_symlinks()
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
2016-12-01 00:25:51 +01:00
Lennart Poettering 8cd328d82e nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory=
As suggested in PR #3667.

This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.

This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.

Fixes: #3667
2016-12-01 00:25:51 +01:00
Lennart Poettering 3f342ec4b0 nspawn: properly handle image/directory paths that are symlinks
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.

Fixes: #2001
2016-12-01 00:25:51 +01:00
Lennart Poettering e187369587 tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
2016-12-01 00:25:51 +01:00
Lennart Poettering 17cbb288fa nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.

This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.

Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
2016-11-22 13:35:09 +01:00
Lennart Poettering c67b008273 nspawn: remove temporary root directory on exit
When mountint a loopback image, we need a temporary root directory we can mount
stuff to. Make sure to actually remove it when exiting, so that we don't leave
stuff around in /tmp unnecessarily.

See: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering 6a0f896b97 nspawn: try to wait for the container PID 1 to exit, before we exit
Let's make the shutdown logic synchronous, so that there's a better chance to
detach the loopback device after use.
2016-11-22 13:35:09 +01:00
Lennart Poettering 0f3be6ca4d nspawn: support ephemeral boots from images
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.

As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.

Fixes: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering f4ff4aa800 Merge pull request #4395 from s-urbaniak/rw-support
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18 12:36:46 +01:00
Sergiusz Urbaniak 4f086aab52
nspawn: R/W support for /sys, and /proc/sys
This commit adds the possibility to leave /sys, and /proc/sys read-write.
It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE
to enable this feature.

If set to "yes", /sys, and /proc/sys will be read-write.
If set to "no", /sys, and /proc/sys will be read-only.
If set to "network" /proc/sys/net will be read-write. This is useful in
use-cases, where systemd-nspawn is used in an external network
namespace.

This adds the possibility to start privileged containers which need more
control over settings in the /proc, and /sys filesystem.

This is also a follow-up on the discussion from
https://github.com/systemd/systemd/pull/4018#r76971862 where an
introduction of a simple env var to enable R/W support for those
directories was already discussed.
2016-11-18 09:50:40 +01:00
Zbigniew Jędrzejewski-Szmek 2a49b6120f nspawn: restart the whole systemd-nspawn@.service unit on container reboot (#4613)
Since 133 is now used in a few places, add a #define for it.
Also make the status message a bit informative.

Another issue introduced in b006762. The logic was borked, we were supposed
to return 0 to break the loop, and 133 to restart the container, not the other
way around.

But this doesn't seem to work, reboot fails with:
Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists
So actually the version before this patch worked better, since 133 > 0 and we'd
at least loop internally.
2016-11-14 11:49:49 +01:00
Christian Hesse 7debb05dbe nspawn: fix condition for mounting resolv.conf (#4622)
The file /usr/lib/systemd/resolv.conf can be stale, it does not tell us
whether or not systemd-resolved is running or not.
So check for /run/systemd/resolve/resolv.conf as well, which is created
at runtime and hence is a better indication.
2016-11-08 22:01:26 -05:00
Zbigniew Jędrzejewski-Szmek a809cee582 Merge pull request #4612 from keszybz/format-strings
Format string tweaks (and a small fix on 32bit)
2016-11-08 08:09:40 -05:00
Martin Pitt cfed63f60d nspawn: fix exit code for --help and --version (#4609)
Commit b006762 inverted the initial exit code which is relevant for --help and
--version without a particular reason.  For these special options, parse_argv()
returns 0 so that our main() immediately skips to the end without adjusting
"ret". Otherwise, if an actual container is being started, ret is set on error
in run(), which still provides the "non-zero exit on error" behaviour.

Fixes #4605.
2016-11-07 23:31:55 -05:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Lennart Poettering 2bce2acce8 nspawn: if we set up a loopback device, try to mount it with "discard"
Let's make sure that our loopback files remain sparse, hence let's set
"discard" as mount option on file systems that support it if the backing device
is a loopback.
2016-11-02 11:39:49 -06:00
Evgeny Vereshchagin 6d66bd3b2a nspawn: become a new root early
036d523641

> vfs: Don't create inodes with a uid or gid unknown to the vfs
  It is expected that filesystems can not represent uids and gids from
  outside of their user namespace.  Keep things simple by not even
  trying to create filesystem nodes with non-sense uids and gids.

So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955

$ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target

Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide.
Press ^] three times within 1s to kill container.
Child died too early.
Selected user namespace base 1073283072 and range 65536.
Failed to mount to /sys/fs/cgroup/systemd: No such file or directory

Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519
Fixes: #4352
2016-10-23 23:23:42 -04:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 24597ee0e6 nspawn, NEWS: add missing "s" in --private-users-chown (#4438) 2016-10-21 06:03:26 +03:00
Evgeny Vereshchagin f0bef277a4 nspawn: cleanup and chown the synced cgroup hierarchy (#4223)
Fixes: #4181
2016-10-13 09:50:46 -04:00
Zbigniew Jędrzejewski-Szmek 60e76d4897 nspawn,mount-util: add [u]mount_verbose and use it in nspawn
This makes it easier to debug failed nspawn invocations:

Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11 16:50:07 -04:00
Zbigniew Jędrzejewski-Szmek ada5412039 nspawn: simplify arg_us_cgns passing
We would check the condition cg_ns_supported() twice. No functional
change.
2016-10-11 16:46:58 -04:00
Lennart Poettering 6dca2fe325 Merge pull request #4332 from keszybz/nspawn-arguments-3
nspawn --private-users parsing, v2
2016-10-10 19:51:51 +02:00
Evgeny Vereshchagin a0f72a24e0 Merge pull request #4310 from keszybz/nspawn-autodetect
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10 20:47:25 +03:00
Zbigniew Jędrzejewski-Szmek be7157316c nspawn: better error messages for parsing errors
In particular, the check for arg_uid_range <= 0 is moved to the end, so that
"foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek ae209204d8 nspawn,man: fix parsing of numeric args for --private-users, accept any boolean
This is like the previous reverted commit, but any boolean is still accepted,
not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek 6c2058b35e Revert "nspawn: fix parsing of numeric arguments for --private-users"
This reverts commit bfd292ec35.
2016-10-10 11:17:40 -04:00
Zbigniew Jędrzejewski-Szmek bfd292ec35 nspawn: fix parsing of numeric arguments for --private-users
The documentation says lists "yes", "no", "pick", and numeric arguments.
But parse_boolean was attempted first, so various numeric arguments were
misinterpreted.

In particular, this fixes --private-users=0 to mean the same thing as
--private-users=0:65536.

While at it, use strndupa to avoid some error handling.
Also give a better error for an empty UID range. I think it's likely that
people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09 11:52:35 -04:00
Zbigniew Jędrzejewski-Szmek 27eb8e9028 nspawn: reindent table 2016-10-09 11:51:18 -04:00
Zbigniew Jędrzejewski-Szmek a8725a06e6 nspawn: also fall back to legacy cgroup hierarchy for old containers
Current systemd version detection routine cannot detect systemd 230,
only systmed >= 231. This means that we'll still use the legacy hierarchy
in some cases where we wouldn't have too. If somebody figures out a nice
way to detect systemd 230 this can be later improved.
2016-10-08 19:03:53 -04:00
Zbigniew Jędrzejewski-Szmek 0fd9563fde nspawn: use mixed cgroup hierarchy only when container has new systemd
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy.
So make an educated guess, and use the mixed hierarchy in that case.

Tested by running the host with mixed hierarchy (i.e. simply using a recent
kernel with systemd from git), and booting first a container with older systemd,
and then one with a newer systemd.

Fixes #4008.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 27e29a1e43 nspawn: fix spurious reboot if container process returns 133 2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek b006762524 nspawn: move the main loop body out to a new function
The new function has 416 lines by itself!

"return log_error_errno" is used to nicely reduce the volume of error
handling code.

A few minor issues are fixed on the way:
- positive value was used as error value (EIO), causing systemd-nspawn
  to return success, even though it shouldn't.
- In two places random values were used as error status, when the
  actual value was in an unusual place (etc_password_lock, notify_socket).

Those are the only functional changes.

There is another potential issue, which is marked with a comment, and left
unresolved: the container can also return 133 by itself, causing a spurious
reboot.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 98afd6af3a nspawn: check env var first, detect second
If we are going to use the env var to override the detection result
anyway, there is not point in doing the detection, especially that
it can fail.
2016-10-08 14:48:41 -04:00
Lennart Poettering 7429b2eb83 tree-wide: drop some misleading compiler warnings
gcc at some optimization levels thinks thes variables were used without
initialization. it's wrong, but let's make the message go anyway.
2016-10-06 19:04:10 +02:00
Djalal Harouni 41eb436265 nspawn: add log message to let users know that nspawn needs an empty /dev directory (#4226)
Fixes https://github.com/systemd/systemd/issues/3695

At the same time it adds a protection against userns chown of inodes of
a shared mount point.
2016-10-05 06:57:02 +02:00
Alban Crequy 19caffac75 nspawn: set shared propagation mode for the container 2016-10-03 14:19:27 +02:00