Commit graph

815 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 8405dcf752 nspawn: make sure we don't leak the fd in chase_symlinks_and_update
No callers use CHASE_OPEN right now, but let's be defensive.
2018-02-15 10:18:25 +01:00
Lennart Poettering d72495759b tree-wide: port all code to use safe_getcwd() 2018-01-17 11:17:38 +01:00
Lennart Poettering dccca82b1a log: minimize includes in log.h
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.

Let's hence drop inclusion of:

1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
   declaration
4. process-util.h which was needed for getpid_cached() which we now hide
   in a funciton log_emergency_level() instead, which nicely abstracts
   the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
   forward declaration suffices for that too.

Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.

(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
2018-01-11 14:44:31 +01:00
Lennart Poettering 75152a4d6a tree-wide: install matches asynchronously
Let's remove a number of synchronization points from our service
startups: let's drop synchronous match installation, and let's opt for
asynchronous instead.

Also, let's use sd_bus_match_signal() instead of sd_bus_add_match()
where we can.
2018-01-05 13:58:32 +01:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Lennart Poettering b6e1fff13d process-util: add another fork_safe() flag for enabling LOG_ERR/LOG_WARN logging 2018-01-04 13:27:26 +01:00
Lennart Poettering 4c253ed1ca tree-wide: introduce new safe_fork() helper and port everything over
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:

1. Optionally resets signal handlers/mask in the child

2. Sets a name on all processes we fork off right after forking off (and
   the patch assigns useful names for all processes we fork off now,
   following a systematic naming scheme: always enclosed in () – in order
   to indicate that these are not proper, exec()ed processes, but only
   forked off children, and if the process is long-running with only our
   own code, without execve()'ing something else, it gets am "sd-" prefix.)

3. Optionally closes all file descriptors in the child

4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
   way so that the parent dying before this happens being handled
   safely.

5. Optionally reopens the logs

6. Optionally connects stdin/stdout/stderr to /dev/null

7. Debug logs about the forked off processes.
2017-12-25 11:48:21 +01:00
Lennart Poettering ebe6ff658d
Merge pull request #7663 from keszybz/mkdir-return-value
util-lib: fix return value in mkdir_parents()
2017-12-24 11:59:58 +01:00
Yu Watanabe 89ada3ba08 bus-unit-util: add socket unit related options
Also, split bus_append_unit_property_assignment().
2017-12-23 18:48:16 +09:00
Henrik Grindal Bakken cacc0d7a78 nspawn: Include missing.h 2017-12-18 14:15:17 +01:00
Zbigniew Jędrzejewski-Szmek dae8b82eb9 Add mkdir_errno_wrapper() and use instead of mkdir() in various places
We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir
returns the error in errno while mkdir_label returns the error directly.
2017-12-16 13:28:22 +01:00
Zbigniew Jędrzejewski-Szmek bdd2bbc445
Merge pull request #7469 from kinvolk/dongsu/nspawn-netns
nspawn: introduce an option for specifying network namespace path
2017-12-14 22:47:57 +01:00
Lennart Poettering fbd0b64f44
tree-wide: make use of new STRLEN() macro everywhere (#7639)
Let's employ coccinelle to do this for us.

Follow-up for #7625.
2017-12-14 19:02:29 +01:00
Dongsu Park d7bea6b629 nspawn: introduce an option for specifying network namespace path
Add a new option `--network-namespace-path` to systemd-nspawn to allow
users to specify an arbitrary network namespace, e.g. `/run/netns/foo`.
Then systemd-nspawn will open the netns file, pass the fd to
outer_child, and enter the namespace represented by the fd before
running inner_child.

```
$ sudo ip netns add foo
$ mount | grep /run/netns/foo
nsfs on /run/netns/foo type nsfs (rw)
...
$ sudo systemd-nspawn -D /srv/fc27 --network-namespace-path=/run/netns/foo \
  /bin/readlink -f /proc/self/ns/net
/proc/1/ns/net:[4026532009]
```

Note that the option `--network-namespace-path=` cannot be used together
with other network-related options such as `--private-network` so that
the options do not conflict with each other.

Fixes https://github.com/systemd/systemd/issues/7361
2017-12-13 10:21:06 +00:00
Lennart Poettering fba868fa71 tree-wide: unify logging of "Must be root" message
Let's unify this in one call, generalizing must_be_root() from
bootctl.c.
2017-12-11 23:19:45 +01:00
Lennart Poettering 8fd010bb1b nspawn: turn on watchdog logic for nspawn too
It's a long-running daemon, and it's easy to enable, hence do it.
2017-12-07 12:34:46 +01:00
Lennart Poettering 87d5e4f286 build-sys: make the dynamic UID range, and the container UID range configurable
Also, export these ranges in our pkg-config files.
2017-12-06 12:55:37 +01:00
Lennart Poettering de54e02d5e nspawn: when in hybrid mode, chown() both the legacy and the unified hierarchy to the root in the container
If user namespacing is used, let's make sure that the root user in the
container gets access to both /sys/fs/cgroup/systemd and
/sys/fs/cgroup/unified.

This matches similar logic in cg_set_access().
2017-12-05 13:49:13 +01:00
Lennart Poettering 2d3a5a73e0 nspawn: make sure images containing an ESP are compatible with userns -U mode
In -U mode we might need to re-chown() all files and directories to
match the UID shift we want for the image. That's problematic on fat
partitions, such as the ESP (and which is generated by mkosi's
--bootable switch), because fat of course knows no UID/GID file
ownership natively.

With this change we take benefit of the uid= and gid= mount options FAT
knows: instead of chown()ing all files and directories we can just
specify the right UID/GID to use at mount time.

This beefs up the image dissection logic in two ways:

1. First of all support for mounting relevant file systems with
   uid=/gid= is added: when a UID is specified during mount it is used for
   all applicable file systems.

2. Secondly, two new mount flags are added:
   DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY.
   If one is specified the mount routine will either only mount the root
   partition of an image, or all partitions except the root partition.
   This is used by nspawn: first the root partition is mounted, so that
   we can determine the UID shift in use so far, based on ownership of
   the image's root directory. Then, we mount the remaining partitions
   in a second go, this time with the right UID/GID information.
2017-12-05 13:49:12 +01:00
Lennart Poettering 1cfdbe293f cgroup: also include "cgroups.threads" in the list of files to chown
Also, add "cgroups.stat". It's read-only anyway, hence its UID/GID
ownership matters little, but it's probably a good idea to keep it
ownership in sync with the other read-only files such as
"cgroups.controllers".

Also, order the list of files alphabetically.
2017-12-05 13:49:12 +01:00
Lennart Poettering 8199d554c1 nspawn: figure out cgroup mode *after* mounting image
If we operate on a disk image (i.e. --image=) then it's pointless to
look into the mount directory before it is actually mounted to see which
systemd version is running inside...

Unfortunately we only mount the disk image in the child process, but the
parent needs to know the cgroup mode, hence add some IPC for this
purpose and communicate the cgroup mode determined from the image back
to the parent.
2017-12-05 13:49:12 +01:00
Zbigniew Jędrzejewski-Szmek 40fd52f28d util-lib: rename path_check_fstype to path_is_fs_type 2017-11-30 20:43:25 +01:00
Yu Watanabe 62b1e758d3 nspawn: adjust path to static resolv.conf to support split usr
Fixes #7302.
2017-11-25 21:11:07 +09:00
Lennart Poettering d381c8a6bf nspawn: hash the machine name, when looking for a suitable UID base (#7437)
When "-U" is used we look for a UID range we can use for our container.
We start with the UID the tree is already assigned to, and if that
didn't work we'd pick random ranges so far. With this change we'll first
try to hash a suitable range from the container name, and use that if it
works, in order to make UID assignments more likely to be stable.

This follows a similar logic PID 1 follows when using DynamicUser=1.
2017-11-24 20:57:19 +01:00
Lennart Poettering a8027a18f1
Merge pull request #7442 from poettering/scope-fixes
some fixes to the scope unit type
2017-11-24 17:15:09 +01:00
Lennart Poettering f170504825
Merge pull request #7453 from neosilky/coccinelle-fixes
Applied fixes from Coccinelle
2017-11-24 13:29:48 +01:00
Daniel Lockyer f9ecfd3bbe Replace free and reassignment with free_and_replace 2017-11-24 10:33:41 +00:00
Daniel Lockyer 87e4e28dcf Replace empty ternary with helper method 2017-11-24 09:31:08 +00:00
Lennart Poettering abdb9b08f6 nspawn: make use of the RequestStop logic of scope units
Since time began, scope units had a concept of "Controllers", a bus peer
that would be notified when somebody requested a unit to stop. None of
our code used that facility so far, let's change that.

This way, nspawn can print a nice message when somebody invokes
"systemctl stop" on the container's scope unit, and then react with the
right action to shut it down.
2017-11-23 21:47:48 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Lennart Poettering 6925a0de4e cgroup-util: move Set* allocation into cg_kernel_controllers()
Previously, callers had to do this on their own. Let's make the call do
that instead, making the caller code a bit shorter.
2017-11-21 11:54:08 +01:00
Lennart Poettering bf516294c8 nspawn: minor optimization
no need to prepare the target path if we quite the loop anyway one step
later.
2017-11-21 11:54:08 +01:00
Lennart Poettering d7c9693a3e nspawn-mount: rework get_controllers() a bit
Let's rename get_controllers() → get_process_controllers(), in order to
underline the difference to cg_kernel_controllers(). After all, one
returns the controllers available to the process, the other the
controllers enabled in the kernel at all).

Let's also update the code to use read_line() and set_put_strdup() to
shorten the code a bit, and make it more robust.
2017-11-21 11:54:08 +01:00
Lennart Poettering ea9053c5f8 nspawn: rework mount_systemd_cgroup_writable() a bit
We shouldn't call alloca() as part of function calls, that's not really
defined in C. Hence, let's first do our stack allocations, and then
invoke functions.

Also, some coding style fixes, and minor shuffling around.

No functional changes.
2017-11-21 11:54:08 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek 3a726fcd08 Add license headers and SPDX identifiers to meson.build files
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering 3603efdea5 nspawn: make recursive chown()ing logic safe for being aborted in the middle
We currently use the ownership of the top-level directory as a hint
whether we need to descent into the whole tree to chown() it recursively
or not. This is problematic with the previous chown()ing algorithm, as
when descending into the tree we'd first chown() and then descend
further down, which meant that the top-level directory would be chowned
first, and an aborted recursive chowning would appear on the next
invocation as successful, even though it was not. Let's reshuffle things
a bit, to make the re-chown()ing safe regarding interruptions:

a) We chown() the dir we are looking at last, and descent into all its
   children first. That way we know that if the top-level dir is
   properly owned everything inside of it is properly owned too.

b) Before starting a chown()ing operation, we mark the top-level
   directory as owned by a special "busy" UID range, which we can use to
   recognize whether a tree was fully chowned: if it is marked as busy,
   it's definitely not fully chowned, as the busy ownership will only be
   fixed as final step of the chowning.

Fixes: #6292
2017-11-17 11:12:33 +01:00
Lennart Poettering 14f8ccc755 nspawn: add missing #pragma once to header file 2017-11-17 11:12:33 +01:00
Lennart Poettering 0986658d51
Merge pull request #6866 from sourcejedi/set-linger2
logind: fix `loginctl enable-linger`
2017-11-15 11:15:15 +01:00
Lennart Poettering bcde742e78 conf-parser: turn three bool function params into a flags fields
This makes things more readable and fixes some issues with incorrect
flag propagation between the various flavours of config_parse().
2017-11-13 10:24:03 +01:00
Lennart Poettering 759aaedc5c dissect: when we invoke dissection on a loop device with partscan help the user
This adds some simply detection logic for cases where dissection is
invoked on an externally created loop device, and partitions have been
detected on it, but partition scanning so far was off. If this is
detected we now print a brief message indicating what the issue is,
instead of failing with a useless EINVAL message the kernel passed to
us.
2017-10-26 17:54:56 +02:00
Lennart Poettering eb38edce88 machine-image: add partial discovery of block devices as images
This adds some basic discovery of block device images for nspawn and
friends. Note that this doesn't add searching for block devices using
udev, but instead expects users to symlink relevant block devices into
/var/lib/machines. Discovery is hence done exactly like for
dir/subvol/raw file images, except that what is found may be a (symlink
to) a block device.

For now, we do not support cloning these images, but removal, renaming
and read-only flags are supported to the point where that makes sense.

Fixe: #6990
2017-10-26 17:54:56 +02:00
Lauri Tirkkonen 4f13e53428 nspawn: EROFS for chowning mount points is not fatal (#7122)
This fixes --read-only with --private-users. mkdir_userns_p may return
-EROFS if either mkdir or lchown fails; lchown failing is fine as the
mount point will just be overmounted, and if mkdir fails then the
following mount() will also fail (with ENOENT).
2017-10-24 19:40:50 +02:00
myrkr 1898e5f9a3 nspawn: Fix calculation of capabilities for configuration file (#7087)
The current code shifting an integer 1 failed for capabilities like
CAP_MAC_ADMIN (numerical value 33). This caused issues when specifying
them in the nspawn configuration file. Using an uint64_t 1 instead.

The similar code for processing the --capability command line option
was already correctly working.
2017-10-24 09:56:40 +02:00
Alan Jenkins 8d9c2bca41 nspawn: comment to acknowledge lying about "user session" 2017-10-18 09:47:10 +01:00
Yu Watanabe c31ad02403 mkdir: introduce follow_symlink flag to mkdir_safe{,_label}() 2017-10-06 16:03:33 +09:00
Lennart Poettering 44898c5358 seccomp: add three more seccomp groups
@aio → asynchronous IO calls
@sync → msync/fsync/... and friends
@chown → changing file ownership

(Also, change @privileged to reference @chown now, instead of the
individual syscalls it contains)
2017-10-05 15:42:48 +02:00
Lennart Poettering 4c3a917617 seccomp: include prlimit64 and ugetrlimit in @default
Also, move prlimit64() out of @resources.

prlimit64() may be used both for getting and setting resource limits, and
is implicitly called by glibc at various places, on some archs, the same
was as getrlimit(). SImilar, igetrlimit() is an arch-specific
replacement for getrlimit(), and hence should be whitelisted at the same
place as getrlimit() and prlimit64().

Also see: https://lists.freedesktop.org/archives/systemd-devel/2017-September/039543.html
2017-10-05 11:27:34 +02:00
Zbigniew Jędrzejewski-Szmek 349cc4a507 build-sys: use #if Y instead of #ifdef Y everywhere
The advantage is that is the name is mispellt, cpp will warn us.

$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build

squash! build-sys: use #if Y instead of #ifdef Y everywhere

v2:
- fix incorrect setting of HAVE_LIBIDN2
2017-10-04 12:09:29 +02:00
Djalal Harouni 09d3020b0a seccomp: remove '@credentials' syscall set (#6958)
This removes the '@credentials' syscall set that was added in commit
v234-468-gcd0ddf6f75.

Most of these syscalls are so simple that we do not want to filter them.
They work on the current calling process, doing only read operations,
they do not have a deep kernel path.

The problem may only be in 'capget' syscall since it can query arbitrary
processes, and used to discover processes, however sending signal 0 to
arbitrary processes can be used to discover if a process exists or not.
It is unfortunate that Linux allows to query processes of different
users. Lets put it now in '@process' syscall set, and later we may add
it to a new '@basic-process' set that allows most basic process
operations.
2017-10-03 07:20:05 +02:00
Lennart Poettering 64fbdc0f91 nspawn: properly report all kinds of changed UID/GID when patching things for userns
We forgot to propagate one chmod().
2017-10-02 17:41:43 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
Lennart Poettering 8e5430c4bd nspawn: set up a new session keyring for the container process
keyring material should not leak into the container. So far we relied on
seccomp to deny access to the keyring, but given that we now made the
seccomp configurable, and access to keyctl() and friends may optionally
be permitted to containers now let's make sure we disconnect the callers
keyring from the keyring of PID 1 in the container.
2017-09-22 15:28:04 +02:00
Lennart Poettering 96bedbe2e5 nspawn: replace syscall blacklist by a whitelist
Let's lock things down a bit, and maintain a list of what's permitted
rather than a list of what's prohibited in nspawn (also to make things a
bit more like Docker and friends).

Note that this slightly alters the effect of --system-call-filter=, as
now the negative list now takes precedence over the positive list.
However, given that the option is just a few days old and not included
in any released version it should be fine to change it at this point in
time.

Note that the whitelist is good chunk more restrictive thatn the
previous blacklist. Specifically:

- fanotify is not permitted (given the buffer size issues it's
  problematic in containers)
- nfsservctl is not permitted (NFS server support is not virtualized)
- pkey_xyz stuff is not permitted (really new stuff I don't grok)
- @cpu-emulation is prohibited (untested legacy stuff mostly, and if
  people really want to run dosemu in nspawn, they should use
  --system-call-filter=@cpu-emulation and all should be good)
2017-09-14 15:45:21 +02:00
Lennart Poettering 960e4569e1 nspawn: implement configurable syscall whitelisting/blacklisting
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.

This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.

(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)

Fixes: #5163

Replaces: #5944
2017-09-12 14:06:21 +02:00
Lennart Poettering 7609340e2f nspawn: replace homegrown seccomp filter table largely with references to the existing syscall groups
Let's shorten the table, now that we are hooked up to the syscall group
system.
2017-09-11 18:00:07 +02:00
Lennart Poettering 402530d91e nspawn: part over seccomp code to use seccomp_add_syscall_filter_item()
Let's unify a bit of the code here.
2017-09-11 18:00:07 +02:00
Lennart Poettering 21022b9dde util-lib: wrap personality() to fix up broken glibc error handling (#6766)
glibc appears to propagate different errors in different ways, let's fix
this up, so that our own code doesn't get confused by this.

See #6752 + #6737 for details.

Fixes: #6755
2017-09-08 17:16:29 +03:00
Zbigniew Jędrzejewski-Szmek b167945935 nspawn: do not mount /sys/fs/kdbus 2017-07-23 12:03:00 -04:00
Lennart Poettering 8cb5743079 nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416)
Given that we set NOTIFY_SOCKET unconditionally it's not surprising that
processes way down the process tree think it's smart to send us a
notification message.

It's still useful to keep this message, for debugging things, but it
shouldn't be generated by default.
2017-07-20 14:46:58 -04:00
Zbigniew Jędrzejewski-Szmek e5f752082e build-sys: drop gitignore patterns for in-tree builds
... and other autotools-generated files.
2017-07-18 10:05:06 -04:00
Zbigniew Jędrzejewski-Szmek 72cdb3e783 build-sys: drop automake support
v2:
- also mention m4
2017-07-18 10:04:44 -04:00
Lennart Poettering 3dad4f0666 Merge pull request #6257 from keszybz/unnecessary-job-log
core: do not print color console message about gc-ed jobs
2017-07-03 10:48:28 +02:00
Zbigniew Jędrzejewski-Szmek 0a5706d143 nspawn: wait for the scope to be created (#6261)
Fixes #6253.
2017-07-03 07:59:49 +02:00
Zbigniew Jędrzejewski-Szmek bd68e99bd0 Be slightly more verbose in error message
Including the full path is always useful.

Also use PID_FMT in one more place.
2017-07-02 12:03:56 -04:00
Lennart Poettering cd2dfc6fae nspawn: register a scope for the unit if --register=no is specified (#6166)
Previously, only when --register=yes was set (the default) the invoked
container would get its own scope, created by machined on behalf of
nspawn. With this change if --register=no is set nspawn will still get
its own scope (which is a good thing, so that --slice= and --property=
take effect), but this is not done through machined but by registering a
scope unit directly in PID 1.

Summary:

--register=yes             → allocate a new scope through machined (the default)
--register=yes --keep-unit → use the unit we are already running in an register with machined
--register=no              → allocate a new scope directly, but no machined
--register=no --keep-unit  → do not allocate nor register anything

Fixes: #5823
2017-06-28 13:22:46 -04:00
Lennart Poettering a462478539 nspawn: make sure to send SIGTERM/SIGHUP to the main nspawn process if stubinit receives SIGRTMIN+3 (#6167)
This code already existed in some form, however commented. Remove the
comments, as this was most likely simply a forgotten commenting for
debugging purposes.

This also extends the logic a bit, by sending SIGHUP right after the
SIGTERM, so that shells will also terminate, when PID 1 gets a
SIGRTMIN+3.

Fixes: #5711
2017-06-22 22:20:09 -04:00
tomty89 e8a94ce83e nspawn: add nosuid and nodev to /tmp mount (#6004)
When automatic /tmp mount was introduced to nspawn in v219, it was done without having the nosuid and nodev mount options, which was the same case as systemd's default tmp.mount unit back then.

nosuid and nodev was added to tmp.mount(.m4) in v231 for security reasons. matching the nspawn /tmp mount entry against that.

Ref.:
2f9df7c96a
bbb99c30d0
2017-05-23 09:41:36 +02:00
Lennart Poettering 401a38e770 Merge pull request #5958 from keszybz/explicit-log-errno
Use explicit errno in log calls
2017-05-22 10:12:18 +02:00
Matija Skala fe9938888b Fix includes (#5980)
Needed on musl.
2017-05-19 10:01:35 -04:00
Zbigniew Jędrzejewski-Szmek 35bca925f9 tree-wide: fix incorrect uses of %m
In those cases errno was not set, so we would be logging some unrelated error
or "Success".
2017-05-13 15:42:26 -04:00
Zbigniew Jędrzejewski-Szmek ab8ee0f259 tree-wide: use SET_FLAG in more places (#5892) 2017-05-07 07:03:28 -04:00
Lennart Poettering a7c8991383 Merge pull request #5801 from keszybz/help-error
nspawn,cgtop: make sure --version, --help always work
2017-04-29 12:30:29 +02:00
Zbigniew Jędrzejewski-Szmek 399e391fa6 nspawn: check cgroups after parsing options
Same justification as in previous commit.
2017-04-25 08:54:00 -04:00
Zbigniew Jędrzejewski-Szmek 37efbbd821 meson: reindent all files with 8 spaces
The indentation for emacs'es meson-mode is added .dir-locals.

All files are reindented automatically, using the lasest meson-mode from git.
Indentation should now be fairly consistent.
2017-04-23 21:47:29 -04:00
Zbigniew Jędrzejewski-Szmek 69e96427a2 meson: define tests
Tests can be run with 'ninja-build test' or using 'mesontest'.
'-Dtests=unsafe' can be used to include the "unsafe" tests in the
test suite, same as with autotools.

v2:
- use more conf.get guards are optional components
- declare deps on generated headers for test-{af,arphrd,cap}-list

v3:
- define environment for tests

  Most test don't need this, but to be consistent with autotools-based build, and
  to avoid questions which tests need it and which don't, set the same environment
  for all tests.

v4:
- rework test generation

  Use a list of lists to define each test. This way we can reduce the
  boilerplate somewhat, although the test listings are still pretty verbose. We
  can also move the definitions of the tests to the subdirs. Unfortunately some
  subdirs are included earlier than some of the libraries that test binaries
  are linked to.  So just dump all definitions of all tests that cannot be
  defined earlier into src/test. The `executable` definitions are still at the
  top level, so the binaries are compiled into the build root.

v5:
- tag test-dnssec-complex as manual

v6:
- fix HAVE_LIBZ typo
- add missing libgobject/libgio defs
- mark test-qcow2 as manual
2017-04-23 21:47:26 -04:00
Zbigniew Jędrzejewski-Szmek 5c23128dab meson: build systemd using meson
It's crucial that we can build systemd using VS2010!

... er, wait, no, that's not the official reason. We need to shed old systems
by requring python 3! Oh, no, it's something else. Maybe we need to throw out
345 years of knowlege accumulated in autotools? Whatever, this new thing is
cool and shiny, let's use it.

This is not complete, I'm throwing it out here for your amusement and critique.

- rules for sd-boot are missing. Those might be quite complicated.

- rules for tests are missing too. Those are probably quite simple and
  repetitive, but there's lots of them.

- it's likely that I didn't get all the conditions right, I only tested "full"
  compilation where most deps are provided and nothing is disabled.

- busname.target and all .busname units are skipped on purpose.

  Otherwise, installation into $DESTDIR has the same list of files and the
  autoconf install, except for .la files.

It'd be great if people had a careful look at all the library linking options.
I added stuff until things compiled, and in the end there's much less linking
then in the old system. But it seems that there's still a lot of unnecessary
deps.

meson has a `shared_module` statement, which sounds like something appropriate
for our nss and pam modules. Unfortunately, I couldn't get it to work. For the
nss modules, we need an .so version of '2', but `shared_module` disallows the
version argument. For the pam module, it also didn't work, I forgot the reason.

The handling of .m4 and .in and .m4.in files is rather awkward. It's likely
that this could be simplified. If make support is ever dropped, I think it'd
make sense to switch to a different templating system so that two different
languages and not required, which would make everything simpler yet.

v2:
- use get_pkgconfig_variable
- use sh not bash
- use add_project_arguments

v3:
- drop required:true and fix progs/prog typo

v4:
- use find_library('bz2')
- add TTY_GID definition
- define __SANE_USERSPACE_TYPES__
- use join_paths(prefix, ...) is used on all paths to make them all absolute

v5:
- replace all declare_dependency's with []
- add more conf.get guards around optional components

v6:
- drop -pipe, -Wall which are the default in meson
- use compiler.has_function() and compiler.has_header_symbol instead of the
  hand-rolled checks.
- fix duplication in 'liblibsystemd' library name
- use the right .sym file for pam_systemd
- rename 'compiler' to 'cc': shorter, and more idiomatic.

v7:
- use ENABLE_ENVIRONMENT_D not HAVE_ENVIRONMENT_D
- rename prefix to prefixdir, rootprefix to rootprefixdir
  ("prefix" is too common of a name and too easy to overwrite by mistake)
- wrap more stuff with conf.get('ENABLE...') == 1
- use rootprefix=='/' and rootbindir as install_dir, to fix paths under
  split-usr==true.

v8:
- use .split() also for src/coredump. Now everything is consistent ;)
- add rootlibdir option and use it on the libraries that require it

v9:
- indentation

v10:
- fix check for qrencode and libaudit

v11:
- unify handling of executable paths, provide options for all progs

  This makes the meson build behave slightly differently than the
  autoconf-based one, because we always first try to find the executable in the
  filesystem, and fall back to the default. I think different handling of
  loadkeys, setfont, and telinit was just a historical accident.

  In addition to checking in $PATH, also check /usr/sbin/, /sbin for programs.
  In Fedora $PATH includes /usr/sbin, (and /sbin is is a symlink to /usr/sbin),
  but in Debian, those directories are not included in the path.

  C.f. https://github.com/mesonbuild/meson/issues/1576.

- call all the options 'xxx-path' for clarity.
- sort man/rules/meson.build properly so it's stable
2017-04-23 21:47:26 -04:00
Lennart Poettering 948a3241de Merge pull request #5708 from vcatechnology/arm-cross-compile
ARM32 cross-compile fixes
2017-04-17 15:49:06 +02:00
Matt Clarkson 6b5cf3ea62 build-sys: correct blkid.h includes
When using pkg-config to determine the include flags for blkid the
flags are returned as:

    $ pkg-config blkid --cflags
    -I/usr/include/blkid -I/usr/include/uuid

We use the <blkid/blkid.h> include which would be correct when using
the default compiler /usr/include header search path. However, when
cross-compiling the blkid.h will not be installed at /usr/include and
highly likely in a temporary system root. It is futher compounded if
the cross-compile packages are split up and the blkid package is not
available in the same sysroot as the compiler.

Regardless of the compilation setup, the correct include path should be
<blkid.h> if using the pkg-config returned CFLAGS.
2017-04-06 14:33:02 +01:00
David Michael 7357272ed1 nspawn: check if the DNS stub is listening for requests 2017-03-31 11:34:32 -07:00
Zbigniew Jędrzejewski-Szmek 78e4f19ebc Merge pull request #5444 from poettering/cgroups-revert-no-error
Revert "core: simplify cg_[all_]unified()" and more.
2017-02-24 18:48:57 -05:00
AsciiWolf 13e785f7a0 Fix missing space in comments (#5439) 2017-02-24 18:14:02 +01:00
Lennart Poettering c22800e40e cgroup: rename cg_unified() → cg_unified_controller()
cg_unified() is a bit generic a name, let's make clear that it checks
whether a specified controller is in unified mode.
2017-02-24 18:00:04 +01:00
Lennart Poettering b4cccbc13a cgroup: change cg_unified() to possibly return errors again
We use our cgroup APIs in various contexts, including from our libraries
sd-login, sd-bus. As we don#t control those environments we can't rely
that the unified cgroup setup logic succeeds, and hence really shouldn't
assert on it.

This more or less reverts 415fc41cea.
2017-02-24 17:52:58 +01:00
Tejun Heo 2977724b09 core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd hierarchy
Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1
name=systemd hierarchy.  While this works fine for systemd itself, it breaks
tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd.

This patch updates the hybrid mode so that it mounts v2 hierarchy on
/sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on
/sys/fs/cgroup/systemd for compatibility.  systemd itself doesn't depend on the
"name=systemd" hierarchy at all.  All operations take place on the v2 hierarchy
as before but the v1 hierarchy is kept in sync so that any tools which expect
it to be there can keep doing so.  This allows systemd to take advantage of
cgroup v2 process management without requiring other tools to be aware of the
hybrid mode.

The hybrid mode is implemented by mapping the special systemd controller to
/sys/fs/cgroup/unified and making the basic cgroup utility operations -
cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the
/sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated.

While a bit messy, this will allow dropping complications from using cgroup v1
for process management a lot sooner than otherwise possible which should make
it a net gain in terms of maintainability.

v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount
    point to /sys/fs/cgroup/unified as suggested by @brauner.

v3: chown the compat hierarchy too on delegation.  Suggested by @evverx.

v4: [zj]
- drop the change to default, full "legacy" is still the default.
2017-02-20 12:28:35 -05:00
Tejun Heo 415fc41cea core: simplify cg_[all_]unified()
cg_[all_]unified() test whether a specific controller or all controllers are on
the unified hierarchy.  While what's being asked is a simple binary question,
the callers must assume that the functions may fail any time, which
unnecessarily complicates their usages.  This complication is unnecessary.
Internally, the test result is cached anyway and there are only a few places
where the test actually needs to be performed.

This patch simplifies cg_[all_]unified().

* cg_[all_]unified() are updated to return bool.  If the result can't be
  decided, assertion failure is triggered.  Error handlings from their callers
  are dropped.

* cg_unified_flush() is updated to calculate the new result synchrnously and
  return whether it succeeded or not.  Places which need to flush the test
  result are updated to test for failure.  This ensures that all the following
  cg_[all_]unified() tests succeed.

* Places which expected possible cg_[all_]unified() failures are updated to
  call and test cg_unified_flush() before calling cg_[all_]unified().  This
  includes functions used while setting up mounts during boot and
  manager_setup_cgroup().
2017-02-18 17:51:13 -05:00
Tejun Heo bd15ab41a1 nspawn: fix cgroup mode detection
cgroup mode detection is broken in two different ways.

* detect_unified_cgroup_hierarchy() is called too nested in outer_child().
  sync_cgroup() which is used by run() also needs to know the requested cgroup
  mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN.  This makes it
  skip syncing the inner cgroup hierarchy on some config combinations.

   $ cat /proc/self/cgroup | grep systemd
   1:name=systemd:/user.slice/user-0.slice/session-c1.scope

   $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   ...
   [root@container ~]# cat /proc/self/cgroup | grep systemd
   1:name=systemd:/machine.slice/machine-container.x86_64.scope
   $ exit

   $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   [root@container ~]# cat /proc/self/cgroup | grep 0::
   0::/
   $ exit

  Note how the unified hierarchy case's path is not synchronized with the host.
  This for example can cause issues when there are multiple such containers.

  Fixed by moving detect_unified_cgroup_hierarchy() invocation to main().

* inner_child() was invoking cg_unified_flush().  inner_child() executes fully
  scoped and can't determine which cgroup mode the host was in.  It doesn't
  make sense to keep flushing the detected mode when the host mode can't
  change.

  Fixed by replacing cg_unified_flush() invocations in outer_child() and
  inner_child() with one in main().
2017-02-18 17:49:06 -05:00
Zbigniew Jędrzejewski-Szmek 581a07f9f0 Merge pull request #5369 from poettering/nspawn-resolved
fixes for running nspawn+resolved in combination
2017-02-18 11:54:34 -05:00
Lennart Poettering b053cd5f8e nspawn: tweak check whether resolved is around a bit
Let's check D-Bus instead of files in /run to see if resolved is
running. This is a bit nicer as bus names are automatically cleaned up
when resolved dies, which is not the case for files in /run.

See: #4649
2017-02-17 16:06:31 -05:00
Lennart Poettering 1c876927e4 copy: change the various copy_xyz() calls to take a unified flags parameter
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
2017-02-17 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek fc6149a6ce Merge pull request #4962 from poettering/root-directory-2
Add new MountAPIVFS= boolean unit file setting + RootImage=
2017-02-08 23:05:05 -05:00
Philip Withnall b53ede699c nspawn: Add support for sysroot pivoting (#5258)
Add a new --pivot-root argument to systemd-nspawn, which specifies a
directory to pivot to / inside the container; while the original / is
pivoted to another specified directory (if provided). This adds
support for booting container images which may contain several bootable
sysroots, as is common with OSTree disk images. When these disk images
are booted on real hardware, ostree-prepare-root is run in conjunction
with sysroot.mount in the initramfs to achieve the same results.
2017-02-08 16:54:31 +01:00
Lennart Poettering 78ebe98061 core,nspawn,dissect: make nspawn's .roothash file search reusable
This makes nspawn's logic of automatically discovering the root hash of
an image file generic, and then reuses it in systemd-dissect and in
PID1's RootImage= logic, so that verity is automatically set up whenever
we can.
2017-02-07 12:21:28 +01:00
Lennart Poettering ced58da749 nspawn: shown exec() command is misleading
There's no point in updating exec_target for each binary we try to
execute, if we override it right-away anyway... Let's just do this once,
and include all binaries we try each time.

Follow-up for 1a68e1e543.
2017-02-02 20:10:28 +01:00
Lennart Poettering 49bfc8774b fs-util: unify code we use to check if dirent's d_name is "." or ".."
We use different idioms at different places. Let's replace this is the
one true new idiom, that is even a bit faster...
2017-02-02 00:06:18 +01:00
Philip Withnall 1a68e1e543 nspawn: Print attempted execv() path on failure (#5199)
The failure message is typically currently:
   execv() failed: No such file or directory
which is not very useful because it doesn’t tell you which file or
directory it was trying to exec.
2017-02-01 08:36:16 -05:00
Zbigniew Jędrzejewski-Szmek ec251fe7d5 tree-wide: adjust fall through comments so that gcc is happy
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.

(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
2017-01-31 14:04:55 -05:00
Zbigniew Jędrzejewski-Szmek 9ce6d1b319 nspawn: fix clobbering of selinux context arg
First bug fixed by gcc 7. Yikes.
2017-01-31 14:04:55 -05:00
Stefan Schweter 1a012455c2 tree-wide: remove consecutive duplicate words in comments (#5148) 2017-01-24 21:45:30 -05:00
Djalal Harouni 0819dd72df Merge pull request #5098 from evverx/fix-nspawn-notifications
nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
2017-01-18 14:36:07 +01:00
Zbigniew Jędrzejewski-Szmek 5b3637b44a Merge pull request #4991 from poettering/seccomp-fix 2017-01-17 23:10:46 -05:00
Lennart Poettering 469830d142 seccomp: rework seccomp code, to improve compat with some archs
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.

So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.

This rework hence changes a couple of things:

- We no longer use seccomp_rule_add(), but only
  seccomp_rule_add_exact(), and fail the installation of a filter if the
  architecture doesn't support it.

- We no longer rely on adding multiple syscall architectures to a single filter,
  but instead install a separate filter for each syscall architecture
  supported. This way, we can install a strict filter for x86-64, while
  permitting a less strict filter for i386.

- All high-level filter additions are now moved from execute.c to
  seccomp-util.c, so that we can test them independently of the service
  execution logic.

- Tests have been added for all types of our seccomp filters.

- SystemCallFilters= and SystemCallArchitectures= are now implemented in
  independent filters and installation logic, as they semantically are
  very much independent of each other.

Fixes: #4575
2017-01-17 22:14:27 -05:00
Evgeny Vereshchagin adc7d9f0da nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
Fixes #4944
2017-01-17 08:40:05 +00:00
Zbigniew Jędrzejewski-Szmek e0489532fd nspawn: fix memleak
CID #1368262: fn is allocated with new, so it should be freed.
2017-01-15 16:57:57 -05:00
Zbigniew Jędrzejewski-Szmek 6b3d378331 Merge pull request #4879 from poettering/systemd 2017-01-14 21:29:27 -05:00
Mike Gilbert c9f7b4d356 build-sys: add check for gperf lookup function signature (#5055)
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.

Fixes: https://github.com/systemd/systemd/issues/5039
2017-01-10 08:39:05 +01:00
Lennart Poettering 8dbf71ec58 nspawn: reword notice when /dev is pre-mounted and populated (#4971)
Fixes: #4676
2016-12-29 11:02:39 +01:00
Lennart Poettering 87447ae459 nspawn: tweaks to /etc/resolv.conf management
Handle properly if /etc is a symlink (i.e. make sure we don't follow the
symlink outside the image). Also follow /etc/resolv.conf if it is a
symlink, and use the resolved path when creating a mount point and
mounting (as both of these operations follow symlinks and rally
shouldn't).

Handle more types of read-only errors as debug-level issues.
2016-12-21 19:09:32 +01:00
Lennart Poettering 8ccf7e9e96 nspawn: don't complain when we can't fix the timezone of read-only containers
There's nothing we can do about it, hence don't complain.
2016-12-21 19:09:32 +01:00
Lennart Poettering e0f9e7bd03 dissect: make using a generic partition as root partition optional
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.

In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
2016-12-21 19:09:30 +01:00
Lennart Poettering 4ad14eff19 nspawn: restore --volatile=yes support
This was broken by 19caffac75 which remounted the
root directory to MS_SHARED before applying the volatile mount logic. This
broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we
need MS_MOVE in the volatile mount logic to rearrange the directory tree.
Simply swap the order here, apply the volatile logic before we switch to
MS_SHARED.
2016-12-21 19:09:28 +01:00
Evgeny Vereshchagin 5773024d7f nspawn: unref the notify event source (#4941)
Fixes:
```
sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b
...
==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15
==21224==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==21224==    by 0x4F6F565: sd_event_new (sd-event.c:431)
==21224==    by 0x1210BE: run (nspawn.c:3351)
==21224==    by 0x123908: main (nspawn.c:3826)
==21224==
==21224== LEAK SUMMARY:
==21224==    definitely lost: 656 bytes in 1 blocks
==21224==    indirectly lost: 1,788 bytes in 11 blocks
==21224==      possibly lost: 0 bytes in 0 blocks
==21224==    still reachable: 8,344 bytes in 3 blocks
==21224==         suppressed: 0 bytes in 0 blocks
```
Closes #4934
2016-12-21 18:36:15 +01:00
Lennart Poettering 9b6deb03fc dissect: optionally, only look for GPT partition tables, nothing else
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
2016-12-20 20:00:09 +01:00
Lennart Poettering a4c35b6b4d nspawn: split out VolatileMode definitions
This moves the VolatileMode enum and its helper functions to src/shared/. This
is useful to then reuse them to implement systemd.volatile= in a later commit.
2016-12-20 20:00:08 +01:00
Lennart Poettering 75bf701f5c nspawn: flush out environment block of the -a stub init process
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.

Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
2016-12-14 18:29:30 +01:00
Andrey Ulanov 6916b16464 nspawn: when getting SIGCHLD make sure it's from the first child (#4855)
When getting SIGCHLD we should not assume that it was the first
child forked from system-nspawn that has died as it may also be coming
from an orphan process. This change adds a signal handler that ignores
SIGCHLD unless it came from the first containerized child - the real
child.

Before this change the problem can be reproduced as follows:

$ sudo systemd-nspawn --directory=/container-root --share-system
Press ^] three times within 1s to kill container.
[root@andreyu-coreos ~]# { true & } &
[1] 22201
[root@andreyu-coreos ~]#
Container root-fedora-latest terminated by signal KILL
2016-12-13 02:38:18 +01:00
Zbigniew Jędrzejewski-Szmek 4a5567d5d6 Merge pull request #4795 from poettering/dissect
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10 01:08:13 -05:00
Wim de With 2e1f244efd nspawn: add missing -E to getopt_long (#4860) 2016-12-10 07:33:58 +03:00
Franck Bui 5367354dae nspawn: resolv.conf might not be created initially (#4799)
This might happen that resolv.conf is missing in a minimal rootfs and in this
case the following warning is emitted:

 Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory

This patch fixes this case.
2016-12-07 21:36:39 +01:00
Lennart Poettering 4623e8e6ac nspawn/dissect: automatically discover dm-verity verity partitions
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.

Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.

mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:

```
 # mkdir test
 # cd test
 # mkosi --verity
 # systemd-nspawn -i ./image.raw -bn
```

Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07 18:38:41 +01:00
Lennart Poettering 4827ab4854 nspawn: when generating a machine name from an image name, truncate .raw suffix
Let's prettify the machine name we generate for image-based containers: let's
chop off the .raw suffix before using it as machine name.
2016-12-07 18:38:41 +01:00
Lennart Poettering 18b5886e56 dissect: add support for encrypted images
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.

It's pretty basic: only support for passphrase-based encryption.

In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.
2016-12-07 18:38:41 +01:00
Lennart Poettering 2d8457851b nspawn: port nspawn to new generalized image dissection code
Let's make use of the new internal API. This mostly doesn't change anything for
the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new
code can handle disk images with no partition tables, and make any detected
images directly the root.
2016-12-07 18:38:40 +01:00
Susant Sahani 10452f7c93 core: introduce parse_ip_port (#4825)
1. Listed in TODO.
2. Tree wide replace safe_atou16 with parse_ip_port incase
   it's used for ports.
2016-12-06 12:21:45 +01:00
Evgeny Vereshchagin c9fd987279 nspawn: don't hide --bind=/tmp/* mounts (#4824)
Fixes #4789
2016-12-05 18:14:05 +01:00
Lennart Poettering cb638b5e96 util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENT
As suggested by @keszybz
2016-12-01 12:49:55 +01:00
Lennart Poettering ec57bd426a nspawn: improve log messages
When complaining about the inability to resolve a path, show the full path, not
just the relative one.

As suggested by @keszybz.
2016-12-01 12:41:18 +01:00
Lennart Poettering c7a4890ce4 nspawn: optionally, automatically allocated --bind=/--overlay source from /var/tmp
This extends the --bind= and --overlay= syntax so that an empty string as source/upper
directory is taken as request to automatically allocate a temporary directory
below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination
with the "+" path extension this permits a switch "--overlay=+/var::/var" in
order to use the container's shipped /var, combine it with a writable temporary
directory and mount it to the runtime /var of the container.
2016-12-01 12:41:18 +01:00
Lennart Poettering 86c0dd4a71 nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.

This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
2016-12-01 12:41:18 +01:00
Lennart Poettering e28c7cd066 tree-wide: set SA_RESTART for signal handlers we install
We already set it in most cases, but make sure to set it in all others too, and
document that that's a good idea.
2016-12-01 12:41:17 +01:00
Lennart Poettering 7b4318b6a5 nspawn: add ability to configure overlay mounts to .nspawn files
Fixes: #4634
2016-12-01 12:41:17 +01:00
Lennart Poettering ad85779a50 nspawn: split out overlayfs argument parsing into a function of its own
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and
bind_mount_parse().
2016-12-01 00:25:51 +01:00
Lennart Poettering 48cbe5f80b nspawn: use -ENOMEM instead of log_oom() in one case
The function is of the "library" kind and doesn't log ENOMEM in all other
cases, hence fix the one outlier.
2016-12-01 00:25:51 +01:00
Lennart Poettering 8d4aa2bb32 nspawn: make use of CHASE_NON_EXISTING when locking image
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
2016-12-01 00:25:51 +01:00
Lennart Poettering 8ce48cf0f8 nspawn: use the new CHASE_NON_EXISTING flag when resolving mount points
This restores the ability to implicitly create files/directories to mount
specified mount points on.
2016-12-01 00:25:51 +01:00
Lennart Poettering c4f4fce79e fs-util: add flags parameter to chase_symlinks()
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
2016-12-01 00:25:51 +01:00
Lennart Poettering 68cf43c315 nspawn: use chase_symlinks() on all paths specified via --tmpfs=, --bind= and so on
Fixes: #2860
2016-12-01 00:25:51 +01:00
Lennart Poettering 4da92e5857 nspawn: coding style: don't mix variable declarations and function calls 2016-12-01 00:25:51 +01:00
Lennart Poettering 5639193139 nspawn: use realloc_multiply() where it makes sense 2016-12-01 00:25:51 +01:00
Lennart Poettering 8cd328d82e nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory=
As suggested in PR #3667.

This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.

This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.

Fixes: #3667
2016-12-01 00:25:51 +01:00
Lennart Poettering 3f342ec4b0 nspawn: properly handle image/directory paths that are symlinks
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.

Fixes: #2001
2016-12-01 00:25:51 +01:00
Lennart Poettering e187369587 tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
2016-12-01 00:25:51 +01:00
Lennart Poettering acbbf69b71 nspawn: don't require chown() if userns is not on
Fixes: #4711
2016-11-22 13:35:24 +01:00
Lennart Poettering 17cbb288fa nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.

This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.

Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
2016-11-22 13:35:09 +01:00
Lennart Poettering c67b008273 nspawn: remove temporary root directory on exit
When mountint a loopback image, we need a temporary root directory we can mount
stuff to. Make sure to actually remove it when exiting, so that we don't leave
stuff around in /tmp unnecessarily.

See: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering 6a0f896b97 nspawn: try to wait for the container PID 1 to exit, before we exit
Let's make the shutdown logic synchronous, so that there's a better chance to
detach the loopback device after use.
2016-11-22 13:35:09 +01:00
Lennart Poettering 0f3be6ca4d nspawn: support ephemeral boots from images
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.

As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.

Fixes: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering f4ff4aa800 Merge pull request #4395 from s-urbaniak/rw-support
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18 12:36:46 +01:00