Commit Graph

574 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 349cc4a507 build-sys: use #if Y instead of #ifdef Y everywhere
The advantage is that is the name is mispellt, cpp will warn us.

$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build

squash! build-sys: use #if Y instead of #ifdef Y everywhere

v2:
- fix incorrect setting of HAVE_LIBIDN2
2017-10-04 12:09:29 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
Lennart Poettering 8e5430c4bd nspawn: set up a new session keyring for the container process
keyring material should not leak into the container. So far we relied on
seccomp to deny access to the keyring, but given that we now made the
seccomp configurable, and access to keyctl() and friends may optionally
be permitted to containers now let's make sure we disconnect the callers
keyring from the keyring of PID 1 in the container.
2017-09-22 15:28:04 +02:00
Lennart Poettering 960e4569e1 nspawn: implement configurable syscall whitelisting/blacklisting
Now that we have ported nspawn's seccomp code to the generic code in
seccomp-util, let's extend it to support whitelisting and blacklisting
of specific additional syscalls.

This uses similar syntax as PID1's support for system call filtering,
but in contrast to that always implements a blacklist (and not a
whitelist), as we prepopulate the filter with a blacklist, and the
unit's system call filter logic does not come with anything
prepopulated.

(Later on we might actually want to invert the logic here, and
whitelist rather than blacklist things, but at this point let's not do
that. In case we switch this over later, the syscall add/remove logic of
this commit should be compatible conceptually.)

Fixes: #5163

Replaces: #5944
2017-09-12 14:06:21 +02:00
Lennart Poettering 21022b9dde util-lib: wrap personality() to fix up broken glibc error handling (#6766)
glibc appears to propagate different errors in different ways, let's fix
this up, so that our own code doesn't get confused by this.

See #6752 + #6737 for details.

Fixes: #6755
2017-09-08 17:16:29 +03:00
Lennart Poettering 8cb5743079 nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416)
Given that we set NOTIFY_SOCKET unconditionally it's not surprising that
processes way down the process tree think it's smart to send us a
notification message.

It's still useful to keep this message, for debugging things, but it
shouldn't be generated by default.
2017-07-20 14:46:58 -04:00
Lennart Poettering cd2dfc6fae nspawn: register a scope for the unit if --register=no is specified (#6166)
Previously, only when --register=yes was set (the default) the invoked
container would get its own scope, created by machined on behalf of
nspawn. With this change if --register=no is set nspawn will still get
its own scope (which is a good thing, so that --slice= and --property=
take effect), but this is not done through machined but by registering a
scope unit directly in PID 1.

Summary:

--register=yes             → allocate a new scope through machined (the default)
--register=yes --keep-unit → use the unit we are already running in an register with machined
--register=no              → allocate a new scope directly, but no machined
--register=no --keep-unit  → do not allocate nor register anything

Fixes: #5823
2017-06-28 13:22:46 -04:00
Zbigniew Jędrzejewski-Szmek 35bca925f9 tree-wide: fix incorrect uses of %m
In those cases errno was not set, so we would be logging some unrelated error
or "Success".
2017-05-13 15:42:26 -04:00
Zbigniew Jędrzejewski-Szmek ab8ee0f259 tree-wide: use SET_FLAG in more places (#5892) 2017-05-07 07:03:28 -04:00
Zbigniew Jędrzejewski-Szmek 399e391fa6 nspawn: check cgroups after parsing options
Same justification as in previous commit.
2017-04-25 08:54:00 -04:00
Lennart Poettering 948a3241de Merge pull request #5708 from vcatechnology/arm-cross-compile
ARM32 cross-compile fixes
2017-04-17 15:49:06 +02:00
Matt Clarkson 6b5cf3ea62 build-sys: correct blkid.h includes
When using pkg-config to determine the include flags for blkid the
flags are returned as:

    $ pkg-config blkid --cflags
    -I/usr/include/blkid -I/usr/include/uuid

We use the <blkid/blkid.h> include which would be correct when using
the default compiler /usr/include header search path. However, when
cross-compiling the blkid.h will not be installed at /usr/include and
highly likely in a temporary system root. It is futher compounded if
the cross-compile packages are split up and the blkid package is not
available in the same sysroot as the compiler.

Regardless of the compilation setup, the correct include path should be
<blkid.h> if using the pkg-config returned CFLAGS.
2017-04-06 14:33:02 +01:00
David Michael 7357272ed1 nspawn: check if the DNS stub is listening for requests 2017-03-31 11:34:32 -07:00
Zbigniew Jędrzejewski-Szmek 78e4f19ebc Merge pull request #5444 from poettering/cgroups-revert-no-error
Revert "core: simplify cg_[all_]unified()" and more.
2017-02-24 18:48:57 -05:00
AsciiWolf 13e785f7a0 Fix missing space in comments (#5439) 2017-02-24 18:14:02 +01:00
Lennart Poettering c22800e40e cgroup: rename cg_unified() → cg_unified_controller()
cg_unified() is a bit generic a name, let's make clear that it checks
whether a specified controller is in unified mode.
2017-02-24 18:00:04 +01:00
Lennart Poettering b4cccbc13a cgroup: change cg_unified() to possibly return errors again
We use our cgroup APIs in various contexts, including from our libraries
sd-login, sd-bus. As we don#t control those environments we can't rely
that the unified cgroup setup logic succeeds, and hence really shouldn't
assert on it.

This more or less reverts 415fc41cea.
2017-02-24 17:52:58 +01:00
Tejun Heo 2977724b09 core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd hierarchy
Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1
name=systemd hierarchy.  While this works fine for systemd itself, it breaks
tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd.

This patch updates the hybrid mode so that it mounts v2 hierarchy on
/sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on
/sys/fs/cgroup/systemd for compatibility.  systemd itself doesn't depend on the
"name=systemd" hierarchy at all.  All operations take place on the v2 hierarchy
as before but the v1 hierarchy is kept in sync so that any tools which expect
it to be there can keep doing so.  This allows systemd to take advantage of
cgroup v2 process management without requiring other tools to be aware of the
hybrid mode.

The hybrid mode is implemented by mapping the special systemd controller to
/sys/fs/cgroup/unified and making the basic cgroup utility operations -
cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the
/sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated.

While a bit messy, this will allow dropping complications from using cgroup v1
for process management a lot sooner than otherwise possible which should make
it a net gain in terms of maintainability.

v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount
    point to /sys/fs/cgroup/unified as suggested by @brauner.

v3: chown the compat hierarchy too on delegation.  Suggested by @evverx.

v4: [zj]
- drop the change to default, full "legacy" is still the default.
2017-02-20 12:28:35 -05:00
Tejun Heo 415fc41cea core: simplify cg_[all_]unified()
cg_[all_]unified() test whether a specific controller or all controllers are on
the unified hierarchy.  While what's being asked is a simple binary question,
the callers must assume that the functions may fail any time, which
unnecessarily complicates their usages.  This complication is unnecessary.
Internally, the test result is cached anyway and there are only a few places
where the test actually needs to be performed.

This patch simplifies cg_[all_]unified().

* cg_[all_]unified() are updated to return bool.  If the result can't be
  decided, assertion failure is triggered.  Error handlings from their callers
  are dropped.

* cg_unified_flush() is updated to calculate the new result synchrnously and
  return whether it succeeded or not.  Places which need to flush the test
  result are updated to test for failure.  This ensures that all the following
  cg_[all_]unified() tests succeed.

* Places which expected possible cg_[all_]unified() failures are updated to
  call and test cg_unified_flush() before calling cg_[all_]unified().  This
  includes functions used while setting up mounts during boot and
  manager_setup_cgroup().
2017-02-18 17:51:13 -05:00
Tejun Heo bd15ab41a1 nspawn: fix cgroup mode detection
cgroup mode detection is broken in two different ways.

* detect_unified_cgroup_hierarchy() is called too nested in outer_child().
  sync_cgroup() which is used by run() also needs to know the requested cgroup
  mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN.  This makes it
  skip syncing the inner cgroup hierarchy on some config combinations.

   $ cat /proc/self/cgroup | grep systemd
   1:name=systemd:/user.slice/user-0.slice/session-c1.scope

   $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   ...
   [root@container ~]# cat /proc/self/cgroup | grep systemd
   1:name=systemd:/machine.slice/machine-container.x86_64.scope
   $ exit

   $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   [root@container ~]# cat /proc/self/cgroup | grep 0::
   0::/
   $ exit

  Note how the unified hierarchy case's path is not synchronized with the host.
  This for example can cause issues when there are multiple such containers.

  Fixed by moving detect_unified_cgroup_hierarchy() invocation to main().

* inner_child() was invoking cg_unified_flush().  inner_child() executes fully
  scoped and can't determine which cgroup mode the host was in.  It doesn't
  make sense to keep flushing the detected mode when the host mode can't
  change.

  Fixed by replacing cg_unified_flush() invocations in outer_child() and
  inner_child() with one in main().
2017-02-18 17:49:06 -05:00
Zbigniew Jędrzejewski-Szmek 581a07f9f0 Merge pull request #5369 from poettering/nspawn-resolved
fixes for running nspawn+resolved in combination
2017-02-18 11:54:34 -05:00
Lennart Poettering b053cd5f8e nspawn: tweak check whether resolved is around a bit
Let's check D-Bus instead of files in /run to see if resolved is
running. This is a bit nicer as bus names are automatically cleaned up
when resolved dies, which is not the case for files in /run.

See: #4649
2017-02-17 16:06:31 -05:00
Lennart Poettering 1c876927e4 copy: change the various copy_xyz() calls to take a unified flags parameter
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
2017-02-17 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek fc6149a6ce Merge pull request #4962 from poettering/root-directory-2
Add new MountAPIVFS= boolean unit file setting + RootImage=
2017-02-08 23:05:05 -05:00
Philip Withnall b53ede699c nspawn: Add support for sysroot pivoting (#5258)
Add a new --pivot-root argument to systemd-nspawn, which specifies a
directory to pivot to / inside the container; while the original / is
pivoted to another specified directory (if provided). This adds
support for booting container images which may contain several bootable
sysroots, as is common with OSTree disk images. When these disk images
are booted on real hardware, ostree-prepare-root is run in conjunction
with sysroot.mount in the initramfs to achieve the same results.
2017-02-08 16:54:31 +01:00
Lennart Poettering 78ebe98061 core,nspawn,dissect: make nspawn's .roothash file search reusable
This makes nspawn's logic of automatically discovering the root hash of
an image file generic, and then reuses it in systemd-dissect and in
PID1's RootImage= logic, so that verity is automatically set up whenever
we can.
2017-02-07 12:21:28 +01:00
Lennart Poettering ced58da749 nspawn: shown exec() command is misleading
There's no point in updating exec_target for each binary we try to
execute, if we override it right-away anyway... Let's just do this once,
and include all binaries we try each time.

Follow-up for 1a68e1e543.
2017-02-02 20:10:28 +01:00
Philip Withnall 1a68e1e543 nspawn: Print attempted execv() path on failure (#5199)
The failure message is typically currently:
   execv() failed: No such file or directory
which is not very useful because it doesn’t tell you which file or
directory it was trying to exec.
2017-02-01 08:36:16 -05:00
Zbigniew Jędrzejewski-Szmek ec251fe7d5 tree-wide: adjust fall through comments so that gcc is happy
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.

(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
2017-01-31 14:04:55 -05:00
Zbigniew Jędrzejewski-Szmek 9ce6d1b319 nspawn: fix clobbering of selinux context arg
First bug fixed by gcc 7. Yikes.
2017-01-31 14:04:55 -05:00
Evgeny Vereshchagin adc7d9f0da nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
Fixes #4944
2017-01-17 08:40:05 +00:00
Zbigniew Jędrzejewski-Szmek e0489532fd nspawn: fix memleak
CID #1368262: fn is allocated with new, so it should be freed.
2017-01-15 16:57:57 -05:00
Zbigniew Jędrzejewski-Szmek 6b3d378331 Merge pull request #4879 from poettering/systemd 2017-01-14 21:29:27 -05:00
Lennart Poettering 8dbf71ec58 nspawn: reword notice when /dev is pre-mounted and populated (#4971)
Fixes: #4676
2016-12-29 11:02:39 +01:00
Lennart Poettering 87447ae459 nspawn: tweaks to /etc/resolv.conf management
Handle properly if /etc is a symlink (i.e. make sure we don't follow the
symlink outside the image). Also follow /etc/resolv.conf if it is a
symlink, and use the resolved path when creating a mount point and
mounting (as both of these operations follow symlinks and rally
shouldn't).

Handle more types of read-only errors as debug-level issues.
2016-12-21 19:09:32 +01:00
Lennart Poettering 8ccf7e9e96 nspawn: don't complain when we can't fix the timezone of read-only containers
There's nothing we can do about it, hence don't complain.
2016-12-21 19:09:32 +01:00
Lennart Poettering e0f9e7bd03 dissect: make using a generic partition as root partition optional
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.

In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
2016-12-21 19:09:30 +01:00
Lennart Poettering 4ad14eff19 nspawn: restore --volatile=yes support
This was broken by 19caffac75 which remounted the
root directory to MS_SHARED before applying the volatile mount logic. This
broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we
need MS_MOVE in the volatile mount logic to rearrange the directory tree.
Simply swap the order here, apply the volatile logic before we switch to
MS_SHARED.
2016-12-21 19:09:28 +01:00
Evgeny Vereshchagin 5773024d7f nspawn: unref the notify event source (#4941)
Fixes:
```
sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b
...
==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15
==21224==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==21224==    by 0x4F6F565: sd_event_new (sd-event.c:431)
==21224==    by 0x1210BE: run (nspawn.c:3351)
==21224==    by 0x123908: main (nspawn.c:3826)
==21224==
==21224== LEAK SUMMARY:
==21224==    definitely lost: 656 bytes in 1 blocks
==21224==    indirectly lost: 1,788 bytes in 11 blocks
==21224==      possibly lost: 0 bytes in 0 blocks
==21224==    still reachable: 8,344 bytes in 3 blocks
==21224==         suppressed: 0 bytes in 0 blocks
```
Closes #4934
2016-12-21 18:36:15 +01:00
Lennart Poettering 9b6deb03fc dissect: optionally, only look for GPT partition tables, nothing else
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
2016-12-20 20:00:09 +01:00
Lennart Poettering 75bf701f5c nspawn: flush out environment block of the -a stub init process
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.

Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
2016-12-14 18:29:30 +01:00
Andrey Ulanov 6916b16464 nspawn: when getting SIGCHLD make sure it's from the first child (#4855)
When getting SIGCHLD we should not assume that it was the first
child forked from system-nspawn that has died as it may also be coming
from an orphan process. This change adds a signal handler that ignores
SIGCHLD unless it came from the first containerized child - the real
child.

Before this change the problem can be reproduced as follows:

$ sudo systemd-nspawn --directory=/container-root --share-system
Press ^] three times within 1s to kill container.
[root@andreyu-coreos ~]# { true & } &
[1] 22201
[root@andreyu-coreos ~]#
Container root-fedora-latest terminated by signal KILL
2016-12-13 02:38:18 +01:00
Zbigniew Jędrzejewski-Szmek 4a5567d5d6 Merge pull request #4795 from poettering/dissect
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10 01:08:13 -05:00
Wim de With 2e1f244efd nspawn: add missing -E to getopt_long (#4860) 2016-12-10 07:33:58 +03:00
Franck Bui 5367354dae nspawn: resolv.conf might not be created initially (#4799)
This might happen that resolv.conf is missing in a minimal rootfs and in this
case the following warning is emitted:

 Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory

This patch fixes this case.
2016-12-07 21:36:39 +01:00
Lennart Poettering 4623e8e6ac nspawn/dissect: automatically discover dm-verity verity partitions
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.

Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.

mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:

```
 # mkdir test
 # cd test
 # mkosi --verity
 # systemd-nspawn -i ./image.raw -bn
```

Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07 18:38:41 +01:00
Lennart Poettering 4827ab4854 nspawn: when generating a machine name from an image name, truncate .raw suffix
Let's prettify the machine name we generate for image-based containers: let's
chop off the .raw suffix before using it as machine name.
2016-12-07 18:38:41 +01:00
Lennart Poettering 18b5886e56 dissect: add support for encrypted images
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.

It's pretty basic: only support for passphrase-based encryption.

In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.
2016-12-07 18:38:41 +01:00
Lennart Poettering 2d8457851b nspawn: port nspawn to new generalized image dissection code
Let's make use of the new internal API. This mostly doesn't change anything for
the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new
code can handle disk images with no partition tables, and make any detected
images directly the root.
2016-12-07 18:38:40 +01:00
Lennart Poettering cb638b5e96 util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENT
As suggested by @keszybz
2016-12-01 12:49:55 +01:00