Commit graph

815 commits

Author SHA1 Message Date
Stefan Schweter 1a012455c2 tree-wide: remove consecutive duplicate words in comments (#5148) 2017-01-24 21:45:30 -05:00
Djalal Harouni 0819dd72df Merge pull request #5098 from evverx/fix-nspawn-notifications
nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
2017-01-18 14:36:07 +01:00
Zbigniew Jędrzejewski-Szmek 5b3637b44a Merge pull request #4991 from poettering/seccomp-fix 2017-01-17 23:10:46 -05:00
Lennart Poettering 469830d142 seccomp: rework seccomp code, to improve compat with some archs
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.

So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.

This rework hence changes a couple of things:

- We no longer use seccomp_rule_add(), but only
  seccomp_rule_add_exact(), and fail the installation of a filter if the
  architecture doesn't support it.

- We no longer rely on adding multiple syscall architectures to a single filter,
  but instead install a separate filter for each syscall architecture
  supported. This way, we can install a strict filter for x86-64, while
  permitting a less strict filter for i386.

- All high-level filter additions are now moved from execute.c to
  seccomp-util.c, so that we can test them independently of the service
  execution logic.

- Tests have been added for all types of our seccomp filters.

- SystemCallFilters= and SystemCallArchitectures= are now implemented in
  independent filters and installation logic, as they semantically are
  very much independent of each other.

Fixes: #4575
2017-01-17 22:14:27 -05:00
Evgeny Vereshchagin adc7d9f0da nspawn: change owner/group of /run/systemd/nspawn/notify to userns-root
Fixes #4944
2017-01-17 08:40:05 +00:00
Zbigniew Jędrzejewski-Szmek e0489532fd nspawn: fix memleak
CID #1368262: fn is allocated with new, so it should be freed.
2017-01-15 16:57:57 -05:00
Zbigniew Jędrzejewski-Szmek 6b3d378331 Merge pull request #4879 from poettering/systemd 2017-01-14 21:29:27 -05:00
Mike Gilbert c9f7b4d356 build-sys: add check for gperf lookup function signature (#5055)
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.

Fixes: https://github.com/systemd/systemd/issues/5039
2017-01-10 08:39:05 +01:00
Lennart Poettering 8dbf71ec58 nspawn: reword notice when /dev is pre-mounted and populated (#4971)
Fixes: #4676
2016-12-29 11:02:39 +01:00
Lennart Poettering 87447ae459 nspawn: tweaks to /etc/resolv.conf management
Handle properly if /etc is a symlink (i.e. make sure we don't follow the
symlink outside the image). Also follow /etc/resolv.conf if it is a
symlink, and use the resolved path when creating a mount point and
mounting (as both of these operations follow symlinks and rally
shouldn't).

Handle more types of read-only errors as debug-level issues.
2016-12-21 19:09:32 +01:00
Lennart Poettering 8ccf7e9e96 nspawn: don't complain when we can't fix the timezone of read-only containers
There's nothing we can do about it, hence don't complain.
2016-12-21 19:09:32 +01:00
Lennart Poettering e0f9e7bd03 dissect: make using a generic partition as root partition optional
In preparation for reusing the image dissector in the GPT auto-discovery
logic, only optionally fail the dissection when we can't identify a root
partition.

In the GPT auto-discovery we are completely fine with any kind of root,
given that we run when it is already mounted and all we do is find some
additional auxiliary partitions on the same disk.
2016-12-21 19:09:30 +01:00
Lennart Poettering 4ad14eff19 nspawn: restore --volatile=yes support
This was broken by 19caffac75 which remounted the
root directory to MS_SHARED before applying the volatile mount logic. This
broke things as MS_MOVE is incompatible with MS_SHARED directory trees, and we
need MS_MOVE in the volatile mount logic to rearrange the directory tree.
Simply swap the order here, apply the volatile logic before we switch to
MS_SHARED.
2016-12-21 19:09:28 +01:00
Evgeny Vereshchagin 5773024d7f nspawn: unref the notify event source (#4941)
Fixes:
```
sudo ./libtool --mode=execute valgrind --leak-check=full ./systemd-nspawn -D ./CONT/ -b
...
==21224== 2,444 (656 direct, 1,788 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 15
==21224==    at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==21224==    by 0x4F6F565: sd_event_new (sd-event.c:431)
==21224==    by 0x1210BE: run (nspawn.c:3351)
==21224==    by 0x123908: main (nspawn.c:3826)
==21224==
==21224== LEAK SUMMARY:
==21224==    definitely lost: 656 bytes in 1 blocks
==21224==    indirectly lost: 1,788 bytes in 11 blocks
==21224==      possibly lost: 0 bytes in 0 blocks
==21224==    still reachable: 8,344 bytes in 3 blocks
==21224==         suppressed: 0 bytes in 0 blocks
```
Closes #4934
2016-12-21 18:36:15 +01:00
Lennart Poettering 9b6deb03fc dissect: optionally, only look for GPT partition tables, nothing else
This is useful for reusing the dissector logic in the gpt-auto-discovery logic:
there we really don't want to use MBR or naked file systems as root device.
2016-12-20 20:00:09 +01:00
Lennart Poettering a4c35b6b4d nspawn: split out VolatileMode definitions
This moves the VolatileMode enum and its helper functions to src/shared/. This
is useful to then reuse them to implement systemd.volatile= in a later commit.
2016-12-20 20:00:08 +01:00
Lennart Poettering 75bf701f5c nspawn: flush out environment block of the -a stub init process
The container detection code in virt.c we ship checks for /proc/1/environ,
looking for "container=" in it. Let's make sure our "-a" init stub exposes that
correctly.

Without this "systemd-detect-virt" run in a "-a" container won't detect that it
is being run in a container.
2016-12-14 18:29:30 +01:00
Andrey Ulanov 6916b16464 nspawn: when getting SIGCHLD make sure it's from the first child (#4855)
When getting SIGCHLD we should not assume that it was the first
child forked from system-nspawn that has died as it may also be coming
from an orphan process. This change adds a signal handler that ignores
SIGCHLD unless it came from the first containerized child - the real
child.

Before this change the problem can be reproduced as follows:

$ sudo systemd-nspawn --directory=/container-root --share-system
Press ^] three times within 1s to kill container.
[root@andreyu-coreos ~]# { true & } &
[1] 22201
[root@andreyu-coreos ~]#
Container root-fedora-latest terminated by signal KILL
2016-12-13 02:38:18 +01:00
Zbigniew Jędrzejewski-Szmek 4a5567d5d6 Merge pull request #4795 from poettering/dissect
Generalize image dissection logic of nspawn, and make it useful for other tools.
2016-12-10 01:08:13 -05:00
Wim de With 2e1f244efd nspawn: add missing -E to getopt_long (#4860) 2016-12-10 07:33:58 +03:00
Franck Bui 5367354dae nspawn: resolv.conf might not be created initially (#4799)
This might happen that resolv.conf is missing in a minimal rootfs and in this
case the following warning is emitted:

 Failed to mount n/a on /mnt/etc/resolv.conf (MS_BIND ""): No such file or directory

This patch fixes this case.
2016-12-07 21:36:39 +01:00
Lennart Poettering 4623e8e6ac nspawn/dissect: automatically discover dm-verity verity partitions
This adds support for discovering and making use of properly tagged dm-verity
data integrity partitions. This extends both systemd-nspawn and systemd-dissect
with a new --root-hash= switch that takes the root hash to use for the root
partition, and is otherwise fully automatic.

Verity partitions are discovered automatically by GPT table type UUIDs, as
listed in
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
(which I updated prior to this change, to include new UUIDs for this purpose.

mkosi with https://github.com/systemd/mkosi/pull/39 applied may generate images
that carry the necessary integrity data. With that PR and this commit, the
following simply lines suffice to boot up an integrity-protected container image:

```
 # mkdir test
 # cd test
 # mkosi --verity
 # systemd-nspawn -i ./image.raw -bn
```

Note that mkosi writes the image file to "image.raw" next to a a file
"image.roothash" that contains the root hash. systemd-nspawn will look for that
file and use it if it exists, in case --root-hash= is not specified explicitly.
2016-12-07 18:38:41 +01:00
Lennart Poettering 4827ab4854 nspawn: when generating a machine name from an image name, truncate .raw suffix
Let's prettify the machine name we generate for image-based containers: let's
chop off the .raw suffix before using it as machine name.
2016-12-07 18:38:41 +01:00
Lennart Poettering 18b5886e56 dissect: add support for encrypted images
This adds support to the image dissector to deal with encrypted images (only
LUKS). Given that we now have a neatly isolated image dissector codebase, let's
add a new feature to it: support for automatically dealing with encrypted
images. This is then exposed in systemd-dissect and nspawn.

It's pretty basic: only support for passphrase-based encryption.

In order to ensure that "systemd-dissect --mount" results in mount points whose
backing LUKS DM devices are cleaned up automatically we use the DM_DEV_REMOVE
ioctl() directly on the device (in DM_DEFERRED_REMOVE mode). libgcryptsetup at
the moment doesn't provide a proper API for this. Thankfully, the ioctl() API
is pretty easy to use.
2016-12-07 18:38:41 +01:00
Lennart Poettering 2d8457851b nspawn: port nspawn to new generalized image dissection code
Let's make use of the new internal API. This mostly doesn't change anything for
the caller, however, "systemd-nspawn --image=/dev/sda7" works now as the new
code can handle disk images with no partition tables, and make any detected
images directly the root.
2016-12-07 18:38:40 +01:00
Susant Sahani 10452f7c93 core: introduce parse_ip_port (#4825)
1. Listed in TODO.
2. Tree wide replace safe_atou16 with parse_ip_port incase
   it's used for ports.
2016-12-06 12:21:45 +01:00
Evgeny Vereshchagin c9fd987279 nspawn: don't hide --bind=/tmp/* mounts (#4824)
Fixes #4789
2016-12-05 18:14:05 +01:00
Lennart Poettering cb638b5e96 util-lib: rename CHASE_NON_EXISTING → CHASE_NONEXISTENT
As suggested by @keszybz
2016-12-01 12:49:55 +01:00
Lennart Poettering ec57bd426a nspawn: improve log messages
When complaining about the inability to resolve a path, show the full path, not
just the relative one.

As suggested by @keszybz.
2016-12-01 12:41:18 +01:00
Lennart Poettering c7a4890ce4 nspawn: optionally, automatically allocated --bind=/--overlay source from /var/tmp
This extends the --bind= and --overlay= syntax so that an empty string as source/upper
directory is taken as request to automatically allocate a temporary directory
below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination
with the "+" path extension this permits a switch "--overlay=+/var::/var" in
order to use the container's shipped /var, combine it with a writable temporary
directory and mount it to the runtime /var of the container.
2016-12-01 12:41:18 +01:00
Lennart Poettering 86c0dd4a71 nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"
If a source path is prefixed with "+" it is taken relative to the container's
root directory instead of the host. This permits easily establishing bind and
overlay mounts based on data from the container rather than the host.

This also reworks custom_mounts_prepare(), and turns it into two functions: one
custom_mount_check_all() that remains in nspawn.c but purely verifies the
validity of the custom mounts configured. And one called
custom_mount_prepare_all() that actually does the preparation step, sorts the
custom mounts, resolves relative paths, and allocates temporary directories as
necessary.
2016-12-01 12:41:18 +01:00
Lennart Poettering e28c7cd066 tree-wide: set SA_RESTART for signal handlers we install
We already set it in most cases, but make sure to set it in all others too, and
document that that's a good idea.
2016-12-01 12:41:17 +01:00
Lennart Poettering 7b4318b6a5 nspawn: add ability to configure overlay mounts to .nspawn files
Fixes: #4634
2016-12-01 12:41:17 +01:00
Lennart Poettering ad85779a50 nspawn: split out overlayfs argument parsing into a function of its own
Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and
bind_mount_parse().
2016-12-01 00:25:51 +01:00
Lennart Poettering 48cbe5f80b nspawn: use -ENOMEM instead of log_oom() in one case
The function is of the "library" kind and doesn't log ENOMEM in all other
cases, hence fix the one outlier.
2016-12-01 00:25:51 +01:00
Lennart Poettering 8d4aa2bb32 nspawn: make use of CHASE_NON_EXISTING when locking image
If --template= is used on an image, then the image might not exist initially.
We can use CHASE_NON_EXISTING to properly lock the image already before it
exists. Let's do so.
2016-12-01 00:25:51 +01:00
Lennart Poettering 8ce48cf0f8 nspawn: use the new CHASE_NON_EXISTING flag when resolving mount points
This restores the ability to implicitly create files/directories to mount
specified mount points on.
2016-12-01 00:25:51 +01:00
Lennart Poettering c4f4fce79e fs-util: add flags parameter to chase_symlinks()
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
2016-12-01 00:25:51 +01:00
Lennart Poettering 68cf43c315 nspawn: use chase_symlinks() on all paths specified via --tmpfs=, --bind= and so on
Fixes: #2860
2016-12-01 00:25:51 +01:00
Lennart Poettering 4da92e5857 nspawn: coding style: don't mix variable declarations and function calls 2016-12-01 00:25:51 +01:00
Lennart Poettering 5639193139 nspawn: use realloc_multiply() where it makes sense 2016-12-01 00:25:51 +01:00
Lennart Poettering 8cd328d82e nspawn: accept --ephemeral --template= as alternative for --ephemeral --directory=
As suggested in PR #3667.

This PR simply ensures that --template= can be used as alternative to
--directory= when --ephemeral is used, following the logic that for ephemeral
options the source directory is actually a template.

This does not deprecate usage of --directory= with --ephemeral, as I am not
convinced the old logic wouldn't make sense.

Fixes: #3667
2016-12-01 00:25:51 +01:00
Lennart Poettering 3f342ec4b0 nspawn: properly handle image/directory paths that are symlinks
This resolves any paths specified on --directory=, --template=, and --image=
before using them. This makes sure nspawn can be used correctly on symlinked
images and directory trees.

Fixes: #2001
2016-12-01 00:25:51 +01:00
Lennart Poettering e187369587 tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
2016-12-01 00:25:51 +01:00
Lennart Poettering acbbf69b71 nspawn: don't require chown() if userns is not on
Fixes: #4711
2016-11-22 13:35:24 +01:00
Lennart Poettering 17cbb288fa nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.

This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.

Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
2016-11-22 13:35:09 +01:00
Lennart Poettering c67b008273 nspawn: remove temporary root directory on exit
When mountint a loopback image, we need a temporary root directory we can mount
stuff to. Make sure to actually remove it when exiting, so that we don't leave
stuff around in /tmp unnecessarily.

See: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering 6a0f896b97 nspawn: try to wait for the container PID 1 to exit, before we exit
Let's make the shutdown logic synchronous, so that there's a better chance to
detach the loopback device after use.
2016-11-22 13:35:09 +01:00
Lennart Poettering 0f3be6ca4d nspawn: support ephemeral boots from images
Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.

As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.

Fixes: #4664
2016-11-22 13:35:09 +01:00
Lennart Poettering f4ff4aa800 Merge pull request #4395 from s-urbaniak/rw-support
nspawn: R/W support for /sysfs, /proc, and /proc/sys/net
2016-11-18 12:36:46 +01:00
Sergiusz Urbaniak 4f086aab52
nspawn: R/W support for /sys, and /proc/sys
This commit adds the possibility to leave /sys, and /proc/sys read-write.
It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE
to enable this feature.

If set to "yes", /sys, and /proc/sys will be read-write.
If set to "no", /sys, and /proc/sys will be read-only.
If set to "network" /proc/sys/net will be read-write. This is useful in
use-cases, where systemd-nspawn is used in an external network
namespace.

This adds the possibility to start privileged containers which need more
control over settings in the /proc, and /sys filesystem.

This is also a follow-up on the discussion from
https://github.com/systemd/systemd/pull/4018#r76971862 where an
introduction of a simple env var to enable R/W support for those
directories was already discussed.
2016-11-18 09:50:40 +01:00
Zbigniew Jędrzejewski-Szmek 2a49b6120f nspawn: restart the whole systemd-nspawn@.service unit on container reboot (#4613)
Since 133 is now used in a few places, add a #define for it.
Also make the status message a bit informative.

Another issue introduced in b006762. The logic was borked, we were supposed
to return 0 to break the loop, and 133 to restart the container, not the other
way around.

But this doesn't seem to work, reboot fails with:
Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists
So actually the version before this patch worked better, since 133 > 0 and we'd
at least loop internally.
2016-11-14 11:49:49 +01:00
Christian Hesse 7debb05dbe nspawn: fix condition for mounting resolv.conf (#4622)
The file /usr/lib/systemd/resolv.conf can be stale, it does not tell us
whether or not systemd-resolved is running or not.
So check for /run/systemd/resolve/resolv.conf as well, which is created
at runtime and hence is a better indication.
2016-11-08 22:01:26 -05:00
Zbigniew Jędrzejewski-Szmek a809cee582 Merge pull request #4612 from keszybz/format-strings
Format string tweaks (and a small fix on 32bit)
2016-11-08 08:09:40 -05:00
Martin Pitt cfed63f60d nspawn: fix exit code for --help and --version (#4609)
Commit b006762 inverted the initial exit code which is relevant for --help and
--version without a particular reason.  For these special options, parse_argv()
returns 0 so that our main() immediately skips to the end without adjusting
"ret". Otherwise, if an actual container is being started, ret is set on error
in run(), which still provides the "non-zero exit on error" behaviour.

Fixes #4605.
2016-11-07 23:31:55 -05:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
tblume bdb4e0cb64 systemd-nspawn: decrease non-fatal mount errors to debug level (#4569)
non-fatal mount errors shouldn't be logged as warnings.
2016-11-07 08:20:43 -05:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Lennart Poettering 2bce2acce8 nspawn: if we set up a loopback device, try to mount it with "discard"
Let's make sure that our loopback files remain sparse, hence let's set
"discard" as mount option on file systems that support it if the backing device
is a loopback.
2016-11-02 11:39:49 -06:00
Lennart Poettering 8d7b0c8fd7 seccomp: add new seccomp_init_conservative() helper
This adds a new seccomp_init_conservative() helper call that is mostly just a
wrapper around seccomp_init(), but turns off NNP and adds in all secondary
archs, for best compatibility with everything else.

Pretty much all of our code used the very same constructs for these three
steps, hence unifying this in one small function makes things a lot shorter.

This also changes incorrect usage of the "scmp_filter_ctx" type at various
places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type
(pretty poor choice already!) that casts implicitly to and from all other
pointer types (even poorer choice: you defined a confusing type now, and don't
even gain any bit of type safety through it...). A lot of the code assumed the
type would refer to a structure, and hence aded additional "*" here and there.
Remove that.
2016-10-24 17:32:50 +02:00
Evgeny Vereshchagin 6d66bd3b2a nspawn: become a new root early
036d523641

> vfs: Don't create inodes with a uid or gid unknown to the vfs
  It is expected that filesystems can not represent uids and gids from
  outside of their user namespace.  Keep things simple by not even
  trying to create filesystem nodes with non-sense uids and gids.

So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955

$ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target

Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide.
Press ^] three times within 1s to kill container.
Child died too early.
Selected user namespace base 1073283072 and range 65536.
Failed to mount to /sys/fs/cgroup/systemd: No such file or directory

Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519
Fixes: #4352
2016-10-23 23:23:42 -04:00
Evgeny Vereshchagin 63eae72312 nspawn: really lchown(uid/gid)
https://github.com/systemd/systemd/pull/4372#issuecomment-253723849:

* `mount_all (outer_child)` creates `container_dir/sys/fs/selinux`
* `mount_all (outer_child)` doesn't patch `container_dir/sys/fs` and so on.
* `mount_sysfs (inner_child)` tries to create `/sys/fs/cgroup`
* This fails

370   stat("/sys/fs", {st_dev=makedev(0, 28), st_ino=13880, st_mode=S_IFDIR|0755, st_nlink=3, st_uid=65534, st_gid=65534, st_blksize=4096, st_blocks=0, st_size=60, st_atime=2016/10/14-05:16:43.398665943, st_mtime=2016/10/14-05:16:43.399665943, st_ctime=2016/10/14-05:16:43.399665943}) = 0
370   mkdir("/sys/fs/cgroup", 0755)     = -1 EACCES (Permission denied)

* `mount_syfs (inner_child)` ignores that error and

mount(NULL, "/sys", NULL, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = 0

* `mount_cgroups` finally fails
2016-10-23 23:23:40 -04:00
Zbigniew Jędrzejewski-Szmek 9aa2169eae nspawn: use the return value from asprintf instead of checking the pointer
If allocation fails, the value of the point is "undefined". In practice
this matters very little, but for consistency with rest of the code,
let's check the return value.
2016-10-23 11:57:55 -04:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 24597ee0e6 nspawn, NEWS: add missing "s" in --private-users-chown (#4438) 2016-10-21 06:03:26 +03:00
Zbigniew Jędrzejewski-Szmek 6b430fdb7c tree-wide: use mfree more 2016-10-16 23:35:39 -04:00
Thomas H. P. Andersen 5c4624e082 nspawn: remove unused variable (#4369) 2016-10-14 00:30:28 +03:00
Evgeny Vereshchagin f0bef277a4 nspawn: cleanup and chown the synced cgroup hierarchy (#4223)
Fixes: #4181
2016-10-13 09:50:46 -04:00
Lennart Poettering 18e51a022c Merge pull request #4351 from keszybz/nspawn-debugging
Enhance nspawn debug logs for mount/unmount operations
2016-10-12 11:21:11 +02:00
Evgeny Vereshchagin 8492849ee5 nspawn: let's mount(/tmp) inside the user namespace (#4340)
Fixes:
host# systemd-nspawn -D ... -U -b systemd.unit=multi-user.target
...
$ grep /tmp /proc/self/mountinfo
154 145 0:41 / /tmp rw - tmpfs tmpfs rw,seclabel,uid=1036124160,gid=1036124160

$ umount /tmp
umount: /root/tmp: not mounted

$ systemctl poweroff
...
[FAILED] Failed unmounting Temporary Directory.
2016-10-11 17:18:27 -04:00
Zbigniew Jędrzejewski-Szmek 60e76d4897 nspawn,mount-util: add [u]mount_verbose and use it in nspawn
This makes it easier to debug failed nspawn invocations:

Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11 16:50:07 -04:00
Zbigniew Jędrzejewski-Szmek add554f4e1 nspawn: small cleanups in get_controllers()
- check for oom after strdup
- no need to truncate the line since we're only extracting one field anyway
- use STR_IN_SET
2016-10-11 16:46:58 -04:00
Zbigniew Jędrzejewski-Szmek ada5412039 nspawn: simplify arg_us_cgns passing
We would check the condition cg_ns_supported() twice. No functional
change.
2016-10-11 16:46:58 -04:00
Lennart Poettering 6dca2fe325 Merge pull request #4332 from keszybz/nspawn-arguments-3
nspawn --private-users parsing, v2
2016-10-10 19:51:51 +02:00
Evgeny Vereshchagin a0f72a24e0 Merge pull request #4310 from keszybz/nspawn-autodetect
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10 20:47:25 +03:00
Zbigniew Jędrzejewski-Szmek be7157316c nspawn: better error messages for parsing errors
In particular, the check for arg_uid_range <= 0 is moved to the end, so that
"foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek ae209204d8 nspawn,man: fix parsing of numeric args for --private-users, accept any boolean
This is like the previous reverted commit, but any boolean is still accepted,
not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek 6c2058b35e Revert "nspawn: fix parsing of numeric arguments for --private-users"
This reverts commit bfd292ec35.
2016-10-10 11:17:40 -04:00
Zbigniew Jędrzejewski-Szmek bfd292ec35 nspawn: fix parsing of numeric arguments for --private-users
The documentation says lists "yes", "no", "pick", and numeric arguments.
But parse_boolean was attempted first, so various numeric arguments were
misinterpreted.

In particular, this fixes --private-users=0 to mean the same thing as
--private-users=0:65536.

While at it, use strndupa to avoid some error handling.
Also give a better error for an empty UID range. I think it's likely that
people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09 11:52:35 -04:00
Zbigniew Jędrzejewski-Szmek 27eb8e9028 nspawn: reindent table 2016-10-09 11:51:18 -04:00
Zbigniew Jędrzejewski-Szmek a8725a06e6 nspawn: also fall back to legacy cgroup hierarchy for old containers
Current systemd version detection routine cannot detect systemd 230,
only systmed >= 231. This means that we'll still use the legacy hierarchy
in some cases where we wouldn't have too. If somebody figures out a nice
way to detect systemd 230 this can be later improved.
2016-10-08 19:03:53 -04:00
Zbigniew Jędrzejewski-Szmek 0fd9563fde nspawn: use mixed cgroup hierarchy only when container has new systemd
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy.
So make an educated guess, and use the mixed hierarchy in that case.

Tested by running the host with mixed hierarchy (i.e. simply using a recent
kernel with systemd from git), and booting first a container with older systemd,
and then one with a newer systemd.

Fixes #4008.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 27e29a1e43 nspawn: fix spurious reboot if container process returns 133 2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek b006762524 nspawn: move the main loop body out to a new function
The new function has 416 lines by itself!

"return log_error_errno" is used to nicely reduce the volume of error
handling code.

A few minor issues are fixed on the way:
- positive value was used as error value (EIO), causing systemd-nspawn
  to return success, even though it shouldn't.
- In two places random values were used as error status, when the
  actual value was in an unusual place (etc_password_lock, notify_socket).

Those are the only functional changes.

There is another potential issue, which is marked with a comment, and left
unresolved: the container can also return 133 by itself, causing a spurious
reboot.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 98afd6af3a nspawn: check env var first, detect second
If we are going to use the env var to override the detection result
anyway, there is not point in doing the detection, especially that
it can fail.
2016-10-08 14:48:41 -04:00
Lennart Poettering 7429b2eb83 tree-wide: drop some misleading compiler warnings
gcc at some optimization levels thinks thes variables were used without
initialization. it's wrong, but let's make the message go anyway.
2016-10-06 19:04:10 +02:00
Djalal Harouni 41eb436265 nspawn: add log message to let users know that nspawn needs an empty /dev directory (#4226)
Fixes https://github.com/systemd/systemd/issues/3695

At the same time it adds a protection against userns chown of inodes of
a shared mount point.
2016-10-05 06:57:02 +02:00
Alban Crequy 19caffac75 nspawn: set shared propagation mode for the container 2016-10-03 14:19:27 +02:00
Evgeny Vereshchagin cc238590e4 Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-28 04:50:30 +03:00
Torstein Husebø d23a0044a3 treewide: fix typos (#4217) 2016-09-26 11:32:47 +02:00
Lennart Poettering 920a7899de nspawn: let's mount /proc/sysrq-trigger read-only by default
LXC does this, and we should probably too. Better safe than sorry.
2016-09-25 10:42:18 +02:00
Lennart Poettering 6b7c9f8bce namespace: rework how ReadWritePaths= is applied
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths=
specification, then we'd first recursively apply the ReadOnlyPaths= paths, and
make everything below read-only, only in order to then flip the read-only bit
again for the subdirs listed in ReadWritePaths= below it.

This is not only ugly (as for the dirs in question we first turn on the RO bit,
only to turn it off again immediately after), but also problematic in
containers, where a container manager might have marked a set of dirs read-only
and this code will undo this is ReadWritePaths= is set for any.

With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be
applied to the children listed in ReadWritePaths= in the first place, so that
we do not need to turn off the RO bit for those after all.

This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the
RO bit, but never to turn it off again. Or to say this differently: if some
dirs are marked read-only via some external tool, then ReadWritePaths= will not
undo it.

This is not only the safer option, but also more in-line with what the man page
currently claims:

        "Entries (files or directories) listed in ReadWritePaths= are
        accessible from within the namespace with the same access rights as
        from outside."

To implement this change bind_remount_recursive() gained a new "blacklist"
string list parameter, which when passed may contain subdirs that shall be
excluded from the read-only mounting.

A number of functions are updated to add more debug logging to make this more
digestable.
2016-09-25 10:40:51 +02:00
Luca Bruno 48a8d337a6 nspawn: decouple --boot from CLONE_NEWIPC (#4180)
This commit is a minor tweak after the split of `--share-system`, decoupling the `--boot`
option from IPC namespacing.

Historically there has been a single `--share-system` option for sharing IPC/PID/UTS with the
host, which was incompatible with boot/pid1 mode. After the split, it is now possible to express
the requirements with better granularity.

For reference, this is a followup to #4023 which contains references to previous discussions.
I realized too late that CLONE_NEWIPC is not strictly needed for boot mode.
2016-09-24 08:30:42 -04:00
Michael Pope 21dc02277d nspawn: fix comment typo in setup_timezone example (#4183) 2016-09-20 07:30:48 +02:00
Michael Pope 0b493a0263 nspawn: clarify log warning for /etc/localtime not being a symbolic link (#4163) 2016-09-17 09:59:28 +02:00
Felipe Sateler 1cec406d62 nspawn: detect SECCOMP availability, skip audit filter if unavailable
Fail hard if SECCOMP was detected but could not be installed
2016-09-06 20:25:49 -03:00
Luca Bruno 0c582db0c6 nspawn: split down SYSTEMD_NSPAWN_SHARE_SYSTEM (#4023)
This commit follows further on the deprecation path for --share-system,
by splitting and gating each share-able namespace behind its own
environment flag.
2016-08-26 00:08:26 +02:00
Zbigniew Jędrzejewski-Szmek 2056ec1927 Merge pull request #3965 from htejun/systemd-controller-on-unified 2016-08-19 19:58:01 -04:00
Lennart Poettering 8673cf13c0 bus-util: unify loop around bus_append_unit_property_assignment()
This is done exactly the same way a couple of times at various places, let's
unify this into one version.
2016-08-18 22:23:31 +02:00
Tejun Heo 5da38d0768 core: use the unified hierarchy for the systemd cgroup controller hierarchy
Currently, systemd uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, systemd uses a named legacy hierarchy
mounted on /sys/fs/cgroup/systemd without any kernel controllers for process
management.  Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.

Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for systemd to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies.  It can simply mount the unified hierarchy under
/sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test.  In time, this would also allow deleting the said
complexities.

This patch updates systemd so that it prefers the unified hierarchy for the
systemd cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.

* cg_unified(@controller) is introduced which tests whether the specific
  controller in on unified hierarchy and used to choose the unified hierarchy
  code path for process and service management when available.  Kernel
  controller specific operations remain gated by cg_all_unified().

* "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to
  force the use of legacy hierarchy for systemd cgroup controller.

* nspawn: By default nspawn uses the same hierarchies as the host.  If
  UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all.  If
  0, legacy for all.

* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
  three options - legacy, only systemd controller on unified, and unified.  The
  value is passed into mount setup functions and controls cgroup configuration.

* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
  option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
  appropriate action depending on the configuration of the host.

v2: - CGroupUnified enum replaces open coded integer values to indicate the
      cgroup operation mode.
    - Various style updates.

v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.

v4: Restored legacy container on unified host support and fixed another bug in
    detect_unified_cgroup_hierarchy().
2016-08-17 17:44:36 -04:00
Tejun Heo ca2f6384aa core: rename cg_unified() to cg_all_unified()
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies.  This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is.  In preparation, this patch renames cg_unified() to
cg_all_unified().

This patch doesn't cause any functional changes.
2016-08-15 18:13:36 -04:00
Lennart Poettering 07a1734a13 Merge pull request #3885 from keszybz/help-output
Update help for "short-full" and shorten to 80 columns
2016-08-04 16:11:38 +02:00
Zbigniew Jędrzejewski-Szmek 90b4a64d77 nspawn,resolve: short --help output to fit within 80 columns
make dist-check-help FTW!
2016-08-04 09:03:42 -04:00
Lennart Poettering f7b7b3df9e nspawn: if we can't mark the boot ID RO let's fail
It's probably better to be safe here.
2016-08-03 14:52:16 +02:00
Lennart Poettering a6b5216c7c nspawn: deprecate --share-system support
This removes the --share-system switch: from the documentation, the --help text
as well as the command line parsing. It's an ugly option, given that it kinda
contradicts the whole concept of PID namespaces that nspawn implements. Since
it's barely ever used, let's just deprecate it and remove it from the options.

It might be useful as a debugging option, hence the functionality is kept
around for now, exposed via an undocumented $SYSTEMD_NSPAWN_SHARE_SYSTEM
environment variable.
2016-08-03 14:52:16 +02:00
Lennart Poettering 3539724c26 nspawn: try to bind mount resolved's resolv.conf snippet into the container
This has the benefit that the container can follow the host's DNS server
changes without us having to constantly update the container's resolv.conf
settings.
2016-08-03 14:52:16 +02:00
Christian Brauner 5a8ff0e61d nspawn: add SYSTEMD_NSPAWN_USE_CGNS env variable (#3809)
SYSTEMD_NSPAWN_USE_CGNS allows to disable the use of cgroup namespaces.
2016-07-26 16:49:15 +02:00
Zbigniew Jędrzejewski-Szmek e28973ee18 Merge pull request #3757 from poettering/efi-search 2016-07-25 16:34:18 -04:00
Lennart Poettering 1a0b98c437 Merge pull request #3589 from brauner/cgroup_namespace
Cgroup namespace
2016-07-25 22:23:00 +02:00
Zbigniew Jędrzejewski-Szmek 476b8254d9 nspawn: don't skip cleanup on locking error 2016-07-22 21:25:09 -04:00
Zbigniew Jędrzejewski-Szmek d710aaf7a5 Use "return log_error_errno" in more places" 2016-07-22 21:25:09 -04:00
Zbigniew Jędrzejewski-Szmek 31b14fdb6f Merge pull request #3777 from poettering/id128-rework
uuid/id128 code rework
2016-07-22 21:18:41 -04:00
Alessandro Puccetti 54cd6556b3 nspawn: set DevicesPolicy closed and clean up duplicated devices 2016-07-22 16:08:26 +02:00
Lennart Poettering 15b1248a6b machine-id-setup: port machine_id_commit() to new id128-util.c APIs 2016-07-22 12:59:36 +02:00
Lennart Poettering 317feb4d9f nspawn: rework /etc/machine-id handling
With this change we'll no longer write to /etc/machine-id from nspawn, as that
breaks the --volatile= operation, as it ensures the image is never considered
in "first boot", since that's bound to the pre-existance of /etc/machine-id.

The new logic works like this:

- If /etc/machine-id already exists in the container, it is read by nspawn and
  exposed in "machinectl status" and friends.

- If the file doesn't exist yet, but --uuid= is passed on the nspawn cmdline,
  this UUID is passed in $container_uuid to PID 1, and PID 1 is then expected
  to persist this to /etc/machine-id for future boots (which systemd already
  does).

- If the file doesn#t exist yet, and no --uuid= is passed a random UUID is
  generated and passed via $container_uuid.

The result is that /etc/machine-id is never initialized by nspawn itself, thus
unbreaking the volatile mode. However still the machine ID configured in the
machine always matches nspawn's and thus machined's idea of it.

Fixes: #3611
2016-07-22 12:59:36 +02:00
Lennart Poettering 691675ba9f nspawn: rework machine/boot ID handling code to use new calls from id128-util.[ch] 2016-07-22 12:59:36 +02:00
Lennart Poettering 910fd145f4 sd-id128: split UUID file read/write code into new id128-util.[ch]
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.

The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().

In follow-up patches we can use this reduce the code in nspawn and
machine-id-setup by adopted the common implementation.
2016-07-22 12:59:36 +02:00
Lennart Poettering 3bbaff3e08 tree-wide: use sd_id128_is_null() instead of sd_id128_equal where appropriate
It's a bit easier to read because shorter. Also, most likely a tiny bit faster.
2016-07-22 12:38:08 +02:00
Martin Pitt 5c3c778014 Merge pull request #3764 from poettering/assorted-stuff-2
Assorted fixes
2016-07-22 09:10:04 +02:00
Alessandro Puccetti 31d28eabc1 nspawn: enable major=0/minor=0 devices inside the container (#3773)
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inacessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
2016-07-21 17:39:38 +02:00
Lennart Poettering a6bc7db980 nspawn: if an ESP is part of the disk image to operate on, mount it to /efi or /boot
Matching the behaviour of gpt-auto-generator, if we find an ESP while
dissecting a container image, mount it to /efi or /boot if those dirs exist and
are empty.

This should enable us to run "bootctl" inside a container and do the right
thing.
2016-07-21 11:10:35 +02:00
Lennart Poettering 1ddc1272e7 nspawn: when netns is on, mount /proc/sys/net writable
Normally we make all of /proc/sys read-only in a container, but if we do have
netns enabled we can make /proc/sys/net writable, as things are virtualized
then.
2016-07-20 14:53:15 +02:00
Lennart Poettering 065d31c360 nspawn: document why the uid shift range is the way it is 2016-07-20 14:53:15 +02:00
Thomas Hindoe Paaboel Andersen ba19c6e181 treewide: remove unused variables 2016-07-18 22:32:08 +02:00
tblume 201b13c81e nspawn: decrease mkdir error logging in /sys to debug priority (#3748)
Such mkdir errors happen for example when trying to mkdir /sys/fs/selinux.

/sys is documented to be readonly in the container, so mkdir errors below /sys
can be expected.
They shouldn't be logged as warnings since they lead users to think that
there is something wrong.
2016-07-18 12:23:08 +02:00
Zbigniew Jędrzejewski-Szmek 2ed968802c tree-wide: get rid of selinux_context_t (#3732)
9eb9c93275
deprecated selinux_context_t. Replace with a simple char* everywhere.

Alternative fix for #3719.
2016-07-15 18:44:02 +02:00
Michael Biebl 595bfe7df2 Various fixes for typos found by lintian (#3705) 2016-07-12 12:52:11 +02:00
Torstein Husebø 61233823aa treewide: fix typos and remove accidental repetition of words 2016-07-11 16:18:43 +02:00
Christian Brauner 0996ef00fb nspawn: handle cgroup namespaces
(NOTE: Cgroup namespaces work with legacy and unified hierarchies: "This is
completely backward compatible and will be completely invisible to any existing
cgroup users (except for those running inside a cgroup namespace and looking at
/proc/pid/cgroup of tasks outside their namespace.)"
(https://lists.linuxfoundation.org/pipermail/containers/2016-January/036582.html)
So there is no need to special case unified.)

If cgroup namespaces are supported we skip mount_cgroups() in the
outer_child(). Instead, we unshare(CLONE_NEWCGROUP) in the inner_child() and
only then do we call mount_cgroups().
The clean way to handle cgroup namespaces would be to delegate mounting of
cgroups completely to the init system in the container. However, this would
likely break backward compatibility with the UNIFIED_CGROUP_HIERARCHY flag of
systemd-nspawn. Also no cgroupfs would be mounted whenever the user simply
requests a shell and no init is available to mount cgroups. Hence, we introduce
mount_legacy_cgns_supported(). After calling unshare(CLONE_NEWCGROUP) it parses
/proc/self/cgroup to find the mounted controllers and mounts them inside the
new cgroup namespace. This should preserve backward compatibility with the
UNIFIED_CGROUP_HIERARCHY flag and mount a cgroupfs when no init in the
container is running.
2016-07-09 06:34:11 +02:00
Torstein Husebø 6dd6a9c493 treewide: fix typos 2016-07-04 17:10:23 +02:00
Lennart Poettering 0c6aeb4609 nspawn: fix uid patching logic (#3599)
An incorrectly set if/else chain caused aus to apply the access mode of a
symlink to the directory it is located in. Yuck.

Fixes: #3547
2016-06-25 07:04:43 +03:00
Lennart Poettering 54a17e01de nspawn: lock down system call filter a bit
Let's block access to the kernel keyring and a number of obsolete system calls.
Also, update list of syscalls that may alter the system clock, and do raw IO
access. Filter ptrace() if CAP_SYS_PTRACE is not passed to the container and
acct() if CAP_SYS_PACCT is not passed.

This also changes things so that kexec(), some profiling calls, the swap calls
and quotactl() is never available to containers, not even if CAP_SYS_ADMIN is
passed. After all we currently permit CAP_SYS_ADMIN to containers by default,
but these calls should not be available, even then.
2016-06-13 16:25:54 +02:00
Lennart Poettering 50b52222f2 nspawn: order caps to retain alphabetically 2016-06-13 16:25:54 +02:00
Alessandro Puccetti 9c1e04d0fa nspawn: introduce --notify-ready=[no|yes] (#3474)
This the patch implements a notificaiton mechanism from the init process
in the container to systemd-nspawn.
The switch --notify-ready=yes configures systemd-nspawn to wait the "READY=1"
message from the init process in the container to send its own to systemd.
--notify-ready=no is equivalent to the previous behavior before this patch,
systemd-nspawn notifies systemd with a "READY=1" message when the container is
created. This notificaiton mechanism uses socket file with path relative to the contanier
"/run/systemd/nspawn/notify". The default values it --notify-ready=no.
It is also possible to configure this mechanism from the .nspawn files using
NotifyReady. This parameter takes the same options of the command line switch.

Before this patch, systemd-nspawn notifies "ready" after the inner child was created,
regardless the status of the service running inside it. Now, with --notify-ready=yes,
systemd-nspawn notifies when the service is ready. This is really useful when
there are dependencies between different contaniers.

Fixes https://github.com/systemd/systemd/issues/1369
Based on the work from https://github.com/systemd/systemd/pull/3022

Testing:
Boot a OS inside a container with systemd-nspawn.
Note: modify the commands accordingly with your filesystem.

1. Create a filesystem where you can boot an OS.
2. sudo systemd-nspawn -D ${HOME}/distros/fedora-23/ sh
2.1. Create the unit file /etc/systemd/system/sleep.service inside the container
     (You can use the example below)
2.2. systemdctl enable sleep
2.3 exit
3. sudo systemd-run --service-type=notify --unit=notify-test
   ${HOME}/systemd/systemd-nspawn --notify-ready=yes
   -D ${HOME}/distros/fedora-23/ -b
4. In a different shell run "systemctl status notify-test"

When using --notify-ready=yes the service status is "activating" for 20 seconds
before being set to "active (running)". Instead, using --notify-ready=no
the service status is marked "active (running)" quickly, without waiting for
the 20 seconds.

This patch was also test with --private-users=yes, you can test it just adding it
at the end of the command at point 3.

------ sleep.service ------
[Unit]
Description=sleep
After=network.target

[Service]
Type=oneshot
ExecStart=/bin/sleep 20

[Install]
WantedBy=multi-user.target
------------ end ------------
2016-06-10 13:09:06 +02:00
Michael Karcher 8869a0b40b util-lib: Add sparc64 support for process creation (#3348)
The current raw_clone function takes two arguments, the cloning flags and
a pointer to the stack for the cloned child. The raw cloning without
passing a "thread main" function does not make sense if a new stack is
specified, as it returns in both the parent and the child, which will fail
in the child as the stack is virgin. All uses of raw_clone indeed pass NULL
for the stack pointer which indicates that both processes should share the
stack address (so you better don't pass CLONE_VM).

This commit refactors the code to not require the caller to pass the stack
address, as NULL is the only sensible option. It also adds the magic code
needed to make raw_clone work on sparc64, which does not return 0 in %o0
for the child, but indicates the child process by setting %o1 to non-zero.
This refactoring is not plain aesthetic, because non-NULL stack addresses
need to get mangled before being passed to the clone syscall (you have to
apply STACK_BIAS), whereas NULL must not be mangled. Implementing the
conditional mangling of the stack address would needlessly complicate the
code.

raw_clone is moved to a separete header, because the burden of including
the assert machinery and sched.h shouldn't be applied to every user of
missing_syscalls.h
2016-05-29 20:03:51 -04:00
Djalal Harouni 520e0d541f nspawn: rename arg_retain to arg_caps_retain
The argument is about capabilities.
2016-05-26 22:43:34 +02:00
Djalal Harouni f011b0b87a nspawn: split out seccomp call into nspawn-seccomp.[ch]
Split seccomp into nspawn-seccomp.[ch]. Currently there are no changes,
but this will make it easy in the future to share or use the seccomp logic
from systemd core.
2016-05-26 22:42:29 +02:00
Djalal Harouni 231bfb1b02 nspawn: rename is_procfs_sysfs_or_suchlike() to is_fs_fully_userns_compatible()
Rename is_procfs_sysfs_or_suchlike() to is_fs_fully_userns_compatible()
to give it the real meaning. This may prevent future modifications that
may introduce bugs.
2016-05-26 22:39:34 +02:00
Djalal Harouni 87c05f365d nspawn: a bench of special fileystems that should not be shifted
Add some special filesystems that should not be shifted, most of them
relate to the host and not to containers.
2016-05-26 22:38:25 +02:00
Zbigniew Jędrzejewski-Szmek b5a2179b10 nspawn: remove unreachable return statement (#3320) 2016-05-22 13:02:41 +02:00
Lennart Poettering 2099b3e993 nspawn: drop spurious newline 2016-05-12 20:14:58 +02:00
Lennart Poettering 7513c5b89f nspawn: only remove veth links we created ourselves
Let's make sure we don't remove veth links that existed before nspawn was
invoked.

https://github.com/systemd/systemd/pull/3209#discussion_r62439999
2016-05-09 15:45:31 +02:00
Lennart Poettering d31645adef tree-wide: port more code to use ifname_valid() 2016-05-09 15:45:31 +02:00
Lennart Poettering 22b28dfdc7 nspawn: add new --network-zone= switch for automatically managed bridge devices
This adds a new concept of network "zones", which are little more than bridge
devices that are automatically managed by nspawn: when the first container
referencing a bridge is started, the bridge device is created, when the last
container referencing it is removed the bridge device is removed again. Besides
this logic --network-zone= is pretty much identical to --network-bridge=.

The usecase for this is to make it easy to run multiple related containers
(think MySQL in one and Apache in another) in a common, named virtual Ethernet
broadcast zone, that only exists as long as one of them is running, and fully
automatically managed otherwise.
2016-05-09 15:45:31 +02:00
Lennart Poettering ef76dff225 util-lib: add new ifname_valid() call that validates interface names
Make use of this in nspawn at a couple of places. A later commit should port
more code over to this, including networkd.
2016-05-09 15:45:31 +02:00
Zbigniew Jędrzejewski-Szmek 5ab1cef0db Merge pull request #3111 from poettering/nspawn-remove-veth 2016-05-03 13:53:00 -04:00
Zbigniew Jędrzejewski-Szmek c29f959b44 Revert "nspawn: explicitly remove veth links after use (#3111)"
This reverts commit d2773e59de.

Merge got squashed by mistake.
2016-05-03 13:53:00 -04:00
Evgeny Vereshchagin e192a2815e nspawn: convert uuid to string (#3146)
Fixes:
cp /etc/machine-id /var/tmp/systemd-test.HccKPa/nspawn-root/etc
systemd-nspawn -D /var/tmp/systemd-test.HccKPa/nspawn-root --link-journal host -b
...
Host and machine ids are equal (P�S!V): refusing to link journals
2016-04-29 10:38:35 +02:00
Evgeny Vereshchagin 5aa3eba50c nspawn: initialize the veth_name (#3141)
Fixes:
$ systemd-nspawn -h
...
Failed to remove veth interface ����: Operation not permitted

This is a follow-up for d2773e59de
2016-04-28 19:48:17 +02:00
Lennart Poettering d7fe83bbc2 Merge pull request #3093 from poettering/nspawn-userns-magic
nspawn automatic user namespaces
2016-04-26 14:57:04 +02:00
Lennart Poettering d2773e59de nspawn: explicitly remove veth links after use (#3111)
* sd-netlink: permit RTM_DELLINK messages with no ifindex

This is useful for removing network interfaces by name.

* nspawn: explicitly remove veth links we created after use

Sometimes the kernel keeps veth links pinned after the namespace they have been
joined to died. Let's hence explicitly remove veth links after use.

Fixes: #2173
2016-04-25 17:36:51 +02:00
Lennart Poettering ef3b2aa7a1 nspawn: explicitly remove veth links we created after use
Sometimes the kernel keeps veth links pinned after the namespace they have been
joined to died. Let's hence explicitly remove veth links after use.

Fixes: #2173
2016-04-25 13:44:24 +02:00
Lennart Poettering 4aeb20f5aa nspawn: when readjusting UID/GID ownership of OS trees, skip read-only subtrees
This should allow tools like rkt to pre-mount read-only subtrees in the OS
tree, without breaking the patching code.

Note that the code will still fail, if the top-level directory is already
read-only.
2016-04-25 12:50:13 +02:00
Lennart Poettering 88cd066e11 nspawn: don't try to patch UIDs/GIDs of procfs and suchlike 2016-04-25 12:50:06 +02:00
Lennart Poettering ccabee0d64 nspawn: make -U a tiny bit smarter
With this change -U will turn on user namespacing only if the kernel actually
supports it and otherwise gracefully degrade to non-userns mode.
2016-04-25 12:16:02 +02:00
Lennart Poettering 0de7accea9 nspawn: allow configuration of user namespaces in .nspawn files
In order to implement this we change the bool arg_userns into an enum
UserNamespaceMode, which can take one of NO, PICK or FIXED, and replace the
arg_uid_range_pick bool with it.
2016-04-25 12:16:02 +02:00
Lennart Poettering 19aac838fc nspawn: add -U as shortcut for --private-users=pick
Given that user namespacing is pretty useful now, let's add a shortcut command
line switch for the logic.
2016-04-25 12:16:02 +02:00
Lennart Poettering 0e7ac7515f nspawn: optionally, automatically allocate a UID/GID range for userns containers
This adds the new value "pick" to --private-users=. When specified a new
UID/GID range of 65536 users is automatically and randomly allocated from the
host range 0x00080000-0xDFFF0000 and used for the container. The setting
implies --private-users-chown, so that container directory is recursively
chown()ed to the newly allocated UID/GID range, if that's necessary. As an
optimization before picking a randomized UID/GID the UID of the container's
root directory is used as starting point and used if currently not used
otherwise.

To protect against using the same UID/GID range multiple times a few mechanisms
are in place:

- The first and the last UID and GID of the range are checked with getpwuid()
  and getgrgid(). If an entry already exists a different range is picked. Note
  that by "last" UID the user 65534 is used, as 65535 is the 16bit (uid_t) -1.

- A lock file for the range is taken in /run/systemd/nspawn-uid/. Since the
  ranges are taken in a non-overlapping fashion, and always start on 64K
  boundaries this allows us to maintain a single lock file for each range that
  can be randomly picked. This protects nspawn from picking the same range in
  two parallel instances.

- If possible the /etc/passwd lock file is taken while a new range is selected
  until the container is up. This means adduser/addgroup should safely avoid
  the range as long as nss-mymachines is used, since the allocated range will
  then show up in the user database.

The UID/GID range nspawn picks from is compiled in and not configurable at the
moment. That should probably stay that way, since we already provide ways how
users can pick their own ranges manually if they don't like the automatic
logic.

The new --private-users=pick logic makes user namespacing pretty useful now, as
it relieves the user from managing UID/GID ranges.
2016-04-25 12:16:02 +02:00
Lennart Poettering 7336138eed nspawn: optionally fix up OS tree uid/gids for userns
This adds a new --private-userns-chown switch that may be used in combination
with --private-userns. If it is passed a recursive chmod() operation is run on
the OS tree, fixing all file owner UID/GIDs to the right ranges. This should
make user namespacing pretty workable, as the OS trees don't need to be
prepared manually anymore.
2016-04-25 12:15:57 +02:00
Thomas H. P. Andersen 0f5e13822d tree-wide: remove unused variables (#3098) 2016-04-22 20:49:07 -04:00
Lennart Poettering 20b1644140 shared: move unit-specific code from bus-util.h to bus-unit-util.h
Previously we'd have generally useful sd-bus utilities in bust-util.h,
intermixed with code that is specifically for writing clients for PID 1,
wrapping job and unit handling. Let's split the latter out and move it into
bus-unit-util.c, to make the sources a bit short and easier to grok.
2016-04-22 16:06:20 +02:00
Zbigniew Jędrzejewski-Szmek ccddd104fc tree-wide: use mdash instead of a two minuses 2016-04-21 23:00:13 -04:00
Zbigniew Jędrzejewski-Szmek a5f1cb3bad nspawn: add -E as alias for --setenv
v2:
- "=" is required, so remove the <optional> tags that v1 added
2016-04-20 09:00:39 -04:00
Lennart Poettering 70a399c43a Merge pull request #3014 from msekletar/nspawn-empty-machine-id-v3
nspawn: always setup machine id (v3)
2016-04-11 17:27:11 +02:00
Michal Sekletar e01ff70a77 nspawn: always setup machine id
We check /etc/machine-id of the container and if it is already populated
we use value from there, possibly ignoring value of --uuid option from
the command line. When dealing with R/O image we setup transient machine
id.

Once we determined machine id of the container, we use this value for
registration with systemd-machined and we also export it via
container_uuid environment variable.

As registration with systemd-machined is done by the main nspawn process
we communicate container machine id established by setup_machine_id from
outer child to the main process by unix domain socket. Similarly to PID
of inner child.
2016-04-11 16:43:16 +02:00
Zbigniew Jędrzejewski-Szmek d929b0f98b nspawn: ignore failure to chdir
CID #1322380.
2016-04-08 21:09:06 -04:00
Bjørnar Ness b97e83cb52 prevent systemd-nspawn from trying to create target
for bind-mounts when they already exist. This allows
bind-mounting over read-only files.
2016-04-01 17:31:55 +02:00
Tejun Heo ab2c3861dc core: update populated event handling in unified hierarchy
Earlier during the development of unified hierarchy, the populated event was
reported through by the dedicated "cgroup.populated" file; however, the
interface was updated so that it's reported through the "populated" field of
"cgroup.events" file.  Update populated event handling logic accordingly.
2016-03-26 12:05:57 -04:00
Alban Crequy 099619957a cgroup2: use new fstype for unified hierarchy
Since Linux v4.4-rc1, __DEVEL__sane_behavior does not exist anymore and
is replaced by a new fstype "cgroup2".

With this patch, systemd no longer supports the old (unstable) way of
doing unified hierarchy with __DEVEL__sane_behavior and systemd now
requires Linux v4.4 for unified hierarchy.

Non-unified hierarchy is still the default and is unchanged by this
patch.

67e9c74b8a
2016-03-26 12:05:29 -04:00
Evgeny Vereshchagin 1c1ea21735 nspawn: don't run nspawn --port=... without libiptc support
We get
$ systemd-nspawn --image /dev/loop1 --port 8080:80 -n -b 3
--port= is not supported, compiled without libiptc support.

instead of a ping-nc-iptables debugging session
2016-03-17 21:07:11 +00:00
Tobias Klauser 998fdc16aa nspawn: Fix two misspellings of "hierarchy" in error messages 2016-03-16 14:34:00 +01:00
Dan Walsh 68b020494d /dev/console must be labeled with SELinux label
If the user specifies an selinux_apifs_context all content created in
the container including /dev/console should use this label.

Currently when this uses the default label it gets labeled user_devpts_t,
which would require us to write a policy allowing container processes to
manage user_devpts_t.  This means that an escaped process would be allowed
to attack all users terminals as well as other container terminals.  Changing
the label to match the apifs_context, means the processes would only be allowed
to manage their specific tty.

This change fixes a problem preventing RKT containers from working with systemd-nspawn.
2016-03-09 11:19:45 -05:00
Vito Caputo 9ed794a32d tree-wide: minor formatting inconsistency cleanups 2016-02-23 14:20:34 -08:00
Vito Caputo 313cefa1d9 tree-wide: make ++/-- usage consistent WRT spacing
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands.  Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
2016-02-22 20:32:04 -08:00
Lennart Poettering 91ba5ac7d0 Merge pull request #2589 from keszybz/resolve-tool-2
Better support of OPENPGPKEY, CAA, TLSA packets and tests
2016-02-13 11:15:41 +01:00
Zbigniew Jędrzejewski-Szmek 75f32f047c Add memcpy_safe
ISO/IEC 9899:1999 §7.21.1/2 says:
Where an argument declared as size_t n specifies the length of the array
for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a call
shall still have valid values, as described in 7.1.4.

In base64_append_width memcpy was called as memcpy(x, NULL, 0).  GCC 4.9
started making use of this and assumes This worked fine under -O0, but
does something strange under -O3.

This patch fixes a bug in base64_append_width(), fixes a possible bug in
journal_file_append_entry_internal(), and makes use of the new function
to simplify the code in other places.
2016-02-11 13:07:02 -05:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
2016-02-10 13:41:57 +01:00
Lennart Poettering 2b26a72816 nspawn: make sure --help fits it 79ch 2016-02-03 23:58:25 +01:00
Lennart Poettering 7732f92bad nspawn: optionally run a stub init process as PID 1
This adds a new switch --as-pid2, which allows running commands as PID 2, while a stub init process is run as PID 1.
This is useful in order to run arbitrary commands in a container, as PID1's semantics are different from all other
processes regarding reaping of unknown children or signal handling.
2016-02-03 23:58:24 +01:00
Lennart Poettering 5f932eb9af nspawn: add new --chdir= switch
Fixes: #2192
2016-02-03 23:58:24 +01:00
Lennart Poettering cab2aca3e7 core: fix support for transient resource limit properties
Make sure we can properly process resource limit properties. Specifically, allow transient configuration of both the
soft and hard limit, the same way from the unit files. Previously, only the the hard rlimits could be configured but
they'd implicitly spill into the soft hard rlimits.

This also updates the client-side code to be able to parse hard/soft resource limit specifications. Since we need to
serialize two properties in bus_append_unit_property_assignment() now, the marshalling of the container around it is
now moved into the function itself. This has the benefit of shortening the calling code.

As a side effect this now beefs up the rlimit parser of "systemctl set-property" to understand time and disk sizes
where that's appropriate.
2016-02-01 22:18:16 +01:00
Lennart Poettering ba8e6c4d0e nspawn: make sure --link-journal=host may be used twice in a row
Fixes #2186

This fixes fall-out from 574edc9006.
2016-01-28 20:24:28 +01:00
Lennart Poettering 8054d749c4 nspawn: make journal linking non-fatal in try and auto modes
Fixes #2091
2016-01-28 20:16:44 +01:00
Michal Sekletar 61e741ed3d nspawn: fix memory leak 2016-01-25 12:06:38 +01:00
Ismo Puustinen a103496ca5 capabilities: keep bounding set in non-inverted format.
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
2016-01-12 12:14:50 +02:00
Daniel Mack d2b8497d3c nspawn: fix two typos in error messages
On errors, mention the functions that really failed.
2016-01-06 14:57:29 +01:00
Alban Crequy aa3be0dd36 nspawn: userns and unified cgroup: chown cgroup.events
When starting a container in a new user namespace, systemd-nspawn chowns
the cgroup knob files so they are usable by the container. But the
cgroup knob file "cgroup.events" was missing. This file exists when the
unified hierarchy is used.
2015-12-28 14:30:56 +01:00
Alban Crequy b370fec2b9 nspawn: set TasksMax in machined instead of nspawn
https://github.com/systemd/systemd/issues/2016
2015-12-04 23:36:39 +01:00
Lennart Poettering 4afd3348c7 tree-wide: expose "p"-suffix unref calls in public APIs to make gcc cleanup easy
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.

With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.

The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).

This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.

Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:

       #define _cleanup_(function) __attribute__((cleanup(function)))

Or similar, to make the gcc feature easier to use.

Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.

See #2008.
2015-11-27 19:19:36 +01:00
Thomas Hindoe Paaboel Andersen 71d35b6b55 tree-wide: sort includes in *.h
This is a continuation of the previous include sort patch, which
only sorted for .c files.
2015-11-18 23:09:02 +01:00
Daniel Mack a57246551a Merge pull request #1926 from phomes/include-order-libudev
tree-wide: group include of libudev.h with sd-*
2015-11-17 09:36:25 +01:00
Thomas Hindoe Paaboel Andersen b4bbcaa9c4 tree-wide: group include of libudev.h with sd-* 2015-11-17 07:06:08 +01:00
Lennart Poettering 4a0b58c4a3 tree-wide: use right cast macros for UIDs, GIDs and PIDs 2015-11-17 00:52:10 +01:00
Lennart Poettering 357bc17975 Merge pull request #1923 from zonque/siphash
siphash24: let siphash24_finalize() and siphash24() return the result…
2015-11-17 00:32:06 +01:00
Daniel Mack 933f9caeeb siphash24: let siphash24_finalize() and siphash24() return the result directly
Rather than passing a pointer to return the result, return it directly
from the function calls.

Also, return the result in native endianess, and let the callers care
about the conversion. For hash tables and bloom filters, we don't care,
but in order to keep MAC addresses and DHCP client IDs stable, we
explicitly convert to LE.
2015-11-16 23:17:52 +01:00
Thomas Hindoe Paaboel Andersen cf0fbc49e6 tree-wide: sort includes
Sort the includes accoding to the new coding style.
2015-11-16 22:09:36 +01:00
Tom Gundersen f5ed8d4a51 Merge pull request #1916 from zonque/align
siphash: alignment
2015-11-16 15:50:13 +01:00
Martin Pitt dbe81cbd2a siphash24: change result argument to uint64_t
Change the "out" parameter from uint8_t[8] to uint64_t. On architectures which
enforce pointer alignment this fixes crashes when we previously cast an
unaligned array to uint64_t*, and on others this should at least improve
performance as the compiler now aligns these properly.

This also simplifies the code in most cases by getting rid of typecasts. The
only place which we can't change is struct duid's en.id, as that is _packed_
and public API, so we can't enforce alignment of the "id" field and have to
use memcpy instead.
2015-11-16 15:20:29 +01:00
Lennart Poettering 541ec33075 nspawn: set TasksMax= for containers to 8192 by default 2015-11-16 11:58:04 +01:00
Lennart Poettering f6d6bad146 nspawn: add new --network-veth-extra= switch for defining additional veth links
The new switch operates like --network-veth, but may be specified
multiple times (to define multiple link pairs) and allows flexible
definition of the interface names.

This is an independent reimplementation of #1678, but defines different
semantics, keeping the behaviour completely independent of
--network-veth. It also comes will full hook-up for .nspawn files, and
the matching documentation.
2015-11-12 22:04:49 +01:00
Daniel Mack b0bc8dbd73 Merge pull request #1820 from michich/errno-v2
[v2] treewide: treatment of errno and other cleanups
2015-11-09 21:56:49 +01:00
Michal Schmidt e1427b138f treewide: apply errno.cocci
with small manual cleanups for style.
2015-11-09 20:01:06 +01:00
Lennart Poettering 6c9e781eba Merge pull request #1799 from jengelh/doc
doc: typo and ortho fixes
2015-11-09 18:16:21 +01:00
Iago López Galeiras 6aadfa4c52 nspawn: support custom container service name
We were hardcoding "systemd-nspawn" as the value of the $container env
variable and "nspawn" as the service string in machined registration.

This commit allows the user to configure it by setting the
$SYSTEMD_NSPAWN_CONTAINER_SERVICE env variable when calling
systemd-nspawn.

If $SYSTEMD_NSPAWN_CONTAINER_SERVICE is not set, we use the string
"systemd-nspawn" for both, fixing the previous inconsistency.
2015-11-09 16:40:05 +01:00
Jan Engelhardt a8eaaee72a doc: correct orthography, word forms and missing/extraneous words 2015-11-06 13:45:21 +01:00
Jan Engelhardt b938cb902c doc: correct punctuation and improve typography in documentation 2015-11-06 13:00:02 +01:00
Michal Schmidt 35607a8d1c nspawn: save errno before reopening log after exec failure 2015-11-05 13:44:12 +01:00
Michal Schmidt 070edd97f3 nspawn: no fake errno
The S_ISREG test does not set errno, so don't use it in the error
message.
2015-11-05 13:44:11 +01:00
Michal Schmidt 4314d33f51 nspawn: simplify error returns
Use the "return log_error_errno(...)" idiom to have fewer curly braces.

The last hunk also fixes the return value of setup_journal(), but the
fix has no practical effect.
2015-11-05 13:44:10 +01:00
Michal Schmidt 709f6e46a3 treewide: use the negative error codes returned by our functions
Our functions return negative error codes.
Do not rely on errno being set after calling our own functions.
2015-11-05 13:44:06 +01:00
Lennart Poettering 97044145b4 core,nspawn: minor coding style fixes 2015-10-31 19:09:20 +01:00
Susant Sahani 6cbe4ed1e1 nspwan: port to extract_first_word 2015-10-28 22:59:01 +05:30
Lennart Poettering 7b3e062cb6 process-util: move a couple of process-related calls over 2015-10-27 14:24:58 +01:00
Lennart Poettering b5efdb8af4 util-lib: split out allocation calls into alloc-util.[ch] 2015-10-27 13:45:53 +01:00
Lennart Poettering 15a5e95075 util-lib: split out printf() helpers to stdio-util.h 2015-10-27 13:25:57 +01:00
Lennart Poettering ee104e11e3 user-util: move UID/GID related macros from macro.h to user-util.h 2015-10-27 13:25:57 +01:00
Lennart Poettering 430f0182b7 src/basic: rename audit.[ch] → audit-util.[ch] and capability.[ch] → capability-util.[ch]
The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that this are just our utility headers and not any official
upstream headers.
2015-10-27 13:25:57 +01:00
Lennart Poettering affb60b1ef util-lib: split out umask-related code to umask-util.h 2015-10-27 13:25:56 +01:00
Lennart Poettering 8fcde01280 util-lib: split stat()/statfs()/stavfs() related calls into stat-util.[ch] 2015-10-27 13:25:56 +01:00
Lennart Poettering f4f15635ec util-lib: move a number of fs operations into fs-util.[ch] 2015-10-27 13:25:56 +01:00
Lennart Poettering 4349cd7c1d util-lib: move mount related utility calls to mount-util.[ch] 2015-10-27 13:25:55 +01:00
Lennart Poettering 6bedfcbb29 util-lib: split string parsing related calls from util.[ch] into parse-util.[ch] 2015-10-27 13:25:55 +01:00
Lennart Poettering 2583fbea8e socket-util: move remaining socket-related calls from util.[ch] to socket-util.[ch] 2015-10-26 01:24:39 +01:00
Lennart Poettering b1d4f8e154 util-lib: split out user/group/uid/gid calls into user-util.[ch] 2015-10-26 01:24:38 +01:00
Lennart Poettering 3ffd4af220 util-lib: split out fd-related operations into fd-util.[ch]
There are more than enough to deserve their own .c file, hence move them
over.
2015-10-25 13:19:18 +01:00
Lennart Poettering 07630cea1f util-lib: split our string related calls from util.[ch] into its own file string-util.[ch]
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.

This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.

Also touches a few unrelated include files.
2015-10-24 23:05:02 +02:00
Lennart Poettering 4f5dd3943b util: split out escaping code into escape.[ch]
This really deserves its own file, given how much code this is now.
2015-10-24 23:04:42 +02:00
Lennart Poettering 0f03c2a4c0 path-util: unify how we process paths specified on the command line
Let's introduce a common function that makes relative paths absolute and
warns about any errors while doing so.
2015-10-24 23:03:49 +02:00
Lennart Poettering 0f47436510 util-lib: get_current_dir_name() can return errors other than ENOMEM
get_current_dir_name() can return a variety of errors, not just ENOMEM,
hence don't blindly turn its errors to ENOMEM, but return correct errors
in path_make_absolute_cwd().

This trickles down into a couple of other functions, some of which
receive unrelated minor fixes too with this commit.
2015-10-24 23:03:49 +02:00
Lennart Poettering 16fb773ee3 nspawn: don't try to resolve passed binary before entering namespace
Othewise we might follow the symlinks on the host, instead of the
container.

Fixes #1400
2015-10-22 01:59:25 +02:00
Lennart Poettering 0e2656744f nspawn: rework how we determine private networking settings
Make sure we acquire CAP_NET_ADMIN if we require virtual networking.

Make sure we imply virtual ethernet correctly when bridge is request.

Fixes: #1511
Fixes: #1554
Fixes: #1590
2015-10-22 01:59:25 +02:00
Lennart Poettering 5bcd08db28 btrfs: beef-up btrfs support with a limited understanding of quota
With this change we understand more than just leaf quota groups for
btrfs file systems. Specifically:

- When we create a subvolume we can now optionally add the new subvolume
  to all qgroups its parent subvolume was member of too. Alternatively
  it is also possible to insert an intermediary quota group between the
  parent's qgroups and the subvolume's leaf qgroup, which is useful for
  a concept of "subtree" qgroups, that contain a subvolume and all its
  children.

- The remove logic for subvolumes has been updated to optionally remove
  any leaf qgroups or "subtree" qgroups, following the logic above.

- The snapshot logic for subvolumes has been updated to replicate the
  original qgroup setup of the source, if it follows the "subtree"
  design described above. It will not cover qgroup setups that introduce
  arbitrary qgroups, especially those orthogonal to the subvolume
  hierarchy.

This also tries to be more graceful when setting up /var/lib/machines as
btrfs. For example, if mkfs.btrfs is missing we don't even try to set it
up as loopback device.

Fixes #1559
Fixes #1129
2015-10-22 01:59:25 +02:00
Iago López Galeiras d167824896 nspawn: skip /sys-as-tmpfs if we don't use private-network
Since v3.11/7dc5dbc ("sysfs: Restrict mounting sysfs"), the kernel
doesn't allow mounting sysfs if you don't have CAP_SYS_ADMIN rights over
the network namespace.

So the mounting /sys as a tmpfs code introduced in
d8fc6a000f doesn't work with user
namespaces if we don't use private-net. The reason is that we mount
sysfs inside the container and we're in the network namespace of the host
but we don't have CAP_SYS_ADMIN over that namespace.

To fix that, we mount /sys as a sysfs (instead of tmpfs) if we don't use
private network and ignore the /sys-as-a-tmpfs code if we find that /sys
is already mounted as sysfs.

Fixes #1555
2015-10-20 10:19:23 +02:00
Mirco Tischler 88e1057286 nspawn: create /sys/fs/cgroup for unified hierarchy as well 2015-10-09 13:12:08 +02:00
Lennart Poettering ae3dde8012 machinectl: fix race when opening new shells with "machinectl shell"
Previously, we'd allocate the TTY, spawn a service on it, but
immediately start processing the TTY and forwarding it to whatever the
commnd was started on. This is however problematic, as the TTY might get
actually opened only much later by the service. We'll hence first get
EIOs on the master as the other side is still closed, and hence
considered it hung up and terminated the session.

With this change we add a flag to the pty forwarding logic:
PTY_FORWARD_IGNORE_INITIAL_VHANGUP. If set, we'll ignore all hangups
(i.e. EIOs) on the master PTY until the first byte is successfully read.
From that point on we consider a hangup/EIO a regular connection termination. This
way, we handle the race: when we get EIO initially we'll ignore it,
until the connection is properly set up, at which time we start
honouring it.
2015-10-07 20:10:48 +02:00
Lennart Poettering 12ca818ffd tree-wide: clean up log_syntax() usage
- Rely everywhere that we use abs() on the error code passed in anyway,
  thus don't need to explicitly negate what we pass in

- Never attach synthetic error number information to log messages. Only
  log about errors we *receive* with the error number we got there,
  don't log any synthetic error, that don#t even propagate, but just eat
  up.

- Be more careful with attaching exactly the error we get, instead of
  errno or unrelated errors randomly.

- Fix one occasion where the error number and line number got swapped.

- Make sure we never tape over OOM issues, or inability to resolve
  specifiers
2015-09-30 22:26:16 +02:00
Lennart Poettering d8fc6a000f nspawn: mount /sys as tmpfs, and then mount only select subdirs of the real sysfs below it
This way we can hide things like /sys/firmware or /sys/hypervisor from
the container, while keeping the device tree around.

While this is a security benefit in itself it also allows us to fix
issue #1277.

Previously we'd mount /sys before creating the user namespace, in order
to be able to mount /sys/fs/cgroup/* beneath it (which resides in it),
which we can only mount outside of the user namespace. To ensure that
the user namespace owns the network namespace we'd set up the network
namespace at the same time as the user namespace. Thus, we'd still see
the /sys/class/net/ from the originating network namespace, even though
we are in our own network namespace now. With this patch, /sys is
mounted before transitioning into the user namespace as tmpfs, so that
we can also mount /sys/fs/cgroup/* into it this early. The directories
such as /sys/class/ are then later added in from the real sysfs from
inside the network and user namespace so that they actually show whatis
available in it.

Fixes #1277
2015-09-30 15:19:33 +02:00
Lennart Poettering 403af78c80 nspawn: fix user namespace support
We didn#t actually pass ownership of /run to the UID in the container
since some releases, let's fix that.
2015-09-30 12:48:17 +02:00
Lennart Poettering db3b1dedb2 nspawn: order includes 2015-09-30 12:24:06 +02:00
Lennart Poettering ee30f6ac32 nspawn: make sure mount_legacy_cgroup_hierarchy() can deal with NULL root directories 2015-09-30 12:23:33 +02:00
Lennart Poettering 3f6fd1ba65 util: introduce common version() implementation and use it everywhere
This also allows us to drop build.h from a ton of files, hence do so.
Since we touched the #includes of those files, let's order them properly
according to CODING_STYLE.
2015-09-29 21:08:37 +02:00
Lennart Poettering 189d5bac5c util: unify implementation of NOP signal handler
This is highly complex code after all, we really should make sure to
only keep one implementation of this extremely difficult function
around.
2015-09-29 21:08:37 +02:00
Lennart Poettering 2feceb5eb9 tree-wide: take benefit of the fact that fdset_free() returns NULL 2015-09-29 21:08:37 +02:00
Lennart Poettering 3ee897d6c2 tree-wide: port more code to use send_one_fd() and receive_one_fd()
Also, make it slightly more powerful, by accepting a flags argument, and
make it safe for handling if more than one cmsg attribute happens to be
attached.
2015-09-29 21:08:37 +02:00
Krzesimir Nowak c0ffce2bd1 nspawn, machined: fix comments and error messages
A bunch of "Client -> Child" fixes and one barrier-enumerator fix.

(David: rebased on master)
2015-09-22 14:17:03 +02:00
Krzesimir Nowak 327e26d689 nspawn: close unneeded sockets in outer child
(David: Note, this is just a cleanup and doesn't fix any bugs)
2015-09-22 14:11:44 +02:00
David Herrmann d960371482 util: introduce {send,receive}_one_fd()
Introduce two new helpers that send/receive a single fd via a unix
transport. Also make nspawn use them instead of hard-coding it.

Based on a patch by Krzesimir Nowak.
2015-09-22 14:09:54 +02:00
Lennart Poettering 59f448cf15 tree-wide: never use the off_t unless glibc makes us use it
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.

Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
2015-09-10 18:16:18 +02:00
Lennart Poettering 38c2ef1d55 nspawn: add missing comma to gperf file 2015-09-09 08:36:20 +02:00
Lennart Poettering 82116c4329 nspawn: also close uid shift socket in the parent
We should really close all parent sides of our child/parent socket
pairs.
2015-09-08 01:22:46 +02:00