Commit graph

1115 commits

Author SHA1 Message Date
Daan De Meyer bbd407ea2b nspawn: Don't mount read-only if we have a custom mount on root. 2020-01-03 14:06:38 +01:00
Lennart Poettering 12da859a3f
Merge pull request #14401 from DaanDeMeyer/nspawn-move-veth-back-to-host
nspawn: move virtual interfaces added with --network-interface back to the host
2020-01-03 12:47:03 +01:00
Kai Krakow bc5ea049f2 nspawn: Generate unique short veth names
This commit lowers the chance of having veth name conflicts for machines
created with similar names.

Replaces: #12865
Fixes: #13417
2020-01-02 20:05:42 +01:00
Daan De Meyer 5b4855ab73 nspawn: Move --network-interface interfaces back to the host. 2020-01-02 14:13:03 +01:00
Daan De Meyer b390f17892 nspawn-network: Split off udev checking from parse_interface. 2019-12-23 18:47:36 +01:00
Lennart Poettering 19ac32cdd6 docs: import discoverable partitions spec
This was previously available here:

https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/

Let's pull it into our repository.
2019-12-23 14:44:33 +01:00
Lennart Poettering d4dffb8533 dissect: introduce new recognizable partition types for /var and /var/tmp
This has been requested many times before. Let's add it finally.

GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.

To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).

Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.

To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
2019-12-23 14:43:59 +01:00
Anita Zhang e5f10cafe0 core: create inaccessible nodes for users when making runtime dirs
To support ProtectHome=y in a user namespace (which mounts the inaccessible
nodes), the nodes need to be accessible by the user. Create these paths and
devices in the user runtime directory so they can be used later if needed.
2019-12-18 11:09:30 -08:00
Yu Watanabe 3267cb45e9
Merge pull request #14208 from poettering/json-homed-prepare
json bits from homed PR
2019-12-17 23:10:08 +09:00
Lennart Poettering d0556c55e7 nspawn: fix overlay with automatic temporary tree
This makes --overlay=+/foobar::/foobar work again, i.e. where the middle
parameter is left out. According to the documentation this is supposed
to generate a temporary writable work place in the midle. But it
apparently never did. Weird.
2019-12-13 15:11:38 +01:00
Lennart Poettering a724732208
Merge pull request #14269 from DaanDeMeyer/enable-mounts-on-root
nspawn: Enable specifying root as the mount target directory.
2019-12-13 00:05:38 +01:00
Daan De Meyer bd6609eb11 nspawn-mount: Use FLAGS_SET to check flags. 2019-12-12 20:18:37 +01:00
Daan De Meyer 5530dc87f2 nspawn: Only bind-mount directory when necessary. 2019-12-12 20:15:10 +01:00
Daan De Meyer e091a5dfd1 nspawn-mount: Remove unused parameters 2019-12-12 20:15:10 +01:00
Daan De Meyer 5f0a6347ac nspawn: Enable specifying root as the mount target directory.
Fixes #3847.
2019-12-12 20:15:03 +01:00
Shengjing Zhu 679ecd3616 nspawn: allow combination of private-network and network-namespace-path
Fixes: #14289
2019-12-12 19:26:32 +01:00
Lennart Poettering 5905d7cf5b tree-wide: use SD_ID128_STRING_MAX where appropriate 2019-12-10 11:56:18 +01:00
Lennart Poettering b5ea030d65 id128: introduce ID128_UUID_STRING_MAX for sizing UUID buffers 2019-12-10 11:56:18 +01:00
Yu Watanabe ec34e7d1ab
Merge pull request #14229 from yuwata/nspawn-network-interface-14223
nspawn: do not fail if udev is not running
2019-12-05 16:10:29 +09:00
Yu Watanabe 26208d5b96 nspawn: do not fail if udev is not running
If /sys is read only filesystem, e.g., nspawn is running in container,
then usually udev is not running. In such a case, let's assume that
the interface is already initialized. Also, this makes nspawn refuse
to use the network interface which is under renaming.

Fixes #14223.
2019-12-05 08:22:16 +09:00
Lennart Poettering e08f94acf5 loop-util: accept loopback flags when creating loopback device
This way callers can choose if they want partition scanning or not.
2019-12-02 10:05:09 +01:00
Lennart Poettering a7f8c9ce60 nspawn-oci: use new json_variant_strv() helper 2019-12-02 09:47:00 +01:00
Lennart Poettering d642f640bf json: add flags parameter to json_parse_file(), for parsing "sensitive" data
This will call json_variant_sensitive() internally while parsing for
each allocated sub-variant. This is better than calling it a posteriori
at the end, because partially parsed variants will always be properly
erased from memory this way.
2019-12-02 09:47:00 +01:00
afg c152a2ba54 nspawn: allow Capability=all in systemd.nspawn [EXEC] section
Just like --capability=all is allowed in the systemd-nspawn
command line.
2019-11-29 14:42:27 +01:00
Lennart Poettering 37a92352d6 nspawn: highlight description string in --help text
We do so in most tools now, do so here, too.
2019-11-28 11:41:24 +01:00
Zbigniew Jędrzejewski-Szmek f47bd09749 nspawn: log syscalls we cannot add at debug level
Without out at least a debug log line it is hard to figure out when something
goes wrong.

Reduce scope of a variable while at it.
2019-11-22 10:23:32 +01:00
Zbigniew Jędrzejewski-Szmek 8a99bd0c46 nspawn: dump capability list with --capabilities=help 2019-11-22 10:15:46 +01:00
Torsten Hilbrich 7be830c6e8 nspawn: Allow Capability= to overrule private network setting
The commit:

a3fc6b55ac nspawn: mask out CAP_NET_ADMIN again if settings file turns off private networking

turned off the CAP_NET_ADMIN capability whenever no private networking
feature was enabled. This broke configurations where the CAP_NET_ADMIN
capability was explicitly requested in the configuration.

Changing the order of evalution here to allow the Capability= setting
to overrule this implicit setting:

Order of evaluation:

1. if no private network setting is enabled, CAP_NET_ADMIN is removed
2. if a private network setting is enabled, CAP_NET_ADMIN is added
3. the settings of Capability= are added
4. the settings of DropCapability= are removed

This allows the fix for #11755 to be retained and to still allow the
admin to specify CAP_NET_ADMIN as additional capability.

Fixes: a3fc6b55ac
Fixes: #13995
2019-11-15 10:13:51 +01:00
Zbigniew Jędrzejewski-Szmek d5fc5b2f8d nspawn: do not emit any warning when $UNIFIED_CGROUP_HIERARCHY is used
Initially I thought this is a good idea, but when reviewing a different PR
(https://github.com/systemd/systemd/pull/13862#discussion_r340604313) I changed
my mind about this. At some point we probably should start warning about the
old option name, and yet later remove it. But it'll make it easier for people
to transition to the new option name if there's a period of support for both
names without any fuss. There's nothing particularly wrong about the old name,
and there is no support cost.

Fixes #13919 (by avoiding the issue completely).
2019-11-13 12:21:18 +01:00
Zbigniew Jędrzejewski-Szmek 9493b16871 Add @pkey syscall group
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1769299.
This change doesn't solve the issue, but makes it easier to whitelist the
syscall group.
2019-11-08 14:41:22 +01:00
Yu Watanabe 1405cb653a tree-wide: drop stdio.h when stdio-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe 021cdf8330 tree-wide: drop signal.h when signal-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe 0fb81b8abe tree-wide: drop magic.h when missing_magic.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe e30e8b5073 tree-wide: drop stat.h or statfs.h when stat-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe adb29d588e tree-wide: drop blkid.h when blkid-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe e259108494 tree-wide: drop acl.h when acl-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe 927d2351d7 tree-wide: drop pwd.h and grp.h when user-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe df26692947 tree-wide: drop sched.h when missing_sched.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe 455fa9610c tree-wide: drop string.h when string-util.h or friends are included 2019-11-04 00:30:32 +09:00
Justin Trudell 0ccdaa79ca nspawn: respect quiet on capabilities warning 2019-11-03 22:05:48 +09:00
Lennart Poettering 43c3fb4680 nspawn: mangle slice name
It's user-facing, parsed from the command line and we typically mangle
in these cases, let's do so here too. (In particular as the identical
switch for systemd-run already does it.)
2019-11-03 21:32:56 +09:00
Yu Watanabe f5947a5e92 tree-wide: drop missing.h 2019-10-31 17:57:03 +09:00
Lennart Poettering a93503e86f
Merge pull request #13866 from keszybz/nspawn-restarts
Make 'machinectl reboot' functional
2019-10-30 10:53:28 +01:00
Zbigniew Jędrzejewski-Szmek 0bb0a9faa7 nspawn: when stopping the machine, just deregister the machine
We already shut the machine down ourselves (and pid1 will also do
cleanup for us after we exit if anything was left behind). No need for
systemd-machined to try to stop the unit too.

(This calls the new machined method. If we are running against an older
machined, we will not deregister the machine. If we are simply exiting,
machined should notice that the unit is gone on its own. If we are restarting,
we will fail to register the machine after restart and fail. But this case
was already broken, because machined would create a stop job, breaking the
restart. So not doing anything with old machined should not make anything
more broken than it already is.)

Fixes #13766.
2019-10-29 10:54:45 +01:00
Zbigniew Jędrzejewski-Szmek df7c4eb62a various tools: be more explicit when a glob is passed when not supported
See https://bugzilla.redhat.com/show_bug.cgi?id=1763488: when we say that
'foo@*.service' is not a valid unit name, this is not clear enough. Let's
include the name of the operation that does not support globbing in the
error message:

$ build/systemctl enable 'foo@*.service'
Glob pattern passed to enable, but globs are not supported for this.
Invalid unit name "foo@*.service" escaped as "foo@\x2a.service".
...
2019-10-25 13:41:49 +09:00
Zbigniew Jędrzejewski-Szmek a5648b8094 basic/fs-util: change CHASE_OPEN flag into a separate output parameter
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.

(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
2019-10-24 22:44:24 +09:00
Zbigniew Jędrzejewski-Szmek dce66ffedb nspawn: fix handling of --console=help
We shouldn't continue to run the container after printing help.
2019-10-23 09:35:36 +02:00
Zbigniew Jędrzejewski-Szmek 86e94d95d0
Merge pull request #13246 from keszybz/add-SystemdOptions-efi-variable
Add efi variable to augment /proc/cmdline
2019-10-03 12:19:44 +02:00
Nicolas Douma de1b29f375 nspawn: surrender controlling terminal to PID2 when using the PID1 stub 2019-10-03 12:13:03 +02:00
Zbigniew Jędrzejewski-Szmek c78c095b1e nspawn: rename UNIFIED_CGROUP_HIERARCHY to SYSTEMD_NSPAWN_UNIFIED_HIERARCHY
We should never have used an unprefixed environment variable name.
All other systemd-nspawn variables have the "SYSTEMD_NSPAWN_" prefix,
and all other systemd variables have the "SYSTEMD_" prefix.

The new variable name takes precedence, but we fall back to checking the
old one. If only the old one is found, a warning is emitted.

In addition, SYSTEMD_NSPAWN_UNIFIED_HIERARCHY="" is accepted as an override
to avoid looking for the old variable name.

We have a variable with the same name ($UNIFIED_CGROUP_HIERARCHY) in tests,
which governs both systemd-nspawn and qemu behaviour. It is not renamed.
2019-10-01 10:21:13 -07:00
Zbigniew Jędrzejewski-Szmek 490486842b nspawn: consistenly fail if parsing the environment fails
We would parse the environment twice (to re-apply settings after reading
config from disk), but we would not check the return code first time.
This means that for some settings we would ignore invalid values, while
for others, we'd fail at some point.

Let's just consistently fail. Those environment variables define important
aspects of behaviour, and it is better for the user if we ignore invalid
values. (Unknown settings are still ignored, so forward compatibility is
maintained.)
2019-10-01 10:21:13 -07:00
Zbigniew Jędrzejewski-Szmek 75b0d8b89d nspawn: default to unified hierarchy if --as-pid2 is used
See comment added in the patch.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1756143.
2019-10-01 10:21:13 -07:00
Frantisek Sumsal 38288f0bb8 tree-wide: various code-formatting improvements
Reported/found by Coccinelle
2019-09-22 07:17:27 +02:00
Zbigniew Jędrzejewski-Szmek fdb3decaa7 util-lib: move some functions from basic/cgroup-util to shared/cgroup-setup
This way less stuff needs to be in basic. Initially, I wanted to move all the
parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because
efivars.[ch] is in shared/. Later on, I decide to split efivars.[ch], so the
move done in this patch is not necessary anymore. Nevertheless, it is still
valid on its own. If at some point we want to expose libbasic, it is better to
to not have stuff that belong in libshared there.
2019-09-16 18:08:00 +02:00
Zbigniew Jędrzejewski-Szmek 55fced5a19 util-lib: move yes_no() and friends to string-util.h 2019-09-16 18:06:20 +02:00
Zbigniew Jędrzejewski-Szmek d4d99bc6e4 basic/cgroup-util: let cgroup_unified_flush() return the detected hierarchy
This avoid the use of the global variable.

Also rename cgroup_unified_update() to cgroup_unified_cached() and
cgroup_unified_flush() to cgroup_unified() to better reflect their new roles.
2019-09-16 18:06:20 +02:00
Lennart Poettering 07e324af43
Merge pull request #13209 from poettering/nspawn-volatile-merged-usr
make incompatibility of non-/usr-merged distros with --volatile=yes more discoverable
2019-07-29 14:25:04 +02:00
Lennart Poettering 07b9f3f03c nspawn: print an explanatory error when people try to use --volatile=yes on distros that are not /usr-merged 2019-07-29 11:30:47 +02:00
Lennart Poettering 6f83d3d149 nspawn: when operating on the host image, let's move the root to a different directory first, via a bind mount 2019-07-29 09:57:04 +02:00
Lennart Poettering 6992459c12 nspawn: always take exclusive locks of ephemeral OS tree copies 2019-07-29 09:52:02 +02:00
Lennart Poettering cd6e3914f7 nspawn: don't look for .nspawn file above the top-level directory, it makes no sense 2019-07-29 09:52:02 +02:00
Lennart Poettering b35ca61ae2 nspawn: allow --volatile=yes instances of -D / 2019-07-29 09:52:02 +02:00
Iago López Galeiras a11fd4067b Revert "nspawn: remove unnecessary mount option parsing logic"
This reverts commit 72d967df3e.

Revert this because it broke the `norbind` option of the bind flags
because it does bind-mounts unconditionally recursive.

Let's bring the old logic back.

Fixes: #13170
2019-07-24 17:17:42 +02:00
Lennart Poettering 76c887fdaa
Merge pull request #13092 from keszybz/coverity-fixes
Coverity fixes
2019-07-17 14:18:49 +02:00
Zbigniew Jędrzejewski-Szmek fa8b675ae0 nspawn: fix misplaced parenthesis and merge two error handling paths
I don't think we need to provide the two separate error messages,
let's shorten the code a bit by merging them.

Coverity CID#1402320.
2019-07-17 11:35:04 +02:00
Zbigniew Jędrzejewski-Szmek 622ecfa869 nspawn: fix memleak in argument parsing
Coverity CID#1402297.
2019-07-17 11:35:04 +02:00
Lennart Poettering 7bf011e355 nspawn: make use of SIGINT handling when copying files
Fixes: #13079
2019-07-17 11:14:11 +02:00
Yu Watanabe 8cec0a5c32 tree-wide: drop duplicated blank lines
```
$ for i in */*.[ch] */*/*.[ch]; do sed -e '/^$/ {N; s/\n$//g}' -i $i; done
$ git checkout HEAD -- basic/linux shared/linux
```
2019-07-15 18:41:27 +02:00
Lennart Poettering b910cc72c0 tree-wide: get rid of strappend()
It's a special case of strjoin(), so no need to keep both. In particular
as typing strjoin() is even shoert than strappend().
2019-07-12 14:31:12 +09:00
Zbigniew Jędrzejewski-Szmek cd132992bb nspawn: fix abort when we cannot execve
If execve failed, we would die in safe_close(), because master was already
closed by fdset_close_others() on line 3123. IIUC, we don't need to keep the
fd open after sending it, so let's just close it immediately.

Reproducer:
sudo build/systemd-nspawn -M rawhide fooooooo

Fixup for 3acc84ebd9.
2019-07-09 01:24:20 +02:00
Yu Watanabe 2d9b74ba87 tree-wide: replace strjoin() with path_join() 2019-06-24 23:59:38 +09:00
Lennart Poettering cee97d5768
Merge pull request #12836 from yuwata/tree-wide-replace-strjoin
tree-wide: replace strjoin() with path_join()
2019-06-22 20:02:46 +02:00
Lennart Poettering c6134d3e2f path-util: get rid of prefix_root()
prefix_root() is equivalent to path_join() in almost all ways, hence
let's remove it.

There are subtle differences though: prefix_root() will try shorten
multiple "/" before and after the prefix. path_join() doesn't do that.
This means prefix_root() might return a string shorter than both its
inputs combined, while path_join() never does that. I like the
path_join() semantics better, hence I think dropping prefix_root() is
totally OK. In the end the strings generated by both functon should
always be identical in terms of path_equal() if not streq().

This leaves prefix_roota() in place. Ideally we'd have path_joina(), but
I don't think we can reasonably implement that as a macro. or maybe we
can? (if so, sounds like something for a later PR)

Also add in a few missing OOM checks
2019-06-21 08:42:55 +09:00
Anita Zhang f66ad46066 nspawn: don't hard fail when setting capabilities
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).

Fixes #12539
2019-06-20 21:46:36 +02:00
Yu Watanabe 657ee2d82b tree-wide: replace strjoin() with path_join() 2019-06-21 03:26:16 +09:00
Franck Bui dc98caea32 nspawn: make use of openpt_allocate() 2019-06-18 09:27:06 +02:00
Franck Bui 3acc84ebd9 nspawn: allocate the pty used for /dev/console within the container
The console tty is now allocated from within the container so it's not
necessary anymore to allocate it from the host and bind mount the pty slave
into the container. The pty master is sent to the host.

/dev/console is now a symlink pointing to the pty slave.

This might also be less confusing for applications running inside the container
and the overall result looks cleaner (we don't need to apply manually the
passed selinux context, if any, to the allocated pty for instance).
2019-06-18 08:17:34 +02:00
Franck Bui ba72801d66 nspawn: use correct error variable when logging errors returned by send_one_fd() 2019-06-18 07:54:51 +02:00
Michal Sekletar 3f09629c22
Merge pull request #12628 from keszybz/dbus-execute
Rework cpu affinity parsing
2019-05-30 12:32:53 +02:00
Yu Watanabe a0267b30f8 nspawn: also support ifindex when specifying network interface 2019-05-30 11:04:05 +02:00
Zbigniew Jędrzejewski-Szmek 0985c7c4e2 Rework cpu affinity parsing
The CPU_SET_S api is pretty bad. In particular, it has a parameter for the size
of the array, but operations which take two (CPU_EQUAL_S) or even three arrays
(CPU_{AND,OR,XOR}_S) still take just one size. This means that all arrays must
be of the same size, or buffer overruns will occur. This is exactly what our
code would do, if it received an array of unexpected size over the network.
("Unexpected" here means anything different from what cpu_set_malloc() detects
as the "right" size.)

Let's rework this, and store the size in bytes of the allocated storage area.

The code will now parse any number up to 8191, independently of what the current
kernel supports. This matches the kernel maximum setting for any architecture,
to make things more portable.

Fixes #12605.
2019-05-29 10:20:42 +02:00
Zbigniew Jędrzejewski-Szmek 127c167cdb
Merge pull request #12390 from poettering/string-file-mkdir
fileio: add a WRITE_STRING_FILE_MKDIR_0755 flag to write_string_file() that creates parent directories if needed
2019-05-28 14:42:55 +02:00
Lennart Poettering f9a3d8e2f3 nspawn: expose the new seccomp actions in the OCI logic 2019-05-24 10:48:28 +02:00
Lennart Poettering e82e549fb2 tree-wide: make use of the new WRITE_STRING_FILE_MKDIR_0755 flag 2019-05-08 06:36:20 -04:00
Franck Bui 9f3f596477 meson: make source files including nspawn-settings.h depend on libseccomp
Since nspawn-settings.h includes seccomp.h, any file that includes
nspawn-settings.h should depend on libseccomp so the correct header path where
seccomp.h lives is added to the header search paths.

It's especially important for distros such as openSUSE where seccomp.h is not
shipped in /usr/include but /usr/include/libseccomp.

This patch is similar to 8238423095.
2019-04-30 19:31:22 +02:00
Ben Boeckel 5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Ben Boeckel 8f8dfb9552 nspawn-expose-ports: fix a typo in error message 2019-04-26 23:42:55 +02:00
Dominick Grift 8f1ed04ad6 nspawn: Fix volatile SELinux label
nspawn should associate the specified nspawn container apifs object label instead of the nspawn container process label with the volatile tmpfs
2019-04-13 12:03:02 +02:00
Lennart Poettering 33d60b8d57 json: simplify JSON_VARIANT_OBJECT_FOREACH() macro a bit
There's no point in returning the "key" within each loop iteration as
JsonVariant object. Let's simplify things and return it as string. That
simplifies usage (since the caller doesn't have to convert the object to
the string anymore) and is safe since we already validate that keys are
strings when an object JsonVariant is allocated.
2019-04-12 13:11:11 +02:00
Anita Zhang 7bc5e0b12b seccomp: check more error codes from seccomp_load()
We noticed in our tests that occasionally SystemCallFilter= would
fail to set and the service would run with no syscall filtering.
Most of the time the same tests would apply the filter and fail
the service as expected. While it's not totally clear why this happens,
we noticed seccomp_load() in the systemd code base would fail open for
all errors except EPERM and EACCES.

ENOMEM, EINVAL, and EFAULT seem like reasonable values to add to the
error set based on what I gather from libseccomp code and man pages:

-ENOMEM: out of memory, failed to allocate space for a libseccomp structure, or would exceed a defined constant
-EINVAL: kernel isn't configured to support the operations, args are invalid (to seccomp_load(), seccomp(), or prctl())
-EFAULT: addresses passed as args are invalid
2019-04-12 10:23:07 +02:00
Lennart Poettering 1eacc47062 nspawn: create boot_id and kmsg files for overmounting in /run, not /tmp
/tmp might not be mounted at all yet (given that we support
SYSTEMD_NSPAWN_TMPFS_TMP=0 to turn this off), and /tmp is a dir systemd
usually tries to unmount during shutdown (unlike /run), and we shouldn't
keep it busy. Hence let's just move these deleted files to /run so that
we don't keep /tmp needlessly busy.
2019-04-07 08:55:31 +02:00
Lennart Poettering c614711386 tree-wide: use SYNTHETIC_ERRNO() where appropriate 2019-04-02 14:54:42 +02:00
Lennart Poettering 8a016c746e util-lib: when copying files make sure to apply some chattrs early, some late
Some chattrs only work sensible if you set them right after opening a
file for create (think: FS_NOCOW_FL). Others only work when they are
applied when the file is fully written (think: FS_IMMUTABLE_FL). Let's
take that into account when copying files and applying a chattr to them.
2019-03-28 18:43:04 +01:00
Zbigniew Jędrzejewski-Szmek ca78ad1de9 headers: remove unneeded includes from util.h
This means we need to include many more headers in various files that simply
included util.h before, but it seems cleaner to do it this way.
2019-03-27 11:53:12 +01:00
Zbigniew Jędrzejewski-Szmek e1af3bc62a
Merge pull request #12106 from poettering/nosuidns
add "nosuid" flag to exec directory mounts of DynamicUser=1 services
2019-03-26 08:58:00 +01:00
Zbigniew Jędrzejewski-Szmek 99f57a4fea
Merge pull request #12105 from poettering/api-vfs-mount-flags
some API VFS mount flag tweaks
2019-03-26 08:32:53 +01:00
Lennart Poettering 25e68fd397 nspawn: minor improvements to --help text 2019-03-26 08:06:00 +01:00
Lennart Poettering 849b9b85b8 nspawn: mount mqueue with nodev,noexec,nosuid, too
The host mounts it like that, nspawn hence should do too.

Moreover, mount the file system after doing CLONEW_NEWIPC so that it
actually reflects the right mqueues. Finally, mount it wthout
considering it fatal, since POSIX mqueue support is little used and it
should be fine not to support it in the kernel.
2019-03-25 19:53:05 +01:00
Lennart Poettering 64e82c1976 mount-util: beef up bind_remount_recursive() to be able to toggle more than MS_RDONLY
The function is otherwise generic enough to toggle other bind mount
flags beyond MS_RDONLY (for example: MS_NOSUID or MS_NODEV), hence let's
beef it up slightly to support that too.
2019-03-25 19:33:55 +01:00
Lennart Poettering 83276695c6
Merge pull request #12079 from keszybz/fuzz-nspawn-oci
Add fuzzer for nspawn-oci
2019-03-22 21:06:17 +01:00
Lennart Poettering e4077ff6f3 nspawn: don't free "fds" twice
Previously both run() and run_container() would free 'fds'. Let's fix
that, and let run() free it but make run_container() already remove all
fds from it, because that's what we actually want to do.

Fixes: #12073
2019-03-22 18:11:27 +01:00
Zbigniew Jędrzejewski-Szmek b2645747b7 nspawn-oci: fix double free
Also rename function to make it clear that it also frees the array
object itself.
2019-03-22 17:39:12 +01:00
Zbigniew Jędrzejewski-Szmek 094eecd29d
Merge pull request #12055 from poettering/save-argc-argv
main-func.h and systemctl argc/argv improvements
2019-03-22 16:58:18 +01:00
Zbigniew Jędrzejewski-Szmek b1f13b0e75 nspawn-oci: mount source is optional 2019-03-22 12:04:32 +01:00
Zbigniew Jędrzejewski-Szmek b2e07b1a02 nspawn-oci: use _cleanup_ in one more place 2019-03-22 11:51:21 +01:00
Lennart Poettering ae408d77a9 nspawn: conditionalize libseccomp use
We support compilation without libseccomp, hence don't rely on its
symbols.
2019-03-22 11:07:03 +01:00
Lennart Poettering 60ffa37a65 main-func: implicitly save argc/argv in DEFINE_MAIN_FUNCTION() functions
Let's remove the risk of forgetting to save argc/argv if
DEFINE_MAIN_FUNCTION() is used.
2019-03-21 18:10:06 +01:00
Lennart Poettering 36fea15565 util: introduce save_argc_argv() helper 2019-03-21 18:08:56 +01:00
Lennart Poettering c82cfae00b
Merge pull request #12062 from poettering/nspawn-main-func
nspawn: port to DEFINE_MAIN_FUNCTION()
2019-03-21 18:08:27 +01:00
Zbigniew Jędrzejewski-Szmek bb068de080 nspawn: add --no-pager switch
It only matters for --help.
2019-03-21 17:42:43 +01:00
Lennart Poettering 04f590a4a4 nspawn: voidify sd_notify() calls 2019-03-21 16:32:46 +01:00
Lennart Poettering 6145bb4f78 nspawn: port to static destructors 2019-03-21 16:32:46 +01:00
Lennart Poettering 44dbef90f1 nspawn: port to main-func.h logic 2019-03-21 16:32:46 +01:00
Zbigniew Jędrzejewski-Szmek fa28e4e377
Merge pull request #12059 from poettering/nspawn-typos
some typo and other fixes result of the OCI nspawn merge
2019-03-21 15:14:11 +01:00
Lennart Poettering c3d13d2ad5
Merge pull request #12058 from keszybz/oci-simplifications
Follow-ups for nspawn-oci review
2019-03-21 13:55:09 +01:00
Lennart Poettering f4e803c809 nspawn: add a few missing flags from --help text 2019-03-21 13:31:09 +01:00
Lennart Poettering 2514865391 nspawn: reorder --help text, and add section
The list is so long, let's add a bit of structure and order things a
bit.
2019-03-21 13:27:19 +01:00
Lennart Poettering 2c9b7a7e62 mount: when we fail to establish an inaccessible mount gracefully, undo the mount 2019-03-21 12:41:02 +01:00
Zbigniew Jędrzejewski-Szmek 6757a01356 util-lib: get rid of a helper variable 2019-03-21 11:08:58 +01:00
Zbigniew Jędrzejewski-Szmek f1531db5af nspawn-oci: add helper function for free_and_strdup with oom check 2019-03-21 11:08:58 +01:00
Zbigniew Jędrzejewski-Szmek d0b6a10c00
Merge pull request #9762 from poettering/nspawn-oci
OCI runtime support for nspawn
2019-03-21 11:01:53 +01:00
Zbigniew Jędrzejewski-Szmek 19130626a0 nspawn-oci: use SYNTHETIC_ERRNO 2019-03-21 10:51:43 +01:00
Topi Miettinen ebcf697685 tree-wide: fix false search hits with ppp (typos) 2019-03-18 14:25:56 +01:00
Lennart Poettering 95658673a0
Merge pull request #12016 from yuwata/fix-two-memleaks-found-by-oss-fuzz
Fix two memleaks found by oss fuzz
2019-03-15 17:33:48 +01:00
Yu Watanabe 1d0c1146ea nspawn: fix memleak
Fixes oss-fuzz#13691.
2019-03-15 23:53:05 +09:00
Zbigniew Jędrzejewski-Szmek 7acf581a58 Handle or voidify all calls to close_all_fds()
In activate, it is important that we close the fds. In other cases, meh.
2019-03-15 15:46:41 +01:00
Lennart Poettering a3fc6b55ac nspawn: mask out CAP_NET_ADMIN again if settings file turns off private networking
Fixes: #11755
2019-03-15 15:42:21 +01:00
Lennart Poettering bd4b15f274 nspawn: use right constant for shifting for uint64_t caps 2019-03-15 15:42:20 +01:00
Lennart Poettering de40a3037a nspawn: add support for executing OCI runtime bundles with nspawn
This is a pretty large patch, and adds support for OCI runtime bundles
to nspawn. A new switch --oci-bundle= is added that takes a path to an
OCI bundle. The JSON file included therein is read similar to a .nspawn
settings files, however with a different feature set.

Implementation-wise this mostly extends the pre-existing Settings object
to carry additional properties for OCI. However, OCI supports some
concepts .nspawn files did not support yet, which this patch also adds:

1. Support for "masking" files and directories. This functionatly is now
   also available via the new --inaccesible= cmdline command, and
   Inaccessible= in .nspawn files.

2. Support for mounting arbitrary file systems. (not exposed through
   nspawn cmdline nor .nspawn files, because probably not a good idea)

3. Ability to configure the console settings for a container. This
   functionality is now also available on the nspawn cmdline in the new
   --console= switch (not added to .nspawn for now, as it is something
   specific to the invocation really, not a property of the container)

4. Console width/height configuration. Not exposed through
   .nspawn/cmdline, but this may be controlled through $COLUMNS and
   $LINES like in most other UNIX tools.

5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on
   the cmdline, since containers likely have different user tables, and
   the existing --user= switch appears to be the better option)

6. OCI hook commands (no exposed in .nspawn/cmdline, as very specific to
   OCI)

7. Creation of additional devices nodes in /dev. Most likely not a good
   idea, hence not exposed in .nspawn/cmdline. There's already --bind=
   to achieve the same, which is the better alternative.

8. Explicit syscall filters. This is not a good idea, due to the skewed
   arch support, hence not exposed through .nspawn/cmdline.

9. Configuration of some sysctls on a whitelist. Questionnable, not
   supported in .nspawn/cmdline for now.

10. Configuration of all 5 types of capabilities. Not a useful concept,
    since the kernel will reduce the caps on execve() anyway. Not
    exposed through .nspawn/cmdline as this is not very useful hence.

Note that this only implements the OCI runtime logic itself. It does not
provide a runc-compatible command line tool. This is left for a later
PR. Only with that in place tools such as "buildah" can use the OCI
support in nspawn as drop-in replacement.

Currently still missing is OCI hook support, but it's already parsed and
everything, and should be easy to add. Other than that it's OCI is
implemented pretty comprehensively.

There's a list of incompatibilities in the nspawn-oci.c file. In a later
PR I'd like to convert this into proper markdown and add it to the
documentation directory.
2019-03-15 15:41:28 +01:00
Lennart Poettering 5ef4cb7ad0 nspawn: (void)ify more stuff 2019-03-15 15:33:09 +01:00
Lennart Poettering 61b4443361 nspawn: refactor setuid code a bit
Let's separate out the raw uid_t/gid_t handling from the username
handling. This is useful later on.

Also, let's use the right gid_t type for group types wherever
appropriate.
2019-03-15 15:33:09 +01:00
Lennart Poettering d8b4d14df4 util: split out nulstr related stuff to nulstr-util.[ch] 2019-03-14 13:25:52 +01:00
Lennart Poettering e45c81b8bc shared: split out code to wait for jobs to complet into its own source file
It's complex enough and quite a few functions. Let's hence split this
out.

No code change, just some rearranging of source files.
2019-03-13 17:39:24 +01:00
Lennart Poettering 760877e90c util: split out sorting related calls to new sort-util.[ch] 2019-03-13 12:16:43 +01:00
Lennart Poettering 0cb8e3d118 util: split out namespace related stuff into a new namespace-util.[ch] pair
Just some minor reorganiztion.
2019-03-13 12:16:38 +01:00
Zbigniew Jędrzejewski-Szmek 0e636bf51a nspawn: fix memleak uncovered by fuzzer
Also use TAKE_PTR as appropriate.
2019-03-11 14:29:30 +01:00
Lennart Poettering 27da7ef0d0 nspawn: move payload to sub-cgroup first, then sync cgroup trees
if we sync the legacy and unified trees before moving to the right
subcgroup then ultimately the cgroup paths in the hierarchies will be
out-of-sync... Hence, let's move the payload first, and sync then.

Addresses: https://github.com/systemd/systemd/pull/9762#issuecomment-441187979
2019-03-07 11:26:17 +01:00
Lennart Poettering adc6f43b14 copy: don't synthesize a 'user.crtime_usec' xattr on copy unless explicitly requested
Previously, when we'd copy an individual file we'd synthesize a
user.crtime_usec xattr with the source's creation time if we can
determine it. As the creation/birth time was until recently not
queriable form userspace this effectively just propagated the same xattr
on the source to the same xattr on the destination. However, current
kernels now allow to query the birthtime using statx() and we do make
use of that now. Which means that suddenly we started synthesizing these
xattrs much more regularly.

Doing this actually does make sense, but only in very few cases:
not for the typical regular files we copy, but certainly when dealing
with disk images. Hence, let's keep this kind of propagation, but let's
make it a flag and default to off. Then turn it on whenever we deal with
disk images, and leave it off otherwise.

This is particularly relevant as overlayfs combining a real fs, and a
tmpfs on top will result in EOPNOTSUPP when it is attempted to open a
file with xattrs for writing, as tmpfs does not support xattrs, and
hence the copy-up cannot work. Hence, let's avoid synthesizing this
needlessly, to increase compat with overlayfs.
2019-03-01 14:11:07 +01:00
Lennart Poettering e5a4bb0d4e nspawn: rework how arg_read_only is initialized in --volatile= mode
Previously, we'd refuse the combination, and claimed we'd imply it, but
actually didn't. Let's allow the combination and imply read-only from
--volatile=, because that's what's documented, what we claim we do, and
what makes sense.
2019-03-01 14:11:07 +01:00
Lennart Poettering 83205269c0 nspawn: refactor how we determine whether it's OK to write to /etc 2019-03-01 14:11:07 +01:00
Lennart Poettering e50cd82f68 nspawn: no need to make top-level directory a bind mount if we just dissected an image 2019-03-01 14:11:07 +01:00
Lennart Poettering 7d0ecdd62d nspawn: slightly reorder mount logic
Let's first setup the volatile logic, and only then mount secondary
partitions of the image in.
2019-03-01 14:11:07 +01:00
Lennart Poettering 6c610acaaa nspawn: add --volatile=overlay support
Fixes: #11054 #3847
2019-03-01 14:11:06 +01:00
Lennart Poettering c55d0ae764 nspawn: fix an error path 2019-03-01 14:11:06 +01:00
Lennart Poettering e5b43a04b6 nspawn: add volatile mode multiplexer call setup_volatile_mode()
Just some refactoring, no change in behaviour.
2019-03-01 14:11:06 +01:00
Lennart Poettering 0646d3c3dd nspawn: explicitly refuse mounts over /
Previously this would fail later on, but let's filter this out at the
time of parsing.
2019-03-01 14:11:06 +01:00
Lennart Poettering 6e9417f5b4 tree-wide: use newa() instead of alloca() wherever we can
Typesafety is nice. And this way we can take benefit of the new size
assert() the previous commit added.
2019-01-26 16:17:04 +01:00
Lennart Poettering 2949ff2691 nspawn: ignore SIGPIPE for nspawn itself
Let's not abort due to a dead stdout.

Fixes: #11533
2019-01-26 13:54:44 +01:00
Lennart Poettering b2238e380e test,systemctl,nspawn: use "const char*" instead of "char*" as iterator for FOREACH_STRING()
The macro iterates through literal strings (i.e. constant strings),
hence it's more correct to have the iterator const too.
2019-01-16 12:29:30 +01:00
Chris Down e92aaed30e tree-wide: Remove O_CLOEXEC from fdopen
fdopen doesn't accept "e", it's ignored. Let's not mislead people into
believing that it actually sets O_CLOEXEC.

From `man 3 fdopen`:

> e (since glibc 2.7):
> Open the file with the O_CLOEXEC flag. See open(2) for more information. This flag is ignored for fdopen()

As mentioned by @jlebon in #11131.
2018-12-12 20:47:40 +01:00