Commit graph

506 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Lennart Poettering 2bce2acce8 nspawn: if we set up a loopback device, try to mount it with "discard"
Let's make sure that our loopback files remain sparse, hence let's set
"discard" as mount option on file systems that support it if the backing device
is a loopback.
2016-11-02 11:39:49 -06:00
Evgeny Vereshchagin 6d66bd3b2a nspawn: become a new root early
036d523641

> vfs: Don't create inodes with a uid or gid unknown to the vfs
  It is expected that filesystems can not represent uids and gids from
  outside of their user namespace.  Keep things simple by not even
  trying to create filesystem nodes with non-sense uids and gids.

So, we actually should `reset_uid_gid` early to prevent https://github.com/systemd/systemd/pull/4223#issuecomment-252522955

$ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target

Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide.
Press ^] three times within 1s to kill container.
Child died too early.
Selected user namespace base 1073283072 and range 65536.
Failed to mount to /sys/fs/cgroup/systemd: No such file or directory

Details: https://github.com/systemd/systemd/pull/4223#issuecomment-253046519
Fixes: #4352
2016-10-23 23:23:42 -04:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 24597ee0e6 nspawn, NEWS: add missing "s" in --private-users-chown (#4438) 2016-10-21 06:03:26 +03:00
Evgeny Vereshchagin f0bef277a4 nspawn: cleanup and chown the synced cgroup hierarchy (#4223)
Fixes: #4181
2016-10-13 09:50:46 -04:00
Zbigniew Jędrzejewski-Szmek 60e76d4897 nspawn,mount-util: add [u]mount_verbose and use it in nspawn
This makes it easier to debug failed nspawn invocations:

Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11 16:50:07 -04:00
Zbigniew Jędrzejewski-Szmek ada5412039 nspawn: simplify arg_us_cgns passing
We would check the condition cg_ns_supported() twice. No functional
change.
2016-10-11 16:46:58 -04:00
Lennart Poettering 6dca2fe325 Merge pull request #4332 from keszybz/nspawn-arguments-3
nspawn --private-users parsing, v2
2016-10-10 19:51:51 +02:00
Evgeny Vereshchagin a0f72a24e0 Merge pull request #4310 from keszybz/nspawn-autodetect
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10 20:47:25 +03:00
Zbigniew Jędrzejewski-Szmek be7157316c nspawn: better error messages for parsing errors
In particular, the check for arg_uid_range <= 0 is moved to the end, so that
"foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek ae209204d8 nspawn,man: fix parsing of numeric args for --private-users, accept any boolean
This is like the previous reverted commit, but any boolean is still accepted,
not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek 6c2058b35e Revert "nspawn: fix parsing of numeric arguments for --private-users"
This reverts commit bfd292ec35.
2016-10-10 11:17:40 -04:00
Zbigniew Jędrzejewski-Szmek bfd292ec35 nspawn: fix parsing of numeric arguments for --private-users
The documentation says lists "yes", "no", "pick", and numeric arguments.
But parse_boolean was attempted first, so various numeric arguments were
misinterpreted.

In particular, this fixes --private-users=0 to mean the same thing as
--private-users=0:65536.

While at it, use strndupa to avoid some error handling.
Also give a better error for an empty UID range. I think it's likely that
people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09 11:52:35 -04:00
Zbigniew Jędrzejewski-Szmek 27eb8e9028 nspawn: reindent table 2016-10-09 11:51:18 -04:00
Zbigniew Jędrzejewski-Szmek a8725a06e6 nspawn: also fall back to legacy cgroup hierarchy for old containers
Current systemd version detection routine cannot detect systemd 230,
only systmed >= 231. This means that we'll still use the legacy hierarchy
in some cases where we wouldn't have too. If somebody figures out a nice
way to detect systemd 230 this can be later improved.
2016-10-08 19:03:53 -04:00
Zbigniew Jędrzejewski-Szmek 0fd9563fde nspawn: use mixed cgroup hierarchy only when container has new systemd
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy.
So make an educated guess, and use the mixed hierarchy in that case.

Tested by running the host with mixed hierarchy (i.e. simply using a recent
kernel with systemd from git), and booting first a container with older systemd,
and then one with a newer systemd.

Fixes #4008.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 27e29a1e43 nspawn: fix spurious reboot if container process returns 133 2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek b006762524 nspawn: move the main loop body out to a new function
The new function has 416 lines by itself!

"return log_error_errno" is used to nicely reduce the volume of error
handling code.

A few minor issues are fixed on the way:
- positive value was used as error value (EIO), causing systemd-nspawn
  to return success, even though it shouldn't.
- In two places random values were used as error status, when the
  actual value was in an unusual place (etc_password_lock, notify_socket).

Those are the only functional changes.

There is another potential issue, which is marked with a comment, and left
unresolved: the container can also return 133 by itself, causing a spurious
reboot.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 98afd6af3a nspawn: check env var first, detect second
If we are going to use the env var to override the detection result
anyway, there is not point in doing the detection, especially that
it can fail.
2016-10-08 14:48:41 -04:00
Lennart Poettering 7429b2eb83 tree-wide: drop some misleading compiler warnings
gcc at some optimization levels thinks thes variables were used without
initialization. it's wrong, but let's make the message go anyway.
2016-10-06 19:04:10 +02:00
Djalal Harouni 41eb436265 nspawn: add log message to let users know that nspawn needs an empty /dev directory (#4226)
Fixes https://github.com/systemd/systemd/issues/3695

At the same time it adds a protection against userns chown of inodes of
a shared mount point.
2016-10-05 06:57:02 +02:00
Alban Crequy 19caffac75 nspawn: set shared propagation mode for the container 2016-10-03 14:19:27 +02:00
Evgeny Vereshchagin cc238590e4 Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-28 04:50:30 +03:00
Torstein Husebø d23a0044a3 treewide: fix typos (#4217) 2016-09-26 11:32:47 +02:00
Lennart Poettering 6b7c9f8bce namespace: rework how ReadWritePaths= is applied
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths=
specification, then we'd first recursively apply the ReadOnlyPaths= paths, and
make everything below read-only, only in order to then flip the read-only bit
again for the subdirs listed in ReadWritePaths= below it.

This is not only ugly (as for the dirs in question we first turn on the RO bit,
only to turn it off again immediately after), but also problematic in
containers, where a container manager might have marked a set of dirs read-only
and this code will undo this is ReadWritePaths= is set for any.

With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be
applied to the children listed in ReadWritePaths= in the first place, so that
we do not need to turn off the RO bit for those after all.

This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the
RO bit, but never to turn it off again. Or to say this differently: if some
dirs are marked read-only via some external tool, then ReadWritePaths= will not
undo it.

This is not only the safer option, but also more in-line with what the man page
currently claims:

        "Entries (files or directories) listed in ReadWritePaths= are
        accessible from within the namespace with the same access rights as
        from outside."

To implement this change bind_remount_recursive() gained a new "blacklist"
string list parameter, which when passed may contain subdirs that shall be
excluded from the read-only mounting.

A number of functions are updated to add more debug logging to make this more
digestable.
2016-09-25 10:40:51 +02:00
Luca Bruno 48a8d337a6 nspawn: decouple --boot from CLONE_NEWIPC (#4180)
This commit is a minor tweak after the split of `--share-system`, decoupling the `--boot`
option from IPC namespacing.

Historically there has been a single `--share-system` option for sharing IPC/PID/UTS with the
host, which was incompatible with boot/pid1 mode. After the split, it is now possible to express
the requirements with better granularity.

For reference, this is a followup to #4023 which contains references to previous discussions.
I realized too late that CLONE_NEWIPC is not strictly needed for boot mode.
2016-09-24 08:30:42 -04:00
Michael Pope 21dc02277d nspawn: fix comment typo in setup_timezone example (#4183) 2016-09-20 07:30:48 +02:00
Michael Pope 0b493a0263 nspawn: clarify log warning for /etc/localtime not being a symbolic link (#4163) 2016-09-17 09:59:28 +02:00
Luca Bruno 0c582db0c6 nspawn: split down SYSTEMD_NSPAWN_SHARE_SYSTEM (#4023)
This commit follows further on the deprecation path for --share-system,
by splitting and gating each share-able namespace behind its own
environment flag.
2016-08-26 00:08:26 +02:00
Tejun Heo 5da38d0768 core: use the unified hierarchy for the systemd cgroup controller hierarchy
Currently, systemd uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, systemd uses a named legacy hierarchy
mounted on /sys/fs/cgroup/systemd without any kernel controllers for process
management.  Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.

Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for systemd to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies.  It can simply mount the unified hierarchy under
/sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test.  In time, this would also allow deleting the said
complexities.

This patch updates systemd so that it prefers the unified hierarchy for the
systemd cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.

* cg_unified(@controller) is introduced which tests whether the specific
  controller in on unified hierarchy and used to choose the unified hierarchy
  code path for process and service management when available.  Kernel
  controller specific operations remain gated by cg_all_unified().

* "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to
  force the use of legacy hierarchy for systemd cgroup controller.

* nspawn: By default nspawn uses the same hierarchies as the host.  If
  UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all.  If
  0, legacy for all.

* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
  three options - legacy, only systemd controller on unified, and unified.  The
  value is passed into mount setup functions and controls cgroup configuration.

* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
  option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
  appropriate action depending on the configuration of the host.

v2: - CGroupUnified enum replaces open coded integer values to indicate the
      cgroup operation mode.
    - Various style updates.

v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.

v4: Restored legacy container on unified host support and fixed another bug in
    detect_unified_cgroup_hierarchy().
2016-08-17 17:44:36 -04:00
Tejun Heo ca2f6384aa core: rename cg_unified() to cg_all_unified()
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies.  This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is.  In preparation, this patch renames cg_unified() to
cg_all_unified().

This patch doesn't cause any functional changes.
2016-08-15 18:13:36 -04:00
Lennart Poettering 07a1734a13 Merge pull request #3885 from keszybz/help-output
Update help for "short-full" and shorten to 80 columns
2016-08-04 16:11:38 +02:00
Zbigniew Jędrzejewski-Szmek 90b4a64d77 nspawn,resolve: short --help output to fit within 80 columns
make dist-check-help FTW!
2016-08-04 09:03:42 -04:00
Lennart Poettering f7b7b3df9e nspawn: if we can't mark the boot ID RO let's fail
It's probably better to be safe here.
2016-08-03 14:52:16 +02:00
Lennart Poettering a6b5216c7c nspawn: deprecate --share-system support
This removes the --share-system switch: from the documentation, the --help text
as well as the command line parsing. It's an ugly option, given that it kinda
contradicts the whole concept of PID namespaces that nspawn implements. Since
it's barely ever used, let's just deprecate it and remove it from the options.

It might be useful as a debugging option, hence the functionality is kept
around for now, exposed via an undocumented $SYSTEMD_NSPAWN_SHARE_SYSTEM
environment variable.
2016-08-03 14:52:16 +02:00
Lennart Poettering 3539724c26 nspawn: try to bind mount resolved's resolv.conf snippet into the container
This has the benefit that the container can follow the host's DNS server
changes without us having to constantly update the container's resolv.conf
settings.
2016-08-03 14:52:16 +02:00
Christian Brauner 5a8ff0e61d nspawn: add SYSTEMD_NSPAWN_USE_CGNS env variable (#3809)
SYSTEMD_NSPAWN_USE_CGNS allows to disable the use of cgroup namespaces.
2016-07-26 16:49:15 +02:00
Zbigniew Jędrzejewski-Szmek e28973ee18 Merge pull request #3757 from poettering/efi-search 2016-07-25 16:34:18 -04:00
Lennart Poettering 1a0b98c437 Merge pull request #3589 from brauner/cgroup_namespace
Cgroup namespace
2016-07-25 22:23:00 +02:00
Zbigniew Jędrzejewski-Szmek 476b8254d9 nspawn: don't skip cleanup on locking error 2016-07-22 21:25:09 -04:00
Lennart Poettering 15b1248a6b machine-id-setup: port machine_id_commit() to new id128-util.c APIs 2016-07-22 12:59:36 +02:00
Lennart Poettering 317feb4d9f nspawn: rework /etc/machine-id handling
With this change we'll no longer write to /etc/machine-id from nspawn, as that
breaks the --volatile= operation, as it ensures the image is never considered
in "first boot", since that's bound to the pre-existance of /etc/machine-id.

The new logic works like this:

- If /etc/machine-id already exists in the container, it is read by nspawn and
  exposed in "machinectl status" and friends.

- If the file doesn't exist yet, but --uuid= is passed on the nspawn cmdline,
  this UUID is passed in $container_uuid to PID 1, and PID 1 is then expected
  to persist this to /etc/machine-id for future boots (which systemd already
  does).

- If the file doesn#t exist yet, and no --uuid= is passed a random UUID is
  generated and passed via $container_uuid.

The result is that /etc/machine-id is never initialized by nspawn itself, thus
unbreaking the volatile mode. However still the machine ID configured in the
machine always matches nspawn's and thus machined's idea of it.

Fixes: #3611
2016-07-22 12:59:36 +02:00
Lennart Poettering 691675ba9f nspawn: rework machine/boot ID handling code to use new calls from id128-util.[ch] 2016-07-22 12:59:36 +02:00
Lennart Poettering 910fd145f4 sd-id128: split UUID file read/write code into new id128-util.[ch]
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.

The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().

In follow-up patches we can use this reduce the code in nspawn and
machine-id-setup by adopted the common implementation.
2016-07-22 12:59:36 +02:00
Lennart Poettering 3bbaff3e08 tree-wide: use sd_id128_is_null() instead of sd_id128_equal where appropriate
It's a bit easier to read because shorter. Also, most likely a tiny bit faster.
2016-07-22 12:38:08 +02:00
Lennart Poettering a6bc7db980 nspawn: if an ESP is part of the disk image to operate on, mount it to /efi or /boot
Matching the behaviour of gpt-auto-generator, if we find an ESP while
dissecting a container image, mount it to /efi or /boot if those dirs exist and
are empty.

This should enable us to run "bootctl" inside a container and do the right
thing.
2016-07-21 11:10:35 +02:00
Lennart Poettering 1ddc1272e7 nspawn: when netns is on, mount /proc/sys/net writable
Normally we make all of /proc/sys read-only in a container, but if we do have
netns enabled we can make /proc/sys/net writable, as things are virtualized
then.
2016-07-20 14:53:15 +02:00
Lennart Poettering 065d31c360 nspawn: document why the uid shift range is the way it is 2016-07-20 14:53:15 +02:00