Commit Graph

5582 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 2aed63f427 tree-wide: fix spelling of "fallback"
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
2020-08-20 17:45:32 +02:00
Lennart Poettering 5b14956385
Merge pull request #16543 from poettering/nspawn-run-host
nspawn: /run/host/ tweaks
2020-08-20 16:20:05 +02:00
Zbigniew Jędrzejewski-Szmek ec673ad4ab
Merge pull request #16559 from benzea/benzea/memory-recursiveprot
mount-setup: Enable memory_recursiveprot for cgroup2
2020-08-20 13:05:07 +02:00
Lennart Poettering 3242980582 core: create per-user inaccessible node from the service manager
Previously, we'd create them from user-runtime-dir@.service. That has
one benefit: since this service runs privileged, we can create the full
set of device nodes. It has one major drawback though: it security-wise
problematic to create files/directories in directories as privileged
user in directories owned by unprivileged users, since they can use
symlinks to redirect what we want to do. As a general rule we hence
avoid this logic: only unpriv code should populate unpriv directories.

Hence, let's move this code to an appropriate place in the service
manager. This means we lose the inaccessible block device node, but
since there's already a fallback in place, this shouldn't be too bad.
2020-08-20 10:18:02 +02:00
Lennart Poettering 9fac502920 nspawn,pid1: pass "inaccessible" nodes from cntr mgr to pid1 payload via /run/host
Let's make /run/host the sole place we pass stuff from host to container
in and place the "inaccessible" nodes in /run/host too.

In contrast to the previous two commits this is a minor compat break, but
not a relevant one I think. Previously the container manager would place
these nodes in /run/systemd/inaccessible/ and that's where PID 1 in the
container would try to add them too when missing. Container manager and
PID 1 in the container would thus manage the same dir together.

With this change the container manager now passes an immutable directory
to the container and leaves /run/systemd entirely untouched, and managed
exclusively by PID 1 inside the container, which is nice to have clear
separation on who manages what.

In order to make sure systemd then usses the /run/host/inaccesible/
nodes this commit changes PID 1 to look for that dir and if it exists
will symlink it to /run/systemd/inaccessible.

Now, this will work fine if new nspawn and new pid 1 in the container
work together. as then the symlink is created and the difference between
the two dirs won't matter.

For the case where an old nspawn invokes a new PID 1: in this case
things work as they always worked: the dir is managed together.

For the case where different container manager invokes a new PID 1: in
this case the nodes aren't typically passed in, and PID 1 in the
container will try to create them and will likely fail partially (though
gracefully) when trying to create char/block device nodes. THis is fine
though as there are fallbacks in place for that case.

For the case where a new nspawn invokes an old PID1: this is were the
(minor) incompatibily happens: in this case new nspawn will place the
nodes in the /run/host/inaccessible/ subdir, but the PID 1 in the
container won't look for them there. Since the nodes are also not
pre-created in /run/systed/inaccessible/ PID 1 will try to create them
there as if a different container manager sets them up. This is of
course not sexy, but is not a total loss, since as mentioned fallbacks
are in place anyway. Hence I think it's OK to accept this minor
incompatibility.
2020-08-20 10:17:52 +02:00
Zbigniew Jędrzejewski-Szmek 2eecdd1d69
Merge pull request #16790 from poettering/core-if-block-merge
core: merge a few if blocks
2020-08-20 10:15:01 +02:00
Lennart Poettering 1f894e682c machine-id-setup: don't use KVM or container manager supplied uuid if in chroot env
Fixes: #16758
2020-08-19 18:23:11 +02:00
Lennart Poettering 4428c49db9 mount-setup: drop pointless zero initialization 2020-08-19 18:11:00 +02:00
Lennart Poettering 3196e42393 core: merge a few if blocks
arg_system == true and getpid() == 1 hold under the very same condition
this early in the main() function (this only changes later when we start
parsing command lines, where arg_system = true is set if users invoke us
in test mode even when getpid() != 1.

Hence, let's simplify things, and merge a couple of if branches and not
pretend they were orthogonal.
2020-08-19 18:06:12 +02:00
Benjamin Berg 56f47800d8 mount-setup: Enable memory_recursiveprot for cgroup2
When available, enable memory_recursiveprot. Realistically it always
makes sense to delegate MemoryLow= and MemoryMin= to all children of a
slice/unit.

The kernel option is not enabled by default as it might cause
regressions in some setups. However, it is the better default in
general, and it results in a more flexible and obvious behaviour.

The alternative to using this option would be for user's to also set
DefaultMemoryLow= on slices when assigning MemoryLow=. However, this
makes the effect of MemoryLow= on some children less obvious, as it
could result in a lower protection rather than increasing it.

From the kernel documentation:

  memory_recursiveprot

        Recursively apply memory.min and memory.low protection to
        entire subtrees, without requiring explicit downward
        propagation into leaf cgroups.  This allows protecting entire
        subtrees from one another, while retaining free competition
        within those subtrees.  This should have been the default
        behavior but is a mount-option to avoid regressing setups
        relying on the original semantics (e.g. specifying bogusly
        high 'bypass' protection values at higher tree levels).

This was added in kernel commit 8a931f801340c (mm: memcontrol:
recursive memory.low protection), which became available in 5.7 and was
subsequently fixed in kernel 5.7.7 (mm: memcontrol: handle div0 crash
race condition in memory.low).
2020-08-19 11:17:01 +02:00
Alyssa Ross 556a7bbed6
load-fragment: fix grammar in error messages 2020-08-18 20:56:59 +00:00
Lennart Poettering 3f181262f4 namespace: fix minor memory leak 2020-08-14 15:33:04 +02:00
Lennart Poettering 0a388dfcc5 core,home,machined: generate description fields for all groups we synthesize 2020-08-07 08:39:52 +02:00
Luca Boccassi b3d133148e core: new feature MountImages
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.

Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
2020-08-05 21:34:55 +01:00
Axel Rasmussen a119185c02 selinux: improve comment about getcon_raw semantics
This code was changed in this pull request:
https://github.com/systemd/systemd/pull/16571

After some discussion and more investigation, we better understand
what's going on. So, update the comment, so things are more clear
to future readers.
2020-08-05 20:20:45 +02:00
Zbigniew Jędrzejewski-Szmek d06bd2e785 Merge pull request #16596 from poettering/event-time-rel
Conflict in src/libsystemd-network/test-ndisc-rs.c fixed manually.
2020-08-04 16:07:03 +02:00
Zbigniew Jędrzejewski-Szmek 94efaa3181 core: reset bus error before reuse
From a report in https://bugzilla.redhat.com/show_bug.cgi?id=1861463:
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Trying to enqueue job usb-gadget.target/start/fail
usb-gadget.target: Failed to load configuration: No such file or directory
Assertion '!bus_error_is_dirty(e)' failed at src/libsystemd/sd-bus/bus-error.c:239, function bus_error_setfv(). Ignoring.
sys-devices-platform-soc-2100000.bus-2184000.usb-ci_hdrc.0-udc-ci_hdrc.0.device: Failed to enqueue SYSTEMD_WANTS= job, ignoring: Unit usb-gadget.target not found.

I *think* this is the place where the reuse occurs: we call
bus_unit_validate_load_state(unit, e) twice in a row.
2020-08-03 17:54:32 +02:00
Zbigniew Jędrzejewski-Szmek 7e62257219
Merge pull request #16308 from bluca/root_image_options
service: add new RootImageOptions feature
2020-08-03 10:04:36 +02:00
Zbigniew Jędrzejewski-Szmek b67ec8e5b2 pid1: stop limiting size of /dev/shm
The explicit limit is dropped, which means that we return to the kernel default
of 50% of RAM. See 362a55fc14 for a discussion why that is not as much as it
seems. It turns out various applications need more space in /dev/shm and we
would break them by imposing a low limit.

While at it, rename the define and use a single macro for various tmpfs mounts.
We don't really care what the purpose of the given tmpfs is, so it seems
reasonable to use a single macro.

This effectively reverts part of 7d85383edb. Fixes #16617.
2020-07-30 18:48:35 +02:00
Luca Boccassi 18d7370587 service: add new RootImageOptions feature
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:

RootImageOptions=1:ro,dev 2:nosuid nodev

In absence of a partition number, 0 is assumed.
2020-07-29 17:17:32 +01:00
Lennart Poettering 39cf0351c5 tree-wide: make use of new relative time events in sd-event.h 2020-07-28 11:24:55 +02:00
Axel Rasmussen 199a892218 selinux: handle getcon_raw producing a NULL pointer, despite returning 0
Previously, we assumed that success meant we definitely got a valid
pointer. There is at least one edge case where this is not true (i.e.,
we can get both a 0 return value, and *also* a NULL pointer):
4246bb550d/libselinux/src/procattr.c (L175)

When this case occurrs, if we don't check the pointer we SIGSEGV in
early initialization.
2020-07-24 13:34:27 +09:00
Lennart Poettering 8047ac8fdc core: clean more env vars from env block pid1 receives
We generally clean all env vars we use ourselves to communicate with out
childrens. We forgot some more recent additions however. Let's correct
that.
2020-07-23 18:30:15 +02:00
Lennart Poettering 00b868e857
Merge pull request #16542 from keszybz/make-targets-fail-again
Make targets fail again
2020-07-23 08:37:47 +02:00
Lennart Poettering c3f8a065e9 execute: take ownership of more fields in ExecParameters
Let's simplify things a bit, and take ownership of more fields in
ExecParameters, so that they are automatically freed when the structure
is released.
2020-07-23 08:37:21 +02:00
Zbigniew Jędrzejewski-Szmek 94d1ddbd7c pid1: target units can fail through dependencies
Fixes #16401.

c80a9a33d0 introduced the .can_fail field,
but didn't set it on .targets. Targets can fail through dependencies.
This leaves .slice and .device units as the types that cannot fail.

$ systemctl cat bad.service bad.target bad-fallback.service
[Service]
Type=oneshot
ExecStart=false

[Unit]
OnFailure=bad-fallback.service

[Service]
Type=oneshot
ExecStart=echo Fixing everythign!

$ sudo systemctl start bad.target
systemd[1]: Starting bad.service...
systemd[1]: bad.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: bad.service: Failed with result 'exit-code'.
systemd[1]: Failed to start bad.service.
systemd[1]: Dependency failed for bad.target.
systemd[1]: bad.target: Job bad.target/start failed with result 'dependency'.
systemd[1]: bad.target: Triggering OnFailure= dependencies.
systemd[1]: Starting bad-fallback.service...
echo[46901]: Fixing everythign!
systemd[1]: bad-fallback.service: Succeeded.
systemd[1]: Finished bad-fallback.service.
2020-07-22 17:58:12 +02:00
Zbigniew Jędrzejewski-Szmek 771b52427a core/job: adjust whitespace and comment 2020-07-22 17:58:12 +02:00
Lennart Poettering 58afc4f8e4 core: don't acquire dual timestamp needlessly if we don't need it in .timer handling
Follow-up for: 26698337f3
2020-07-21 17:33:47 +02:00
Yu Watanabe 9e54462cd5
Merge pull request #16482 from poettering/coverity-246
two coverity fixes
2020-07-16 20:23:23 +09:00
Lennart Poettering 3cd4459003 Revert "selinux: cache enforced status and treat retrieve failure as enforced mode"
This reverts commit 257188f80c.
2020-07-16 08:49:35 +02:00
Lennart Poettering f63ef93703 execute: fix if check
Fixes: coverity 1430459
2020-07-16 08:35:18 +02:00
Lennart Poettering 330f899079 load-fragment: downgrade log messages we ignore to LOG_WARNING
We typically don't log above LOG_WARNING about issues we then go on to
ignore. Do so here, too
2020-07-16 14:58:05 +09:00
Lennart Poettering 8d5bb13d78 core: fix invalid assertion
We miscounted here, and would hit an assert once too early.
2020-07-16 09:13:04 +09:00
Filipe Brandenburger 26698337f3 timer: Adjust calendar timers based on monotonic timer instead of realtime
When the RTC time at boot is off in the future by a few days, OnCalendar=
timers will be scheduled based on the time at boot. But if the time has been
adjusted since boot, the timers will end up scheduled way in the future, which
may cause them not to fire as shortly or often as expected.

Update the logic so that the time will be adjusted based on monotonic time.
We do that by calculating the adjusted manager startup realtime from the
monotonic time stored at that time, by comparing that time with the realtime
and monotonic time of the current time.

Added a test case to validate this works as expected. The test case creates a
QEMU virtual machine with the clock 3 days in the future. Then we adjust the
clock back 3 days, and test creating a timer with an OnCalendar= for every 15
minutes. We also check the manager startup timestamp from both `systemd-analyze
dump` and from D-Bus.

Test output without the corresponding code changes that fix the issue:

  Timer elapse outside of the expected 20 minute window.
    next_elapsed=1594686119
    now=1594426921
    time_delta=259198

With the code changes in, the test passes as expected.
2020-07-15 09:23:09 +02:00
Zbigniew Jędrzejewski-Szmek 76830e2500
Merge pull request #16462 from keszybz/rpm-macro-warnings
Emit better errors for rpm macro misuse
2020-07-15 08:56:28 +02:00
Zbigniew Jędrzejewski-Szmek 6cdc429454
Merge pull request #16340 from keszybz/var-tmp-readonly
Create ro private /var/tmp dir when /var/tmp is read-only
2020-07-14 19:59:48 +02:00
Zbigniew Jędrzejewski-Szmek 56a13a495c pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only
Read-only /var/tmp is more likely, because it's backed by a real device. /tmp
is (by default) backed by tmpfs, but it doesn't have to be. In both cases the
same consideration applies.

If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail
because we cannot create the subdir under /var/tmp to mount the private directory.
But many services actually don't require /var/tmp (either because they only use
it occasionally, or because they only use /tmp, or even because they don't use the
temporary directories at all, and PrivateTmp=yes is used to isolate them from
the rest of the system).

To handle both cases let's create a read-only directory under /run/systemd and
mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into
dumping too much data in /run.)

$ sudo systemd-run -t -p PrivateTmp=yes bash
Running as unit: run-u14.service
Press ^] three times within 1s to disconnect TTY.
[root@workstation /]# ls -l /tmp/
total 0
[root@workstation /]# ls -l /var/tmp/
total 0
[root@workstation /]# touch /tmp/f
[root@workstation /]# touch /var/tmp/f
touch: cannot touch '/var/tmp/f': Read-only file system

This commit has more changes than I like to put in one commit, but it's touching all
the same paths so it's hard to split.
exec_runtime_make() was using the wrong cleanup function, so the directory would be
left behind on error.
2020-07-14 19:47:15 +02:00
Zbigniew Jędrzejewski-Szmek 1061fc1c17 rpm: include macro name in errors for two args macros too 2020-07-14 19:22:42 +02:00
Zbigniew Jędrzejewski-Szmek 281014b73e rpm: adjust various macros to print macro name in the error message
Based on initial patch by Jan Engelhardt <jengelh@inai.de>.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856122.
2020-07-14 19:21:12 +02:00
Zbigniew Jędrzejewski-Szmek 8800df5f71
Merge pull request #16430 from mikhailnov/fix-rpm-create-package-macros
Fix RPM *_create_package macros
2020-07-14 19:02:09 +02:00
Christian Göttsche f2df56bfea namespace: unify logging in mount_tmpfs
Fixes: abad72be4d
Follow up: #16426
2020-07-11 21:25:39 +02:00
Mikhail Novosyolov 3e6e0856cd rpm: avoid hiding errors and output in *_create_package macros
Commit b0ca726585 "rpm: avoid hiding errors from systemd commands" remove hiding errors and output
for other macros, but did not do that for %sysusers_create_package and %tmpfiles_create_package.

This change syncs their behaviour with %sysusers_create and %tmpfiles_create

Signed-off-by: Mikhail Novosyolov <m.novosyolov@rosalinux.ru>
2020-07-11 17:20:23 +03:00
Mikhail Novosyolov 93406fd379 rpm: avoid odd symbols in EOF indicator
The last line in this macros was actually "SYSTEMD_INLINE_EOF " with a space at the end,
but the shell was instructed to look for a line without space.

Macros %sysusers_create_inline and %tmpfiles_create_inline did not have this mistake.

An example:
[root@rosa-2019 bind-server]# cat /etc/passwd | grep named
[root@rosa-2019 bind-server]# cat /tmp/bs
systemd-sysusers --replace=/usr/lib/sysusers.d/named.conf - <<SYSTEMD_INLINE_EOF >/dev/null 2>&1 || :
u named - "BIND DNS Server" /var/lib/named
g named - -
m named named
SYSTEMD_INLINE_EOF
[root@rosa-2019 bind-server]# sh /tmp/bs
/tmp/bs: line 5: warning: here-document at line 1 delimited by end-of-file (wanted `SYSTEMD_INLINE_EOF')
[root@rosa-2019 bind-server]# bash /tmp/bs
/tmp/bs: line 5: warning: here-document at line 1 delimited by end-of-file (wanted `SYSTEMD_INLINE_EOF')
[root@rosa-2019 bind-server]# bash --version
GNU bash, version 5.0.17(1)-release (x86_64-openmandriva-linux-gnu)

The user and group named were NOT created!

Now I remove the trailing space after "SYSTEMD_INLINE_EOF" and rerun:
[root@rosa-2019 bind-server]# sh /tmp/bs
[root@rosa-2019 bind-server]# tail -n 1 /etc/group
named485:named
[root@rosa-2019 bind-server]#

The user and group have been created correctly.

Signed-off-by: Mikhail Novosyolov <m.novosyolov@rosalinux.ru>
2020-07-11 17:20:16 +03:00
Christian Göttsche abad72be4d namespace: fix MAC labels of TemporaryFileSystem=
Reproducible with:
  systemd-run -p TemporaryFileSystem=/root -t /bin/bash
    ls -dZ /root

Prior:
  root:object_r:tmpfs_t:s0 /root
Past:
  root:object_r:user_home_dir_t:s0 /root
2020-07-11 00:09:05 +02:00
Zbigniew Jędrzejewski-Szmek 02b0109af5
Merge pull request #15955 from anitazha/nullorempty
core: check null_or_empty_path for masked units instead of /dev/null
2020-07-08 22:18:17 +02:00
Zbigniew Jędrzejewski-Szmek cbc056c819 core: wrap some long lines and other formatting changes 2020-07-08 16:37:23 +02:00
Alan Perry 5dc60faae5 add error message when bind mount src missing 2020-07-07 20:04:19 +02:00
Zbigniew Jędrzejewski-Szmek 2b0bf3ccf8
Merge pull request #16301 from poettering/firstboot-image
Add --image= switch to firstboot, similar to --root= but with support for operating on disk image
2020-07-07 19:44:12 +02:00
Zbigniew Jędrzejewski-Szmek cd990847b9 tree-wide: more repeated words 2020-07-07 12:08:22 +02:00
Lennart Poettering e2ec9c4d3a namespace-util: introduce helper for combining unshare() + MS_SLAVE remount
We have multiple places we do these two non-trivial operations together,
let's introduce a unified helper for doing both at once.
2020-07-07 11:20:42 +02:00