Commit graph

5187 commits

Author SHA1 Message Date
Anita Zhang adae5eb977
Merge pull request #14219 from poettering/homed-preparatory-loop
preparatory /dev/loopN support split out of homed PR
2019-12-04 16:07:41 -08:00
Lennart Poettering eaadc03d61
Merge pull request #14133 from keur/clear_ambient_inherited
Clear ambient inherited
2019-12-04 10:30:58 +01:00
Lennart Poettering b51d61fec6
Merge pull request #14177 from keszybz/use-initrd.target
Use initrd.target in the initramfs
2019-12-04 10:30:32 +01:00
Christian Göttsche a9dfac21ec core: reload SELinux label cache on daemon-reload
Reloading the SELinux label cache here enables a light-wight follow-up of a SELinux policy change, e.g. adding a label for a RuntimeDirectory.

Closes: #13363
2019-12-04 10:29:46 +01:00
Lennart Poettering 68d58f3869 pid1: add new kernel cmdline arg systemd.cpu_affinity=
Let's allow configuration of the CPU affinity via the kernel cmdline,
overriding CPUAffinity= in /etc/systemd/system.conf

Prompted by:

https://lists.freedesktop.org/archives/systemd-devel/2019-November/043754.html
2019-12-04 10:28:43 +01:00
Jérémy Rosen a652f050a7 Create parent directories when creating systemd-private subdirs
This is needed when systemd is compiled without systemd-tmpfiles
2019-12-04 09:22:52 +01:00
Topi Miettinen 7477451b69 core: swap priority can be negative
Negative priorities are useful for swap targets which should be only used as
last resort.
2019-12-04 08:57:08 +01:00
Lennart Poettering e08f94acf5 loop-util: accept loopback flags when creating loopback device
This way callers can choose if they want partition scanning or not.
2019-12-02 10:05:09 +01:00
Zbigniew Jędrzejewski-Szmek 8755dbad5b pid1: use initrd.target in the initramfs by default
This makes the code do what the documentation says. The code had no inkling
about initrd.target, so I think this change is fairly risky. As a fallback,
default.target will be loaded, so initramfses which relied on current behaviour
will still work, as along as they don't have a different initrd.target.

In an initramfs created with recent dracut:
$ ls -l usr/lib/systemd/system/{default.target,initrd.target}
lrwxrwxrwx. usr/lib/systemd/system/default.target -> initrd.target
-rw-r--r--. usr/lib/systemd/system/initrd.target
So at least for dracut, there should be no difference.

Also avoid a pointless allocation.
2019-11-28 19:59:33 +01:00
Yu Watanabe d2a56598d0
Merge pull request #14166 from keszybz/transient-unit-settings
Fix docs and some transient unit property passing
2019-11-28 17:23:30 +09:00
Kevin Kuehler 943800f4e7 execute: Call capability_ambient_set_apply even if ambient set is 0
The function capability_ambient_set_apply() now drops capabilities not
in the capability_ambient_set(), so it is necessary to call it when
the ambient set is empty.

Fixes #13163
2019-11-27 10:57:23 -08:00
Zbigniew Jędrzejewski-Szmek e737017b85 pid1: make TimeoutAbortSec settable for transient units
It was documented to be, but implementation was missing.
2019-11-27 13:56:29 +01:00
Zbigniew Jędrzejewski-Szmek a61d68748a pid1: fix setting of DefaultTimeoutAbortSec
This partially reverts a07a7324ad.
We have two pieces of information: the value and a boolean.
config_parse_timeout_abort() added in the reverted commit would write
the boolean to the usec_t value, making a mess.

The code is reworked to have just one implementation and two wrappers
which pass two pointers.
2019-11-27 13:56:28 +01:00
Zbigniew Jędrzejewski-Szmek b9d9fbe411 shared/conf-parser: remove unnecessary whitespace skipping
The conf-parser machinery already removed whitespace before and after "=", no
need to repeat this step.

The test is adjusted to pass. It was testing an code path that doesn't happen
normally, no point in doing that.
2019-11-27 13:56:28 +01:00
Lennart Poettering 540ac9338e core: prefer non-@ syntax for ExecStart=
If the zeroth and first argv[] element on the same we don't need to
generate the "@" syntax for ExecStart= and friends.
2019-11-27 12:32:14 +01:00
Lennart Poettering f14bf01312 core: write out correct field name when creating transient service units 2019-11-27 12:23:00 +01:00
Yu Watanabe cfbb1c6def
Merge pull request #14134 from keszybz/variables-and-docs
Documentation and option parsing fixes
2019-11-26 12:40:30 +09:00
Anita Zhang 05d6628ad2
Merge pull request #14151 from mk-fg/fix-timer-dump-syntax-bug
core.timer: fix "systemd-analyze dump" and docs syntax inconsistencies wrt OnTimezoneChange=
2019-11-25 15:56:33 -08:00
Mike Kazantsev 0810e39628 core.timer: fix "systemd-analyze dump" and docs syntax inconsistencies wrt OnTimezoneChange= 2019-11-26 04:29:03 +05:00
Zbigniew Jędrzejewski-Szmek 0b8d307587 pid1: fix the names of AllowedCPUs= and AllowedMemoryNodes=
The original PR was submitted with CPUSetCpus and CPUSetMems, which was later
changed to AllowedCPUs and AllowedMemmoryNodes everywhere (including the parser
used by systemd-run), but not in the parser for unit files.

Since we already released -rc1, let's keep support for the old names. I think
we can remove it in a release or two if anyone remembers to do that.

Fixes #14126. Follow-up for 047f5d63d7.
2019-11-25 14:02:14 +01:00
Zbigniew Jędrzejewski-Szmek 868f7d36cc core/service: downgrade "scheduling restart" message to debug
I see we log this during every boot, even though it is a routine expected event:
Nov 12 14:50:01 krowka systemd[1]: systemd-journald.service: Service has no hold-off time (RestartSec=0), scheduling restart.
(and for other services too). Let's downgrade this to debug level.

https://bugzilla.redhat.com/show_bug.cgi?id=1614871
2019-11-22 14:19:51 +01:00
Lennart Poettering 3288ea8f32 core: set "trusted.delegate" xattr on cgroups that are delegation boundaries
Let's mark cgroups that are delegation boundaries to us. This can then
be used by tools such as "systemd-cgls" to show where the next manager
takes over.
2019-11-20 17:50:12 +01:00
Lennart Poettering 59a49b1bcd
Merge pull request #14090 from poettering/clonenewns-fix
make sure systemd-logind.service can start if unshare() is blocked
2019-11-20 17:27:56 +01:00
Zbigniew Jędrzejewski-Szmek 2d8898f564
Merge pull request #14074 from keszybz/rename-system-options
Rename system-options
2019-11-20 16:13:46 +01:00
Lennart Poettering 6d19b71876 core: don't insist on ProtectHostname= if unshare() is blocked
Previously we'd only skip ProtectHostname= if kernel support for
namespaces was lacking. With this change we also accept if unshare()
fails because it is blocked.
2019-11-20 12:49:06 +01:00
Lennart Poettering 4e67759960 core: be more lenient when checking whether sandboxing is necessary
In some containers unshare() is made unavailable entirely. Let's deal
with this that more gracefully and disable our sandboxing of services
then, so that we work in a container, under the assumption the container
manager is then responsible for sandboxing if we can't do it ourselves.

Previously, we'd insist on sandboxing as soon as any form of BindPath=
is used. With this change we only insist on it if we have a setting like
that where source and destination differ, i.e. there's a mapping
established that actually rearranges things, and thus would result in
systematically different behaviour if skipped (as opposed to mappings
that just make stuff read-only/writable that otherwise arent').

(Let's also update a test that intended to test for this behaviour with
a more specific configuration that still triggers the behaviour with
this change in place)

Fixes: #13955

(For testing purposes unshare() can easily be blocked with
systemd-nspawn --system-call-filter=~unshare.)
2019-11-20 12:30:04 +01:00
Zbigniew Jędrzejewski-Szmek 3049e6a233
Merge pull request #14030 from keszybz/path-no-trigger
Fix spurious triggering of PathExists=
2019-11-18 22:20:07 +01:00
Zbigniew Jędrzejewski-Szmek fe67137895
Merge pull request #14007 from keszybz/tasks-max-dynamic
Calculate fractional TasksMax= before actual use
2019-11-18 22:18:33 +01:00
Zbigniew Jędrzejewski-Szmek 93e63b2a35 core/path: minor simplification 2019-11-18 20:18:34 +01:00
Zbigniew Jędrzejewski-Szmek d7cf8c24d4 core/path: fix spurious triggering of PathExists= on restart/reload
Our handling of the condition was inconsistent. Normally, we'd only fire when
the file was created (or removed and subsequently created again). But on restarts,
we'd do a "recheck" from path_coldplug(), and if the file existed, we'd
always trigger. Daemon restarts and reloads should not be observeable, in
the sense that they should not trigger units which were already triggered and
would not be started again under normal circumstances.

Note that the mechanism for checks is racy: we get a notification from inotify,
and by the time we check, the file could have been created and removed again,
or removed and created again. It would be better if we inotify would give as
an unambiguous signal that the file was created, but it doesn't: IN_DELETE_SELF
triggers on inode removal, not directory entry, so we need to include IN_ATTRIB,
which obviously triggers on other conditions.

Fixes #12801.
2019-11-18 14:20:05 +01:00
Zbigniew Jędrzejewski-Szmek 7a16cd4b05 core/path: serialize the previous_exists state
Without that we are prone to generate spurious events after daemon
reload/restart.
2019-11-18 14:13:05 +01:00
Lennart Poettering bf7eedbf8f mount: do not update exec deps on mountinfo changes
Fixes: #13978
2019-11-16 13:53:48 +01:00
Lennart Poettering b8e5776d38 mount: extend list of extrinsic mounts a bit 2019-11-16 13:53:48 +01:00
Lennart Poettering 8af381679d
Merge pull request #13940 from keur/protect_kernel_logs
Add ProtectKernelLogs to systemd.exec
2019-11-15 16:26:10 +01:00
Kevin Kuehler 94a7b2759d core: ProtectKernelLogs= mask kmsg in proc and sys
Block access to /dev/kmsg and /proc/kmsg when ProtectKernelLogs is set.
2019-11-14 12:58:43 -08:00
Zbigniew Jędrzejewski-Szmek 0877d4e0cf core: write cgroup limits as permilles
We allow expressing configuration as a fraction with granularity of 0.001, but
when writing out the unit file, we'd round that up to 0.01.

Longer term, I think it'd be nicer to simply use floats and do away with
arbitrary restrictions on precision.
2019-11-14 18:41:54 +01:00
Zbigniew Jędrzejewski-Szmek e617e2ccd7 core/dbus-cgroup: use %.*s instead of strndupa() 2019-11-14 18:41:54 +01:00
Zbigniew Jędrzejewski-Szmek 1454ab403e core/dbus-cgroup: drop unnecessary parens
'mask' is a macro parameter, so it cannot have commas. We don't need to
parenthesize.
2019-11-14 18:41:54 +01:00
Zbigniew Jędrzejewski-Szmek 3a0f06c41a core: make TasksMax a partially dynamic property
TasksMax= and DefaultTasksMax= can be specified as percentages. We don't
actually document of what the percentage is relative to, but the implementation
uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max,
and /sys/fs/cgroup/pids.max (when present). When the value is a percentage,
we immediately convert it to an absolute value. If the limit later changes
(which can happen e.g. when systemd-sysctl runs), the absolute value becomes
outdated.

So let's store either the percentage or absolute value, whatever was specified,
and only convert to an absolute value when the value is used. For example, when
starting a unit, the absolute value will be calculated when the cgroup for
the unit is created.

Fixes #13419.
2019-11-14 18:41:54 +01:00
Kevin Kuehler 8470304018 core: Add ProtectKernelLogs
If seccomp is enabled, load the SYSCALL_FILTER_SET_SYSLOG into the
seccomp filter set. Drop the CAP_SYSLOG capability.
2019-11-11 12:12:02 -08:00
Zbigniew Jędrzejewski-Szmek 45669ae264 bpf: make sure the kernel do not submit an invalid program if no pattern matched
It turns out that the kernel verifier would reject a program we would build
if there was a whitelist, but no entries in the whitelist matched.
The program would approximately like this:
   0: (61) r2 = *(u32 *)(r1 +0)
   1: (54) w2 &= 65535
   2: (61) r3 = *(u32 *)(r1 +0)
   3: (74) w3 >>= 16
   4: (61) r4 = *(u32 *)(r1 +4)
   5: (61) r5 = *(u32 *)(r1 +8)
  48: (b7) r0 = 0
  49: (05) goto pc+1
  50: (b7) r0 = 1
  51: (95) exit
and insn 50 is unreachable, which is illegal. We would then either keep a
previous version of the program or allow everything. Make sure we build a
valid program that simply rejects everything.
2019-11-11 15:14:09 +01:00
Zbigniew Jędrzejewski-Szmek 0048657828 bpf: optimize device type access away most of the time
Most of the time, we specify the allowed access mode as "rwm", so the check
always trivially passes. In that case, skip the check.

The repeating part changes from:
   5: (55) if r2 != 0x2 goto pc+6
   6: (bc) w1 = w3
   7: (54) w1 &= 7
   8: (5d) if r1 != r3 goto pc+3
   9: (55) if r4 != 0x1 goto pc+2
  10: (55) if r5 != 0x3 goto pc+1
  11: (05) goto pc+8
to
   6: (55) if r2 != 0x2 goto pc+3
   7: (55) if r4 != 0x1 goto pc+2
   8: (55) if r5 != 0x3 goto pc+1
   9: (05) goto pc+40
2019-11-11 15:14:02 +01:00
Zbigniew Jędrzejewski-Szmek 8ad08622d6 bpf: convert 'c'/'b' to bpf_type at the very end
This makes the code a bit longer, but easier to read I think, because
the cgroup v1 and v2 code paths are more similar. And whent he type is
a char, any backtrace is easier to interpret.
2019-11-11 15:13:56 +01:00
Zbigniew Jędrzejewski-Szmek 7973f56468 test-bpf-devices: new test for the devices bpf code 2019-11-11 15:13:38 +01:00
Zbigniew Jędrzejewski-Szmek a72a5326a4 bpf: fix off-by-one in class whitelisting
We would jump one insn too many, landing in the middle of the subsequent block.
2019-11-11 14:55:57 +01:00
Zbigniew Jędrzejewski-Szmek 415fe5ec7d bpf: fix device type filter
On big endian arches, we were taking the wrong half-word, so the check
was giving bogus results.

https://bugzilla.redhat.com/show_bug.cgi?id=1769148.
2019-11-11 14:55:57 +01:00
Zbigniew Jędrzejewski-Szmek 786cce0099 bpf: add trace logging
Very helpful when trying to figure out what exactly is going on.
2019-11-11 14:55:57 +01:00
Zbigniew Jędrzejewski-Szmek 0848715cab bpf: make bpf_devices_apply_policy() independent of any unit code 2019-11-11 14:55:57 +01:00
Zbigniew Jędrzejewski-Szmek 8b139557fe core: split out one more function 2019-11-11 14:55:52 +01:00
Zbigniew Jędrzejewski-Szmek a9aac7d8dd core: also split out helper to handle static device nodes 2019-11-10 23:22:15 +01:00