Commit graph

5796 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 48904c8bfd core/execute: escape the separator in exported paths
Our paths shouldn't even contain ":", but let's escape it if one somehow sneaks
in.
2020-09-25 13:36:34 +02:00
Zbigniew Jędrzejewski-Szmek d4d9f034b1 basic/strv: allow escaping the separator in strv_join()
The new parameter is false everywhere except for tests, so no functional change
is expected.
2020-09-25 13:36:34 +02:00
Zbigniew Jędrzejewski-Szmek fe79f107ef tree-wide: drop assignments to r when we only need errno
If the whole call is simple and we don't need to look at the return value
apart from the conditional, let's use a form without assignment of the return
value. When the function call is more complicated, it still makes sense to
use a temporary variable.
2020-09-24 16:36:43 +02:00
Zbigniew Jędrzejewski-Szmek 6119878480 core: turn on MountAPIVFS=true when RootImage or RootDirectory are specified
Lennart wanted to do this back in
01c33c1eff.
For better or worse, this wasn't done because I thought that turning on MountAPIVFS
is a compat break for RootDirectory and people might be negatively surprised by it.
Without this, search for binaries doesn't work (access_fd() requires /proc).
Let's turn it on, but still allow overriding to "no".

When RootDirectory=/, MountAPIVFS=1 doesn't work. This might be a buglet on its
own, but this patch doesn't change the situation.
2020-09-24 10:03:18 +02:00
Zbigniew Jędrzejewski-Szmek 5e98086d16 core: remember when we set ExecContext.mount_apivfs
No functional change intended so far.
2020-09-24 10:03:18 +02:00
Lennart Poettering bcaf20dc38
Merge pull request #17143 from keszybz/late-exec-resolution-alt
Late exec resolution (subset)
2020-09-24 09:38:36 +02:00
Lennart Poettering 21935150a0 tree-wide: switch remaining mount() invocations over to mount_nofollow_verbose()
(Well, at least the ones where that makes sense. Where it does't make
sense are the ones that re invoked on the root path, which cannot
possibly be a symlink.)
2020-09-23 18:57:37 +02:00
Lennart Poettering 30f5d10421 mount-util: rework umount_verbose() to take log level and flags arg
Let's make umount_verbose() more like mount_verbose_xyz(), i.e. take log
level and flags param. In particular the latter matters, since we
typically don't actually want to follow symlinks when unmounting.
2020-09-23 18:57:36 +02:00
Lennart Poettering 511a8cfe30 mount-util: switch most mount_verbose() code over to not follow symlinks 2020-09-23 18:57:36 +02:00
Zbigniew Jędrzejewski-Szmek 8038b99d0d run: let systemd resolve the path with RootDirectory=/RootImage=
Fixes #13338.
2020-09-23 14:49:37 +02:00
Zbigniew Jędrzejewski-Szmek 526e3cbbdd core: don't try to load units from non-absolute paths
The error message disagreed with the check that was actually performed. Adjust the check.
2020-09-23 14:49:37 +02:00
Lennart Poettering 6b6737119a
Merge pull request #17130 from keszybz/static-analyzer-cleanups
Trivial cleanups based on static analysis
2020-09-23 13:23:56 +02:00
Zbigniew Jędrzejewski-Szmek 89de370edd core/namespace: drop bitfield annotations from boolean fields
Such microoptimization makes sense when the structure is used in many many copies,
but here's it's not, and the few bytes we save are not worth the extra code the
compiler has to generate:

    return  ns_info->mount_apivfs ||
            ns_info->protect_control_groups ||
            ns_info->protect_kernel_tunables ||
            ...
before:
  49b187:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  49b18b:       0f b6 00                movzbl (%rax),%eax
  49b18e:       83 e0 80                and    $0xffffff80,%eax
  49b191:       84 c0                   test   %al,%al
  49b193:       75 32                   jne    49b1c7 <namespace_info_mount_apivfs+0x80>
  49b195:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  49b199:       0f b6 00                movzbl (%rax),%eax
  49b19c:       83 e0 08                and    $0x8,%eax
  49b19f:       84 c0                   test   %al,%al
  49b1a1:       75 24                   jne    49b1c7 <namespace_info_mount_apivfs+0x80>
  49b1a3:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  49b1a7:       0f b6 00                movzbl (%rax),%eax
  49b1aa:       83 e0 10                and    $0x10,%eax
  49b1ad:       84 c0                   test   %al,%al
  49b1af:       75 16                   jne    49b1c7 <namespace_info_mount_apivfs+0x80>

after:
  49b024:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  49b028:       0f b6 40 07             movzbl 0x7(%rax),%eax
  49b02c:       84 c0                   test   %al,%al
  49b02e:       75 2e                   jne    49b05e <namespace_info_mount_apivfs+0x7a>
  49b030:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  49b034:       0f b6 40 03             movzbl 0x3(%rax),%eax
  49b038:       84 c0                   test   %al,%al
  49b03a:       75 22                   jne    49b05e <namespace_info_mount_apivfs+0x7a>
  49b03c:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  49b040:       0f b6 40 04             movzbl 0x4(%rax),%eax
  49b044:       84 c0                   test   %al,%al
  49b046:       75 16                   jne    49b05e <namespace_info_mount_apivfs+0x7a>
2020-09-22 17:58:11 +02:00
Lennart Poettering 065b47749d tree-wide: use ERRNO_IS_PRIVILEGE() whereever appropriate 2020-09-22 16:25:22 +02:00
Lennart Poettering aee36b4ea2 dissect-image: process /usr/ GPT partition type 2020-09-19 21:19:51 +02:00
Zbigniew Jędrzejewski-Szmek 0af07108e4 core/execute: reduce indentation level a bit 2020-09-18 15:28:48 +02:00
Zbigniew Jędrzejewski-Szmek 9f71ba8d95 core: resolve binary names immediately before execution
This has two advantages:
- we save a bit of IO in early boot because we don't look for executables
  which we might never call
- if the executable is in a different place and it was specified as a
  non-absolute path, it is OK if it moves to a different place. This should
  solve the case paths are different in the initramfs.

Since the executable path is only available quite late, the call to
mac_selinux_get_child_mls_label() which uses the path needs to be moved down
too.

Fixes #16076.
2020-09-18 15:28:48 +02:00
Zbigniew Jędrzejewski-Szmek 0706c01259 Add CLOSE_AND_REPLACE helper
Similar to free_and_replace. I think this should be uppercase to make it
clear that this is a macro. free_and_replace should probably be uppercased
too.
2020-09-18 15:28:48 +02:00
Zbigniew Jędrzejewski-Szmek 831d57953e core: use X_OK when looking for executables
Other tools silently ignore non-executable names found in path. By checking
F_OK, we would could pick non-executable path even though there is an executable
one later.
2020-09-18 15:28:48 +02:00
Zbigniew Jędrzejewski-Szmek 598c47c86e core/load-fragment: don't treat "; ;" as "/usr/bin/;"
We had a special test case that the second semicolon would be interpreted
as an executable name. We would then try to find the executable and rely
on ";" not being found to cause ENOEXEC to be returned. I think that's just
crazy. Let's treat the second semicolon as a separator and ignore the
whole thing as we would whitespace.
2020-09-18 15:28:48 +02:00
Lennart Poettering 89e62e0bd3 dissect: wrap verity settings in new VeritySettings structure
Just some refactoring: let's place the various verity related parameters
in a common structure, and pass that around instead of the individual
parameters.

Also, let's load the PKCS#7 signature data when finding metadata
right-away, instead of delaying this until we need it. In all cases we
call this there's not much time difference between the metdata finding
and the loading, hence this simplifies things and makes sure root hash
data and its signature is now always acquired together.
2020-09-17 20:36:23 +09:00
Lennart Poettering eb5e26112e
Merge pull request #17076 from poettering/dissect-cleanup
minor cleanups to the dissector code
2020-09-16 18:42:12 +02:00
Lennart Poettering 569a0e42ec dissect: introduce PartitionDesignator as real type 2020-09-16 16:14:01 +02:00
Topi Miettinen 9df2cdd8ec exec: SystemCallLog= directive
With new directive SystemCallLog= it's possible to list system calls to be
logged. This can be used for auditing or temporarily when constructing system
call filters.

---
v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
v4: skip useless debug messages, actually parse directive
v3: don't declare unused variables with old libseccomp
v2: fix build without seccomp or old libseccomp
2020-09-15 12:54:17 +03:00
Topi Miettinen 005bfaf118 exec: Add kill action to system call filters
Define explicit action "kill" for SystemCallErrorNumber=.

In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.

---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
 init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
2020-09-15 12:54:17 +03:00
Yu Watanabe 8cc53fae36 core: use strv_free_and_replace() at one more place 2020-09-15 09:36:34 +02:00
Yu Watanabe 87bc687a8c core/device: remove .device unit corresponding to DEVPATH_OLD
Partially fixes #16967.
2020-09-15 09:40:08 +09:00
Lennart Poettering 2a407487b2
Merge pull request #17049 from mrc0mmand/code-and-spell-check
tree-wide: assorted cleanups/fixes
2020-09-14 23:00:02 +02:00
Zbigniew Jędrzejewski-Szmek 094c6fc338
Merge pull request #17031 from poettering/path-start-limit
core: propagate start limit hit from triggered unit to path unit
2020-09-14 21:51:39 +02:00
Zbigniew Jędrzejewski-Szmek bc2ed3bbf0
Merge pull request #17039 from poettering/dbus-default-dep
tweak when we synthesize dbus deps for service units
2020-09-14 21:45:53 +02:00
Lennart Poettering 2a03b9ed21 tree-wide: don't needlessly negate error number passed to bus_error_message()
Like it's customary in our codebase bus_error_message() internally takes
abs() of the passed error anyway, hence no need to explicitly negate it.
We mostly got this right, but in too many cases we didn't. Fix that.
2020-09-14 21:42:22 +02:00
Frantisek Sumsal 973bc32ab6 core: consolidate alloc & put operations into one statement 2020-09-14 16:13:44 +02:00
Frantisek Sumsal 69e3234db7 tree-wide: fix typos found by codespell
Reported by Fossies.org
2020-09-14 15:32:37 +02:00
Lennart Poettering 47ab8f73e3 core: propagate unit start limit hit state to triggering path unit
We already do this for socket and automount units, do it for path units
too: if the triggered service keeps hitting the start limit, then fail
the triggering unit too, so that we don#t busy loop forever.

(Note that this leaves only timer units out in the cold for this kind of
protection, but it shouldn't matter there, as they are naturally
protected against busy loops: they are scheduled by time anyway).

Fixes: #16669
2020-09-14 13:05:09 +02:00
Lennart Poettering 0377cd2936 core: propagate triggered unit in more load states
In 4c2ef32767 we enabled propagating
triggered unit state to the triggering unit for service units in more
load states, so that we don't accidentally stop tracking state
correctly.

Do the same for our other triggering unit states: automounts, paths, and
timers.

Also, make this an assertion rather than a simple test. After all it
should never happen that we get called for half-loaded units or units of
the wrong type. The load routines should already have made this
impossible.
2020-09-14 13:05:09 +02:00
Lennart Poettering a7f49f0b7c service: add implicit dbus deps only for Type=dbus units
We want to be able to use BusName= in services that run during early boot
already, and thus don't synthesize deps on dbus there. Instead add them
when Type=dbus is set, because in that case we actually really need
D-Bus support.

Fixes: #17037
2020-09-14 11:07:30 +02:00
Lennart Poettering 31d74c66e2 core: don't warn if BusName= is used for non-Type=dbus services
It's useful for more than just Type=dbus now, given #16976. Hence, let's
drop the warning.
2020-09-14 11:07:12 +02:00
Zbigniew Jędrzejewski-Szmek 4b6bc95c01
Merge pull request #17009 from poettering/rootprefix-noslash
remove duplicate slashes in systemd-path output if rootprefixdir is "/"
2020-09-12 10:07:40 +02:00
Lennart Poettering 35b4e3c1bc socket: downgrade log warnings about inability to set socket buffer sizes
In containers we might lack the privs to up the socket buffers. Let's
not complain so loudly about that. Let's hence downgrade this to debug
logging if it's a permission problem.

(This wasn't an issue before b92f350789
because back then the failures wouldn't be detected at all.)
2020-09-12 08:14:54 +02:00
Lennart Poettering 6e65df89c3 pkg-config: prefix is not really configurable, don't pretend it was
We generally don't support prefix being != /usr, and this is hardcoded
all over the place. In the systemd.pc file it wasn't so far. Let's
adjust this to match the rest of the codebase.
2020-09-11 13:09:06 +02:00
Lennart Poettering 5d0fe4233b tree-wide: add helper for IPv4/IPv6 sockopts
A variety of sockopts exist both for IPv4 and IPv6 but require a
different pair of sockopt level/option number. Let's add helpers for
these that internally determine the right sockopt to call.

This should shorten code that generically wants to support both ipv4 +
ipv6 and for the first time adds correct support for some cases where we
only called the ipv4 versions, and not the ipv6 options.
2020-09-11 10:33:13 +02:00
Yu Watanabe 323dda7806 core: downgrade error level and ignore several non-critical errors 2020-09-10 16:24:31 +09:00
Lennart Poettering 4934ba2121 socket: fix copy/paste error
Fixes: CID1432653
2020-09-09 20:14:25 +02:00
Lennart Poettering 12ce0f4173
Merge pull request #16635 from keszybz/do-not-for-each-word
Drop FOREACH_WORD
2020-09-09 17:43:38 +02:00
Lennart Poettering a6b3be1abf
Merge pull request #16972 from wusto/ambient-and-keep-caps-corrections
Ambient capabilities documenation and keep-caps usage corrections
2020-09-09 17:09:42 +02:00
Lennart Poettering 244d9793ee
Merge pull request #16984 from yuwata/make-log_xxx_error-void
Make log_xxx_error() or friends return void
2020-09-09 16:28:51 +02:00
Tobias Kaufmann 198dc17845 core: fix set keep caps for ambient capabilities
The securebit keep-caps retains the capabilities in the permitted set
over an UID change (ambient capabilities are cleared though).

Setting the keep-caps securebit after the uid change and before execve
doesn't make sense as it is cleared during execve and there is no
additional user ID change after this point.

Altough the documentation (man 7 capabilities) is ambigious, keep-caps
is reset during execve although keep-caps-locked is set. After execve
only keep-caps-locked is set and keep-caps is cleared.
2020-09-09 11:17:42 +02:00
Tobias Kaufmann 16fcb1918a core: fix comments on ambient capabilities
The comments on the code for ambient capabilities was wrong/outdated.
2020-09-09 11:17:42 +02:00
Zbigniew Jędrzejewski-Szmek 7896ad8f66 core/load-fragment: use extract_first_word()
This is much nicer, and also fixes a potential overflow when we used
'word' in log_error() as if it was a NUL-terminated string.
2020-09-09 09:34:54 +02:00
Yu Watanabe ded71ab3bc core/socket: use fd_set_{rcv,snd}buf() 2020-09-09 06:39:05 +09:00
Lennart Poettering f3f4abad29
Merge pull request #16979 from keszybz/return-log-debug
Fix 'return log_error()' and 'return log_warning()' patterns
2020-09-08 19:54:38 +02:00
Michal Sekletár 9a1e90aee5 cgroup: freezer action must be NOP when cgroup v2 freezer is not available
Low-level cgroup freezer state manipulation is invoked directly from the
job engine when we are about to execute the job in order to make sure
the unit is not frozen and job execution is not blocked because of
that.

Currently with cgroup v1 we would needlessly do a bunch of work in the
function and even falsely update the freezer state. Don't do any of this
and skip the function silently when v2 freezer is not available.

Following bug is fixed by this commit,

$ systemd-run --unit foo.service /bin/sleep infinity
$ systemctl restart foo.service
$ systemctl show -p FreezerState foo.service

Before (cgroup v1, i.e. full "legacy" mode):
FreezerState=thawing

After:
FreezerState=running
2020-09-08 19:54:13 +02:00
Yu Watanabe 8ed6f81ba3 core: make log_unit_error() or friends return void 2020-09-09 02:34:38 +09:00
Yu Watanabe 93c5b90459 core/slice: explicitly specify return value 2020-09-09 02:34:38 +09:00
Lennart Poettering c6552f7cd5
Merge pull request #16955 from keszybz/test-execute-cleanup
One patch for test-execute and assorted cleanups
2020-09-08 18:33:12 +02:00
Zbigniew Jędrzejewski-Szmek c413bb28df tree-wide: correct cases where return log_{error,warning} is used without value
In various cases, we would say 'return log_warning()' or 'return log_error()'. Those
functions return 0 if no error is passed in. For log_warning or log_error this doesn't
make sense, and we generally want to propagate the error. In the few cases where
the error should be ignored, I think it's better to split it in two, and call 'return 0'
on a separate line.
2020-09-08 17:40:46 +02:00
Zbigniew Jędrzejewski-Szmek 90e74a66e6 tree-wide: define iterator inside of the macro 2020-09-08 12:14:05 +02:00
Zbigniew Jędrzejewski-Szmek 12375b95dd core/unit: reduce scope of variables 2020-09-08 12:07:05 +02:00
Michal Sekletár 332d387f47 core: introduce support for setting NUMAMask= to special "all" value
Fixes #14113
2020-09-08 08:16:03 +02:00
Christian Göttsche e813a74ae8 selinux: create /run/user/${USERID}/systemd with default context 2020-09-05 21:39:44 +02:00
Zbigniew Jędrzejewski-Szmek 9978e631cd core/manager: reindent table for readability 2020-09-04 18:14:26 +02:00
Zbigniew Jędrzejewski-Szmek 5b10116e49 core/{execute, manager}: reduce scope of iterator variables a bit 2020-09-04 18:14:26 +02:00
Luca Boccassi 836540070d core: add [Enable|Disable]UnitFilesWithFlags DBUS methods
The new methods work as the unflavoured ones, but takes flags as a
single uint64_t DBUS parameters instead of different booleans, so
that it can be extended without breaking backward compatibility.
Add new flag to allow adding/removing symlinks in
[/etc|/run]/systemd/system.attached so that portable services
configuration files can be self-contained in those directories, without
affecting the system services directories.
Use the new methods and flags from portablectl --enable.

Useful in case /etc is read-only, with only the portable services
directories being mounted read-write.
2020-09-04 17:56:37 +02:00
Lennart Poettering 7cc60ea414
Merge pull request #16821 from cgzones/selinux_status
selinux: use SELinux status page
2020-09-03 14:55:08 +02:00
Lennart Poettering c457bf4741
Merge pull request #16940 from keszybz/socket-enotconn-cleanup
Cleanup socket enotconn handling
2020-09-03 14:51:02 +02:00
Zbigniew Jędrzejewski-Szmek 5cf09553c3 core/socket: use _cleanup_ to close the connection fd
Removing the gotos would lead to a lot of duplicated code, so I left them
as they were.
2020-09-02 18:18:28 +02:00
Zbigniew Jędrzejewski-Szmek b669c20f97 core/socket: fold socket_instantiate_service() into socket_enter_running()
socket_instantiate_service() was doing unit_ref_set(), and the caller was
immediately doing unit_ref_unset(). After we get rid of this, it doesn't seem
worth it to have two functions.
2020-09-02 18:18:28 +02:00
Zbigniew Jędrzejewski-Szmek 86e045ecef core/socket: we may get ENOTCONN from socket_instantiate_service()
This means that the connection was aborted before we even got to figure out
what the service name will be. Let's treat this as a non-event and close the
connection fd without any further messages.

Code last changed in 934ef6a5.
Reported-by: Thiago Macieira <thiago.macieira@intel.com>

With the patch:
systemd[1]: foobar.socket: Incoming traffic
systemd[1]: foobar.socket: Got ENOTCONN on incoming socket, assuming aborted connection attempt, ignoring.
...

Also, when we get ENOMEM, don't give the hint about missing unit.
2020-09-02 18:17:30 +02:00
Zbigniew Jędrzejewski-Szmek 6ee37b1a7d
Merge pull request #16853 from poettering/udev-current-tag2
udev: make uevents "sticky"
2020-09-02 08:12:56 +02:00
Zbigniew Jędrzejewski-Szmek 47b04ef632
Merge pull request #16925 from cgzones/selinux_create_label
selinux/core: create several file objects with default SELinux context
2020-09-01 22:19:52 +02:00
Lennart Poettering 242c1c075a core: make sure to recheck current udev tag "systemd" before considering a device ready
Let's ensure that a device once tagged can become active/inactive simply
by toggling the current tag.

Note that this means that a device once tagged with "systemd" will
always have a matching .device unit. However, the active/inactive state
of the unit reflects whether it is currently tagged that way (and
doesn't have SYSTEMD_READY=0 set).

Fixes: #7587
2020-09-01 17:40:12 +02:00
Lennart Poettering 895abf3fdd
Merge pull request #16727 from wusto/core-fix-securebits
core: fix securebits setting
2020-09-01 17:21:48 +02:00
Renaud Métrich 3e5f04bf64 socket: New option 'FlushPending' (boolean) to flush socket before entering listening state
Disabled by default. When Enabled, before listening on the socket, flush the content.
Applies when Accept=no only.
2020-09-01 17:20:23 +02:00
Christian Göttsche 63e00ccd8e selinux: create /run/systemd/userdb directory and sockets with default SELinux context 2020-09-01 16:26:12 +02:00
Christian Göttsche 45ae2f725e selinux: create systemd/notify socket with default SELinux context 2020-09-01 16:25:06 +02:00
Christian Göttsche a3f5fd964b selinux: create unit invocation links with default SELinux context 2020-09-01 15:48:53 +02:00
Tobias Kaufmann dbdc4098f6 core: fix securebits setting
Desired functionality:
Set securebits for services started as non-root user.

Failure:
The starting of the service fails if no ambient capability shall be
raised.
... systemd[217941]: ...: Failed to set process secure bits: Operation
not permitted
... systemd[217941]: ...: Failed at step SECUREBITS spawning
/usr/bin/abc.service: Operation not permitted
... systemd[1]: abc.service: Failed with result 'exit-code'.

Reason:
For setting securebits the capability CAP_SETPCAP is required. However
the securebits (if no ambient capability shall be raised) are set after
setresuid.
When setresuid is invoked all capabilities are dropped from the
permitted, effective and ambient capability set. If the securebit
SECBIT_KEEP_CAPS is set the permitted capability set is retained, but
the effective and the ambient set are cleared.

If ambient capabilities shall be set, the securebit SECBIT_KEEP_CAPS is
added to the securebits configured in the service file and set together
with the securebits from the service file before setresuid is executed
(in enforce_user).
Before setresuid is executed the capabilities are the same as for pid1.
This means that all capabilities in the effective, permitted and
bounding set are set. Thus the capability CAP_SETPCAP is in the
effective set and the prctl(PR_SET_SECUREBITS, ...) succeeds.
However, if the secure bits aren't set before setresuid is invoked they
shall be set shortly after the uid change in enforce_user.
This fails as SECBIT_KEEP_CAPS wasn't set before setresuid and in
consequence the effective and permitted set was cleared, hence
CAP_SETPCAP is not set in the effective set (and cannot be raised any
longer) and prctl(PR_SET_SECUREBITS, ...) failes with EPERM.

Proposed solution:
The proposed solution consists of three parts
1. Check in enforce_user, if securebits are configured in the service
   file. If securebits are configured, set SECBIT_KEEP_CAPS
   before invoking setresuid.
2. Don't set any other securebits than SECBIT_KEEP_CAPS in enforce_user,
   but set all requested ones after enforce_user.
   This has the advantage that securebits are set at the same place for
   root and non-root services.
3. Raise CAP_SETPCAP to the effective set (if not already set) before
   setting the securebits to avoid EPERM during the prctl syscall.

For gaining CAP_SETPCAP the function capability_bounding_set_drop is
splitted into two functions:
- The first one raises CAP_SETPCAP (required for dropping bounding
  capabilities)
- The second drops the bounding capabilities

Why are ambient capabilities not affected by this change?
Ambient capabilities get cleared during setresuid, no matter if
SECBIT_KEEP_CAPS is set or not.
For raising ambient capabilities for a user different to root, the
requested capability has to be raised in the inheritable set first. Then
the SECBIT_KEEP_CAPS securebit needs to be set before setresuid is
invoked. Afterwards the ambient capability can be raised, because it is
in the inheritable and permitted set.

Security considerations:
Although the manpage is ambiguous SECBIT_KEEP_CAPS is cleared during
execve no matter if SECBIT_KEEP_CAPS_LOCKED is set or not. If both are
set only SECBIT_KEEP_CAPS_LOCKED is set after execve.
Setting SECBIT_KEEP_CAPS in enforce_user for being able to set
securebits is no security risk, as the effective and permitted set are
set to the value of the ambient set during execve (if the executed file
has no file capabilities. For details check man 7 capabilities).

Remark:
In capability-util.c is a comment complaining about the missing
capability CAP_SETPCAP in the effective set, after the kernel executed
/sbin/init. Thus it is checked there if this capability has to be raised
in the effective set before dropping capabilities from the bounding set.
If this were true all the time, ambient capabilities couldn't be set
without dropping at least one capability from the bounding set, as the
capability CAP_SETPCAP would miss and setting SECBIT_KEEP_CAPS would
fail with EPERM.
2020-09-01 10:53:26 +02:00
Anita Zhang 0419dae715
Merge pull request #16885 from keszybz/rework-cache-timestamps
Rework cache timestamps
2020-08-31 23:21:12 -07:00
Zbigniew Jędrzejewski-Szmek c2911d48ff Rework how we cache mtime to figure out if units changed
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.

This should help us behave properly in the case when the time jumps backwards
during boot: various files might have mtimes that in the future, but we won't
care. This fixes the following scenario:

We have /etc/systemd/system with T1. T1 is initially far in the past.
We have /run/systemd/generator with time T2.
The time is adjusted backwards, so T2 will be always in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2.
We would consider our cache to be up-to-date, falsely.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 02103e5716 core: always try to reload not-found unit
This check was added in d904afc730. It would only
apply in the case where the cache hasn't been loaded yet. I think we pretty
much always have the cache loaded when we reach this point, but even if we
didn't, it seems better to try to reload the unit. So let's drop this check.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek c149d2b491 pid1: use the cache mtime not clock to "mark" load attempts
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.

Also rename fragment_loadtime to fragment_not_found_time. It only gets set when
we failed to load the unit and the old name was suggesting it is always set.

In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during inital boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 81be23886d core: rename manager_unit_file_maybe_loadable_from_cache()
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
2020-08-31 20:53:38 +02:00
Lennart Poettering b519529104
Merge pull request #16841 from keszybz/acl-util-bitmask
Use a bitmask in fd_add_uid_acl_permission()
2020-08-31 16:45:13 +02:00
fangxiuning c53aafb7b5
tree-wide: drop pointless zero initialization (#16884)
tree-wide: drop pointless zero initialization
2020-08-28 17:45:54 +02:00
Lennart Poettering ae6ad21e0b device: propagate reload events from devices on everything but "add", and "remove"
Any uevent other then the initial and the last uevent we see for a
device (which is "add" and "remove") should result in a reload being
triggered, including "bind" and "unbind". Hence, let's fix up the check.

("move" is kinda a combined "remove" + "add", hence cover that too)
2020-08-28 13:30:13 +02:00
Yu Watanabe 8062e643e6 core: clear bind mounts on error
Follow-up for bbb4e7f39f.

Fixes CID#1431998.
2020-08-27 18:20:34 +09:00
Christian Göttsche 2df2152c20 selinux: fork label-aware children with up-to-date label database
The parent process may not perform any label operation, so the
database might not get updated on a SELinux policy change on its own.

Reload the label database once on a policy change, instead of n times
in every started child.
2020-08-27 10:28:53 +02:00
Christian Göttsche fd5e402fa9 selinux: use SELinux status page
Switch from security_getenforce() and netlink notifications to the
SELinux status page.

This usage saves system calls and will also be the default in
libselinux > 3.1 [1].

[1]: 05bdc03130
2020-08-27 10:28:53 +02:00
Zbigniew Jędrzejewski-Szmek 567aeb5801 shared/acl-util: convert rd,wr,ex to a bitmask
I find this version much more readable.

Add replacement defines so that when acl/libacl.h is not available, the
ACL_{READ,WRITE,EXECUTE} constants are also defined. Those constants were
declared in the kernel headers already in 1da177e4c3f41524e886b7f1b8a0c1f,
so they should be the same pretty much everywhere.
2020-08-27 10:20:12 +02:00
PhoenixDiscord e8607daf7d
Replace gendered pronouns with gender neutral ones. (#16844) 2020-08-27 11:52:48 +09:00
Lennart Poettering bbb4e7f39f core: hide /run/credentials whenever namespacing is requested
Ideally we would like to hide all other service's credentials for all
services. That would imply for us to enable mount namespacing for all
services, which is something we cannot do, both due to compatibility
with the status quo ante, and because a number of services legitimately
should be able to install mounts in the host hierarchy.

Hence we do the second best thing, we hide the credentials automatically
for all services that opt into mount namespacing otherwise. This is
quite different from other mount sandboxing options: usually you have to
explicitly opt into each. However, given that the credentials logic is a
brand new concept we invented right here and now, and particularly
security sensitive it's OK to reverse this, and by default hide
credentials whenever we can (i.e. whenever mount namespacing is
otherwise opt-ed in to).

Long story short: if you want to hide other service's credentials, the
most basic options is to just turn on PrivateMounts= and there you go,
they should all be gone.
2020-08-25 19:45:38 +02:00
Lennart Poettering bb0c0d6f29 core: add credentials logic
Fixes: #15778 #16060
2020-08-25 19:45:35 +02:00
Lennart Poettering 45374f6503
Merge pull request #15662 from Werkov/fix-cgroup-disable
Fix unsetting cgroup restrictions
2020-08-25 17:36:07 +02:00
Lennart Poettering f053c9477b core: drop redundant comment
Since 625a164069 we don't need to update
analyze-condition.c separately anymore, hence drop the comment
suggesting otherwise.
2020-08-25 07:47:50 +02:00
Lennart Poettering 4e39995371 core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options
Kernel 5.8 gained a hidepid= implementation that is truly per procfs,
which allows us to mount a distinct once into every unit, with
individual hidepid= settings. Let's expose this via two new settings:
ProtectProc= (wrapping hidpid=) and ProcSubset= (wrapping subset=).

Replaces: #11670
2020-08-24 20:11:02 +02:00
Lennart Poettering df6b900a1b namespace: assert() first, use second 2020-08-24 20:10:58 +02:00
Lennart Poettering 52b3d6523f namespace: move protect_{home|system} into NamespaceInfo
it's not entirely clear what shall be passed via parameter and what via
struct, but these two definitely fit well with the other protect_xyz
fields, hence let's move them over.

We probably should move a lot more more fields into the structure
actuall (most? all even?).
2020-08-24 20:10:30 +02:00
Lennart Poettering 9aab8d7a98
Merge pull request #16804 from keszybz/conditionals-and-spelling-fixes
Conditionals and spelling fixes
2020-08-21 13:36:30 +02:00
Zbigniew Jędrzejewski-Szmek 3fb01017ee
Merge pull request #16686 from bluca/mount_images_opts
core: add mount options support for MountImages
2020-08-21 10:11:08 +02:00
Zbigniew Jędrzejewski-Szmek 990307c3da
Merge pull request #16803 from poettering/analyze-condition-rework
support missing conditions/asserts everywhere
2020-08-20 18:18:13 +02:00
Zbigniew Jędrzejewski-Szmek 2aed63f427 tree-wide: fix spelling of "fallback"
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
2020-08-20 17:45:32 +02:00
Lennart Poettering 5b14956385
Merge pull request #16543 from poettering/nspawn-run-host
nspawn: /run/host/ tweaks
2020-08-20 16:20:05 +02:00
Luca Boccassi 427353f668 core: add mount options support for MountImages
Follow the same model established for RootImage and RootImageOptions,
and allow to either append a single list of options or tuples of
partition_number:options.
2020-08-20 14:45:40 +01:00
Luca Boccassi 9ece644435 core: change RootImageOptions to use names instead of partition numbers
Follow the designations from the Discoverable Partitions Specification
2020-08-20 13:58:02 +01:00
Luca Boccassi bc8d56d305 core: use strv_split_colon_pairs when parsing RootImageOptions 2020-08-20 13:24:32 +01:00
Luca Boccassi c20acbb2bd core: cleanup unused variables
Leftovers from previous implementation of MountImages feature, unused now
2020-08-20 13:24:32 +01:00
Lennart Poettering 476cfe626d core: remove support for ConditionNull=
The concept is flawed, and mostly useless. Let's finally remove it.

It has been deprecated since 90a2ec10f2 (6
years ago) and we started to warn since
55dadc5c57 (1.5 years ago).

Let's get rid of it altogether.
2020-08-20 14:01:25 +02:00
Lennart Poettering 4f55a5b0bf core: add missing conditions/asserts to unit file parsing 2020-08-20 13:56:14 +02:00
Zbigniew Jędrzejewski-Szmek ec673ad4ab
Merge pull request #16559 from benzea/benzea/memory-recursiveprot
mount-setup: Enable memory_recursiveprot for cgroup2
2020-08-20 13:05:07 +02:00
Lennart Poettering 3242980582 core: create per-user inaccessible node from the service manager
Previously, we'd create them from user-runtime-dir@.service. That has
one benefit: since this service runs privileged, we can create the full
set of device nodes. It has one major drawback though: it security-wise
problematic to create files/directories in directories as privileged
user in directories owned by unprivileged users, since they can use
symlinks to redirect what we want to do. As a general rule we hence
avoid this logic: only unpriv code should populate unpriv directories.

Hence, let's move this code to an appropriate place in the service
manager. This means we lose the inaccessible block device node, but
since there's already a fallback in place, this shouldn't be too bad.
2020-08-20 10:18:02 +02:00
Lennart Poettering 9fac502920 nspawn,pid1: pass "inaccessible" nodes from cntr mgr to pid1 payload via /run/host
Let's make /run/host the sole place we pass stuff from host to container
in and place the "inaccessible" nodes in /run/host too.

In contrast to the previous two commits this is a minor compat break, but
not a relevant one I think. Previously the container manager would place
these nodes in /run/systemd/inaccessible/ and that's where PID 1 in the
container would try to add them too when missing. Container manager and
PID 1 in the container would thus manage the same dir together.

With this change the container manager now passes an immutable directory
to the container and leaves /run/systemd entirely untouched, and managed
exclusively by PID 1 inside the container, which is nice to have clear
separation on who manages what.

In order to make sure systemd then usses the /run/host/inaccesible/
nodes this commit changes PID 1 to look for that dir and if it exists
will symlink it to /run/systemd/inaccessible.

Now, this will work fine if new nspawn and new pid 1 in the container
work together. as then the symlink is created and the difference between
the two dirs won't matter.

For the case where an old nspawn invokes a new PID 1: in this case
things work as they always worked: the dir is managed together.

For the case where different container manager invokes a new PID 1: in
this case the nodes aren't typically passed in, and PID 1 in the
container will try to create them and will likely fail partially (though
gracefully) when trying to create char/block device nodes. THis is fine
though as there are fallbacks in place for that case.

For the case where a new nspawn invokes an old PID1: this is were the
(minor) incompatibily happens: in this case new nspawn will place the
nodes in the /run/host/inaccessible/ subdir, but the PID 1 in the
container won't look for them there. Since the nodes are also not
pre-created in /run/systed/inaccessible/ PID 1 will try to create them
there as if a different container manager sets them up. This is of
course not sexy, but is not a total loss, since as mentioned fallbacks
are in place anyway. Hence I think it's OK to accept this minor
incompatibility.
2020-08-20 10:17:52 +02:00
Zbigniew Jędrzejewski-Szmek 2eecdd1d69
Merge pull request #16790 from poettering/core-if-block-merge
core: merge a few if blocks
2020-08-20 10:15:01 +02:00
Lennart Poettering 1f894e682c machine-id-setup: don't use KVM or container manager supplied uuid if in chroot env
Fixes: #16758
2020-08-19 18:23:11 +02:00
Lennart Poettering 4428c49db9 mount-setup: drop pointless zero initialization 2020-08-19 18:11:00 +02:00
Lennart Poettering 3196e42393 core: merge a few if blocks
arg_system == true and getpid() == 1 hold under the very same condition
this early in the main() function (this only changes later when we start
parsing command lines, where arg_system = true is set if users invoke us
in test mode even when getpid() != 1.

Hence, let's simplify things, and merge a couple of if branches and not
pretend they were orthogonal.
2020-08-19 18:06:12 +02:00
Michal Koutný d9ef594454 cgroup: Cleanup function usage
Some masks shouldn't be needed externally, so keep their functions in
the module (others would fit there too but they're used in tests) to
think twice if something would depend on them.

Drop unused function cg_attach_many_everywhere.

Use cgroup_realized instead of cgroup_path when we actually ask for
realized.

This should not cause any functional changes.
2020-08-19 11:41:53 +02:00
Michal Koutný 12b975e065 cgroup: Reduce unit_get_ancestor_disable_mask use
The usage in unit_get_own_mask is redundant, we only need apply
disable_mask at the end befor application, i.e. calculating enable or
target mask.

(IOW, we allow all configurations, but disabling affects effective
controls.)

Modify tests accordingly and add testing of enable mask.

This is intended as cleanup, with no effect but changing unit_dump
output.
2020-08-19 11:41:53 +02:00
Michal Koutný 4c591f3996 cgroup: Introduce family queueing instead of siblings
The unit_add_siblings_to_cgroup_realize_queue does more than mere
siblings queueing, hence define a family of a unit as (immediate)
children of the unit and immediate children of all ancestors.

Working with this abstraction simplifies the queuing calls and it
shouldn't change the functionality.
2020-08-19 11:41:53 +02:00
Michal Koutný f23ba94db3 cgroup: Implicit unit_invalidate_cgroup_members_masks
Merge members mask invalidation into
unit_add_siblings_to_cgroup_realize_queue, this way unit_realize_cgroup
needn't be called with members mask invalidation.

We have to retain the members mask invalidation in unit_load -- although
active units would have cgroups (re)realized (unit_load queues for
realization), the realization would happen with potentially stale mask.
2020-08-19 11:41:53 +02:00
Michal Koutný fb46fca7e0 cgroup: Eager realization in unit_free
unit_free(u) realizes direct parent and invalidates members mask of all
ancestors. This isn't sufficient in v1 controller hierarchies since
siblings of the freed unit may have existed only because of the removed
unit.

We cannot be lazy about the siblings because if parent(u) is also
removed, it'd migrate and rmdir cgroups for siblings(u). However,
realized masks of siblings(u) won't reflect this change.

This was a non-issue earlier, because we weren't removing cgroup
directories properly (effectively matching the stale realized mask),
removal failed because of tasks left by missing migration (see previous
commit).

Therefore, ensure realization of all units necessary to clean up after
the free'd unit.

Fixes: #14149
2020-08-19 11:41:53 +02:00
Michal Koutný 7b63961415 cgroup: Swap cgroup v1 deletion and migration
When we are about to derealize a controller on v1 cgroup, we first
attempt to delete the controller cgroup and migrate afterwards. This
doesn't work in practice because populated cgroup cannot be deleted.
Furthermore, we leave out slices from migration completely, so
(un)setting a control value on them won't realize their controller
cgroup.

Rework actual realization, unit_create_cgroup() becomes
unit_update_cgroup() and make sure that controller hierarchies are
reduced when given controller cgroup ceased to be needed.

Note that with this we introduce slight deviation between v1 and v2 code
-- when a descendant unit turns off a delegated controller, we attempt
to disable it in ancestor slices. On v2 this may fail (kernel enforced,
because of child cgroups using the controller), on v1 we'll migrate
whole subtree and trim the subhierachy. (Previously, we wouldn't take
away delegated controller, however, derealization was broken anyway.)

Fixes: #14149
2020-08-19 11:41:53 +02:00
Benjamin Berg 56f47800d8 mount-setup: Enable memory_recursiveprot for cgroup2
When available, enable memory_recursiveprot. Realistically it always
makes sense to delegate MemoryLow= and MemoryMin= to all children of a
slice/unit.

The kernel option is not enabled by default as it might cause
regressions in some setups. However, it is the better default in
general, and it results in a more flexible and obvious behaviour.

The alternative to using this option would be for user's to also set
DefaultMemoryLow= on slices when assigning MemoryLow=. However, this
makes the effect of MemoryLow= on some children less obvious, as it
could result in a lower protection rather than increasing it.

From the kernel documentation:

  memory_recursiveprot

        Recursively apply memory.min and memory.low protection to
        entire subtrees, without requiring explicit downward
        propagation into leaf cgroups.  This allows protecting entire
        subtrees from one another, while retaining free competition
        within those subtrees.  This should have been the default
        behavior but is a mount-option to avoid regressing setups
        relying on the original semantics (e.g. specifying bogusly
        high 'bypass' protection values at higher tree levels).

This was added in kernel commit 8a931f801340c (mm: memcontrol:
recursive memory.low protection), which became available in 5.7 and was
subsequently fixed in kernel 5.7.7 (mm: memcontrol: handle div0 crash
race condition in memory.low).
2020-08-19 11:17:01 +02:00
Alyssa Ross 556a7bbed6
load-fragment: fix grammar in error messages 2020-08-18 20:56:59 +00:00
Lennart Poettering 3f181262f4 namespace: fix minor memory leak 2020-08-14 15:33:04 +02:00
Lennart Poettering 0a388dfcc5 core,home,machined: generate description fields for all groups we synthesize 2020-08-07 08:39:52 +02:00
Luca Boccassi b3d133148e core: new feature MountImages
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.

Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
2020-08-05 21:34:55 +01:00
Axel Rasmussen a119185c02 selinux: improve comment about getcon_raw semantics
This code was changed in this pull request:
https://github.com/systemd/systemd/pull/16571

After some discussion and more investigation, we better understand
what's going on. So, update the comment, so things are more clear
to future readers.
2020-08-05 20:20:45 +02:00
Zbigniew Jędrzejewski-Szmek d06bd2e785 Merge pull request #16596 from poettering/event-time-rel
Conflict in src/libsystemd-network/test-ndisc-rs.c fixed manually.
2020-08-04 16:07:03 +02:00
Zbigniew Jędrzejewski-Szmek 94efaa3181 core: reset bus error before reuse
From a report in https://bugzilla.redhat.com/show_bug.cgi?id=1861463:
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Trying to enqueue job usb-gadget.target/start/fail
usb-gadget.target: Failed to load configuration: No such file or directory
Assertion '!bus_error_is_dirty(e)' failed at src/libsystemd/sd-bus/bus-error.c:239, function bus_error_setfv(). Ignoring.
sys-devices-platform-soc-2100000.bus-2184000.usb-ci_hdrc.0-udc-ci_hdrc.0.device: Failed to enqueue SYSTEMD_WANTS= job, ignoring: Unit usb-gadget.target not found.

I *think* this is the place where the reuse occurs: we call
bus_unit_validate_load_state(unit, e) twice in a row.
2020-08-03 17:54:32 +02:00
Zbigniew Jędrzejewski-Szmek 7e62257219
Merge pull request #16308 from bluca/root_image_options
service: add new RootImageOptions feature
2020-08-03 10:04:36 +02:00
Zbigniew Jędrzejewski-Szmek b67ec8e5b2 pid1: stop limiting size of /dev/shm
The explicit limit is dropped, which means that we return to the kernel default
of 50% of RAM. See 362a55fc14 for a discussion why that is not as much as it
seems. It turns out various applications need more space in /dev/shm and we
would break them by imposing a low limit.

While at it, rename the define and use a single macro for various tmpfs mounts.
We don't really care what the purpose of the given tmpfs is, so it seems
reasonable to use a single macro.

This effectively reverts part of 7d85383edb. Fixes #16617.
2020-07-30 18:48:35 +02:00
Luca Boccassi 18d7370587 service: add new RootImageOptions feature
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:

RootImageOptions=1:ro,dev 2:nosuid nodev

In absence of a partition number, 0 is assumed.
2020-07-29 17:17:32 +01:00
Michal Koutný 30ad3ca086 cgroup: Add root slice to cgroup realization queue
When we're disabling controller on a direct child of root cgroup, we
forgot to add root slice into cgroup realization queue, which prevented
proper disabling of the controller (on unified hierarchy).

The mechanism relying on "bounce from bottom and propagate up" in
unit_create_cgroup doesn't work on unified hierarchy (leaves needn't be
enabled). Drop it as we rely on the ancestors to be queued -- that's now
intentional but was artifact of combining the two patches:

cb5e3bc37d ("cgroup: Don't explicitly check for member in UNIT_BEFORE") v240~78
65f6b6bdcb ("core: fix re-realization of cgroup siblings") v245-rc1~153^2

Fixes: #14917
2020-07-28 15:49:24 +02:00
Michal Koutný a479c21ed2 cgroup: Make realize_queue behave FIFO
The current implementation is LIFO, which is a) confusing b) prevents
some ordered operations on the cgroup tree (e.g. removing children
before parents).

Fix it quickly. Current list implementation turns this from O(1) to O(n)
operation. Rework the lists later.
2020-07-28 15:49:24 +02:00
Lennart Poettering 39cf0351c5 tree-wide: make use of new relative time events in sd-event.h 2020-07-28 11:24:55 +02:00
Axel Rasmussen 199a892218 selinux: handle getcon_raw producing a NULL pointer, despite returning 0
Previously, we assumed that success meant we definitely got a valid
pointer. There is at least one edge case where this is not true (i.e.,
we can get both a 0 return value, and *also* a NULL pointer):
4246bb550d/libselinux/src/procattr.c (L175)

When this case occurrs, if we don't check the pointer we SIGSEGV in
early initialization.
2020-07-24 13:34:27 +09:00
Lennart Poettering 8047ac8fdc core: clean more env vars from env block pid1 receives
We generally clean all env vars we use ourselves to communicate with out
childrens. We forgot some more recent additions however. Let's correct
that.
2020-07-23 18:30:15 +02:00
Lennart Poettering 00b868e857
Merge pull request #16542 from keszybz/make-targets-fail-again
Make targets fail again
2020-07-23 08:37:47 +02:00
Lennart Poettering c3f8a065e9 execute: take ownership of more fields in ExecParameters
Let's simplify things a bit, and take ownership of more fields in
ExecParameters, so that they are automatically freed when the structure
is released.
2020-07-23 08:37:21 +02:00
Zbigniew Jędrzejewski-Szmek 94d1ddbd7c pid1: target units can fail through dependencies
Fixes #16401.

c80a9a33d0 introduced the .can_fail field,
but didn't set it on .targets. Targets can fail through dependencies.
This leaves .slice and .device units as the types that cannot fail.

$ systemctl cat bad.service bad.target bad-fallback.service
[Service]
Type=oneshot
ExecStart=false

[Unit]
OnFailure=bad-fallback.service

[Service]
Type=oneshot
ExecStart=echo Fixing everythign!

$ sudo systemctl start bad.target
systemd[1]: Starting bad.service...
systemd[1]: bad.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: bad.service: Failed with result 'exit-code'.
systemd[1]: Failed to start bad.service.
systemd[1]: Dependency failed for bad.target.
systemd[1]: bad.target: Job bad.target/start failed with result 'dependency'.
systemd[1]: bad.target: Triggering OnFailure= dependencies.
systemd[1]: Starting bad-fallback.service...
echo[46901]: Fixing everythign!
systemd[1]: bad-fallback.service: Succeeded.
systemd[1]: Finished bad-fallback.service.
2020-07-22 17:58:12 +02:00
Zbigniew Jędrzejewski-Szmek 771b52427a core/job: adjust whitespace and comment 2020-07-22 17:58:12 +02:00
Lennart Poettering 58afc4f8e4 core: don't acquire dual timestamp needlessly if we don't need it in .timer handling
Follow-up for: 26698337f3
2020-07-21 17:33:47 +02:00
Yu Watanabe 9e54462cd5
Merge pull request #16482 from poettering/coverity-246
two coverity fixes
2020-07-16 20:23:23 +09:00
Lennart Poettering 3cd4459003 Revert "selinux: cache enforced status and treat retrieve failure as enforced mode"
This reverts commit 257188f80c.
2020-07-16 08:49:35 +02:00
Lennart Poettering f63ef93703 execute: fix if check
Fixes: coverity 1430459
2020-07-16 08:35:18 +02:00
Lennart Poettering 330f899079 load-fragment: downgrade log messages we ignore to LOG_WARNING
We typically don't log above LOG_WARNING about issues we then go on to
ignore. Do so here, too
2020-07-16 14:58:05 +09:00
Lennart Poettering 8d5bb13d78 core: fix invalid assertion
We miscounted here, and would hit an assert once too early.
2020-07-16 09:13:04 +09:00
Filipe Brandenburger 26698337f3 timer: Adjust calendar timers based on monotonic timer instead of realtime
When the RTC time at boot is off in the future by a few days, OnCalendar=
timers will be scheduled based on the time at boot. But if the time has been
adjusted since boot, the timers will end up scheduled way in the future, which
may cause them not to fire as shortly or often as expected.

Update the logic so that the time will be adjusted based on monotonic time.
We do that by calculating the adjusted manager startup realtime from the
monotonic time stored at that time, by comparing that time with the realtime
and monotonic time of the current time.

Added a test case to validate this works as expected. The test case creates a
QEMU virtual machine with the clock 3 days in the future. Then we adjust the
clock back 3 days, and test creating a timer with an OnCalendar= for every 15
minutes. We also check the manager startup timestamp from both `systemd-analyze
dump` and from D-Bus.

Test output without the corresponding code changes that fix the issue:

  Timer elapse outside of the expected 20 minute window.
    next_elapsed=1594686119
    now=1594426921
    time_delta=259198

With the code changes in, the test passes as expected.
2020-07-15 09:23:09 +02:00
Zbigniew Jędrzejewski-Szmek 76830e2500
Merge pull request #16462 from keszybz/rpm-macro-warnings
Emit better errors for rpm macro misuse
2020-07-15 08:56:28 +02:00
Zbigniew Jędrzejewski-Szmek 6cdc429454
Merge pull request #16340 from keszybz/var-tmp-readonly
Create ro private /var/tmp dir when /var/tmp is read-only
2020-07-14 19:59:48 +02:00