Commit graph

5751 commits

Author SHA1 Message Date
Lennart Poettering 244d9793ee
Merge pull request #16984 from yuwata/make-log_xxx_error-void
Make log_xxx_error() or friends return void
2020-09-09 16:28:51 +02:00
Tobias Kaufmann 198dc17845 core: fix set keep caps for ambient capabilities
The securebit keep-caps retains the capabilities in the permitted set
over an UID change (ambient capabilities are cleared though).

Setting the keep-caps securebit after the uid change and before execve
doesn't make sense as it is cleared during execve and there is no
additional user ID change after this point.

Altough the documentation (man 7 capabilities) is ambigious, keep-caps
is reset during execve although keep-caps-locked is set. After execve
only keep-caps-locked is set and keep-caps is cleared.
2020-09-09 11:17:42 +02:00
Tobias Kaufmann 16fcb1918a core: fix comments on ambient capabilities
The comments on the code for ambient capabilities was wrong/outdated.
2020-09-09 11:17:42 +02:00
Zbigniew Jędrzejewski-Szmek 7896ad8f66 core/load-fragment: use extract_first_word()
This is much nicer, and also fixes a potential overflow when we used
'word' in log_error() as if it was a NUL-terminated string.
2020-09-09 09:34:54 +02:00
Yu Watanabe ded71ab3bc core/socket: use fd_set_{rcv,snd}buf() 2020-09-09 06:39:05 +09:00
Lennart Poettering f3f4abad29
Merge pull request #16979 from keszybz/return-log-debug
Fix 'return log_error()' and 'return log_warning()' patterns
2020-09-08 19:54:38 +02:00
Michal Sekletár 9a1e90aee5 cgroup: freezer action must be NOP when cgroup v2 freezer is not available
Low-level cgroup freezer state manipulation is invoked directly from the
job engine when we are about to execute the job in order to make sure
the unit is not frozen and job execution is not blocked because of
that.

Currently with cgroup v1 we would needlessly do a bunch of work in the
function and even falsely update the freezer state. Don't do any of this
and skip the function silently when v2 freezer is not available.

Following bug is fixed by this commit,

$ systemd-run --unit foo.service /bin/sleep infinity
$ systemctl restart foo.service
$ systemctl show -p FreezerState foo.service

Before (cgroup v1, i.e. full "legacy" mode):
FreezerState=thawing

After:
FreezerState=running
2020-09-08 19:54:13 +02:00
Yu Watanabe 8ed6f81ba3 core: make log_unit_error() or friends return void 2020-09-09 02:34:38 +09:00
Yu Watanabe 93c5b90459 core/slice: explicitly specify return value 2020-09-09 02:34:38 +09:00
Lennart Poettering c6552f7cd5
Merge pull request #16955 from keszybz/test-execute-cleanup
One patch for test-execute and assorted cleanups
2020-09-08 18:33:12 +02:00
Zbigniew Jędrzejewski-Szmek c413bb28df tree-wide: correct cases where return log_{error,warning} is used without value
In various cases, we would say 'return log_warning()' or 'return log_error()'. Those
functions return 0 if no error is passed in. For log_warning or log_error this doesn't
make sense, and we generally want to propagate the error. In the few cases where
the error should be ignored, I think it's better to split it in two, and call 'return 0'
on a separate line.
2020-09-08 17:40:46 +02:00
Zbigniew Jędrzejewski-Szmek 90e74a66e6 tree-wide: define iterator inside of the macro 2020-09-08 12:14:05 +02:00
Zbigniew Jędrzejewski-Szmek 12375b95dd core/unit: reduce scope of variables 2020-09-08 12:07:05 +02:00
Michal Sekletár 332d387f47 core: introduce support for setting NUMAMask= to special "all" value
Fixes #14113
2020-09-08 08:16:03 +02:00
Christian Göttsche e813a74ae8 selinux: create /run/user/${USERID}/systemd with default context 2020-09-05 21:39:44 +02:00
Zbigniew Jędrzejewski-Szmek 9978e631cd core/manager: reindent table for readability 2020-09-04 18:14:26 +02:00
Zbigniew Jędrzejewski-Szmek 5b10116e49 core/{execute, manager}: reduce scope of iterator variables a bit 2020-09-04 18:14:26 +02:00
Luca Boccassi 836540070d core: add [Enable|Disable]UnitFilesWithFlags DBUS methods
The new methods work as the unflavoured ones, but takes flags as a
single uint64_t DBUS parameters instead of different booleans, so
that it can be extended without breaking backward compatibility.
Add new flag to allow adding/removing symlinks in
[/etc|/run]/systemd/system.attached so that portable services
configuration files can be self-contained in those directories, without
affecting the system services directories.
Use the new methods and flags from portablectl --enable.

Useful in case /etc is read-only, with only the portable services
directories being mounted read-write.
2020-09-04 17:56:37 +02:00
Lennart Poettering 7cc60ea414
Merge pull request #16821 from cgzones/selinux_status
selinux: use SELinux status page
2020-09-03 14:55:08 +02:00
Lennart Poettering c457bf4741
Merge pull request #16940 from keszybz/socket-enotconn-cleanup
Cleanup socket enotconn handling
2020-09-03 14:51:02 +02:00
Zbigniew Jędrzejewski-Szmek 5cf09553c3 core/socket: use _cleanup_ to close the connection fd
Removing the gotos would lead to a lot of duplicated code, so I left them
as they were.
2020-09-02 18:18:28 +02:00
Zbigniew Jędrzejewski-Szmek b669c20f97 core/socket: fold socket_instantiate_service() into socket_enter_running()
socket_instantiate_service() was doing unit_ref_set(), and the caller was
immediately doing unit_ref_unset(). After we get rid of this, it doesn't seem
worth it to have two functions.
2020-09-02 18:18:28 +02:00
Zbigniew Jędrzejewski-Szmek 86e045ecef core/socket: we may get ENOTCONN from socket_instantiate_service()
This means that the connection was aborted before we even got to figure out
what the service name will be. Let's treat this as a non-event and close the
connection fd without any further messages.

Code last changed in 934ef6a5.
Reported-by: Thiago Macieira <thiago.macieira@intel.com>

With the patch:
systemd[1]: foobar.socket: Incoming traffic
systemd[1]: foobar.socket: Got ENOTCONN on incoming socket, assuming aborted connection attempt, ignoring.
...

Also, when we get ENOMEM, don't give the hint about missing unit.
2020-09-02 18:17:30 +02:00
Zbigniew Jędrzejewski-Szmek 6ee37b1a7d
Merge pull request #16853 from poettering/udev-current-tag2
udev: make uevents "sticky"
2020-09-02 08:12:56 +02:00
Zbigniew Jędrzejewski-Szmek 47b04ef632
Merge pull request #16925 from cgzones/selinux_create_label
selinux/core: create several file objects with default SELinux context
2020-09-01 22:19:52 +02:00
Lennart Poettering 242c1c075a core: make sure to recheck current udev tag "systemd" before considering a device ready
Let's ensure that a device once tagged can become active/inactive simply
by toggling the current tag.

Note that this means that a device once tagged with "systemd" will
always have a matching .device unit. However, the active/inactive state
of the unit reflects whether it is currently tagged that way (and
doesn't have SYSTEMD_READY=0 set).

Fixes: #7587
2020-09-01 17:40:12 +02:00
Lennart Poettering 895abf3fdd
Merge pull request #16727 from wusto/core-fix-securebits
core: fix securebits setting
2020-09-01 17:21:48 +02:00
Renaud Métrich 3e5f04bf64 socket: New option 'FlushPending' (boolean) to flush socket before entering listening state
Disabled by default. When Enabled, before listening on the socket, flush the content.
Applies when Accept=no only.
2020-09-01 17:20:23 +02:00
Christian Göttsche 63e00ccd8e selinux: create /run/systemd/userdb directory and sockets with default SELinux context 2020-09-01 16:26:12 +02:00
Christian Göttsche 45ae2f725e selinux: create systemd/notify socket with default SELinux context 2020-09-01 16:25:06 +02:00
Christian Göttsche a3f5fd964b selinux: create unit invocation links with default SELinux context 2020-09-01 15:48:53 +02:00
Tobias Kaufmann dbdc4098f6 core: fix securebits setting
Desired functionality:
Set securebits for services started as non-root user.

Failure:
The starting of the service fails if no ambient capability shall be
raised.
... systemd[217941]: ...: Failed to set process secure bits: Operation
not permitted
... systemd[217941]: ...: Failed at step SECUREBITS spawning
/usr/bin/abc.service: Operation not permitted
... systemd[1]: abc.service: Failed with result 'exit-code'.

Reason:
For setting securebits the capability CAP_SETPCAP is required. However
the securebits (if no ambient capability shall be raised) are set after
setresuid.
When setresuid is invoked all capabilities are dropped from the
permitted, effective and ambient capability set. If the securebit
SECBIT_KEEP_CAPS is set the permitted capability set is retained, but
the effective and the ambient set are cleared.

If ambient capabilities shall be set, the securebit SECBIT_KEEP_CAPS is
added to the securebits configured in the service file and set together
with the securebits from the service file before setresuid is executed
(in enforce_user).
Before setresuid is executed the capabilities are the same as for pid1.
This means that all capabilities in the effective, permitted and
bounding set are set. Thus the capability CAP_SETPCAP is in the
effective set and the prctl(PR_SET_SECUREBITS, ...) succeeds.
However, if the secure bits aren't set before setresuid is invoked they
shall be set shortly after the uid change in enforce_user.
This fails as SECBIT_KEEP_CAPS wasn't set before setresuid and in
consequence the effective and permitted set was cleared, hence
CAP_SETPCAP is not set in the effective set (and cannot be raised any
longer) and prctl(PR_SET_SECUREBITS, ...) failes with EPERM.

Proposed solution:
The proposed solution consists of three parts
1. Check in enforce_user, if securebits are configured in the service
   file. If securebits are configured, set SECBIT_KEEP_CAPS
   before invoking setresuid.
2. Don't set any other securebits than SECBIT_KEEP_CAPS in enforce_user,
   but set all requested ones after enforce_user.
   This has the advantage that securebits are set at the same place for
   root and non-root services.
3. Raise CAP_SETPCAP to the effective set (if not already set) before
   setting the securebits to avoid EPERM during the prctl syscall.

For gaining CAP_SETPCAP the function capability_bounding_set_drop is
splitted into two functions:
- The first one raises CAP_SETPCAP (required for dropping bounding
  capabilities)
- The second drops the bounding capabilities

Why are ambient capabilities not affected by this change?
Ambient capabilities get cleared during setresuid, no matter if
SECBIT_KEEP_CAPS is set or not.
For raising ambient capabilities for a user different to root, the
requested capability has to be raised in the inheritable set first. Then
the SECBIT_KEEP_CAPS securebit needs to be set before setresuid is
invoked. Afterwards the ambient capability can be raised, because it is
in the inheritable and permitted set.

Security considerations:
Although the manpage is ambiguous SECBIT_KEEP_CAPS is cleared during
execve no matter if SECBIT_KEEP_CAPS_LOCKED is set or not. If both are
set only SECBIT_KEEP_CAPS_LOCKED is set after execve.
Setting SECBIT_KEEP_CAPS in enforce_user for being able to set
securebits is no security risk, as the effective and permitted set are
set to the value of the ambient set during execve (if the executed file
has no file capabilities. For details check man 7 capabilities).

Remark:
In capability-util.c is a comment complaining about the missing
capability CAP_SETPCAP in the effective set, after the kernel executed
/sbin/init. Thus it is checked there if this capability has to be raised
in the effective set before dropping capabilities from the bounding set.
If this were true all the time, ambient capabilities couldn't be set
without dropping at least one capability from the bounding set, as the
capability CAP_SETPCAP would miss and setting SECBIT_KEEP_CAPS would
fail with EPERM.
2020-09-01 10:53:26 +02:00
Anita Zhang 0419dae715
Merge pull request #16885 from keszybz/rework-cache-timestamps
Rework cache timestamps
2020-08-31 23:21:12 -07:00
Zbigniew Jędrzejewski-Szmek c2911d48ff Rework how we cache mtime to figure out if units changed
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.

This should help us behave properly in the case when the time jumps backwards
during boot: various files might have mtimes that in the future, but we won't
care. This fixes the following scenario:

We have /etc/systemd/system with T1. T1 is initially far in the past.
We have /run/systemd/generator with time T2.
The time is adjusted backwards, so T2 will be always in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2.
We would consider our cache to be up-to-date, falsely.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 02103e5716 core: always try to reload not-found unit
This check was added in d904afc730. It would only
apply in the case where the cache hasn't been loaded yet. I think we pretty
much always have the cache loaded when we reach this point, but even if we
didn't, it seems better to try to reload the unit. So let's drop this check.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek c149d2b491 pid1: use the cache mtime not clock to "mark" load attempts
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.

Also rename fragment_loadtime to fragment_not_found_time. It only gets set when
we failed to load the unit and the old name was suggesting it is always set.

In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during inital boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 81be23886d core: rename manager_unit_file_maybe_loadable_from_cache()
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
2020-08-31 20:53:38 +02:00
Lennart Poettering b519529104
Merge pull request #16841 from keszybz/acl-util-bitmask
Use a bitmask in fd_add_uid_acl_permission()
2020-08-31 16:45:13 +02:00
fangxiuning c53aafb7b5
tree-wide: drop pointless zero initialization (#16884)
tree-wide: drop pointless zero initialization
2020-08-28 17:45:54 +02:00
Lennart Poettering ae6ad21e0b device: propagate reload events from devices on everything but "add", and "remove"
Any uevent other then the initial and the last uevent we see for a
device (which is "add" and "remove") should result in a reload being
triggered, including "bind" and "unbind". Hence, let's fix up the check.

("move" is kinda a combined "remove" + "add", hence cover that too)
2020-08-28 13:30:13 +02:00
Yu Watanabe 8062e643e6 core: clear bind mounts on error
Follow-up for bbb4e7f39f.

Fixes CID#1431998.
2020-08-27 18:20:34 +09:00
Christian Göttsche 2df2152c20 selinux: fork label-aware children with up-to-date label database
The parent process may not perform any label operation, so the
database might not get updated on a SELinux policy change on its own.

Reload the label database once on a policy change, instead of n times
in every started child.
2020-08-27 10:28:53 +02:00
Christian Göttsche fd5e402fa9 selinux: use SELinux status page
Switch from security_getenforce() and netlink notifications to the
SELinux status page.

This usage saves system calls and will also be the default in
libselinux > 3.1 [1].

[1]: 05bdc03130
2020-08-27 10:28:53 +02:00
Zbigniew Jędrzejewski-Szmek 567aeb5801 shared/acl-util: convert rd,wr,ex to a bitmask
I find this version much more readable.

Add replacement defines so that when acl/libacl.h is not available, the
ACL_{READ,WRITE,EXECUTE} constants are also defined. Those constants were
declared in the kernel headers already in 1da177e4c3f41524e886b7f1b8a0c1f,
so they should be the same pretty much everywhere.
2020-08-27 10:20:12 +02:00
PhoenixDiscord e8607daf7d
Replace gendered pronouns with gender neutral ones. (#16844) 2020-08-27 11:52:48 +09:00
Lennart Poettering bbb4e7f39f core: hide /run/credentials whenever namespacing is requested
Ideally we would like to hide all other service's credentials for all
services. That would imply for us to enable mount namespacing for all
services, which is something we cannot do, both due to compatibility
with the status quo ante, and because a number of services legitimately
should be able to install mounts in the host hierarchy.

Hence we do the second best thing, we hide the credentials automatically
for all services that opt into mount namespacing otherwise. This is
quite different from other mount sandboxing options: usually you have to
explicitly opt into each. However, given that the credentials logic is a
brand new concept we invented right here and now, and particularly
security sensitive it's OK to reverse this, and by default hide
credentials whenever we can (i.e. whenever mount namespacing is
otherwise opt-ed in to).

Long story short: if you want to hide other service's credentials, the
most basic options is to just turn on PrivateMounts= and there you go,
they should all be gone.
2020-08-25 19:45:38 +02:00
Lennart Poettering bb0c0d6f29 core: add credentials logic
Fixes: #15778 #16060
2020-08-25 19:45:35 +02:00
Lennart Poettering 45374f6503
Merge pull request #15662 from Werkov/fix-cgroup-disable
Fix unsetting cgroup restrictions
2020-08-25 17:36:07 +02:00
Lennart Poettering f053c9477b core: drop redundant comment
Since 625a164069 we don't need to update
analyze-condition.c separately anymore, hence drop the comment
suggesting otherwise.
2020-08-25 07:47:50 +02:00
Lennart Poettering 4e39995371 core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options
Kernel 5.8 gained a hidepid= implementation that is truly per procfs,
which allows us to mount a distinct once into every unit, with
individual hidepid= settings. Let's expose this via two new settings:
ProtectProc= (wrapping hidpid=) and ProcSubset= (wrapping subset=).

Replaces: #11670
2020-08-24 20:11:02 +02:00
Lennart Poettering df6b900a1b namespace: assert() first, use second 2020-08-24 20:10:58 +02:00
Lennart Poettering 52b3d6523f namespace: move protect_{home|system} into NamespaceInfo
it's not entirely clear what shall be passed via parameter and what via
struct, but these two definitely fit well with the other protect_xyz
fields, hence let's move them over.

We probably should move a lot more more fields into the structure
actuall (most? all even?).
2020-08-24 20:10:30 +02:00
Lennart Poettering 9aab8d7a98
Merge pull request #16804 from keszybz/conditionals-and-spelling-fixes
Conditionals and spelling fixes
2020-08-21 13:36:30 +02:00
Zbigniew Jędrzejewski-Szmek 3fb01017ee
Merge pull request #16686 from bluca/mount_images_opts
core: add mount options support for MountImages
2020-08-21 10:11:08 +02:00
Zbigniew Jędrzejewski-Szmek 990307c3da
Merge pull request #16803 from poettering/analyze-condition-rework
support missing conditions/asserts everywhere
2020-08-20 18:18:13 +02:00
Zbigniew Jędrzejewski-Szmek 2aed63f427 tree-wide: fix spelling of "fallback"
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
2020-08-20 17:45:32 +02:00
Lennart Poettering 5b14956385
Merge pull request #16543 from poettering/nspawn-run-host
nspawn: /run/host/ tweaks
2020-08-20 16:20:05 +02:00
Luca Boccassi 427353f668 core: add mount options support for MountImages
Follow the same model established for RootImage and RootImageOptions,
and allow to either append a single list of options or tuples of
partition_number:options.
2020-08-20 14:45:40 +01:00
Luca Boccassi 9ece644435 core: change RootImageOptions to use names instead of partition numbers
Follow the designations from the Discoverable Partitions Specification
2020-08-20 13:58:02 +01:00
Luca Boccassi bc8d56d305 core: use strv_split_colon_pairs when parsing RootImageOptions 2020-08-20 13:24:32 +01:00
Luca Boccassi c20acbb2bd core: cleanup unused variables
Leftovers from previous implementation of MountImages feature, unused now
2020-08-20 13:24:32 +01:00
Lennart Poettering 476cfe626d core: remove support for ConditionNull=
The concept is flawed, and mostly useless. Let's finally remove it.

It has been deprecated since 90a2ec10f2 (6
years ago) and we started to warn since
55dadc5c57 (1.5 years ago).

Let's get rid of it altogether.
2020-08-20 14:01:25 +02:00
Lennart Poettering 4f55a5b0bf core: add missing conditions/asserts to unit file parsing 2020-08-20 13:56:14 +02:00
Zbigniew Jędrzejewski-Szmek ec673ad4ab
Merge pull request #16559 from benzea/benzea/memory-recursiveprot
mount-setup: Enable memory_recursiveprot for cgroup2
2020-08-20 13:05:07 +02:00
Lennart Poettering 3242980582 core: create per-user inaccessible node from the service manager
Previously, we'd create them from user-runtime-dir@.service. That has
one benefit: since this service runs privileged, we can create the full
set of device nodes. It has one major drawback though: it security-wise
problematic to create files/directories in directories as privileged
user in directories owned by unprivileged users, since they can use
symlinks to redirect what we want to do. As a general rule we hence
avoid this logic: only unpriv code should populate unpriv directories.

Hence, let's move this code to an appropriate place in the service
manager. This means we lose the inaccessible block device node, but
since there's already a fallback in place, this shouldn't be too bad.
2020-08-20 10:18:02 +02:00
Lennart Poettering 9fac502920 nspawn,pid1: pass "inaccessible" nodes from cntr mgr to pid1 payload via /run/host
Let's make /run/host the sole place we pass stuff from host to container
in and place the "inaccessible" nodes in /run/host too.

In contrast to the previous two commits this is a minor compat break, but
not a relevant one I think. Previously the container manager would place
these nodes in /run/systemd/inaccessible/ and that's where PID 1 in the
container would try to add them too when missing. Container manager and
PID 1 in the container would thus manage the same dir together.

With this change the container manager now passes an immutable directory
to the container and leaves /run/systemd entirely untouched, and managed
exclusively by PID 1 inside the container, which is nice to have clear
separation on who manages what.

In order to make sure systemd then usses the /run/host/inaccesible/
nodes this commit changes PID 1 to look for that dir and if it exists
will symlink it to /run/systemd/inaccessible.

Now, this will work fine if new nspawn and new pid 1 in the container
work together. as then the symlink is created and the difference between
the two dirs won't matter.

For the case where an old nspawn invokes a new PID 1: in this case
things work as they always worked: the dir is managed together.

For the case where different container manager invokes a new PID 1: in
this case the nodes aren't typically passed in, and PID 1 in the
container will try to create them and will likely fail partially (though
gracefully) when trying to create char/block device nodes. THis is fine
though as there are fallbacks in place for that case.

For the case where a new nspawn invokes an old PID1: this is were the
(minor) incompatibily happens: in this case new nspawn will place the
nodes in the /run/host/inaccessible/ subdir, but the PID 1 in the
container won't look for them there. Since the nodes are also not
pre-created in /run/systed/inaccessible/ PID 1 will try to create them
there as if a different container manager sets them up. This is of
course not sexy, but is not a total loss, since as mentioned fallbacks
are in place anyway. Hence I think it's OK to accept this minor
incompatibility.
2020-08-20 10:17:52 +02:00
Zbigniew Jędrzejewski-Szmek 2eecdd1d69
Merge pull request #16790 from poettering/core-if-block-merge
core: merge a few if blocks
2020-08-20 10:15:01 +02:00
Lennart Poettering 1f894e682c machine-id-setup: don't use KVM or container manager supplied uuid if in chroot env
Fixes: #16758
2020-08-19 18:23:11 +02:00
Lennart Poettering 4428c49db9 mount-setup: drop pointless zero initialization 2020-08-19 18:11:00 +02:00
Lennart Poettering 3196e42393 core: merge a few if blocks
arg_system == true and getpid() == 1 hold under the very same condition
this early in the main() function (this only changes later when we start
parsing command lines, where arg_system = true is set if users invoke us
in test mode even when getpid() != 1.

Hence, let's simplify things, and merge a couple of if branches and not
pretend they were orthogonal.
2020-08-19 18:06:12 +02:00
Michal Koutný d9ef594454 cgroup: Cleanup function usage
Some masks shouldn't be needed externally, so keep their functions in
the module (others would fit there too but they're used in tests) to
think twice if something would depend on them.

Drop unused function cg_attach_many_everywhere.

Use cgroup_realized instead of cgroup_path when we actually ask for
realized.

This should not cause any functional changes.
2020-08-19 11:41:53 +02:00
Michal Koutný 12b975e065 cgroup: Reduce unit_get_ancestor_disable_mask use
The usage in unit_get_own_mask is redundant, we only need apply
disable_mask at the end befor application, i.e. calculating enable or
target mask.

(IOW, we allow all configurations, but disabling affects effective
controls.)

Modify tests accordingly and add testing of enable mask.

This is intended as cleanup, with no effect but changing unit_dump
output.
2020-08-19 11:41:53 +02:00
Michal Koutný 4c591f3996 cgroup: Introduce family queueing instead of siblings
The unit_add_siblings_to_cgroup_realize_queue does more than mere
siblings queueing, hence define a family of a unit as (immediate)
children of the unit and immediate children of all ancestors.

Working with this abstraction simplifies the queuing calls and it
shouldn't change the functionality.
2020-08-19 11:41:53 +02:00
Michal Koutný f23ba94db3 cgroup: Implicit unit_invalidate_cgroup_members_masks
Merge members mask invalidation into
unit_add_siblings_to_cgroup_realize_queue, this way unit_realize_cgroup
needn't be called with members mask invalidation.

We have to retain the members mask invalidation in unit_load -- although
active units would have cgroups (re)realized (unit_load queues for
realization), the realization would happen with potentially stale mask.
2020-08-19 11:41:53 +02:00
Michal Koutný fb46fca7e0 cgroup: Eager realization in unit_free
unit_free(u) realizes direct parent and invalidates members mask of all
ancestors. This isn't sufficient in v1 controller hierarchies since
siblings of the freed unit may have existed only because of the removed
unit.

We cannot be lazy about the siblings because if parent(u) is also
removed, it'd migrate and rmdir cgroups for siblings(u). However,
realized masks of siblings(u) won't reflect this change.

This was a non-issue earlier, because we weren't removing cgroup
directories properly (effectively matching the stale realized mask),
removal failed because of tasks left by missing migration (see previous
commit).

Therefore, ensure realization of all units necessary to clean up after
the free'd unit.

Fixes: #14149
2020-08-19 11:41:53 +02:00
Michal Koutný 7b63961415 cgroup: Swap cgroup v1 deletion and migration
When we are about to derealize a controller on v1 cgroup, we first
attempt to delete the controller cgroup and migrate afterwards. This
doesn't work in practice because populated cgroup cannot be deleted.
Furthermore, we leave out slices from migration completely, so
(un)setting a control value on them won't realize their controller
cgroup.

Rework actual realization, unit_create_cgroup() becomes
unit_update_cgroup() and make sure that controller hierarchies are
reduced when given controller cgroup ceased to be needed.

Note that with this we introduce slight deviation between v1 and v2 code
-- when a descendant unit turns off a delegated controller, we attempt
to disable it in ancestor slices. On v2 this may fail (kernel enforced,
because of child cgroups using the controller), on v1 we'll migrate
whole subtree and trim the subhierachy. (Previously, we wouldn't take
away delegated controller, however, derealization was broken anyway.)

Fixes: #14149
2020-08-19 11:41:53 +02:00
Benjamin Berg 56f47800d8 mount-setup: Enable memory_recursiveprot for cgroup2
When available, enable memory_recursiveprot. Realistically it always
makes sense to delegate MemoryLow= and MemoryMin= to all children of a
slice/unit.

The kernel option is not enabled by default as it might cause
regressions in some setups. However, it is the better default in
general, and it results in a more flexible and obvious behaviour.

The alternative to using this option would be for user's to also set
DefaultMemoryLow= on slices when assigning MemoryLow=. However, this
makes the effect of MemoryLow= on some children less obvious, as it
could result in a lower protection rather than increasing it.

From the kernel documentation:

  memory_recursiveprot

        Recursively apply memory.min and memory.low protection to
        entire subtrees, without requiring explicit downward
        propagation into leaf cgroups.  This allows protecting entire
        subtrees from one another, while retaining free competition
        within those subtrees.  This should have been the default
        behavior but is a mount-option to avoid regressing setups
        relying on the original semantics (e.g. specifying bogusly
        high 'bypass' protection values at higher tree levels).

This was added in kernel commit 8a931f801340c (mm: memcontrol:
recursive memory.low protection), which became available in 5.7 and was
subsequently fixed in kernel 5.7.7 (mm: memcontrol: handle div0 crash
race condition in memory.low).
2020-08-19 11:17:01 +02:00
Alyssa Ross 556a7bbed6
load-fragment: fix grammar in error messages 2020-08-18 20:56:59 +00:00
Lennart Poettering 3f181262f4 namespace: fix minor memory leak 2020-08-14 15:33:04 +02:00
Lennart Poettering 0a388dfcc5 core,home,machined: generate description fields for all groups we synthesize 2020-08-07 08:39:52 +02:00
Luca Boccassi b3d133148e core: new feature MountImages
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.

Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
2020-08-05 21:34:55 +01:00
Axel Rasmussen a119185c02 selinux: improve comment about getcon_raw semantics
This code was changed in this pull request:
https://github.com/systemd/systemd/pull/16571

After some discussion and more investigation, we better understand
what's going on. So, update the comment, so things are more clear
to future readers.
2020-08-05 20:20:45 +02:00
Zbigniew Jędrzejewski-Szmek d06bd2e785 Merge pull request #16596 from poettering/event-time-rel
Conflict in src/libsystemd-network/test-ndisc-rs.c fixed manually.
2020-08-04 16:07:03 +02:00
Zbigniew Jędrzejewski-Szmek 94efaa3181 core: reset bus error before reuse
From a report in https://bugzilla.redhat.com/show_bug.cgi?id=1861463:
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Trying to enqueue job usb-gadget.target/start/fail
usb-gadget.target: Failed to load configuration: No such file or directory
Assertion '!bus_error_is_dirty(e)' failed at src/libsystemd/sd-bus/bus-error.c:239, function bus_error_setfv(). Ignoring.
sys-devices-platform-soc-2100000.bus-2184000.usb-ci_hdrc.0-udc-ci_hdrc.0.device: Failed to enqueue SYSTEMD_WANTS= job, ignoring: Unit usb-gadget.target not found.

I *think* this is the place where the reuse occurs: we call
bus_unit_validate_load_state(unit, e) twice in a row.
2020-08-03 17:54:32 +02:00
Zbigniew Jędrzejewski-Szmek 7e62257219
Merge pull request #16308 from bluca/root_image_options
service: add new RootImageOptions feature
2020-08-03 10:04:36 +02:00
Zbigniew Jędrzejewski-Szmek b67ec8e5b2 pid1: stop limiting size of /dev/shm
The explicit limit is dropped, which means that we return to the kernel default
of 50% of RAM. See 362a55fc14 for a discussion why that is not as much as it
seems. It turns out various applications need more space in /dev/shm and we
would break them by imposing a low limit.

While at it, rename the define and use a single macro for various tmpfs mounts.
We don't really care what the purpose of the given tmpfs is, so it seems
reasonable to use a single macro.

This effectively reverts part of 7d85383edb. Fixes #16617.
2020-07-30 18:48:35 +02:00
Luca Boccassi 18d7370587 service: add new RootImageOptions feature
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:

RootImageOptions=1:ro,dev 2:nosuid nodev

In absence of a partition number, 0 is assumed.
2020-07-29 17:17:32 +01:00
Michal Koutný 30ad3ca086 cgroup: Add root slice to cgroup realization queue
When we're disabling controller on a direct child of root cgroup, we
forgot to add root slice into cgroup realization queue, which prevented
proper disabling of the controller (on unified hierarchy).

The mechanism relying on "bounce from bottom and propagate up" in
unit_create_cgroup doesn't work on unified hierarchy (leaves needn't be
enabled). Drop it as we rely on the ancestors to be queued -- that's now
intentional but was artifact of combining the two patches:

cb5e3bc37d ("cgroup: Don't explicitly check for member in UNIT_BEFORE") v240~78
65f6b6bdcb ("core: fix re-realization of cgroup siblings") v245-rc1~153^2

Fixes: #14917
2020-07-28 15:49:24 +02:00
Michal Koutný a479c21ed2 cgroup: Make realize_queue behave FIFO
The current implementation is LIFO, which is a) confusing b) prevents
some ordered operations on the cgroup tree (e.g. removing children
before parents).

Fix it quickly. Current list implementation turns this from O(1) to O(n)
operation. Rework the lists later.
2020-07-28 15:49:24 +02:00
Lennart Poettering 39cf0351c5 tree-wide: make use of new relative time events in sd-event.h 2020-07-28 11:24:55 +02:00
Axel Rasmussen 199a892218 selinux: handle getcon_raw producing a NULL pointer, despite returning 0
Previously, we assumed that success meant we definitely got a valid
pointer. There is at least one edge case where this is not true (i.e.,
we can get both a 0 return value, and *also* a NULL pointer):
4246bb550d/libselinux/src/procattr.c (L175)

When this case occurrs, if we don't check the pointer we SIGSEGV in
early initialization.
2020-07-24 13:34:27 +09:00
Lennart Poettering 8047ac8fdc core: clean more env vars from env block pid1 receives
We generally clean all env vars we use ourselves to communicate with out
childrens. We forgot some more recent additions however. Let's correct
that.
2020-07-23 18:30:15 +02:00
Lennart Poettering 00b868e857
Merge pull request #16542 from keszybz/make-targets-fail-again
Make targets fail again
2020-07-23 08:37:47 +02:00
Lennart Poettering c3f8a065e9 execute: take ownership of more fields in ExecParameters
Let's simplify things a bit, and take ownership of more fields in
ExecParameters, so that they are automatically freed when the structure
is released.
2020-07-23 08:37:21 +02:00
Zbigniew Jędrzejewski-Szmek 94d1ddbd7c pid1: target units can fail through dependencies
Fixes #16401.

c80a9a33d0 introduced the .can_fail field,
but didn't set it on .targets. Targets can fail through dependencies.
This leaves .slice and .device units as the types that cannot fail.

$ systemctl cat bad.service bad.target bad-fallback.service
[Service]
Type=oneshot
ExecStart=false

[Unit]
OnFailure=bad-fallback.service

[Service]
Type=oneshot
ExecStart=echo Fixing everythign!

$ sudo systemctl start bad.target
systemd[1]: Starting bad.service...
systemd[1]: bad.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: bad.service: Failed with result 'exit-code'.
systemd[1]: Failed to start bad.service.
systemd[1]: Dependency failed for bad.target.
systemd[1]: bad.target: Job bad.target/start failed with result 'dependency'.
systemd[1]: bad.target: Triggering OnFailure= dependencies.
systemd[1]: Starting bad-fallback.service...
echo[46901]: Fixing everythign!
systemd[1]: bad-fallback.service: Succeeded.
systemd[1]: Finished bad-fallback.service.
2020-07-22 17:58:12 +02:00
Zbigniew Jędrzejewski-Szmek 771b52427a core/job: adjust whitespace and comment 2020-07-22 17:58:12 +02:00
Lennart Poettering 58afc4f8e4 core: don't acquire dual timestamp needlessly if we don't need it in .timer handling
Follow-up for: 26698337f3
2020-07-21 17:33:47 +02:00
Yu Watanabe 9e54462cd5
Merge pull request #16482 from poettering/coverity-246
two coverity fixes
2020-07-16 20:23:23 +09:00
Lennart Poettering 3cd4459003 Revert "selinux: cache enforced status and treat retrieve failure as enforced mode"
This reverts commit 257188f80c.
2020-07-16 08:49:35 +02:00
Lennart Poettering f63ef93703 execute: fix if check
Fixes: coverity 1430459
2020-07-16 08:35:18 +02:00
Lennart Poettering 330f899079 load-fragment: downgrade log messages we ignore to LOG_WARNING
We typically don't log above LOG_WARNING about issues we then go on to
ignore. Do so here, too
2020-07-16 14:58:05 +09:00
Lennart Poettering 8d5bb13d78 core: fix invalid assertion
We miscounted here, and would hit an assert once too early.
2020-07-16 09:13:04 +09:00
Filipe Brandenburger 26698337f3 timer: Adjust calendar timers based on monotonic timer instead of realtime
When the RTC time at boot is off in the future by a few days, OnCalendar=
timers will be scheduled based on the time at boot. But if the time has been
adjusted since boot, the timers will end up scheduled way in the future, which
may cause them not to fire as shortly or often as expected.

Update the logic so that the time will be adjusted based on monotonic time.
We do that by calculating the adjusted manager startup realtime from the
monotonic time stored at that time, by comparing that time with the realtime
and monotonic time of the current time.

Added a test case to validate this works as expected. The test case creates a
QEMU virtual machine with the clock 3 days in the future. Then we adjust the
clock back 3 days, and test creating a timer with an OnCalendar= for every 15
minutes. We also check the manager startup timestamp from both `systemd-analyze
dump` and from D-Bus.

Test output without the corresponding code changes that fix the issue:

  Timer elapse outside of the expected 20 minute window.
    next_elapsed=1594686119
    now=1594426921
    time_delta=259198

With the code changes in, the test passes as expected.
2020-07-15 09:23:09 +02:00
Zbigniew Jędrzejewski-Szmek 76830e2500
Merge pull request #16462 from keszybz/rpm-macro-warnings
Emit better errors for rpm macro misuse
2020-07-15 08:56:28 +02:00
Zbigniew Jędrzejewski-Szmek 6cdc429454
Merge pull request #16340 from keszybz/var-tmp-readonly
Create ro private /var/tmp dir when /var/tmp is read-only
2020-07-14 19:59:48 +02:00
Zbigniew Jędrzejewski-Szmek 56a13a495c pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only
Read-only /var/tmp is more likely, because it's backed by a real device. /tmp
is (by default) backed by tmpfs, but it doesn't have to be. In both cases the
same consideration applies.

If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail
because we cannot create the subdir under /var/tmp to mount the private directory.
But many services actually don't require /var/tmp (either because they only use
it occasionally, or because they only use /tmp, or even because they don't use the
temporary directories at all, and PrivateTmp=yes is used to isolate them from
the rest of the system).

To handle both cases let's create a read-only directory under /run/systemd and
mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into
dumping too much data in /run.)

$ sudo systemd-run -t -p PrivateTmp=yes bash
Running as unit: run-u14.service
Press ^] three times within 1s to disconnect TTY.
[root@workstation /]# ls -l /tmp/
total 0
[root@workstation /]# ls -l /var/tmp/
total 0
[root@workstation /]# touch /tmp/f
[root@workstation /]# touch /var/tmp/f
touch: cannot touch '/var/tmp/f': Read-only file system

This commit has more changes than I like to put in one commit, but it's touching all
the same paths so it's hard to split.
exec_runtime_make() was using the wrong cleanup function, so the directory would be
left behind on error.
2020-07-14 19:47:15 +02:00
Zbigniew Jędrzejewski-Szmek 1061fc1c17 rpm: include macro name in errors for two args macros too 2020-07-14 19:22:42 +02:00
Zbigniew Jędrzejewski-Szmek 281014b73e rpm: adjust various macros to print macro name in the error message
Based on initial patch by Jan Engelhardt <jengelh@inai.de>.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856122.
2020-07-14 19:21:12 +02:00
Zbigniew Jędrzejewski-Szmek 8800df5f71
Merge pull request #16430 from mikhailnov/fix-rpm-create-package-macros
Fix RPM *_create_package macros
2020-07-14 19:02:09 +02:00
Christian Göttsche f2df56bfea namespace: unify logging in mount_tmpfs
Fixes: abad72be4d
Follow up: #16426
2020-07-11 21:25:39 +02:00
Mikhail Novosyolov 3e6e0856cd rpm: avoid hiding errors and output in *_create_package macros
Commit b0ca726585 "rpm: avoid hiding errors from systemd commands" remove hiding errors and output
for other macros, but did not do that for %sysusers_create_package and %tmpfiles_create_package.

This change syncs their behaviour with %sysusers_create and %tmpfiles_create

Signed-off-by: Mikhail Novosyolov <m.novosyolov@rosalinux.ru>
2020-07-11 17:20:23 +03:00
Mikhail Novosyolov 93406fd379 rpm: avoid odd symbols in EOF indicator
The last line in this macros was actually "SYSTEMD_INLINE_EOF " with a space at the end,
but the shell was instructed to look for a line without space.

Macros %sysusers_create_inline and %tmpfiles_create_inline did not have this mistake.

An example:
[root@rosa-2019 bind-server]# cat /etc/passwd | grep named
[root@rosa-2019 bind-server]# cat /tmp/bs
systemd-sysusers --replace=/usr/lib/sysusers.d/named.conf - <<SYSTEMD_INLINE_EOF >/dev/null 2>&1 || :
u named - "BIND DNS Server" /var/lib/named
g named - -
m named named
SYSTEMD_INLINE_EOF
[root@rosa-2019 bind-server]# sh /tmp/bs
/tmp/bs: line 5: warning: here-document at line 1 delimited by end-of-file (wanted `SYSTEMD_INLINE_EOF')
[root@rosa-2019 bind-server]# bash /tmp/bs
/tmp/bs: line 5: warning: here-document at line 1 delimited by end-of-file (wanted `SYSTEMD_INLINE_EOF')
[root@rosa-2019 bind-server]# bash --version
GNU bash, version 5.0.17(1)-release (x86_64-openmandriva-linux-gnu)

The user and group named were NOT created!

Now I remove the trailing space after "SYSTEMD_INLINE_EOF" and rerun:
[root@rosa-2019 bind-server]# sh /tmp/bs
[root@rosa-2019 bind-server]# tail -n 1 /etc/group
named485:named
[root@rosa-2019 bind-server]#

The user and group have been created correctly.

Signed-off-by: Mikhail Novosyolov <m.novosyolov@rosalinux.ru>
2020-07-11 17:20:16 +03:00
Christian Göttsche abad72be4d namespace: fix MAC labels of TemporaryFileSystem=
Reproducible with:
  systemd-run -p TemporaryFileSystem=/root -t /bin/bash
    ls -dZ /root

Prior:
  root:object_r:tmpfs_t:s0 /root
Past:
  root:object_r:user_home_dir_t:s0 /root
2020-07-11 00:09:05 +02:00
Zbigniew Jędrzejewski-Szmek 02b0109af5
Merge pull request #15955 from anitazha/nullorempty
core: check null_or_empty_path for masked units instead of /dev/null
2020-07-08 22:18:17 +02:00
Zbigniew Jędrzejewski-Szmek cbc056c819 core: wrap some long lines and other formatting changes 2020-07-08 16:37:23 +02:00
Alan Perry 5dc60faae5 add error message when bind mount src missing 2020-07-07 20:04:19 +02:00
Zbigniew Jędrzejewski-Szmek 2b0bf3ccf8
Merge pull request #16301 from poettering/firstboot-image
Add --image= switch to firstboot, similar to --root= but with support for operating on disk image
2020-07-07 19:44:12 +02:00
Zbigniew Jędrzejewski-Szmek cd990847b9 tree-wide: more repeated words 2020-07-07 12:08:22 +02:00
Lennart Poettering e2ec9c4d3a namespace-util: introduce helper for combining unshare() + MS_SLAVE remount
We have multiple places we do these two non-trivial operations together,
let's introduce a unified helper for doing both at once.
2020-07-07 11:20:42 +02:00
Luca Boccassi cda667722c core: refresh unit cache when building a transaction if UNIT_NOT_FOUND
When a command asks to load a unit directly and it is in state
UNIT_NOT_FOUND, and the cache is outdated, we refresh it and
attempto to load again.
Use the same logic when building up a transaction and a dependency in
UNIT_NOT_FOUND state is encountered.
Update the unit test to exercise this code path.
2020-07-07 10:09:24 +02:00
Dan Callaghan 2fadbb4535 core: set private section name for automount units
Because this was left unset, the unit_write_setting() function was
refusing to write out the automount-specific TimeoutIdleSec= and
DirectoryMode= settings when creating transient automount units.
Set it to the proper value in line with other unit types.
2020-07-04 18:48:36 +02:00
gzjsgdsb 33d943d168 initialize arg_clock_usec 2020-07-03 14:52:20 +02:00
Anita Zhang 640f3b143d core: check null_or_empty for masked units instead of /dev/null
There's some inconsistency in the what is considered a masked unit:
some places (i.e. load-fragment.c) use `null_or_empty()` while others
check if the file path is symlinked to "/dev/null". Since the latter
doesn't account for things like non-absolute symlinks to "/dev/null",
this commit switches the check for "/dev/null" to use `null_or_empty_path()`
2020-07-03 02:33:50 -07:00
Zbigniew Jędrzejewski-Szmek 15e6a6e87b tree-wide: spell "lifecycle" without hyphen everywhere
We had 2 more instances of unhyphentated spelling.
2020-07-02 09:55:44 +02:00
Zbigniew Jędrzejewski-Szmek 37b22b3b47 tree: wide "the the" and other trivial grammar fixes 2020-07-02 09:51:38 +02:00
Yu Watanabe 9457b6bb21
Merge pull request #16303 from poettering/dbus-util-split
shared: split src/shared/bus-util.c into multiple files
2020-07-01 14:15:40 +09:00
Luca Boccassi 7233e91af0 core: store timestamps of unit load attempts
When the system is under heavy load, it can happen that the unit cache
is refreshed for an unrelated reason (in the test I simulate this by
attempting to start a non-existing unit). The new unit is found and
accounted for in the cache, but it's ignored since we are loading
something else.
When we actually look for it, by attempting to start it, the cache is
up to date so no refresh happens, and starting fails although we have
it loaded in the cache.

When the unit state is set to UNIT_NOT_FOUND, mark the timestamp in
u->fragment_loadtime. Then when attempting to load again we can check
both if the cache itself needs a refresh, OR if it was refreshed AFTER
the last failed attempt that resulted in the state being
UNIT_NOT_FOUND.

Update the test so that this issue reproduces more often.
2020-06-30 16:50:00 +02:00
Lennart Poettering 40af3d020f shared: split out property get helpers
No code changes, just some refactoring.
2020-06-30 15:10:17 +02:00
Lennart Poettering c664cf5607 shared: split out BusObjectImplementor APIs
Just some refactoring, no code changes
2020-06-30 15:08:35 +02:00
Zbigniew Jędrzejewski-Szmek 0e31a6c2ad
Merge pull request #16142 from poettering/random-seed-cmdline
pid1: add support for allowing to pass in random seed via kernel cmdline
2020-06-26 22:42:51 +02:00
Lennart Poettering bed0b7dfc0
pid1: warn if people use User=nobody (#16293) 2020-06-26 22:36:39 +02:00
Luca Boccassi 0cffae953a core: add device mapper to allow-list with DevicePolicy=closed and RootImage
To set up a verity/cryptsetup RootImage the forked child needs to
ioctl /dev/mapper/control and create a new mapper.
If PrivateDevices=yes and/or DevicePolicy=closed are used, this is
blocked by the cgroup setting, so add an exception like it's done
for loop devices (and also add a dependency on the kernel modules
implementing them).
2020-06-26 18:39:45 +02:00
Zbigniew Jędrzejewski-Szmek 98506a41fe
Merge pull request #15697 from OhNoMoreGit/fix-path-units
Recheck PathExists=, PathExistsGlob=, DirectoryNotEmpty= when triggered unit terminates
2020-06-25 18:23:47 +02:00
Luca Boccassi d4d55b0d13 core: add RootHashSignature service parameter
Allow to explicitly pass root hash signature as a unit option. Takes precedence
over implicit checks.
2020-06-25 08:45:21 +01:00
Luca Boccassi c2923fdcd7 dissect/nspawn: add support for dm-verity root hash signature
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it.
2020-06-25 08:45:21 +01:00
Zbigniew Jędrzejewski-Szmek e60d3b13df
Merge pull request #16265 from Werkov/fix-16248
cgroup: Parse infinity properly for memory protections
2020-06-25 09:25:18 +02:00
Lennart Poettering 6b000af4f2 tree-wide: avoid some loaded terms
https://tools.ietf.org/html/draft-knodel-terminology-02
https://lwn.net/Articles/823224/

This gets rid of most but not occasions of these loaded terms:

1. scsi_id and friends are something that is supposed to be removed from
   our tree (see #7594)

2. The test suite defines an API used by the ubuntu CI. We can remove
   this too later, but this needs to be done in sync with the ubuntu CI.

3. In some cases the terms are part of APIs we call or where we expose
   concepts the kernel names the way it names them. (In particular all
   remaining uses of the word "slave" in our codebase are like this,
   it's used by the POSIX PTY layer, by the network subsystem, the mount
   API and the block device subsystem). Getting rid of the term in these
   contexts would mean doing some major fixes of the kernel ABI first.

Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
2020-06-25 09:00:19 +02:00
Michal Koutný 67e2baff6b cgroup: Parse infinity properly for memory protections
This fixes commit db2b8d2e28 that
rectified parsing empty values but broke parsing explicit infinity.
Intended parsing semantics will be captured in a testcase in a follow up
commit.

Ref: #16248
2020-06-24 23:16:06 +02:00
Zbigniew Jędrzejewski-Szmek f83803a649
Merge pull request #16238 from keszybz/set-handling-more
Fix handling of cases where a duplicate item is added to a set and related cleanups
2020-06-24 17:42:13 +02:00
Lennart Poettering d247f232a8 core: add new systemd.random_seed= kernel command line option for seeding entropy pool
This is useful in test environments with entropy starved VMs.
2020-06-24 15:33:44 +02:00
Lennart Poettering 4dd055f907 random-util: add common helper random_write_entropy() for crediting entropy to the kernel's pool 2020-06-24 15:33:27 +02:00
Lennart Poettering 45250e66cc pid1: don't apply "systemd.clock_usec" kernel cmdline parameter outside of PID 1 2020-06-24 15:33:22 +02:00
Zbigniew Jędrzejewski-Szmek d02fd8b1c6 core/bpf-firewall: use the correct cleanup function
On error, we'd just free the object, and not close the fd.

While at it, let's use set_ensure_consume() to make sure we don't leak
the object if it was already in the set. I'm not sure if that condition
can be achieved.
2020-06-24 10:38:15 +02:00
Jay Burger a1ba8c5b71 feature to honor first shutdown request to completion
Create unit tests per established norm at position 52

check in_set first before getting unit
2020-06-24 09:42:01 +02:00
Christian Göttsche a9ba0e328f Make failures of mac_selinux_init() fatal 2020-06-23 19:10:07 +02:00
Christian Göttsche a11bfc17dc Initialize SELinux in user instances
Call mac_selinux_init() to setup the label cache, so objects can be
created with default SELinux context.

Fixes: #8004
2020-06-23 19:10:03 +02:00
Zbigniew Jędrzejewski-Szmek 311a0e2ee6 Revert "cgroup: Allow empty assignments of Memory{Low,Min}="
This reverts commit 53aa85af24.
The reason is that that patch changes the dbus api to be different than
the types declared by introspection api.

Replaces #16122.
2020-06-23 16:54:23 +02:00
Dave Reisner cc479760b4 Revert "job: Don't mark as redundant if deps are relevant"
This reverts commit 097537f07a.

At least Fedora and Debian have already reverted this at the distro
level because it causes more problems than it solves. Arch is debating
reverting it as well [0] but would strongly prefer that this happens
upstream first. Fixes #15188.

[0] https://bugs.archlinux.org/task/66458
2020-06-23 11:42:45 +02:00
Luca Boccassi 0389f4fa81 core: add RootHash and RootVerity service parameters
Allow to explicitly pass root hash (explicitly or as a file) and verity
device/file as unit options. Take precedence over implicit checks.
2020-06-23 10:50:09 +02:00
Zbigniew Jędrzejewski-Szmek de7fef4b6e tree-wide: use set_ensure_put()
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.

Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
2020-06-22 16:32:37 +02:00