Commit Graph

663 Commits

Author SHA1 Message Date
Michal Sekletár d9e45bc3ab core: introduce support for cgroup freezer
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.

Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
2020-04-30 19:02:51 +02:00
Frantisek Sumsal 86b52a3958 tree-wide: fix spelling errors
Based on a report from Fossies.org using Codespell.

Followup to #15436
2020-04-21 23:21:08 +02:00
Lennart Poettering 33b58dfb41 core: automatically add udev dependency for units using RootImage=
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.

Fixes: #14972
2020-04-21 16:31:06 +02:00
Zbigniew Jędrzejewski-Szmek 162392b75a tree-wide: spellcheck using codespell
Fixes #15436.
2020-04-16 18:00:40 +02:00
Zbigniew Jędrzejewski-Szmek a1a11d5610
Merge pull request #15365 from poettering/remount-fs-pstore-fix
pid1: automatically generate systemd-remount-fs.service deps, plus enable systemd-pstore from sysinit.target
2020-04-12 17:16:07 +02:00
Lennart Poettering 9b3c65ed36
Merge pull request #15352 from poettering/user-group-name-valdity-rework
user/group name validity rework
2020-04-09 18:49:22 +02:00
Lennart Poettering 611cb82612
Merge pull request #15318 from fbuihuu/inherit-umask-for-user-units
pid1: by default make user units inherit their umask from the user ma…
2020-04-09 17:15:55 +02:00
Franck Bui 5e37d1930b pid1: by default make user units inherit their umask from the user manager
This patch changes the way user managers set the default umask for the units it
manages.

Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.

Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.

Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.

Fixes #6077.
2020-04-09 14:17:07 +02:00
Lennart Poettering 7a8867abfa user-util: rework how we validate user names
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.

The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)

The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…

This effectively liberaralizes a lot what we expect from usernames.

The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.

Fixes: #15149 #15090
2020-04-08 17:11:20 +02:00
Lennart Poettering f3b7a79b97 core: automatically add dependency on systemd-remount-fs.service if StateDirectory= is used
And similar for other settings that require a writable /var/.

Rationale: if these options are used for early-boot services (such as
systemd-pstore.service) we need /var/ writable. And if /var/ is on the
root fs, then systemd-remount-fs.service is the service that ensures
that /var/ is writable.

This allows us to remove explicit deps in services such as
systemd-pstore.service.
2020-04-08 16:29:25 +02:00
Wen Yang acd1987a18 core/unit: print info when unit_add_name failed
When there are hundreds of mounts on the server, it will take a long
time to analyze the failure of a certain mount unit. So it is useful
to print the reason why unit_add_name() failed.
2020-03-27 18:46:13 +01:00
Zbigniew Jędrzejewski-Szmek 5bcf34ebf3 pid1: when showing error status, do not switch to status=temporary
We would flip to status=temporary mode on the first error, and then switch back
to status=auto after the initial transaction was done. This isn't very useful,
because usually all the messages about successfully started units and not
related to the original failure. In fact, all those messages most likely cause
the information about the prime error to scroll off screen. And if the user
requested quiet boot, there's no reason to think that they care about those
success messages.

Also, when logging about dependency cycles, treat this similarly to a unit
error and show the message even if the status is "soft disabled" (before we
wouldn't show it in that case).
2020-03-01 11:42:42 +01:00
Zbigniew Jędrzejewski-Szmek 5650ec7a11
Merge pull request #14156 from fbuihuu/deal-with-aliases-when-disabling
Consider aliases in /usr when disabling units
2020-02-06 10:46:21 +01:00
Christian Göttsche f156e60c66 core: unit_label_path(): take const unit 2020-02-04 18:36:19 +01:00
Zbigniew Jędrzejewski-Szmek dc9fd22d3d Merge pull request #14398 from poettering/mount-prep 2020-02-04 16:28:51 +01:00
Lennart Poettering dc5437c78b journald: add ability to activate by varlink socket
If we have exit on idle, then operations such as "journalctl
--namespace=foo --rotate" should work even if the journal daemon is
currently not running.

(Note that we don't do activation by varlink for the main instance of
journald, I am not sure the deadlocks it might introduce are worth it)
2020-01-31 15:03:55 +01:00
Lennart Poettering 91dd5f7cbe core: add new LogNamespace= execution setting 2020-01-31 15:01:43 +01:00
Kevin Kuehler fc64760dda core: shared: Add ProtectClock= to systemd.exec 2020-01-26 12:23:33 -08:00
Lennart Poettering 44b0d1fd59 core: add implicit ordering dep on blockdev@.target from all mount units
This way we shuld be able to order mounts properly against their backing
services in case complex storage is used (i.e. LUKS), even if the device
path used for mounting the devices is different from the expected device
node of the backing service.

Specifically, if we have a LUKS device /dev/mapper/foo that is mounted
by this name all is trivial as the relationship can be established a
priori easily. But if it is mounted via a /dev/disk/by-uuid/ symlink or
similar we only can relate the device node generated to the one mounted
at the moment the device is actually established. That's because the
UUID of the fs is stored inside the encrypted volume and thus not
knowable until the volume is set up. This patch tries to improve on this
situation: a implicit After=blockdev@.target dependency is generated for
all mounts, based on the data from /proc/self/mountinfo, which should be
the actual device node, with all symlinks resolved. This means that as
soon as the mount is established the ordering via blockdev@.target will
work, and that means during shutdown it is honoured, which is what we
are looking for.

Note that specifying /etc/fstab entries via UUID= for LUKS devices still
sucks and shouldn't be done, because it means we cannot know which LUKS
device to activate to make an fs appear, and that means unless the
volume is set up at boot anyway we can't really handle things
automatically when putting together transactions that need the mount.
2020-01-21 20:23:44 +01:00
Lennart Poettering b90cf10245 core: make a number of functions not used externally static 2020-01-21 11:51:45 +01:00
Lennart Poettering eea45a3399
Merge pull request #14424 from poettering/watch-bus-name-rework
pid1: simplify drastically how we watch bus names for service's BusName= setting
2020-01-15 11:46:11 +01:00
Franck Bui 29a743f993 core: explicit mention of unit ID is redundant with log_unit_*() 2020-01-10 14:20:28 +01:00
Lennart Poettering c80a9a33d0 core: clearly refuse OnFailure= deps on units that can't fail
Similar, refuse triggering deps on units that cannot trigger.

And rework how we ignore After= dependencies on device units, to work
the same way.

See: #14142
2020-01-09 11:03:53 +01:00
Lennart Poettering 867af7282b unit: make sure to pull in modprobe@loop.service when RootImage= is used with DeviceAllow=
Fixes: #14214
2020-01-07 18:53:31 +01:00
Lennart Poettering fc67a943d9 core: drop initial ListNames() bus call from PID 1
Previously, when first connecting to the bus after connecting to it we'd
issue a ListNames() bus call to the driver to figure out which bus names
are currently active. This information was then used to initialize the
initial state for services that use BusName=.

This change removes the whole code for this and replaces it with
something vastly simpler.

First of all, the ListNames() call was issues synchronosuly, which meant
if dbus was for some reason synchronously calling into PID1 for some
reason we'd deadlock. As it turns out there's now a good chance it does:
the nss-systemd userdb hookup means that any user dbus-daemon resolves
might result in a varlink call into PID 1, and dbus resolves quite a lot
of users while parsing its policy. My original goal was to fix this
deadlock.

But as it turns out we don't need the ListNames() call at all anymore,
since #12957 has been merged. That PR was supposed to fix a race where
asynchronous installation of bus matches would cause us missing the
initial owner of a bus name when a service is first started. It fixed it
(correctly) by enquiring with GetOwnerName() who currently owns the
name, right after installing the match. But this means whenever we start watching a bus name we anyway
issue a GetOwnerName() for it, and that means also when first connecting
to the bus we don't need to issue ListNames() anymore since that just
tells us the same info: which names are currently owned.

hence, let's drop ListNames() and instead make better use of the
GetOwnerName() result: if it failed the name is not owned.

Also, while we are at it, let's simplify the unit's owner_name_changed()
callback(): let's drop the "old_owner" argument. We never used that
besides logging, and it's hard to synthesize from just the return of a
GetOwnerName(), hence don't bother.
2020-01-06 15:21:47 +01:00
Lennart Poettering a5b0784795 core: create/remove unit bus name slots always together
When a service unit watches a bus name (i.e. because of BusName= being
set), then we do two things: we install a match slot to watch how its
ownership changes, and we inquire about the current owner. Make sure we
always do both together or neither.

This in particular fixes a corner-case memleak when destroying bus
connections, since we never freed the GetNameOwner() bus slots when
destroying a bus when they were still ongoing.
2020-01-06 15:21:44 +01:00
Lennart Poettering 5085ef0d71 core: no need to eat up error
This is a method call reply. We might as well propagate the error. The
worst that happens is that sd-bus logs about it.
2020-01-06 15:21:40 +01:00
Lennart Poettering 17bda1f19d core: shorten code a bit
The return parameter here cannot be NULL, the bus call either succeeds
or fails but will never uceed and return an empty owner.
2020-01-06 15:21:37 +01:00
Lennart Poettering a54654ba70 core: don't check potentially NULL error, it's not gonna work anyway 2020-01-06 15:21:33 +01:00
Lennart Poettering 42837b8134 core: don't check error parameter of get_name_owner_handler()
It's a *return* parameter, not an input parameter. Yes, this is a bit
confusing for method call replies, but we try to use the same message
handler for all incoming messages, hence the parameter. We are supposed
to write any error into it we encounter, if we want, and our caller will
log it, but that's it.
2020-01-06 15:21:30 +01:00
Anita Zhang 2f8c48b605 core,journal: export user units' InvocationID and use as _SYSTEMD_INVOCATION_ID
Write a user unit's invocation ID to /run/user/<uid>/systemd/units/ similar
to how a system unit's invocation ID is written to /run/systemd/units/.

This lets the journal read and add a user unit's invocation ID to the
_SYSTEMD_INVOCATION_ID field of logs instead of the user manager's
invocation ID.

Fixes #12474
2019-12-19 17:42:17 -08:00
Lennart Poettering 8af381679d
Merge pull request #13940 from keur/protect_kernel_logs
Add ProtectKernelLogs to systemd.exec
2019-11-15 16:26:10 +01:00
Kevin Kuehler 8470304018 core: Add ProtectKernelLogs
If seccomp is enabled, load the SYSCALL_FILTER_SET_SYSLOG into the
seccomp filter set. Drop the CAP_SYSLOG capability.
2019-11-11 12:12:02 -08:00
Zbigniew Jędrzejewski-Szmek 084870f9c0 core: rename CGROUP_AUTO/STRICT/CLOSED to CGROUP_DEVICE_POLICY_…
The old names were very generic, and when used without context it wasn't at all
clear that they are about the devices policy.
2019-11-10 23:22:15 +01:00
Yu Watanabe e30e8b5073 tree-wide: drop stat.h or statfs.h when stat-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe 455fa9610c tree-wide: drop string.h when string-util.h or friends are included 2019-11-04 00:30:32 +09:00
Yu Watanabe f5947a5e92 tree-wide: drop missing.h 2019-10-31 17:57:03 +09:00
Franck Bui d336ba9fa6 core: drop 'wants' parameter from unit_add_node_dependency()
Since Wants dependency is no more automagically added to swap and mount units,
this parameter is no more used hence this patch drops it.
2019-10-28 18:51:23 +01:00
Zbigniew Jędrzejewski-Szmek a5648b8094 basic/fs-util: change CHASE_OPEN flag into a separate output parameter
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.

(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
2019-10-24 22:44:24 +09:00
Zbigniew Jędrzejewski-Szmek c3784a7d78 core: simplify unit_load() a bit
Now all unit types define .load. But even if it wasn't defined, we'd need
to call unit_load_fragment_and_dropin() anyway, so this code would not have
worked correctly.

Also, unit_load_fragment_and_dropin() either returns -ENOENT or changes
UNIT_STUB to UNIT_LOADED, so we don't need to repeat this here.
2019-10-11 11:25:04 +02:00
Zbigniew Jędrzejewski-Szmek c362077087 core: turn unit_load_fragment_and_dropin_optional() into a flag
unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional()
are really the same, with one minor difference in behaviour. Let's drop
the second function.

"_optional" in the name suggests that it's the "dropin" part that is optional.
(Which it is, but in this case, we mean the fragment to be optional.)
I think the new version with a flag is easier to understand.
2019-10-11 10:45:33 +02:00
Zbigniew Jędrzejewski-Szmek 86e94d95d0
Merge pull request #13246 from keszybz/add-SystemdOptions-efi-variable
Add efi variable to augment /proc/cmdline
2019-10-03 12:19:44 +02:00
Zbigniew Jędrzejewski-Szmek 4ab1670f3d core: rework how logging level is calculated for kill operations
Setting the log level based on the signal made sense when signals that
were used were fixed. Since we allow signals to be configured, it doesn't
make sense to log at notice level about e.g. a restart or stop operation
just because the signal used is different.

This avoids messages like:
  six.service: Killing process 210356 (sleep) with signal SIGINT.
2019-10-02 14:01:40 +02:00
Zbigniew Jędrzejewski-Szmek a232ebcc2c core: add support for RestartKillSignal= to override signal used for restart jobs
v2:
- if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary
  to preserve backwards compatibility (and keep KillSignal= generally useful).
2019-10-02 14:01:25 +02:00
Zbigniew Jędrzejewski-Szmek 28a2dfe801 core: add helper function to check job status
Since job.h includes unit.h, and unit.h includes job.h, imports need to
be adjusted to make sure unit.h is included first if the helper is used.
2019-10-01 15:05:27 +02:00
Filipe Brandenburger 28b77ab246 log: Add missing "%" in "%m" log format strings
These were clearly intended to be "%m" to display the human readable version
of the error stored in errno.
2019-09-25 09:28:26 +02:00
Zbigniew Jędrzejewski-Szmek 5ac1530eca tree-wide: say "ratelimit" not "rate_limit"
"ratelimit" is a real word, so we don't need to use the other form anywhere.
We had both forms in various places, let's standarize on the shorter and more
correct one.
2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek 7bf081a1e5 pid1: rename start_limit to start_ratelimit
This way it is clearer what the type is. We also have auto_stop_ratelimit adjacent,
and it feels ugly to have a different suffix for those two.
2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek 8c227e7f2b Drop RATELIMIT macros
Using plain structure initialization is both shorter _and_ more clearer.
We get type safety for free.
2019-09-20 16:05:53 +02:00
ypf791 b49e14d5f3 core: coldplug possible nop_job 2019-09-17 13:46:21 +00:00