Commit Graph

245 Commits

Author SHA1 Message Date
Michal Sekletár d9e45bc3ab core: introduce support for cgroup freezer
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.

Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
2020-04-30 19:02:51 +02:00
Lennart Poettering 893f801d67 core: implement generic log control API in PID1 too
It has slightly different setters in place, so it needs some special
love, which is easy enough though.
2020-04-21 17:08:23 +02:00
Lennart Poettering 25141692e9 core: use generic implementations of log level/target bus propertier getters
The setters are slightly different, hence keep them as they are for now.
2020-04-21 17:08:23 +02:00
Lennart Poettering 09f8722801
Merge pull request #15396 from keszybz/dbus-api-docs
D-bus API docs
2020-04-17 23:40:50 +02:00
Zbigniew Jędrzejewski-Szmek dad97f0425 manager: add dbus parameter names 2020-04-16 19:46:40 +02:00
Lennart Poettering 7a8867abfa user-util: rework how we validate user names
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.

The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)

The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…

This effectively liberaralizes a lot what we expect from usernames.

The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.

Fixes: #15149 #15090
2020-04-08 17:11:20 +02:00
Zbigniew Jędrzejewski-Szmek 105a1a36cd tree-wide: fix spelling of lookup and setup verbs
"set up" and "look up" are the verbs, "setup" and "lookup" are the nouns.
2020-03-03 15:02:53 +01:00
Christian Göttsche c0f765cac8 core: move bus-util include out of selinux-access header 2020-02-04 19:26:38 +01:00
Yu Watanabe 60d0a5098b util: uid_t, gid_t, and pid_t must be 32bit
We already have assert_cc(sizeof(uid_t) == sizeof(uint32_t)) or friends
at various places.
2020-02-02 17:13:08 +01:00
Zbigniew Jędrzejewski-Szmek d7ceaf7261 shared/install: provide a nicer error message for invalid WantedBy=/Required= values
$ build/systemctl --user cat badinstall
 # /home/zbyszek/.config/systemd/user/badinstall.service
[Service]
ExecStart=true

[Install]
WantedBy=asdf

$ build/systemctl --user enable badinstall
Failed to enable unit: "asdf" is not a valid unit name.

Fixes #4209.
2019-12-13 19:30:36 +01:00
Zbigniew Jędrzejewski-Szmek 3a0f06c41a core: make TasksMax a partially dynamic property
TasksMax= and DefaultTasksMax= can be specified as percentages. We don't
actually document of what the percentage is relative to, but the implementation
uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max,
and /sys/fs/cgroup/pids.max (when present). When the value is a percentage,
we immediately convert it to an absolute value. If the limit later changes
(which can happen e.g. when systemd-sysctl runs), the absolute value becomes
outdated.

So let's store either the percentage or absolute value, whatever was specified,
and only convert to an absolute value when the value is used. For example, when
starting a unit, the absolute value will be calculated when the cgroup for
the unit is created.

Fixes #13419.
2019-11-14 18:41:54 +01:00
Zbigniew Jędrzejewski-Szmek e9cfc71222
Merge pull request #13635 from fbuihuu/no-aliases-with-enable
man: alias names can't be used with enable command
2019-10-28 09:23:08 +01:00
Zbigniew Jędrzejewski-Szmek a5648b8094 basic/fs-util: change CHASE_OPEN flag into a separate output parameter
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.

(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
2019-10-24 22:44:24 +09:00
Franck Bui 2268367471 shared/install: failing with -ELOOP can be due to the use of an alias in install_error()
-ELOOP can happen also when enabling an alias name (which is admittedly useless
since the unit it belongs to was already enabled) so let's mention this
possibility when reporting the corresponding error.
2019-09-24 19:05:06 +02:00
Luca Boccassi 65224c1d0e core: rename ShutdownWatchdogSec to RebootWatchdogSec
This option is only used on reboot, not on other types of shutdown
modes, so it is misleading.
Keep the old name working for backward compatibility, but remove it
from the documentation.
2019-07-23 20:29:03 +01:00
Luca Boccassi acafd7d8a6 core: add KExecWatchdogSec option
Rather than always enabling the shutdown WD on kexec, which might be
dangerous in case the kernel driver and/or the hardware implementation
does not reset the wd on kexec, add a new timer, disabled by default,
to let users optionally enable the shutdown WD on kexec separately
from the runtime and reboot ones. Advise in the documentation to
also use the runtime WD in conjunction with it.

Fixes: a637d0f9ec ("core: set shutdown watchdog on kexec too")
2019-07-23 20:29:03 +01:00
Lennart Poettering 4d3bac5645 core: expose new clean operation on the bus
This adds CanClean() and Clean() as new methods on the Unit object that
initiate the cleaning operation.
2019-07-11 12:18:51 +02:00
Zbigniew Jędrzejewski-Szmek c1d95b713a pid1: tiny simplification
v2:
- use empty_to_root()
2019-07-10 13:35:17 +02:00
Yu Watanabe 3bf0cb65f5 core: use BUS_DEFINE_PROPERTY_GET() macro at more places 2019-04-14 20:45:31 +09:00
Jan Klötzke dc653bf487 service: handle abort stops with dedicated timeout
When shooting down a service with SIGABRT the user might want to have a
much longer stop timeout than on regular stops/shutdowns. Especially in
the face of short stop timeouts the time might not be sufficient to
write huge core dumps before the service is killed.

This commit adds a dedicated (Default)TimeoutAbortSec= timer that is
used when stopping a service via SIGABRT. In all other cases the
existing TimeoutStopSec= is used. The timer value is unset by default
to skip the special handling and use TimeoutStopSec= for state
'stop-watchdog' to keep the old behaviour.

If the service is in state 'stop-watchdog' and the service should be
stopped explicitly we still go to 'stop-sigterm' and re-apply the usual
TimeoutStopSec= timeout.
2019-04-12 17:32:52 +02:00
Lennart Poettering afcfaa695c core: implement OOMPolicy= and watch cgroups for OOM killings
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).

On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".

All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
2019-04-09 11:17:58 +02:00
Lennart Poettering 42561fc99c core: add a generic helper that forwards per-unit method calls from Manager
Quite often we have a method DoSomethingWithUnit() on the Manager object
that is the same as a function DoSomething() on a Unit object. Let's
shorten things by introducing a common function that forwards the
former to the latter, instead of writing this again and again.
2019-04-02 16:38:20 +02:00
Zbigniew Jędrzejewski-Szmek 237ebf61e2
Merge pull request #12013 from yuwata/fix-switchroot-11997
core: on switching root do not emit device state change based on enumeration results
2019-04-02 16:06:07 +02:00
Lennart Poettering 4659bf6f76 core: add a common function for bus calls that return unit dbus path
Let's shorten the code a bit by using a single function for similar
cases.

No change in behaviour, just some refactoring and shortening.
2019-04-02 05:34:03 +09:00
Lennart Poettering 50cbaba4fe core: add new API for enqueing a job with returning the transaction data 2019-03-27 12:37:37 +01:00
Yu Watanabe 87a19bfedc core: use _cleanup_free_ attribute and free_and_replace() macro in method_switch_root() 2019-03-15 18:59:31 +09:00
Zbigniew Jędrzejewski-Szmek 681bd2c524 meson: generate version tag from git
$ build/systemctl --version
systemd 239-3555-g6178cbb5b5
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN +PCRE2 default-hierarchy=hybrid
$ git tag v240 -m 'v240'
$ ninja -C build
ninja: Entering directory `build'
[76/76] Linking target fuzz-unit-file.
$ build/systemctl --version
systemd 240
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN +PCRE2 default-hierarchy=hybrid

This is very useful during development, because a precise version string is
embedded in the build product and displayed during boot, so we don't have to
guess answers for questions like "did I just boot the latest version or the one
from before?".

This change creates an overhead for "noop" builds. On my laptop, 'ninja -C
build' that does nothing goes from 0.1 to 0.5 s. It would be nice to avoid
this, but I think that <1 s is still acceptable.

Fixes #7183.

PACKAGE_VERSION is renamed to GIT_VERSION, to make it obvious that this is the
more dynamically changing version string.

Why save to a file? It would be easy to generate the version tag using
run_command(), but we want to go through a file so that stuff gets rebuilt when
this file changes. If we just defined an variable in meson, ninja wouldn't know
it needs to rebuild things.
2018-12-21 13:43:20 +01:00
Lennart Poettering 209de5256b core: rename queued_message → pending_reload_message
This field is only used for pending Reload() replies, hence let's rename
it to be more descriptive and precise.

No change in behaviour.
2018-11-13 11:59:06 +01:00
Lennart Poettering c20076a8c1 pid1: add a new AbandonScope() method call on the Manager object
This is the same as Abandon() on the Scope object, but saves clients
from first translating a unit name into a unit object path. This logic
matches how all the other unit methods have counterparts on the Manager
object too (e.g. StopUnit() on the Manager object matching Stop() on the
Unit object), this one was simply forgotten so far.
2018-11-09 17:08:59 +01:00
Lennart Poettering 1ad6e8b302 core: split environment block mantained by PID 1's Manager object in two
This splits the "environment" field of Manager into two:
transient_environment and client_environment. The former is generated
from configuration file, kernel cmdline, environment generators. The
latter is the one the user can control with "systemctl set-environment"
and similar.

Both sets are merged transparently whenever needed. Separating the two
sets has the benefit that we can safely flush out the former while
keeping the latter during daemon reload cycles, so that env var settings
from env generators or configuration files do not accumulate, but
dynamic API changes are kept around.

Note that this change is not entirely transparent to users: if the user
first uses "set-environment" to override a transient variable, and then
uses "unset-environment" to unset it again things will revert to the
original transient variable now, while previously the variable was fully
removed. This change in behaviour should not matter too much though I
figure.

Fixes: #9972
2018-10-31 18:00:53 +01:00
Lennart Poettering af41e5086d core: rename ManagerExitCode → ManagerObjective
"ExitCode" is a bit of a misnomer in two ways: it suggests this was
about the "exit code" concept that exit()/waitid() deal with, but really
isn't. Moreover, it's not event just about exiting either, but more
often about reloading/reexecing or rebooting. Let's hence pick a new
name for this that is a bit more correct.

I initially thought about naming this the "state", but that'd be a
misnomer too, as the value really encodes a "goal" more than a current
state. Also we already have the externally visible ManagerState.

No actual changes in behaviour, just the rename.
2018-10-09 19:43:43 +02:00
Yu Watanabe 4ae25393f3 tree-wide: shorten error logging a bit
Continuation of 4027f96aa0.
2018-08-07 10:14:33 +09:00
Lennart Poettering ae0db6f132
Merge pull request #9687 from yuwata/rfe-9662
analyze: several systemd-analyze plot improvements
2018-07-24 09:43:57 +02:00
Yu Watanabe 665a9774e0 core: expose initrd related timestamps on bus 2018-07-24 03:47:07 +09:00
Yu Watanabe 7a293242e0 core: normalize ShowStatus 2018-07-23 21:55:26 +09:00
Chris Lamb 3fe910794b Correct a number of trivial typos. 2018-06-18 22:44:44 +02:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Franck Bui bda7d78ba1 pid1: preserve current value of log target across re-{load,execution}
To make debugging easier, this patches allows one to change the log target and
do reload/reexec without modifying configuration permanently, which makes
debugging easier.

Indeed if one changed the log target at runtime (via the bus or via signals),
the change was lost on the next reload/reexecution.

In order to restore back the default value (set via system.conf, environment
variables or any other means ), the empty string in the "LogTarget" property is
now supported as well as sending SIGTRMIN+26 signal.
2018-06-13 18:52:27 +02:00
Franck Bui a6ecbf836c pid1: preserve current value of log level across re-{load,execution}
To make debugging easier, this patches allows one to change the log level and
do reload/reexec without modifying configuration permanently, which makes
debugging easier.

Indeed if one changed the log max level at runtime (via the bus or via
signals), the change was lost on the next daemon reload/reexecution.

In order to restore the original value back (set via system.conf, environment
variables or any other means), the empty string in the "LogLevel" property is
now supported as well as sending SIGRTMIN+23 signal.
2018-06-13 18:52:27 +02:00
Lennart Poettering e49da001c4 core: rename (and modernize) bus_unit_check_load_state() → bus_unit_validate_load_state()
Let's use a switch() statement, cover more cases with pretty messages.
Also let's rename it to "validate", as that's more specific that
"check", as it implies checking for a "valid"/"good" state, which is
what this function does.
2018-06-11 12:53:12 +02:00
Lennart Poettering d58ad743f9 os-util: add helpers for finding /etc/os-release
Place this new helpers in a new source file os-util.[ch], and move the
existing and related call path_is_os_tree() to it as well.
2018-05-24 17:01:57 +02:00
Zbigniew Jędrzejewski-Szmek 52d2566ac7 pid1: fix ShowStatus property
It is not const, because a) systemd can bump it on its own if
errors occur, and b) the user can change it using signals.
Also it's not boolean.

$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ShowStatus
b true
$ sudo kill -SIGRTMIN+21 1
$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ShowStatus
b false

Fixes #4503.
2018-05-22 16:14:20 +02:00
Yu Watanabe 92c23c5a70 core: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-15 23:11:16 +09:00
Yu Watanabe 3ff52e8f52 dbus-manager: introduce property_get_{hashmap,set}_size() 2018-05-13 12:21:17 +09:00
Yu Watanabe 23c9a63a98 dbus-manager: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:06 +09:00
David Tardon c0a1bfacfe systemd-analyze: make dump work for large # of units
If there is a large number of units, the size of the generated dump
string can overstep DBus message size limit. So let's pass that string
via a fd.
2018-05-11 08:11:02 -07:00
Yu Watanabe 79a603758d core: send NULL instead of empty string 2018-05-11 01:22:49 +09:00
Lennart Poettering 9fc0345551
Merge pull request #8815 from poettering/get-unit-by-cgroup
add new GetUnitByControlGroup API
2018-05-02 10:51:48 +02:00
Lennart Poettering da6053d0a7 tree-wide: be more careful with the type of array sizes
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.

Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.

So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.

This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:

1. strv_length()' return type becomes size_t

2. the unit file changes array size becomes size_t

3. DNS answer and query array sizes become size_t

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
2018-04-27 14:29:06 +02:00