
498 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 1d3fe304fd Use sd_event_source_disable_unref() 2019-05-10 16:55:37 +02:00
Ben Boeckel 5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Jan Klötzke 99b43caf26 core: immediately trigger watchdog action on WATCHDOG=trigger
A service might be able to detect errors by itself that may require the
system to take the same action as if the service locked up. Add a
WATCHDOG=trigger state change notification to sd_notify() to let the
service manager know about the self-detected misery and instantly
trigger the configured watchdog behaviour.
2019-04-24 10:17:10 +02:00
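A service would normally send this notification with libsystemd's `sd_notify(0, "WATCHDOG=trigger")`. To stay self-contained, the sketch below reimplements the underlying datagram protocol by hand, under these assumptions: the manager's socket path comes from `$NOTIFY_SOCKET`, and a leading `@` denotes an abstract-namespace socket. `notify_send()` is a hypothetical helper, not a libsystemd function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: send one notification datagram (e.g. "WATCHDOG=trigger")
 * to the socket named in $NOTIFY_SOCKET.
 * Returns 1 on success, 0 if no socket is configured, negative errno on error. */
int notify_send(const char *state) {
    const char *path = getenv("NOTIFY_SOCKET");
    if (!path || !*path)
        return 0;                       /* not running under a service manager */

    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    size_t len = strlen(path);
    if (len >= sizeof(sa.sun_path))
        return -EINVAL;
    if (path[0] != '/' && path[0] != '@')
        return -EAFNOSUPPORT;
    memcpy(sa.sun_path, path, len);
    if (sa.sun_path[0] == '@')
        sa.sun_path[0] = '\0';          /* abstract namespace socket */

    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -errno;

    socklen_t salen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
    ssize_t n = sendto(fd, state, strlen(state), MSG_NOSIGNAL,
                       (struct sockaddr *)&sa, salen);
    int r = n < 0 ? -errno : 1;
    close(fd);
    return r;
}
```

A service that detects an unrecoverable internal error would call `notify_send("WATCHDOG=trigger")`. It would then let the manager apply whatever watchdog action the unit configures.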
Yu Watanabe dcab85be18 core: do not show TimeoutStopSec= in dump message if it is not set 2019-04-14 20:47:13 +09:00
Jan Klötzke dc653bf487 service: handle abort stops with dedicated timeout
When shooting down a service with SIGABRT the user might want to have a
much longer stop timeout than on regular stops/shutdowns. Especially in
the face of short stop timeouts the time might not be sufficient to
write huge core dumps before the service is killed.

This commit adds a dedicated (Default)TimeoutAbortSec= timer that is
used when stopping a service via SIGABRT. In all other cases the
existing TimeoutStopSec= is used. The timer value is unset by default
to skip the special handling and use TimeoutStopSec= for state
'stop-watchdog' to keep the old behaviour.

If the service is in state 'stop-watchdog' and the service should be
stopped explicitly we still go to 'stop-sigterm' and re-apply the usual
TimeoutStopSec= timeout.
2019-04-12 17:32:52 +02:00
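The timer selection described above boils down to a small decision. The following is a minimal sketch, assuming a separate "is set" flag models the unset-by-default TimeoutAbortSec=. The enum and function names are hypothetical, not systemd's.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical subset of the stop phases mentioned in the commit. */
typedef enum {
    STATE_STOP_SIGTERM,    /* regular stop */
    STATE_STOP_WATCHDOG,   /* stop via SIGABRT after the watchdog fired */
    STATE_STOP_SIGKILL,
} StopState;

/* Pick the timeout for a stop phase: the abort timeout only applies to the
 * SIGABRT phase, and only when it was configured; otherwise the old
 * TimeoutStopSec= behaviour is kept. */
usec_t stop_timeout_for(StopState state, usec_t timeout_stop,
                        bool abort_set, usec_t timeout_abort) {
    if (state == STATE_STOP_WATCHDOG && abort_set)
        return timeout_abort;
    return timeout_stop;
}
```

An explicit stop request during `stop-watchdog` moves the service to `stop-sigterm`, so it falls back to the TimeoutStopSec= branch.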
Lennart Poettering afcfaa695c core: implement OOMPolicy= and watch cgroups for OOM killings
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).

On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".

All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
2019-04-09 11:17:58 +02:00
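The cgroup v2 `memory.events` file is a list of `key value` lines such as `oom 0` and `oom_kill 0`. The sketch below assumes an OOM kill is detected by seeing the `oom_kill` counter rise between two reads. It then combines that with the three policies named above. All function names are hypothetical, and reading the file itself (and enabling `memory.oom.group` for "kill") is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { OOM_CONTINUE, OOM_STOP, OOM_KILL } OOMPolicy;

/* Return the "oom_kill" counter from the text of a memory.events file,
 * or -1 if the key is missing or malformed. */
int64_t memory_events_oom_kill(const char *text) {
    static const char key[] = "oom_kill ";
    const char *p = text;
    while (p && *p) {
        const char *eol = strchr(p, '\n');
        size_t linelen = eol ? (size_t)(eol - p) : strlen(p);
        if (linelen > sizeof(key) - 1 && strncmp(p, key, sizeof(key) - 1) == 0) {
            char *end;
            unsigned long long v = strtoull(p + sizeof(key) - 1, &end, 10);
            if (end == p + sizeof(key) - 1)
                return -1;              /* no digits after the key */
            return (int64_t)v;
        }
        p = eol ? eol + 1 : NULL;       /* advance to the next line */
    }
    return -1;
}

/* The unit gets the "oom-kill" result only if a new OOM kill was observed
 * and the policy is not "continue". */
bool should_fail_on_oom(OOMPolicy policy, int64_t prev, int64_t cur) {
    return cur > prev && policy != OOM_CONTINUE;
}
```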
Lennart Poettering a5b5aece01 service: beautify debug log message a bit 2019-04-09 11:17:58 +02:00
Zbigniew Jędrzejewski-Szmek c6335c3b51
Merge pull request #12115 from poettering/verbose-job-enqueue
add "systemctl --show-transaction start" as a more verbose "systemctl start" that shows enqueued jobs
2019-03-28 11:04:26 +01:00
Lennart Poettering 50cbaba4fe core: add new API for enqueuing a job and returning the transaction data 2019-03-27 12:37:37 +01:00
Zbigniew Jędrzejewski-Szmek ca78ad1de9 headers: remove unneeded includes from util.h
This means we need to include many more headers in various files that simply
included util.h before, but it seems cleaner to do it this way.
2019-03-27 11:53:12 +01:00
Lennart Poettering 6f765baf23 core: rework how we reset the TTY after use by a service
This makes two changes:

1. Instead of resetting the configured service TTY each time after a
   process exited, let's do so only when the service goes back to "dead"
   state. This should be preferable in case the started processes leave
   background child processes around that still reference the TTY.

2. chmod() and chown() the TTY at the same time. This should make it
   safe to run "systemd-run -p DynamicUser=1 -p StandardInput=tty -p
   TTYPath=/dev/tty8 /bin/bash" without leaving a TTY owned by a dynamic
   user around.
2019-03-20 21:28:02 +01:00
Franck Bui 846a07b505 core: only watch processes when it's really necessary
If we know that the main PID is our child, then it's unnecessary to watch all
other processes of a unit, since in this case we will get SIGCHLD when the main
process exits and will act upon it accordingly.

So let's watch all processes only if the main process is not our child, since in
this case we need to detect when the cgroup becomes empty in order to
figure out when the service becomes dead. This is only needed by cgroupv1.
2019-03-20 10:51:49 +01:00
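The rule above can be reduced to one predicate. This is a sketch under one assumption: "only needed by cgroupv1" is read as applying to the whole watch-all fallback. The function and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical: should the manager watch every process of the unit? */
bool need_watch_all_pids(bool main_pid_is_our_child, bool cgroup_v1) {
    /* A child main PID delivers SIGCHLD on exit, so nothing else is needed. */
    if (main_pid_is_our_child)
        return false;
    /* Otherwise per-PID watching is the fallback for noticing an empty
     * cgroup, which (per the commit) only cgroupv1 requires. */
    return cgroup_v1;
}
```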
Franck Bui f75f613d25 core: reduce the number of stale PIDs from the watched processes list when possible
Some PIDs can remain in the watched list even though their processes exited
a long time ago. This can easily happen if the main process of a forking
service manages to spawn a child before the control process exits, for example.

However, when a PID is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the PID should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stale PIDs from
the watched process list.

If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
2019-03-20 10:51:49 +01:00
Franck Bui 4d05154600 process-util: introduce pid_is_my_child() helper
No functional changes.
2019-03-20 10:51:49 +01:00
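A minimal sketch of what a "is this PID my child?" check can look like on Linux follows. It assumes the parent PID is read from `/proc/<pid>/stat` and compared with `getpid()`. This shows the idea only and is not the helper's actual code. Both function names here are illustrative.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the parent PID of `pid` from /proc/<pid>/stat; -1 on failure. */
static pid_t proc_ppid(pid_t pid) {
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Field 2 (comm) may contain spaces or ')', so parse after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    char state;
    int ppid;
    if (sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Illustrative pid_is_my_child(): true if `pid`'s parent is this process. */
bool pid_is_my_child(pid_t pid) {
    if (pid <= 1)
        return false;
    return proc_ppid(pid) == getpid();
}
```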
Lennart Poettering 97a3f4ee05 core: rename unit_{start_limit|condition|assert}_test() to unit_test_xyz()
Just some renaming, no change in behaviour.

Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
2019-03-18 16:06:36 +01:00
Anita Zhang e51237253e core: consider non-SERVICE_EXEC_START commands for EXIT_CLEAN_COMMAND
When there are multiple ExecStop= statements, the next command would continue
to run even after TimeoutStopSec= is up and SIGTERM is sent. This is because,
unless Type= is oneshot, the exit code/status would evaluate to SERVICE_SUCCESS
in service_sigchld_event()'s call to is_clean_exit(). This success means that the
following commands continue running until the end of the list
is reached, or another timeout is hit and SIGKILL is sent.

Since long running processes should not be invoked in non-SERVICE_EXEC_START
commands, consider them for EXIT_CLEAN_COMMAND instead of EXIT_CLEAN_DAEMON.
Passing EXIT_CLEAN_COMMAND to is_clean_exit() evaluates the SIGTERM exit
code/status to failure and will stop execution after the first timeout is hit.

Fixes #11431
2019-02-26 10:18:39 +01:00
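The difference between the two modes can be sketched as follows. This is a simplified model, not systemd's `is_clean_exit()`, which also consults SuccessExitStatus=. The commit only names SIGTERM, so treating SIGHUP, SIGINT, and SIGPIPE as clean for daemons is an assumption. `clean_mode_for()` and the command-type enum are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>

typedef enum { EXIT_CLEAN_DAEMON, EXIT_CLEAN_COMMAND } ExitClean;
typedef enum { EXEC_START_PRE, EXEC_START, EXEC_START_POST,
               EXEC_RELOAD, EXEC_STOP, EXEC_STOP_POST } ExecCommandType;

/* Simplified clean-exit test. `code` is a siginfo si_code such as CLD_EXITED. */
bool is_clean_exit_sketch(int code, int status, ExitClean clean) {
    if (code == CLD_EXITED)
        return status == 0;
    if (code == CLD_KILLED)
        /* Daemons routinely die from these on a requested stop; one-shot
         * commands killed by a signal (e.g. on timeout) count as failures. */
        return clean == EXIT_CLEAN_DAEMON &&
               (status == SIGHUP || status == SIGINT ||
                status == SIGTERM || status == SIGPIPE);
    return false;
}

/* Only a non-oneshot ExecStart= is judged as a daemon. */
ExitClean clean_mode_for(ExecCommandType t, bool oneshot) {
    return (t == EXEC_START && !oneshot) ? EXIT_CLEAN_DAEMON : EXIT_CLEAN_COMMAND;
}
```

Under this model, an ExecStop= command killed by SIGTERM on timeout is judged with EXIT_CLEAN_COMMAND and is treated as a failure. As a result, the remaining commands in the list are not run.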
Lennart Poettering 5bcffb4b54
Merge pull request #11457 from grooverdan/sendsigkill_no
service: killmode=cgroup|mixed, SendSIGKILL=no services are not multiprocess
2019-02-18 13:41:52 +01:00
Lennart Poettering dcf3c3c3d9 core: export $PIDFILE env var for services, derived from PIDFile= 2019-02-15 11:32:19 +01:00
Daniel Black c53d2d54bd service: make killmode=cgroup|mixed, SendSIGKILL=no services singletons
KillMode=mixed and control-group are used to indicate that all
processes should be killed off. SendSIGKILL= is used for services
that require a clean shutdown. These are typically database
services where a SIGKILLed process would result in a lengthy
recovery and whose shutdown or startup time is quite variable
(so Timeout settings aren't of use).

Here we take these two factors and refuse to start a service if
there are existing processes within its control group. Databases
generally have some protection against multiple instances
running, but let's not stress the rigor of these. Also, the ExecStartPre=
parts of the service aren't as rigorously written to protect
against multiple use.

closes #8630
2019-01-29 15:35:59 +11:00
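The start check combines the kill settings with the cgroup's current membership. In the sketch below, membership is read from the text of the cgroup's `cgroup.procs` file (one PID per line). The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { KILL_CONTROL_GROUP, KILL_PROCESS, KILL_MIXED, KILL_NONE } KillMode;

/* A service is treated as a singleton when it kills its whole cgroup
 * but SendSIGKILL=no forbids the final SIGKILL. */
bool service_is_singleton(KillMode mode, bool send_sigkill) {
    return (mode == KILL_CONTROL_GROUP || mode == KILL_MIXED) && !send_sigkill;
}

/* Count non-empty lines, i.e. PIDs, in the text of a cgroup.procs file. */
size_t count_procs(const char *cgroup_procs) {
    size_t n = 0;
    bool in_line = false;
    for (const char *p = cgroup_procs; *p; p++) {
        if (*p == '\n') {
            if (in_line)
                n++;
            in_line = false;
        } else {
            in_line = true;
        }
    }
    return n + (in_line ? 1 : 0);      /* last line may lack a newline */
}

/* Refuse to start a singleton service while leftover processes remain. */
bool may_start(KillMode mode, bool send_sigkill, const char *cgroup_procs) {
    return !service_is_singleton(mode, send_sigkill) || count_procs(cgroup_procs) == 0;
}
```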
Jonathon Kowalski 03ff2dc71e Change job mode of manager triggered restarts to JOB_REPLACE
Fixes: #11305
Fixes: #3260
Related: #11456

So, here's what happens in the described scenario in #11305. A unit goes
down, and that triggers stop jobs for the other two units as they were
bound to it. Now, the timer for manager triggered restarts kicks in and
schedules a restart job with the JOB_FAIL job mode. This means there is
a stop job installed on those units, and now due to them being bound to
us they also get a restart job enqueued. This, however, is a conflict, as
neither stop can merge into restart, nor restart into stop. However,
restart should be able to replace stop in any case. If the stop
procedure is ongoing, it can cancel the stop job, install itself, and
then after reaching dead finish and convert itself to a start job.
However, if we increase the timer, then it can always take those units
from inactive -> auto-restart.

We change the job mode to JOB_REPLACE so the restart job cancels the
stop job and installs itself.

Also, the original bug could be worked around by bumping RestartSec= to
avoid the conflict.

This doesn't seem to be something that is going to break users. That is
because for those who already had it working, there must never have been
conflicting jobs, as that would result in a destructive transaction by
virtue of the job mode used.

After this change, the test case is able to work nicely without issues.
2019-01-18 13:50:52 +01:00
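The conflict described above can be modelled in a few lines. This is a deliberately simplified sketch: it assumes only identical job types and start/restart pairs can merge. It also assumes JOB_FAIL rejects an unmergeable request, while JOB_REPLACE cancels the installed job. The real merge table and transaction logic are richer, and all names besides JOB_FAIL and JOB_REPLACE are hypothetical.

```c
#include <stdbool.h>

typedef enum { JOB_START, JOB_STOP, JOB_RESTART } JobType;
typedef enum { JOB_FAIL, JOB_REPLACE } JobMode;
typedef enum { ENQ_MERGED, ENQ_REPLACED, ENQ_CONFLICT } EnqueueResult;

/* Simplified merge rule: identical types merge, and start merges into
 * restart; stop and restart never merge in either direction. */
static bool jobs_mergeable(JobType a, JobType b) {
    if (a == b)
        return true;
    return (a == JOB_START && b == JOB_RESTART) ||
           (a == JOB_RESTART && b == JOB_START);
}

/* Outcome of requesting `requested` while `installed` is pending. */
EnqueueResult enqueue(JobType installed, JobType requested, JobMode mode) {
    if (jobs_mergeable(installed, requested))
        return ENQ_MERGED;
    return mode == JOB_REPLACE ? ENQ_REPLACED : ENQ_CONFLICT;
}
```

With a stop job pending, the automatic restart conflicts under JOB_FAIL. Under JOB_REPLACE, it displaces the stop job instead.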
Alexey Bogdanenko 8f9f3cb724 core: fix KeyringMode for user services
KeyringMode option is useful for user services. Also, documentation for the
option suggests that the option applies to user services. However, setting the
option to any of its allowed values has no effect.

This commit fixes that and removes EXEC_NEW_KEYRING flag. The flag is no longer
necessary: instead of checking if the flag is set we can check if keyring_mode
is not equal to EXEC_KEYRING_INHERIT.
2018-12-17 16:56:36 +01:00
Lennart Poettering 2327f95499
Merge pull request #10984 from fbuihuu/tmpfiles-be-more-explicit-with-unsafe-transition
tmpfiles: be more explicit when an unsafe path transition is met
2018-12-10 12:31:56 +01:00
Franck Bui 36c97decbe fs-util: make chase_symlinks() return -ENOLINK when unsafe transitions are met
We previously returned -EPERM but it can be returned for various other reasons
too.

Let's use -ENOLINK instead, as this value shouldn't currently be in use. This
allows users of CHASE_SAFE to detect without any ambiguities when unsafe
transitions are encountered by chase_symlinks().

All current users of CHASE_SAFE that explicitly reacted on -EPERM have been
converted to react on -ENOLINK.
2018-12-10 09:18:27 +01:00
Lennart Poettering 6fcbec6f9b core: whenever we change state of a unit, force out PropertiesChanged bus signal
This allows clients to follow our internal state changes safely.

Previously, quick state changes (for example, when we restart a unit due
to Restart= after it quickly transitioned through DEAD/FAILED states)
would be coalesced into one bus signal event; with this change there's
the guarantee that all state changes after the unit was announced once
are reflected on the bus.

Note that we do this kind of guaranteed flushing only for unit state
changes, not for other unit property changes, where clients still have
to expect coalescing. This is because the unit state is a very
important, high-level concept.

Fixes: #10185
2018-12-01 12:53:26 +01:00
Zbigniew Jędrzejewski-Szmek 8b4e51a60e
Merge pull request #10797 from poettering/run-generator
add new "systemd-run-generator" for running arbitrary commands from the kernel command line as system services using the "systemd.run=" kernel command line switch
2018-11-28 22:40:55 +01:00
Lennart Poettering 7af67e9a8b core: allow setting the exit status when using SuccessAction=/FailureAction=exit in units
This adds SuccessActionExitStatus= and FailureActionExitStatus= that may
be used to configure the exit status to propagate when
SuccessAction=exit or FailureAction=exit is used.

When not specified let's also propagate the exit status of the main
process we fork off for the unit.
2018-11-27 09:44:40 +01:00
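The precedence described above can be sketched as follows, assuming a negative value means the *ActionExitStatus= setting is unset. Returning -1 when there is nothing to propagate is also an assumption. The function name is hypothetical.

```c
#include <stdbool.h>

/* Exit status to propagate when SuccessAction=exit / FailureAction=exit fires.
 * `configured` < 0 means the corresponding *ActionExitStatus= is unset. */
int action_exit_status(int configured, bool main_exited, int main_status) {
    if (configured >= 0)
        return configured;              /* explicit setting wins */
    if (main_exited)
        return main_status;             /* else pass on the main process's status */
    return -1;                          /* nothing to propagate */
}
```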
Lennart Poettering 78f93209fc core: when Delegate=yes is set for a unit, run ExecStartPre= and friends in a subcgroup of the unit
Otherwise we might conflict with the "no-processes-in-inner-cgroup" rule
of cgroupsv2. Consider nspawn starting up and initializing its cgroup
hierarchy with "supervisor/" and "payload/" as subcgroup, with itself
moved into the former and the payload into the latter. Now, if an
ExecStartPre= is run right after, it cannot be placed in the main cgroup,
because that is now an inner cgroup with populated children.

Hence, let's run these helpers in another sub-cgroup .control/ below it.

This is somewhat ugly since it weakens the clear separation of
ownership, but given that this is an explicit contract and a double opt-in, it should be acceptable.

Fixes: #10482
2018-11-26 18:43:23 +01:00
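The placement rule amounts to choosing between the unit's cgroup and its `.control` child. This is a sketch with a hypothetical function name. `is_control_cmd` stands for ExecStartPre= "and friends", as opposed to the main process.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the cgroup path a process of the unit is placed in. With Delegate=yes,
 * control commands go to "<unit cgroup>/.control" so they never join an inner
 * cgroup that has populated children. Returns 0, or -1 if `out` is too small. */
int control_cgroup_path(const char *unit_cgroup, bool delegate, bool is_control_cmd,
                        char *out, size_t outlen) {
    int n = (delegate && is_control_cmd)
        ? snprintf(out, outlen, "%s/.control", unit_cgroup)
        : snprintf(out, outlen, "%s", unit_cgroup);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```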
Zbigniew Jędrzejewski-Szmek aac99f303a core: introduce a helper function to wrap unit_log_{success,failure}
It's inline so that the compiler can easily optimize away the call to get the
status string.
2018-11-16 19:47:07 +01:00
Lennart Poettering 523ee2d414 core: log a recognizable message when a unit succeeds, too
We already are doing it on failure, let's do it on success, too.

Fixes: #10265
2018-11-16 15:22:48 +01:00
Lennart Poettering 91bbd9b796 core: make log messages about unit processes exiting recognizable 2018-11-16 15:22:48 +01:00
Lennart Poettering 7c047d7443 core: make log messages about units entering a 'failed' state recognizable
Let's make this recognizable, and carry result information in a
structured fashion.
2018-11-16 15:22:48 +01:00
Yu Watanabe b9c04eafb8 core: introduce exec_params_clear()
Follow-up for 1ad6e8b302.

Fixes #10677.
2018-11-08 09:36:37 +01:00
Lennart Poettering 1ad6e8b302 core: split environment block maintained by PID 1's Manager object in two
This splits the "environment" field of Manager into two:
transient_environment and client_environment. The former is generated
from configuration files, the kernel cmdline, and environment generators. The
latter is the one the user can control with "systemctl set-environment"
and similar.

Both sets are merged transparently whenever needed. Separating the two
sets has the benefit that we can safely flush out the former while
keeping the latter during daemon reload cycles, so that env var settings
from env generators or configuration files do not accumulate, but
dynamic API changes are kept around.

Note that this change is not entirely transparent to users: if the user
first uses "set-environment" to override a transient variable, and then
uses "unset-environment" to unset it again, things will revert to the
original transient variable now, while previously the variable was fully
removed. This change in behaviour should not matter too much, though, I
figure.

Fixes: #9972
2018-10-31 18:00:53 +01:00
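The lookup order, and the "revert on unset" consequence noted above, can be sketched as follows. This assumes that a variable in the client block shadows the transient one on merge. The helpers are hypothetical and operate on NULL-terminated `NAME=value` arrays.

```c
#include <stddef.h>
#include <string.h>

/* Find NAME in a NULL-terminated "NAME=value" block; return the value or NULL. */
static const char *env_lookup(char *const *env, const char *name) {
    size_t len = strlen(name);
    for (; env && *env; env++)
        if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
            return *env + len + 1;
    return NULL;
}

/* Effective value: the client block ("systemctl set-environment") wins over
 * the transient block (config files, kernel cmdline, generators). */
const char *effective_env(char *const *transient, char *const *client, const char *name) {
    const char *v = env_lookup(client, name);
    return v ? v : env_lookup(transient, name);
}
```

Removing a variable from the client block alone makes the transient value visible again. The transient block can be regenerated on every reload without losing client changes.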
Lennart Poettering bea1a01310 strv: wrap strv_new() in a macro so that NULL sentinel is implicit 2018-10-31 18:00:52 +01:00
Lennart Poettering aa8c4bbf6a service: when starting a service make a copy of the watchdog timeout and use that
When we start a service process we pass the selected watchdog timeout to
it with the $WATCHDOG_USEC environment variable. If the unit file is
reconfigured later, we need to make sure to continue to honour the
original timeout, i.e. what $WATCHDOG_USEC was set to, otherwise we'll
expect the ping at a different time than the service process is sending it
to us.

Hence, whenever we start a unit, save the watchdog timeout, and stick to
that for everything we do.

Fixes: #9467
2018-10-26 13:00:04 +02:00
Lennart Poettering 34b3f625f2 service: continue to use the overridden timeout when forking off again
Let's make sure we always use the right watchdog timeout: when a service
has overwritten it, then stick to it, also for follow-up processes of
the same service.
2018-10-26 13:00:04 +02:00
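The two watchdog commits above describe a small piece of state. A sketch under these assumptions: the configured value is snapshotted at unit start, and a service-sent override replaces it until the next start. Every process forked for the service is then told the effective value. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical per-service watchdog bookkeeping. */
typedef struct {
    usec_t original_usec;   /* WatchdogSec= as it was when the unit started */
    bool override_enable;   /* the service sent its own WATCHDOG_USEC= */
    usec_t override_usec;
} WatchdogState;

/* On unit start: copy the configured timeout so later unit file changes
 * do not alter what the running process was told. */
void watchdog_start(WatchdogState *w, usec_t configured) {
    w->original_usec = configured;
    w->override_enable = false;
    w->override_usec = 0;
}

void watchdog_override(WatchdogState *w, usec_t usec) {
    w->override_enable = true;
    w->override_usec = usec;
}

/* Value to expect pings against, and to pass on to follow-up processes. */
usec_t watchdog_effective_usec(const WatchdogState *w) {
    return w->override_enable ? w->override_usec : w->original_usec;
}
```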
Lennart Poettering 95d0d8ed0a service: rename service_reset_watchdog_timeout() → service_override_watchdog_timeout()
This is what the function really does, hence name it that way.
2018-10-26 13:00:04 +02:00
Lennart Poettering ec35a7f6b0 service: rework service_extend_timeout()
Let's unify common code: let's extend the watchdog timeout and the
regular timeout with the same helper function.
2018-10-26 13:00:04 +02:00
Lennart Poettering 9fb1cdb480 service: explicitly stop the watchdog when we shall not use it
This is useful so that WATCHDOG_USEC=0 sent from a process does the
right thing and turns off the watchdog logic.
2018-10-26 12:53:17 +02:00
Lennart Poettering d68c645bd3 core: rework serialization
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.

In order to implement this, all serialization functions are moved to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.

While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end).
2018-10-26 10:52:41 +02:00
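The line-length guard can be sketched as a writer that refuses items a bounded reader could not read back. LONG_LINE_MAX is defined locally here with an assumed value of 1 MiB. `serialize_item_sketch()` and its warning text are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define LONG_LINE_MAX (1U * 1024U * 1024U)   /* assumed limit for this sketch */

/* Write "key=value\n" unless the line would exceed LONG_LINE_MAX, in which case
 * the item is skipped with a warning. Returns 0, -EINVAL (skipped), or -EIO. */
int serialize_item_sketch(FILE *f, const char *key, const char *value) {
    size_t len = strlen(key) + 1 + strlen(value);   /* excluding the newline */
    if (len > LONG_LINE_MAX) {
        fprintf(stderr, "warning: not serializing overly long item '%s'\n", key);
        return -EINVAL;
    }
    if (fprintf(f, "%s=%s\n", key, value) < 0)
        return -EIO;
    return 0;
}
```

Skipping rather than truncating keeps every line that is written parseable. The loss of one oversized item is logged instead of silently corrupting the item.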
Lennart Poettering 3eac1bcae9 core: enforce a limit on STATUS= texts recvd from services
Let's better be safe than sorry, and put a limit on what we receive.
2018-10-26 10:40:01 +02:00
Yu Watanabe 5e1ee764e1 core: include error cause in log message 2018-10-20 01:40:42 +09:00
Lennart Poettering 67f5d31b45
Merge pull request #10440 from poettering/fflush-and-check-some-more
use fflush_and_check() and free_and_replace() where we can
2018-10-17 22:54:34 +02:00
Lennart Poettering a42984dbc7
Merge pull request #10428 from keszybz/failure-actions
Implement manager status changes using SuccessAction=
2018-10-17 21:29:10 +02:00
Lennart Poettering efa3f34e84 service: use free_and_replace() where we can 2018-10-17 21:24:04 +02:00
Zbigniew Jędrzejewski-Szmek 3f00d379fa core: allow services with no commands but SuccessAction set 2018-10-17 19:31:50 +02:00
Zbigniew Jędrzejewski-Szmek ef5ae8e713 core: consider service with no start command immediately started
The service would always be in state == SERVICE_INACTIVE, but it needs to go
through state == SERVICE_START so that SuccessAction/FailureAction are executed.
2018-10-17 19:28:16 +02:00
Lennart Poettering cdc2af3e15 core: log about unit_watch_pid() failing
CID 1237509
2018-10-16 13:52:21 +02:00
Lennart Poettering 334415b16e
Merge pull request #10094 from keszybz/wants-loading
Fix bogus fragment paths in units in .wants/.requires
2018-10-05 17:36:31 +02:00
Anita Zhang c87700a133 Make Watchdog Signal Configurable
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.

Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.

Closes #8658
2018-09-26 16:14:29 +02:00
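Making the signal configurable means the manager sends a per-unit signal instead of a hard-coded SIGABRT. A minimal sketch follows, assuming 0 means "not configured". The function names are hypothetical.

```c
#include <signal.h>
#include <sys/types.h>

/* Signal to use when the watchdog fires; SIGABRT remains the default. */
int watchdog_signal_or_default(int configured) {
    return configured > 0 ? configured : SIGABRT;
}

/* Send the watchdog signal to the service's main process. */
int watchdog_kill(pid_t pid, int configured_signal) {
    return kill(pid, watchdog_signal_or_default(configured_signal));
}
```

A unit that does not want core dumps on watchdog timeouts could configure, for example, SIGTERM here instead of the core-dumping SIGABRT.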