core: add EXTEND_TIMEOUT_USEC={usec} - prevent timeouts in startup/runtime/shutdown (#7214)

With Type=notify services, EXTEND_TIMEOUT_USEC= messages will delay any startup/
runtime/shutdown timeouts.

A service that hasn't timed out, i.e, start time < TimeStartSec,
runtime < RuntimeMaxSec and stop time < TimeoutStopSec, may by sending
EXTEND_TIMEOUT_USEC=, allow the service to continue beyond the limit for
the execution phase (i.e TimeStartSec, RunTimeMaxSec and TimeoutStopSec).

EXTEND_TIMEOUT_USEC= must continue to be sent (in the same way as
WATCHDOG=1) within the time interval specified to continue to reprevent
the timeout from occuring.

Watchdog timeouts are also extended if a EXTEND_TIMEOUT_USEC is greater
than the remaining time on the watchdog counter.

Fixes #5868.
This commit is contained in:
Daniel Black 2017-12-14 22:17:43 +11:00 committed by Lennart Poettering
parent 966c04cf01
commit a327431bd1
16 changed files with 402 additions and 18 deletions

40
NEWS
View File

@ -180,26 +180,32 @@ CHANGES WITH 236 in spe:
registered by dropping in a .dnssd file in /etc/systemd/dnssd/ (or
the same dir below /run, /usr/lib), or through its D-Bus API.
* The sd_notify() protocol can now with EXTEND_TIMEOUT_USEC=microsecond
extend the effective start, runtime, and stop time. The service must
continue to send EXTEND_TIMEOUT_USEC within the period specified to
prevent the service manager from making the service as timedout.
Contributions from: Alan Jenkins, Alan Robertson, Alessandro Ghedini,
Andrew Jeddeloh, Antonio Rojas, Ari, asavah, bleep_blop, Carsten
Strotmann, Christian Brauner, Christian Hesse, Clinton Roy, Collin
Eggert, Daniel Lockyer, Daniel Rusek, Dimitri John Ledkov, Dmitry
Rozhkov, Edward A. James, Evgeny Vereshchagin, Florian Klink, Franck
Bui, Gwendal Grignou, Hans de Goede, Harald Hoyer, Hristo Venev, Ikey
Doherty, Jakub Wilk, Jérémy Rosen, Jiahui Xie, John Lin, José Bollo,
Josef Andersson, juga0, Krzysztof Nowicki, Lars Karlitski, Lars
Kellogg-Stedman, Lauri Tirkkonen, Lennart Poettering, Lubomir Rintel,
Lucas Werkmeister, Lukáš Nykrýn, Lukáš Říha, Lukasz Rubaszewski, Maciej
S. Szmigiero, Mantas Mikulėnas, Marcus Folkesson, Martin Steuer,
Mathieu Trudel-Lapierre, Matija Skala, Max Resch, Michael Biebl,
Michael Vogt, Michal Koutný, Michal Sekletar, Mike Gilbert, Muhammet
Kara, Neil Brown, Olaf Hering, Ondrej Kozina, Patrik Flykt, Patryk
Kocielnik, Peter Hutterer, Piotr Drąg, Razvan Cojocaru, Robin
McCorkell, Roland Hieber, Sergey Ptashnick, Shawn Landden, Shuang Liu,
Simon Arlott, Simon Peeters, Stanislav Angelovič, Stefan Agner, Susant
Sahani, Sylvain Plantefève, Thomas Blume, Tinu Weber, Tom Stellard,
Topi Miettinen, Torsten Hilbrich, Vito Caputo, Vladislav Vishnyakov,
WaLyong Cho, Yu Watanabe, Zbigniew Jędrzejewski-Szmek, Zeal Jagannatha
Eggert, Daniel Black, Daniel Lockyer, Daniel Rusek, Dimitri John
Ledkov, Dmitry Rozhkov, Edward A. James, Evgeny Vereshchagin, Florian
Klink, Franck Bui, Gwendal Grignou, Hans de Goede, Harald Hoyer, Hristo
Venev, Ikey Doherty, Jakub Wilk, Jérémy Rosen, Jiahui Xie, John Lin,
José Bollo, Josef Andersson, juga0, Krzysztof Nowicki, Lars Karlitski,
Lars Kellogg-Stedman, Lauri Tirkkonen, Lennart Poettering, Lubomir
Rintel, Lucas Werkmeister, Lukáš Nykrýn, Lukáš Říha, Lukasz
Rubaszewski, Maciej S. Szmigiero, Mantas Mikulėnas, Marcus Folkesson,
Martin Steuer, Mathieu Trudel-Lapierre, Matija Skala, Max Resch,
Michael Biebl, Michael Vogt, Michal Koutný, Michal Sekletar, Mike
Gilbert, Muhammet Kara, Neil Brown, Olaf Hering, Ondrej Kozina, Patrik
Flykt, Patryk Kocielnik, Peter Hutterer, Piotr Drąg, Razvan Cojocaru,
Robin McCorkell, Roland Hieber, Sergey Ptashnick, Shawn Landden, Shuang
Liu, Simon Arlott, Simon Peeters, Stanislav Angelovič, Stefan Agner,
Susant Sahani, Sylvain Plantefève, Thomas Blume, Tinu Weber, Tom
Stellard, Topi Miettinen, Torsten Hilbrich, Vito Caputo, Vladislav
Vishnyakov, WaLyong Cho, Yu Watanabe, Zbigniew Jędrzejewski-Szmek, Zeal
Jagannatha
— Berlin, 2017-12-XX

View File

@ -211,6 +211,18 @@
Example : <literal>WATCHDOG_USEC=20000000</literal></para></listitem>
</varlistentry>
<varlistentry>
<term>EXTEND_TIMEOUT_USEC=…</term>
<listitem><para>Tells the service manager to extend the startup, runtime or shutdown service timeout
corresponding the current state. The value specified is a time in microseconds during which the service must
send a new message. A service timeout will occur if the message isn't received, but only if the runtime of the
current state is beyond the original maximium times of <varname>TimeoutStartSec=</varname>, <varname>RuntimeMaxSec=</varname>,
and <varname>TimeoutStopSec=</varname>.
See <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for effects on the service timeouts.</para></listitem>
</varlistentry>
<varlistentry>
<term>FDSTORE=1</term>

View File

@ -517,6 +517,15 @@
<varname>Type=oneshot</varname> is used, in which case the
timeout is disabled by default (see
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>).
</para>
<para>If a service of <varname>Type=notify</varname> sends <literal>EXTEND_TIMEOUT_USEC=…</literal>, this may cause
the start time to be extended beyond <varname>TimeoutStartSec=</varname>. The first receipt of this message
must occur before <varname>TimeoutStartSec=</varname> is exceeded, and once the start time has exended beyond
<varname>TimeoutStartSec=</varname>, the service manager will allow the service to continue to start, provided
the service repeats <literal>EXTEND_TIMEOUT_USEC=…</literal> within the interval specified until the service
startup status is finished by <literal>READY=1</literal>. (see
<citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>).
</para></listitem>
</varlistentry>
@ -535,6 +544,14 @@
<varname>DefaultTimeoutStopSec=</varname> from the manager
configuration file (see
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>).
</para>
<para>If a service of <varname>Type=notify</varname> sends <literal>EXTEND_TIMEOUT_USEC=…</literal>, this may cause
the stop time to be extended beyond <varname>TimeoutStopSec=</varname>. The first receipt of this message
must occur before <varname>TimeoutStopSec=</varname> is exceeded, and once the stop time has exended beyond
<varname>TimeoutStopSec=</varname>, the service manager will allow the service to continue to stop, provided
the service repeats <literal>EXTEND_TIMEOUT_USEC=…</literal> within the interval specified, or terminates itself
(see <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>).
</para></listitem>
</varlistentry>
@ -553,7 +570,16 @@
active for longer than the specified time it is terminated and put into a failure state. Note that this setting
does not have any effect on <varname>Type=oneshot</varname> services, as they terminate immediately after
activation completed. Pass <literal>infinity</literal> (the default) to configure no runtime
limit.</para></listitem>
limit.</para>
<para>If a service of <varname>Type=notify</varname> sends <literal>EXTEND_TIMEOUT_USEC=…</literal>, this may cause
the runtime to be extended beyond <varname>RuntimeMaxSec=</varname>. The first receipt of this message
must occur before <varname>RuntimeMaxSec=</varname> is exceeded, and once the runtime has exended beyond
<varname>RuntimeMaxSec=</varname>, the service manager will allow the service to continue to run, provided
the service repeats <literal>EXTEND_TIMEOUT_USEC=…</literal> within the interval specified until the service
shutdown is acheived by <literal>STOPPING=1</literal> (or termination). (see
<citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>).
</para></listitem>
</varlistentry>
<varlistentry>

View File

@ -253,6 +253,45 @@ static void service_start_watchdog(Service *s) {
log_unit_warning_errno(UNIT(s), r, "Failed to install watchdog timer: %m");
}
static void service_extend_timeout(Service *s, usec_t extend_timeout_usec) {
assert(s);
if (s->timer_event_source) {
uint64_t current = 0, extended = 0;
int r;
if (IN_SET(extend_timeout_usec, 0, USEC_INFINITY))
return;
extended = usec_add(now(CLOCK_MONOTONIC), extend_timeout_usec);
r = sd_event_source_get_time(s->timer_event_source, &current);
if (r < 0)
log_unit_error_errno(UNIT(s), r, "Failed to retrieve timeout timer: %m");
else if (extended > current) {
r = sd_event_source_set_time(s->timer_event_source, extended);
if (r < 0)
log_unit_warning_errno(UNIT(s), r, "Failed to set timeout timer: %m");
}
if (s->watchdog_event_source) {
/* extend watchdog if necessary. We've asked for an extended timeout so we
* shouldn't expect a watchdog timeout in the interval in between */
r = sd_event_source_get_time(s->watchdog_event_source, &current);
if (r < 0) {
log_unit_error_errno(UNIT(s), r, "Failed to retrieve watchdog timer: %m");
return;
}
if (extended > current) {
r = sd_event_source_set_time(s->watchdog_event_source, extended);
if (r < 0)
log_unit_warning_errno(UNIT(s), r, "Failed to set watchdog timer: %m");
}
}
}
}
static void service_reset_watchdog(Service *s) {
assert(s);
@ -3476,6 +3515,16 @@ static void service_notify_message(Unit *u, pid_t pid, char **tags, FDSet *fds)
}
}
/* Interpret EXTEND_TIMEOUT= */
e = strv_find_startswith(tags, "EXTEND_TIMEOUT_USEC=");
if (e) {
usec_t extend_timeout_usec;
if (safe_atou64(e, &extend_timeout_usec) < 0)
log_unit_warning(u, "Failed to parse EXTEND_TIMEOUT_USEC=%s", e);
else
service_extend_timeout(s, extend_timeout_usec);
}
/* Interpret WATCHDOG= */
if (strv_find(tags, "WATCHDOG=1"))
service_reset_watchdog(s);

View File

@ -0,0 +1 @@
../TEST-01-BASIC/Makefile

View File

@ -0,0 +1,55 @@
#!/bin/bash
set -v -x
rm -f /test.log
TL=/test.log.XXXXXXXX
function wait_for()
{
service=${1}
result=${2:-success}
time=${3:-45}
while [[ ! -f /${service}.terminated && ! -f /${service}.success && $time -gt 0 ]]
do
sleep 1
time=$(( $time - 1 ))
done
if [[ ! -f /${service}.${result} ]]
then
journalctl -u testsuite-${service/_/-}.service >> "${TL}"
fi
}
# This checks all stages, start, runtime and stop, can be extended by
# EXTEND_TIMEOUT_USEC
wait_for success_all
# These check that EXTEND_TIMEOUT_USEC that occurs at greater than the
# extend timeout interval but less then the stage limit (TimeoutStartSec,
# RuntimeMaxSec, TimeoutStopSec) still succeed.
wait_for success_start
wait_for success_runtime
wait_for success_stop
# These ensure that EXTEND_TIMEOUT_USEC will still timeout in the
# approprate stage, after the stage limit, when the EXTEND_TIMEOUT_USEC
# message isn't sent within the extend timeout interval.
wait_for fail_start startfail
wait_for fail_stop stopfail
wait_for fail_runtime runtimefail
if [[ -f "${TL}" ]]
then
# no mv
cp "${TL}" /test.log
exit 1
else
touch /testok
exit 0
fi

View File

@ -0,0 +1,70 @@
#!/bin/bash
set -x
set -e
set -o pipefail
# sleep interval (seconds)
sleep_interval=1
# extend_timeout_interval second(s)
extend_timeout_interval=1
# number of sleep_intervals before READY=1
start_intervals=10
# number of sleep_intervals before exiting
stop_intervals=10
# run intervals, number of sleep_intervals to run
run_intervals=7
# service name
SERVICE=unknown
while [ $# -gt 0 ];
do
eval ${1%=*}=${1#*=}
shift
done
# We convert to usec
extend_timeout_interval=$(( $extend_timeout_interval * 1000000 ))
trap "{ touch /${SERVICE}.terminated; exit 1; }" SIGTERM SIGABRT
rm -f /${SERVICE}.*
touch /${SERVICE}.startfail
systemd-notify EXTEND_TIMEOUT_USEC=$extend_timeout_interval
while [ $start_intervals -gt 0 ]
do
sleep $sleep_interval
start_intervals=$(( $start_intervals - 1 ))
systemd-notify EXTEND_TIMEOUT_USEC=$extend_timeout_interval
done
systemd-notify --ready --status="Waiting for your request"
touch /${SERVICE}.runtimefail
rm /${SERVICE}.startfail
systemd-notify EXTEND_TIMEOUT_USEC=$extend_timeout_interval
while [ $run_intervals -gt 0 ]
do
sleep $sleep_interval
run_intervals=$(( $run_intervals - 1 ))
systemd-notify EXTEND_TIMEOUT_USEC=$extend_timeout_interval
done
systemd-notify STOPPING=1
touch /${SERVICE}.stopfail
rm /${SERVICE}.runtimefail
systemd-notify EXTEND_TIMEOUT_USEC=$extend_timeout_interval
while [ $stop_intervals -gt 0 ]
do
sleep $sleep_interval
stop_intervals=$(( $stop_intervals - 1 ))
systemd-notify EXTEND_TIMEOUT_USEC=$extend_timeout_interval
done
touch /${SERVICE}.success
rm /${SERVICE}.stopfail
exit 0

View File

@ -0,0 +1,52 @@
#!/bin/bash
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh
set -e
TEST_DESCRIPTION="EXTEND_TIMEOUT_USEC=usec start/runtime/stop tests"
SKIP_INITRD=yes
. $TEST_BASE_DIR/test-functions
test_setup() {
create_empty_image
mkdir -p $TESTDIR/root
mount ${LOOPDEV}p1 $TESTDIR/root
# Create what will eventually be our root filesystem onto an overlay
(
eval $(udevadm info --export --query=env --name=${LOOPDEV}p2)
setup_basic_environment
for s in success-all success-start success-stop success-runtime \
fail-start fail-stop fail-runtime
do
cp testsuite-${s}.service ${initdir}/etc/systemd/system
done
cp testsuite.service ${initdir}/etc/systemd/system
cp extend_timeout_test_service.sh ${initdir}/
cp assess.sh ${initdir}/
cp $BUILD_DIR/systemd-notify ${initdir}/bin
cp $BUILD_DIR/src/shared/libsystemd-shared-*.so ${initdir}/usr/lib
setup_testsuite
) || return 1
# mask some services that we do not want to run in these tests
ln -s /dev/null $initdir/etc/systemd/system/systemd-hwdb-update.service
ln -s /dev/null $initdir/etc/systemd/system/systemd-journal-catalog-update.service
ln -s /dev/null $initdir/etc/systemd/system/systemd-networkd.service
ln -s /dev/null $initdir/etc/systemd/system/systemd-networkd.socket
ln -s /dev/null $initdir/etc/systemd/system/systemd-resolved.service
setup_nspawn_root
ddebug "umount $TESTDIR/root"
umount $TESTDIR/root
}
test_cleanup() {
return 0
}
do_test "$@"

View File

@ -0,0 +1,13 @@
[Unit]
Description=Testsuite: Fail Runtime (EXTEND_TIMEOUT_USEC Didn't occur in sufficient time after RuntimeSecMax.)
[Service]
# EXTEND_TIMEOUT_USEC on runtime start (0) and 7 seconds after. Systemd will expect one at 7+5 (extend_timeout_interval)
# seconds this won't happen until 7 + 7 (sleep interval) seconds. Therefore timeout at 12 seconds.
Type=notify
TimeoutStartSec=4
TimeoutStopSec=4
RuntimeMaxSec=10
ExecStart=/extend_timeout_test_service.sh SERVICE=fail_runtime extend_timeout_interval=5 sleep_interval=7 start_intervals=0 run_intervals=2 stop_intervals=0

View File

@ -0,0 +1,13 @@
[Unit]
Description=Testsuite: Fail Start (EXTEND_TIMEOUT_USEC Didn't occur in sufficient time after TimeoutStartSec.)
[Service]
# EXTEND_TIMEOUT_USEC on startup and 7 seconds from start. Systemd will expect one at 7+5 (extend_timeout_interval)
# seconds this won't happen until 7 + 7 (sleep interval) seconds. Therefore timeout at 12 seconds.
Type=notify
TimeoutStartSec=10
TimeoutStopSec=4
RuntimeMaxSec=4
ExecStart=/extend_timeout_test_service.sh SERVICE=fail_start extend_timeout_interval=5 sleep_interval=7 start_intervals=2 run_intervals=0 stop_intervals=0

View File

@ -0,0 +1,16 @@
[Unit]
Description=Testsuite: Fail Stop (EXTEND_TIMEOUT_USEC Didn't occur in sufficient time after TimeoutStopSec.)
[Service]
# EXTEND_TIMEOUT_USEC on stop (0) and 7 seconds after. Systemd will expect one at 7+5 (extend_timeout_interval)
# seconds this won't happen until 7 + 7 (sleep interval) seconds. Therefore timeout at 12 seconds.
Type=notify
TimeoutStartSec=4
TimeoutStopSec=10
RuntimeMaxSec=4
ExecStart=/extend_timeout_test_service.sh SERVICE=fail_stop extend_timeout_interval=5 sleep_interval=7 start_intervals=0 run_intervals=0 stop_intervals=2
# Due to 6041a7ee2c1bbff6301082f192fc1b0882400d42 SIGTERM isn't sent as the service shuts down with STOPPING=1
# This file makes the test assess.sh quicker by notifing it that this test has finished.
ExecStopPost=/bin/bash -c '[[ $SERVICE_RESULT == timeout && $EXIT_CODE == killed ]] && touch /fail_runtime.terminated'

View File

@ -0,0 +1,14 @@
[Unit]
Description=Testsuite: EXTEND_TIMEOUT_USEC Success - extend timeout on all services
[Service]
# Normal success - startup / runtime / shutdown all take 8 seconds which is within the EXTEND_TIMEOUT_USEC=4 seconds interval
# runtime is 8+8+8 seconds. so we are relying on the EXTEND_TIMEOUT_USEC to exceed all stages, Start, Runtime and Stop.
# success occurs after 24 seconds
Type=notify
TimeoutStartSec=4
TimeoutStopSec=4
RuntimeMaxSec=4
ExecStart=/extend_timeout_test_service.sh SERVICE=success_all extend_timeout_interval=4 sleep_interval=2 start_intervals=3 run_intervals=3 stop_intervals=3

View File

@ -0,0 +1,13 @@
[Unit]
Description=Testsuite: Success Runtime (EXTEND_TIMEOUT_USEC > WATCHDOG_USEC however < RuntimeMaxSec)
[Service]
# EXTEND_TIMEOUT_USEC=4 second once during runtime, but sleep for 6 seconds.
# Runtime is 6 seconds and < RuntimeMaxSec so still successful.
Type=notify
TimeoutStartSec=4
TimeoutStopSec=4
RuntimeMaxSec=8
ExecStart=/extend_timeout_test_service.sh SERVICE=success_runtime extend_timeout_interval=4 sleep_interval=6 start_intervals=0 run_intervals=1 stop_intervals=0

View File

@ -0,0 +1,13 @@
[Unit]
Description=Testsuite: Success Start (EXTEND_TIMEOUT_USEC > WATCHDOG_USEC however < TimeoutStartSec)
[Service]
# EXTEND_TIMEOUT_USEC=4 second interval once at startup, but sleep 6 seconds.
# Therefore startup is 6 seconds and < TimeoutStartSec so still successful.
Type=notify
TimeoutStartSec=8
TimeoutStopSec=4
RuntimeMaxSec=4
ExecStart=/extend_timeout_test_service.sh SERVICE=success_start extend_timeout_interval=4 sleep_interval=6 start_intervals=1 run_intervals=0 stop_intervals=0

View File

@ -0,0 +1,13 @@
[Unit]
Description=Testsuite: Success Stop (EXTEND_TIMEOUT_USEC > WATCHDOG_USEC however < TimeoutStopSec)
[Service]
# EXTEND_TIMEOUT_USEC=4 seconds once during shutdown, but sleep for 6 seconds.
# Therefore stop time is 6 seconds and < TimeoutStopSec so still successful.
Type=notify
TimeoutStartSec=4
TimeoutStopSec=8
RuntimeMaxSec=4
ExecStart=/extend_timeout_test_service.sh SERVICE=success_stop extend_timeout_interval=4 sleep_interval=6 start_intervals=0 run_intervals=0 stop_intervals=1

View File

@ -0,0 +1,18 @@
[Unit]
Description=Testsuite: Assess all other testsuite-*.services worked as expected
Wants=testsuite-success-all.service
Wants=testsuite-success-start.service
Wants=testsuite-success-runtime.service
Wants=testsuite-success-stop.service
Wants=testsuite-fail-start.service
Wants=testsuite-fail-stop.service
Wants=testsuite-fail-runtime.service
StopWhenUnneeded=yes
[Service]
Type=simple
ExecStartPre=/assess.sh
ExecStart=/bin/true