Just some renaming, no change in behaviour.
Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
Fixes #5659.
Currently, if Persistent=true and the machine is off at the scheduled time of the timer unit, the timer
is triggered immediately at the next boot, even if RandomizedDelaySec= is specified.
As a result, if multiple timers meet that condition, they are all triggered at the same time, and the
resulting CPU/IO load slows down boot.
With this commit, if the scheduled time of a persistent timer has already elapsed at boot, the time
when systemd first started is used as the scheduled time, and RandomizedDelaySec= is applied to it.
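To illustrate the rule, here is a minimal sketch, not the actual patch. The function name and its parameters are made up for this example, and the random value is passed in by the caller so that the result stays predictable.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t usec_t; /* microseconds, an assumption of this sketch */

/* Hypothetical helper: returns the time a persistent timer should elapse.
 * If the scheduled time already passed before the manager started, the
 * manager start time is used as base instead. The randomized delay
 * (smaller than max_delay) is then added on top of that base. */
usec_t persistent_elapse(usec_t scheduled, usec_t manager_start,
                         usec_t max_delay, uint64_t random_value) {
        usec_t base, delay;

        base = scheduled < manager_start ? manager_start : scheduled;
        delay = max_delay > 0 ? random_value % max_delay : 0;

        assert(base <= UINT64_MAX - delay); /* no overflow handling in this sketch */
        return base + delay;
}
```

With this shape, several timers that were all missed while the machine was off get spread over their individual delay windows after boot, instead of firing at the same instant.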
This allows clients to follow our internal state changes safely.
Previously, quick state changes (for example, when we restart a unit due
to Restart= after it quickly transitioned through DEAD/FAILED states)
would be coalesced into one bus signal event. With this change there's
the guarantee that all state changes after the unit was announced once
are reflected on the bus.
Note we do this kind of guaranteed flushing only for unit state
changes, not for other unit property changes, where clients still have
to expect coalescing. This is because the unit state is a very
important, high-level concept.
Fixes: #10185
We keep a mark whether a single-shot timer was triggered in the caller's
variable initial. When such a timer elapses while we are
serializing/deserializing the inner state, we consider the timer
incorrectly as elapsed and don't trigger it later.
This patch exploits the last_trigger timestamp that we already serialize,
hence we can eliminate the argument initial completely.
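A rough sketch of the idea follows. The function name is hypothetical, and treating a last trigger time of 0 as "never triggered" is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical check for a single-shot timer value. Returns true if the
 * timer still has to be triggered. last_trigger is 0 if the timer never
 * triggered (an assumption of this sketch). */
bool single_shot_pending(usec_t elapse, usec_t now, usec_t last_trigger) {
        assert(last_trigger <= now); /* a recorded trigger cannot lie in the future */

        if (elapse > now)
                return true; /* still in the future, keep waiting */

        /* Already elapsed: only skip it if we know it really triggered. */
        return last_trigger == 0;
}
```

Because the decision depends only on serialized state, a timer that elapses in the middle of a reload is still seen as pending afterwards.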
A reproducer for OnBootSec= timers:
cat >repro.c <<EOD
/*
* Compile: gcc repro.c -o repro
* Run: ./repro
*/
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
        char command[1024];
        int pause;
        struct timespec now;
        while (1) {
                usleep(rand() % 200000); // prevent periodic repeats
                clock_gettime(CLOCK_MONOTONIC, &now);
                printf("%ld\n", (long) now.tv_sec);
                system("rm -f $PWD/mark");
                snprintf(command, sizeof(command), "systemd-run --user --on-boot=%ld --timer-property=AccuracySec=100ms "
                         "touch $PWD/mark", (long) now.tv_sec + 1);
                system(command);
                system("systemctl --user list-timers");
                pause = (1000000000 - now.tv_nsec)/1000 - 70000; // fiddle to hit the middle of reloading
                usleep(pause > 0 ? pause : 0);
                system("systemctl --user daemon-reload");
                sync();
                sleep(2);
                if (open("./mark", 0) < 0)
                        if (errno == ENOENT) {
                                printf("mark file does not exist\n");
                                break;
                        }
        }
        return 0;
}
EOD
Because of the accuracy feature, there is a difference between the time set by the user and the time
at which the timer really elapses. If you change the system date (or time) between these two points,
the timer is dropped.
You can easily reproduce it with the following command.
-----------------------------------------------------------
$ systemd-run --on-active=3s ls; sleep 3; date -s "`date`"
-----------------------------------------------------------
With the following command the problem is rarely reproduced, but it still exists.
---------------------------------------------------------------------------------------------
$ systemd-run --on-active=3s --timer-property=AccuracySec=1us ls ; sleep 1; date -s "`date`"
---------------------------------------------------------------------------------------------
Note: Global AccuracySec value.
----------------------------------------------------------------------
$ cat /etc/systemd/system.conf
DefaultTimerAccuracySec=1min
----------------------------------------------------------------------
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.
In order to implement this all serialization functions are moved to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.
While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end).
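The size check can be pictured roughly like this. It is a simplified sketch and not the code in serialize.[ch]: the function name and return values are made up, and the line_max parameter stands in for LONG_LINE_MAX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical length-checked serializer. Returns 1 if the line was
 * written, 0 if it was skipped, and -1 on a write error. */
int serialize_item_checked(FILE *f, const char *key, const char *value, size_t line_max) {
        assert(f);
        assert(key);

        if (!value)
                return 0; /* nothing to write */

        /* "key=value" must fit into what the reader accepts per line */
        if (strlen(key) + 1 + strlen(value) > line_max) {
                fprintf(stderr, "Refusing to serialize overly long item '%s'.\n", key);
                return 0;
        }

        /* log about errors immediately, at the place where they happen */
        if (fprintf(f, "%s=%s\n", key, value) < 0) {
                fprintf(stderr, "Failed to serialize item '%s'.\n", key);
                return -1;
        }

        return 1;
}
```

Skipping an overly long item loses that one item, but the reader never sees a line it cannot parse.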
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
Hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
This adds a flags parameter to unit_notify() which can be used to pass
additional notification information to the function. We then make the old
reload_failure boolean parameter one of these flags, and then add a new
flag that lets unit_notify() know if we are configured to restart the
service.
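As an illustration of turning a boolean parameter into a flags field, consider the following sketch. The enum and function names are hypothetical and only mirror the two pieces of information described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag set: one bit per piece of notification information */
typedef enum {
        NOTIFY_RELOAD_FAILURE    = 1 << 0, /* replaces the old boolean parameter */
        NOTIFY_WILL_AUTO_RESTART = 1 << 1, /* the service is configured to restart */
        NOTIFY_ALL_FLAGS         = NOTIFY_RELOAD_FAILURE | NOTIFY_WILL_AUTO_RESTART,
} NotifyFlags;

bool notify_reload_failed(NotifyFlags flags) {
        assert((flags & ~NOTIFY_ALL_FLAGS) == 0); /* refuse unknown bits */
        return (flags & NOTIFY_RELOAD_FAILURE) != 0;
}

bool notify_will_auto_restart(NotifyFlags flags) {
        assert((flags & ~NOTIFY_ALL_FLAGS) == 0);
        return (flags & NOTIFY_WILL_AUTO_RESTART) != 0;
}
```

New information can later be added as another bit, without touching every caller again.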
Note that this adjusts behaviour of systemd to match what the docs say.
Fixes: #8398
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd be nice to
obtain explicit acks from all involved authors before doing that.
E.g. if you have a monthly event and you set the computer clock back one
year, we can allow the next 12 monthly events to happen naturally. In fact
we already do this when you start a Persistent=yes timer, we just need to
apply the same logic when it's running and we notice the system clock
being set backwards.
On timejumps, including suspend, timer_time_change() calls for a
re-calculation of the next elapse. Sadly I'm not quite sure what the
intended effect of this was! Because it was not managing to fire
OnCalendar= timers which fired during the suspend... unless the timer had
already fired once before.
Reported, entirely correctly as far as I can see, on stackexchange:
https://unix.stackexchange.com/questions/351829/systemd-timer-that-expired-while-suspended
/* If we know the last time this was
* triggered, schedule the job based relative
- * to that. If we don't just start from
- * now. */
+ * to that. If we don't, just start from
+ * the activation time. */
The same code is called for both the initial calculation and this
re-calculation. If we're _not_ already active, then this is before the
activation time has been recorded in the unit, so just use the current
time as before. The new code is mechanically adapted from the same
logic for `OnActiveSec=` (case TIMER_ACTIVE in the code which follows).
Tested with `date --set`.
Motivations:
* Rotate monitoring data from Atop into files which are named per-day.
Fedora currently implements this with a cron job that runs at midnight,
but that didn't handle suspend correctly either.
* unbound-anchor.timer on Fedora, is used to update DNSSEC "root trust
anchor" daily, before the TTL expires. It uses OnCalendar=daily
AccuracySec=24h. Which is a bit suspect because the TTL is 2 days, but I
think it has the right general idea.
None of the other timer settings are correct, because they would not
account for time spent in suspend. Unless you set WakeSystem
(this feature is currently undocumented).
* So in general, we can expect to see people using OnCalendar= for the same
cases as cron.daily and cron.monthly. Which use anacron to keep track of
jobs which should be run even if the system was down at the time.
Timers which are configured to run more frequently than that, are
unlikely to mind if they get run slightly more often than the writer
realized, relative to the amount of time the system was really running.
* From the user report above: "I only want to use remind to show a desktop
notification, it seems excessive to wake up the computer for that. Also,
I would like to get the reminder first thing in the morning, so the
OnActiveSec doesn't help with that."
We have two variables `b` and `base`. `b` is declared within limited
scope; `base` is declared at the top of the function. However `base`
is actually only used within a scope which is exclusive of `b`. Clarify
by moving `base` inside the limited scope as well.
(Also `base` doesn't need initializing any more than `b` does. The
declaration of `base` is now immediately followed by a case analysis of
`v->base`, which serves almost exclusively to determine the value of
`base`).
When a unit job finishes early (e.g. when fork(2) fails), the triggered unit goes
through the states
stopped->failed (or failed->failed),
whereas if an ExecStart= command fails, the unit passes through
stopped->starting->failed.
The former transition doesn't result in the unit's active/inactive timestamp being
updated, so the timer (OnUnitActiveSec= or OnUnitInactiveSec=) would use an expired
timestamp and trigger again immediately (repeatedly).
This patch exploits the timer's last trigger timestamp to ensure the timer isn't
triggered more frequently than the OnUnitActiveSec=/OnUnitInactiveSec= period.
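In other words, the base for the next elapse is the later of the unit timestamp and the last trigger. A minimal sketch, with a made-up function name and plain microsecond integers as an assumption:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical helper: next elapse of an OnUnitActiveSec=/OnUnitInactiveSec=
 * timer. A stale unit timestamp cannot make the timer fire again before a
 * full period has passed since the last trigger. */
usec_t next_unit_elapse(usec_t unit_timestamp, usec_t last_trigger, usec_t period) {
        usec_t base;

        base = unit_timestamp > last_trigger ? unit_timestamp : last_trigger;

        assert(base <= UINT64_MAX - period); /* no overflow handling in this sketch */
        return base + period;
}
```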
Steps to reproduce:
0) Create sample units:
cat >~/.config/systemd/user/looper.service <<EOD
[Service]
ExecStart=/usr/bin/sleep 2
EOD
cat >~/.config/systemd/user/looper.timer <<EOD
[Timer]
AccuracySec=5
OnUnitActiveSec=5
EOD
1) systemctl --user daemon-reload
2) systemctl --user start looper.timer
# to have first activation timestamp/sentinel
systemctl --user start looper.service
o Observe the service is being regularly triggered.
3) systemctl set-property user@$UID.service TasksMax=2
o Observe the tight looping as long as the looper.service cannot be started.
Ref: #5969
This makes things quite a bit more systematic I think, as we can
systematically operate on all timestamps, for example for the purpose of
serialization/deserialization.
This rework doesn't necessarily make things shorter in the individual
lines, but it does reduce the line count a bit.
(This is useful particularly when we want to add additional timestamps,
for example to solve #7023)
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.
The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.
Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.
Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:
1. We can fix UDEV_WANTS dependency generation: so far we kept adding
dependencies configured that way, but if a device lost such a
dependency we couldn't remove it again as there was no scheme for removing
dependencies in place.
2. We can implement "pin-pointed" reload of unit files. If we know what
dependencies were created as result of configuration in a unit file,
then we know what to flush out when we want to reload it.
3. It's useful for debugging: "systemd-analyze dump" now shows
this information, helping substantially with understanding how
systemd's dependency tree came to be the way it came to be.
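The trick of keeping the bitmask inside the value pointer can be sketched as follows. The bit names and helper functions are hypothetical, and the real encoding may well differ. The point is only that no extra allocation is needed per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical origin bits: why does this dependency exist? */
enum {
        DEP_FROM_FILE     = 1 << 0, /* explicitly configured in a unit file */
        DEP_FROM_UDEV     = 1 << 1, /* created due to a udev rule */
        DEP_FROM_IMPLICIT = 1 << 2, /* added implicitly */
};

/* Store the mask in the pointer-sized value slot of a map entry */
void *dep_mask_to_ptr(uint32_t mask) {
        return (void*) (uintptr_t) mask;
}

uint32_t dep_ptr_to_mask(const void *p) {
        return (uint32_t) (uintptr_t) p;
}

/* Drops one origin from a stored value. Returns true if no origin is left,
 * i.e. the dependency entry itself may be removed from the map. */
bool dep_drop_origin(void **value, uint32_t origin) {
        uint32_t mask;

        assert(value);

        mask = dep_ptr_to_mask(*value) & ~origin;
        *value = dep_mask_to_ptr(mask);
        return mask == 0;
}
```

When a udev property goes away, only the udev bit is cleared. A dependency that is also configured in a unit file survives.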
This slightly changes how we log about failures. Previously,
service_enter_dead() would log that a service unit failed along with its
result code, and unit_notify() would do this again but without the
result code. For other unit types only the latter would take effect.
This cleans this up: we keep the message in unit_notify() only for debug
purposes, and add type-specific log lines to all our unit types that can
fail, and always place them before unit_notify() is invoked.
Or in other words: the duplicate log message for service units is
removed, and all other unit types get a more useful line with the
precise result code.
Also, use the mtime rather than the atime of the timestamp file. While
the atime is not completely wrong, the mtime appears more appropriate
as that's what we actually explicitly change, and is not affected by
mere reading.
Fixes: #6821
This reworks timer_enter_waiting() in a couple of ways in order to clean
it up a bit and fix #5629.
Most importantly, we previously initialized ts_monotonic to either
the current time in CLOCK_MONOTONIC or in CLOCK_BOOTTIME, depending on
t->wake_system. Then given specific conditions we'd use this time as
base for our timers. And afterwards, if t->wake_system was on we'd
convert the resulting value from CLOCK_MONOTONIC to CLOCK_BOOTTIME again
— which of course is wrong since we already were in CLOCK_BOOTTIME! This
fixes this logic, by using a triple timestamp so that we always have the
right base around, and initially only calculate in CLOCK_MONOTONIC and
only convert as last step.
Conversion between the clocks is now done with the generic
usec_shift_clock(), and additions via usec_add() making these
calculations a bit safer.
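To show what "safer" means here, below is a simplified sketch of a saturating addition and a clock shift. It is not the code of usec_add() or usec_shift_clock(): the names carry a _sketch suffix, and the current time of both clocks is passed in explicitly so that the example does not depend on the system clock.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t usec_t;
#define USEC_INFINITY_SKETCH UINT64_MAX /* stand-in for "never" */

/* Addition that saturates at infinity instead of wrapping around */
usec_t usec_add_sketch(usec_t a, usec_t b) {
        if (a == USEC_INFINITY_SKETCH || b == USEC_INFINITY_SKETCH)
                return USEC_INFINITY_SKETCH;
        if (a > USEC_INFINITY_SKETCH - b)
                return USEC_INFINITY_SKETCH; /* would overflow */
        return a + b;
}

/* Converts timestamp x from one clock to another, given the current time
 * on both clocks. The distance to "now" is kept the same. */
usec_t usec_shift_clock_sketch(usec_t x, usec_t now_from, usec_t now_to) {
        usec_t back;

        assert(now_from != USEC_INFINITY_SKETCH && now_to != USEC_INFINITY_SKETCH);

        if (x == USEC_INFINITY_SKETCH)
                return USEC_INFINITY_SKETCH;
        if (x > now_from)
                return usec_add_sketch(now_to, x - now_from); /* in the future */

        back = now_from - x; /* in the past, clamp at zero */
        return now_to > back ? now_to - back : 0;
}
```

Doing all calculations in one clock and shifting only once at the end avoids converting a value that is already in the target clock.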
Fixes: #5629
* logind: trivial simplification
free_and_strdup() handles NULL arg, so make use of that.
* boot: fix two typos
* pid1: rewrite check in ignore_proc() to not check condition twice
It's harmless, but it seems nicer to evaluate a condition just a single time.
* core/execute: reformat exec_context_named_iofds() for legibility
* core/execute.c: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.
CID #1368227.
* core/timer: use (void)
CID #1368234.
* journal-file: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.
CID #1368236.
* shared/cgroup-show: use (void)
CID #1368243.
* cryptsetup: do not return uninitialized value on error
CID #1368416.
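For reference, checking asprintf() "in the usual fashion" looks roughly like this. The helper name is made up for this example.

```c
#define _GNU_SOURCE /* asprintf() is a GNU extension */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated "prefix-n" string,
 * or NULL on failure. The caller frees the result. */
char *make_numbered_name(const char *prefix, int n) {
        char *s = NULL;

        assert(prefix);

        /* On failure the contents of s are undefined, hence rely on the
         * return value only and never look at s in that case. */
        if (asprintf(&s, "%s-%d", prefix, n) < 0)
                return NULL;

        return s;
}
```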
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requirements for how the comment should look. Adjust it to the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.
(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
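A small made-up example of the comment form follows. As far as I can tell gcc accepts the plain "fall through" spelling at this warning level, but the exact set of accepted spellings depends on the level and the compiler version, so treat this as an assumption.

```c
#include <assert.h>

/* Hypothetical function: counts how many steps are left from a stage */
int count_remaining_steps(int stage) {
        int steps = 0;

        assert(stage >= 0); /* stages are non-negative in this sketch */

        switch (stage) {
        case 0:
                steps++;
                /* fall through */
        case 1:
                steps++;
                /* fall through */
        case 2:
                steps++;
                break;
        default:
                break;
        }

        return steps;
}
```

The comment has to be the last thing before the next case label, with no other statement in between.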
When the unit that is triggered by a timer is started and running,
we transition to "running" state, and the timer will not elapse again
until the unit has finished running. In this state "systemctl list-timers"
would display the previously calculated next elapse time, which would
now of course be in the past, leading to nonsensical values.
Simply set the next elapse to infinity, which causes list-timers to
show n/a. We cannot specify when the next elapse will happen, possibly
never.
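A sketch of the display side, with hypothetical names and a stand-in constant for infinity:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t usec_t;
#define USEC_INFINITY_SKETCH UINT64_MAX /* stand-in for "never" */

/* While the triggered unit is running, there is no known next elapse */
usec_t next_elapse_for_display(bool unit_running, usec_t computed) {
        return unit_running ? USEC_INFINITY_SKETCH : computed;
}

/* Formats the next elapse. Infinity is shown as "n/a". */
const char *format_next_elapse(usec_t next, char *buf, size_t len) {
        assert(buf);

        if (next == USEC_INFINITY_SKETCH)
                return "n/a";

        snprintf(buf, len, "%" PRIu64, next);
        return buf;
}
```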
Fixes #4031.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from an inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
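For services that do not want to link against sd-id128, reading the environment variable directly is an option. The sketch below assumes the variable is called INVOCATION_ID and contains 32 hexadecimal characters. Neither detail is spelled out in the text above, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int unhex(char c) {
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/* Parses a 128bit ID formatted as 32 hex characters.
 * Returns 0 on success, -1 if the string is missing or malformed. */
int parse_invocation_id(const char *s, uint8_t out[16]) {
        assert(out);

        if (!s || strlen(s) != 32)
                return -1;

        for (size_t i = 0; i < 16; i++) {
                int hi = unhex(s[2*i]), lo = unhex(s[2*i + 1]);

                if (hi < 0 || lo < 0)
                        return -1;

                out[i] = (uint8_t) (hi << 4 | lo);
        }

        return 0;
}

/* Assumed variable name, see the note above */
int invocation_id_from_env(uint8_t out[16]) {
        return parse_invocation_id(getenv("INVOCATION_ID"), out);
}
```

Since a service may alter its own environment, this is only good for the service's own use, not as a trusted source.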
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycle of it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
Previously, the result value of a unit was overridden with each failure that
took place, so that the result always reported the last failure that took
place.
With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchdog" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this for all unit types to store the first, not
the last failure.
This addresses the issue pointed out here:
https://github.com/systemd/systemd/pull/3818#discussion_r73433520
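The change boils down to only storing a failure while the result is still "success". A minimal sketch with made-up result names:

```c
#include <assert.h>

/* Hypothetical result codes for this sketch */
typedef enum {
        RESULT_SUCCESS,
        RESULT_WATCHDOG,
        RESULT_CORE_DUMP,
} Result;

/* Remembers the first failure and ignores all later ones */
void record_result(Result *current, Result f) {
        assert(current);

        if (*current == RESULT_SUCCESS)
                *current = f;
}
```

In the watchdog case the timeout is recorded first, so the core dump caused by the following SIGABRT no longer replaces it.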
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in calls like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right before everything has been verified
to be in order.
The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.
Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.
Note that this restores the "start-limit-hit" result code that existed before
6bf0f408e4 already in the service code. However,
it's not introduced for all units that have a result code concept.
Fixes #3166.
Before we invoke now(CLOCK_BOOTTIME), let's make sure we actually have that
clock, since now() will otherwise hit an assert.
Specifically, let's refuse CLOCK_BOOTTIME early in sd-event if the kernel
doesn't actually support it.
This is a follow-up for #3037, and specifically:
https://github.com/systemd/systemd/pull/3037#issuecomment-210199167
CLOCK_BOOTTIME was added in Linux 2.6.39, and its use causes an assertion to fail when running in mock
hosted on 2.6.32-based RHEL-6:
Assertion 'clock_gettime(map_clock_id(clock_id), &ts) == 0' failed at systemd/src/basic/time-util.c:70, function now(). Aborting.
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.
Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
whether we are running in the system or the user context.
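A sketch of how such macros might look. The struct layout, the field name and the UNIT_FILE_SYSTEM/UNIT_FILE_USER value names are assumptions of this example. Only UNIT_FILE_GLOBAL and the two macro names are taken from the text above.

```c
#include <assert.h>

typedef enum {
        UNIT_FILE_SYSTEM, /* assumed name */
        UNIT_FILE_GLOBAL,
        UNIT_FILE_USER,   /* assumed name */
} UnitFileScope;

typedef struct {
        UnitFileScope scope; /* replaces the former ManagerRunningAs field */
} Manager;

#define MANAGER_IS_SYSTEM(m) ((m)->scope == UNIT_FILE_SYSTEM)
#define MANAGER_IS_USER(m) ((m)->scope != UNIT_FILE_SYSTEM)

/* Example user of the macros */
const char *manager_context(const Manager *m) {
        assert(m);

        return MANAGER_IS_SYSTEM(m) ? "system" : "user";
}
```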