Commit Graph

757 Commits

Author SHA1 Message Date
Michael Marley 61927b9f11 manager: Fix HW watchdog when systemd starts before driver loaded
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized.  If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again.  This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).

This builds on work by @danc86 in https://github.com/systemd/systemd/pull/17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.

Fixes https://github.com/systemd/systemd/issues/17838

Co-authored-by: Dan Callaghan <djc@djc.id.au>
Co-authored-by: Michael Marley <michael@michaelmarley.com>
2020-12-09 11:47:22 +00:00
Yu Watanabe db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Lennart Poettering bfeb927a55 pid1: various minor watchdog modernizations
Just some clean-ups.
2020-10-30 13:02:06 +01:00
Zbigniew Jędrzejewski-Szmek 69c0807432
Merge pull request #15206 from anitazha/systoomd-v0
systemd-oomd
2020-10-15 14:16:52 +02:00
Lennart Poettering 670eed4c8c core: debug log about received fds 2020-10-14 16:41:37 +02:00
Anita Zhang fe8d22fb09 core: systemd-oomd pid1 integration 2020-10-07 17:12:24 -07:00
Zbigniew Jędrzejewski-Szmek 526e3cbbdd core: don't try to load units from non-absolute paths
The error message disagreed with the check that was actually performed. Adjust the check.
2020-09-23 14:49:37 +02:00
Lennart Poettering c6552f7cd5
Merge pull request #16955 from keszybz/test-execute-cleanup
One patch for test-execute and assorted cleanups
2020-09-08 18:33:12 +02:00
Zbigniew Jędrzejewski-Szmek 90e74a66e6 tree-wide: define iterator inside of the macro 2020-09-08 12:14:05 +02:00
Zbigniew Jędrzejewski-Szmek 9978e631cd core/manager: reindent table for readability 2020-09-04 18:14:26 +02:00
Zbigniew Jędrzejewski-Szmek 5b10116e49 core/{execute, manager}: reduce scope of iterator variables a bit 2020-09-04 18:14:26 +02:00
Christian Göttsche 45ae2f725e selinux: create systemd/notify socket with default SELinux context 2020-09-01 16:25:06 +02:00
Zbigniew Jędrzejewski-Szmek c2911d48ff Rework how we cache mtime to figure out if units changed
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.

This should help us behave properly in the case when the time jumps backwards
during boot: various files might have mtimes that in the future, but we won't
care. This fixes the following scenario:

We have /etc/systemd/system with T1. T1 is initially far in the past.
We have /run/systemd/generator with time T2.
The time is adjusted backwards, so T2 will be always in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2.
We would consider our cache to be up-to-date, falsely.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 02103e5716 core: always try to reload not-found unit
This check was added in d904afc730. It would only
apply in the case where the cache hasn't been loaded yet. I think we pretty
much always have the cache loaded when we reach this point, but even if we
didn't, it seems better to try to reload the unit. So let's drop this check.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek c149d2b491 pid1: use the cache mtime not clock to "mark" load attempts
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.

Also rename fragment_loadtime to fragment_not_found_time. It only gets set when
we failed to load the unit and the old name was suggesting it is always set.

In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during inital boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 81be23886d core: rename manager_unit_file_maybe_loadable_from_cache()
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
2020-08-31 20:53:38 +02:00
Lennart Poettering bb0c0d6f29 core: add credentials logic
Fixes: #15778 #16060
2020-08-25 19:45:35 +02:00
Zbigniew Jędrzejewski-Szmek 2aed63f427 tree-wide: fix spelling of "fallback"
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
2020-08-20 17:45:32 +02:00
Lennart Poettering 39cf0351c5 tree-wide: make use of new relative time events in sd-event.h 2020-07-28 11:24:55 +02:00
Lennart Poettering 8047ac8fdc core: clean more env vars from env block pid1 receives
We generally clean all env vars we use ourselves to communicate with out
childrens. We forgot some more recent additions however. Let's correct
that.
2020-07-23 18:30:15 +02:00
Zbigniew Jędrzejewski-Szmek 56a13a495c pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only
Read-only /var/tmp is more likely, because it's backed by a real device. /tmp
is (by default) backed by tmpfs, but it doesn't have to be. In both cases the
same consideration applies.

If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail
because we cannot create the subdir under /var/tmp to mount the private directory.
But many services actually don't require /var/tmp (either because they only use
it occasionally, or because they only use /tmp, or even because they don't use the
temporary directories at all, and PrivateTmp=yes is used to isolate them from
the rest of the system).

To handle both cases let's create a read-only directory under /run/systemd and
mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into
dumping too much data in /run.)

$ sudo systemd-run -t -p PrivateTmp=yes bash
Running as unit: run-u14.service
Press ^] three times within 1s to disconnect TTY.
[root@workstation /]# ls -l /tmp/
total 0
[root@workstation /]# ls -l /var/tmp/
total 0
[root@workstation /]# touch /tmp/f
[root@workstation /]# touch /var/tmp/f
touch: cannot touch '/var/tmp/f': Read-only file system

This commit has more changes than I like to put in one commit, but it's touching all
the same paths so it's hard to split.
exec_runtime_make() was using the wrong cleanup function, so the directory would be
left behind on error.
2020-07-14 19:47:15 +02:00
Luca Boccassi cda667722c core: refresh unit cache when building a transaction if UNIT_NOT_FOUND
When a command asks to load a unit directly and it is in state
UNIT_NOT_FOUND, and the cache is outdated, we refresh it and
attempto to load again.
Use the same logic when building up a transaction and a dependency in
UNIT_NOT_FOUND state is encountered.
Update the unit test to exercise this code path.
2020-07-07 10:09:24 +02:00
Luca Boccassi 7233e91af0 core: store timestamps of unit load attempts
When the system is under heavy load, it can happen that the unit cache
is refreshed for an unrelated reason (in the test I simulate this by
attempting to start a non-existing unit). The new unit is found and
accounted for in the cache, but it's ignored since we are loading
something else.
When we actually look for it, by attempting to start it, the cache is
up to date so no refresh happens, and starting fails although we have
it loaded in the cache.

When the unit state is set to UNIT_NOT_FOUND, mark the timestamp in
u->fragment_loadtime. Then when attempting to load again we can check
both if the cache itself needs a refresh, OR if it was refreshed AFTER
the last failed attempt that resulted in the state being
UNIT_NOT_FOUND.

Update the test so that this issue reproduces more often.
2020-06-30 16:50:00 +02:00
Zbigniew Jędrzejewski-Szmek f83803a649
Merge pull request #16238 from keszybz/set-handling-more
Fix handling of cases where a duplicate item is added to a set and related cleanups
2020-06-24 17:42:13 +02:00
Zbigniew Jędrzejewski-Szmek de7fef4b6e tree-wide: use set_ensure_put()
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.

Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
2020-06-22 16:32:37 +02:00
Franck Bui 43bba15ac8 pid1: rename manager_set_{show_status,watchdog}_overridden() into manager_override_(show_status,watchdog}
No functional change.
2020-06-11 12:00:32 +02:00
Franck Bui 3ceb347130 pid1: introduce an helper to handle the show-status marker
No functional change.
2020-06-11 12:00:16 +02:00
Franck Bui 44a419540e pid1: rework handling of m->show_status
The fact that m->show_status was serialized/deserialized made impossible any
further customisation of this setting via system.conf. IOW the value was
basically always locked unless it was changed via signals.

This patch reworks the handling of m->show_status but also makes sure that if a
new value was changed via the signal API then this value is kept and preserved
accross PID1 reexecuting or reloading.

Note: this effectively means that once the value is set via the signal
interface, it can be changed again only through the signal API.
2020-06-09 09:16:54 +02:00
Franck Bui 0d6d3cf055 pid1: rename manager_get_show_status() to manager_should_show_status()
The name 'manager_get_show_status()' suggests that the function simply reads
the property 'show_status' of the manager and hence returns a 'StatusType'
value.

However it was doing more than that since it contained the logic (based on
'show_status' but also on the state of the manager) to figure out if status
message could be emitted to the console.

Hence this patch renames the function to 'manager_should_show_status()'. The
previous name will be reused in a later patch to effectively return the value
of 'show_status' property.

No functional change.
2020-06-09 09:16:54 +02:00
Franck Bui b309078ab9 pid1: make more use of show_status_on()
No functional change.
2020-06-09 09:16:54 +02:00
Luca Boccassi d904afc730 core: reload cache if it's dirty when starting a UNIT_NOT_FOUND unit
The time-based cache allows starting a new unit without an expensive
daemon-reload, unless there was already a reference to it because of
a dependency or ordering from another unit.
If the cache is out of date, check again if we can load the
fragment.
2020-05-30 16:50:05 +02:00
Zbigniew Jędrzejewski-Szmek a4ac27c1af manager: free the jobs hashmap after we have no jobs
After a larger transaction, e.g. after bootup, we're left with an empty hashmap
with hundreds of buckets. Long-term, it'd be better to size hashmaps down when
they are less than 1/4 full, but even if we implement that, jobs hashmap is
likely to be empty almost always, so it seems useful to deallocate it once the
jobs count reaches 0.
2020-05-28 18:54:20 +02:00
Zbigniew Jędrzejewski-Szmek 3fb2326f3e shared/unit-file: make sure the old hashmaps and sets are freed upon replacement
Possibly fixes #15220. (There might be another leak. I'm still investigating.)

The leak would occur when the path cache was rebuilt. So in normal circumstances
it wouldn't be too bad, since usually the path cache is not rebuilt too often. But
the case in #15220, where new unit files are created in a loop and started, the leak
occurs once for each unit file:

$ for i in {1..300}; do cp ~/.config/systemd/user/test0001.service ~/.config/systemd/user/test$(printf %04d $i).service; systemctl --user start test$(printf %04d $i).service;done
2020-05-28 18:51:52 +02:00
Zbigniew Jędrzejewski-Szmek 24b4597064 core: minor simplification 2020-05-27 09:02:53 +02:00
Franck Bui b406c6d128 pid1: make manager_deserialize_{uid,gid}_refs() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 80f605c807 pid1: make manager_serialize_{uid,gid}_refs() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 06a4eb0737 pid1: make manager_vacuum_{uid,gid}_refs() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 1addc46c8c pid1: make manager_flip_auto_status() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 986935cf6a pid1: update manager settings on reload too
Most complexity of this patch is due to the fact that some manager settings
(basically the watchdog properties) can be set at runtime and in this case the
runtime values must be retained over daemon-reload or daemon-reexec.

For consistency sake, all watchdog properties behaves now the same way, that
is:

  - Values defined by config files can be overridden by writing the new value
    through their respective D-BUS properties. In this case, these values are
    preserved over reload/reexec until the special value '0' or USEC_INFINITY
    is written, which will then restore the last values loaded from the config
    files. If the restored value is '0' or 'USEC_INFINITY', the watchdogs will
    be disabled and the corresponding device will be closed.

  - Reading the properties from a user instance will return the USEC_INFINITY
    value as these properties are only meaningful for PID1.

  - Writing to one of the watchdog properties of a user instance's will be a
    NOP.

Fixes: #15453
2020-05-19 15:31:55 +02:00
Benjamin Robin 5151b4ccd2 core: Parse the tags list sooner, and use it for multiple function
- Parse the tags list using strv_split_newlines() which remove any
   unnecessary empty string at the end of the strv.
 - Use this parsed list for manager_process_barrier_fd() and every call
   to manager_invoke_notify_message().
 - This also allow to simplify the manager_process_barrier_fd() function.
2020-05-13 22:44:12 +02:00
Lennart Poettering fb29cdbef2 tree-wide: make sure our control buffers are properly aligned
We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.

Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.

Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
2020-05-07 14:39:44 +02:00
Benjamin Robin 08f468567d tree-wide: Workaround -Wnonnull GCC bug
See issue #6119
2020-05-07 09:43:28 +02:00
Kumar Kartikeya Dwivedi 4f07ddfa9b
Introduce sd_notify_barrier
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return gurantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.

systemd-notify now uses this function for proper notification delivery and be
useful for NotifyAccess=all units again in user mode, or in cases where it
doesn't have a control process as parent.

Fixes: #2739
2020-05-01 03:22:47 +05:30
Lennart Poettering 3691bcf3c5 tree-wide: use recvmsg_safe() at various places
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.

This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.

(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
2020-04-23 09:41:47 +02:00
Lennart Poettering df3d3bdfe8 core: minor error code handling fixes 2020-04-22 08:56:05 +02:00
Alin Popa c5f8a179a2 watchdog: reduce watchdog pings in timeout interval
The watchdog ping is performed for every iteration of manager event
loop. This results in a lot of ioctls on watchdog device driver
especially during boot or if services are aggressively using sd_notify.
Depending on the watchdog device driver this may have performance
impact on embedded systems.
The patch skips sending the watchdog to device driver if the ping is
requested before half of the watchdog timeout.
2020-04-16 16:32:05 +02:00
root f9d29f6d06 fix manager_state 2020-04-07 15:27:50 +02:00
Vito Caputo b46c3e4913 *: use _cleanup_close_ with fdopen() where trivial
Also convert these to use take_fdopen().
2020-03-31 06:48:03 -07:00
Zbigniew Jędrzejewski-Szmek 385093b702 Split out generator directory setup to a src/core/generator-setup.c
Those functions have only one non-test user, so we can move them to src/core/.
2020-03-27 20:12:44 +01:00
Zbigniew Jędrzejewski-Szmek 51327bcc74 sd-path: rename the two functions
I think the two names were both pretty bad. They did not give a proper hint
what the difference between the two functions is, and sd_path_home sounds like
it is somehow related to /home or home directories or whatever, when in fact
both functions return the same set of paths as either a colon-delimited string
or a strv. "_strv" suffix is used by various functions in sd-bus, so let's
reuse that.

Those functions are not public yet, so let's rename.
2020-03-27 20:12:44 +01:00