740 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 2aed63f427 tree-wide: fix spelling of "fallback"
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continuous: "we are falling back" not "we are fallbacking").
2020-08-20 17:45:32 +02:00
Lennart Poettering 39cf0351c5 tree-wide: make use of new relative time events in sd-event.h 2020-07-28 11:24:55 +02:00
Lennart Poettering 8047ac8fdc core: clean more env vars from env block pid1 receives
We generally clean all env vars we use ourselves to communicate with our
children. We forgot some more recent additions, however. Let's correct
that.
2020-07-23 18:30:15 +02:00
Zbigniew Jędrzejewski-Szmek 56a13a495c pid1: create ro private tmp dirs when /tmp or /var/tmp is read-only
Read-only /var/tmp is more likely, because it's backed by a real device. /tmp
is (by default) backed by tmpfs, but it doesn't have to be. In both cases the
same consideration applies.

If we boot with read-only /var/tmp, any unit with PrivateTmp=yes would fail
because we cannot create the subdir under /var/tmp to mount the private directory.
But many services actually don't require /var/tmp (either because they only use
it occasionally, or because they only use /tmp, or even because they don't use the
temporary directories at all, and PrivateTmp=yes is used to isolate them from
the rest of the system).

To handle both cases let's create a read-only directory under /run/systemd and
mount it as the private /tmp or /var/tmp. (Read-only to not fool the service into
dumping too much data in /run.)

$ sudo systemd-run -t -p PrivateTmp=yes bash
Running as unit: run-u14.service
Press ^] three times within 1s to disconnect TTY.
[root@workstation /]# ls -l /tmp/
total 0
[root@workstation /]# ls -l /var/tmp/
total 0
[root@workstation /]# touch /tmp/f
[root@workstation /]# touch /var/tmp/f
touch: cannot touch '/var/tmp/f': Read-only file system

This commit has more changes than I like to put in one commit, but it's touching all
the same paths so it's hard to split.
exec_runtime_make() was using the wrong cleanup function, so the directory would be
left behind on error.
2020-07-14 19:47:15 +02:00
Luca Boccassi cda667722c core: refresh unit cache when building a transaction if UNIT_NOT_FOUND
When a command asks to load a unit directly and it is in state
UNIT_NOT_FOUND, and the cache is outdated, we refresh it and
attempt to load again.
Use the same logic when building up a transaction and a dependency in
UNIT_NOT_FOUND state is encountered.
Update the unit test to exercise this code path.
2020-07-07 10:09:24 +02:00
Luca Boccassi 7233e91af0 core: store timestamps of unit load attempts
When the system is under heavy load, it can happen that the unit cache
is refreshed for an unrelated reason (in the test I simulate this by
attempting to start a non-existing unit). The new unit is found and
accounted for in the cache, but it's ignored since we are loading
something else.
When we actually look for it, by attempting to start it, the cache is
up to date so no refresh happens, and starting fails although we have
it loaded in the cache.

When the unit state is set to UNIT_NOT_FOUND, mark the timestamp in
u->fragment_loadtime. Then when attempting to load again we can check
both if the cache itself needs a refresh, OR if it was refreshed AFTER
the last failed attempt that resulted in the state being
UNIT_NOT_FOUND.

Update the test so that this issue reproduces more often.
2020-06-30 16:50:00 +02:00
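The retry rule described in commit 7233e91af0 above can be sketched roughly as follows. This is a minimal illustration, not the actual systemd code: the struct names, the cache fields, and unit_should_retry_load() are hypothetical; only u->fragment_loadtime and the UNIT_NOT_FOUND state come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the manager's unit path cache. */
typedef struct {
    uint64_t cache_refreshed_at; /* when the cache was last rebuilt (usec) */
    bool cache_outdated;         /* true if unit directories changed since then */
} CacheSketch;

/* Hypothetical, simplified stand-in for a unit. */
typedef struct {
    bool not_found;              /* unit is in the UNIT_NOT_FOUND state */
    uint64_t fragment_loadtime;  /* time of the last failed load attempt (usec) */
} UnitSketch;

/* Returns true if loading the unit should be attempted again: either the
 * cache itself needs a refresh, OR it was refreshed AFTER the last failed
 * attempt that left the unit in UNIT_NOT_FOUND. */
bool unit_should_retry_load(const CacheSketch *c, const UnitSketch *u) {
    if (!u->not_found)
        return false;

    return c->cache_outdated || c->cache_refreshed_at > u->fragment_loadtime;
}
```

The second condition is what covers the race from the commit message: a refresh triggered for an unrelated reason leaves the cache up to date, yet the unit still deserves another load attempt.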
Zbigniew Jędrzejewski-Szmek f83803a649
Merge pull request #16238 from keszybz/set-handling-more
Fix handling of cases where a duplicate item is added to a set and related cleanups
2020-06-24 17:42:13 +02:00
Zbigniew Jędrzejewski-Szmek de7fef4b6e tree-wide: use set_ensure_put()
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.

Note: I did not fix errors in return value handling. This will be done separately
to keep the patch comprehensible. No functional change is intended in this
patch.
2020-06-22 16:32:37 +02:00
Franck Bui 43bba15ac8 pid1: rename manager_set_{show_status,watchdog}_overridden() to manager_override_{show_status,watchdog}()
No functional change.
2020-06-11 12:00:32 +02:00
Franck Bui 3ceb347130 pid1: introduce a helper to handle the show-status marker
No functional change.
2020-06-11 12:00:16 +02:00
Franck Bui 44a419540e pid1: rework handling of m->show_status
The fact that m->show_status was serialized/deserialized made any further
customisation of this setting via system.conf impossible. IOW the value was
basically always locked unless it was changed via signals.

This patch reworks the handling of m->show_status but also makes sure that if
the value was changed via the signal API, it is preserved across PID1
reexecuting or reloading.

Note: this effectively means that once the value is set via the signal
interface, it can be changed again only through the signal API.
2020-06-09 09:16:54 +02:00
Franck Bui 0d6d3cf055 pid1: rename manager_get_show_status() to manager_should_show_status()
The name 'manager_get_show_status()' suggests that the function simply reads
the property 'show_status' of the manager and hence returns a 'StatusType'
value.

However it was doing more than that since it contained the logic (based on
'show_status' but also on the state of the manager) to figure out whether status
messages could be emitted to the console.

Hence this patch renames the function to 'manager_should_show_status()'. The
previous name will be reused in a later patch to actually return the value
of the 'show_status' property.

No functional change.
2020-06-09 09:16:54 +02:00
Franck Bui b309078ab9 pid1: make more use of show_status_on()
No functional change.
2020-06-09 09:16:54 +02:00
Luca Boccassi d904afc730 core: reload cache if it's dirty when starting a UNIT_NOT_FOUND unit
The time-based cache allows starting a new unit without an expensive
daemon-reload, unless there was already a reference to it because of
a dependency or ordering from another unit.
If the cache is out of date, check again if we can load the
fragment.
2020-05-30 16:50:05 +02:00
Zbigniew Jędrzejewski-Szmek a4ac27c1af manager: free the jobs hashmap after we have no jobs
After a larger transaction, e.g. after bootup, we're left with an empty hashmap
with hundreds of buckets. Long-term, it'd be better to size hashmaps down when
they are less than 1/4 full, but even if we implement that, jobs hashmap is
likely to be empty almost always, so it seems useful to deallocate it once the
jobs count reaches 0.
2020-05-28 18:54:20 +02:00
Zbigniew Jędrzejewski-Szmek 3fb2326f3e shared/unit-file: make sure the old hashmaps and sets are freed upon replacement
Possibly fixes #15220. (There might be another leak. I'm still investigating.)

The leak would occur when the path cache was rebuilt. So in normal circumstances
it wouldn't be too bad, since usually the path cache is not rebuilt too often. But
in the case in #15220, where new unit files are created in a loop and started, the leak
occurs once for each unit file:

$ for i in {1..300}; do cp ~/.config/systemd/user/test0001.service ~/.config/systemd/user/test$(printf %04d $i).service; systemctl --user start test$(printf %04d $i).service;done
2020-05-28 18:51:52 +02:00
Zbigniew Jędrzejewski-Szmek 24b4597064 core: minor simplification 2020-05-27 09:02:53 +02:00
Franck Bui b406c6d128 pid1: make manager_deserialize_{uid,gid}_refs() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 80f605c807 pid1: make manager_serialize_{uid,gid}_refs() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 06a4eb0737 pid1: make manager_vacuum_{uid,gid}_refs() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 1addc46c8c pid1: make manager_flip_auto_status() static
No functional change.
2020-05-19 15:48:54 +02:00
Franck Bui 986935cf6a pid1: update manager settings on reload too
Most of the complexity of this patch is due to the fact that some manager settings
(basically the watchdog properties) can be set at runtime and in this case the
runtime values must be retained over daemon-reload or daemon-reexec.

For consistency's sake, all watchdog properties now behave the same way, that
is:

  - Values defined by config files can be overridden by writing the new value
    through their respective D-Bus properties. In this case, these values are
    preserved over reload/reexec until the special value '0' or USEC_INFINITY
    is written, which will then restore the last values loaded from the config
    files. If the restored value is '0' or USEC_INFINITY, the watchdogs will
    be disabled and the corresponding device will be closed.

  - Reading the properties from a user instance will return the USEC_INFINITY
    value as these properties are only meaningful for PID1.

  - Writing to one of the watchdog properties of a user instance will be a
    NOP.

Fixes: #15453
2020-05-19 15:31:55 +02:00
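The override semantics listed in commit 986935cf6a above can be modelled roughly like this. It is a sketch under assumptions: WatchdogSetting, its fields, and the three functions are hypothetical names and do not reflect how the manager actually stores these values; USEC_INFINITY_SKETCH stands in for USEC_INFINITY.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_INFINITY_SKETCH UINT64_MAX /* stand-in for USEC_INFINITY */

/* Hypothetical model of one watchdog property. */
typedef struct {
    uint64_t configured;    /* value last loaded from the config files */
    uint64_t runtime_value; /* value written at runtime, e.g. via D-Bus */
    bool overridden;        /* true while a runtime value is in effect */
} WatchdogSetting;

/* Writing a value at runtime: 0 or infinity drops the override and thereby
 * restores the configured value; anything else becomes the override. */
void watchdog_setting_write(WatchdogSetting *s, uint64_t v) {
    if (v == 0 || v == USEC_INFINITY_SKETCH) {
        s->overridden = false;
        return;
    }
    s->runtime_value = v;
    s->overridden = true;
}

/* Reload/reexec: config values are re-read, a runtime override is kept. */
void watchdog_setting_reload(WatchdogSetting *s, uint64_t from_config) {
    s->configured = from_config;
}

/* The value that is actually in effect. */
uint64_t watchdog_setting_effective(const WatchdogSetting *s) {
    return s->overridden ? s->runtime_value : s->configured;
}
```

An effective value of 0 or USEC_INFINITY_SKETCH would correspond to the "watchdog disabled, device closed" case from the list.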
Benjamin Robin 5151b4ccd2 core: Parse the tags list sooner, and use it for multiple functions
 - Parse the tags list using strv_split_newlines(), which removes any
   unnecessary empty string at the end of the strv.
 - Use this parsed list for manager_process_barrier_fd() and every call
   to manager_invoke_notify_message().
 - This also allows simplifying the manager_process_barrier_fd() function.
2020-05-13 22:44:12 +02:00
Lennart Poettering fb29cdbef2 tree-wide: make sure our control buffers are properly aligned
We always need to make them unions with a "struct cmsghdr" in them, so
that things are properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.

Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.

Both the alignment and the initialization are mentioned in the
cmsg(3) man page.
2020-05-07 14:39:44 +02:00
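For illustration, the pattern from commit fb29cdbef2 above (a union containing a struct cmsghdr, zero-initialized before sending) might look like the following on Linux. FdControlBuffer and send_one_fd() are hypothetical names chosen for this sketch; they are not systemd helpers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer with room for one fd. The struct cmsghdr member forces the
 * union to be aligned the way CMSG_FIRSTHDR()/CMSG_DATA() expect. */
typedef union {
    struct cmsghdr cmsghdr;
    uint8_t buf[CMSG_SPACE(sizeof(int))];
} FdControlBuffer;

/* Sends one data byte plus the file descriptor fd over the AF_UNIX socket
 * sock. Returns 0 on success, -1 on failure (errno set by sendmsg()). */
int send_one_fd(int sock, int fd) {
    FdControlBuffer control;
    char dummy = 'x';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    struct msghdr mh;
    struct cmsghdr *cmsg;

    /* Zero the control buffer when sending, so padding is well defined. */
    memset(&control, 0, sizeof(control));
    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = &control;
    mh.msg_controllen = sizeof(control);

    cmsg = CMSG_FIRSTHDR(&mh);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &mh, MSG_NOSIGNAL) < 0 ? -1 : 0;
}
```

A plain uint8_t array of the same size would only be guaranteed byte alignment, which is the problem the commit addresses.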
Benjamin Robin 08f468567d tree-wide: Workaround -Wnonnull GCC bug
See issue #6119
2020-05-07 09:43:28 +02:00
Kumar Kartikeya Dwivedi 4f07ddfa9b
Introduce sd_notify_barrier
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return guarantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.

systemd-notify now uses this function for proper notification delivery, so that
it is useful again for NotifyAccess=all units in user mode, or in cases where it
doesn't have a control process as parent.

Fixes: #2739
2020-05-01 03:22:47 +05:30
Lennart Poettering 3691bcf3c5 tree-wide: use recvmsg_safe() at various places
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.

This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.

(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
2020-04-23 09:41:47 +02:00
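The idea in commit 3691bcf3c5 above can be sketched as a thin wrapper around recvmsg(). This is not the real recvmsg_safe(); recvmsg_checked(), close_received_fds(), and the choice of -ENOBUFS as the error code are assumptions made for this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Closes every fd carried in SCM_RIGHTS control messages of mh. */
void close_received_fds(struct msghdr *mh) {
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(mh); cmsg; cmsg = CMSG_NXTHDR(mh, cmsg)) {
        if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
            continue;

        size_t n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
        for (size_t i = 0; i < n; i++) {
            int fd;
            memcpy(&fd, CMSG_DATA(cmsg) + i * sizeof(int), sizeof(int));
            close(fd);
        }
    }
}

/* Like recvmsg(), but returns -errno on failure and treats a truncated
 * control buffer (MSG_CTRUNC) as an error instead of silently ignoring it. */
ssize_t recvmsg_checked(int sock, struct msghdr *mh, int flags) {
    ssize_t n = recvmsg(sock, mh, flags);
    if (n < 0)
        return -errno;

    if (mh->msg_flags & MSG_CTRUNC) {
        /* Control data was cut off: drop the fds we did get, so none leak. */
        close_received_fds(mh);
        return -ENOBUFS; /* error code chosen for this sketch only */
    }

    return n;
}
```

Callers can then log or propagate the error, which is the behaviour the commit message asks for.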
Lennart Poettering df3d3bdfe8 core: minor error code handling fixes 2020-04-22 08:56:05 +02:00
Alin Popa c5f8a179a2 watchdog: reduce watchdog pings in timeout interval
The watchdog ping is performed on every iteration of the manager event
loop. This results in a lot of ioctls on the watchdog device driver,
especially during boot or if services use sd_notify aggressively.
Depending on the watchdog device driver this may have a performance
impact on embedded systems.
The patch skips sending the ping to the device driver if it is
requested before half of the watchdog timeout has elapsed.
2020-04-16 16:32:05 +02:00
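The throttling rule from commit c5f8a179a2 above can be sketched as follows. WatchdogSketch and watchdog_should_ping() are hypothetical names, and the timestamps are assumed to come from a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified watchdog state. */
typedef struct {
    uint64_t timeout_usec;   /* configured watchdog timeout */
    uint64_t last_ping_usec; /* when the last ping was actually sent */
    bool pinged;             /* false until the first ping was sent */
} WatchdogSketch;

/* Returns true if a ping should be sent to the device now, and records it.
 * Requests arriving before half of the timeout has elapsed since the last
 * ping are skipped. Assumes now_usec never goes backwards. */
bool watchdog_should_ping(WatchdogSketch *w, uint64_t now_usec) {
    if (w->pinged && now_usec - w->last_ping_usec < w->timeout_usec / 2)
        return false;

    w->last_ping_usec = now_usec;
    w->pinged = true;
    return true;
}
```

With a 10 s timeout, for example, requests less than 5 s after the last ping would be dropped without touching the device.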
root f9d29f6d06 fix manager_state 2020-04-07 15:27:50 +02:00
Vito Caputo b46c3e4913 *: use _cleanup_close_ with fdopen() where trivial
Also convert these to use take_fdopen().
2020-03-31 06:48:03 -07:00
Zbigniew Jędrzejewski-Szmek 385093b702 Split out generator directory setup to a src/core/generator-setup.c
Those functions have only one non-test user, so we can move them to src/core/.
2020-03-27 20:12:44 +01:00
Zbigniew Jędrzejewski-Szmek 51327bcc74 sd-path: rename the two functions
I think the two names were both pretty bad. They did not give a proper hint
what the difference between the two functions is, and sd_path_home sounds like
it is somehow related to /home or home directories or whatever, when in fact
both functions return the same set of paths as either a colon-delimited string
or a strv. "_strv" suffix is used by various functions in sd-bus, so let's
reuse that.

Those functions are not public yet, so let's rename.
2020-03-27 20:12:44 +01:00
Benjamin Berg cccf570355 core: Move environment generator path lookup into path-lookup.c 2020-03-04 11:24:33 +01:00
Zbigniew Jędrzejewski-Szmek 52c222db11
Merge pull request #14992 from keszybz/syslog-address-length-fix
Syslog address length fix
2020-03-02 21:31:24 +01:00
Zbigniew Jędrzejewski-Szmek f36a9d5909 tree-wide: use the return value from sockaddr_un_set_path()
It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().

socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
2020-03-02 15:55:44 +01:00
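To illustrate the calling convention described in commit f36a9d5909 above, here is a simplified helper in the same spirit. un_set_path() is a hypothetical stand-in, not the real sockaddr_un_set_path(): it handles only file system paths, and its error codes are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fully initializes *sa for the file system socket at path. Returns the
 * address length to pass to bind()/connect() (path plus trailing NUL), or a
 * negative errno-style value on failure. */
int un_set_path(struct sockaddr_un *sa, const char *path) {
    size_t l = strlen(path);

    if (l == 0)
        return -EINVAL;
    if (l + 1 > sizeof(sa->sun_path))
        return -ENAMETOOLONG;

    /* The helper zeroes the whole structure itself, so callers need no
     * pre-initialization. */
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    memcpy(sa->sun_path, path, l + 1);

    return (int) (offsetof(struct sockaddr_un, sun_path) + l + 1);
}
```

A caller would check for a negative result first and only then store the value in a socklen_t, which avoids recomputing the length separately.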
Zbigniew Jędrzejewski-Szmek 0d066dd1a4 pid1: add new mode systemd.show-status=error and use it when 'quiet' is passed
systemd.show-status=error is useful for the case where people care about errors
only.

If people want to have a quiet boot, they most likely don't want to see all
status output even if there is a delay in boot, so make "quiet" imply
systemd.show-status=error instead of systemd.show-status=auto.

Fixes #14976.
2020-03-01 11:48:23 +01:00
Zbigniew Jędrzejewski-Szmek 5bcf34ebf3 pid1: when showing error status, do not switch to status=temporary
We would flip to status=temporary mode on the first error, and then switch back
to status=auto after the initial transaction was done. This isn't very useful,
because usually all the messages about successfully started units are not
related to the original failure. In fact, all those messages most likely cause
the information about the primary error to scroll off screen. And if the user
requested quiet boot, there's no reason to think that they care about those
success messages.

Also, when logging about dependency cycles, treat this similarly to a unit
error and show the message even if the status is "soft disabled" (before we
wouldn't show it in that case).
2020-03-01 11:42:42 +01:00
Zbigniew Jędrzejewski-Szmek 1b4154a891 pid1: make cylon timeout significantly bigger when not showing any messages
When we are booting with show-status=on, normally new status updates happen a
few times per second. Thus, it is reasonable to start showing the cylon eye
after 5 s, because that means a significant delay has happened. When we are
running with show-status=off or show-status=auto (and no error had occurred),
the user is expecting maybe 15 to 90 seconds with no output (because that's
usually how long the whole boot takes). So we shouldn't bother the user with
information about a few seconds of delay. Let's make the timeout 25s if we are
not showing any messages.

Conversely, when we are outputting status messages, we can show the cylon eye
with a shorter delay, now that we removed the connection to enablement status.
Let's make this 2s, so users get feedback about delays more quickly.
2020-03-01 11:42:35 +01:00
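The two delays chosen in commit 1b4154a891 above amount to a simple selection. The function and macro names below are hypothetical; only the 2 s and 25 s values come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC_SKETCH UINT64_C(1000000)

/* Delay before the cylon eye animation starts: short while status messages
 * are being printed, long while the console is otherwise quiet. */
uint64_t cylon_delay_usec(bool showing_status_messages) {
    return (showing_status_messages ? 2 : 25) * USEC_PER_SEC_SKETCH;
}
```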
Zbigniew Jędrzejewski-Szmek ef15d3e1ab pid1: touch the /run/systemd/show-status just once
We know if we created the file before, no need to repeat the operation. The
state in /run should always match our internal state. Since we call
manager_set_show_status() quite often internally, this saves quite a few
pointless syscalls.
2020-03-01 11:42:26 +01:00
Zbigniew Jędrzejewski-Szmek 7365a29670 pid1: when printing status message status, give reason 2020-03-01 11:42:19 +01:00
Zbigniew Jędrzejewski-Szmek 5ca02bfc39 core: fix message about show status state
We would say "Enabling" also for SHOW_STATUS_AUTO, which is actually
"soft off". So just print the exact state to make things easier to understand.
Also add a helper function to avoid repeating the enum value list.

For #14814.
2020-03-01 11:42:12 +01:00
Lennart Poettering 96462ae998 core: show the UID we cannot parse 2020-01-21 11:51:26 +01:00
Lennart Poettering 19d22d433d core: add user/group resolution varlink interface to PID 1 2020-01-15 15:28:55 +01:00
Lennart Poettering fc67a943d9 core: drop initial ListNames() bus call from PID 1
Previously, right after first connecting to the bus we'd
issue a ListNames() bus call to the driver to figure out which bus names
are currently active. This information was then used to initialize the
initial state for services that use BusName=.

This change removes the whole code for this and replaces it with
something vastly simpler.

First of all, the ListNames() call was issued synchronously, which meant
that if dbus was for some reason synchronously calling into PID1
we'd deadlock. As it turns out there's now a good chance it does:
the nss-systemd userdb hookup means that any user dbus-daemon resolves
might result in a varlink call into PID 1, and dbus resolves quite a lot
of users while parsing its policy. My original goal was to fix this
deadlock.

But as it turns out we don't need the ListNames() call at all anymore,
since #12957 has been merged. That PR was supposed to fix a race where
asynchronous installation of bus matches would cause us to miss the
initial owner of a bus name when a service is first started. It fixed it
(correctly) by enquiring with GetOwnerName() who currently owns the
name, right after installing the match. But this means whenever we start
watching a bus name we issue a GetOwnerName() for it anyway, and that
means that also when first connecting
to the bus we don't need to issue ListNames() anymore since that just
tells us the same info: which names are currently owned.

Hence, let's drop ListNames() and instead make better use of the
GetOwnerName() result: if it failed the name is not owned.

Also, while we are at it, let's simplify the unit's owner_name_changed()
callback: let's drop the "old_owner" argument. We never used that
besides logging, and it's hard to synthesize from just the return of a
GetOwnerName(), hence don't bother.
2020-01-06 15:21:47 +01:00
Anita Zhang 2f8c48b605 core,journal: export user units' InvocationID and use as _SYSTEMD_INVOCATION_ID
Write a user unit's invocation ID to /run/user/<uid>/systemd/units/ similar
to how a system unit's invocation ID is written to /run/systemd/units/.

This lets the journal read and add a user unit's invocation ID to the
_SYSTEMD_INVOCATION_ID field of logs instead of the user manager's
invocation ID.

Fixes #12474
2019-12-19 17:42:17 -08:00
Zbigniew Jędrzejewski-Szmek 3a0f06c41a core: make TasksMax a partially dynamic property
TasksMax= and DefaultTasksMax= can be specified as percentages. We don't
actually document what the percentage is relative to, but the implementation
uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max,
and /sys/fs/cgroup/pids.max (when present). When the value is a percentage,
we immediately convert it to an absolute value. If the limit later changes
(which can happen e.g. when systemd-sysctl runs), the absolute value becomes
outdated.

So let's store either the percentage or absolute value, whatever was specified,
and only convert to an absolute value when the value is used. For example, when
starting a unit, the absolute value will be calculated when the cgroup for
the unit is created.

Fixes #13419.
2019-11-14 18:41:54 +01:00
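The "store what was specified, resolve when used" approach from commit 3a0f06c41a above might be modelled like this. TasksMaxSketch and tasks_max_resolve_sketch() are hypothetical names, and the value/scale representation is an assumption made for this sketch rather than a description of the actual data structure.

```c
#include <stdint.h>

/* Hypothetical representation: scale == 0 means value is an absolute task
 * count; otherwise the limit is the fraction value/scale of the system limit
 * (e.g. value = 15, scale = 100 for "15%"). */
typedef struct {
    uint64_t value;
    uint64_t scale;
} TasksMaxSketch;

/* Converts the stored setting to an absolute number of tasks, relative to
 * the system limit that is in effect at the time of the call. */
uint64_t tasks_max_resolve_sketch(const TasksMaxSketch *tm, uint64_t system_limit) {
    if (tm->scale == 0)
        return tm->value;

    /* Equals floor(system_limit * value / scale), computed in two parts so
     * the product cannot overflow for large limits. */
    return system_limit / tm->scale * tm->value +
           system_limit % tm->scale * tm->value / tm->scale;
}
```

Because the percentage is kept as-is, a later change of the system limit (for example after systemd-sysctl has run) is picked up the next time the value is resolved.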
Zbigniew Jędrzejewski-Szmek 754499fab2
Merge pull request #13904 from keur/job_mode_triggering
Job mode triggering
2019-11-07 08:36:26 +01:00
Kevin Kuehler 1f0f9f21c1 core: Add triggering job mode
When used with systemctl stop, follows TRIGGERED_BY dependencies and
adds them to the same transaction.

Fixes: #3043
2019-11-05 11:17:38 -08:00
Yu Watanabe 021cdf8330 tree-wide: drop signal.h when signal-util.h is included 2019-11-04 00:30:32 +09:00