Commit Graph

273 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek db868d45f9 core: make unit_set_invocation_id static
No functional change.
2020-05-28 18:47:01 +02:00
Lennart Poettering 4c42543429 core: also log about left-over processes during unit stop
Only log at LOG_INFO level, i.e. make this informational. During start
let's leave it at LOG_WARNING though.

Of course, it's ugly leaving processes around like that either in start
or in stop, but at start its more dangerous than on stop, so be tougher
there.
2020-05-26 23:52:13 +02:00
Lennart Poettering e1e214c56b
Merge pull request #15265 from fbuihuu/mount-fixes
Mount fixes
2020-05-15 11:13:45 +02:00
Benjamin Robin fcee2755ec core: Update prototype of notify_message, tags list is read only
Indicates that the tags list cannot be modified by notify_message function.
Since the tags list is created only once for multiple call to
notify_message functions.
2020-05-10 18:58:03 +02:00
Zbigniew Jędrzejewski-Szmek f6e9aa9e45 pid1: convert to the new scheme
In all the other cases, I think the code was clearer with the static table.
Here, not so much. And because of the existing dump code, the vtables cannot
be made static and need to remain exported. I still think it's worth to do the
change to have the cmdline introspection, but I'm disappointed with how this
came out.
2020-05-05 22:40:37 +02:00
Michal Sekletár d9e45bc3ab core: introduce support for cgroup freezer
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.

Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
2020-04-30 19:02:51 +02:00
Luca Boccassi c5bc2c01ee core: add log_get_max_level check optimization in log_unit_full
Just as log_full already does, check if the log level would result in
logging immediately in the macro in order to avoid doing
unnecessary work that adds up in hot spots.
2020-04-21 18:05:24 +01:00
Franck Bui b862c25716 device: drop refuse_after
Scheduling devices after a given unit can be useful to start device *jobs* at a
specific time in the transaction, see commit 4195077ab4.

This (hidden) change was introduced by commit eef85c4a3f.
2020-04-01 10:35:14 +02:00
Zbigniew Jędrzejewski-Szmek eda0cbf071
Use Finished instead of Started for Type=oneshot services (#14851)
UnitStatusMessageFormats.finished_job, if present,
will be called with the same arguments as
job_get_done_status_message_format() to provide a format string
appropriate for the context

This commit replaces "Started" with "Finished" for started oneshot
units, as mentioned in the referenced issue

Closes #2458.
2020-03-05 17:24:19 +01:00
Zbigniew Jędrzejewski-Szmek 5bcf34ebf3 pid1: when showing error status, do not switch to status=temporary
We would flip to status=temporary mode on the first error, and then switch back
to status=auto after the initial transaction was done. This isn't very useful,
because usually all the messages about successfully started units and not
related to the original failure. In fact, all those messages most likely cause
the information about the prime error to scroll off screen. And if the user
requested quiet boot, there's no reason to think that they care about those
success messages.

Also, when logging about dependency cycles, treat this similarly to a unit
error and show the message even if the status is "soft disabled" (before we
wouldn't show it in that case).
2020-03-01 11:42:42 +01:00
Christian Göttsche f156e60c66 core: unit_label_path(): take const unit 2020-02-04 18:36:19 +01:00
Lennart Poettering 44b0d1fd59 core: add implicit ordering dep on blockdev@.target from all mount units
This way we shuld be able to order mounts properly against their backing
services in case complex storage is used (i.e. LUKS), even if the device
path used for mounting the devices is different from the expected device
node of the backing service.

Specifically, if we have a LUKS device /dev/mapper/foo that is mounted
by this name all is trivial as the relationship can be established a
priori easily. But if it is mounted via a /dev/disk/by-uuid/ symlink or
similar we only can relate the device node generated to the one mounted
at the moment the device is actually established. That's because the
UUID of the fs is stored inside the encrypted volume and thus not
knowable until the volume is set up. This patch tries to improve on this
situation: a implicit After=blockdev@.target dependency is generated for
all mounts, based on the data from /proc/self/mountinfo, which should be
the actual device node, with all symlinks resolved. This means that as
soon as the mount is established the ordering via blockdev@.target will
work, and that means during shutdown it is honoured, which is what we
are looking for.

Note that specifying /etc/fstab entries via UUID= for LUKS devices still
sucks and shouldn't be done, because it means we cannot know which LUKS
device to activate to make an fs appear, and that means unless the
volume is set up at boot anyway we can't really handle things
automatically when putting together transactions that need the mount.
2020-01-21 20:23:44 +01:00
Lennart Poettering b90cf10245 core: make a number of functions not used externally static 2020-01-21 11:51:45 +01:00
Lennart Poettering eea45a3399
Merge pull request #14424 from poettering/watch-bus-name-rework
pid1: simplify drastically how we watch bus names for service's BusName= setting
2020-01-15 11:46:11 +01:00
Lennart Poettering c80a9a33d0 core: clearly refuse OnFailure= deps on units that can't fail
Similar, refuse triggering deps on units that cannot trigger.

And rework how we ignore After= dependencies on device units, to work
the same way.

See: #14142
2020-01-09 11:03:53 +01:00
Lennart Poettering fc67a943d9 core: drop initial ListNames() bus call from PID 1
Previously, when first connecting to the bus after connecting to it we'd
issue a ListNames() bus call to the driver to figure out which bus names
are currently active. This information was then used to initialize the
initial state for services that use BusName=.

This change removes the whole code for this and replaces it with
something vastly simpler.

First of all, the ListNames() call was issues synchronosuly, which meant
if dbus was for some reason synchronously calling into PID1 for some
reason we'd deadlock. As it turns out there's now a good chance it does:
the nss-systemd userdb hookup means that any user dbus-daemon resolves
might result in a varlink call into PID 1, and dbus resolves quite a lot
of users while parsing its policy. My original goal was to fix this
deadlock.

But as it turns out we don't need the ListNames() call at all anymore,
since #12957 has been merged. That PR was supposed to fix a race where
asynchronous installation of bus matches would cause us missing the
initial owner of a bus name when a service is first started. It fixed it
(correctly) by enquiring with GetOwnerName() who currently owns the
name, right after installing the match. But this means whenever we start watching a bus name we anyway
issue a GetOwnerName() for it, and that means also when first connecting
to the bus we don't need to issue ListNames() anymore since that just
tells us the same info: which names are currently owned.

hence, let's drop ListNames() and instead make better use of the
GetOwnerName() result: if it failed the name is not owned.

Also, while we are at it, let's simplify the unit's owner_name_changed()
callback(): let's drop the "old_owner" argument. We never used that
besides logging, and it's hard to synthesize from just the return of a
GetOwnerName(), hence don't bother.
2020-01-06 15:21:47 +01:00
Franck Bui d336ba9fa6 core: drop 'wants' parameter from unit_add_node_dependency()
Since Wants dependency is no more automagically added to swap and mount units,
this parameter is no more used hence this patch drops it.
2019-10-28 18:51:23 +01:00
Zbigniew Jędrzejewski-Szmek c362077087 core: turn unit_load_fragment_and_dropin_optional() into a flag
unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional()
are really the same, with one minor difference in behaviour. Let's drop
the second function.

"_optional" in the name suggests that it's the "dropin" part that is optional.
(Which it is, but in this case, we mean the fragment to be optional.)
I think the new version with a flag is easier to understand.
2019-10-11 10:45:33 +02:00
Zbigniew Jędrzejewski-Szmek a232ebcc2c core: add support for RestartKillSignal= to override signal used for restart jobs
v2:
- if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary
  to preserve backwards compatibility (and keep KillSignal= generally useful).
2019-10-02 14:01:25 +02:00
Zbigniew Jędrzejewski-Szmek 28a2dfe801 core: add helper function to check job status
Since job.h includes unit.h, and unit.h includes job.h, imports need to
be adjusted to make sure unit.h is included first if the helper is used.
2019-10-01 15:05:27 +02:00
Zbigniew Jędrzejewski-Szmek 5ac1530eca tree-wide: say "ratelimit" not "rate_limit"
"ratelimit" is a real word, so we don't need to use the other form anywhere.
We had both forms in various places, let's standarize on the shorter and more
correct one.
2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek 7bf081a1e5 pid1: rename start_limit to start_ratelimit
This way it is clearer what the type is. We also have auto_stop_ratelimit adjacent,
and it feels ugly to have a different suffix for those two.
2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek de5ae832f2
Merge pull request #13439 from yuwata/core-support-systemctl-clean-more
core: support systemctl clean more
2019-09-13 16:15:02 +02:00
Yu Watanabe 810ef3180e core: introduce unit_fork_and_watch_rm_rf() 2019-08-28 23:09:54 +09:00
Yu Watanabe 52a12341f9 core: make RuntimeDirectoryPreserve= works with non-service units 2019-08-23 00:08:16 +09:00
Yu Watanabe 95939aed21 core: introduce unit_destroy_runtime_directory()
Currently `unit_will_restart()` can return true only when the unit is
service. Hence, should not change anything.
2019-08-22 23:50:52 +09:00
Zbigniew Jędrzejewski-Szmek 5cc2cd1cd8 pid1: always log successfull process termination quietly
Fixes #13372.
2019-08-22 09:09:45 +02:00
Mattias Jernberg a5a8776ae5 core: Avoid race when starting dbus services
In high load scenarios it is possible for services to be started
before the NameOwnerChanged signal is properly installed.

Emulate a callback by also queuing a GetNameOwner when the match is
installed.

Fixes: #12956
2019-08-14 16:12:31 +02:00
Zbigniew Jędrzejewski-Szmek 5cfa33e0bc Create src/shared/unit-file.[ch] for unit-file related ops
So far we put such functinos in install.[ch], but that is tied too closely
to enable/disable. Let's start moving things to a place with a better name.
2019-07-19 16:51:14 +02:00
Zbigniew Jędrzejewski-Szmek 96cf3ec966 pid1: get rid of unit_supported() helper
Another case where "open code" is easier to read than the helper.
2019-07-19 16:51:14 +02:00
Anita Zhang 31cd5f63ce core: ExecCondition= for services
Closes #10596
2019-07-17 11:35:02 +02:00
Lennart Poettering 380dc8b0a2 core: add generic "clean" operation to units
This adds basic infrastructure to implement a "clean" operation for unit
types. This "clean" operation is supposed to remove on-disk resources of
units, and is supposed to be used in a later commit to clean our
RuntimeDirectory=, StateDirectory= and so on of service units.

Later commits will open this up to the bus, and hook up service units
with this.

This also adds a new generic ActiveState called UNIT_MAINTENANCE. It's
supposed to cover all kinds of "maintainance" state of units.
Specifically, this is supposed to cover the "cleaning" operations later
added for service units which might take a bit of time. This high-level,
generic, abstract state is called UNIT_MAINTENANCE instead of the
more specific "UNIT_CLEANING", since I think this should be kept open
for different operations possibly later on that could be nicely subsumed
under this (for example, maybe a recursive chown()ing operation could be
covered by this, and similar).
2019-07-11 12:18:51 +02:00
Lennart Poettering 261e7d9270
Merge pull request #12755 from keszybz/short-identifiers
Allow using unit names in status messages
2019-07-11 00:00:51 +02:00
Zbigniew Jędrzejewski-Szmek 2a8f53c67b Use unit->id instead of description in messages
v2:
- rename unit_identifier to unit_status_string
2019-07-10 13:35:26 +02:00
Zbigniew Jędrzejewski-Szmek 62c6bbbc09 tree-wide: use PROJECT_FILE instead of __FILE__
This replaces the internal uses of __FILE__ with the new macro.
2019-07-04 10:36:00 +02:00
Kai Lüke fab347489f bpf-firewall: custom BPF programs through IP(Ingress|Egress)FilterPath=
Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be
specified multiple times. An empty assignment resets all previous filters.

Closes https://github.com/systemd/systemd/issues/10227
2019-06-25 09:56:16 +02:00
Ben Boeckel 5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Lennart Poettering fbe14fc9a7 croup: expose IO accounting data per unit
This was the last kind of accounting still not exposed on for each unit.
Let's fix that.

Note that this is a relatively simplistic approach: we don't expose
per-device stats, but sum them all up, much like cgtop does. This kind
of metric is probably the most interesting for most usecases, and covers
the "systemctl status" output best. If we want per-device stats one day
we can of course always add that eventually.
2019-04-12 14:25:44 +02:00
Lennart Poettering afcfaa695c core: implement OOMPolicy= and watch cgroups for OOM killings
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).

On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".

All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
2019-04-09 11:17:58 +02:00
Lennart Poettering 0bb814c2c2 core: rename cgroup_inotify_wd → cgroup_control_inotify_wd
Let's rename the .cgroup_inotify_wd field of the Unit object to
.cgroup_control_inotify_wd. Let's similarly rename the hashmap
.cgroup_inotify_wd_unit of the Manager object to
.cgroup_control_inotify_wd_unit.

Why? As preparation for a later commit that allows us to watch the
"memory.events" cgroup attribute file in addition to the "cgroup.events"
file we already watch with the fields above. In that later commit we'll
add new fields "cgroup_memory_inotify_wd" to Unit and
"cgroup_memory_inotify_wd_unit" to Manager, that are used to watch these
other events file.

No change in behaviour. Just some renaming.
2019-04-09 11:17:57 +02:00
Franck Bui f75f613d25 core: reduce the number of stalled PIDs from the watched processes list when possible
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.

However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.

If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
2019-03-20 10:51:49 +01:00
Lennart Poettering a4191c9fb5 core: unify code for checking whether unit to trigger is loaded 2019-03-18 16:06:36 +01:00
Lennart Poettering 97a3f4ee05 core: rename unit_{start_limit|condition|assert}_test() to unit_test_xyz()
Just some renaming, no change in behaviour.

Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
2019-03-18 16:06:36 +01:00
Lennart Poettering 5bcffb4b54
Merge pull request #11457 from grooverdan/sendsigkill_no
service: killmode=cgroup|mixed, SendSIGKILL=no services are not multiprocess
2019-02-18 13:41:52 +01:00
Filipe Brandenburger 527ede0c63 core: downgrade CPUQuotaPeriodSec= clamping logs to debug
After the first warning log, further messages are downgraded to LOG_DEBUG.
2019-02-14 11:04:42 -08:00
Daniel Black c53d2d54bd service: make killmode=cgroup|mixed, SendSIGKILL=no services singletons
KillMode=mixed and control group are used to indicate that all
process should be killed off. SendSIGKILL is used for services
that require a clean shutdown. These are typically database
service where a SigKilled process would result in a lengthy
recovery and who's shutdown or startup time is quite variable
(so Timeout settings aren't of use).

Here we take these two factors and refuse to start a service if
there are existing processes within a control group. Databases,
while generally having some protection against multiple instances
running, lets not stress the rigor of these. Also ExecStartPre
parts of the service aren't as rigoriously written to protect
against against multiple use.

closes #8630
2019-01-29 15:35:59 +11:00
Chris Down 4e1dfa45e9 cgroup: s/cgroups? ?v?([0-9])/cgroup v\1/gI
Nitpicky, but we've used a lot of random spacings and names in the past,
but we're trying to be completely consistent on "cgroup vN" now.

Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups?  ?v?([0-9])/cgroup v\1/gI'`.

I manually ignored places where it's not appropriate to replace (eg.
"cgroup2" fstype and in src/shared/linux).
2019-01-03 11:32:40 +09:00
Zbigniew Jędrzejewski-Szmek 303ee60151 Mark *data and *userdata params to specifier_printf() as const
It would be very wrong if any of the specfier printf calls modified
any of the objects or data being printed. Let's mark all arguments as const
(primarily to make it easier for the reader to see where modifications cannot
occur).
2018-12-12 16:45:33 +01:00
Lennart Poettering a95c0505ad core: extend comments regarding coldplug() vs. catchup() 2018-12-12 11:20:53 +01:00
Lennart Poettering 7af67e9a8b core: allow to set exit status when using SuccessAction=/FailureAction=exit in units
This adds SuccessActionExitStatus= and FailureActionExitStatus= that may
be used to configure the exit status to propagate in when
SuccessAction=exit or FailureAction=exit is used.

When not specified let's also propagate the exit status of the main
process we fork off for the unit.
2018-11-27 09:44:40 +01:00