Commit Graph

587 Commits

Author SHA1 Message Date
Ben Boeckel 5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Lennart Poettering bc40a20ebe core: include IO data in per-unit resource log msg 2019-04-12 14:25:44 +02:00
Lennart Poettering fbe14fc9a7 croup: expose IO accounting data per unit
This was the last kind of accounting still not exposed on for each unit.
Let's fix that.

Note that this is a relatively simplistic approach: we don't expose
per-device stats, but sum them all up, much like cgtop does. This kind
of metric is probably the most interesting for most usecases, and covers
the "systemctl status" output best. If we want per-device stats one day
we can of course always add that eventually.
2019-04-12 14:25:44 +02:00
Lennart Poettering 83f18c91d0 core: use string_table_lookup() at more places 2019-04-12 14:25:44 +02:00
Lennart Poettering 9b2559a13e core: add new call unit_reset_accounting()
It's a simple wrapper for resetting both IP and CPU accounting in one
go.

This will become particularly useful when we also needs this to reset IO
accounting (to be added in a later commit).
2019-04-12 14:25:44 +02:00
Lennart Poettering cc6625212f core: no need to initialize ip_accounting twice 2019-04-12 14:25:44 +02:00
Lennart Poettering afcfaa695c core: implement OOMPolicy= and watch cgroups for OOM killings
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).

On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".

All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
2019-04-09 11:17:58 +02:00
Lennart Poettering 0bb814c2c2 core: rename cgroup_inotify_wd → cgroup_control_inotify_wd
Let's rename the .cgroup_inotify_wd field of the Unit object to
.cgroup_control_inotify_wd. Let's similarly rename the hashmap
.cgroup_inotify_wd_unit of the Manager object to
.cgroup_control_inotify_wd_unit.

Why? As preparation for a later commit that allows us to watch the
"memory.events" cgroup attribute file in addition to the "cgroup.events"
file we already watch with the fields above. In that later commit we'll
add new fields "cgroup_memory_inotify_wd" to Unit and
"cgroup_memory_inotify_wd_unit" to Manager, that are used to watch these
other events file.

No change in behaviour. Just some renaming.
2019-04-09 11:17:57 +02:00
Lennart Poettering bf65b7e0c9 core: imply NNP and SUID/SGID restriction for DynamicUser=yes service
Let's be safe, rather than sorry. This way DynamicUser=yes services can
neither take benefit of, nor create SUID/SGID binaries.

Given that DynamicUser= is a recent addition only we should be able to
get away with turning this on, even though this is strictly speaking a
binary compatibility breakage.
2019-04-02 16:56:48 +02:00
Lennart Poettering 50cbaba4fe core: add new API for enqueing a job with returning the transaction data 2019-03-27 12:37:37 +01:00
Lennart Poettering b82f71c7ff tree-wide: constify a few static string tables 2019-03-25 14:04:34 +01:00
Yu Watanabe f9f88198ce core/unit: use condition_test_list() 2019-03-21 23:37:39 +09:00
Franck Bui f75f613d25 core: reduce the number of stalled PIDs from the watched processes list when possible
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.

However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.

If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
2019-03-20 10:51:49 +01:00
Lennart Poettering 9adb695987 core: split error list in comment for unit_start() in two 2019-03-18 16:06:36 +01:00
Lennart Poettering 36c4dc089e core: change emergency_action() to return void
The function so far always returned -ECANCELLED, which is ignored in all
cases the function is invoked, except one: in unit_test_start_limit()
where -ECANCELLED is returned when the start limit is hit, which is part
of unit_start()'s protocol of return values.

Since the emergency_action() logic should be relatively generic and is
used in many places, let's drop the return value from it, since it's
constant anyway, and in alll cases useless. Instead, let's return it in
unit_test_start_limit(), where it's part of the protocol.

No change in behaviour.
2019-03-18 16:06:36 +01:00
Lennart Poettering 2de9b9793b core: check start limit on condition checks too
Let's add a safety precaution: if the start condition checks for a unit
are tested too often and fail each time, let's rate limit this too.

This should add extra safety in case people define .path, .timer or
.automount units that trigger a service that as a conditoin that always
fails.
2019-03-18 16:06:36 +01:00
Lennart Poettering 5766aca8d2 core: modernize unit_start() a bit
No change in behaviour, just a re-line-breaking of the various comments
to our current coding style, and some use of SYNTHETIC_ERRNO().
2019-03-18 16:06:36 +01:00
Lennart Poettering a4191c9fb5 core: unify code for checking whether unit to trigger is loaded 2019-03-18 16:06:36 +01:00
Lennart Poettering 97a3f4ee05 core: rename unit_{start_limit|condition|assert}_test() to unit_test_xyz()
Just some renaming, no change in behaviour.

Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
2019-03-18 16:06:36 +01:00
Lennart Poettering 9e30cf74ce core: add comment explaining ECOMM return value of unit_start()
we explain all other return values, explain these ones too.
2019-03-18 16:06:36 +01:00
Stephane Chazelas 106bf8e445 remove "." path components from required mount paths
unit_require_mounts_for may be passed path arguments that contain "."
components like for user's home directories where "." is sometimes used
to specify some form of anchor point.

This change stops considering such path as an error and removes the "."
components instead.

Closes: #11910
2019-03-07 10:12:03 +01:00
Lennart Poettering 5bcffb4b54
Merge pull request #11457 from grooverdan/sendsigkill_no
service: killmode=cgroup|mixed, SendSIGKILL=no services are not multiprocess
2019-02-18 13:41:52 +01:00
Daniel Black c53d2d54bd service: make killmode=cgroup|mixed, SendSIGKILL=no services singletons
KillMode=mixed and control group are used to indicate that all
process should be killed off. SendSIGKILL is used for services
that require a clean shutdown. These are typically database
service where a SigKilled process would result in a lengthy
recovery and who's shutdown or startup time is quite variable
(so Timeout settings aren't of use).

Here we take these two factors and refuse to start a service if
there are existing processes within a control group. Databases,
while generally having some protection against multiple instances
running, lets not stress the rigor of these. Also ExecStartPre
parts of the service aren't as rigoriously written to protect
against against multiple use.

closes #8630
2019-01-29 15:35:59 +11:00
Jonathon Kowalski 6255af75d7 Return -EAGAIN instead of -EALREADY from unit_reload
Fixes: #11499

Let's return -EAGAIN so that on state change, unit_process_job tries to
add our job to run_queue again so that all the reloads that coalesced
into the installed reload (which itself merged into a running one)
inititally atleast runs *once*. This should ensure service picks up all
config changes reliably.

See the issue being fixed for a detailed explanation.
2019-01-20 22:12:24 +00:00
Lennart Poettering 2d41e9b7a0
Merge pull request #11143 from keszybz/enable-symlink
Runtime mask symlink confusion fix
2018-12-16 12:37:07 +01:00
Zbigniew Jędrzejewski-Szmek 58d9d89b4b pid1: fix free of uninitialized pointer in unit_fail_if_noncanonical()
https://bugzilla.redhat.com/show_bug.cgi?id=1653068
2018-12-14 11:21:16 +01:00
Zbigniew Jędrzejewski-Szmek 303ee60151 Mark *data and *userdata params to specifier_printf() as const
It would be very wrong if any of the specfier printf calls modified
any of the objects or data being printed. Let's mark all arguments as const
(primarily to make it easier for the reader to see where modifications cannot
occur).
2018-12-12 16:45:33 +01:00
Lennart Poettering a1c7334b61 core: when a unit state changes only propagate to jobs after reloading is complete
Previously, we'd immediately propagate unit state changes into any jobs
pending for them, always. With this we only do this if the manager is
out of the "reload" state. This fixes the problem #8803 tried to
address, by simply not completing jobs until after the reload (and thus
reestablishment of the dbus connection) is complete.

Note that there's no need to later on explicitly catch up with the
missed job state changes (i.e. there's no need to call
unit_process_job() later one explicitly). That's because for jobs in
JOB_WAITING state on deserialization all jobs are requeued into the run
queue anyway, and thus checked again if they can complete now. And for
JOB_RUNNING jobs unit_catchup() phase is going to trigger missed out
state changes *after* the reload complete anyway (after all that's what
distinguishes from unit_coldplug()).

Replaces: #8803
2018-12-12 11:15:07 +01:00
Lennart Poettering 16c74914d2 core: split out all logic that updates a Job on a unit's unit_notify() invocation
Just some refactoring, no change in behaviour.
2018-12-12 11:15:07 +01:00
Lennart Poettering b17c9620c8 core: rework how we deserialize jobs
Let's add a helper call unit_deserialize_job() for this purpose, and
let's move registration in the global jobs hash table into
job_install_deserialized() so that it it is done after all superficial
checks are done, and before transitioning into installed states, so that
rollback code is not necessary anymore.
2018-12-12 11:15:07 +01:00
Zbigniew Jędrzejewski-Szmek 4cb06c5949 Use VLA instead of alloca
The test is the same, but an array is more readable.
2018-12-10 11:57:26 +01:00
Zbigniew Jędrzejewski-Szmek 2d479ff1cc
Merge pull request #10963 from poettering/bus-force-state-change-signal
force PropertiesChanged bus signal on all unit state changes
2018-12-06 16:42:21 +01:00
Lennart Poettering e4de72876e util-lib: split out all temporary file related calls into tmpfiles-util.c
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.

No code changes, just some rearranging of source files.
2018-12-02 13:22:29 +01:00
Lennart Poettering ee228be10c util-lib: don't include fileio.h from fileio-label.h
There's no reason for doing that, hence simply don't.
2018-12-02 13:22:29 +01:00
Lennart Poettering 3c4832ada4 core: enqueue unit earlier when state changes
Previously, we'd enqueue a unit to the dbus queue whenever the state
changed, after we processed the state change fully. This commit to the
beginning of the state change. This has the benefit that when the state
change causes a job to complete the unit is already in the dbus queue,
and thus we get the guarantee that any unit change can be sent out to
clients before the job change.
2018-12-01 12:53:26 +01:00
Lennart Poettering af92c603bb core: send out unit change events when a new invocation ID is acquired
It's free, as this generally coincides with unit_start(), but let's make
this clean and explicit.
2018-12-01 12:53:26 +01:00
Lennart Poettering e18f8852f3 core: invalidate invidual Assert/Condition properties when sending out change messages
Let's inform the clients about assert/condition property changes as they
happen, it's basically for free because assert/condition property
changes generally coincide with other unit state changes (after all
these checks are done on unit_start())
2018-12-01 12:53:26 +01:00
Lennart Poettering 37d0b962ef core: when we manage to resolve a user, only enqueue dbus event, don't send out message right-away
Let's only enqueue the dbus signal generation, let's not do it
right-away, after all we want coalescing to take effect here.
2018-12-01 12:53:26 +01:00
Zbigniew Jędrzejewski-Szmek 8b4e51a60e
Merge pull request #10797 from poettering/run-generator
add new "systemd-run-generator" for running arbitrary commands from the kernel command line as system services using the "systemd.run=" kernel command line switch
2018-11-28 22:40:55 +01:00
Lennart Poettering 7af67e9a8b core: allow to set exit status when using SuccessAction=/FailureAction=exit in units
This adds SuccessActionExitStatus= and FailureActionExitStatus= that may
be used to configure the exit status to propagate in when
SuccessAction=exit or FailureAction=exit is used.

When not specified let's also propagate the exit status of the main
process we fork off for the unit.
2018-11-27 09:44:40 +01:00
Lennart Poettering 5b262f74e4 unit: tweak status output a bit
Let's highlight the unit description string in the status updates, to
separate them a bit more the english sentence they are part of, and thus
make the different casing less surprising.
2018-11-26 18:24:12 +01:00
Lennart Poettering b8b6f32104 cgroup: when we unload a unit, also update all its parent's members mask
This way we can corectly ensure that when a unit that requires some
controller goes away, we propagate the removal of it all the way up, so
that the controller is turned off in all the parents too.
2018-11-23 13:41:37 +01:00
Lennart Poettering 5af8805872 cgroup: drastically simplify caching of cgroups members mask
Previously we tried to be smart: when a new unit appeared and it only
added controllers to the cgroup mask we'd update the cached members mask
in all parents by ORing in the controller flags in their cached values.
Unfortunately this was quite broken, as we missed some conditions when
this cache had to be reset (for example, when a unit got unloaded),
moreover the optimization doesn't work when a controller is removed
anyway (as in that case there's no other way for the parent to iterate
though all children if any other, remaining child unit still needs it).
Hence, let's simplify the logic substantially: instead of updating the
cache on the right events (which we didn't get right), let's simply
invalidate the cache, and generate it lazily when we encounter it later.
This should actually result in better behaviour as we don't have to
calculate the new members mask for a whole subtree whever we have the
suspicion something changed, but can delay it to the point where we
actually need the members mask.

This allows us to simplify things quite a bit, which is good, since
validating this cache for correctness is hard enough.

Fixes: #9512
2018-11-23 13:41:37 +01:00
Lennart Poettering 0adf88b68c cgroup: dump delegation mask too 2018-11-23 12:24:37 +01:00
Lennart Poettering 00e7b3c8e5 unit: minor optimization, use stack over heap, when we can 2018-11-23 00:46:56 +01:00
Lennart Poettering 66fa4bdd70 core: add two minor comments (#10890) 2018-11-23 06:25:27 +09:00
Lennart Poettering 6e64994d69 core: make unit_start() return a distinguishable error code in case conditions didn't hold
Ideally we'd even propagate this all the way to the client, by having a
separate JobType enum value for this. But it's hard to add this without
breaking compat, hence for now let's at least internally propagate this
case differently from the case "already on it".

This is then used to call job_finish_and_invalidate() slightly
differently, with the already= parameter false, as in the failed
condition case no message was likely produced so far.
2018-11-16 15:22:48 +01:00
Lennart Poettering 523ee2d414 core: log a recognizable message when a unit succeeds, too
We already are doing it on failure, let's do it on success, too.

Fixes: #10265
2018-11-16 15:22:48 +01:00
Lennart Poettering 91bbd9b796 core: make log messages about unit processes exiting recognizable 2018-11-16 15:22:48 +01:00
Lennart Poettering 7c047d7443 core: make log messages about units entering a 'failed' state recognizable
Let's make this recognizable, and carry result information in a
structure fashion.
2018-11-16 15:22:48 +01:00