424 Commits

Author SHA1 Message Date
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
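The ownership hand-off described in the TAKE_PTR() commit above can be sketched as follows. This is a minimal illustration and not necessarily the exact definition in the tree: it assumes GCC or Clang (statement expressions and `__typeof__`), and `make_greeting()` is a hypothetical helper invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a take-pointer macro: evaluates to the current value of `ptr`
 * and resets `ptr` to NULL. Relies on GNU C extensions. */
#define TAKE_PTR(ptr)                              \
        ({                                         \
                __typeof__(ptr) _taken_ = (ptr);   \
                (ptr) = NULL;                      \
                _taken_;                           \
        })

/* Hypothetical helper: allocates a string and passes ownership to the caller. */
static int make_greeting(char **ret) {
        char *s = malloc(6);
        if (!s)
                return -1;
        strcpy(s, "hello");

        /* Without the macro this would be two statements: *ret = s; s = NULL; */
        *ret = TAKE_PTR(s);

        /* s is NULL here, so freeing it (as a cleanup handler would) is a no-op. */
        free(s);
        return 0;
}
```

After the call, the caller owns the string and the local variable in the helper no longer refers to it, so a cleanup step on the local cannot free memory that was handed out.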
Zbigniew Jędrzejewski-Szmek 064c593899 core/service: fix memleak of USBFunctionStrings and USBFunctionDescriptors
oss-fuzz #6892.
2018-03-17 09:01:53 +01:00
Lennart Poettering 62d74c78b5 coccinelle: add reallocarray() coccinelle script
Let's systematically make use of reallocarray() wherever we invoke
realloc() with a product of two values.
2018-03-02 12:39:07 +01:00
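The motivation for the reallocarray() commit above is that `realloc(p, n * size)` silently wraps around when the product overflows. The sketch below shows the check that such a call performs; `checked_reallocarray()` is a hypothetical stand-in written for illustration, not the libc function itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for reallocarray(): refuses the request if
 * n * size would overflow size_t, instead of allocating a too-small buffer. */
static void *checked_reallocarray(void *p, size_t n, size_t size) {
        if (size != 0 && n > SIZE_MAX / size) {
                errno = ENOMEM;
                return NULL;   /* the original allocation is left untouched */
        }

        return realloc(p, n * size);
}
```

A call such as `realloc(p, n * sizeof(int))` would then become `checked_reallocarray(p, n, sizeof(int))`, which is the kind of mechanical rewrite a Coccinelle script can apply tree-wide.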
Lennart Poettering 00f5ad93b5 core: change KeyringMode= to "shared" by default for non-service units in the system manager (#8172)
Before this change all unit types would default to "private" in the
system service manager and to "inherit" in the user service manager.

With this change this is slightly altered: non-service units of the
system service manager are now run with KeyringMode=shared. This appears
to be the more appropriate choice as isolation is not as desirable for
mount tools, which regularly consume key material. After all mounts are
a shared resource themselves as they appear system-wide hence it makes a
lot of sense to share their key material too.

Fixes: #8159
2018-02-20 08:53:34 +01:00
Lennart Poettering a94ab7acfd
Merge pull request #8175 from keszybz/gc-cleanup
Garbage collection cleanup
2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek 7f7d01ed58 pid1: include the source unit in UnitRef
No functional change.

The source unit manages the reference. It allocates the UnitRef structure and
registers it in the target unit, and then the reference must be destroyed
before the source unit is destroyed. Thus, it should be OK to include the
pointer to the source unit, it should be live as long as the reference exists.

v2:
- rename refs to refs_by_target
2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek f2f725e5cc pid1: rename unit_check_gc to unit_may_gc
"check" is unclear: what is true, what is false? Let's rename to "can_gc" and
invert the return value ("positive" values are easier to grok).

v2:
- rename from unit_can_gc to unit_may_gc
2018-02-15 13:04:12 +01:00
Lennart Poettering 004c7f169e core: fold manager_set_exec_params() into unit_set_exec_params()
Let's simplify things a bit: we so far called both functions every
single time, let's just merge one into the other, so that we have fewer
functions to call.
2018-02-12 11:34:00 +01:00
Lennart Poettering 1d9cc8768f cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only
Currently we allowed delegation for all units with cgroup backing
except for slices. Let's make this a bit more strict for now, and only
allow this in service and scope units.

Let's also add a generic accessor unit_cgroup_delegate() for checking
whether a unit has delegation turned on that checks the new bool first.

Also, when doing transient units, let's explicitly refuse turning on
delegation for unit types that don't support it. This is mostly
cosmetic as we wouldn't act on the delegation request anyway, but
certainly helpful for debugging.
2018-02-12 11:34:00 +01:00
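The "can_delegate" commit above describes a per-type vtable flag that a generic accessor consults before looking at the unit's own setting. A minimal sketch, assuming heavily simplified, hypothetical `Unit` and `UnitVTable` structures (the real ones carry far more state):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical unit types for illustration only. */
typedef enum { UNIT_SERVICE, UNIT_SCOPE, UNIT_SLICE, UNIT_MOUNT, UNIT_TYPE_MAX } UnitType;

typedef struct {
        bool can_delegate;   /* does this unit type support cgroup delegation at all? */
} UnitVTable;

/* Only service and scope units set the flag; all others default to false. */
static const UnitVTable unit_vtables[UNIT_TYPE_MAX] = {
        [UNIT_SERVICE] = { .can_delegate = true },
        [UNIT_SCOPE]   = { .can_delegate = true },
};

typedef struct {
        UnitType type;
        bool delegate;       /* the per-unit delegation setting */
} Unit;

/* Generic accessor: checks the type-level flag first, then the unit's setting. */
static bool unit_cgroup_delegate(const Unit *u) {
        if (!unit_vtables[u->type].can_delegate)
                return false;

        return u->delegate;
}
```

With this shape, a slice that requests delegation is reported as not delegated, regardless of its own setting.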
Lennart Poettering 73969ab61c service: relax PID file symlink chain checks a bit (#8133)
Let's read the PID file after all if there's a potentially unsafe
symlink chain in place. But if we do, then refuse taking the PID if it's
outside of the cgroup.

Fixes: #8085
2018-02-09 17:05:17 +01:00
Yu Watanabe f2e18ef1a3 core: remove unnecessary initialization 2018-02-09 16:36:37 +09:00
Yu Watanabe e8a565cb66 core: make ExecRuntime be manager managed object
Before this change, each ExecRuntime object was owned by a unit. However,
it may be shared with other units that enable JoinsNamespaceOf=.
As a result, its sharing information, specifically the reference
counter, was lost during serialization/deserialization, which
caused issue #7790.

This makes ExecRuntime objects managed by the manager, and changes
the serialization/deserialization process.

Fixes #7790.
2018-02-06 16:00:34 +09:00
Yu Watanabe c9d4169919 core/service: dump more settings 2018-01-30 17:10:47 +09:00
Lennart Poettering adefcf2821 core: rework how we count the n_on_console counter
Let's add a per-unit boolean that tells us whether our unit is currently
counted or not. This way it's unlikely we get out of sync again and
things are generally more robust.

This also allows us to remove the counting logic specific to service
units (which was in fact mostly a copy from the generic implementation),
in favour of fully generic code.

Replaces: #7824
2018-01-24 20:14:51 +01:00
Lennart Poettering bb2c768545 core: add a new unit_needs_console() call
This call determines whether a specific unit currently needs access to
the console. It's a fancy wrapper around
exec_context_may_touch_console() ultimately, however for service units
we'll explicitly exclude the SERVICE_EXITED state from when we report
true.
2018-01-24 19:54:26 +01:00
Lennart Poettering 9acac21249 service: simplify condition
The left side of the || expression is conditionalized on SERVICE_START,
but SERVICE_START is blanket listed on the right side anyway, hence we
can drop the left side entirely without any change in behaviour.

Moreover, if main_pid is initialized, it should be watched, hence this
is even the safe and right thing to do.
2018-01-23 21:29:31 +01:00
Lennart Poettering eabd3e56a6 service: don't bother with watching PIDs during deserialization
service_coldplug() takes care of that anyway, hence drop the
unit_watch_pid() invocation entirely during deserialization, it's
redundant.
2018-01-23 21:29:31 +01:00
Lennart Poettering 11aef522c1 core: unify call we use to synthesize cgroup empty events when we stopped watching any unit PIDs
This code is very similar in scope and service units, let's unify it in
one function. This changes little for service units, but for scope units
makes sure we go through the cgroup queue, which is something we should
do anyway.
2018-01-23 21:22:50 +01:00
Lennart Poettering 5cdabc8d7b service: don't send out dbus change notifications spuriously on SIGCHLD
Let's send them out only if the main or control process exited and we
recorded a new exit status that is worth reporting. But if any other
service process died this is nothing to report since we don't expose any
properties about that anyway.
2018-01-23 21:22:50 +01:00
Jan Klötzke 2a12e32efa pid1: add option to disable service watchdogs
Add a "systemd.service_watchdogs=" option to the command line which
disables all service runtime watchdogs and emergency actions.
2018-01-22 18:10:03 +01:00
Zbigniew Jędrzejewski-Szmek e0b6d3cabe
Merge pull request #7816 from poettering/chase-pid
Make MAINPID= and PIDFile= handling more restrictive (and other stuff)
2018-01-15 14:14:34 +04:00
Lennart Poettering db256aab13 core: be stricter when handling PID files and MAINPID sd_notify() messages
Let's be more restrictive when validating PID files and MAINPID=
messages: don't accept PIDs that make no sense, and if the configuration
source is not trusted, don't accept out-of-cgroup PIDs. A configuration
source is considered trusted when the PID file is owned by root, or the
message was received from root.

This should lock things down a bit, in case service authors write out
PID files from unprivileged code or use NotifyAccess=all with
unprivileged code. Note that doing so was always problematic, just now
it's a bit less problematic.

When we open the PID file we'll now use the CHASE_SAFE chase_symlinks()
logic, to ensure that we won't follow an unprivileged-owned symlink to a
privileged-owned file thinking this was a valid privileged PID file,
even though it really isn't.

Fixes: #6632
2018-01-11 15:12:16 +01:00
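The acceptance rule from the PID file/MAINPID= commit above can be sketched as a small decision function. All names are hypothetical, and the notion of a PID that "makes no sense" is an assumption here (PID 0 or below, PID 1, or the manager's own PID); the commit does not spell out the exact list.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

typedef enum {
        PID_ACCEPT,
        PID_REJECT_INVALID,     /* the PID makes no sense */
        PID_REJECT_UNTRUSTED,   /* out-of-cgroup PID from an untrusted source */
} PidVerdict;

/* Hypothetical sketch: `source_is_root` stands for "PID file owned by root"
 * or "MAINPID= message received from root". */
static PidVerdict check_main_pid(pid_t pid, pid_t manager_pid,
                                 bool in_unit_cgroup, bool source_is_root) {
        if (pid <= 1 || pid == manager_pid)   /* assumption: these never make sense */
                return PID_REJECT_INVALID;

        if (in_unit_cgroup)
                return PID_ACCEPT;            /* inside the unit's cgroup: fine either way */

        return source_is_root ? PID_ACCEPT : PID_REJECT_UNTRUSTED;
}
```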
Lennart Poettering 8895eb7815 unit: log when we cannot add a watch on a specific PID 2018-01-11 15:07:14 +01:00
Lennart Poettering f1d34068ef tree-wide: add DEBUG_LOGGING macro that checks whether debug logging is on (#7645)
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
2017-12-15 11:09:00 +01:00
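A sketch of what a DEBUG_LOGGING-style macro from the commit above buys: the level check and the branch hint live in one place, and expensive formatting is skipped when debug logging is off. The names `max_log_level`, `UNLIKELY`, `describe_state()` and `maybe_log_debug()` are hypothetical; `__builtin_expect` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdio.h>
#include <syslog.h>   /* only for the LOG_INFO / LOG_DEBUG constants */

/* Hypothetical global holding the maximum level that gets logged. */
static int max_log_level = LOG_INFO;

/* Branch hint: tell the compiler the condition is rarely true. */
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))

/* One macro combines the level check with the hint, so callers cannot forget it. */
#define DEBUG_LOGGING UNLIKELY(max_log_level >= LOG_DEBUG)

static int expensive_calls = 0;

/* Stands in for costly message formatting we only want in debug mode. */
static const char *describe_state(void) {
        expensive_calls++;
        return "state dump";
}

static void maybe_log_debug(void) {
        if (DEBUG_LOGGING)
                fprintf(stderr, "debug: %s\n", describe_state());
}
```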
Daniel Black a327431bd1 core: add EXTEND_TIMEOUT_USEC={usec} - prevent timeouts in startup/runtime/shutdown (#7214)
With Type=notify services, EXTEND_TIMEOUT_USEC= messages will delay any startup/
runtime/shutdown timeouts.

A service that hasn't timed out, i.e. start time < TimeoutStartSec,
runtime < RuntimeMaxSec and stop time < TimeoutStopSec, may, by sending
EXTEND_TIMEOUT_USEC=, continue beyond the limit for the current
execution phase (i.e. TimeoutStartSec, RuntimeMaxSec and TimeoutStopSec).

EXTEND_TIMEOUT_USEC= must continue to be sent (in the same way as
WATCHDOG=1) within the time interval specified to keep preventing
the timeout from occurring.

Watchdog timeouts are also extended if an EXTEND_TIMEOUT_USEC= value is greater
than the remaining time on the watchdog counter.

Fixes #5868.
2017-12-14 12:17:43 +01:00
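From the service's side, the EXTEND_TIMEOUT_USEC= commit above amounts to periodically sending a short text message. The sketch below only formats that message and models the watchdog rule from the commit text; both function names are hypothetical, and actually delivering the string (for example through sd_notify()) is left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats the notification payload. Returns 0 on success, -1 if the
 * buffer is too small. */
static int format_extend_timeout(char *buf, size_t size, uint64_t usec) {
        int n = snprintf(buf, size, "EXTEND_TIMEOUT_USEC=%" PRIu64, usec);

        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Models the watchdog rule: the remaining watchdog time is only extended
 * if the requested extension is larger than what is left. */
static uint64_t watchdog_remaining_after_extend(uint64_t remaining_usec,
                                                uint64_t extend_usec) {
        return extend_usec > remaining_usec ? extend_usec : remaining_usec;
}
```

A service doing slow startup work would send such a message again before each previously announced interval elapses.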
Michal Koutný deb4e7080d service: Don't stop unneeded units needed by restarted service (#7526)
An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes.
If A stops before B's restart timeout expires, it'll be started again as part
of B's dependent jobs. However, if stopping takes longer than the timeout, the
still-running stop job collides with the start job, which also cancels B's start
job. The result is that neither A nor B is active.

Currently, when a service with automatic restarting fails, it transitions
through the following states:
        1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure,
        2) SERVICE_AUTO_RESTART while restart timer is running.

The StopWhenUnneeded= check takes place in service_enter_dead between the two
states mentioned above. We temporarily store the auto restart flag to query it
during the check. Because we don't return control to the main event loop, this
new service unit flag needn't be serialized.

This patch prevents the pathological situation when the service with Restart=
won't restart automatically. As a side effect it also avoids restarting the
dependency unit with StopWhenUnneeded=yes.

Fixes: #7377
2017-12-05 16:51:19 +01:00
Lennart Poettering f3b900311f service: shortcut operations if the MAINPID= doesn't actually cause a change 2017-11-27 17:04:57 +01:00
Lennart Poettering 2fa40742a4 service: use parse_errno() for parsing error numbers
Let's always use the same logic when parsing error numbers, i.e. use
parse_errno() here too, to unify some code, and tighten the checks a
bit.

This also allows clients to pass errors as symbolic names. Probably
nothing we want to advertise too eagerly (since new daemons generating
this on old service managers won't understand), but still pretty
useful I think, in particular in scripting languages and such, where the
numeric error numbers might not be readily available.
2017-11-27 17:04:57 +01:00
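The parse_errno() commit above says one parser now accepts error numbers both numerically and as symbolic names. A minimal sketch of that idea, with a deliberately tiny name table; `parse_errno_sketch()` and the 4095 upper bound are assumptions made for the example, not the actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns the error number (>= 0) on success, or a
 * negative errno-style value if the string is neither a known name nor a
 * plain decimal number in range. */
static int parse_errno_sketch(const char *s) {
        static const struct { const char *name; int value; } names[] = {
                { "EPERM", EPERM }, { "ENOENT", ENOENT }, { "EIO", EIO },
                { "ENOMEM", ENOMEM }, { "EINVAL", EINVAL },
        };

        for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
                if (strcmp(s, names[i].name) == 0)
                        return names[i].value;

        /* Tightened numeric path: no sign, no whitespace, no trailing junk. */
        if (*s < '0' || *s > '9')
                return -EINVAL;

        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (errno != 0 || *end != '\0')
                return -EINVAL;
        if (v > 4095)   /* assumed upper bound for valid error numbers */
                return -ERANGE;

        return (int) v;
}
```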
Lennart Poettering e78ee06de1 core: add a new sd_notify() message for removing fds from the FD store again
Currently the only way to remove fds from the fdstore is to fully
stop the service, or to somehow trigger POLLERR/POLLHUP on the fd, in
which case systemd will remove the fd automatically.

Let's add another way: a new message that can be sent to remove fds
explicitly, given their name.
2017-11-27 17:04:04 +01:00
Lennart Poettering cc2b7b11b4 core: only process one of READY=1, STOPPING=1 or RELOADING=1 in sd_notify() handling
Of course, it's not really a valid sd_notify() message if multiple of
these fields are used in one, but let's handle this somewhat gracefully,
by only processing one of them, and ignoring the rest.
2017-11-27 17:01:00 +01:00
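The "only process one of READY=1, STOPPING=1 or RELOADING=1" commit above can be pictured as an if/else-if chain over the received tags. The precedence used below (RELOADING, then READY, then STOPPING) is an assumption for illustration, as the commit message does not state which field wins; `pick_state()` and `has_tag()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { NOTIFY_NONE, NOTIFY_RELOADING, NOTIFY_READY, NOTIFY_STOPPING } NotifyState;

/* Returns true if the NULL-terminated tag list contains `tag`. */
static bool has_tag(const char *const *tags, const char *tag) {
        for (; *tags; tags++)
                if (strcmp(*tags, tag) == 0)
                        return true;
        return false;
}

/* Picks at most one state change per message; the rest are ignored.
 * The order of the checks is an assumed precedence. */
static NotifyState pick_state(const char *const *tags) {
        if (has_tag(tags, "RELOADING=1"))
                return NOTIFY_RELOADING;
        else if (has_tag(tags, "READY=1"))
                return NOTIFY_READY;
        else if (has_tag(tags, "STOPPING=1"))
                return NOTIFY_STOPPING;

        return NOTIFY_NONE;
}
```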
Lennart Poettering c45d11cb30 service: reorder sd_notify() handling a bit
Let's keep handling of WATCHDOG= and WATCHDOG_USEC= together. No
functional changes.
2017-11-27 16:59:52 +01:00
Lennart Poettering e328523777 service: split out sd_notify() message authorization code into a function of its own
Let's shorten service_notify_message() a bit, and do the authentication
outside of the main function body.

No functional changes.
2017-11-27 16:59:52 +01:00
Lennart Poettering 9711848ff1 core: only log about sd_notify() message contents, when debug logging is on
Let's optimize things a bit for the non-debug case. No change in
behaviour.

Main reason to do this is not so much the speed benefit though, but
merely to isolate the code from its surroundings more.
2017-11-27 16:39:43 +01:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Lennart Poettering e98b2fbbe9 core: generalize the cgroup empty check on GC
Let's move the cgroup empty check for all unit types into the generic
unit_check_gc() call, out of the per-unit-type _check_gc() type. This
not only allows us to share some code, but also hooks up mount and
socket units with this kind of check, for free, as it was missing there
previously.
2017-11-25 17:08:21 +01:00
Lennart Poettering e9a4f67609 cgroup: remove logic for maintaining /control subcgroup for the service unit type
Previously, in the service unit type we ran all control processes in a
special subcgroup /control of the unit's main cgroup. Remove that, and
run the control program in the main cgroup instead.

The concept conflicts with cgroupv2's logic of "no processes in inner
nodes": if a unit has a main daemon process running in the main cgroup,
and a reload control process would be started in the /control subcgroup,
then this would necessarily fail, as the main daemon process would
become an inner node process that way.

We could in theory continue to support this in cgroupv1, but in the
interest of keeping behaviour similar in both hierarchies, let's drop
this altogether.

Philosophically maybe it wasn't the greatest idea anyway to just go
berserk and SIGKILL all those processes — loud warning logging might
have sufficed, too.
2017-11-25 17:08:21 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Zbigniew Jędrzejewski-Szmek 82a27ba821
Merge pull request #7389 from shawnl/warning
tree-wide: adjust fall through comments so that gcc is happy
2017-11-22 07:38:51 +01:00
Lennart Poettering 3c7416b6ca core: unify common code for preparing for forking off unit processes
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.

This allows us to neatly unify a bit of code between unit types and
shorten our code.
2017-11-21 11:54:08 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Lennart Poettering 53c35a766f core: generalize FailureAction=, move it from service to unit
All kinds of units can fail, hence it makes sense to offer this as
generic concept for all unit types.
2017-11-20 16:37:22 +01:00
Lennart Poettering 0133d5553a
Merge pull request #7198 from poettering/stdin-stdout
Add StandardInput=data, StandardInput=file:... and more
2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering f56e7bfe2b core: be more defensive if we can't determine per-connection socket peer (#7329)
Let's handle gracefully if a client disconnects very early on.

This builds on #4120, but relaxes the condition checks further, since
getpeername() might already fail during ExecStartPre= and friends.

Fixes: #7172
2017-11-17 15:22:11 +01:00
Lennart Poettering 08f3be7a38 core: add two new unit file settings: StandardInputData= + StandardInputText=
Both permit configuring data to pass through STDIN to an invoked
process. StandardInputText= accepts a line of text (possibly with
embedded C-style escapes as well as unit specifiers), which is appended
to the buffer to pass as stdin, followed by a single newline.
StandardInputData= is similar, but accepts arbitrary base64 encoded
data, and will not resolve specifiers or C-style escapes, nor append
newlines.

This may be used to pass input/configuration data to services, directly
in-line from unit files, either in a cooked or in a more raw format.
2017-11-17 11:13:44 +01:00
Lennart Poettering 7eb2a8a125 unit: rework a bit how we keep the service fdstore from being destroyed during service restart
When preparing for a restart we quickly go through the DEAD/INACTIVE
service state before entering AUTO_RESTART. When doing this, we need to
make sure we don't destroy the FD store. Previously this was done by
checking the failure state of the unit, and keeping the FD store around
when the unit failed, under the assumption that the restart logic will
then get into action.

This is not entirely correct however, as there might be failure states
that will not result in restarts.

With this commit we slightly alter the logic: a ref counter for the fd
store is added, that is increased right before we handle the restart
logic, and decreased again right-after.

This should ensure that the fdstore lives exactly as long as it needs.

Follow-up for f0bfbfac43.
2017-11-16 14:37:33 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one existing unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just use an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
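The LogLevelMax= rule from the commit above relies on syslog priorities being numerically smaller the more important they are, so "notice and lower" is a plain `<=` comparison. A minimal sketch; `log_level_allowed()` is a hypothetical name, and treating a negative maximum as "no filtering" is an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <syslog.h>   /* LOG_EMERG (0) ... LOG_DEBUG (7) */

/* Returns true if a message of the given level should be processed. */
static bool log_level_allowed(int level, int log_level_max) {
        if (log_level_max < 0)   /* assumption: negative means the setting is unset */
                return true;

        /* Lower number = more important, so "notice and lower" is level <= LOG_NOTICE. */
        return level <= log_level_max;
}
```

With LogLevelMax=notice, LOG_ERR and LOG_NOTICE messages pass while LOG_INFO and LOG_DEBUG messages are dropped.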
Lennart Poettering eef85c4a3f core: track why unit dependencies came to be
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.

The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.

Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.

Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:

1. We can fix UDEV_WANTS dependency generation: so far we kept adding
   dependencies configured that way, but if a device lost such a
   dependency we couldn't remove them again as there was no scheme for removing
   of dependencies in place.

2. We can implement "pin-pointed" reload of unit files. If we know what
   dependencies were created as result of configuration in a unit file,
   then we know what to flush out when we want to reload it.

3. It's useful for debugging: "systemd-analyze dump" now shows
   this information, helping substantially with understanding how
   systemd's dependency tree came to be the way it came to be.
2017-11-10 19:45:29 +01:00
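The dependency-tracking commit above notes that memory usage does not grow because the bitmask is stored inside the value pointer of each hashmap entry. The sketch below shows that encoding in isolation; the mask names and helper functions are hypothetical, and the hashmap itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reasons why a dependency exists. */
enum {
        DEP_MASK_FILE     = 1 << 0,   /* configured explicitly in a unit file */
        DEP_MASK_IMPLICIT = 1 << 1,   /* added implicitly */
        DEP_MASK_UDEV     = 1 << 2,   /* created by a udev rule */
};

/* The mask travels in the pointer-sized value slot; nothing is allocated. */
static void *mask_to_ptr(unsigned mask) { return (void *) (uintptr_t) mask; }
static unsigned ptr_to_mask(void *p)    { return (unsigned) (uintptr_t) p; }

/* Record one more reason for an existing dependency. */
static void *dep_add(void *value, unsigned mask) {
        return mask_to_ptr(ptr_to_mask(value) | mask);
}

/* Drop the reasons contributed by one configuration source. If the
 * resulting mask is 0, no reason remains and the entry can be removed. */
static void *dep_remove(void *value, unsigned mask) {
        return mask_to_ptr(ptr_to_mask(value) & ~mask);
}
```

This is what makes selective updates possible: when a udev rule goes away, only the udev bit is cleared, and a dependency that is also configured in a unit file survives.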
Alan Jenkins 4cd9fa8176 core: failure to spawn ExecStartPost should not run ExecStop
Failure to spawn ExecStartPost was being handled differently to e.g.
EXIT_FAILURE returned by ExecStartPost.  It looks like this was an
oversight.  Fix to match documented behaviour.

`man systemd.service`:

> Note that if any of the commands specified in ExecStartPre=, ExecStart=,
> or ExecStartPost= fail (and are not prefixed with "-", see above) or time
> out before the service is fully up, execution continues with commands
> specified in ExecStopPost=, the commands in ExecStop= are skipped.
2017-11-01 15:28:50 +00:00
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00