Commit graph

535 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 79e221d078
Merge pull request #9158 from poettering/notify-auto-reload
trigger OnFailure= only if Restart= is not in effect
2018-06-05 13:51:07 +02:00
Zbigniew Jędrzejewski-Szmek a1230ff972 basic/log: add the log_struct terminator to macro
This way all callers do not need to specify it.
Exhaustively tested by running test-log under valgrind ;)
2018-06-04 13:46:03 +02:00
Lennart Poettering ec5b1452ac core: go to failure state if the main service process fails and RemainAfterExit=yes (#9159)
Previously, we'd not care about failures that were seen earlier and
remain in "exited" state. This could be triggered if the main process of
a service failed while ExecStartPost= was still running, as in that case
we'd not immediately act on the main process failure because we needed
to wait for ExecStartPost= to finish, before acting on it.

Fixes: #8929
2018-06-04 11:35:25 +02:00
Yu Watanabe 858d36c1ec path-util: introduce path_simplify()
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, then it
behaves as the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
2018-06-03 23:39:26 +09:00
Lennart Poettering 2ad2e41a72 core: don't trigger OnFailure= deps when a unit is going to restart
This adds a flags parameter to unit_notify() which can be used to pass
additional notification information to the function. We the make the old
reload_failure boolean parameter one of these flags, and then add a new
flag that let's unit_notify() if we are configured to restart the
service.

Note that this adjusts behaviour of systemd to match what the docs say.

Fixes: #8398
2018-06-01 19:08:30 +02:00
Alan Jenkins 4330dc03a0 service: FileDescriptorStoreMax should also imply NotifyAccess
Commenting out "WatchdogTimeout=3min" in systemd-logind.service causes
NotifyAccess to go from "main" to "none", breaking support for logind
restart.  Let's fix that.
2018-05-15 12:33:56 +02:00
Lennart Poettering da6053d0a7 tree-wide: be more careful with the type of array sizes
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.

Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.

So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.

This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:

1. strv_length()' return type becomes size_t

2. the unit file changes array size becomes size_t

3. DNS answer and query array sizes become size_t

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
2018-04-27 14:29:06 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Lennart Poettering 12b6b3b7a4
Merge pull request #8562 from keszybz/docs
Man page and log message fixes
2018-03-26 15:34:39 +02:00
Zbigniew Jędrzejewski-Szmek 5ce6e7f525 core/service: rework the hold-off time over message
"hold-off" is apparently confusing, because we also have HoldoffTimeoutSec=.
Let's use RestartSec= directly in the message.

Fixes #5472.
2018-03-24 14:22:42 +01:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Zbigniew Jędrzejewski-Szmek 064c593899 core/service: fix memleak of USBFunctionStrings and USBFunctionDescriptors
oss-fuzz #6892.
2018-03-17 09:01:53 +01:00
Lennart Poettering 62d74c78b5 coccinelle: add reallocarray() coccinelle script
Let's systematically make use of reallocarray() whereever we invoke
realloc() with a product of two values.
2018-03-02 12:39:07 +01:00
Lennart Poettering 00f5ad93b5 core: change KeyringMode= to "shared" by default for non-service units in the system manager (#8172)
Before this change all unit types would default to "private" in the
system service manager and "inherit" to in the user service manager.

With this change this is slightly altered: non-service units of the
system service manager are now run with KeyringMode=shared. This appears
to be the more appropriate choice as isolation is not as desirable for
mount tools, which regularly consume key material. After all mounts are
a shared resource themselves as they appear system-wide hence it makes a
lot of sense to share their key material too.

Fixes: #8159
2018-02-20 08:53:34 +01:00
Lennart Poettering a94ab7acfd
Merge pull request #8175 from keszybz/gc-cleanup
Garbage collection cleanup
2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek 7f7d01ed58 pid1: include the source unit in UnitRef
No functional change.

The source unit manages the reference. It allocates the UnitRef structure and
registers it in the target unit, and then the reference must be destroyed
before the source unit is destroyed. Thus, is should be OK to include the
pointer to the source unit, it should be live as long as the reference exists.

v2:
- rename refs to refs_by_target
2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek f2f725e5cc pid1: rename unit_check_gc to unit_may_gc
"check" is unclear: what is true, what is false? Let's rename to "can_gc" and
revert the return value ("positive" values are easier to grok).

v2:
- rename from unit_can_gc to unit_may_gc
2018-02-15 13:04:12 +01:00
Lennart Poettering 004c7f169e core: fold manager_set_exec_params() into unit_set_exec_params()
Let's simplify things a bit: we so far called both functions every
single time, let's just merge one into the other, so that we have fewer
functions to call.
2018-02-12 11:34:00 +01:00
Lennart Poettering 1d9cc8768f cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only
Currently we allowed delegation for alluntis with cgroup backing
except for slices. Let's make this a bit more strict for now, and only
allow this in service and scope units.

Let's also add a generic accessor unit_cgroup_delegate() for checking
whether a unit has delegation turned on that checks the new bool first.

Also, when doing transient units, let's explcitly refuse turning on
delegation for unit types that don#t support it. This is mostly
cosmetical as we wouldn't act on the delegation request anyway, but
certainly helpful for debugging.
2018-02-12 11:34:00 +01:00
Lennart Poettering 73969ab61c service: relax PID file symlink chain checks a bit (#8133)
Let's read the PID file after all if there's a potentially unsafe
symlink chain in place. But if we do, then refuse taking the PID if its
outside of the cgroup.

Fixes: #8085
2018-02-09 17:05:17 +01:00
Yu Watanabe f2e18ef1a3 core: remove unnecessary initialization 2018-02-09 16:36:37 +09:00
Yu Watanabe e8a565cb66 core: make ExecRuntime be manager managed object
Before this, each ExecRuntime object is owned by a unit. However,
it may be shared with other units which enable JoinsNamespaceOf=.
Thus, by the serialization/deserialization process, its sharing
information, more specifically, reference counter is lost, and
causes issue #7790.

This makes ExecRuntime objects be managed by manager, and changes
the serialization/deserialization process.

Fixes #7790.
2018-02-06 16:00:34 +09:00
Yu Watanabe c9d4169919 core/service: dump more settings 2018-01-30 17:10:47 +09:00
Lennart Poettering adefcf2821 core: rework how we count the n_on_console counter
Let's add a per-unit boolean that tells us whether our unit is currently
counted or not. This way it's unlikely we get out of sync again and
things are generally more robust.

This also allows us to remove the counting logic specific to service
units (which was in fact mostly a copy from the generic implementation),
in favour of fully generic code.

Replaces: #7824
2018-01-24 20:14:51 +01:00
Lennart Poettering bb2c768545 core: add a new unit_needs_console() call
This call determines whether a specific unit currently needs access to
the console. It's a fancy wrapper around
exec_context_may_touch_console() ultimately, however for service units
we'll explicitly exclude the SERVICE_EXITED state from when we report
true.
2018-01-24 19:54:26 +01:00
Lennart Poettering 9acac21249 service: simplify condition
The left side of the || expression is conditionalized on SERVICE_START,
but SERVICE_START is blanket listed on the right side anyway, hence we
can drop the left side entirely without any change in behaviour.

Moreover, if main_pid is initialized, it should be watched, hence this
is even the safe and right thing to do.
2018-01-23 21:29:31 +01:00
Lennart Poettering eabd3e56a6 service: don't bother with watching PIDs during deserialization
service_coldplug() takes care of that anyway, hence drop the
unit_watch_pid() invocation entirely during serialization, it's
redundant.
2018-01-23 21:29:31 +01:00
Lennart Poettering 11aef522c1 core: unify call we use to synthesize cgroup empty events when we stopped watching any unit PIDs
This code is very similar in scope and service units, let's unify it in
one function. This changes little for service units, but for scope units
makes sure we go through the cgroup queue, which is something we should
do anyway.
2018-01-23 21:22:50 +01:00
Lennart Poettering 5cdabc8d7b service: don't send out dbus change notifications spuriously on SIGCHLD
Let's send them out only if the main or control processe exited and we
recorded a new exit status that is worth reporting. But if any other
service process died this is nothing to report since we don't expose any
properties about that anyway.
2018-01-23 21:22:50 +01:00
Jan Klötzke 2a12e32efa pid1: add option to disable service watchdogs
Add a "systemd.service_watchdogs=" option to the command line which
disables all service runtime watchdogs and emergency actions.
2018-01-22 18:10:03 +01:00
Zbigniew Jędrzejewski-Szmek e0b6d3cabe
Merge pull request #7816 from poettering/chase-pid
Make MAINPID= and PIDFile= handling more restrictive (and other stuff)
2018-01-15 14:14:34 +04:00
Lennart Poettering db256aab13 core: be stricter when handling PID files and MAINPID sd_notify() messages
Let's be more restrictive when validating PID files and MAINPID=
messages: don't accept PIDs that make no sense, and if the configuration
source is not trusted, don't accept out-of-cgroup PIDs. A configuratin
source is considered trusted when the PID file is owned by root, or the
message was received from root.

This should lock things down a bit, in case service authors write out
PID files from unprivileged code or use NotifyAccess=all with
unprivileged code. Note that doing so was always problematic, just now
it's a bit less problematic.

When we open the PID file we'll now use the CHASE_SAFE chase_symlinks()
logic, to ensure that we won't follow an unpriviled-owned symlink to a
privileged-owned file thinking this was a valid privileged PID file,
even though it really isn't.

Fixes: #6632
2018-01-11 15:12:16 +01:00
Lennart Poettering 8895eb7815 unit: log when we cannot add a watch on a specific PID 2018-01-11 15:07:14 +01:00
Lennart Poettering f1d34068ef tree-wide: add DEBUG_LOGGING macro that checks whether debug logging is on (#7645)
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
2017-12-15 11:09:00 +01:00
Daniel Black a327431bd1 core: add EXTEND_TIMEOUT_USEC={usec} - prevent timeouts in startup/runtime/shutdown (#7214)
With Type=notify services, EXTEND_TIMEOUT_USEC= messages will delay any startup/
runtime/shutdown timeouts.

A service that hasn't timed out, i.e, start time < TimeStartSec,
runtime < RuntimeMaxSec and stop time < TimeoutStopSec, may by sending
EXTEND_TIMEOUT_USEC=, allow the service to continue beyond the limit for
the execution phase (i.e TimeStartSec, RunTimeMaxSec and TimeoutStopSec).

EXTEND_TIMEOUT_USEC= must continue to be sent (in the same way as
WATCHDOG=1) within the time interval specified to continue to reprevent
the timeout from occuring.

Watchdog timeouts are also extended if a EXTEND_TIMEOUT_USEC is greater
than the remaining time on the watchdog counter.

Fixes #5868.
2017-12-14 12:17:43 +01:00
Michal Koutný deb4e7080d service: Don't stop unneeded units needed by restarted service (#7526)
An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes.
If A stops before B's restart timeout expires, it'll be started again as part
of B's dependent jobs. However, if stopping takes longer than the timeout, B's
running stop job collides start job which also cancels B's start job. Result is
that neither A or B are active.

Currently, when a service with automatic restarting fails, it transitions
through following states:
        1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure,
        2) SERVICE_AUTO_RESTART while restart timer is running.

The StopWhenUnneeded= check takes place in service_enter_dead between the two
state mentioned above. We temporarily store the auto restart flag to query it
during the check. Because we don't return control to the main event loop, this
new service unit flag needn't be serialized.

This patch prevents the pathologic situation when the service with Restart=
won't restart automatically. As a side effect it also avoid restarting the
dependency unit with StopWhenUnneeded=yes.

Fixes: #7377
2017-12-05 16:51:19 +01:00
Lennart Poettering f3b900311f service: shortcut operations if the MAINPID= doesn't actually cause a change 2017-11-27 17:04:57 +01:00
Lennart Poettering 2fa40742a4 service: use parse_errno() for parsing error numbers
Let's always use the same logic when parsing error numbers, i.e. use
parse_errno() here too, to unify some code, and tighten the checks a
bit.

This also allows clients to pass errors as symbolic names. Probably
nothing we want to advertise too eagerly (since new daemons generating
this on old service managers won't understand), but still pretty
useful I think, in particular in scripting languages and such, where the
numeric error numbers might not be readily available.
2017-11-27 17:04:57 +01:00
Lennart Poettering e78ee06de1 core: add a new sd_notify() message for removing fds from the FD store again
Currenly the only way to remove fds from the fdstore is to fully
stop the service, or to somehow trigger POLLERR/POLLHUP on the fd, in
which case systemd will remove the fd automatically.

Let's add another way: a new message that can be sent to remove fds
explicitly, given their name.
2017-11-27 17:04:04 +01:00
Lennart Poettering cc2b7b11b4 core: only process one of READY=1, STOPPING=1 or RELOADING=1 in sd_notify() handling
Of course, it's not really a valid sd_notify() message if multiple of
these fields are used in one, but let's handle this somewhat gracefully,
by only processing one of them, and ignoring the rest.
2017-11-27 17:01:00 +01:00
Lennart Poettering c45d11cb30 service: reorder sd_notify() handling a bit
Let's keep handling of WATCHDOG= and WATCHDOG_USEC= together. No
functional changes.
2017-11-27 16:59:52 +01:00
Lennart Poettering e328523777 service: split out sd_notify() message authorization code into a function of its own
Let's shorten service_notify_message() a bit, and do the authentication
outside of the main function body.

No functional changes.
2017-11-27 16:59:52 +01:00
Lennart Poettering 9711848ff1 core: only log about sd_notify() message contents, when debug logging is on
Let's optimize things a bit for the non-debug case. No change in
behaviour.

Main reason to do this is not so much the speed benefit though, but
merely to isolate the code from its surroundings more.
2017-11-27 16:39:43 +01:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Lennart Poettering e98b2fbbe9 core: generalize the cgroup empty check on GC
Let's move the cgroup empty check for all unit types into the generic
unit_check_gc() call, out of the per-unit-type _check_gc() type. This
not only allows us to share some code, but also hooks up mount and
socket units with this kind of check, for free, as it was missing there
previously.
2017-11-25 17:08:21 +01:00
Lennart Poettering e9a4f67609 cgroup: remove logic for maintaining /control subcgroup for the service unit type
Previously, in the service unit type we ran all control processes in a
special subcgroup /control of the unit's main cgroup. Remove that, and
run the control program in the main cgroup instead.

The concept conflicts with cgroupv2's logic of "no processes in inner
nodes": if a unit has a main daemon process running in the main cgroup,
and a reload control process would be started in the /control subcgroup,
then this would necessarily fail, as the main daemon process would
become an inner node process that way.

We could in theory continue to support this in cgroupv1, but in the
interest in keeping behaviour similar in both hierarchies, let's drop
this altogether.

Philosophically maybe it wasn't the greatest idea anyway to just go
berserk and SIGKILL all those processes — loud warning logging might
have sufficed, too.
2017-11-25 17:08:21 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Zbigniew Jędrzejewski-Szmek 82a27ba821
Merge pull request #7389 from shawnl/warning
tree-wide: adjust fall through comments so that gcc is happy
2017-11-22 07:38:51 +01:00
Lennart Poettering 3c7416b6ca core: unify common code for preparing for forking off unit processes
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.

This allows us to neatly unify a bit of code between unit types and
shorten our code.
2017-11-21 11:54:08 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Lennart Poettering 53c35a766f core: generalize FailureAction= move it from service to unit
All kinds of units can fail, hence it makes sense to offer this as
generic concept for all unit types.
2017-11-20 16:37:22 +01:00
Lennart Poettering 0133d5553a
Merge pull request #7198 from poettering/stdin-stdout
Add StandardInput=data, StandardInput=file:... and more
2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering f56e7bfe2b core: be more defensive if we can't determine per-connection socket peer (#7329)
Let's handle gracefully if a client disconnects very early on.

This builds on #4120, but relaxes the condition checks further, since we
getpeername() might already fail during ExecStartPre= and friends.

Fixes: #7172
2017-11-17 15:22:11 +01:00
Lennart Poettering 08f3be7a38 core: add two new unit file settings: StandardInputData= + StandardInputText=
Both permit configuring data to pass through STDIN to an invoked
process. StandardInputText= accepts a line of text (possibly with
embedded C-style escapes as well as unit specifiers), which is appended
to the buffer to pass as stdin, followed by a single newline.
StandardInputData= is similar, but accepts arbitrary base64 encoded
data, and will not resolve specifiers or C-style escapes, nor append
newlines.

This may be used to pass input/configuration data to services, directly
in-line from unit files, either in a cooked or in a more raw format.
2017-11-17 11:13:44 +01:00
Lennart Poettering 7eb2a8a125 unit: rework a bit how we keep the service fdstore from being destroyed during service restart
When preparing for a restart we quickly go through the DEAD/INACTIVE
service state before entering AUTO_RESTART. When doing this, we need to
make sure we don't destroy the FD store. Previously this was done by
checking the failure state of the unit, and keeping the FD store around
when the unit failed, under the assumption that the restart logic will
then get into action.

This is not entirely correct howver, as there might be failure states
that will no result in restarts.

With this commit we slightly alter the logic: a ref counter for the fd
store is added, that is increased right before we handle the restart
logic, and decreased again right-after.

This should ensure that the fdstore lives exactly as long as it needs.

Follow-up for f0bfbfac43.
2017-11-16 14:37:33 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one exisiting unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Lennart Poettering eef85c4a3f core: track why unit dependencies came to be
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.

The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.

Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.

Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:

1. We can fix UDEV_WANTS dependency generation: so far we kept adding
   dependencies configured that way, but if a device lost such a
   dependency we couldn't them again as there was no scheme for removing
   of dependencies in place.

2. We can implement "pin-pointed" reload of unit files. If we know what
   dependencies were created as result of configuration in a unit file,
   then we know what to flush out when we want to reload it.

3. It's useful for debugging: "systemd-analyze dump" now shows
   this information, helping substantially with understanding how
   systemd's dependency tree came to be the way it came to be.
2017-11-10 19:45:29 +01:00
Alan Jenkins 4cd9fa8176 core: failure to spawn ExecStartPost should not run ExecStop
Failure to spawn ExecStartPost was being handled differently to e.g.
EXIT_FAILURE returned by ExecStartPost.  It looks like this was an
oversight.  Fix to match documented behaviour.

`man systemd.service`:

> Note that if any of the commands specified in ExecStartPre=, ExecStart=,
> or ExecStartPost= fail (and are not prefixed with "-", see above) or time
> out before the service is fully up, execution continues with commands
> specified in ExecStopPost=, the commands in ExecStop= are skipped.
2017-11-01 15:28:50 +00:00
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00
Jouke Witteveen df66b93fe2 service: better detect when a Type=notify service cannot become active anymore (#6959)
No need to wait for a timeout when we know things are not going to work out.
When the main process goes away and only notifications from the main process are
accepted, then we will not receive any notifications anymore.
2017-10-02 16:35:27 +02:00
Zbigniew Jędrzejewski-Szmek b139c95bc4 Merge pull request #6941 from andir/use-in_set
use IN_SET where possible
2017-10-02 15:08:10 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
Lennart Poettering b13ddbbcf3 service: accept the fact that the three xyz_good() functions return ints
Currently, all three of cgroup_good(), main_pid_good(),
control_pid_good() all return an "int" (two of them propagate errors).
It's a good thing to keep the three functions similar, so let's leave it
at that, but then let's clean up the invocation of the three functions
so that they always clearly acknowledge that the return value is not a
bool, but potentially negative.
2017-10-02 12:58:42 +02:00
Lennart Poettering 019be28676 service: drop _pure_ decorator on static function
The compiler should be good enough to figure this out on its own if this
is a static function, and it makes control_pid_good() an outlier anyway,
and decorators like this tend to bitrot. Hence, to keep things simple
and automatic, let's just drop the decorator.
2017-10-02 12:58:42 +02:00
Lennart Poettering 3c751b1bfa service: a cgroup empty notification isn't reason enough to go down
The processes associated with a service are not just the ones in its
cgroup, but also the control and main processes, which might possibly
live outside of it, for example if they transitioned into their own
cgroups because they registered a PAM session of their own. Hence, if we
get a cgroup empty notification always check if the main PID is still
around before taking action too eagerly.

Fixes: #6045
2017-10-02 12:58:42 +02:00
Lennart Poettering 07697d7ec5 service: add explanatory comments to control_pid_good() and cgroup_good()
Let's add a similar comment to each as we already have for
main_pid_good(), emphasizing that these functions are supposed to be
have very similar.
2017-10-02 12:58:42 +02:00
Lennart Poettering 51894d706f service: fix main_pid_good() comment
We don't actually return -1, don't claim that.
2017-10-02 12:58:37 +02:00
Lennart Poettering ed77d407d3 core: log unit failure with type-specific result code
This slightly changes how we log about failures. Previously,
service_enter_dead() would log that a service unit failed along with its
result code, and unit_notify() would do this again but without the
result code. For other unit types only the latter would take effect.

This cleans this up: we keep the message in unit_notify() only for debug
purposes, and add type-specific log lines to all our unit types that can
fail, and always place them before unit_notify() is invoked.

Or in other words: the duplicate log message for service units is
removed, and all other unit types get a more useful line with the
precise result code.
2017-09-27 18:26:18 +02:00
Lennart Poettering 09e2465407 cgroup: after determining that a cgroup is empty, asynchronously dispatch this
This makes sure that if we learn via inotify or another event source
that a cgroup is empty, and we checked that this is indeed the case (as
we might get spurious notifications through inotify, as the inotify
logic through the "cgroups.event" is pretty unspecific and might be
trigger for a variety of reasons), then we'll enqueue a defer event for
it, at a priority lower than SIGCHLD handling, so that we know for sure
that if there's waitid() data for a process we used it before
considering the cgroup empty notification.

Fixes: #6608
2017-09-27 18:26:18 +02:00
Lennart Poettering a6951a5079 service: rework service_kill_control_processes()
Let's make sure we explicitly also kill any control process we know of,
given that it might have moved outside of our control group.
2017-09-26 16:17:22 +02:00
Lennart Poettering f1c50becda core: make sure to log invocation ID of units also when doing structured logging 2017-09-22 15:24:55 +02:00
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Lennart Poettering 18f573aaf9 core: make sure to dump cgroup context when unit_dump() is called for all unit types
For some reason we didn't dump the cgroup context for a number of unit
types, including service units. Not sure how this wasn't noticed
before... Add this in.
2017-09-22 15:24:54 +02:00
Evgeny Vereshchagin 7f92388482 core: serialize n-restarts and flush-n-restarts correctly (#6736)
This makes n-restarts and flush-n-restarts survive `systemctl daemon-[reload|rexec]`.
2017-09-04 15:36:01 +02:00
Lennart Poettering 0f52f8e552 core: disable the effect of Restart= if there's a stop job pending for a service (#6581)
We shouldn't undo the job already enqueued, under any circumstances.

Fixes: #6504
2017-08-26 22:07:23 +09:00
Yu Watanabe 2c5ad0fd6d Merge pull request #6577 from poettering/more-exec-flags
add ! and !! ExecStart= flags to make ambient caps useful
2017-08-26 21:49:05 +09:00
Michal Sekletar b58aeb70db service: attempt to execute next main command only for oneshot services (#6619)
This commit fixes crash described in
https://github.com/systemd/systemd/issues/6533

Multiple ExecStart lines are allowed only for oneshot services
anyway so it doesn't make sense to call service_run_next_main() with
services of type other than SERVICE_ONESHOT.

Referring back to reproducer from the issue, previously we didn't observe
this problem because s->main_command was reset after daemon-reload hence
we never reached the assert statement in service_run_next_main().

Fixes #6533
2017-08-25 16:36:10 +03:00
Lennart Poettering 1703fa41a7 core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
2017-08-10 15:02:50 +02:00
Lennart Poettering f0d477979e core: introduce unit_set_exec_params()
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
2017-08-10 15:02:50 +02:00
Lennart Poettering 19bbdd985e core: manager_set_exec_params() cannot fail, hence make it void
Let's simplify things a bit.
2017-08-10 15:02:50 +02:00
Lennart Poettering 584b8688d1 execute: also fold the cgroup delegate bit into ExecFlags 2017-08-10 15:02:50 +02:00
Lennart Poettering ac6479781e execute: also control the SYSTEMD_NSS_BYPASS_BUS through an ExecFlags field
Also, correct the logic while we are at it: the variable is only
required for system services, not user services.
2017-08-10 15:02:49 +02:00
Lennart Poettering 5bf7569cf8 service: let's set EXEC_NEW_KEYRING through SET_FLAG()
Not that it really matters, but it matches how we set the flags in
manager_set_exec_params() too.
2017-08-10 15:02:49 +02:00
Lennart Poettering 3ed0cd26ea execute: replace command flag bools by a flags field
This way, we can extend it later on in an easier way, and can pass it
along nicely.
2017-08-10 14:44:58 +02:00
Lennart Poettering 7a0019d373 core: introduce a restart counter (#6495)
This adds a per-service restart counter. Each time an automatic
restart is scheduled (due to Restart=) it is increased by one. Its
current value is exposed over the bus as NRestarts=. It is also logged
(in a structured, recognizable way) on each restart.

Note that this really only counts automatic starts triggered by Restart=
(which it nicely complements). Manual restarts will reset the counter,
as will explicit calls to "systemctl reset-failed". It's supposed to be
a tool for measure the automatic restart feature, and nothing else.

Fixes: #4126
2017-08-09 21:12:55 +02:00
Jouke Witteveen 15d167f8a3 core: propagate reload from RELOADING=1 notification (#6550) 2017-08-07 11:27:24 +02:00
Zbigniew Jędrzejewski-Szmek a132bef023 Drop kdbus bits
Some kdbus_flag and memfd related parts are left behind, because they
are entangled with the "legacy" dbus support.

test-bus-benchmark is switched to "manual". It was already broken before
(in the non-kdbus mode) but apparently nobody noticed. Hopefully it can
be fixed later.
2017-07-23 12:01:54 -04:00
Lennart Poettering df0ff12775 tree-wide: make use of getpid_cached() wherever we can
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
2017-07-20 20:27:24 +02:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Yu Watanabe 53f47dfc7b core: allow preserving contents of RuntimeDirectory= over process restart
This introduces RuntimeDirectoryPreserve= option which takes a boolean
argument or 'restart'.

Closes #6087.
2017-07-17 16:22:25 +09:00
Lennart Poettering 9efb9df9e3 core: make NotifyAccess= and FileDescriptorStoreMax= available to transient services
This is helpful for debugging/testing #5606.
2017-06-26 15:14:41 +02:00
Lennart Poettering 3ceb72e558 core: permit FDSTORE=1 messages with non-pollable fds
This also alters the documentation to recommend memfds rather than /run
for serializing state across reboots. That's because /run doesn't
actually have the same lifecycle as the fd store, as it is cleared out
on restarts.

Fixes: #5606
2017-06-26 15:14:41 +02:00
Franck Bui 4c47affcf1 core: remove the redundancy of 'n_fds' and 'n_storage_fds' in ExecParameters struct
'n_fds' field in the ExecParameters structure was counting the total number of
file descriptors to be passed to a unit.

This counter also includes the number of passed socket fds which is counted by
'n_socket_fds' already.

This patch removes that redundancy by replacing 'n_fds' with
'n_storage_fds'. The new field only counts the fds passed via the storage store
mechanism.  That way each fd is counted at one place only.

Subsequently the patch makes sure to fix code that used 'n_fds' and also wanted
to iterate through all of them by explicitly adding 'n_socket_fds' + 'n_storage_fds'.

Suggested by Lennart.
2017-06-08 16:21:35 +02:00
Franck Bui 9b1419111a core: only apply NonBlocking= to fds passed via socket activation
Make sure to only apply the O_NONBLOCK flag to the fds passed via socket
activation.

Previously the flag was also applied to the fds which came from the fd store
but this was incorrect since services, after being restarted, expect that these
passed fds have their flags unchanged and can be reused as before.

The documentation was a bit unclear about this so clarify it.
2017-06-06 22:42:50 +02:00
Thomas Hindoe Paaboel Andersen 6eeec374c1 tree-wide: remove unused variables 2017-04-28 23:56:44 +02:00
Lennart Poettering 8ea9aa9e88 Merge pull request #5354 from msekletar/issue-518
service: serialize information about currently executing command
2017-04-24 19:51:34 +02:00
Zbigniew Jędrzejewski-Szmek ba360bb05c tree-wide: mark log_struct with _printf_ and fix fallout
log_struct takes multiple format strings, each one followed by arguments.
The _printf_ annotation is not sufficiently flexible to express this,
but we can still annotate the first format string, though not its
arguments (because their number is unknown).

With the annotation, the places which specified the message id or similar
as the first pattern cause a warning from -Wformat-nonliteral. This can
be trivially fixed by putting the MESSAGE= first.

This change will help find issues where a non-literal is erroneously used
as the pattern.
2017-04-21 13:37:04 -04:00
Michal Sekletar e266c068b5 service: serialize information about currently executing command
Stored information will help us to resume execution after the
daemon-reload.

This commit implements following scheme,

* On serialization:
  - we count rank of the currently executing command
  - we store command type, its rank and command line arguments

* On deserialization:
  - configuration is parsed and loaded
  - we deserialize stored data, command type, rank and arguments
  - we look at the given rank in the list and if command there has same
    arguments then we restore execution at that point
  - otherwise we search respective command list and we look for command
    that has the same arguments
  - if both methods fail we do not do not resume execution at all

To better illustrate how does above scheme works, please consider
following cases (<<< denotes position where we resume execution after reload)

; Original unit file
[Service]
ExecStart=/bin/true <<<
ExecStart=/bin/false

; Swapped commands
; Second command is not going to be executed
[Service]
ExecStart=/bin/false
ExecStart=/bin/true <<<

; Commands added before
; Same commands are problematic and execution could be restarted at wrong place
[Service]
ExecStart=/bin/foo
ExecStart=/bin/bar
ExecStart=/bin/true <<<
ExecStart=/bin/false

; Commands added after
; Same commands are not an issue in this case
[Service]
ExecStart=/bin/true <<<
ExecStart=/bin/false
ExecStart=/bin/foo
ExecStart=/bin/bar

; New commands interleaved with old commands
; Some new commands will be executed while others won't
ExecStart=/bin/foo
ExecStart=/bin/true <<<
ExecStart=/bin/bar
ExecStart=/bin/false

As you can see, above scheme has some drawbacks. However, in most
cases (we assume that in most common case unit file command list is not
changed while some other command is running for the same unit) it
should cause that systemd does the right thing, which is restoring
execution exactly at the point we were before daemon-reload.

Fixes #518
2017-04-11 09:22:25 +02:00
Lennart Poettering 6939ce648a service: refuse using PID 1 as MAINPID for a service 2017-02-28 16:08:40 +01:00
Lennart Poettering e8b509d3be service: make use of log_unit_warning_errno()'s return value 2017-02-28 16:08:21 +01:00
Lennart Poettering 7c102d6092 core: use PID_FMT where appropriate 2017-02-28 16:07:56 +01:00
Lennart Poettering c22800e40e cgroup: rename cg_unified() → cg_unified_controller()
cg_unified() is a bit generic a name, let's make clear that it checks
whether a specified controller is in unified mode.
2017-02-24 18:00:04 +01:00
Lennart Poettering b4cccbc13a cgroup: change cg_unified() to possibly return errors again
We use our cgroup APIs in various contexts, including from our libraries
sd-login, sd-bus. As we don#t control those environments we can't rely
that the unified cgroup setup logic succeeds, and hence really shouldn't
assert on it.

This more or less reverts 415fc41cea.
2017-02-24 17:52:58 +01:00
Tejun Heo 415fc41cea core: simplify cg_[all_]unified()
cg_[all_]unified() test whether a specific controller or all controllers are on
the unified hierarchy.  While what's being asked is a simple binary question,
the callers must assume that the functions may fail any time, which
unnecessarily complicates their usages.  This complication is unnecessary.
Internally, the test result is cached anyway and there are only a few places
where the test actually needs to be performed.

This patch simplifies cg_[all_]unified().

* cg_[all_]unified() are updated to return bool.  If the result can't be
  decided, assertion failure is triggered.  Error handlings from their callers
  are dropped.

* cg_unified_flush() is updated to calculate the new result synchrnously and
  return whether it succeeded or not.  Places which need to flush the test
  result are updated to test for failure.  This ensures that all the following
  cg_[all_]unified() tests succeed.

* Places which expected possible cg_[all_]unified() failures are updated to
  call and test cg_unified_flush() before calling cg_[all_]unified().  This
  includes functions used while setting up mounts during boot and
  manager_setup_cgroup().
2017-02-18 17:51:13 -05:00
Stefan Hajnoczi 359a5bcf78 core: add AF_VSOCK support to socket units
Accept AF_VSOCK listen addresses in socket unit files.  Both guest and
host can now take advantage of socket activation.

The QEMU guest agent has recently been modified to support socket
activation and can run over AF_VSOCK with this patch.
2017-01-10 15:29:04 +00:00
Stefan Hajnoczi 882ac6e769 socket-util: introduce port argument in sockaddr_port()
sockaddr_port() either returns a >= 0 port number or a negative errno.
This works for AF_INET and AF_INET6 because port ranges are only 16-bit.

In AF_VSOCK ports are 32-bit so an int cannot represent all port number
and negative errnos.  Separate the port and the return code.
2017-01-10 15:29:04 +00:00
Lennart Poettering 74dd6b515f core: run each system service with a fresh session keyring
This patch ensures that each system service gets its own session kernel keyring
automatically, and implicitly. Without this a keyring is allocated for it
on-demand, but is then linked with the user's kernel keyring, which is OK
behaviour for logged in users, but not so much for system services.

With this change each service gets a session keyring that is specific to the
service and ceases to exist when the service is shut down. The session keyring
is not linked up with the user keyring and keys hence only search within the
session boundaries by default.

(This is useful in a later commit to store per-service material in the keyring,
for example the invocation ID)

(With input from David Howells)
2016-12-13 20:59:10 +01:00
Zbigniew Jędrzejewski-Szmek 1ac7a93574 Merge pull request #4835 from poettering/unit-name-printf
Various specifier resolution fixes.
2016-12-10 01:29:52 -05:00
Lennart Poettering 5125e76243 core: move specifier expansion out of service.c/socket.c
This monopolizes unit file specifier expansion in load-fragment.c, and removes
it from socket.c + service.c. This way expansion becomes an operation done exclusively at time of loading unit files.

Previously specifiers were resolved for all settings during loading of unit
files with the exception of ExecStart= and friends which were resolved in
socket.c and service.c. With this change the latter is also moved to the
loading of unit files.

Fixes: #3061
2016-12-07 18:47:32 +01:00
Jouke Witteveen c3fda31da3 service: go through stop_post on failure (#4770) 2016-12-06 14:02:36 +01:00
Jouke Witteveen 6375bd2007 service: new NotifyAccess= value for control processes (#4212)
Setting NotifyAccess=exec allows notifications coming directly from any
control process.
2016-11-29 23:20:04 +01:00
Jouke Witteveen 3c9512c71d service: prevent registering control pids as the main pid
We assume a process can be only one of the two in service_sigchld_event.
2016-11-29 10:34:33 +01:00
Jouke Witteveen 71e529fcf1 service: only fail notify services on empty cgroup during start
We stay in the SERVICE_START while no READY=1 notification message has
been received. When we are in the SERVICE_START_POST state, we have
already received a ready notification. Hence we should not fail when the
cgroup becomes empty in that state.
2016-11-29 10:34:33 +01:00
Jouke Witteveen 3d474ef7a6 service: fix main processes exit behavior for type notify services
Before this commit, when the main process of a Type=notify service exits the
service would enter a running state without passing through the startup post
state. This meant ExecStartPost= from being executed and allowed follow-up
units to start too early (before the ready notification).
Additionally, when RemainAfterExit=yes is used on a Type=notify service, the
exit status of the main process would be disregarded.

After this commit, an unsuccessful exit of the main process of a Type=notify
service puts the unit in a failed state. A successful exit is inconsequential
in case RemainAfterExit=yes. Otherwise, when no ready notification has been
received, the unit is put in a failed state because it has never been active.
When all processes in the cgroup of a Type=notify service are gone and no ready
notification has been received yet, the unit is also put in a failed state.
2016-11-22 17:54:27 +01:00
Jouke Witteveen c35755fb87 service: introduce protocol error type
Introduce a SERVICE_FAILURE_PROTOCOL error type for when a service does
not follow the protocol.
This error type is used when a pid file is expected, but not delivered.
2016-11-22 17:54:27 +01:00
Franck Bui 7d5ceb6416 core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).

This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
2016-11-17 18:16:16 +01:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Zbigniew Jędrzejewski-Szmek b09246352f pid1: fix fd memleak when we hit FileDescriptorStoreMax limit
Since service_add_fd_store() already does the check, remove the redundant check
from service_add_fd_store_set().

Also, print a warning when repopulating FDStore after daemon-reexec and we hit
the limit. This is a user visible issue, so we should not discard fds silently.
(Note that service_deserialize_item is impacted by the return value from
service_add_fd_store(), but we rely on the general error message, so the caller
does not need to be modified, and does not show up in the diff.)
2016-11-02 15:07:17 -04:00
Zbigniew Jędrzejewski-Szmek f0bfbfac43 core: when restarting services, don't close fds
We would close all the stored fds in service_release_resources(), which of
course broke the whole concept of storing fds over service restart.

Fixes #4408.
2016-11-01 21:20:21 -04:00
Zbigniew Jędrzejewski-Szmek 16f70d6362 pid1: nicely log when doing operation on stored fds
Should help with debugging #4408.
2016-10-28 22:45:05 -04:00
Zbigniew Jędrzejewski-Szmek 9021ff17e2 pid1: only log about added fd if it was really added
If it was a duplicate, log nothing.
2016-10-28 22:45:05 -04:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 7d78f7cea8 Merge pull request #4428 from lnykryn/ctrl_v2
rename failure-action to emergency-action and use it for ctrl+alt+del burst
2016-10-22 23:16:11 -04:00
Lukas Nykryn 87a47f99bc failure-action: generalize failure action to emergency action 2016-10-21 15:13:50 +02:00
Lennart Poettering 47fffb3530 core: if the start command vanishes during runtime don't hit an assert
This can happen when the configuration is changed and reloaded while we are
executing a service. Let's not hit an assert in this case.

Fixes: #4444
2016-10-21 12:27:46 +02:00
Lennart Poettering 5368222db6 core: let's upgrade the log level for service processes dying of signal (#4415)
As suggested in
https://github.com/systemd/systemd/pull/4367#issuecomment-253670328
2016-10-19 19:48:35 -04:00
Zbigniew Jędrzejewski-Szmek 3b319885c4 tree-wide: introduce free_and_replace helper
It's a common pattern, so add a helper for it. A macro is necessary
because a function that takes a pointer to a pointer would be type specific,
similarly to cleanup functions. Seems better to use a macro.
2016-10-16 23:35:39 -04:00
Zbigniew Jędrzejewski-Szmek b744e8937c Merge pull request #4067 from poettering/invocation-id
Add an "invocation ID" concept to the service manager
2016-10-11 13:40:50 -04:00
Lennart Poettering 1f0958f640 core: when determining whether a process exit status is clean, consider whether it is a command or a daemon
SIGTERM should be considered a clean exit code for daemons (i.e. long-running
processes, as a daemon without SIGTERM handler may be shut down without issues
via SIGTERM still) while it should not be considered a clean exit code for
commands (i.e. short-running processes).

Let's add two different clean checking modes for this, and use the right one at
the appropriate places.

Fixes: #4275
2016-10-10 22:57:01 +02:00
Lennart Poettering 41e2036eb8 exit-status: kill is_clean_exit_lsb(), move logic to sysv-generator
Let's get rid of is_clean_exit_lsb(), let's move the logic for the special
handling of the two LSB exit codes into the sysv-generator by writing out
appropriate SuccessExitStatus= lines if the LSB header exists. This is not only
semantically more correct, bug also fixes a bug as the code in service.c that
chose between is_clean_exit_lsb() and is_clean_exit() based this check on
whether a native unit files was available for the unit. However, that check was
bogus since a long time, since the SysV generator was introduced and native
SysV script support was removed from PID 1, as in that case a unit file always
existed.
2016-10-10 21:48:08 +02:00
Lennart Poettering 4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.

The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
2016-10-07 20:14:38 +02:00
Kyle Russell 7dd736abec service: fixup ExecStop for socket-activated shutdown (#4120)
Previous fix didn't consider handling multiple ExecStop commands.
2016-09-10 08:55:36 +03:00
Kyle Russell f2dbd059a6 service: Continue shutdown on socket activated unit on termination (#4108)
ENOTCONN may be a legitimate return code if the endpoint disappeared,
but the service should still attempt to shutdown cleanly.
2016-09-09 05:34:43 +03:00
Zbigniew Jędrzejewski-Szmek 2056ec1927 Merge pull request #3965 from htejun/systemd-controller-on-unified 2016-08-19 19:58:01 -04:00
Lennart Poettering 00d9ef8560 core: add RemoveIPC= setting
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e.  all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.

This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.

In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
2016-08-19 00:37:25 +02:00
Tejun Heo 5da38d0768 core: use the unified hierarchy for the systemd cgroup controller hierarchy
Currently, systemd uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, systemd uses a named legacy hierarchy
mounted on /sys/fs/cgroup/systemd without any kernel controllers for process
management.  Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.

Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for systemd to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies.  It can simply mount the unified hierarchy under
/sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test.  In time, this would also allow deleting the said
complexities.

This patch updates systemd so that it prefers the unified hierarchy for the
systemd cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.

* cg_unified(@controller) is introduced which tests whether the specific
  controller in on unified hierarchy and used to choose the unified hierarchy
  code path for process and service management when available.  Kernel
  controller specific operations remain gated by cg_all_unified().

* "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to
  force the use of legacy hierarchy for systemd cgroup controller.

* nspawn: By default nspawn uses the same hierarchies as the host.  If
  UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all.  If
  0, legacy for all.

* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
  three options - legacy, only systemd controller on unified, and unified.  The
  value is passed into mount setup functions and controls cgroup configuration.

* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
  option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
  appropriate action depending on the configuration of the host.

v2: - CGroupUnified enum replaces open coded integer values to indicate the
      cgroup operation mode.
    - Various style updates.

v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.

v4: Restored legacy container on unified host support and fixed another bug in
    detect_unified_cgroup_hierarchy().
2016-08-17 17:44:36 -04:00
Tejun Heo ca2f6384aa core: rename cg_unified() to cg_all_unified()
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies.  This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is.  In preparation, this patch renames cg_unified() to
cg_all_unified().

This patch doesn't cause any functional changes.
2016-08-15 18:13:36 -04:00
Zbigniew Jędrzejewski-Szmek 3bb81a80bd Merge pull request #3818 from poettering/exit-status-env
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
2016-08-05 20:55:08 -04:00
Zbigniew Jędrzejewski-Szmek 3ebcd323bd systemd: do not serialize peer, bump count when deserializing socket instead 2016-08-05 08:16:31 -04:00
Zbigniew Jędrzejewski-Szmek 9dfb64f87d core/service: serialize and deserialize accept_socket
This fixes an issue during reexec — the count of connections would be lost:

[zbyszek@fedora-rawhide ~]$ systemctl status testlimit.socket | grep Connected
 Accepted: 1; Connected: 1

[zbyszek@fedora-rawhide ~]$ sudo systemctl daemon-reexec
[zbyszek@fedora-rawhide ~]$ systemctl status testlimit.socket | grep Connected
 Accepted: 1; Connected: 0

With the patch, Connected count is preserved.

Also add "Accept Socket" to the dump output for services.
2016-08-05 08:16:31 -04:00
Lennart Poettering b08af3b127 core: only set the watchdog variables in ExecStart= lines 2016-08-04 23:08:05 +02:00
Lennart Poettering a0fef983ab core: remember first unit failure, not last unit failure
Previously, the result value of a unit was overriden with each failure that
took place, so that the result always reported the last failure that took
place.

With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchodg" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this from all unit type to store the first, not
the last failure.

This addresses the issue pointed out here:

https://github.com/systemd/systemd/pull/3818#discussion_r73433520
2016-08-04 23:08:05 +02:00
Lennart Poettering 136dc4c435 core: set $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS in ExecStop=/ExecStopPost= commands
This should simplify monitoring tools for services, by passing the most basic
information about service result/exit information via environment variables,
thus making it unnecessary to retrieve them explicitly via the bus.
2016-08-04 23:08:05 +02:00
Lennart Poettering 9c1a61adba core: move masking of chroot/permission masking into service_spawn()
Let's fix up the flags fields in service_spawn() rather than its callers, in
order to simplify things a bit.
2016-08-04 16:27:07 +02:00
Lennart Poettering c39f1ce24d core: turn various execution flags into a proper flags parameter
The ExecParameters structure contains a number of bit-flags, that were so far
exposed as bool:1, change this to a proper, single binary bit flag field. This
makes things a bit more expressive, and is helpful as we add more flags, since
these booleans are passed around in various callers, for example
service_spawn(), whose signature can be made much shorter now.

Not all bit booleans from ExecParameters are moved into the flags field for
now, but this can be added later.
2016-08-04 16:27:07 +02:00
Susant Sahani 9d56542764 socket: add support to control no. of connections from one source (#3607)
Introduce MaxConnectionsPerSource= that is number of concurrent
connections allowed per IP.

RFE: 1939
2016-08-02 13:48:23 -04:00
Lennart Poettering 29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00