Commit Graph

219 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek aac99f303a core: introduce a helper function to wrap unit_log_{success,failure}
It's inline so that the compiler can easily optimize away the call to get
status string.
2018-11-16 19:47:07 +01:00
Lennart Poettering 523ee2d414 core: log a recognizable message when a unit succeeds, too
We already are doing it on failure, let's do it on success, too.

Fixes: #10265
2018-11-16 15:22:48 +01:00
Lennart Poettering 91bbd9b796 core: make log messages about unit processes exiting recognizable 2018-11-16 15:22:48 +01:00
Lennart Poettering 7c047d7443 core: make log messages about units entering a 'failed' state recognizable
Let's make this recognizable, and carry result information in a
structure fashion.
2018-11-16 15:22:48 +01:00
Lennart Poettering 33a3fdd978 core: move unit_status_emit_starting_stopping_reloading() and related calls to job.c
This call is only used by job.c and very specific to job handling.
Moreover the very similar logic of job_emit_status_message() is already
in job.c.

Hence, let's clean this up, and move both sets of functions to job.c,
and rename them a bit so that they express precisely what they do:

1. unit_status_emit_starting_stopping_reloading() →
   job_emit_begin_status_message()
2. job_emit_status_message() → job_emit_done_status_message()

The first call is after all what we call when we begin with the
execution of a job, and the second call what we call when we are done
wiht it.

Just some moving and renaming, not other changes, and hence no change in
behaviour.
2018-11-16 15:22:48 +01:00
Lennart Poettering 6529ccfa20 unit: replace three non-type-safe macros by type-safe inline functions
Behaviour is prett ymuch the same, but there's some additional type
checking done on the input parameters.

(In the case of UNIT_WRITE_FLAGS_NOOP() the C compiler won't actually do
the type checking necessarily, but static chckers at least could)
2018-11-08 13:55:25 +01:00
Lennart Poettering bbf1120623 unit: make UNIT() cast function deal with NULL pointers
Fixes: #10681
2018-11-08 10:47:08 +01:00
Lennart Poettering 1ad6e8b302 core: split environment block mantained by PID 1's Manager object in two
This splits the "environment" field of Manager into two:
transient_environment and client_environment. The former is generated
from configuration file, kernel cmdline, environment generators. The
latter is the one the user can control with "systemctl set-environment"
and similar.

Both sets are merged transparently whenever needed. Separating the two
sets has the benefit that we can safely flush out the former while
keeping the latter during daemon reload cycles, so that env var settings
from env generators or configuration files do not accumulate, but
dynamic API changes are kept around.

Note that this change is not entirely transparent to users: if the user
first uses "set-environment" to override a transient variable, and then
uses "unset-environment" to unset it again things will revert to the
original transient variable now, while previously the variable was fully
removed. This change in behaviour should not matter too much though I
figure.

Fixes: #9972
2018-10-31 18:00:53 +01:00
Lennart Poettering d68c645bd3 core: rework serialization
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.

In order to implement this all serialization functions are move to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.

While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end.
2018-10-26 10:52:41 +02:00
Lennart Poettering 8948b3415d core: when deserializing state always use read_line(…, LONG_LINE_MAX, …)
This should be much better than fgets(), as we can read substantially
longer lines and overly long lines result in proper errors.

Fixes a vulnerability discovered by Jann Horn at Google.

CVE-2018-15686
LP: #1796402
https://bugzilla.redhat.com/show_bug.cgi?id=1639071
2018-10-26 10:40:01 +02:00
Anita Zhang 90fc172e19 core: implement per unit journal rate limiting
Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for
services. If provided, these values get passed to the journald
client context, and those values are used in the rate limiting
function in the journal over the the journald.conf values.

Part of #10230
2018-10-18 09:56:20 +02:00
Roman Gushchin 084c700780 core: support cgroup v2 device controller
Cgroup v2 provides the eBPF-based device controller, which isn't currently
supported by systemd. This commit aims to provide such support.

There are no user-visible changes, just the device policy and whitelist
start working if cgroup v2 is used.
2018-10-09 09:47:51 -07:00
Roman Gushchin 17f149556a core: refactor bpf firewall support into a pseudo-controller
The idea is to introduce a concept of bpf-based pseudo-controllers
to make adding new bpf-based features easier.
2018-10-09 09:46:08 -07:00
Lennart Poettering 334415b16e
Merge pull request #10094 from keszybz/wants-loading
Fix bogus fragment paths in units in .wants/.requires
2018-10-05 17:36:31 +02:00
Anita Zhang c87700a133 Make Watchdog Signal Configurable
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.

Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.

Closes #8658
2018-09-26 16:14:29 +02:00
Zbigniew Jędrzejewski-Szmek 5a72417084 pid1: drop unused path parameter to add_two_dependencies_by_name() 2018-09-15 20:02:00 +02:00
Zbigniew Jędrzejewski-Szmek 35d8c19ace pid1: drop now-unused path parameter to add_dependency_by_name() 2018-09-15 19:57:52 +02:00
Zbigniew Jędrzejewski-Szmek fda09318e3 core: rename function to better reflect semantics 2018-08-20 10:43:31 +02:00
Lennart Poettering a3c1168ac2 core: rework StopWhenUnneeded= logic
Previously, we'd act immediately on StopWhenUnneeded= when a unit state
changes. With this rework we'll maintain a queue instead: whenever
there's the chance that StopWhenUneeded= might have an effect we enqueue
the unit, and process it later when we have nothing better to do.

This should make the implementation a bit more reliable, as the unit notify event
cannot immediately enqueue tons of side-effect jobs that might
contradict each other, but we do so only in a strictly ordered fashion,
from the main event loop.

This slightly changes the check when to consider a unit "unneeded".
Previously, we'd assume that a unit in "deactivating" state could also
be cleaned up. With this new logic we'll only consider units unneeded
that are fully up and have no job queued. This means that whenever
there's something pending for a unit we won't clean it up.
2018-08-10 16:19:01 +02:00
Yu Watanabe 845d247a3d tree-wide: use 'signed int' instead of 'int' for bit field variables
Suggested by LGTM: https://lgtm.com/rules/1506024027114/
2018-06-28 10:16:51 +02:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering ef31828d06 tree-wide: unify how we define bit mak enums
Let's always write "1 << 0", "1 << 1" and so on, except where we need
more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force
64bit values.
2018-06-12 21:44:00 +02:00
Lennart Poettering 04eb582acc core: enumerate perpetual units in a separate per-unit-type method
Previously the enumerate() callback defined for each unit type would do
two things:

1. It would create perpetual units (i.e. -.slice, system.slice, -.mount and
   init.scope)

2. It would enumerate units from /proc/self/mountinfo, /proc/swaps and
   the udev database

With this change these two parts are split into two seperate methods:
enumerate() now only does #2, while enumerate_perpetual() is responsible
for #1. Why make this change? Well, perpetual units should have a
slightly different effect that those found through enumeration: as
perpetual units should be up unconditionally, perpetually and thus never
change state, they should also not pull in deps by their state changing,
not even when the state is first set to active. Thus, their state is
generally initialized through the per-device coldplug() method in
similar  fashion to the deserialized state from a previous run would be
put into place. OTOH units found through regular enumeration should
result in state changes (and thus pull in deps due to state changes),
hence their state should be put in effect in the catchup() method
instead. Hence, given this difference, let's also separate the
functions, so that the rule is:

1. What is created in enumerate_perpetual() should be started in
   coldplug()

2. What is created in enumerate() should be started in catchup().
2018-06-07 15:29:17 +02:00
Lennart Poettering f0831ed2a0 core: add a new unit method "catchup()"
This is very similar to the existing unit method coldplug() but is
called a bit later. The idea is that that coldplug() restores the unit
state from before any prior reload/restart, i.e. puts the deserialized
state in effect. The catchup() call is then called a bit later, to
catch up with the system state for which we missed notifications while
we were reloading. This is only really useful for mount, swap and device
mount points were we should be careful to generate all missing unit
state change events (i.e. call unit_notify() appropriately) for
everything that happened while we were reloading.
2018-06-07 15:28:50 +02:00
Lennart Poettering bbf5fd8e41 core: subscribe to /etc/localtime timezone changes and update timer elapsation accordingly
Fixes: #8233

This is our first real-life usecase for the new sd_event_add_inotify()
calls we just added.
2018-06-06 10:53:56 +02:00
Lennart Poettering 50be4f4a46 core: rework how we track service and scope PIDs
This reworks how systemd tracks processes on cgroupv1 systems where
cgroup notification is not reliable. Previously, whenever we had reason
to believe that new processes showed up or got removed we'd scan the
cgroup of the scope or service unit for new processes, and would tidy up
the list of PIDs previously watched. This scanning is relatively slow,
and does not scale well. With this change behaviour is changed: instead
of scanning for new/removed processes right away we do this work in a
per-unit deferred event loop job. This event source is scheduled at a
very low priority, so that it is executed when we have time but does not
starve other event sources. This has two benefits: this expensive work is
coalesced, if events happen in quick succession, and we won't delay
SIGCHLD handling for too long.

This patch basically replaces all direct invocation of
unit_watch_all_pids() in scope.c and service.c with invocations of the
new unit_enqueue_rewatch_pids() call which just enqueues a request of
watching/tidying up the PID sets (with one exception: in
scope_enter_signal() and service_enter_signal() we'll still do
unit_watch_all_pids() synchronously first, since we really want to know
all processes we are about to kill so that we can track them properly.

Moreover, all direct invocations of unit_tidy_watch_pids() and
unit_synthesize_cgroup_empty_event() are removed too, when the
unit_enqueue_rewatch_pids() call is invoked, as the queued job will run
those operations too.

All of this is done on cgroupsv1 systems only, and is disabled on
cgroupsv2 systems as cgroup-empty notifications are reliable there, and
we do not need SIGCHLD events to track processes there.

Fixes: #9138
2018-06-05 22:06:48 +02:00
Lennart Poettering 2ad2e41a72 core: don't trigger OnFailure= deps when a unit is going to restart
This adds a flags parameter to unit_notify() which can be used to pass
additional notification information to the function. We the make the old
reload_failure boolean parameter one of these flags, and then add a new
flag that let's unit_notify() if we are configured to restart the
service.

Note that this adjusts behaviour of systemd to match what the docs say.

Fixes: #8398
2018-06-01 19:08:30 +02:00
Felipe Sateler 57b7a260c2 core: undo the dependency inversion between unit.h and all unit types 2018-05-15 14:24:34 -04:00
Lennart Poettering d4fd1cf208 core: enforce that scope units can be started only once
Scope units are populated from PIDs specified by the bus client. We do
that when a scope is started. We really shouldn't allow scopes to be
started multiple times, as the PIDs then might be heavily out of date.
Moreover, clients should have the guarantee that any scope they allocate
has a clear runtime cycle which is not repetitive.
2018-04-27 21:52:45 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Michal Sekletar 19496554e2 core: delay adding target dependencies until all units are loaded and aliases resolved (#8381)
Currently we add target dependencies while we are loading units. This
can create ordering loops even if configuration doesn't contain any
loop. Take for example following configuration,

$ systemctl get-default
multi-user.target

$ cat /etc/systemd/system/test.service
[Unit]
After=default.target

[Service]
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target

If we encounter such unit file early during manager start-up (e.g. load
queue is dispatched while enumerating devices due to SYSTEMD_WANTS in
udev rules) we would add stub unit default.target and we order it Before
test.service. At the same time we add implicit Before to
multi-user.target. Later we merge two units and we create ordering cycle
in the process.

To fix the issue we will now never add any target dependencies until we
loaded all the unit files and resolved all the aliases.
2018-03-23 15:28:06 +01:00
Zbigniew Jędrzejewski-Szmek dc409696cf Introduce _cleanup_(unit_freep) 2018-03-11 16:33:58 +01:00
Lennart Poettering aa2b6f1d2b bpf: rework how we keep track and attach cgroup bpf programs
So, the kernel's management of cgroup/BPF programs is a bit misdesigned:
if you attach a BPF program to a cgroup and close the fd for it it will
stay pinned to the cgroup with no chance of ever removing it again (or
otherwise getting ahold of it again), because the fd is used for
selecting which BPF program to detach. The only way to get rid of the
program again is to destroy the cgroup itself.

This is particularly bad for root the cgroup (and in fact any other
cgroup that we cannot realistically remove during runtime, such as
/system.slice, /init.scope or /system.slice/dbus.service) as getting rid
of the program only works by rebooting the system.

To counter this let's closely keep track to which cgroup a BPF program
is attached and let's implicitly detach the BPF program when we are
about to close the BPF fd.

This hence changes the bpf_program_cgroup_attach() function to track
where we attached the program and changes bpf_program_cgroup_detach() to
use this information. Moreover bpf_program_unref() will now implicitly
call bpf_program_cgroup_detach().

In order to simplify things, bpf_program_cgroup_attach() will now
implicitly invoke bpf_program_load_kernel() when necessary, simplifying
the caller's side.

Finally, this adds proper reference counting to BPF programs. This
is useful for working with two BPF programs in parallel: the BPF program
we are preparing for installation and the BPF program we so far
installed, shortening the window when we detach the old one and reattach
the new one.
2018-02-21 16:43:36 +01:00
Lennart Poettering a94ab7acfd
Merge pull request #8175 from keszybz/gc-cleanup
Garbage collection cleanup
2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek 7f7d01ed58 pid1: include the source unit in UnitRef
No functional change.

The source unit manages the reference. It allocates the UnitRef structure and
registers it in the target unit, and then the reference must be destroyed
before the source unit is destroyed. Thus, is should be OK to include the
pointer to the source unit, it should be live as long as the reference exists.

v2:
- rename refs to refs_by_target
2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek f2f725e5cc pid1: rename unit_check_gc to unit_may_gc
"check" is unclear: what is true, what is false? Let's rename to "can_gc" and
revert the return value ("positive" values are easier to grok).

v2:
- rename from unit_can_gc to unit_may_gc
2018-02-15 13:04:12 +01:00
Lennart Poettering 6592b9759c core: add new new bus call for migrating foreign processes to scope/service units
This adds a new bus call to service and scope units called
AttachProcesses() that moves arbitrary processes into the cgroup of the
unit. The primary user for this new API is systemd itself: the systemd
--user instance uses this call of the systemd --system instance to
migrate processes if itself gets the request to migrate processes and
the kernel refuses this due to access restrictions.

The primary use-case of this is to make "systemd-run --scope --user …"
invoked from user session scopes work correctly on pure cgroupsv2
environments. There, the kernel refuses to migrate processes between two
unprivileged-owned cgroups unless the requestor as well as the ownership
of the closest parent cgroup all match. This however is not the case
between the session-XYZ.scope unit of a login session and the
user@ABC.service of the systemd --user instance.

The new logic always tries to move the processes on its own, but if
that doesn't work when being the user manager, then the system manager
is asked to do it instead.

The new operation is relatively restrictive: it will only allow to move
the processes like this if the caller is root, or the UID of the target
unit, caller and process all match. Note that this means that
unprivileged users cannot attach processes to scope units, as those do
not have "owning" users (i.e. they have now User= field).

Fixes: #3388
2018-02-12 11:34:00 +01:00
Lennart Poettering 1d9cc8768f cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only
Currently we allowed delegation for alluntis with cgroup backing
except for slices. Let's make this a bit more strict for now, and only
allow this in service and scope units.

Let's also add a generic accessor unit_cgroup_delegate() for checking
whether a unit has delegation turned on that checks the new bool first.

Also, when doing transient units, let's explcitly refuse turning on
delegation for unit types that don#t support it. This is mostly
cosmetical as we wouldn't act on the delegation request anyway, but
certainly helpful for debugging.
2018-02-12 11:34:00 +01:00
Lennart Poettering 81e9871e87 selinux: make sure we never use /dev/null for making unit selinux access decisions 2018-01-31 19:54:25 +01:00
Lennart Poettering adefcf2821 core: rework how we count the n_on_console counter
Let's add a per-unit boolean that tells us whether our unit is currently
counted or not. This way it's unlikely we get out of sync again and
things are generally more robust.

This also allows us to remove the counting logic specific to service
units (which was in fact mostly a copy from the generic implementation),
in favour of fully generic code.

Replaces: #7824
2018-01-24 20:14:51 +01:00
Lennart Poettering bb2c768545 core: add a new unit_needs_console() call
This call determines whether a specific unit currently needs access to
the console. It's a fancy wrapper around
exec_context_may_touch_console() ultimately, however for service units
we'll explicitly exclude the SERVICE_EXITED state from when we report
true.
2018-01-24 19:54:26 +01:00
Lennart Poettering 62a769136d core: rework how we track which PIDs to watch for a unit
Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit
interested in SIGCHLD events for them. This scheme allowed a specific
PID to be watched by exactly 0, 1 or 2 units.

With this rework this is replaced by a single hashmap which is primarily
keyed by the PID and points to a Unit interested in it. However, it
optionally also keyed by the negated PID, in which case it points to a
NULL terminated array of additional Unit objects also interested. This
scheme means arbitrary numbers of Units may now watch the same PID.

Runtime and memory behaviour should not be impact by this change, as for
the common case (i.e. each PID only watched by a single unit) behaviour
stays the same, but for the uncommon case (a PID watched by more than
one unit) we only pay with a single additional memory allocation for the
array.

Why this all? Primarily, because allowing exactly two units to watch a
specific PID is not sufficient for some niche cases, as processes can
belong to more than one unit these days:

1. sd_notify() with MAINPID= can be used to attach a process from a
   different cgroup to multiple units.

2. Similar, the PIDFile= setting in unit files can be used for similar
   setups,

3. By creating a scope unit a main process of a service may join a
   different unit, too.

4. On cgroupsv1 we frequently end up watching all processes remaining in
   a scope, and if a process opens lots of scopes one after the other it
   might thus end up being watch by many of them.

This patch hence removes the 2-unit-per-PID limit. It also makes a
couple of other changes, some of them quite relevant:

- manager_get_unit_by_pid() (and the bus call wrapping it) when there's
  ambiguity will prefer returning the Unit the process belongs to based on
  cgroup membership, and only check the watch-pids hashmap if that
  fails. This change in logic is probably more in line with what people
  expect and makes things more stable as each process can belong to
  exactly one cgroup only.

- Every SIGCHLD event is now dispatched to all units interested in its
  PID. Previously, there was some magic conditionalization: the SIGCHLD
  would only be dispatched to the unit if it was only interested in a
  single PID only, or the PID belonged to the control or main PID or we
  didn't dispatch a signle SIGCHLD to the unit in the current event loop
  iteration yet. These rules were quite arbitrary and also redundant as
  the the per-unit handlers would filter the PIDs anyway a second time.
  With this change we'll hence relax the rules: all we do now is
  dispatch every SIGCHLD event exactly once to each unit interested in
  it, and it's up to the unit to then use or ignore this. We use a
  generation counter in the unit to ensure that we only invoke the unit
  handler once for each event, protecting us from confusion if a unit is
  both associated with a specific PID through cgroup membership and
  through the "watch_pids" logic. It also protects us from being
  confused if the "watch_pids" hashmap is altered while we are
  dispatching to it (which is a very likely case).

- sd_notify() message dispatching has been reworked to be very similar
  to SIGCHLD handling now. A generation counter is used for dispatching
  as well.

This also adds a new test that validates that "watch_pid" registration
and unregstration works correctly.
2018-01-23 21:29:31 +01:00
Alan Jenkins 25cd49647c mount: forbid mount on path with symlinks
It was forbidden to create mount units for a symlink.  But the reason is
that the mount unit needs to know the real path that will appear in
/proc/self/mountinfo.  The kernel dereferences *all* the symlinks in the
path at mount time (I checked this with `mount -c` running under `strace`).

This will have no effect on most systems.  As recommended by docs, most
systems use /etc/fstab, as opposed to native mount unit files.
fstab-generator dereferences symlinks for backwards compatibility.

A relatively minor issue regarding Time Of Check / Time Of Use also exists
here.  I can't see how to get rid of it entirely.  If we pass an absolute
path to mount, the racing process can replace it with a symlink.  If we
chdir() to the mount point and pass ".", the racing process can move the
directory.  The latter might potentially be nicer, except that it breaks
WorkingDirectory=.

I'm not saying the race is relevant to security - I just want to consider
how bad the effect is.  Currently, it can make the mount unit active (and
hence the job return success), despite there never being a matching entry
in /proc/self/mountinfo.  This wart will be removed in the next commit;
i.e. it will make the mount unit fail instead.
2018-01-20 22:06:34 +00:00
Lennart Poettering db256aab13 core: be stricter when handling PID files and MAINPID sd_notify() messages
Let's be more restrictive when validating PID files and MAINPID=
messages: don't accept PIDs that make no sense, and if the configuration
source is not trusted, don't accept out-of-cgroup PIDs. A configuratin
source is considered trusted when the PID file is owned by root, or the
message was received from root.

This should lock things down a bit, in case service authors write out
PID files from unprivileged code or use NotifyAccess=all with
unprivileged code. Note that doing so was always problematic, just now
it's a bit less problematic.

When we open the PID file we'll now use the CHASE_SAFE chase_symlinks()
logic, to ensure that we won't follow an unpriviled-owned symlink to a
privileged-owned file thinking this was a valid privileged PID file,
even though it really isn't.

Fixes: #6632
2018-01-11 15:12:16 +01:00
Lennart Poettering 4c253ed1ca tree-wide: introduce new safe_fork() helper and port everything over
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:

1. Optionally resets signal handlers/mask in the child

2. Sets a name on all processes we fork off right after forking off (and
   the patch assigns useful names for all processes we fork off now,
   following a systematic naming scheme: always enclosed in () – in order
   to indicate that these are not proper, exec()ed processes, but only
   forked off children, and if the process is long-running with only our
   own code, without execve()'ing something else, it gets am "sd-" prefix.)

3. Optionally closes all file descriptors in the child

4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
   way so that the parent dying before this happens being handled
   safely.

5. Optionally reopens the logs

6. Optionally connects stdin/stdout/stderr to /dev/null

7. Debug logs about the forked off processes.
2017-12-25 11:48:21 +01:00
Michal Koutný deb4e7080d service: Don't stop unneeded units needed by restarted service (#7526)
An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes.
If A stops before B's restart timeout expires, it'll be started again as part
of B's dependent jobs. However, if stopping takes longer than the timeout, B's
running stop job collides start job which also cancels B's start job. Result is
that neither A or B are active.

Currently, when a service with automatic restarting fails, it transitions
through following states:
        1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure,
        2) SERVICE_AUTO_RESTART while restart timer is running.

The StopWhenUnneeded= check takes place in service_enter_dead between the two
state mentioned above. We temporarily store the auto restart flag to query it
during the check. Because we don't return control to the main event loop, this
new service unit flag needn't be serialized.

This patch prevents the pathologic situation when the service with Restart=
won't restart automatically. As a side effect it also avoid restarting the
dependency unit with StopWhenUnneeded=yes.

Fixes: #7377
2017-12-05 16:51:19 +01:00
Lennart Poettering 2e59b241ca core: add proper escaping to writing of drop-ins/transient unit files
This majorly refactors the transient unit file and drop-in writing
logic, so that we properly C-escape and specifier-escape (% → %%)
everything we write out, so that when we read it back again, specifiers
are parsed that aren't supposed to be parsed.

This renames unit_write_drop_in() and friends by unit_write_setting().
The name change is supposed to clarify that the functions are not only
used to write drop-in files, but also transient unit files.

The previous "mode" parameter to this function is replaced by a more
generic "flags", which knows additional flags for implicit C-style and
specifier escaping before writing things out. This can cover most
properties where either form of escaping is defined. For the cases where
this isn't sufficient, we add helpers unit_escape_setting() and
unit_concat_strv() for escaping individual strings or strvs properly.

While we are at it, we also prettify generation of transient unit files:
we try to reduce the number of section headers written out: previously
we'd write the right section header our for each setting. With this
change we do so only if the setting lives in a different section than
the one before.

(This should also be considered preparation for when we add proper APIs
to systemd to write normal, persistant unit files through the bus API)
2017-11-29 12:34:12 +01:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00