Commit graph

279 commits

Author SHA1 Message Date
Lennart Poettering 2ae7ee58fa bpf: beef up bpf detection, check if BPF_F_ALLOW_MULTI is supported
This improves the BPF/cgroup detection logic, and looks whether
BPF_ALLOW_MULTI is supported. This flag allows execution of multiple
BPF filters in a recursive fashion for a whole cgroup tree. It enables
us to properly report IP accounting for slice units, as well as
delegation of BPF support to units without breaking our own IP
accounting.
2018-02-21 16:43:36 +01:00
Lennart Poettering a94ab7acfd
Merge pull request #8175 from keszybz/gc-cleanup
Garbage collection cleanup
2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek 7f7d01ed58 pid1: include the source unit in UnitRef
No functional change.

The source unit manages the reference. It allocates the UnitRef structure and
registers it in the target unit, and then the reference must be destroyed
before the source unit is destroyed. Thus, is should be OK to include the
pointer to the source unit, it should be live as long as the reference exists.

v2:
- rename refs to refs_by_target
2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek f2f725e5cc pid1: rename unit_check_gc to unit_may_gc
"check" is unclear: what is true, what is false? Let's rename to "can_gc" and
revert the return value ("positive" values are easier to grok).

v2:
- rename from unit_can_gc to unit_may_gc
2018-02-15 13:04:12 +01:00
Lennart Poettering 004c7f169e core: fold manager_set_exec_params() into unit_set_exec_params()
Let's simplify things a bit: we so far called both functions every
single time, let's just merge one into the other, so that we have fewer
functions to call.
2018-02-12 11:34:00 +01:00
Yu Watanabe e8a565cb66 core: make ExecRuntime be manager managed object
Before this, each ExecRuntime object is owned by a unit. However,
it may be shared with other units which enable JoinsNamespaceOf=.
Thus, by the serialization/deserialization process, its sharing
information, more specifically, reference counter is lost, and
causes issue #7790.

This makes ExecRuntime objects be managed by manager, and changes
the serialization/deserialization process.

Fixes #7790.
2018-02-06 16:00:34 +09:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Lennart Poettering 4c253ed1ca tree-wide: introduce new safe_fork() helper and port everything over
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:

1. Optionally resets signal handlers/mask in the child

2. Sets a name on all processes we fork off right after forking off (and
   the patch assigns useful names for all processes we fork off now,
   following a systematic naming scheme: always enclosed in () – in order
   to indicate that these are not proper, exec()ed processes, but only
   forked off children, and if the process is long-running with only our
   own code, without execve()'ing something else, it gets am "sd-" prefix.)

3. Optionally closes all file descriptors in the child

4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
   way so that the parent dying before this happens being handled
   safely.

5. Optionally reopens the logs

6. Optionally connects stdin/stdout/stderr to /dev/null

7. Debug logs about the forked off processes.
2017-12-25 11:48:21 +01:00
Yu Watanabe 845001221d core/socket: shorten socket_fdname() 2017-12-23 19:32:40 +09:00
Yu Watanabe 827d9bf297 core/socket: dump more settings 2017-12-23 19:32:38 +09:00
Yu Watanabe e045e325df basic: introduce socket_protocol_{from,to}_name()
And use them where they can be applicable.
2017-12-23 19:32:04 +09:00
Yu Watanabe 9c0320e7ab core: implement transient socket unit 2017-12-23 18:47:33 +09:00
Yu Watanabe 038ed5a4b6 core/socket: add socket_port_type_from_string() 2017-12-23 18:46:16 +09:00
Yu Watanabe 836bb1cd42 core:socket: fix string in socket_exec_command_table 2017-12-23 18:45:59 +09:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Lennart Poettering 3c7416b6ca core: unify common code for preparing for forking off unit processes
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.

This allows us to neatly unify a bit of code between unit types and
shorten our code.
2017-11-21 11:54:08 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one exisiting unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Lennart Poettering eef85c4a3f core: track why unit dependencies came to be
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.

The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.

Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.

Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:

1. We can fix UDEV_WANTS dependency generation: so far we kept adding
   dependencies configured that way, but if a device lost such a
   dependency we couldn't them again as there was no scheme for removing
   of dependencies in place.

2. We can implement "pin-pointed" reload of unit files. If we know what
   dependencies were created as result of configuration in a unit file,
   then we know what to flush out when we want to reload it.

3. It's useful for debugging: "systemd-analyze dump" now shows
   this information, helping substantially with understanding how
   systemd's dependency tree came to be the way it came to be.
2017-11-10 19:45:29 +01:00
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00
Andreas Rammhold ec2ce0c5d7
tree-wide: use !IN_SET(..) for a != b && a != c && …
The included cocci was used to generate the changes.

Thanks to @flo-wer for pointing this case out.
2017-10-02 13:09:56 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
Zbigniew Jędrzejewski-Szmek 9500b9209b Merge pull request #6928 from poettering/cgroup-empty-race
rework cgroup empty notification handling (i.e. a fix for #6608)
2017-09-28 08:48:21 +02:00
Lennart Poettering ed77d407d3 core: log unit failure with type-specific result code
This slightly changes how we log about failures. Previously,
service_enter_dead() would log that a service unit failed along with its
result code, and unit_notify() would do this again but without the
result code. For other unit types only the latter would take effect.

This cleans this up: we keep the message in unit_notify() only for debug
purposes, and add type-specific log lines to all our unit types that can
fail, and always place them before unit_notify() is invoked.

Or in other words: the duplicate log message for service units is
removed, and all other unit types get a more useful line with the
precise result code.
2017-09-27 18:26:18 +02:00
Lennart Poettering 22b20752e2 socket: if RemoveOnStop= is turned on for a socket, try to unlink() pre-existing symlinks
Normally, Symlinks= failing is not considered fatal nor destructive.
Let's slightly alter behaviour here if RemoveOnStop= is turned on. In
that case the use in a way opted for destructive behaviour and we do
unlink all sockets and symlinks when the socket unit goes down. And that
means we might as well unlink any pre-existing if this mode is selected.

Yeah, it's a bit of a stretch to do this, but @OhNoMoreGit is right: if
RemoveOnStop= is on we are destructive regarding any pre-existing
symlinks on stop, and it would be quite weird if we wouldn't be on
start.
2017-09-27 17:53:00 +02:00
Lennart Poettering 1af87ab7d6 socket: create leading directories for socket symlinks
It really doesn't hurt creating prefix directories if necessary, as we
tend to do that for other file nodes we create, too.

Fixes: #6920
2017-09-27 17:53:00 +02:00
Lennart Poettering 95f7fbbf88 socket: make sure we warn loudly about symlinks we can't create
Note that this change does not make symlink creation failing fatal. I am
not entirely sure about whether it should be, but I am leaning towards
not making it fatal for two reasons: symlinks like this tend to be a
compatibility feature, and hence unlikely to be essential for operation,
in a way this breaks compatibility, and while doing that is not off the
table, we should probably avoid it if we are not entirely sure it's a
good thing.

Note that this also changes plain symlink() to symlink_idempotent() so
that existing symlinks with the right destination are nothing we log
about.

Fixes: #6920
2017-09-27 17:53:00 +02:00
Jan Synacek 0cde65e263 test-cpu-set-util.c: fix typo in comment (#6916) 2017-09-26 16:07:34 +02:00
Lennart Poettering 88af31f922 socket: assign socket units to a default slice unconditionally
Due to the chown() logic socket units might end up with processes even
if no explicit command is defined for them, hence let's make sure these
processes are in the right cgroup, and that means within a slice.

Mount, swap and service units unconditionally are assigned to a slice
already, let's do the same here, too.

(This becomes more important as soon as the ebpf/firewall stuff is
merged, as there'll be another reason to fork off processes then)
2017-09-22 20:09:21 +02:00
Lennart Poettering a79279c7fd core: when creating the socket fds for a socket unit, join socket's cgroup first
Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny=
settings are in effect on all socket fds associated with it. In order to
make this happen we need to make sure the cgroup the fds are associated
with are the socket unit's cgroup. The only way to do that is invoking
socket()+accept() in them. Since we really don't want to migrate PID 1
around we do this by forking off a helper process, which invokes
socket()/accept() and sends the newly created fd to PID 1. Ugly, but
works, and there's apparently no better way right now.

This generalizes forking off per-unit helper processes in a new function
unit_fork_helper_process(), which is then also used by the NSS chown()
code of socket units.
2017-09-22 15:24:55 +02:00
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Lennart Poettering 18f573aaf9 core: make sure to dump cgroup context when unit_dump() is called for all unit types
For some reason we didn't dump the cgroup context for a number of unit
types, including service units. Not sure how this wasn't noticed
before... Add this in.
2017-09-22 15:24:54 +02:00
Lennart Poettering 1703fa41a7 core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
2017-08-10 15:02:50 +02:00
Lennart Poettering f0d477979e core: introduce unit_set_exec_params()
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
2017-08-10 15:02:50 +02:00
Lennart Poettering 19bbdd985e core: manager_set_exec_params() cannot fail, hence make it void
Let's simplify things a bit.
2017-08-10 15:02:50 +02:00
Lennart Poettering 584b8688d1 execute: also fold the cgroup delegate bit into ExecFlags 2017-08-10 15:02:50 +02:00
Lennart Poettering 3ed0cd26ea execute: replace command flag bools by a flags field
This way, we can extend it later on in an easier way, and can pass it
along nicely.
2017-08-10 14:44:58 +02:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Zbigniew Jędrzejewski-Szmek e3f791a2b3 basic/path-util: allow flags for path_equal_or_files_same
No functional change, just a new parameters and the tests that
AT_SYMLINK_NOFOLLOW works as expected.
2017-06-17 12:37:16 -04:00
AsciiWolf 13e785f7a0 Fix missing space in comments (#5439) 2017-02-24 18:14:02 +01:00
Lennart Poettering 1c876927e4 copy: change the various copy_xyz() calls to take a unified flags parameter
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
2017-02-17 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek ec251fe7d5 tree-wide: adjust fall through comments so that gcc is happy
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.

(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
2017-01-31 14:04:55 -05:00
Zbigniew Jędrzejewski-Szmek da3bddc993 core: add missing "=" in message
For consistency. Also drop "e.g." because it's somewhat redundant with the
ellipsis and the message is pretty long already.

Follow-up for 4d1fe20a58.
2017-01-11 16:37:34 -05:00
Stefan Hajnoczi 359a5bcf78 core: add AF_VSOCK support to socket units
Accept AF_VSOCK listen addresses in socket unit files.  Both guest and
host can now take advantage of socket activation.

The QEMU guest agent has recently been modified to support socket
activation and can run over AF_VSOCK with this patch.
2017-01-10 15:29:04 +00:00
Lennart Poettering 41733ae1e0 core: fix sockaddr length calculation for sockaddr_pretty() (#4966)
Let's simply store the socket address length in the SocketPeer object so
that we can use it when invoking sockaddr_pretty():

This fixes the issue described in #4943, but avoids calling
getpeername() twice.
2016-12-29 11:21:37 +01:00
Lennart Poettering 4d1fe20a58 core: improve log message about missing Listen setting (#4988)
Fixes: #4987
2016-12-29 10:39:30 +01:00
Stefan Hajnoczi b9495e8d58 core: prevent invalid socket symlink target dereference (#4895)
socket_find_symlink_target() returns a pointer to
p->address.sockaddr.un.sun_path when the first byte is non-zero without
checking that this is AF_UNIX socket.  Since sockaddr is a union this
byte could be non-zero for AF_INET sockets.

Existing callers happen to be safe but is an accident waiting to happen.
Use socket_address_get_path() since it checks for AF_UNIX.
2016-12-16 11:20:27 +01:00