Commit graph

18411 commits

Author SHA1 Message Date
Djalal Harouni 8f44de08e9 seccomp: add sched_yield syscall to the @default syscall set 2017-10-04 10:41:42 +01:00
Zbigniew Jędrzejewski-Szmek 8bdaf088ca Merge pull request #6946 from poettering/synthesize-dns
Some DNS RR synthesizing fixes
2017-10-03 10:26:53 +02:00
Djalal Harouni 09d3020b0a seccomp: remove '@credentials' syscall set (#6958)
This removes the '@credentials' syscall set that was added in commit
v234-468-gcd0ddf6f75.

Most of these syscalls are so simple that we do not want to filter them.
They work on the current calling process, doing only read operations,
they do not have a deep kernel path.

The problem may only be in 'capget' syscall since it can query arbitrary
processes, and used to discover processes, however sending signal 0 to
arbitrary processes can be used to discover if a process exists or not.
It is unfortunate that Linux allows to query processes of different
users. Lets put it now in '@process' syscall set, and later we may add
it to a new '@basic-process' set that allows most basic process
operations.
2017-10-03 07:20:05 +02:00
Yu Watanabe 8502cadd4c Merge pull request #6940 from poettering/magic-dirs
make sure StateDirectory= and friends play nicely with DynamicUser= and RootImage=/RootDirectory=
2017-10-03 13:28:48 +09:00
Zbigniew Jędrzejewski-Szmek ebfa2c158e Merge pull request #6943 from poettering/dissect-ro
Automatically recognize that "squashfs" and "iso9660" area always read-only
2017-10-02 20:39:08 +02:00
Lennart Poettering c621849539 core: fix special directories for user services
The system paths were listed where the user paths should have been
listed. Correct that.
2017-10-02 17:41:44 +02:00
Lennart Poettering 2a5beb669f path-util: some updates to path_make_relative()
Don't miscount number of "../" to generate, if we "." is included in an
input path.

Also, refuse if we encounter "../" since we can't possibly follow that
up properly, without file system access.

Some other modernizations.
2017-10-02 17:41:44 +02:00
Lennart Poettering 091e9efed3 core: fix StateDirectory= (and friends) safety checks when decoding transient unit properties
Let's make sure relative directories such as "foo/bar" are accepted, by
using the same validation checks as in unit file parsing.
2017-10-02 17:41:44 +02:00
Lennart Poettering 8adb3d63e6 test: add test for DynamicUser= + StateDirectory=
Also, tests for DynamicUser= should really run for system mode, as we
allocate from a system resource.

(This also increases the test timeout to 2min. If one of our tests
really hangs then waiting for 2min longer doesn't hurt either. The old
2s is really short, given that we run in potentially slow VM
environments for this test. This becomes noticable when the slow "find"
command this adds is triggered)
2017-10-02 17:41:44 +02:00
Lennart Poettering e53c42ca0a core: pass the correct error to the caller 2017-10-02 17:41:44 +02:00
Lennart Poettering da50b85af7 core: when looking for a UID to use for a dynamic UID start with the current owner of the StateDirectory= and friends
Let's optimize dynamic UID allocation a bit: if a StateDirectory= (or
suchlike) is configured, we start our allocation loop from that UID and
use it if it currently isn't used otherwise. This is beneficial as it
saves us from having to expensively recursively chown() these
directories in the typical case (which StateDirectory= does when it
notices that the owner of the directory doesn't match the UID picked).

With this in place we now have the a three-phase logic for allocating a
dynamic UID:

a) first, we try to use the owning UID of StateDirectory=,
   CacheDirectory=, LogDirectory= if that exists and is currently
   otherwise unused.

b) if that didn't work out, we hash the UID from the service name

c) if that didn't yield an unused UID either, randomly pick new ones
   until we find a free one.
2017-10-02 17:41:44 +02:00
Lennart Poettering 6c47cd7d3b execute: make StateDirectory= and friends compatible with DynamicUser=1 and RootDirectory=/RootImage=
Let's clean up the interaction of StateDirectory= (and friends) to
DynamicUser=1: instead of creating these directories directly below
/var/lib, place them in /var/lib/private instead if DynamicUser=1 is
set, making that directory 0700 and owned by root:root. This way, if a
dynamic UID is later reused, access to the old run's state directory is
prohibited for that user. Then, use file system namespacing inside the
service to make /var/lib/private a readable tmpfs, hiding all state
directories that are not listed in StateDirectory=, and making access to
the actual state directory possible. Mount all directories listed in
StateDirectory= to the same places inside the service (which means
they'll now be mounted into the tmpfs instance). Finally, add a symlink
from the state directory name in /var/lib/ to the one in
/var/lib/private, so that both the host and the service can access the
path under the same location.

Here's an example: let's say a service runs with StateDirectory=foo.
When DynamicUser=0 is set, it will get the following setup, and no
difference between what the unit and what the host sees:

        /var/lib/foo (created as directory)

Now, if DynamicUser=1 is set, we'll instead get this on the host:

        /var/lib/private (created as directory with mode 0700, root:root)
        /var/lib/private/foo (created as directory)
        /var/lib/foo → private/foo (created as symlink)

And from inside the unit:

        /var/lib/private (a tmpfs mount with mode 0755, root:root)
        /var/lib/private/foo (bind mounted from the host)
        /var/lib/foo → private/foo (the same symlink as above)

This takes inspiration from how container trees are protected below
/var/lib/machines: they generally reuse UIDs/GIDs of the host, but
because /var/lib/machines itself is set to 0700 host users cannot access
files in the container tree even if the UIDs/GIDs are reused. However,
for this commit we add one further trick: inside and outside of the unit
/var/lib/private is a different thing: outside it is a plain,
inaccessible directory, and inside it is a world-readable tmpfs mount
with only the whitelisted subdirs below it, bind mounte din.  This
means, from the outside the dir acts as an access barrier, but from the
inside it does not. And the symlink created in /var/lib/foo itself
points across the barrier in both cases, so that root and the unit's
user always have access to these dirs without knowing the details of
this mounting magic.

This logic resolves a major shortcoming of DynamicUser=1 units:
previously they couldn't safely store persistant data. With this change
they can have their own private state, log and data directories, which
they can write to, but which are protected from UID recycling.

With this change, if RootDirectory= or RootImage= are used it is ensured
that the specified state/log/cache directories are always mounted in
from the host. This change of semantics I think is much preferable since
this means the root directory/image logic can be used easily for
read-only resource bundling (as all writable data resides outside of the
image). Note that this is a change of behaviour, but given that we
haven't released any systemd version with StateDirectory= and friends
implemented this should be a safe change to make (in particular as
previously it wasn't clear what would actually happen when used in
combination). Moreover, by making this change we can later add a "+"
modifier to these setings too working similar to the same modifier in
ReadOnlyPaths= and friends, making specified paths relative to the
container itself.
2017-10-02 17:41:44 +02:00
Lennart Poettering a227a4be48 namespace: if we can create the destination of bind and PrivateTmp= mounts
When putting together the namespace, always create the file or directory
we are supposed to bind mount on, the same way we do it for most other
stuff, for example mount units or systemd-nspawn's --bind= option.

This has the big benefit that we can use namespace bind mounts on dirs
in /tmp or /var/tmp even in conjunction with PrivateTmp=.
2017-10-02 17:41:43 +02:00
Lennart Poettering e908468b5b namespace: properly handle bind mounts from the host
Before this patch we had an ordering problem: if we have no namespacing
enabled except for two bind mounts that intend to swap /a and /b via
bind mounts, then we'd execute the bind mount binding /b to /a, followed
by thebind mount from /a to /b, thus having the effect that /b is now
visible in both /a and /b, which was not intended.

With this change, as soon as any bind mount is configured we'll put
together the service mount namespace in a temporary directory instead of
operating directly in the root. This solves the problem in a
straightforward fashion: the source of bind mounts will always refer to
the host, and thus be unaffected from the bind mounts we already
created.
2017-10-02 17:41:43 +02:00
Lennart Poettering 645767d6b5 namespace: create /dev, /proc, /sys when needed
We already create /dev implicitly if PrivateTmp=yes is on, if it is
missing. Do so too for the other two API VFS, as well as for /dev if
PrivateTmp=yes is off but MountAPIVFS=yes is on (i.e. when /dev is bind
mounted from the host).
2017-10-02 17:41:43 +02:00
Lennart Poettering 72fd17682d core: usually our enum's _INVALID and _MAX special values are named after the full type
In most cases we followed the rule that the special _INVALID and _MAX
values we use in our enums use the full type name as prefix (in contrast
to regular values that we often make shorter), do so for
ExecDirectoryType as well.

No functional changes, just a little bit of renaming to make this code
more like the rest.
2017-10-02 17:41:43 +02:00
Lennart Poettering a1164ae380 core: chown() StateDirectory= and friends recursively when starting a service
This is particularly useful when used in conjunction with DynamicUser=1,
where the UID might change for every invocation, but is useful in other
cases too, for example, when these directories are shared between
systems where the UID assignments differ slightly.
2017-10-02 17:41:43 +02:00
Lennart Poettering 64fbdc0f91 nspawn: properly report all kinds of changed UID/GID when patching things for userns
We forgot to propagate one chmod().
2017-10-02 17:41:43 +02:00
Jouke Witteveen df66b93fe2 service: better detect when a Type=notify service cannot become active anymore (#6959)
No need to wait for a timeout when we know things are not going to work out.
When the main process goes away and only notifications from the main process are
accepted, then we will not receive any notifications anymore.
2017-10-02 16:35:27 +02:00
Zbigniew Jędrzejewski-Szmek b139c95bc4 Merge pull request #6941 from andir/use-in_set
use IN_SET where possible
2017-10-02 15:08:10 +02:00
Zbigniew Jędrzejewski-Szmek d29ad81e3b Minor line wrapping adjustment 2017-10-02 14:52:12 +02:00
Andreas Rammhold ec2ce0c5d7
tree-wide: use !IN_SET(..) for a != b && a != c && …
The included cocci was used to generate the changes.

Thanks to @flo-wer for pointing this case out.
2017-10-02 13:09:56 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
Lennart Poettering b13ddbbcf3 service: accept the fact that the three xyz_good() functions return ints
Currently, all three of cgroup_good(), main_pid_good(),
control_pid_good() all return an "int" (two of them propagate errors).
It's a good thing to keep the three functions similar, so let's leave it
at that, but then let's clean up the invocation of the three functions
so that they always clearly acknowledge that the return value is not a
bool, but potentially negative.
2017-10-02 12:58:42 +02:00
Lennart Poettering 019be28676 service: drop _pure_ decorator on static function
The compiler should be good enough to figure this out on its own if this
is a static function, and it makes control_pid_good() an outlier anyway,
and decorators like this tend to bitrot. Hence, to keep things simple
and automatic, let's just drop the decorator.
2017-10-02 12:58:42 +02:00
Lennart Poettering 3c751b1bfa service: a cgroup empty notification isn't reason enough to go down
The processes associated with a service are not just the ones in its
cgroup, but also the control and main processes, which might possibly
live outside of it, for example if they transitioned into their own
cgroups because they registered a PAM session of their own. Hence, if we
get a cgroup empty notification always check if the main PID is still
around before taking action too eagerly.

Fixes: #6045
2017-10-02 12:58:42 +02:00
Lennart Poettering 07697d7ec5 service: add explanatory comments to control_pid_good() and cgroup_good()
Let's add a similar comment to each as we already have for
main_pid_good(), emphasizing that these functions are supposed to be
have very similar.
2017-10-02 12:58:42 +02:00
Lennart Poettering 51894d706f service: fix main_pid_good() comment
We don't actually return -1, don't claim that.
2017-10-02 12:58:37 +02:00
Lennart Poettering a4f3375d72 resolved: synthesize records for the full local hostname, too
This was forgotten, let's add it too, so that the llmnr, mdns and full
hostname RRs are all synthesized if needed.
2017-09-29 18:05:51 +02:00
Lennart Poettering 2855b6c39c resolved: make sure a non-existing PTR record never gets mangled into NODATA
Previously, if a PTR query is seen for a non-existing record, we'd
generate an empty response (but not NXDOMAIN or so). Fix that. If we
have no data about an IP address, then let's say so, so that the
original error is returned, instead of anything synthesized.

Fixes: #6543
2017-09-29 18:02:31 +02:00
Lennart Poettering acf06088d3 resolved: when there is no gateway, make sure _gateway results in NXDOMAIN
Let's ensure that "no gateway" translates to "no domain", instead of an
empty reply. This is in line with what nss-myhostname does in the same
case, hence let's unify behaviour here of nss-myhostname and resolved.
2017-09-29 18:01:04 +02:00
Lennart Poettering 21a13ab054 sd-bus: drop bloom fields
These fields are unused since kdbus support has been removed.
2017-09-29 17:58:11 +02:00
Lennart Poettering 532f808fd1 sd-bus: drop match cookie concept
THe match cookie was used by kdbus to identify matches we install
uniquely. But given that kdbus is gone, the cookie serves no process
anymore, let's kill it.
2017-09-29 17:57:34 +02:00
Lennart Poettering e28d086553 sd-bus: when showing brief message info show error name in debug out put too
When debug logging is enabled we show brief information about every bus
message we send or receieve. Pretty much all information is shown,
except for the error name if a message is an error (interestingly we do
print the error text however). Fix that, and add the error name as well.
2017-09-29 17:48:29 +02:00
Lennart Poettering 7941e2189b mount-util: add fusectl to list of API VFS 2017-09-29 14:36:33 +02:00
Lennart Poettering 154d22695a dissect: split list of discard-supporting fs out into mount-util.c
Let's manage the list of file systems that do a specific thing at one
place, following similar naming.

No functional changes.
2017-09-29 14:36:29 +02:00
Lennart Poettering 896f937f58 dissect: automatically mark partitions read-only that have a read-only file system
Specifically, squashfs and iso9660 are always read-only, hence make sure
we never even think about mounting them writable.
2017-09-29 14:36:29 +02:00
Zbigniew Jędrzejewski-Szmek 56d50ab1d3 meson: move library version defines to the top (#6939) 2017-09-28 19:24:16 +02:00
Lennart Poettering aecc6f6b34 Merge pull request #6933 from yuwata/use_in_set
use IN_SET macro
2017-09-28 19:22:09 +02:00
Yu Watanabe 945c2931bb libsystemd: use IN_SET macro 2017-09-28 17:37:59 +09:00
Lennart Poettering cd4826e0e6 Merge pull request #6924 from andir/vrf-dhcpv4
networkd: use VRFs routing table for DHCP routes
2017-09-28 09:46:03 +02:00
Franck Bui 7e760b79ad udev-rules: all values can contain escaped double quotes now (#6890)
This is primarly useful to support escaped double quotes in PROGRAM or
IMPORT{program} directives.

The only possibilty before this patch was to use an external shell script but
this seems too cumbersome for trivial logics such as

 PROGRAM=="/bin/sh -c 'FOO=\"%s{model}\"; echo ${FOO:0:4}'"

or any similar shell constructs that needs to deals with patterns including
whitespaces.

As it's the case for single quote and for directives running a program, words
within escaped double quotes will be considered as a single argument.

Fixes: #6835
2017-09-28 08:53:46 +02:00
Zbigniew Jędrzejewski-Szmek 9500b9209b Merge pull request #6928 from poettering/cgroup-empty-race
rework cgroup empty notification handling (i.e. a fix for #6608)
2017-09-28 08:48:21 +02:00
Yu Watanabe 74dead3461 networkd: use assert_not_reached() 2017-09-28 15:38:50 +09:00
Yu Watanabe 25a1bffc8f networkd: use IN_SET macro 2017-09-28 15:37:38 +09:00
Andreas Rammhold fc1ba79d65 networkd: use VRFs routing table for DHCP routes
When an interface has been enslaved to a VRF the received routes should
be added to the VRFs RT instead of the main table.

This change modifies the default behaviour of routes in the case where a
network belongs to an VRF.  When the user does not configure a
`DHCP.RouteTable` in a `systemd.network` file and the interface belongs
to a VRF, the VRFs routing table is used instead of RT_TABLE_MAIN.

When the user has configured a custom routing table for DHCP the VRFs
table is ignored and the users preference takes precedence.
2017-09-27 20:02:15 +02:00
Zbigniew Jędrzejewski-Szmek 7e56da12e8 Merge pull request #6922 from poettering/symlink-sockets
Fixes for Symlinks= handling in socket units
2017-09-27 19:37:25 +02:00
Lennart Poettering ed77d407d3 core: log unit failure with type-specific result code
This slightly changes how we log about failures. Previously,
service_enter_dead() would log that a service unit failed along with its
result code, and unit_notify() would do this again but without the
result code. For other unit types only the latter would take effect.

This cleans this up: we keep the message in unit_notify() only for debug
purposes, and add type-specific log lines to all our unit types that can
fail, and always place them before unit_notify() is invoked.

Or in other words: the duplicate log message for service units is
removed, and all other unit types get a more useful line with the
precise result code.
2017-09-27 18:26:18 +02:00
Lennart Poettering 84b26d5149 core: free_and_strdup() FTW! 2017-09-27 18:26:18 +02:00
Lennart Poettering 4724964040 cgroup: IN_SET() FTW! 2017-09-27 18:26:18 +02:00