Merge pull request #6940 from poettering/magic-dirs

make sure StateDirectory= and friends play nicely with DynamicUser= and RootImage=/RootDirectory=
Yu Watanabe 2017-10-03 13:28:48 +09:00 committed by GitHub
commit 8502cadd4c
24 changed files with 836 additions and 139 deletions

TODO

@ -24,6 +24,19 @@ Janitorial Clean-ups:
Features:
* maybe set a new set of env vars for services, based on RuntimeDirectory=,
StateDirectory=, LogsDirectory=, CacheDirectory= and ConfigurationDirectory=
automatically. For example, there could be $RUNTIME_DIRECTORY,
$STATE_DIRECTORY, $LOGS_DIRECTORY, $CACHE_DIRECTORY and
$CONFIGURATION_DIRECTORY or so. This could be useful to write services that
can adapt to varying directories for these purposes. Special care has to be
taken if multiple dirs are configured. Maybe avoid setting the env vars in
that case?
* In a similar vein, consider adding unit specifiers that resolve to the root
directories used for state, logs, cache and configuration
directories, i.e. similar to %t, but for the roots of the other special dirs.
* expose IO accounting data on the bus, show it in systemd-run --wait and log
about it in the resource log message
@ -33,10 +46,6 @@ Features:
* show whether a service has out-of-date configuration in "systemctl status" by
using mtime data of ConfigurationDirectory=.
* Properly chmod() RuntimeDirectory=, StateDirectory=, LogsDirectory= and
CacheDirectory= when we start up and the directory isn't properly owned. In
particular to make DynamicUser= work
* replace all uses of fgets() + LINE_MAX by read_line()
* set IPAddressDeny=any on all services that shouldn't do networking (possibly
@ -176,9 +185,6 @@ Features:
* DeviceAllow= should also generate seccomp filters for mknod()
* Add DataDirectory=, CacheDirectory= and LogDirectory= to match
RuntimeDirectory=, and create it as necessary when starting a service, owned by the right user.
* make sure the ratelimit object can deal with USEC_INFINITY as way to turn off things
* journalctl: make sure -f ends when the container indicated by -M terminates


@ -220,10 +220,13 @@
cannot leave files around after unit termination. Moreover <varname>ProtectSystem=strict</varname> and
<varname>ProtectHome=read-only</varname> are implied, thus prohibiting the service from writing to arbitrary file
system locations. In order to allow the service to write to certain directories, they have to be whitelisted
using <varname>ReadWritePaths=</varname>, but care must be taken so that UID/GID recycling doesn't
create security issues involving files created by the service. Use <varname>RuntimeDirectory=</varname> (see
below) in order to assign a writable runtime directory to a service, owned by the dynamic user/group and
removed automatically when the unit is terminated. Defaults to off.</para></listitem>
using <varname>ReadWritePaths=</varname>, but care must be taken so that UID/GID recycling doesn't create
security issues involving files created by the service. Use <varname>RuntimeDirectory=</varname> (see below) in
order to assign a writable runtime directory to a service, owned by the dynamic user/group and removed
automatically when the unit is terminated. Use <varname>StateDirectory=</varname>,
<varname>CacheDirectory=</varname> and <varname>LogsDirectory=</varname> in order to assign a set of writable
directories for specific purposes to the service in a way that they are protected from vulnerabilities due to
UID reuse (see below). Defaults to off.</para></listitem>
</varlistentry>
<varlistentry>
@ -1753,23 +1756,58 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
<varlistentry>
<term><varname>RuntimeDirectory=</varname></term>
<term><varname>StateDirectory=</varname></term>
<term><varname>CacheDirectory=</varname></term>
<term><varname>LogsDirectory=</varname></term>
<term><varname>ConfigurationDirectory=</varname></term>
<listitem><para>Takes a whitespace-separated list of directory names. The specified directory names must be
relative, and may not include <literal>.</literal> or <literal>..</literal>. If set, one or more directories
including their parents by the specified names will be created below <filename>/run</filename> (for system
services) or below <varname>$XDG_RUNTIME_DIR</varname> (for user services) when the unit is started. The
lowest subdirectories are removed when the unit is stopped. It is possible to preserve the directories if
<varname>RuntimeDirectoryPreserve=</varname> is configured to <option>restart</option> or <option>yes</option>.
The lowest subdirectories will have the access mode specified in <varname>RuntimeDirectoryMode=</varname>,
and be owned by the user and group specified in <varname>User=</varname> and <varname>Group=</varname>.
This implies <varname>ReadWritePaths=</varname>, that is, the directories specified
in this option are accessible with the access mode specified in <varname>RuntimeDirectoryMode=</varname>
even if <varname>ProtectSystem=</varname> is set to <option>strict</option>.
Use this to manage one or more runtime directories of the unit and bind their
lifetime to the daemon runtime. This is particularly useful for unprivileged daemons that cannot create
<listitem><para>These options take a whitespace-separated list of directory names. The specified directory
names must be relative, and may not include <literal>.</literal> or <literal>..</literal>. If set, one or more
directories by the specified names will be created (including their parents) below <filename>/run</filename>
(or <varname>$XDG_RUNTIME_DIR</varname> for user services), <filename>/var/lib</filename> (or
<varname>$XDG_CONFIG_HOME</varname> for user services), <filename>/var/cache</filename> (or
<varname>$XDG_CACHE_HOME</varname> for user services), <filename>/var/log</filename> (or
<varname>$XDG_CONFIG_HOME</varname><filename>/log</filename> for user services), or <filename>/etc</filename>
(or <varname>$XDG_CONFIG_HOME</varname> for user services), respectively, when the unit is started.</para>
<para>In case of <varname>RuntimeDirectory=</varname> the lowest subdirectories are removed when the unit is
stopped. It is possible to preserve the specified directories in this case if
<varname>RuntimeDirectoryPreserve=</varname> is configured to <option>restart</option> or <option>yes</option>
(see below). The directories specified with <varname>StateDirectory=</varname>,
<varname>CacheDirectory=</varname>, <varname>LogsDirectory=</varname>,
<varname>ConfigurationDirectory=</varname> are not removed when the unit is stopped.</para>
<para>Except in case of <varname>ConfigurationDirectory=</varname>, the innermost specified directories will be
owned by the user and group specified in <varname>User=</varname> and <varname>Group=</varname>. If the
specified directories already exist and their owning user or group does not match the configured ones, all files
and directories below the specified directories as well as the directories themselves will have their file
ownership recursively changed to match what is configured. As an optimization, if the specified directories are
already owned by the right user and group, files and directories below them are left as-is, even if they do
not match what is requested. The innermost specified directories will have their access mode adjusted to
what is specified in <varname>RuntimeDirectoryMode=</varname>, <varname>StateDirectoryMode=</varname>,
<varname>CacheDirectoryMode=</varname>, <varname>LogsDirectoryMode=</varname> and
<varname>ConfigurationDirectoryMode=</varname>.</para>
<para>Except in case of <varname>ConfigurationDirectory=</varname>, these options imply
<varname>ReadWritePaths=</varname> for the specified paths. When combined with
<varname>RootDirectory=</varname> or <varname>RootImage=</varname> these paths always reside on the host and
are mounted from there into the unit's file system namespace. If <varname>DynamicUser=</varname> is used in
conjunction with <varname>RuntimeDirectory=</varname>, <varname>StateDirectory=</varname>,
<varname>CacheDirectory=</varname> and <varname>LogsDirectory=</varname>, the behaviour of these options is
slightly altered: the directories are created below <filename>/run/private</filename>,
<filename>/var/lib/private</filename>, <filename>/var/cache/private</filename> and
<filename>/var/log/private</filename>, respectively. These are host directories made inaccessible to
unprivileged users, which ensures that access to these directories cannot be gained through dynamic user ID
recycling. Symbolic links are created to hide this difference in behaviour. Both from the perspective of the host
and from inside the unit, the relevant directories hence always appear directly below
<filename>/run</filename>, <filename>/var/lib</filename>, <filename>/var/cache</filename> and
<filename>/var/log</filename>.</para>
<para>Use <varname>RuntimeDirectory=</varname> to manage one or more runtime directories for the unit and bind
their lifetime to the daemon runtime. This is particularly useful for unprivileged daemons that cannot create
runtime directories in <filename>/run</filename> due to lack of privileges, and to make sure the runtime
directory is cleaned up automatically after use. For runtime directories that require more complex or
different configuration or lifetime guarantees, please consider using
directory is cleaned up automatically after use. For runtime directories that require more complex or different
configuration or lifetime guarantees, please consider using
<citerefentry><refentrytitle>tmpfiles.d</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
<para>Example: if a system service unit has the following,
@ -1779,22 +1817,7 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
except <filename>/run/foo</filename> are owned by the user and group specified in <varname>User=</varname> and
<varname>Group=</varname>, and removed when the service is stopped.
</para></listitem>
</varlistentry>
<varlistentry>
<term><varname>StateDirectory=</varname></term>
<term><varname>CacheDirectory=</varname></term>
<term><varname>LogsDirectory=</varname></term>
<term><varname>ConfigurationDirectory=</varname></term>
<listitem><para>Takes a whitespace-separated list of directory names. If set, as similar to
<varname>RuntimeDirectory=</varname>, one or more directories including their parents by the specified names
will be created below <filename>/var/lib</filename>, <filename>/var/cache</filename>, <filename>/var/log</filename>,
or <filename>/etc</filename>, respectively, when the unit is started.
Unlike <varname>RuntimeDirectory=</varname>, the directories are not removed when the unit is stopped.
The lowest subdirectories will be owned by the user and group specified in <varname>User=</varname>
and <varname>Group=</varname>. The options imply <varname>ReadWritePaths=</varname>.
</para></listitem>
</varlistentry>
<varlistentry>
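The host-side placement described in the manual text above can be sketched in C for system services (the $XDG_* locations used for user services are not modelled). DirType, system_prefix[] and host_location() are hypothetical names made up for this illustration, not systemd API; the sketch only computes a path and leaves directory creation, ownership changes and the compatibility symlink aside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enum mirroring the five *Directory= options. */
typedef enum {
        DIR_RUNTIME,
        DIR_STATE,
        DIR_CACHE,
        DIR_LOGS,
        DIR_CONFIGURATION,
        _DIR_MAX,
} DirType;

/* Prefixes used for system services, as listed in the manual text. */
static const char *const system_prefix[_DIR_MAX] = {
        [DIR_RUNTIME]       = "/run",
        [DIR_STATE]         = "/var/lib",
        [DIR_CACHE]         = "/var/cache",
        [DIR_LOGS]          = "/var/log",
        [DIR_CONFIGURATION] = "/etc",
};

/* Writes into buf where directory `name` physically lives on the host.
 * With DynamicUser=, every type except ConfigurationDirectory= is placed
 * below "<prefix>/private/". Returns 0, or -1 if buf is too small. */
static int host_location(DirType t, const char *name, bool dynamic_user,
                         char *buf, size_t size) {
        bool use_private = dynamic_user && t != DIR_CONFIGURATION;
        int n = snprintf(buf, size, "%s/%s%s", system_prefix[t],
                         use_private ? "private/" : "", name);

        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For example, StateDirectory=foo together with DynamicUser=yes ends up at /var/lib/private/foo, while ConfigurationDirectory=foo stays at /etc/foo either way.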


@ -131,8 +131,7 @@ int path_make_relative(const char *from_dir, const char *to_path, char **_r) {
/* Skip the common part. */
for (;;) {
size_t a;
size_t b;
size_t a, b;
from_dir += strspn(from_dir, "/");
to_path += strspn(to_path, "/");
@ -144,7 +143,6 @@ int path_make_relative(const char *from_dir, const char *to_path, char **_r) {
else
/* from_dir is a parent directory of to_path. */
r = strdup(to_path);
if (!r)
return -ENOMEM;
@ -175,21 +173,32 @@ int path_make_relative(const char *from_dir, const char *to_path, char **_r) {
/* Count the number of necessary ".." elements. */
for (n_parents = 0;;) {
size_t w;
from_dir += strspn(from_dir, "/");
if (!*from_dir)
break;
from_dir += strcspn(from_dir, "/");
n_parents++;
w = strcspn(from_dir, "/");
/* If this includes ".." we can't do a simple series of "..", refuse */
if (w == 2 && from_dir[0] == '.' && from_dir[1] == '.')
return -EINVAL;
/* Count number of elements, except if they are "." */
if (w != 1 || from_dir[0] != '.')
n_parents++;
from_dir += w;
}
r = malloc(n_parents * 3 + strlen(to_path) + 1);
r = new(char, n_parents * 3 + strlen(to_path) + 1);
if (!r)
return -ENOMEM;
for (p = r; n_parents > 0; n_parents--, p += 3)
memcpy(p, "../", 3);
for (p = r; n_parents > 0; n_parents--)
p = mempcpy(p, "../", 3);
strcpy(p, to_path);
path_kill_slashes(r);
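The patched path_make_relative() skips the components both paths share and then emits one "../" per remaining component of from_dir, ignoring "." and refusing "..". A standalone sketch of that idea follows; make_relative_sketch() is a hypothetical name, it writes into a caller-supplied buffer instead of allocating, returns "." when both paths name the same directory, and does not collapse repeated slashes inside to_path the way path_kill_slashes() would.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Length of the path component at the start of s (up to the next '/'). */
static size_t component_len(const char *s) {
        return strcspn(s, "/");
}

/* Writes into buf a relative path leading from the absolute directory `from`
 * to the absolute path `to`. Returns 0 on success, -EINVAL for relative input
 * or a ".." in the non-shared part of `from`, -ENOBUFS if buf is too small. */
static int make_relative_sketch(const char *from, const char *to, char *buf, size_t size) {
        size_t n_parents = 0, used = 0, rest;

        if (from[0] != '/' || to[0] != '/')
                return -EINVAL;

        /* Skip the common leading components. */
        for (;;) {
                size_t a, b;

                from += strspn(from, "/");
                to += strspn(to, "/");
                a = component_len(from);
                b = component_len(to);
                if (a == 0 || a != b || memcmp(from, to, a) != 0)
                        break;
                from += a;
                to += b;
        }

        /* One ".." per remaining component of `from`; "." needs none, and a
         * ".." cannot be undone by a simple series of "..", so refuse it. */
        for (;;) {
                size_t w;

                from += strspn(from, "/");
                w = component_len(from);
                if (w == 0)
                        break;
                if (w == 2 && from[0] == '.' && from[1] == '.')
                        return -EINVAL;
                if (w != 1 || from[0] != '.')
                        n_parents++;
                from += w;
        }

        rest = strlen(to);
        if (n_parents == 0 && rest == 0) {
                /* Both paths refer to the same directory. */
                if (size < 2)
                        return -ENOBUFS;
                memcpy(buf, ".", 2);
                return 0;
        }
        if (n_parents * 3 + rest + 1 > size)
                return -ENOBUFS;

        for (; n_parents > 0; n_parents--, used += 3)
                memcpy(buf + used, "../", 3);
        memcpy(buf + used, to, rest + 1);   /* also copies the terminating NUL */
        if (rest == 0)
                buf[used - 1] = '\0';       /* drop the trailing '/' of the last "../" */
        return 0;
}
```

With from="/var/lib/a" and to="/var/lib/private/a/b" this yields "../private/a/b", which is the kind of symlink target the DynamicUser= code in execute.c needs.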

src/core/chown-recursive.c

@ -0,0 +1,152 @@
/***
This file is part of systemd.
Copyright 2017 Lennart Poettering
systemd is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or
(at your option) any later version.
systemd is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with systemd; If not, see <http://www.gnu.org/licenses/>.
***/
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "user-util.h"
#include "macro.h"
#include "fd-util.h"
#include "dirent-util.h"
#include "chown-recursive.h"
static int chown_one(int fd, const char *name, const struct stat *st, uid_t uid, gid_t gid) {
int r;
assert(fd >= 0);
assert(st);
if ((!uid_is_valid(uid) || st->st_uid == uid) &&
(!gid_is_valid(gid) || st->st_gid == gid))
return 0;
if (name)
r = fchownat(fd, name, uid, gid, AT_SYMLINK_NOFOLLOW);
else
r = fchown(fd, uid, gid);
if (r < 0)
return -errno;
/* The Linux kernel alters the mode in some cases of chown(). Let's undo this. */
if (name) {
if (!S_ISLNK(st->st_mode))
r = fchmodat(fd, name, st->st_mode, 0);
else /* There's currently no AT_SYMLINK_NOFOLLOW for fchmodat() */
r = 0;
} else
r = fchmod(fd, st->st_mode);
if (r < 0)
return -errno;
return 1;
}
static int chown_recursive_internal(int fd, const struct stat *st, uid_t uid, gid_t gid) {
bool changed = false;
int r;
assert(fd >= 0);
assert(st);
if (S_ISDIR(st->st_mode)) {
_cleanup_closedir_ DIR *d = NULL;
struct dirent *de;
d = fdopendir(fd);
if (!d) {
r = -errno;
goto finish;
}
fd = -1;
FOREACH_DIRENT_ALL(de, d, r = -errno; goto finish) {
struct stat fst;
if (dot_or_dot_dot(de->d_name))
continue;
if (fstatat(dirfd(d), de->d_name, &fst, AT_SYMLINK_NOFOLLOW) < 0) {
r = -errno;
goto finish;
}
if (S_ISDIR(fst.st_mode)) {
int subdir_fd;
subdir_fd = openat(dirfd(d), de->d_name, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC|O_NOFOLLOW|O_NOATIME);
if (subdir_fd < 0) {
r = -errno;
goto finish;
}
r = chown_recursive_internal(subdir_fd, &fst, uid, gid);
if (r < 0)
goto finish;
if (r > 0)
changed = true;
} else {
r = chown_one(dirfd(d), de->d_name, &fst, uid, gid);
if (r < 0)
goto finish;
if (r > 0)
changed = true;
}
}
r = chown_one(dirfd(d), NULL, st, uid, gid);
} else
r = chown_one(fd, NULL, st, uid, gid);
if (r < 0)
goto finish;
r = r > 0 || changed;
finish:
safe_close(fd);
return r;
}
int path_chown_recursive(const char *path, uid_t uid, gid_t gid) {
_cleanup_close_ int fd = -1;
struct stat st;
int r;
fd = open(path, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC|O_NOFOLLOW|O_NOATIME);
if (fd < 0)
return -errno;
if (!uid_is_valid(uid) && !gid_is_valid(gid))
return 0; /* nothing to do */
if (fstat(fd, &st) < 0)
return -errno;
/* Let's take a shortcut: if the top-level directory is properly owned, we don't descend into the whole tree,
* under the assumption that all is OK anyway. */
if ((!uid_is_valid(uid) || st.st_uid == uid) &&
(!gid_is_valid(gid) || st.st_gid == gid))
return 0;
r = chown_recursive_internal(fd, &st, uid, gid);
fd = -1; /* we donated the fd to the call, regardless if it succeeded or failed */
return r;
}
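Both the early return in chown_one() and the top-level shortcut in path_chown_recursive() reduce to the same ownership predicate. A minimal sketch of it is below; ownership_ok() and the SKETCH_* constants are hypothetical, and treating only (uid_t) -1 / (gid_t) -1 as "leave unchanged" is a simplification borrowed from chown(2) semantics, since systemd's own uid_is_valid()/gid_is_valid() may reject further values.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified stand-ins for UID_INVALID/GID_INVALID: -1 means "leave this
 * field alone", as it does for chown(2). */
#define SKETCH_UID_INVALID ((uid_t) -1)
#define SKETCH_GID_INVALID ((gid_t) -1)

/* True if the inode described by `st` already has the requested ownership,
 * i.e. no chown() and, for the top-level directory, no descent is needed. */
static bool ownership_ok(const struct stat *st, uid_t uid, gid_t gid) {
        return (uid == SKETCH_UID_INVALID || st->st_uid == uid) &&
               (gid == SKETCH_GID_INVALID || st->st_gid == gid);
}
```

Because the same check on the top-level directory decides whether the whole walk is skipped, a tree whose root is already correctly owned is left alone even if entries inside it are not, which is the optimization the manual text documents.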


@ -0,0 +1,24 @@
#pragma once
/***
This file is part of systemd.
Copyright 2017 Lennart Poettering
systemd is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or
(at your option) any later version.
systemd is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with systemd; If not, see <http://www.gnu.org/licenses/>.
***/
#include <sys/types.h>
int path_chown_recursive(const char *path, uid_t uid, gid_t gid);


@ -2193,7 +2193,7 @@ int bus_exec_context_set_transient_property(
if (streq(name, "UMask"))
c->umask = m;
else
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++)
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++)
if (startswith(name, exec_directory_type_to_string(i))) {
c->directories[i].mode = m;
break;
@ -2213,8 +2213,8 @@ int bus_exec_context_set_transient_property(
return r;
STRV_FOREACH(p, l) {
if (!filename_is_valid(*p))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "%s is not valid %s", name, *p);
if (!path_is_safe(*p) || path_is_absolute(*p))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "%s= path is not valid: %s", name, *p);
}
if (mode != UNIT_CHECK) {
@ -2222,7 +2222,7 @@ int bus_exec_context_set_transient_property(
char ***dirs = NULL;
ExecDirectoryType i;
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++)
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++)
if (streq(name, exec_directory_type_to_string(i))) {
dirs = &c->directories[i].paths;
break;
@ -2235,7 +2235,6 @@ int bus_exec_context_set_transient_property(
unit_write_drop_in_private_format(u, mode, name, "%s=", name);
} else {
r = strv_extend_strv(dirs, l, true);
if (r < 0)
return -ENOMEM;


@ -182,33 +182,89 @@ static int make_uid_symlinks(uid_t uid, const char *name, bool b) {
return r;
}
static int pick_uid(const char *name, uid_t *ret_uid) {
static int pick_uid(char **suggested_paths, const char *name, uid_t *ret_uid) {
/* Find a suitable free UID. We use the following strategy to find a suitable UID:
*
* 1. Initially, we try to read the UID of a number of specified paths. If any of these UIDs works, we use
* them. We do this in order to increase the chance of UID reuse if StateDirectory=, CacheDirectory= or
* LogsDirectory= are used, as reusing the UID these directories are owned by saves us from having to
* recursively chown() them to new users.
*
* 2. If that didn't yield a currently unused UID, we hash the user name, and try to use that. This should be
* pretty good, as the user is by default derived from the unit name, and hence the same service and same
* user should usually get the same UID as long as our hashing doesn't clash.
*
* 3. Finally, if that didn't work, we randomly pick UIDs, until we find one that is unused.
*
* Since the dynamic UID space is relatively small we'll stop trying after 100 iterations, giving up. */
enum {
PHASE_SUGGESTED, /* the first phase, reusing directory ownership UIDs */
PHASE_HASHED, /* the second phase, deriving a UID from the username by hashing */
PHASE_RANDOM, /* the last phase, randomly picking UIDs */
} phase = PHASE_SUGGESTED;
static const uint8_t hash_key[] = {
0x37, 0x53, 0x7e, 0x31, 0xcf, 0xce, 0x48, 0xf5,
0x8a, 0xbb, 0x39, 0x57, 0x8d, 0xd9, 0xec, 0x59
};
unsigned n_tries = 100;
uid_t candidate;
unsigned n_tries = 100, current_suggested = 0;
int r;
/* A static user by this name does not exist yet. Let's find a free ID then, and use that. We start with a UID
* generated as hash from the user name. */
candidate = UID_CLAMP_INTO_RANGE(siphash24(name, strlen(name), hash_key));
(void) mkdir("/run/systemd/dynamic-uid", 0755);
for (;;) {
char lock_path[strlen("/run/systemd/dynamic-uid/") + DECIMAL_STR_MAX(uid_t) + 1];
_cleanup_close_ int lock_fd = -1;
uid_t candidate;
ssize_t l;
if (--n_tries <= 0) /* Give up retrying eventually */
return -EBUSY;
switch (phase) {
case PHASE_SUGGESTED: {
struct stat st;
if (!suggested_paths || !suggested_paths[current_suggested]) {
/* We reached the end of the suggested paths list, let's try by hashing the name */
phase = PHASE_HASHED;
continue;
}
if (stat(suggested_paths[current_suggested++], &st) < 0)
continue; /* We can't read the UID of this path, but that doesn't matter, just try the next */
candidate = st.st_uid;
break;
}
case PHASE_HASHED:
/* A static user by this name does not exist yet. Let's find a free ID then, and use that. We
* start with a UID generated as hash from the user name. */
candidate = UID_CLAMP_INTO_RANGE(siphash24(name, strlen(name), hash_key));
/* If this one fails, we should proceed with random tries */
phase = PHASE_RANDOM;
break;
case PHASE_RANDOM:
/* Pick another random UID, and see if that works for us. */
random_bytes(&candidate, sizeof(candidate));
candidate = UID_CLAMP_INTO_RANGE(candidate);
break;
default:
assert_not_reached("unknown phase");
}
/* Make sure whatever we picked here actually is in the right range */
if (!uid_is_dynamic(candidate))
goto next;
continue;
xsprintf(lock_path, "/run/systemd/dynamic-uid/" UID_FMT, candidate);
@ -240,7 +296,7 @@ static int pick_uid(const char *name, uid_t *ret_uid) {
/* Some superficial check whether this UID/GID might already be taken by some static user */
if (getpwuid(candidate) || getgrgid((gid_t) candidate)) {
(void) unlink(lock_path);
goto next;
continue;
}
/* Let's store the user name in the lock file, so that we can use it for looking up the username for a UID */
@ -250,8 +306,9 @@ static int pick_uid(const char *name, uid_t *ret_uid) {
IOVEC_INIT((char[1]) { '\n' }, 1),
}, 2, 0);
if (l < 0) {
r = -errno;
(void) unlink(lock_path);
return -errno;
return r;
}
(void) ftruncate(lock_fd, l);
@ -264,9 +321,7 @@ static int pick_uid(const char *name, uid_t *ret_uid) {
return r;
next:
/* Pick another random UID, and see if that works for us. */
random_bytes(&candidate, sizeof(candidate));
candidate = UID_CLAMP_INTO_RANGE(candidate);
;
}
}
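The three-phase strategy of the new pick_uid() can be modelled without lock files or NSS lookups. In this hypothetical sketch, `suggested` stands in for the UIDs read via stat() from the suggested paths, `hash` for the siphash24() of the user name, rand() for random_bytes(), and the `taken` callback for the lock-file and getpwuid()/getgrgid() checks. The range 61184..65519 is assumed to match systemd's default dynamic UID range, and clamp_into_range() is an illustrative reduction, not UID_CLAMP_INTO_RANGE() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

#define DYN_MIN ((uid_t) 61184)   /* assumed default dynamic UID range */
#define DYN_MAX ((uid_t) 65519)

static bool is_dynamic(uid_t u) {
        return u >= DYN_MIN && u <= DYN_MAX;
}

static uid_t clamp_into_range(uint64_t x) {
        return DYN_MIN + (uid_t) (x % (DYN_MAX - DYN_MIN + 1));
}

typedef bool (*uid_taken_fn)(uid_t candidate, void *userdata);

/* Example predicate: a UID is taken if it appears in a 0-terminated list. */
static bool taken_in_list(uid_t candidate, void *userdata) {
        for (const uid_t *l = userdata; *l != 0; l++)
                if (*l == candidate)
                        return true;
        return false;
}

enum phase { PHASE_SUGGESTED, PHASE_HASHED, PHASE_RANDOM };

/* Returns 0 and stores the picked UID in *ret, or -1 once all tries are used. */
static int pick_uid_sketch(const uid_t *suggested, size_t n_suggested, uint64_t hash,
                           uid_taken_fn taken, void *userdata, uid_t *ret) {
        enum phase phase = PHASE_SUGGESTED;
        size_t current = 0;
        unsigned n_tries = 100;

        for (;;) {
                uid_t candidate;

                if (--n_tries == 0)
                        return -1;

                switch (phase) {
                case PHASE_SUGGESTED:
                        if (current >= n_suggested) {
                                phase = PHASE_HASHED;   /* list exhausted, try the hash next */
                                continue;
                        }
                        candidate = suggested[current++];
                        break;
                case PHASE_HASHED:
                        candidate = clamp_into_range(hash);
                        phase = PHASE_RANDOM;           /* the hash gets exactly one chance */
                        break;
                default:                                /* PHASE_RANDOM */
                        candidate = clamp_into_range((uint64_t) rand());
                        break;
                }

                if (!is_dynamic(candidate) || taken(candidate, userdata))
                        continue;

                *ret = candidate;
                return 0;
        }
}
```

Ordering the phases this way means a restarted DynamicUser= service most likely gets back the UID that already owns its StateDirectory=, so path_chown_recursive() usually finds nothing to do.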
@ -363,7 +418,7 @@ static void unlink_uid_lock(int lock_fd, uid_t uid, const char *name) {
(void) make_uid_symlinks(uid, name, false); /* remove direct lookup symlinks */
}
int dynamic_user_realize(DynamicUser *d, uid_t *ret) {
int dynamic_user_realize(DynamicUser *d, char **suggested_dirs, uid_t *ret) {
_cleanup_close_ int etc_passwd_lock_fd = -1, uid_lock_fd = -1;
uid_t uid = UID_INVALID;
@ -421,7 +476,7 @@ int dynamic_user_realize(DynamicUser *d, uid_t *ret) {
if (uid == UID_INVALID) {
/* No static UID assigned yet, excellent. Let's pick a new dynamic one, and lock it. */
uid_lock_fd = pick_uid(d->name, &uid);
uid_lock_fd = pick_uid(suggested_dirs, d->name, &uid);
if (uid_lock_fd < 0)
return uid_lock_fd;
}
@ -744,7 +799,7 @@ int dynamic_creds_acquire(DynamicCreds *creds, Manager *m, const char *user, con
return 0;
}
int dynamic_creds_realize(DynamicCreds *creds, uid_t *uid, gid_t *gid) {
int dynamic_creds_realize(DynamicCreds *creds, char **suggested_paths, uid_t *uid, gid_t *gid) {
uid_t u = UID_INVALID;
gid_t g = GID_INVALID;
int r;
@ -756,13 +811,13 @@ int dynamic_creds_realize(DynamicCreds *creds, uid_t *uid, gid_t *gid) {
/* Realize both the referenced user and group */
if (creds->user) {
r = dynamic_user_realize(creds->user, &u);
r = dynamic_user_realize(creds->user, suggested_paths, &u);
if (r < 0)
return r;
}
if (creds->group && creds->group != creds->user) {
r = dynamic_user_realize(creds->group, &g);
r = dynamic_user_realize(creds->group, suggested_paths, &g);
if (r < 0)
return r;
} else


@ -45,7 +45,7 @@ struct DynamicUser {
int dynamic_user_acquire(Manager *m, const char *name, DynamicUser **ret);
int dynamic_user_realize(DynamicUser *d, uid_t *ret);
int dynamic_user_realize(DynamicUser *d, char **suggested_paths, uid_t *ret);
int dynamic_user_current(DynamicUser *d, uid_t *ret);
DynamicUser* dynamic_user_ref(DynamicUser *d);
@ -60,7 +60,7 @@ int dynamic_user_lookup_uid(Manager *m, uid_t uid, char **ret);
int dynamic_user_lookup_name(Manager *m, const char *name, uid_t *ret);
int dynamic_creds_acquire(DynamicCreds *creds, Manager *m, const char *user, const char *group);
int dynamic_creds_realize(DynamicCreds *creds, uid_t *uid, gid_t *gid);
int dynamic_creds_realize(DynamicCreds *creds, char **suggested_paths, uid_t *uid, gid_t *gid);
void dynamic_creds_unref(DynamicCreds *creds);
void dynamic_creds_destroy(DynamicCreds *creds);


@ -64,6 +64,7 @@
#include "barrier.h"
#include "cap-list.h"
#include "capability-util.h"
#include "chown-recursive.h"
#include "def.h"
#include "env-util.h"
#include "errno-list.h"
@ -76,6 +77,7 @@
#include "glob-util.h"
#include "io-util.h"
#include "ioprio.h"
#include "label.h"
#include "log.h"
#include "macro.h"
#include "missing.h"
@ -1726,6 +1728,13 @@ static bool exec_needs_mount_namespace(
if (context->mount_apivfs && (context->root_image || context->root_directory))
return true;
if (context->dynamic_user &&
(!strv_isempty(context->directories[EXEC_DIRECTORY_RUNTIME].paths) ||
!strv_isempty(context->directories[EXEC_DIRECTORY_STATE].paths) ||
!strv_isempty(context->directories[EXEC_DIRECTORY_CACHE].paths) ||
!strv_isempty(context->directories[EXEC_DIRECTORY_LOGS].paths)))
return true;
return false;
}
@ -1896,7 +1905,7 @@ static int setup_exec_directory(
ExecDirectoryType type,
int *exit_status) {
static const int exit_status_table[_EXEC_DIRECTORY_MAX] = {
static const int exit_status_table[_EXEC_DIRECTORY_TYPE_MAX] = {
[EXEC_DIRECTORY_RUNTIME] = EXIT_RUNTIME_DIRECTORY,
[EXEC_DIRECTORY_STATE] = EXIT_STATE_DIRECTORY,
[EXEC_DIRECTORY_CACHE] = EXIT_CACHE_DIRECTORY,
@ -1908,7 +1917,7 @@ static int setup_exec_directory(
assert(context);
assert(params);
assert(type >= 0 && type < _EXEC_DIRECTORY_MAX);
assert(type >= 0 && type < _EXEC_DIRECTORY_TYPE_MAX);
assert(exit_status);
if (!params->prefix[type])
@ -1922,7 +1931,8 @@ static int setup_exec_directory(
}
STRV_FOREACH(rt, context->directories[type].paths) {
_cleanup_free_ char *p;
_cleanup_free_ char *p = NULL, *pp = NULL;
const char *effective;
p = strjoin(params->prefix[type], "/", *rt);
if (!p) {
@ -1934,16 +1944,94 @@ static int setup_exec_directory(
if (r < 0)
goto fail;
r = mkdir_p_label(p, context->directories[type].mode);
if (r < 0)
if (context->dynamic_user && type != EXEC_DIRECTORY_CONFIGURATION) {
_cleanup_free_ char *private_root = NULL, *relative = NULL, *parent = NULL;
/* So, here's one extra complication when dealing with DynamicUser=1 units. In that case we
* want to avoid leaving a directory around fully accessible that is owned by a dynamic user
* whose UID is later on reused. To lock this down we use the same trick used by container
* managers to prevent host users from getting access to files of the same UID in containers: we
* place everything inside a directory that has an access mode of 0700 and is owned by root:root,
* so that it acts as a security boundary for unprivileged host code. We then use fs namespacing
* to make this directory permeable for the service itself.
*
* Specifically: for a service which wants a special directory "foo/" we first create a
* directory "private/" with access mode 0700 owned by root:root. Then we place "foo" inside of
* that directory (i.e. "private/foo/"), and make "foo" a symlink to "private/foo". This way,
* privileged host users can access "foo/" as usual, but unprivileged host users can't look
* into it. Inside the service's namespace "private/" is replaced by a more liberally
* accessible tmpfs, into which the host's "private/foo/" is mounted under the same name, thus
* disabling the access boundary for the service and making sure it only gets access to the
* dirs it needs but no others. Tricky? Yes, absolutely, but it works!
*
* Note that we don't do this for EXEC_DIRECTORY_CONFIGURATION as that's assumed not to be
* owned by the service itself. */
private_root = strjoin(params->prefix[type], "/private");
if (!private_root) {
r = -ENOMEM;
goto fail;
}
/* First set up private root if it doesn't exist yet, with access mode 0700 and owned by root:root */
r = mkdir_safe_label(private_root, 0700, 0, 0);
if (r < 0)
goto fail;
pp = strjoin(private_root, "/", *rt);
if (!pp) {
r = -ENOMEM;
goto fail;
}
/* Create all directories between the configured directory and this private root, and mark them 0755 */
r = mkdir_parents_label(pp, 0755);
if (r < 0)
goto fail;
/* Finally, create the actual directory for the service */
r = mkdir_label(pp, context->directories[type].mode);
if (r < 0 && r != -EEXIST)
goto fail;
parent = dirname_malloc(p);
if (!parent) {
r = -ENOMEM;
goto fail;
}
r = path_make_relative(parent, pp, &relative);
if (r < 0)
goto fail;
/* And link it up from the original place */
r = symlink_idempotent(relative, p);
if (r < 0)
goto fail;
effective = pp;
} else {
r = mkdir_label(p, context->directories[type].mode);
if (r < 0 && r != -EEXIST)
goto fail;
effective = p;
}
/* First lock down the access mode */
if (chmod(effective, context->directories[type].mode) < 0) {
r = -errno;
goto fail;
}
/* Don't change the owner of the configuration directory, as in the common case it is not written to by
* a service, and shall not be writable. */
if (type == EXEC_DIRECTORY_CONFIGURATION)
continue;
r = chmod_and_chown(p, context->directories[type].mode, uid, gid);
/* Then, change the ownership of the whole tree, if necessary */
r = path_chown_recursive(effective, uid, gid);
if (r < 0)
goto fail;
}
@ -1952,7 +2040,6 @@ static int setup_exec_directory(
fail:
*exit_status = exit_status_table[type];
return r;
}
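The symlink that setup_exec_directory() creates at <prefix>/<name> always climbs back up to <prefix> and then descends into private/<name>. Assuming a normalized name (no "//", "." or ".."), the link target can therefore be derived just by counting slashes. The sketch below is a hypothetical shortcut for illustration only; the code above computes the target generically with path_make_relative().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes into buf the target of the symlink <prefix>/<name> pointing at
 * <prefix>/private/<name>. `name` is assumed relative and normalized.
 * Returns 0, or -1 if buf is too small. */
static int private_symlink_target(const char *name, char *buf, size_t size) {
        size_t depth = 0, used = 0;
        int n;

        /* The link lives in dirname(<prefix>/<name>), which is one level below
         * <prefix> per '/' in `name`; climb back up to <prefix> first. */
        for (const char *c = name; *c; c++)
                if (*c == '/')
                        depth++;

        for (; depth > 0; depth--) {
                if (used + 3 >= size)
                        return -1;
                memcpy(buf + used, "../", 3);
                used += 3;
        }

        n = snprintf(buf + used, size - used, "private/%s", name);
        return (n < 0 || (size_t) n >= size - used) ? -1 : 0;
}
```

For StateDirectory=foo/bar the link /var/lib/foo/bar then points to "../private/foo/bar", which resolves from /var/lib/foo to /var/lib/private/foo/bar.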
@ -2000,11 +2087,11 @@ static int compile_read_write_paths(
* the explicitly configured paths, plus all runtime directories. */
if (strv_isempty(context->read_write_paths)) {
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++)
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++)
if (!strv_isempty(context->directories[i].paths))
break;
if (i == _EXEC_DIRECTORY_MAX) {
if (i == _EXEC_DIRECTORY_TYPE_MAX) {
*ret = NULL; /* NOP if neither is set */
return 0;
}
@ -2014,7 +2101,7 @@ static int compile_read_write_paths(
if (!l)
return -ENOMEM;
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++) {
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++) {
if (!params->prefix[i])
continue;
@ -2036,6 +2123,143 @@ static int compile_read_write_paths(
return 0;
}
static int compile_bind_mounts(
const ExecContext *context,
const ExecParameters *params,
BindMount **ret_bind_mounts,
unsigned *ret_n_bind_mounts,
char ***ret_empty_directories) {
_cleanup_strv_free_ char **empty_directories = NULL;
BindMount *bind_mounts;
unsigned n, h = 0, i;
ExecDirectoryType t;
int r;
assert(context);
assert(params);
assert(ret_bind_mounts);
assert(ret_n_bind_mounts);
assert(ret_empty_directories);
n = context->n_bind_mounts;
for (t = 0; t < _EXEC_DIRECTORY_TYPE_MAX; t++) {
if (!params->prefix[t])
continue;
n += strv_length(context->directories[t].paths);
}
if (n <= 0) {
*ret_bind_mounts = NULL;
*ret_n_bind_mounts = 0;
*ret_empty_directories = NULL;
return 0;
}
bind_mounts = new(BindMount, n);
if (!bind_mounts)
return -ENOMEM;
for (i = 0; i < context->n_bind_mounts; i++) {
BindMount *item = context->bind_mounts + i;
char *s, *d;
s = strdup(item->source);
if (!s) {
r = -ENOMEM;
goto finish;
}
d = strdup(item->destination);
if (!d) {
free(s);
r = -ENOMEM;
goto finish;
}
bind_mounts[h++] = (BindMount) {
.source = s,
.destination = d,
.read_only = item->read_only,
.recursive = item->recursive,
.ignore_enoent = item->ignore_enoent,
};
}
for (t = 0; t < _EXEC_DIRECTORY_TYPE_MAX; t++) {
char **suffix;
if (!params->prefix[t])
continue;
if (strv_isempty(context->directories[t].paths))
continue;
if (context->dynamic_user && t != EXEC_DIRECTORY_CONFIGURATION) {
char *private_root;
/* So this is for a dynamic user, and we need to make sure the process can access its own
* directory. For that we overmount the usually inaccessible "private" subdirectory with a
* tmpfs that makes it accessible and is empty except for the submounts we do this for. */
private_root = strjoin(params->prefix[t], "/private");
if (!private_root) {
r = -ENOMEM;
goto finish;
}
r = strv_consume(&empty_directories, private_root);
if (r < 0) {
r = -ENOMEM;
goto finish;
}
}
STRV_FOREACH(suffix, context->directories[t].paths) {
char *s, *d;
if (context->dynamic_user && t != EXEC_DIRECTORY_CONFIGURATION)
s = strjoin(params->prefix[t], "/private/", *suffix);
else
s = strjoin(params->prefix[t], "/", *suffix);
if (!s) {
r = -ENOMEM;
goto finish;
}
d = strdup(s);
if (!d) {
free(s);
r = -ENOMEM;
goto finish;
}
bind_mounts[h++] = (BindMount) {
.source = s,
.destination = d,
.read_only = false,
.recursive = true,
.ignore_enoent = false,
};
}
}
assert(h == n);
*ret_bind_mounts = bind_mounts;
*ret_n_bind_mounts = n;
*ret_empty_directories = empty_directories;
empty_directories = NULL;
return (int) n;
finish:
bind_mount_free_many(bind_mounts, h);
return r;
}
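A minimal sketch of the path composition that compile_bind_mounts() performs for each configured directory, assuming a simplified enum (DirType, the DIR_* values and exec_dir_path() are hypothetical names, not systemd's). With DynamicUser=, every type except ConfigurationDirectory= is placed below "<prefix>/private/", and the resulting path is used as both source and destination of the bind mount. The tmpfs over "<prefix>/private" hides the directories of other units, and these self bind mounts make the unit's own directories reachable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the ExecDirectoryType ordering used in the diff. */
typedef enum {
        DIR_RUNTIME,
        DIR_STATE,
        DIR_CACHE,
        DIR_LOGS,
        DIR_CONFIGURATION,
} DirType;

/* Returns a malloc()ed path for one configured directory, or NULL on OOM.
 * With DynamicUser=, all types except ConfigurationDirectory= live below
 * "<prefix>/private/"; otherwise the suffix is appended to the prefix directly. */
static char *exec_dir_path(const char *prefix, const char *suffix, DirType t, bool dynamic_user) {
        const char *mid;
        size_t n;
        char *p;

        assert(prefix);
        assert(suffix);

        mid = (dynamic_user && t != DIR_CONFIGURATION) ? "/private/" : "/";
        n = strlen(prefix) + strlen(mid) + strlen(suffix) + 1;

        p = malloc(n);
        if (!p)
                return NULL;

        snprintf(p, n, "%s%s%s", prefix, mid, suffix);
        return p;
}
```

For StateDirectory=waldo with DynamicUser=yes and the prefix /var/lib, this yields /var/lib/private/waldo, which matches the path the test unit at the end of this diff checks for.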
static int apply_mount_namespace(
Unit *u,
ExecCommand *command,
@ -2043,7 +2267,7 @@ static int apply_mount_namespace(
const ExecParameters *params,
ExecRuntime *runtime) {
_cleanup_strv_free_ char **rw = NULL;
_cleanup_strv_free_ char **rw = NULL, **empty_directories = NULL;
char *tmp = NULL, *var = NULL;
const char *root_dir = NULL, *root_image = NULL;
NameSpaceInfo ns_info = {
@ -2055,6 +2279,8 @@ static int apply_mount_namespace(
.mount_apivfs = context->mount_apivfs,
};
bool needs_sandboxing;
BindMount *bind_mounts = NULL;
unsigned n_bind_mounts = 0;
int r;
assert(context);
@ -2081,6 +2307,10 @@ static int apply_mount_namespace(
root_dir = context->root_directory;
}
r = compile_bind_mounts(context, params, &bind_mounts, &n_bind_mounts, &empty_directories);
if (r < 0)
return r;
/*
* If DynamicUser=no and RootDirectory= is set then let's pass a relaxed
* sandbox info, otherwise enforce it, don't ignore protected paths and
@ -2095,8 +2325,9 @@ static int apply_mount_namespace(
&ns_info, rw,
needs_sandboxing ? context->read_only_paths : NULL,
needs_sandboxing ? context->inaccessible_paths : NULL,
context->bind_mounts,
context->n_bind_mounts,
empty_directories,
bind_mounts,
n_bind_mounts,
tmp,
var,
needs_sandboxing ? context->protect_home : PROTECT_HOME_NO,
@ -2104,6 +2335,8 @@ static int apply_mount_namespace(
context->mount_flags,
DISSECT_IMAGE_DISCARD_ON_LOOP);
bind_mount_free_many(bind_mounts, n_bind_mounts);
/* If we couldn't set up the namespace this is probably due to a
* missing capability. In this case, silently proceed. */
if (IN_SET(r, -EPERM, -EACCES)) {
@ -2384,6 +2617,49 @@ static int acquire_home(const ExecContext *c, uid_t uid, const char** home, char
return 1;
}
static int compile_suggested_paths(const ExecContext *c, const ExecParameters *p, char ***ret) {
_cleanup_strv_free_ char ** list = NULL;
ExecDirectoryType t;
int r;
assert(c);
assert(p);
assert(ret);
assert(c->dynamic_user);
/* Compile a list of paths that it might make sense to read the owning UID from to use as initial candidate for
* dynamic UID allocation, in order to save us from doing costly recursive chown()s of the special
* directories. */
for (t = 0; t < _EXEC_DIRECTORY_TYPE_MAX; t++) {
char **i;
if (t == EXEC_DIRECTORY_CONFIGURATION)
continue;
if (!p->prefix[t])
continue;
STRV_FOREACH(i, c->directories[t].paths) {
char *e;
e = strjoin(p->prefix[t], "/private/", *i);
if (!e)
return -ENOMEM;
r = strv_consume(&list, e);
if (r < 0)
return r;
}
}
*ret = list;
list = NULL;
return 0;
}
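compile_suggested_paths() only builds the candidate list; how dynamic_creds_realize() consumes it is not part of this excerpt. As a hedged sketch, assuming the candidates are stat()ed in order and the first owner inside the dynamic UID range is reused (systemd documents that range as 61184 to 65519), the selection step could look like this. Reusing the UID that already owns, say, /var/lib/private/waldo is what saves the recursive chown(). pick_candidate_uid() and SKETCH_UID_INVALID are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sentinel, in the spirit of systemd's UID_INVALID. */
#define SKETCH_UID_INVALID ((uid_t) -1)

/* Returns the owner of the first path that exists and whose owner lies in
 * [uid_min, uid_max], or SKETCH_UID_INVALID if none qualifies. Paths that
 * cannot be stat()ed (for example, not created yet) are skipped. */
static uid_t pick_candidate_uid(const char *const *paths, size_t n, uid_t uid_min, uid_t uid_max) {
        assert(paths || n == 0);

        for (size_t i = 0; i < n; i++) {
                struct stat st;

                if (stat(paths[i], &st) < 0)
                        continue;

                if (st.st_uid >= uid_min && st.st_uid <= uid_max)
                        return st.st_uid;
        }

        return SKETCH_UID_INVALID;
}
```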
static int exec_child(
Unit *unit,
ExecCommand *command,
@ -2505,6 +2781,7 @@ static int exec_child(
}
if (context->dynamic_user && dcreds) {
_cleanup_strv_free_ char **suggested_paths = NULL;
/* Make sure we bypass our own NSS module for any NSS checks */
if (putenv((char*) "SYSTEMD_NSS_DYNAMIC_BYPASS=1") != 0) {
@ -2512,7 +2789,13 @@ static int exec_child(
return log_unit_error_errno(unit, errno, "Failed to update environment: %m");
}
r = dynamic_creds_realize(dcreds, &uid, &gid);
r = compile_suggested_paths(context, params, &suggested_paths);
if (r < 0) {
*exit_status = EXIT_MEMORY;
return log_oom();
}
r = dynamic_creds_realize(dcreds, suggested_paths, &uid, &gid);
if (r < 0) {
*exit_status = EXIT_USER;
return log_unit_error_errno(unit, r, "Failed to update dynamic user credentials: %m");
@ -2699,7 +2982,7 @@ static int exec_child(
}
}
for (dt = 0; dt < _EXEC_DIRECTORY_MAX; dt++) {
for (dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) {
r = setup_exec_directory(context, params, uid, gid, dt, exit_status);
if (r < 0)
return log_unit_error_errno(unit, r, "Failed to set up special execution directory in %s: %m", params->prefix[dt]);
@ -3239,7 +3522,7 @@ void exec_context_init(ExecContext *c) {
c->ignore_sigpipe = true;
c->timer_slack_nsec = NSEC_INFINITY;
c->personality = PERSONALITY_INVALID;
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++)
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++)
c->directories[i].mode = 0755;
c->capability_bounding_set = CAP_ALL;
c->restrict_namespaces = NAMESPACE_FLAGS_ALL;
@ -3292,7 +3575,7 @@ void exec_context_done(ExecContext *c) {
c->syscall_archs = set_free(c->syscall_archs);
c->address_families = set_free(c->address_families);
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++)
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++)
c->directories[i].paths = strv_free(c->directories[i].paths);
}
@ -3311,10 +3594,21 @@ int exec_context_destroy_runtime_directory(ExecContext *c, const char *runtime_p
if (!p)
return -ENOMEM;
/* We execute this synchronously, since we need to be
* sure this is gone when we start the service
/* We execute this synchronously, since we need to be sure this is gone when we start the service
* next. */
(void) rm_rf(p, REMOVE_ROOT);
/* Also destroy any matching subdirectory below /private/. This is done to support DynamicUser=1
* setups. Note that we don't make this conditional on DynamicUser= though, which makes us a bit more
* robust against changing unit settings. Or to say this differently: in the worst case this is a
* NOP. */
free(p);
p = strjoin(runtime_prefix, "/private/", *i);
if (!p)
return -ENOMEM;
(void) rm_rf(p, REMOVE_ROOT);
}
return 0;
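exec_context_destroy_runtime_directory() now removes two trees per configured name: "<prefix>/<name>" and "<prefix>/private/<name>". The sketch below approximates that with POSIX nftw() in depth-first mode without following symlinks; it is not systemd's rm_rf(), and rm_tree() and destroy_runtime_dir() are hypothetical helpers. As in the diff, failures of the individual removals are ignored.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Called for each entry; FTW_DEPTH guarantees children are visited before parents. */
static int remove_entry(const char *path, const struct stat *st, int type, struct FTW *ftw) {
        (void) st;
        (void) type;
        (void) ftw;
        return remove(path) < 0 ? -1 : 0;
}

/* Removes 'path' and everything below it; a missing path counts as success. */
static int rm_tree(const char *path) {
        if (nftw(path, remove_entry, 16, FTW_DEPTH | FTW_PHYS) < 0)
                return errno == ENOENT ? 0 : -errno;
        return 0;
}

/* Removes both "<prefix>/<name>" and "<prefix>/private/<name>", mirroring the
 * two rm_rf() calls in the diff. Errors from the removals themselves are ignored. */
static int destroy_runtime_dir(const char *prefix, const char *name) {
        char p[4096];
        int n;

        n = snprintf(p, sizeof p, "%s/%s", prefix, name);
        if (n < 0 || (size_t) n >= sizeof p)
                return -ENAMETOOLONG;
        (void) rm_tree(p);

        n = snprintf(p, sizeof p, "%s/private/%s", prefix, name);
        if (n < 0 || (size_t) n >= sizeof p)
                return -ENAMETOOLONG;
        (void) rm_tree(p);

        return 0;
}
```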
@ -3622,7 +3916,7 @@ void exec_context_dump(ExecContext *c, FILE* f, const char *prefix) {
fprintf(f, "%sRuntimeDirectoryPreserve: %s\n", prefix, exec_preserve_mode_to_string(c->runtime_directory_preserve_mode));
for (dt = 0; dt < _EXEC_DIRECTORY_MAX; dt++) {
for (dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) {
fprintf(f, "%s%sMode: %04o\n", prefix, exec_directory_type_to_string(dt), c->directories[dt].mode);
STRV_FOREACH(d, c->directories[dt].paths)
@ -4374,7 +4668,7 @@ static const char* const exec_preserve_mode_table[_EXEC_PRESERVE_MODE_MAX] = {
DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN(exec_preserve_mode, ExecPreserveMode, EXEC_PRESERVE_YES);
static const char* const exec_directory_type_table[_EXEC_DIRECTORY_MAX] = {
static const char* const exec_directory_type_table[_EXEC_DIRECTORY_TYPE_MAX] = {
[EXEC_DIRECTORY_RUNTIME] = "RuntimeDirectory",
[EXEC_DIRECTORY_STATE] = "StateDirectory",
[EXEC_DIRECTORY_CACHE] = "CacheDirectory",

@ -128,8 +128,8 @@ typedef enum ExecDirectoryType {
EXEC_DIRECTORY_CACHE,
EXEC_DIRECTORY_LOGS,
EXEC_DIRECTORY_CONFIGURATION,
_EXEC_DIRECTORY_MAX,
_EXEC_DIRECTORY_INVALID = -1,
_EXEC_DIRECTORY_TYPE_MAX,
_EXEC_DIRECTORY_TYPE_INVALID = -1,
} ExecDirectoryType;
typedef struct ExecDirectory {
@ -251,7 +251,7 @@ struct ExecContext {
bool address_families_whitelist:1;
ExecPreserveMode runtime_directory_preserve_mode;
ExecDirectory directories[_EXEC_DIRECTORY_MAX];
ExecDirectory directories[_EXEC_DIRECTORY_TYPE_MAX];
bool memory_deny_write_execute;
bool restrict_realtime;

@ -3718,8 +3718,6 @@ int config_parse_exec_directories(
_cleanup_free_ char *word = NULL, *k = NULL;
r = extract_first_word(&p, &word, NULL, EXTRACT_QUOTES);
if (r == 0)
return 0;
if (r == -ENOMEM)
return log_oom();
if (r < 0) {
@ -3727,6 +3725,8 @@ int config_parse_exec_directories(
"Invalid syntax, ignoring: %s", rvalue);
return 0;
}
if (r == 0)
return 0;
r = unit_full_printf(u, word, &k);
if (r < 0) {
@ -3737,7 +3737,7 @@ int config_parse_exec_directories(
if (!path_is_safe(k) || path_is_absolute(k)) {
log_syntax(unit, LOG_ERR, filename, line, 0,
"%s is not valid, ignoring assignment: %s", lvalue, rvalue);
"%s= path is not valid, ignoring assignment: %s", lvalue, rvalue);
continue;
}
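config_parse_exec_directories() rejects any entry for which path_is_safe() fails or which is absolute. The helper below is a simplified, hypothetical stand-in for that check: it only rejects empty paths, absolute paths and paths with a ".." component, while systemd's path_is_safe() may check more (such as overall length).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical approximation of the validity check: accept only non-empty,
 * relative paths that contain no ".." component. */
static bool exec_dir_path_ok(const char *p) {
        const char *c;

        assert(p);

        if (*p == '\0' || *p == '/')
                return false;

        /* Walk the '/'-separated components and reject any that equals "..". */
        for (c = p; *c; ) {
                size_t len = strcspn(c, "/");

                if (len == 2 && c[0] == '.' && c[1] == '.')
                        return false;

                c += len;
                c += strspn(c, "/");
        }

        return true;
}
```

Under this check, "waldo" and "quux/pief" (the values used by the test unit further down) pass, while "/var/lib/waldo" and "quux/../pief" are rejected.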

@ -564,7 +564,7 @@ static int manager_setup_prefix(Manager *m) {
const char *suffix;
};
static const struct table_entry paths_system[_EXEC_DIRECTORY_MAX] = {
static const struct table_entry paths_system[_EXEC_DIRECTORY_TYPE_MAX] = {
[EXEC_DIRECTORY_RUNTIME] = { SD_PATH_SYSTEM_RUNTIME, NULL },
[EXEC_DIRECTORY_STATE] = { SD_PATH_SYSTEM_STATE_PRIVATE, NULL },
[EXEC_DIRECTORY_CACHE] = { SD_PATH_SYSTEM_STATE_CACHE, NULL },
@ -572,12 +572,12 @@ static int manager_setup_prefix(Manager *m) {
[EXEC_DIRECTORY_CONFIGURATION] = { SD_PATH_SYSTEM_CONFIGURATION, NULL },
};
static const struct table_entry paths_user[_EXEC_DIRECTORY_MAX] = {
static const struct table_entry paths_user[_EXEC_DIRECTORY_TYPE_MAX] = {
[EXEC_DIRECTORY_RUNTIME] = { SD_PATH_USER_RUNTIME, NULL },
[EXEC_DIRECTORY_STATE] = { SD_PATH_USER_CONFIGURATION, NULL },
[EXEC_DIRECTORY_CACHE] = { SD_PATH_SYSTEM_STATE_CACHE, NULL },
[EXEC_DIRECTORY_LOGS] = { SD_PATH_SYSTEM_CONFIGURATION, "log" },
[EXEC_DIRECTORY_CONFIGURATION] = { SD_PATH_SYSTEM_CONFIGURATION, NULL },
[EXEC_DIRECTORY_CACHE] = { SD_PATH_USER_STATE_CACHE, NULL },
[EXEC_DIRECTORY_LOGS] = { SD_PATH_USER_CONFIGURATION, "log" },
[EXEC_DIRECTORY_CONFIGURATION] = { SD_PATH_USER_CONFIGURATION, NULL },
};
const struct table_entry *p;
@ -591,7 +591,7 @@ static int manager_setup_prefix(Manager *m) {
else
p = paths_user;
for (i = 0; i < _EXEC_DIRECTORY_MAX; i++) {
for (i = 0; i < _EXEC_DIRECTORY_TYPE_MAX; i++) {
r = sd_path_home(p[i].type, p[i].suffix, &m->prefix[i]);
if (r < 0)
return r;
@ -1191,7 +1191,7 @@ Manager* manager_free(Manager *m) {
hashmap_free(m->uid_refs);
hashmap_free(m->gid_refs);
for (dt = 0; dt < _EXEC_DIRECTORY_MAX; dt++)
for (dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++)
m->prefix[dt] = mfree(m->prefix[dt]);
return mfree(m);
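Throughout the diff, _EXEC_DIRECTORY_MAX is renamed to _EXEC_DIRECTORY_TYPE_MAX, the sentinel that sizes every per-type array (ExecContext.directories, Manager.prefix and the lookup tables above). The diff also fixes paths_user, which previously pointed CacheDirectory=, LogsDirectory= and ConfigurationDirectory= at SD_PATH_SYSTEM_* locations. The sketch shows the designated-initializer pattern with the prefixes a system manager typically resolves to; the enum and function names are hypothetical, and /var/log for LogsDirectory= is an assumption, because that table entry falls outside this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
        EXEC_DIR_RUNTIME,
        EXEC_DIR_STATE,
        EXEC_DIR_CACHE,
        EXEC_DIR_LOGS,
        EXEC_DIR_CONFIGURATION,
        _EXEC_DIR_TYPE_MAX, /* sentinel: number of types, sizes every per-type array */
} ExecDirType;

/* Typical resolved system-instance prefixes (the LOGS entry is an assumption). */
static const char *const system_prefix[_EXEC_DIR_TYPE_MAX] = {
        [EXEC_DIR_RUNTIME]       = "/run",
        [EXEC_DIR_STATE]         = "/var/lib",
        [EXEC_DIR_CACHE]         = "/var/cache",
        [EXEC_DIR_LOGS]          = "/var/log",
        [EXEC_DIR_CONFIGURATION] = "/etc",
};

/* Returns the prefix for a type, or NULL for out-of-range values. */
static const char *exec_dir_prefix(ExecDirType t) {
        if ((unsigned) t >= (unsigned) _EXEC_DIR_TYPE_MAX)
                return NULL;
        return system_prefix[t];
}
```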

@ -335,7 +335,7 @@ struct Manager {
int first_boot; /* tri-state */
/* prefixes of e.g. RuntimeDirectory= */
char *prefix[_EXEC_DIRECTORY_MAX];
char *prefix[_EXEC_DIRECTORY_TYPE_MAX];
};
#define MANAGER_IS_SYSTEM(m) ((m)->unit_file_scope == UNIT_FILE_SYSTEM)

@ -7,6 +7,8 @@ libcore_la_sources = '''
bpf-firewall.h
cgroup.c
cgroup.h
chown-recursive.c
chown-recursive.h
dbus-automount.c
dbus-automount.h
dbus-cgroup.c
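The build list now includes chown-recursive.c, whose contents are not part of this excerpt. The comment in compile_suggested_paths() notes that recursive chown() calls are costly, so a walk should only touch entries whose ownership actually differs. A generic POSIX sketch of such a walk, using nftw() and lchown() and counting changed entries (similar in spirit to the `changed` flag set in recurse_fd()), might look like this; chown_tree() is hypothetical and not systemd's implementation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* nftw() callbacks take no user pointer, so the target IDs live in statics here. */
static uid_t target_uid;
static gid_t target_gid;
static unsigned changed_count;

static int chown_entry(const char *path, const struct stat *st, int type, struct FTW *ftw) {
        (void) ftw;

        if (type == FTW_NS)
                return -1; /* the entry could not be stat()ed, st is not valid */

        /* Skip entries that already have the right owner, avoiding needless writes. */
        if (st->st_uid == target_uid && st->st_gid == target_gid)
                return 0;

        /* lchown() so that symlinks themselves are changed, never their targets. */
        if (lchown(path, target_uid, target_gid) < 0)
                return -1;

        changed_count++;
        return 0;
}

/* Returns the number of entries changed, or -errno on failure. */
static int chown_tree(const char *path, uid_t uid, gid_t gid) {
        target_uid = uid;
        target_gid = gid;
        changed_count = 0;

        if (nftw(path, chown_entry, 16, FTW_PHYS) < 0)
                return -errno;

        return (int) changed_count;
}
```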

@ -31,6 +31,7 @@
#include "dev-setup.h"
#include "fd-util.h"
#include "fs-util.h"
#include "label.h"
#include "loop-util.h"
#include "loopback-setup.h"
#include "missing.h"
@ -58,6 +59,7 @@ typedef enum MountMode {
PRIVATE_VAR_TMP,
PRIVATE_DEV,
BIND_DEV,
EMPTY_DIR,
SYSFS,
PROCFS,
READONLY,
@ -224,6 +226,28 @@ static int append_access_mounts(MountEntry **p, char **strv, MountMode mode) {
return 0;
}
static int append_empty_dir_mounts(MountEntry **p, char **strv) {
char **i;
assert(p);
/* Adds tmpfs mounts to provide readable but empty directories. This is primarily used to implement the
* "/private/" boundary directories for DynamicUser=1. */
STRV_FOREACH(i, strv) {
*((*p)++) = (MountEntry) {
.path_const = *i,
.mode = EMPTY_DIR,
.ignore = false,
.has_prefix = false,
.read_only = true,
};
}
return 0;
}
static int append_bind_mounts(MountEntry **p, const BindMount *binds, unsigned n) {
unsigned i;
@ -618,6 +642,8 @@ static int mount_bind_dev(MountEntry *m) {
/* Implements the little brother of mount_private_dev(): simply bind mounts the host's /dev into the service's
* /dev. This is only used when RootDirectory= is set. */
(void) mkdir_p_label(mount_entry_path(m), 0755);
r = path_is_mount_point(mount_entry_path(m), NULL, 0);
if (r < 0)
return log_debug_errno(r, "Unable to determine whether /dev is already mounted: %m");
@ -635,6 +661,8 @@ static int mount_sysfs(MountEntry *m) {
assert(m);
(void) mkdir_p_label(mount_entry_path(m), 0755);
r = path_is_mount_point(mount_entry_path(m), NULL, 0);
if (r < 0)
return log_debug_errno(r, "Unable to determine whether /sys is already mounted: %m");
@ -653,6 +681,8 @@ static int mount_procfs(MountEntry *m) {
assert(m);
(void) mkdir_p_label(mount_entry_path(m), 0755);
r = path_is_mount_point(mount_entry_path(m), NULL, 0);
if (r < 0)
return log_debug_errno(r, "Unable to determine whether /proc is already mounted: %m");
@ -666,6 +696,20 @@ static int mount_procfs(MountEntry *m) {
return 1;
}
static int mount_empty_dir(MountEntry *m) {
assert(m);
/* First, unmount anything already mounted below this path, then overmount it with a new, empty tmpfs */
(void) mkdir_p_label(mount_entry_path(m), 0755);
(void) umount_recursive(mount_entry_path(m), 0);
if (mount("tmpfs", mount_entry_path(m), "tmpfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME, "mode=755") < 0)
return log_debug_errno(errno, "Failed to mount %s: %m", mount_entry_path(m));
return 1;
}
static int mount_entry_chase(
const char *root_directory,
MountEntry *m,
@ -681,7 +725,9 @@ static int mount_entry_chase(
* chase the symlinks on our own first. This is called for the destination path, as well as the source path (if
* that applies). The result is stored in "location". */
r = chase_symlinks(path, root_directory, 0, &chased);
r = chase_symlinks(path, root_directory,
IN_SET(m->mode, BIND_MOUNT, BIND_MOUNT_RECURSIVE, PRIVATE_TMP, PRIVATE_VAR_TMP, PRIVATE_DEV, BIND_DEV, EMPTY_DIR, SYSFS, PROCFS) ? CHASE_NONEXISTENT : 0,
&chased);
if (r == -ENOENT && m->ignore) {
log_debug_errno(r, "Path %s does not exist, ignoring.", path);
return 0;
@ -703,8 +749,8 @@ static int apply_mount(
const char *tmp_dir,
const char *var_tmp_dir) {
bool rbind = true, make = false;
const char *what;
bool rbind = true;
int r;
assert(m);
@ -759,14 +805,20 @@ static int apply_mount(
return r;
what = mount_entry_source(m);
make = true;
break;
case EMPTY_DIR:
return mount_empty_dir(m);
case PRIVATE_TMP:
what = tmp_dir;
make = true;
break;
case PRIVATE_VAR_TMP:
what = var_tmp_dir;
make = true;
break;
case PRIVATE_DEV:
@ -787,8 +839,36 @@ static int apply_mount(
assert(what);
if (mount(what, mount_entry_path(m), NULL, MS_BIND|(rbind ? MS_REC : 0), NULL) < 0)
return log_debug_errno(errno, "Failed to mount %s to %s: %m", what, mount_entry_path(m));
if (mount(what, mount_entry_path(m), NULL, MS_BIND|(rbind ? MS_REC : 0), NULL) < 0) {
bool try_again = false;
r = -errno;
if (r == -ENOENT && make) {
struct stat st;
/* Hmm, either the source or the destination is missing. Let's see if we can create the destination, then try again */
if (stat(what, &st) >= 0) {
(void) mkdir_parents(mount_entry_path(m), 0755);
if (S_ISDIR(st.st_mode))
try_again = mkdir(mount_entry_path(m), 0755) >= 0;
else
try_again = touch(mount_entry_path(m)) >= 0;
}
}
if (try_again) {
if (mount(what, mount_entry_path(m), NULL, MS_BIND|(rbind ? MS_REC : 0), NULL) < 0)
r = -errno;
else
r = 0;
}
if (r < 0)
return log_debug_errno(r, "Failed to mount %s to %s: %m", what, mount_entry_path(m));
}
log_debug("Successfully mounted %s to %s", what, mount_entry_path(m));
return 0;
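The new fallback in apply_mount() retries a failed bind mount after creating the missing destination, choosing a directory or a regular file to match the source. A hedged, standalone sketch of that preparation step follows; prepare_bind_target() and mkdir_parents_sketch() are hypothetical (the diff uses systemd's own mkdir_parents() and touch()), and the mount(2) retry itself is left out because it needs privileges.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates every missing parent directory of 'path' (like "mkdir -p" on its dirname). */
static int mkdir_parents_sketch(const char *path, mode_t mode) {
        char buf[4096];
        char *s;

        if (*path == '\0')
                return -EINVAL;
        if (strlen(path) >= sizeof buf)
                return -ENAMETOOLONG;
        strcpy(buf, path);

        for (s = strchr(buf + 1, '/'); s; s = strchr(s + 1, '/')) {
                *s = '\0';
                if (mkdir(buf, mode) < 0 && errno != EEXIST)
                        return -errno;
                *s = '/';
        }

        return 0;
}

/* If 'what' exists, create 'where' with a matching type so a bind mount can be
 * retried: a directory for a directory source, an empty regular file otherwise.
 * Returns 1 if 'where' was created, 0 if the source is missing, -errno on error. */
static int prepare_bind_target(const char *what, const char *where) {
        struct stat st;
        int r, fd;

        if (stat(what, &st) < 0)
                return 0; /* the source itself is missing, retrying cannot help */

        r = mkdir_parents_sketch(where, 0755);
        if (r < 0)
                return r;

        if (S_ISDIR(st.st_mode))
                return mkdir(where, 0755) < 0 ? -errno : 1;

        fd = open(where, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY, 0644);
        if (fd < 0)
                return -errno;
        close(fd);
        return 1;
}
```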
@ -840,6 +920,7 @@ static unsigned namespace_calculate_mounts(
char** read_write_paths,
char** read_only_paths,
char** inaccessible_paths,
char** empty_directories,
const BindMount *bind_mounts,
unsigned n_bind_mounts,
const char* tmp_dir,
@ -866,6 +947,7 @@ static unsigned namespace_calculate_mounts(
strv_length(read_write_paths) +
strv_length(read_only_paths) +
strv_length(inaccessible_paths) +
strv_length(empty_directories) +
n_bind_mounts +
ns_info->private_dev +
(ns_info->protect_kernel_tunables ? ELEMENTSOF(protect_kernel_tunables_table) : 0) +
@ -882,6 +964,7 @@ int setup_namespace(
char** read_write_paths,
char** read_only_paths,
char** inaccessible_paths,
char** empty_directories,
const BindMount *bind_mounts,
unsigned n_bind_mounts,
const char* tmp_dir,
@ -898,6 +981,7 @@ int setup_namespace(
MountEntry *m, *mounts = NULL;
size_t root_hash_size = 0;
bool make_slave = false;
const char *root;
unsigned n_mounts;
int r = 0;
@ -929,28 +1013,36 @@ int setup_namespace(
r = dissected_image_decrypt(dissected_image, NULL, root_hash, root_hash_size, dissect_image_flags, &decrypted_image);
if (r < 0)
return r;
if (!root_directory) {
/* Create a mount point for the image, if it's still missing. We use the same mount point for
* all images, which is safe, since they all live in their own namespaces after all, and hence
* won't see each other. */
root_directory = "/run/systemd/unit-root";
(void) mkdir(root_directory, 0700);
}
}
if (root_directory)
root = root_directory;
else if (root_image || n_bind_mounts > 0) {
/* If we are booting from an image, create a mount point for the image, if it's still missing. We use
* the same mount point for all images, which is safe, since they all live in their own namespaces
* after all, and hence won't see each other. We also use such a root directory whenever there are bind
* mounts configured, so that their source mounts are never obstructed by mounts we already applied
* while we are applying them. */
root = "/run/systemd/unit-root";
(void) mkdir_label(root, 0700);
} else
root = NULL;
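The root selection above can be restated as a pure function. An explicit RootDirectory= wins; otherwise RootImage= or any configured bind mount forces the staging root /run/systemd/unit-root. As the comment explains, this keeps bind mount sources resolvable: the sources are still looked up on the host's tree, while the destinations are prefixed with the staging root, so an EMPTY_DIR tmpfs over "<prefix>/private" inside the staging root cannot hide the source "<prefix>/private/<name>". choose_namespace_root() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the if/else chain in setup_namespace(). */
static const char *choose_namespace_root(const char *root_directory, const char *root_image, unsigned n_bind_mounts) {
        if (root_directory)
                return root_directory;

        if (root_image || n_bind_mounts > 0)
                return "/run/systemd/unit-root";

        return NULL; /* no separate root: mounts are applied directly to "/" */
}
```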
n_mounts = namespace_calculate_mounts(
root_directory,
root,
ns_info,
read_write_paths,
read_only_paths,
inaccessible_paths,
empty_directories,
bind_mounts, n_bind_mounts,
tmp_dir, var_tmp_dir,
protect_home, protect_system);
/* Set mount slave mode */
if (root_directory || n_mounts > 0)
if (root || n_mounts > 0)
make_slave = true;
if (n_mounts > 0) {
@ -967,6 +1059,10 @@ int setup_namespace(
if (r < 0)
goto finish;
r = append_empty_dir_mounts(&m, empty_directories);
if (r < 0)
goto finish;
r = append_bind_mounts(&m, bind_mounts, n_bind_mounts);
if (r < 0)
goto finish;
@ -1019,7 +1115,7 @@ int setup_namespace(
if (r < 0)
goto finish;
if (namespace_info_mount_apivfs(root_directory, ns_info)) {
if (namespace_info_mount_apivfs(root, ns_info)) {
r = append_static_mounts(&m, apivfs_table, ELEMENTSOF(apivfs_table), ns_info->ignore_protect_paths);
if (r < 0)
goto finish;
@ -1028,14 +1124,14 @@ int setup_namespace(
assert(mounts + n_mounts == m);
/* Prepend the root directory where that's necessary */
r = prefix_where_needed(mounts, n_mounts, root_directory);
r = prefix_where_needed(mounts, n_mounts, root);
if (r < 0)
goto finish;
qsort(mounts, n_mounts, sizeof(MountEntry), mount_path_compare);
drop_duplicates(mounts, &n_mounts);
drop_outside_root(root_directory, mounts, &n_mounts);
drop_outside_root(root, mounts, &n_mounts);
drop_inaccessible(mounts, &n_mounts);
drop_nop(mounts, &n_mounts);
}
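Before anything is mounted, setup_namespace() sorts the entries with qsort() and mount_path_compare (not shown in this excerpt). For DynamicUser= the order matters: the EMPTY_DIR tmpfs on "<prefix>/private" must be applied before the bind mounts beneath it, or it would cover them. Assuming the comparator orders parent paths before their children, the effect can be sketched with a simplified component-wise comparison; SketchMount, path_cmp() and mount_cmp() are hypothetical, and systemd's comparator may additionally break ties by mount mode, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down mount entry. */
typedef struct {
        const char *path;
        const char *what; /* e.g. "tmpfs" or "bind"; informational only */
} SketchMount;

/* Component-wise comparison: "/a" < "/a/b" < "/ab"; repeated slashes are ignored. */
static int path_cmp(const char *a, const char *b) {
        for (;;) {
                size_t ja, jb;
                int d;

                a += strspn(a, "/");
                b += strspn(b, "/");
                if (*a == '\0' || *b == '\0')
                        return (*a != '\0') - (*b != '\0');

                ja = strcspn(a, "/");
                jb = strcspn(b, "/");
                d = memcmp(a, b, ja < jb ? ja : jb);
                if (d != 0)
                        return d;
                if (ja != jb)
                        return ja < jb ? -1 : 1;

                a += ja;
                b += jb;
        }
}

static int mount_cmp(const void *x, const void *y) {
        const SketchMount *a = x, *b = y;
        return path_cmp(a->path, b->path);
}
```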
@ -1055,11 +1151,12 @@ int setup_namespace(
}
/* Try to set up the new root directory before mounting anything there */
if (root_directory)
(void) base_filesystem_create(root_directory, UID_INVALID, GID_INVALID);
if (root)
(void) base_filesystem_create(root, UID_INVALID, GID_INVALID);
if (root_image) {
r = dissected_image_mount(dissected_image, root_directory, dissect_image_flags);
/* A root image is specified, mount it to the right place */
r = dissected_image_mount(dissected_image, root, dissect_image_flags);
if (r < 0)
goto finish;
@ -1073,16 +1170,24 @@ int setup_namespace(
} else if (root_directory) {
/* Turn directory into bind mount, if it isn't one yet */
r = path_is_mount_point(root_directory, NULL, AT_SYMLINK_FOLLOW);
/* A root directory is specified. Turn its directory into bind mount, if it isn't one yet. */
r = path_is_mount_point(root, NULL, AT_SYMLINK_FOLLOW);
if (r < 0)
goto finish;
if (r == 0) {
if (mount(root_directory, root_directory, NULL, MS_BIND|MS_REC, NULL) < 0) {
if (mount(root, root, NULL, MS_BIND|MS_REC, NULL) < 0) {
r = -errno;
goto finish;
}
}
} else if (root) {
/* Bind mount the host's root directory onto the root directory we'll use */
if (mount("/", root, NULL, MS_BIND|MS_REC, NULL) < 0) {
r = -errno;
goto finish;
}
}
if (n_mounts > 0) {
@ -1100,7 +1205,7 @@ int setup_namespace(
/* First round, add in all special mounts we need */
for (m = mounts; m < mounts + n_mounts; ++m) {
r = apply_mount(root_directory, m, tmp_dir, var_tmp_dir);
r = apply_mount(root, m, tmp_dir, var_tmp_dir);
if (r < 0)
goto finish;
}
@ -1119,9 +1224,9 @@ int setup_namespace(
}
}
if (root_directory) {
if (root) {
/* MS_MOVE does not work on MS_SHARED so the remount MS_SHARED will be done later */
r = mount_move_root(root_directory);
r = mount_move_root(root);
if (r < 0)
goto finish;
}

@ -69,6 +69,7 @@ int setup_namespace(
char **read_write_paths,
char **read_only_paths,
char **inaccessible_paths,
char **empty_directories,
const BindMount *bind_mounts,
unsigned n_bind_mounts,
const char *tmp_dir,

@ -893,7 +893,7 @@ int unit_add_exec_dependencies(Unit *u, ExecContext *c) {
return r;
}
for (dt = 0; dt < _EXEC_DIRECTORY_MAX; dt++) {
for (dt = 0; dt < _EXEC_DIRECTORY_TYPE_MAX; dt++) {
if (!u->manager->prefix[dt])
continue;

@ -347,6 +347,8 @@ static int recurse_fd(int fd, bool donate_fd, const struct stat *st, uid_t shift
}
if (r < 0)
goto finish;
if (r > 0)
changed = true;
if (S_ISDIR(st->st_mode)) {
_cleanup_closedir_ DIR *d = NULL;

@ -959,9 +959,10 @@ int bus_append_unit_property_assignment(sd_bus_message *m, const char *assignmen
_cleanup_free_ char *word = NULL;
r = extract_first_word(&p, &word, NULL, EXTRACT_QUOTES);
if (r == -ENOMEM)
return log_oom();
if (r < 0)
return log_error_errno(r, "Failed to parse %s value %s", field, eq);
if (r == 0)
break;

@ -45,7 +45,7 @@ typedef void (*test_function_t)(Manager *m);
static void check(Manager *m, Unit *unit, int status_expected, int code_expected) {
Service *service = NULL;
usec_t ts;
usec_t timeout = 2 * USEC_PER_SEC;
usec_t timeout = 2 * USEC_PER_MINUTE;
assert_se(m);
assert_se(unit);
@ -317,6 +317,7 @@ static void test_exec_dynamic_user(Manager *m) {
test(m, "exec-dynamicuser-fixeduser.service", 0, CLD_EXITED);
test(m, "exec-dynamicuser-fixeduser-one-supplementarygroup.service", 0, CLD_EXITED);
test(m, "exec-dynamicuser-supplementarygroups.service", 0, CLD_EXITED);
test(m, "exec-dynamicuser-state-dir.service", 0, CLD_EXITED);
}
static void test_exec_environment(Manager *m) {
@ -500,7 +501,6 @@ int main(int argc, char *argv[]) {
test_exec_user,
test_exec_group,
test_exec_supplementary_groups,
test_exec_dynamic_user,
test_exec_environment,
test_exec_environmentfile,
test_exec_passenvironment,
@ -517,6 +517,7 @@ int main(int argc, char *argv[]) {
};
static const test_function_t system_tests[] = {
test_exec_systemcall_system_mode_with_user,
test_exec_dynamic_user,
NULL,
};
int r;

@ -82,6 +82,7 @@ int main(int argc, char *argv[]) {
(char **) writable,
(char **) readonly,
(char **) inaccessible,
NULL,
&(BindMount) { .source = (char*) "/usr/bin", .destination = (char*) "/etc/systemd", .read_only = true }, 1,
tmp_dir,
var_tmp_dir,

@ -261,6 +261,7 @@ static void test_make_relative(void) {
assert_se(path_make_relative("some/relative/path", "/some/path", &result) < 0);
assert_se(path_make_relative("/some/path", "some/relative/path", &result) < 0);
assert_se(path_make_relative("/some/dotdot/../path", "/some/path", &result) < 0);
#define test(from_dir, to_path, expected) { \
_cleanup_free_ char *z = NULL; \
@ -274,6 +275,7 @@ static void test_make_relative(void) {
test("/some/path", "/some/path/in/subdir", "in/subdir");
test("/some/path", "/", "../..");
test("/some/path", "/some/other/path", "../other/path");
test("/some/path/./dot", "/some/further/path", "../../further/path");
test("//extra/////slashes///won't////fool///anybody//", "////extra///slashes////are/just///fine///", "../../../are/just/fine");
}
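The test cases above pin down the expected behaviour of path_make_relative(), whose implementation is not part of this excerpt. A hedged sketch that satisfies these cases (empty and "." components are dropped, relative inputs and ".." components are rejected) follows; make_relative(), split_path() and the 64-component limit are hypothetical choices, not systemd's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COMPONENTS 64

/* Splits an absolute path into components, dropping empty and "." ones.
 * Stores pointers into 'p' plus lengths; rejects relative paths and "..". */
static int split_path(const char *p, const char *comp[], size_t len[], size_t *n) {
        *n = 0;
        if (*p != '/')
                return -EINVAL;

        while (*p) {
                size_t l;

                p += strspn(p, "/");
                l = strcspn(p, "/");
                if (l == 0)
                        break;
                if (l == 2 && p[0] == '.' && p[1] == '.')
                        return -EINVAL;
                if (!(l == 1 && p[0] == '.')) {
                        if (*n >= MAX_COMPONENTS)
                                return -E2BIG;
                        comp[*n] = p;
                        len[*n] = l;
                        (*n)++;
                }
                p += l;
        }
        return 0;
}

/* Computes the path of 'to_path' relative to the directory 'from_dir'. */
static int make_relative(const char *from_dir, const char *to_path, char **ret) {
        const char *fc[MAX_COMPONENTS], *tc[MAX_COMPONENTS];
        size_t fl[MAX_COMPONENTS], tl[MAX_COMPONENTS], fn, tn, common = 0, size = 1, i;
        char *out, *q;
        int r;

        r = split_path(from_dir, fc, fl, &fn);
        if (r < 0)
                return r;
        r = split_path(to_path, tc, tl, &tn);
        if (r < 0)
                return r;

        /* Skip the shared leading components. */
        while (common < fn && common < tn && fl[common] == tl[common] &&
               memcmp(fc[common], tc[common], fl[common]) == 0)
                common++;

        /* One "../" per remaining component of from_dir, then the rest of to_path. */
        size += 3 * (fn - common);
        for (i = common; i < tn; i++)
                size += tl[i] + 1;

        out = malloc(size + 1);
        if (!out)
                return -ENOMEM;

        q = out;
        for (i = common; i < fn; i++) {
                memcpy(q, "../", 3);
                q += 3;
        }
        for (i = common; i < tn; i++) {
                memcpy(q, tc[i], tl[i]);
                q += tl[i];
                *q++ = '/';
        }
        if (q == out)
                *q++ = '.';  /* both paths are the same */
        else
                q--;         /* drop the trailing '/' */
        *q = '\0';

        *ret = out;
        return 0;
}
```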

@ -64,6 +64,7 @@ test_data_files = '''
test-execute/exec-dynamicuser-fixeduser.service
test-execute/exec-dynamicuser-fixeduser-one-supplementarygroup.service
test-execute/exec-dynamicuser-supplementarygroups.service
test-execute/exec-dynamicuser-state-dir.service
test-execute/exec-ignoresigpipe-no.service
test-execute/exec-ignoresigpipe-yes.service
test-execute/exec-personality-x86-64.service

@ -0,0 +1,19 @@
[Unit]
Description=Test DynamicUser= with StateDirectory=
[Service]
ExecStart=/usr/bin/test -w /var/lib/waldo
ExecStart=/usr/bin/test -w /var/lib/quux/pief
ExecStart=/bin/touch /var/lib/waldo/yay
ExecStart=/bin/touch /var/lib/quux/pief/yayyay
ExecStart=/usr/bin/test -f /var/lib/waldo/yay
ExecStart=/usr/bin/test -f /var/lib/quux/pief/yayyay
ExecStart=/usr/bin/test -f /var/lib/private/waldo/yay
ExecStart=/usr/bin/test -f /var/lib/private/quux/pief/yayyay
# Make sure that /var/lib/private/waldo and /var/lib/private/quux/pief are really the only writable directories besides the obvious candidates
ExecStart=/bin/sh -x -c 'test $$(find / -type d -writable 2> /dev/null | egrep -v -e \'^(/var/tmp$$|/tmp$$|/proc/|/dev/mqueue$$|/dev/shm$$)\' | sort -u | tr -d '\\\\n') = /var/lib/private/quux/pief/var/lib/private/waldo'
Type=oneshot
DynamicUser=yes
StateDirectory=waldo quux/pief