Merge pull request #6577 from poettering/more-exec-flags

add ! and !! ExecStart= flags to make ambient caps useful
Yu Watanabe 2017-08-26 21:49:05 +09:00 committed by GitHub
commit 2c5ad0fd6d
24 changed files with 432 additions and 216 deletions


@ -1505,6 +1505,10 @@
<entry>@resources</entry>
<entry>System calls for changing resource limits, memory and scheduling parameters (<citerefentry project='man-pages'><refentrytitle>setrlimit</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setpriority</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry>
</row>
<row>
<entry>@setuid</entry>
<entry>System calls for changing user ID and group ID credentials (<citerefentry project='man-pages'><refentrytitle>setuid</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setgid</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>setresuid</refentrytitle><manvolnum>2</manvolnum></citerefentry>, …)</entry>
</row>
<row>
<entry>@swap</entry>
<entry>System calls for enabling/disabling swap devices (<citerefentry project='man-pages'><refentrytitle>swapon</refentrytitle><manvolnum>2</manvolnum></citerefentry>, <citerefentry project='man-pages'><refentrytitle>swapoff</refentrytitle><manvolnum>2</manvolnum></citerefentry>)</entry>


@ -290,13 +290,58 @@
<varname>ExecStop=</varname> are not valid.)</para>
<para>For each of the specified commands, the first argument must be an absolute path to an
executable. Optionally, if this file name is prefixed with <literal>@</literal>, the second token will be
passed as <literal>argv[0]</literal> to the executed process, followed by the further arguments specified. If
the absolute filename is prefixed with <literal>-</literal>, an exit code of the command normally considered a
failure (i.e. non-zero exit status or abnormal exit due to signal) is ignored and considered success. If the
absolute path is prefixed with <literal>+</literal> then it is executed with full
privileges. <literal>@</literal>, <literal>-</literal>, and <literal>+</literal> may be used together and they
can appear in any order.</para>
executable. Optionally, this file name may be prefixed with a number of special characters:</para>
<table>
<title>Special executable prefixes</title>
<tgroup cols='2'>
<colspec colname='prefix'/>
<colspec colname='meaning'/>
<thead>
<row>
<entry>Prefix</entry>
<entry>Effect</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>@</literal></entry>
<entry>If the executable path is prefixed with <literal>@</literal>, the second specified token will be passed as <literal>argv[0]</literal> to the executed process (instead of the actual filename), followed by the further arguments specified.</entry>
</row>
<row>
<entry><literal>-</literal></entry>
<entry>If the executable path is prefixed with <literal>-</literal>, an exit code of the command normally considered a failure (i.e. non-zero exit status or abnormal exit due to signal) is ignored and considered success.</entry>
</row>
<row>
<entry><literal>+</literal></entry>
<entry>If the executable path is prefixed with <literal>+</literal> then the process is executed with full privileges. In this mode privilege restrictions configured with <varname>User=</varname>, <varname>Group=</varname>, <varname>CapabilityBoundingSet=</varname> or the various file system namespacing options (such as <varname>PrivateDevices=</varname>, <varname>PrivateTmp=</varname>) are not applied to the invoked command line (but still affect any other <varname>ExecStart=</varname>, <varname>ExecStop=</varname>, … lines).</entry>
</row>
<row>
<entry><literal>!</literal></entry>
<entry>Similar to the <literal>+</literal> character discussed above, this permits invoking command lines with elevated privileges. However, unlike <literal>+</literal>, the <literal>!</literal> character exclusively alters the effect of <varname>User=</varname>, <varname>Group=</varname> and <varname>SupplementaryGroups=</varname>, i.e. only the stanzas that affect user and group credentials. Note that this setting may be combined with <varname>DynamicUser=</varname>, in which case a dynamic user/group pair is allocated before the command is invoked, but credential changing is left to the executed process itself.</entry>
</row>
<row>
<entry><literal>!!</literal></entry>
<entry>This prefix is very similar to <literal>!</literal>, however it only has an effect on systems lacking support for ambient process capabilities, i.e. without support for <varname>AmbientCapabilities=</varname>. It's intended to be used for unit files that take advantage of ambient capabilities to run processes with minimal privileges wherever possible while remaining compatible with systems that lack ambient capabilities support. Note that when <literal>!!</literal> is used, and a system lacking ambient capability support is detected, any configured <varname>SystemCallFilter=</varname> and <varname>CapabilityBoundingSet=</varname> stanzas are implicitly modified, in order to permit spawned processes to drop credentials and capabilities themselves, even if this is configured to not be allowed. Moreover, if this prefix is used and a system lacking ambient capability support is detected, <varname>AmbientCapabilities=</varname> will be skipped and not applied. On systems supporting ambient capabilities, <literal>!!</literal> has no effect and is redundant.</entry>
</row>
</tbody>
</tgroup>
</table>
<para><literal>@</literal>, <literal>-</literal>, and one of
<literal>+</literal>/<literal>!</literal>/<literal>!!</literal> may be used together and they can appear in any
order. However, only one of <literal>+</literal>, <literal>!</literal>, <literal>!!</literal> may be used at a
time. Note that these prefixes are also supported for the other command line settings,
i.e. <varname>ExecStartPre=</varname>, <varname>ExecStartPost=</varname>, <varname>ExecReload=</varname>,
<varname>ExecStop=</varname> and <varname>ExecStopPost=</varname>.</para>
<para>If more than one command is specified, the commands are
invoked sequentially in the order they appear in the unit
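
The prefix rules documented in this hunk can be illustrated with a small standalone C sketch. It is not the parser from this commit: `parse_exec_prefixes()` and `PRIV_MASK` are hypothetical names chosen for illustration, and only the flag values are copied from the `ExecCommandFlags` enum added elsewhere in this change. The sketch assumes the same acceptance rules as the loop in `config_parse_exec()`: each of `-` and `@` at most once, and at most one of `+`, `!`, `!!`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values copied from the ExecCommandFlags enum in this diff. */
enum {
        EXEC_COMMAND_IGNORE_FAILURE   = 1,
        EXEC_COMMAND_FULLY_PRIVILEGED = 2,
        EXEC_COMMAND_NO_SETUID        = 4,
        EXEC_COMMAND_AMBIENT_MAGIC    = 8,
};

/* Hypothetical helper macro: the three mutually exclusive privilege flags. */
#define PRIV_MASK (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_NO_SETUID|EXEC_COMMAND_AMBIENT_MAGIC)

/* Hypothetical function: consumes leading prefix characters of s, stores the
 * resulting flags, and returns a pointer to the first character that is not
 * an accepted prefix (normally the start of the absolute path). */
const char *parse_exec_prefixes(const char *s, unsigned *flags, bool *separate_argv0) {
        assert(s);
        assert(flags);
        assert(separate_argv0);

        *flags = 0;
        *separate_argv0 = false;

        for (;; s++) {
                if (*s == '-' && !(*flags & EXEC_COMMAND_IGNORE_FAILURE))
                        *flags |= EXEC_COMMAND_IGNORE_FAILURE;
                else if (*s == '@' && !*separate_argv0)
                        *separate_argv0 = true;
                else if (*s == '+' && !(*flags & PRIV_MASK))
                        *flags |= EXEC_COMMAND_FULLY_PRIVILEGED;
                else if (*s == '!' && !(*flags & PRIV_MASK))
                        *flags |= EXEC_COMMAND_NO_SETUID;
                else if (*s == '!' && !(*flags & (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_AMBIENT_MAGIC))) {
                        /* A second '!' upgrades "!" to "!!". */
                        *flags &= ~(unsigned) EXEC_COMMAND_NO_SETUID;
                        *flags |= EXEC_COMMAND_AMBIENT_MAGIC;
                } else
                        break;
        }

        return s;
}
```

With this sketch, a combination such as `+!` stops at the `!`, so the remainder no longer starts with `/` and a caller would reject it as a non-absolute path.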


@ -151,7 +151,7 @@ int capability_ambient_set_apply(uint64_t set, bool also_inherit) {
}
int capability_bounding_set_drop(uint64_t keep, bool right_now) {
_cleanup_cap_free_ cap_t after_cap = NULL;
_cleanup_cap_free_ cap_t before_cap = NULL, after_cap = NULL;
cap_flag_value_t fv;
unsigned long i;
int r;
@ -161,71 +161,80 @@ int capability_bounding_set_drop(uint64_t keep, bool right_now) {
* executing init!), so get it back temporarily so that we can
* call PR_CAPBSET_DROP. */
after_cap = cap_get_proc();
if (!after_cap)
before_cap = cap_get_proc();
if (!before_cap)
return -errno;
if (cap_get_flag(after_cap, CAP_SETPCAP, CAP_EFFECTIVE, &fv) < 0)
if (cap_get_flag(before_cap, CAP_SETPCAP, CAP_EFFECTIVE, &fv) < 0)
return -errno;
if (fv != CAP_SET) {
_cleanup_cap_free_ cap_t temp_cap = NULL;
static const cap_value_t v = CAP_SETPCAP;
temp_cap = cap_dup(after_cap);
if (!temp_cap) {
r = -errno;
goto finish;
}
temp_cap = cap_dup(before_cap);
if (!temp_cap)
return -errno;
if (cap_set_flag(temp_cap, CAP_EFFECTIVE, 1, &v, CAP_SET) < 0) {
r = -errno;
goto finish;
}
if (cap_set_flag(temp_cap, CAP_EFFECTIVE, 1, &v, CAP_SET) < 0)
return -errno;
if (cap_set_proc(temp_cap) < 0) {
r = -errno;
goto finish;
}
if (cap_set_proc(temp_cap) < 0)
log_debug_errno(errno, "Can't acquire effective CAP_SETPCAP bit, ignoring: %m");
/* If we didn't manage to acquire the CAP_SETPCAP bit, we continue anyway, after all this just means
* we'll fail later, when we actually intend to drop some capabilities. */
}
after_cap = cap_dup(before_cap);
if (!after_cap)
return -errno;
for (i = 0; i <= cap_last_cap(); i++) {
cap_value_t v;
if (!(keep & (UINT64_C(1) << i))) {
cap_value_t v;
if ((keep & (UINT64_C(1) << i)))
continue;
/* Drop it from the bounding set */
if (prctl(PR_CAPBSET_DROP, i) < 0) {
/* Drop it from the bounding set */
if (prctl(PR_CAPBSET_DROP, i) < 0) {
r = -errno;
/* If dropping the capability failed, let's see if we didn't have it in the first place. If so,
* continue anyway, as dropping a capability we didn't have in the first place doesn't really
* matter anyway. */
if (prctl(PR_CAPBSET_READ, i) != 0)
goto finish;
}
v = (cap_value_t) i;
/* Also drop it from the inheritable set, so
* that anything we exec() loses the
* capability for good. */
if (cap_set_flag(after_cap, CAP_INHERITABLE, 1, &v, CAP_CLEAR) < 0) {
r = -errno;
goto finish;
}
/* If we shall apply this right now drop it
* also from our own capability sets. */
if (right_now) {
if (cap_set_flag(after_cap, CAP_PERMITTED, 1, &v, CAP_CLEAR) < 0 ||
cap_set_flag(after_cap, CAP_EFFECTIVE, 1, &v, CAP_CLEAR) < 0) {
r = -errno;
goto finish;
}
v = (cap_value_t) i;
/* Also drop it from the inheritable set, so
* that anything we exec() loses the
* capability for good. */
if (cap_set_flag(after_cap, CAP_INHERITABLE, 1, &v, CAP_CLEAR) < 0) {
r = -errno;
goto finish;
}
/* If we shall apply this right now drop it
* also from our own capability sets. */
if (right_now) {
if (cap_set_flag(after_cap, CAP_PERMITTED, 1, &v, CAP_CLEAR) < 0 ||
cap_set_flag(after_cap, CAP_EFFECTIVE, 1, &v, CAP_CLEAR) < 0) {
r = -errno;
goto finish;
}
}
}
}
r = 0;
finish:
if (cap_set_proc(after_cap) < 0)
return -errno;
if (cap_set_proc(after_cap) < 0) {
/* If there are no actual changes anyway then let's ignore this error. */
if (cap_compare(before_cap, after_cap) != 0)
r = -errno;
}
return r;
}
@ -361,3 +370,18 @@ int drop_capability(cap_value_t cv) {
return 0;
}
bool ambient_capabilities_supported(void) {
static int cache = -1;
if (cache >= 0)
return cache;
/* If PR_CAP_AMBIENT returns something valid, or an unexpected error code we assume that ambient caps are
* available. */
cache = prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_IS_SET, CAP_KILL, 0, 0) >= 0 ||
!IN_SET(errno, EINVAL, EOPNOTSUPP, ENOSYS);
return cache;
}
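
For readers who want to try the probe outside the systemd tree, the following is a minimal standalone sketch of the same idea, assuming a Linux system with kernel headers installed. `probe_ambient_caps()` is a hypothetical name, the `IN_SET()` macro is replaced by plain comparisons, and the fallback `#define`s are only there for older headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Fallback definitions for headers that predate ambient capabilities. */
#ifndef PR_CAP_AMBIENT
#define PR_CAP_AMBIENT 47
#endif
#ifndef PR_CAP_AMBIENT_IS_SET
#define PR_CAP_AMBIENT_IS_SET 1
#endif

/* Hypothetical standalone variant of the check in this hunk: query whether
 * CAP_KILL is in the ambient set. Any valid answer, or an error other than
 * EINVAL/EOPNOTSUPP/ENOSYS, is taken to mean the kernel knows about ambient
 * capabilities. The result is cached after the first call. */
bool probe_ambient_caps(void) {
        static int cache = -1;

        if (cache < 0) {
                int r = prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_IS_SET, (unsigned long) CAP_KILL, 0UL, 0UL);

                cache = r >= 0 || !(errno == EINVAL || errno == EOPNOTSUPP || errno == ENOSYS);
        }

        assert(cache == 0 || cache == 1);
        return cache;
}
```

The probe only reads state, so it needs no privileges and does not change the calling process.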


@ -55,3 +55,5 @@ static inline bool cap_test_all(uint64_t caps) {
m = (UINT64_C(1) << (cap_last_cap() + 1)) - 1;
return (caps & m) == m;
}
bool ambient_capabilities_supported(void);
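
The `execute.c` part of this change widens the configured bounding set before testing it with `cap_test_all()` when the ambient fallback is active. A rough standalone sketch of that arithmetic follows. `bounding_set_for_exec()` and `mask_covers_all()` are hypothetical names, and the highest capability number is passed in explicitly here as an assumption, since `cap_last_cap()` is a systemd-internal helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/capability.h>

/* Hypothetical helper: if the ambient fallback is needed, keep the three
 * capabilities a service requires to drop privileges on its own. */
uint64_t bounding_set_for_exec(uint64_t configured, bool needs_ambient_hack) {
        if (needs_ambient_hack)
                configured |= (UINT64_C(1) << CAP_SETPCAP) |
                              (UINT64_C(1) << CAP_SETUID) |
                              (UINT64_C(1) << CAP_SETGID);
        return configured;
}

/* Hypothetical variant of cap_test_all() that takes the highest capability
 * number as a parameter instead of asking the kernel. */
bool mask_covers_all(uint64_t caps, unsigned last_cap) {
        uint64_t m;

        assert(last_cap < 63);

        m = (UINT64_C(1) << (last_cap + 1)) - 1;
        return (caps & m) == m;
}
```

If the widened mask still does not cover every capability, the caller would go on to drop the missing ones from the bounding set, as the diff does with `capability_bounding_set_drop()`.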


@ -31,10 +31,13 @@
int mkdir_safe_internal(const char *path, mode_t mode, uid_t uid, gid_t gid, mkdir_func_t _mkdir) {
struct stat st;
int r;
if (_mkdir(path, mode) >= 0)
if (chmod_and_chown(path, mode, uid, gid) < 0)
return -errno;
if (_mkdir(path, mode) >= 0) {
r = chmod_and_chown(path, mode, uid, gid);
if (r < 0)
return r;
}
if (lstat(path, &st) < 0)
return -errno;


@ -904,7 +904,7 @@ static int append_exec_command(sd_bus_message *reply, ExecCommand *c) {
return r;
r = sd_bus_message_append(reply, "bttttuii",
c->ignore,
!!(c->flags & EXEC_COMMAND_IGNORE_FAILURE),
c->exec_status.start_timestamp.realtime,
c->exec_status.start_timestamp.monotonic,
c->exec_status.exit_timestamp.realtime,


@ -280,7 +280,7 @@ static int bus_service_set_transient_property(
c->argv = argv;
argv = NULL;
c->ignore = b;
c->flags = b ? EXEC_COMMAND_IGNORE_FAILURE : 0;
path_kill_slashes(c->path);
exec_command_append_list(&s->exec_command[SERVICE_EXEC_START], c);
@ -319,7 +319,7 @@ static int bus_service_set_transient_property(
return -ENOMEM;
fprintf(f, "ExecStart=%s@%s %s\n",
c->ignore ? "-" : "",
c->flags & EXEC_COMMAND_IGNORE_FAILURE ? "-" : "",
c->path,
a);
}


@ -321,6 +321,7 @@ static int connect_journal_socket(int fd, uid_t uid, gid_t gid) {
static int connect_logger_as(
Unit *unit,
const ExecContext *context,
const ExecParameters *params,
ExecOutput output,
const char *ident,
int nfd,
@ -330,6 +331,7 @@ static int connect_logger_as(
int fd, r;
assert(context);
assert(params);
assert(output < _EXEC_OUTPUT_MAX);
assert(ident);
assert(nfd >= 0);
@ -358,7 +360,7 @@ static int connect_logger_as(
"%i\n"
"%i\n",
context->syslog_identifier ?: ident,
MANAGER_IS_SYSTEM(unit->manager) ? unit->id : "",
params->flags & EXEC_PASS_LOG_UNIT ? unit->id : "",
context->syslog_priority,
!!context->syslog_level_prefix,
output == EXEC_OUTPUT_SYSLOG || output == EXEC_OUTPUT_SYSLOG_AND_CONSOLE,
@ -572,7 +574,7 @@ static int setup_output(
case EXEC_OUTPUT_KMSG_AND_CONSOLE:
case EXEC_OUTPUT_JOURNAL:
case EXEC_OUTPUT_JOURNAL_AND_CONSOLE:
r = connect_logger_as(unit, context, o, ident, fileno, uid, gid);
r = connect_logger_as(unit, context, params, o, ident, fileno, uid, gid);
if (r < 0) {
log_unit_error_errno(unit, r, "Failed to connect %s to the journal socket, ignoring: %m", fileno == STDOUT_FILENO ? "stdout" : "stderr");
r = open_null_as(O_WRONLY, fileno);
@ -1310,8 +1312,9 @@ static bool skip_seccomp_unavailable(const Unit* u, const char* msg) {
return true;
}
static int apply_syscall_filter(const Unit* u, const ExecContext *c) {
static int apply_syscall_filter(const Unit* u, const ExecContext *c, bool needs_ambient_hack) {
uint32_t negative_action, default_action, action;
int r;
assert(u);
assert(c);
@ -1332,6 +1335,12 @@ static int apply_syscall_filter(const Unit* u, const ExecContext *c) {
action = negative_action;
}
if (needs_ambient_hack) {
r = seccomp_filter_set_add(c->syscall_filter, c->syscall_whitelist, syscall_filter_sets + SYSCALL_FILTER_SET_SETUID);
if (r < 0)
return r;
}
return seccomp_load_syscall_filter_set_raw(default_action, c->syscall_filter, action);
}
@ -1534,7 +1543,7 @@ static int build_environment(
/* If this is D-Bus, tell the nss-systemd module, since it relies on being able to use D-Bus to look up dynamic
* users via PID 1, possibly dead-locking the dbus daemon. This way it will not use D-Bus to resolve names, but
* check the database directly. */
if (unit_has_name(u, SPECIAL_DBUS_SERVICE)) {
if (p->flags & EXEC_NSS_BYPASS_BUS) {
x = strdup("SYSTEMD_NSS_BYPASS_BUS=1");
if (!x)
return -ENOMEM;
@ -1841,7 +1850,6 @@ static int setup_exec_directory(
const ExecParameters *params,
uid_t uid,
gid_t gid,
bool manager_is_system,
ExecDirectoryType type,
int *exit_status) {
@ -1863,7 +1871,7 @@ static int setup_exec_directory(
if (!params->prefix[type])
return 0;
if (manager_is_system) {
if (params->flags & EXEC_CHOWN_DIRECTORIES) {
if (!uid_is_valid(uid))
uid = 0;
if (!gid_is_valid(gid))
@ -1887,6 +1895,11 @@ static int setup_exec_directory(
if (r < 0)
goto fail;
/* Don't change the owner of the configuration directory, as in the common case it is not written to by
* a service, and shall not be writable. */
if (type == EXEC_DIRECTORY_CONFIGURATION)
continue;
r = chmod_and_chown(p, context->directories[type].mode, uid, gid);
if (r < 0)
goto fail;
@ -1998,7 +2011,7 @@ static int apply_mount_namespace(
.protect_kernel_modules = context->protect_kernel_modules,
.mount_apivfs = context->mount_apivfs,
};
bool apply_restrictions;
bool needs_sandboxing;
int r;
assert(context);
@ -2033,18 +2046,18 @@ static int apply_mount_namespace(
if (!context->dynamic_user && root_dir)
ns_info.ignore_protect_paths = true;
apply_restrictions = (params->flags & EXEC_APPLY_PERMISSIONS) && !command->privileged;
needs_sandboxing = (params->flags & EXEC_APPLY_SANDBOXING) && !(command->flags & EXEC_COMMAND_FULLY_PRIVILEGED);
r = setup_namespace(root_dir, root_image,
&ns_info, rw,
apply_restrictions ? context->read_only_paths : NULL,
apply_restrictions ? context->inaccessible_paths : NULL,
needs_sandboxing ? context->read_only_paths : NULL,
needs_sandboxing ? context->inaccessible_paths : NULL,
context->bind_mounts,
context->n_bind_mounts,
tmp,
var,
apply_restrictions ? context->protect_home : PROTECT_HOME_NO,
apply_restrictions ? context->protect_system : PROTECT_SYSTEM_NO,
needs_sandboxing ? context->protect_home : PROTECT_HOME_NO,
needs_sandboxing ? context->protect_system : PROTECT_SYSTEM_NO,
context->mount_flags,
DISSECT_IMAGE_DISCARD_ON_LOOP);
@ -2296,21 +2309,25 @@ static int exec_child(
const char *home = NULL, *shell = NULL;
dev_t journal_stream_dev = 0;
ino_t journal_stream_ino = 0;
bool needs_exec_restrictions, needs_mount_namespace;
bool needs_sandboxing, /* Do we need to set up full sandboxing? (i.e. all namespacing, all MAC stuff, caps, yadda yadda) */
needs_setuid, /* Do we need to do the actual setresuid()/setresgid() calls? */
needs_mount_namespace, /* Do we need to set up a mount namespace for this kernel? */
needs_ambient_hack; /* Do we need to apply the ambient capabilities hack? */
#ifdef HAVE_SELINUX
bool needs_selinux = false;
bool use_selinux = false;
#endif
#ifdef HAVE_SMACK
bool needs_smack = false;
bool use_smack = false;
#endif
#ifdef HAVE_APPARMOR
bool needs_apparmor = false;
bool use_apparmor = false;
#endif
uid_t uid = UID_INVALID;
gid_t gid = GID_INVALID;
int i, r, ngids = 0;
unsigned n_fds;
ExecDirectoryType dt;
int secure_bits;
assert(unit);
assert(command);
@ -2583,7 +2600,7 @@ static int exec_child(
/* If delegation is enabled we'll pass ownership of the cgroup
* (but only in systemd's own controller hierarchy!) to the
* user of the new process. */
if (params->cgroup_path && context->user && params->cgroup_delegate) {
if (params->cgroup_path && context->user && (params->flags & EXEC_CGROUP_DELEGATE)) {
r = cg_set_task_access(SYSTEMD_CGROUP_CONTROLLER, params->cgroup_path, 0644, uid, gid);
if (r < 0) {
*exit_status = EXIT_CGROUP;
@ -2599,7 +2616,7 @@ static int exec_child(
}
for (dt = 0; dt < _EXEC_DIRECTORY_MAX; dt++) {
r = setup_exec_directory(context, params, uid, gid, MANAGER_IS_SYSTEM(unit->manager), dt, exit_status);
r = setup_exec_directory(context, params, uid, gid, dt, exit_status);
if (r < 0)
return r;
}
@ -2647,9 +2664,35 @@ static int exec_child(
return r;
}
needs_exec_restrictions = (params->flags & EXEC_APPLY_PERMISSIONS) && !command->privileged;
/* We need sandboxing if the caller asked us to apply it and the command isn't explicitly excepted from it */
needs_sandboxing = (params->flags & EXEC_APPLY_SANDBOXING) && !(command->flags & EXEC_COMMAND_FULLY_PRIVILEGED);
if (needs_exec_restrictions) {
/* We need the ambient capability hack, if the caller asked us to apply it and the command is marked for it, and the kernel doesn't actually support ambient caps */
needs_ambient_hack = (params->flags & EXEC_APPLY_SANDBOXING) && (command->flags & EXEC_COMMAND_AMBIENT_MAGIC) && !ambient_capabilities_supported();
/* We need setresuid() if the caller asked us to apply sandboxing and the command isn't explicitly excepted from either whole sandboxing or just setresuid() itself, and the ambient hack is not desired */
if (needs_ambient_hack)
needs_setuid = false;
else
needs_setuid = (params->flags & EXEC_APPLY_SANDBOXING) && !(command->flags & (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_NO_SETUID));
if (needs_sandboxing) {
/* MAC enablement checks need to be done before a new mount ns is created, as they rely on /sys being
* present. The actual MAC context application will happen later, as late as possible, to avoid
* impacting our own code paths. */
#ifdef HAVE_SELINUX
use_selinux = mac_selinux_use();
#endif
#ifdef HAVE_SMACK
use_smack = mac_smack_use();
#endif
#ifdef HAVE_APPARMOR
use_apparmor = mac_apparmor_use();
#endif
}
if (needs_setuid) {
if (context->pam_name && username) {
r = setup_pam(context->pam_name, username, uid, gid, context->tty_path, &accum_env, fds, n_fds);
if (r < 0) {
@ -2657,23 +2700,6 @@ static int exec_child(
return r;
}
}
/* MAC enablement checks need to be done before a new mount ns is created, as they rely on /sys being
* present. The actual MAC context application will happen later, as late as possible, to avoid
* impacting our own code paths. */
#ifdef HAVE_SELINUX
needs_selinux = mac_selinux_use();
#endif
#ifdef HAVE_SMACK
needs_smack = mac_smack_use();
#endif
#ifdef HAVE_APPARMOR
needs_apparmor = context->apparmor_profile && mac_apparmor_use();
#endif
}
if (context->private_network && runtime && runtime->netns_storage_socket[0] >= 0) {
@ -2699,7 +2725,7 @@ static int exec_child(
return r;
/* Drop groups as early as possible */
if (needs_exec_restrictions) {
if (needs_setuid) {
r = enforce_groups(context, gid, supplementary_gids, ngids);
if (r < 0) {
*exit_status = EXIT_GROUP;
@ -2707,30 +2733,29 @@ static int exec_child(
}
}
if (needs_sandboxing) {
#ifdef HAVE_SELINUX
if (needs_exec_restrictions && needs_selinux && params->selinux_context_net && socket_fd >= 0) {
r = mac_selinux_get_child_mls_label(socket_fd, command->path, context->selinux_context, &mac_selinux_context_net);
if (r < 0) {
*exit_status = EXIT_SELINUX_CONTEXT;
return r;
if (use_selinux && params->selinux_context_net && socket_fd >= 0) {
r = mac_selinux_get_child_mls_label(socket_fd, command->path, context->selinux_context, &mac_selinux_context_net);
if (r < 0) {
*exit_status = EXIT_SELINUX_CONTEXT;
return r;
}
}
}
#endif
if ((params->flags & EXEC_APPLY_PERMISSIONS) && context->private_users) {
r = setup_private_users(uid, gid);
if (r < 0) {
*exit_status = EXIT_USER;
return r;
if (context->private_users) {
r = setup_private_users(uid, gid);
if (r < 0) {
*exit_status = EXIT_USER;
return r;
}
}
}
/* We repeat the fd closing here, to make sure that
* nothing is leaked from the PAM modules. Note that
* we are more aggressive this time since socket_fd
* and the netns fds we don't need anymore. The custom
* endpoint fd was needed to upload the policy and can
* now be closed as well. */
/* We repeat the fd closing here, to make sure that nothing is leaked from the PAM modules. Note that we are
* more aggressive this time since socket_fd and the netns fds we don't need anymore. The custom endpoint fd
* was needed to upload the policy and can now be closed as well. */
r = close_all_fds(fds, n_fds);
if (r >= 0)
r = shift_fds(fds, n_fds);
@ -2741,9 +2766,10 @@ static int exec_child(
return r;
}
if (needs_exec_restrictions) {
secure_bits = context->secure_bits;
int secure_bits = context->secure_bits;
if (needs_sandboxing) {
uint64_t bset;
for (i = 0; i < _RLIMIT_MAX; i++) {
@ -2765,8 +2791,17 @@ static int exec_child(
}
}
if (!cap_test_all(context->capability_bounding_set)) {
r = capability_bounding_set_drop(context->capability_bounding_set, false);
bset = context->capability_bounding_set;
/* If the ambient caps hack is enabled (which means the kernel can't do them, and the user asked for
* our magic fallback), then let's add some extra caps, so that the service can drop privs of its own,
* instead of us doing that */
if (needs_ambient_hack)
bset |= (UINT64_C(1) << CAP_SETPCAP) |
(UINT64_C(1) << CAP_SETUID) |
(UINT64_C(1) << CAP_SETGID);
if (!cap_test_all(bset)) {
r = capability_bounding_set_drop(bset, false);
if (r < 0) {
*exit_status = EXIT_CAPABILITIES;
*error_message = strdup("Failed to drop capabilities");
@ -2776,7 +2811,8 @@ static int exec_child(
/* This is done before enforce_user, but ambient set
* does not survive over setresuid() if keep_caps is not set. */
if (context->capability_ambient_set != 0) {
if (!needs_ambient_hack &&
context->capability_ambient_set != 0) {
r = capability_ambient_set_apply(context->capability_ambient_set, true);
if (r < 0) {
*exit_status = EXIT_CAPABILITIES;
@ -2784,7 +2820,9 @@ static int exec_child(
return r;
}
}
}
if (needs_setuid) {
if (context->user) {
r = enforce_user(context, uid);
if (r < 0) {
@ -2792,7 +2830,9 @@ static int exec_child(
(void) asprintf(error_message, "Failed to change UID to "UID_FMT, uid);
return r;
}
if (context->capability_ambient_set != 0) {
if (!needs_ambient_hack &&
context->capability_ambient_set != 0) {
/* Fix the ambient capabilities after user change. */
r = capability_ambient_set_apply(context->capability_ambient_set, false);
@ -2812,14 +2852,16 @@ static int exec_child(
secure_bits |= 1<<SECURE_KEEP_CAPS;
}
}
}
if (needs_sandboxing) {
/* Apply the MAC contexts late, but before seccomp syscall filtering, as those should really be last to
* influence our own codepaths as little as possible. Moreover, applying MAC contexts usually requires
* syscalls that are subject to seccomp filtering, hence should probably be applied before the syscalls
* are restricted. */
#ifdef HAVE_SELINUX
if (needs_selinux) {
if (use_selinux) {
char *exec_context = mac_selinux_context_net ?: context->selinux_context;
if (exec_context) {
@ -2834,7 +2876,7 @@ static int exec_child(
#endif
#ifdef HAVE_SMACK
if (needs_smack) {
if (use_smack) {
r = setup_smack(context, command);
if (r < 0) {
*exit_status = EXIT_SMACK_PROCESS_LABEL;
@ -2845,7 +2887,7 @@ static int exec_child(
#endif
#ifdef HAVE_APPARMOR
if (needs_apparmor) {
if (use_apparmor && context->apparmor_profile) {
r = aa_change_onexec(context->apparmor_profile);
if (r < 0 && !context->apparmor_profile_ignore) {
*exit_status = EXIT_APPARMOR_PROFILE;
@ -2857,10 +2899,8 @@ static int exec_child(
}
#endif
/* PR_GET_SECUREBITS is not privileged, while
* PR_SET_SECUREBITS is. So to suppress
* potential EPERMs we'll try not to call
* PR_SET_SECUREBITS unless necessary. */
/* PR_GET_SECUREBITS is not privileged, while PR_SET_SECUREBITS is. So to suppress potential EPERMs
* we'll try not to call PR_SET_SECUREBITS unless necessary. */
if (prctl(PR_GET_SECUREBITS) != secure_bits)
if (prctl(PR_SET_SECUREBITS, secure_bits) < 0) {
*exit_status = EXIT_SECUREBITS;
@ -2934,7 +2974,7 @@ static int exec_child(
/* This really should remain the last step before the execve(), to make sure our own code is unaffected
* by the filter as little as possible. */
r = apply_syscall_filter(unit, context);
r = apply_syscall_filter(unit, context, needs_ambient_hack);
if (r < 0) {
*exit_status = EXIT_SECCOMP;
*error_message = strdup("Failed to apply syscall filters");
@ -3068,7 +3108,7 @@ int exec_spawn(Unit *unit,
error_message),
"EXECUTABLE=%s", command->path,
NULL);
else if (r == -ENOENT && command->ignore)
else if (r == -ENOENT && (command->flags & EXEC_COMMAND_IGNORE_FAILURE))
log_struct_errno(LOG_INFO, r,
"MESSAGE_ID=" SD_MESSAGE_SPAWN_FAILED_STR,
LOG_UNIT_ID(unit),
@ -3576,18 +3616,20 @@ void exec_context_dump(ExecContext *c, FILE* f, const char *prefix) {
prefix, yes_no(c->tty_vhangup),
prefix, yes_no(c->tty_vt_disallocate));
if (c->std_output == EXEC_OUTPUT_SYSLOG ||
c->std_output == EXEC_OUTPUT_KMSG ||
c->std_output == EXEC_OUTPUT_JOURNAL ||
c->std_output == EXEC_OUTPUT_SYSLOG_AND_CONSOLE ||
c->std_output == EXEC_OUTPUT_KMSG_AND_CONSOLE ||
c->std_output == EXEC_OUTPUT_JOURNAL_AND_CONSOLE ||
c->std_error == EXEC_OUTPUT_SYSLOG ||
c->std_error == EXEC_OUTPUT_KMSG ||
c->std_error == EXEC_OUTPUT_JOURNAL ||
c->std_error == EXEC_OUTPUT_SYSLOG_AND_CONSOLE ||
c->std_error == EXEC_OUTPUT_KMSG_AND_CONSOLE ||
c->std_error == EXEC_OUTPUT_JOURNAL_AND_CONSOLE) {
if (IN_SET(c->std_output,
EXEC_OUTPUT_SYSLOG,
EXEC_OUTPUT_KMSG,
EXEC_OUTPUT_JOURNAL,
EXEC_OUTPUT_SYSLOG_AND_CONSOLE,
EXEC_OUTPUT_KMSG_AND_CONSOLE,
EXEC_OUTPUT_JOURNAL_AND_CONSOLE) ||
IN_SET(c->std_error,
EXEC_OUTPUT_SYSLOG,
EXEC_OUTPUT_KMSG,
EXEC_OUTPUT_JOURNAL,
EXEC_OUTPUT_SYSLOG_AND_CONSOLE,
EXEC_OUTPUT_KMSG_AND_CONSOLE,
EXEC_OUTPUT_JOURNAL_AND_CONSOLE)) {
_cleanup_free_ char *fac_str = NULL, *lvl_str = NULL;


@ -88,13 +88,19 @@ struct ExecStatus {
int status; /* as in siginfo_t::si_status */
};
typedef enum ExecCommandFlags {
EXEC_COMMAND_IGNORE_FAILURE = 1,
EXEC_COMMAND_FULLY_PRIVILEGED = 2,
EXEC_COMMAND_NO_SETUID = 4,
EXEC_COMMAND_AMBIENT_MAGIC = 8,
} ExecCommandFlags;
struct ExecCommand {
char *path;
char **argv;
ExecStatus exec_status;
ExecCommandFlags flags;
LIST_FIELDS(ExecCommand, command); /* useful for chaining commands */
bool ignore:1;
bool privileged:1;
};
struct ExecRuntime {
@ -251,16 +257,20 @@ static inline bool exec_context_restrict_namespaces_set(const ExecContext *c) {
}
typedef enum ExecFlags {
EXEC_APPLY_PERMISSIONS = 1U << 0,
EXEC_APPLY_SANDBOXING = 1U << 0,
EXEC_APPLY_CHROOT = 1U << 1,
EXEC_APPLY_TTY_STDIN = 1U << 2,
EXEC_NEW_KEYRING = 1U << 3,
EXEC_PASS_LOG_UNIT = 1U << 4, /* Whether to pass the unit name to the service's journal stream connection */
EXEC_CHOWN_DIRECTORIES = 1U << 5, /* chown() the runtime/state/cache/log directories to the user we run as, under all conditions */
EXEC_NSS_BYPASS_BUS = 1U << 6, /* Set the SYSTEMD_NSS_BYPASS_BUS environment variable, to disable nss-systemd for dbus */
EXEC_CGROUP_DELEGATE = 1U << 7,
/* The following are not used by execute.c, but by consumers internally */
EXEC_PASS_FDS = 1U << 4,
EXEC_IS_CONTROL = 1U << 5,
EXEC_SETENV_RESULT = 1U << 6,
EXEC_SET_WATCHDOG = 1U << 7,
EXEC_PASS_FDS = 1U << 8,
EXEC_IS_CONTROL = 1U << 9,
EXEC_SETENV_RESULT = 1U << 10,
EXEC_SET_WATCHDOG = 1U << 11,
} ExecFlags;
struct ExecParameters {
@ -275,7 +285,6 @@ struct ExecParameters {
ExecFlags flags;
bool selinux_context_net:1;
bool cgroup_delegate:1;
CGroupMask cgroup_supported;
const char *cgroup_path;
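
How the two flag sets combine in `exec_child()` can be summarized as a pure function. The following is a sketch under stated assumptions, not code from this commit: `struct exec_decision` and `decide()` are hypothetical names, kernel support is passed in as a parameter instead of being probed, and only the flag values are copied from the enums in this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the ExecFlags and ExecCommandFlags enums in this diff. */
enum { EXEC_APPLY_SANDBOXING = 1 << 0 };
enum {
        EXEC_COMMAND_IGNORE_FAILURE   = 1,
        EXEC_COMMAND_FULLY_PRIVILEGED = 2,
        EXEC_COMMAND_NO_SETUID        = 4,
        EXEC_COMMAND_AMBIENT_MAGIC    = 8,
};

/* Hypothetical result type for the three decisions discussed in the diff. */
struct exec_decision {
        bool needs_sandboxing;
        bool needs_setuid;
        bool needs_ambient_hack;
};

/* Hypothetical function mirroring the boolean logic added to exec_child(). */
struct exec_decision decide(unsigned param_flags, unsigned command_flags, bool ambient_supported) {
        struct exec_decision d;
        bool apply = (param_flags & EXEC_APPLY_SANDBOXING) != 0;

        /* Only the four known command flags may be set. */
        assert((command_flags & ~15U) == 0);

        /* "+" opts the command out of all sandboxing. */
        d.needs_sandboxing = apply && !(command_flags & EXEC_COMMAND_FULLY_PRIVILEGED);

        /* "!!" only matters when the kernel lacks ambient capabilities. */
        d.needs_ambient_hack = apply && (command_flags & EXEC_COMMAND_AMBIENT_MAGIC) && !ambient_supported;

        /* With the fallback active, or with "+" or "!", the manager leaves
         * the credential change to the service itself. */
        if (d.needs_ambient_hack)
                d.needs_setuid = false;
        else
                d.needs_setuid = apply && !(command_flags & (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_NO_SETUID));

        return d;
}
```

For example, a `!!` command on a kernel with ambient capability support behaves like an unprefixed one: sandboxing and the credential change are both applied by the manager.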


@ -608,7 +608,8 @@ int config_parse_exec(
p = rvalue;
do {
_cleanup_free_ char *path = NULL, *firstword = NULL;
bool separate_argv0 = false, ignore = false, privileged = false;
ExecCommandFlags flags = 0;
bool ignore = false, separate_argv0 = false;
_cleanup_free_ ExecCommand *nce = NULL;
_cleanup_strv_free_ char **n = NULL;
size_t nlen = 0, nbufsize = 0;
@ -622,18 +623,31 @@ int config_parse_exec(
f = firstword;
for (;;) {
/* We accept an absolute path as first argument.
* If it's prefixed with - and the path doesn't exist,
* we ignore it instead of erroring out;
* if it's prefixed with @, we allow overriding of argv[0];
* and if it's prefixed with +, it will be run with full privileges */
if (*f == '-' && !ignore)
/* We accept an absolute path as first argument. If it's prefixed with - and the path doesn't
* exist, we ignore it instead of erroring out; if it's prefixed with @, we allow overriding of
* argv[0]; if it's prefixed with +, it will be run with full privileges and no sandboxing; if
* it's prefixed with '!' we apply sandboxing, but do not change user/group credentials; if
* it's prefixed with '!!', then we apply user/group credentials if the kernel supports ambient
* capabilities -- if it doesn't we don't apply the credentials themselves, but do apply most
* other sandboxing, with some special exceptions for changing UID.
*
* The idea is that '!!' may be used to write services that can take benefit of systemd's
* UID/GID dropping if the kernel supports ambient creds, but provide an automatic fallback to
* privilege dropping within the daemon if the kernel does not offer that. */
if (*f == '-' && !(flags & EXEC_COMMAND_IGNORE_FAILURE)) {
flags |= EXEC_COMMAND_IGNORE_FAILURE;
ignore = true;
else if (*f == '@' && !separate_argv0)
} else if (*f == '@' && !separate_argv0)
separate_argv0 = true;
else if (*f == '+' && !privileged)
privileged = true;
else
else if (*f == '+' && !(flags & (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_NO_SETUID|EXEC_COMMAND_AMBIENT_MAGIC)))
flags |= EXEC_COMMAND_FULLY_PRIVILEGED;
else if (*f == '!' && !(flags & (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_NO_SETUID|EXEC_COMMAND_AMBIENT_MAGIC)))
flags |= EXEC_COMMAND_NO_SETUID;
else if (*f == '!' && !(flags & (EXEC_COMMAND_FULLY_PRIVILEGED|EXEC_COMMAND_AMBIENT_MAGIC))) {
flags &= ~EXEC_COMMAND_NO_SETUID;
flags |= EXEC_COMMAND_AMBIENT_MAGIC;
} else
break;
f++;
}
@ -752,8 +766,7 @@ int config_parse_exec(
nce->argv = n;
nce->path = path;
nce->ignore = ignore;
nce->privileged = privileged;
nce->flags = flags;
exec_command_append_list(e, nce);


@ -3413,7 +3413,7 @@ Set *manager_get_units_requiring_mounts_for(Manager *m, const char *path) {
return hashmap_get(m->units_requiring_mounts_for, streq(p, "/") ? "" : p);
}
int manager_set_exec_params(Manager *m, ExecParameters *p) {
void manager_set_exec_params(Manager *m, ExecParameters *p) {
assert(m);
assert(p);
@ -3422,7 +3422,7 @@ int manager_set_exec_params(Manager *m, ExecParameters *p) {
p->cgroup_supported = m->cgroup_supported;
p->prefix = m->prefix;
return 0;
SET_FLAG(p->flags, EXEC_PASS_LOG_UNIT|EXEC_CHOWN_DIRECTORIES, MANAGER_IS_SYSTEM(m));
}
int manager_update_failed_units(Manager *m, Unit *u, bool failed) {


@ -384,7 +384,7 @@ void manager_flip_auto_status(Manager *m, bool enable);
Set *manager_get_units_requiring_mounts_for(Manager *m, const char *path);
int manager_set_exec_params(Manager *m, ExecParameters *p);
void manager_set_exec_params(Manager *m, ExecParameters *p);
ManagerState manager_state(Manager *m);

View file

@ -742,7 +742,7 @@ static int mount_spawn(Mount *m, ExecCommand *c, pid_t *_pid) {
pid_t pid;
int r;
ExecParameters exec_params = {
.flags = EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN,
.flags = EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN,
.stdin_fd = -1,
.stdout_fd = -1,
.stderr_fd = -1,
@ -770,12 +770,8 @@ static int mount_spawn(Mount *m, ExecCommand *c, pid_t *_pid) {
if (r < 0)
return r;
r = manager_set_exec_params(UNIT(m)->manager, &exec_params);
if (r < 0)
return r;
exec_params.cgroup_path = UNIT(m)->cgroup_path;
exec_params.cgroup_delegate = m->cgroup_context.delegate;
manager_set_exec_params(UNIT(m)->manager, &exec_params);
unit_set_exec_params(UNIT(m), &exec_params);
r = exec_spawn(UNIT(m),
c,

View file

@ -1218,7 +1218,6 @@ static int service_spawn(
_cleanup_strv_free_ char **final_env = NULL, **our_env = NULL, **fd_names = NULL;
_cleanup_free_ int *fds = NULL;
unsigned n_storage_fds = 0, n_socket_fds = 0, n_env = 0;
const char *path;
pid_t pid;
ExecParameters exec_params = {
@ -1237,7 +1236,7 @@ static int service_spawn(
if (flags & EXEC_IS_CONTROL) {
/* If this is a control process, mask the permissions/chroot application if this is requested. */
if (s->permissions_start_only)
exec_params.flags &= ~EXEC_APPLY_PERMISSIONS;
exec_params.flags &= ~EXEC_APPLY_SANDBOXING;
if (s->root_directory_start_only)
exec_params.flags &= ~EXEC_APPLY_CHROOT;
}
@ -1344,29 +1343,31 @@ static int service_spawn(
}
}
r = manager_set_exec_params(UNIT(s)->manager, &exec_params);
if (r < 0)
return r;
manager_set_exec_params(UNIT(s)->manager, &exec_params);
unit_set_exec_params(UNIT(s), &exec_params);
final_env = strv_env_merge(2, exec_params.environment, our_env, NULL);
if (!final_env)
return -ENOMEM;
if ((flags & EXEC_IS_CONTROL) && UNIT(s)->cgroup_path) {
path = strjoina(UNIT(s)->cgroup_path, "/control");
(void) cg_create(SYSTEMD_CGROUP_CONTROLLER, path);
} else
path = UNIT(s)->cgroup_path;
exec_params.cgroup_path = strjoina(UNIT(s)->cgroup_path, "/control");
(void) cg_create(SYSTEMD_CGROUP_CONTROLLER, exec_params.cgroup_path);
}
/* System services should get a new keyring by default. */
SET_FLAG(exec_params.flags, EXEC_NEW_KEYRING, MANAGER_IS_SYSTEM(UNIT(s)->manager));
/* System D-Bus needs nss-systemd disabled, so that we don't deadlock */
SET_FLAG(exec_params.flags, EXEC_NSS_BYPASS_BUS,
MANAGER_IS_SYSTEM(UNIT(s)->manager) && unit_has_name(UNIT(s), SPECIAL_DBUS_SERVICE));
exec_params.flags |= MANAGER_IS_SYSTEM(UNIT(s)->manager) ? EXEC_NEW_KEYRING : 0;
exec_params.argv = c->argv;
exec_params.environment = final_env;
exec_params.fds = fds;
exec_params.fd_names = fd_names;
exec_params.n_storage_fds = n_storage_fds;
exec_params.n_socket_fds = n_socket_fds;
exec_params.cgroup_path = path;
exec_params.cgroup_delegate = s->cgroup_context.delegate;
exec_params.watchdog_usec = s->watchdog_usec;
exec_params.selinux_context_net = s->socket_fd_selinux_context_net;
if (s->type == SERVICE_IDLE)
@ -1569,7 +1570,7 @@ static void service_enter_stop_post(Service *s, ServiceResult f) {
r = service_spawn(s,
s->control_command,
s->timeout_stop_usec,
EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN|EXEC_IS_CONTROL|EXEC_SETENV_RESULT,
EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN|EXEC_IS_CONTROL|EXEC_SETENV_RESULT,
&s->control_pid);
if (r < 0)
goto fail;
@ -1680,7 +1681,7 @@ static void service_enter_stop(Service *s, ServiceResult f) {
r = service_spawn(s,
s->control_command,
s->timeout_stop_usec,
EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL|EXEC_SETENV_RESULT,
EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL|EXEC_SETENV_RESULT,
&s->control_pid);
if (r < 0)
goto fail;
@ -1759,7 +1760,7 @@ static void service_enter_start_post(Service *s) {
r = service_spawn(s,
s->control_command,
s->timeout_start_usec,
EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL,
EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL,
&s->control_pid);
if (r < 0)
goto fail;
@ -1837,7 +1838,7 @@ static void service_enter_start(Service *s) {
r = service_spawn(s,
c,
timeout,
EXEC_PASS_FDS|EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN|EXEC_SET_WATCHDOG,
EXEC_PASS_FDS|EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN|EXEC_SET_WATCHDOG,
&pid);
if (r < 0)
goto fail;
@ -1896,7 +1897,7 @@ static void service_enter_start_pre(Service *s) {
r = service_spawn(s,
s->control_command,
s->timeout_start_usec,
EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL|EXEC_APPLY_TTY_STDIN,
EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL|EXEC_APPLY_TTY_STDIN,
&s->control_pid);
if (r < 0)
goto fail;
@ -1994,7 +1995,7 @@ static void service_enter_reload(Service *s) {
r = service_spawn(s,
s->control_command,
s->timeout_start_usec,
EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL,
EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL,
&s->control_pid);
if (r < 0)
goto fail;
@ -2032,7 +2033,7 @@ static void service_run_next_control(Service *s) {
r = service_spawn(s,
s->control_command,
timeout,
EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL|
EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_IS_CONTROL|
(IN_SET(s->control_command_id, SERVICE_EXEC_START_PRE, SERVICE_EXEC_STOP_POST) ? EXEC_APPLY_TTY_STDIN : 0)|
(IN_SET(s->control_command_id, SERVICE_EXEC_STOP, SERVICE_EXEC_STOP_POST) ? EXEC_SETENV_RESULT : 0),
&s->control_pid);
@ -2070,7 +2071,7 @@ static void service_run_next_main(Service *s) {
r = service_spawn(s,
s->main_command,
s->timeout_start_usec,
EXEC_PASS_FDS|EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN|EXEC_SET_WATCHDOG,
EXEC_PASS_FDS|EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN|EXEC_SET_WATCHDOG,
&pid);
if (r < 0)
goto fail;
@ -2913,7 +2914,7 @@ static void service_sigchld_event(Unit *u, pid_t pid, int code, int status) {
s->main_command->exec_status = s->main_exec_status;
if (s->main_command->ignore)
if (s->main_command->flags & EXEC_COMMAND_IGNORE_FAILURE)
f = SERVICE_SUCCESS;
} else if (s->exec_command[SERVICE_EXEC_START]) {
@ -2921,7 +2922,7 @@ static void service_sigchld_event(Unit *u, pid_t pid, int code, int status) {
* ignore the return value if this was
* configured for the starter process */
if (s->exec_command[SERVICE_EXEC_START]->ignore)
if (s->exec_command[SERVICE_EXEC_START]->flags & EXEC_COMMAND_IGNORE_FAILURE)
f = SERVICE_SUCCESS;
}
@ -3027,7 +3028,7 @@ static void service_sigchld_event(Unit *u, pid_t pid, int code, int status) {
if (s->control_command) {
exec_status_exit(&s->control_command->exec_status, &s->exec_context, pid, code, status);
if (s->control_command->ignore)
if (s->control_command->flags & EXEC_COMMAND_IGNORE_FAILURE)
f = SERVICE_SUCCESS;
}

View file

@ -1762,7 +1762,7 @@ static int socket_spawn(Socket *s, ExecCommand *c, pid_t *_pid) {
pid_t pid;
int r;
ExecParameters exec_params = {
.flags = EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN,
.flags = EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN,
.stdin_fd = -1,
.stdout_fd = -1,
.stderr_fd = -1,
@ -1790,13 +1790,10 @@ static int socket_spawn(Socket *s, ExecCommand *c, pid_t *_pid) {
if (r < 0)
return r;
r = manager_set_exec_params(UNIT(s)->manager, &exec_params);
if (r < 0)
return r;
manager_set_exec_params(UNIT(s)->manager, &exec_params);
unit_set_exec_params(UNIT(s), &exec_params);
exec_params.argv = c->argv;
exec_params.cgroup_path = UNIT(s)->cgroup_path;
exec_params.cgroup_delegate = s->cgroup_context.delegate;
r = exec_spawn(UNIT(s),
c,
@ -2776,7 +2773,7 @@ static void socket_sigchld_event(Unit *u, pid_t pid, int code, int status) {
if (s->control_command) {
exec_status_exit(&s->control_command->exec_status, &s->exec_context, pid, code, status);
if (s->control_command->ignore)
if (s->control_command->flags & EXEC_COMMAND_IGNORE_FAILURE)
f = SOCKET_SUCCESS;
}

View file

@ -608,7 +608,7 @@ static int swap_spawn(Swap *s, ExecCommand *c, pid_t *_pid) {
pid_t pid;
int r;
ExecParameters exec_params = {
.flags = EXEC_APPLY_PERMISSIONS|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN,
.flags = EXEC_APPLY_SANDBOXING|EXEC_APPLY_CHROOT|EXEC_APPLY_TTY_STDIN,
.stdin_fd = -1,
.stdout_fd = -1,
.stderr_fd = -1,
@ -636,12 +636,8 @@ static int swap_spawn(Swap *s, ExecCommand *c, pid_t *_pid) {
if (r < 0)
goto fail;
r = manager_set_exec_params(UNIT(s)->manager, &exec_params);
if (r < 0)
goto fail;
exec_params.cgroup_path = UNIT(s)->cgroup_path;
exec_params.cgroup_delegate = s->cgroup_context.delegate;
manager_set_exec_params(UNIT(s)->manager, &exec_params);
unit_set_exec_params(UNIT(s), &exec_params);
r = exec_spawn(UNIT(s),
c,

View file

@ -4399,3 +4399,15 @@ int unit_acquire_invocation_id(Unit *u) {
return 0;
}
void unit_set_exec_params(Unit *s, ExecParameters *p) {
CGroupContext *c;
assert(s);
assert(p);
p->cgroup_path = s->cgroup_path;
c = unit_get_cgroup_context(s);
SET_FLAG(p->flags, EXEC_CGROUP_DELEGATE, c && c->delegate);
}
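Reviewer note: the `SET_FLAG()` call above sets or clears a single bit depending on a condition. The sketch below shows that pattern on its own. `SKETCH_SET_FLAG`, the `SKETCH_*` constants, `struct sketch_cgroup_context` and `sketch_apply_delegate()` are hypothetical names chosen for illustration, and the macro body is an assumption about how such a helper is typically written.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sets 'flag' in 'v' if 'b' is true, clears it otherwise (assumed shape of the helper). */
#define SKETCH_SET_FLAG(v, flag, b) ((v) = (b) ? ((v) | (flag)) : ((v) & ~(flag)))

/* Hypothetical flag bits. */
enum {
        SKETCH_CGROUP_DELEGATE = 1 << 0,
        SKETCH_OTHER           = 1 << 1,
};

/* Hypothetical stand-in for the cgroup context; only the field used here. */
struct sketch_cgroup_context {
        bool delegate;
};

/* Mirrors the "c && c->delegate" condition: a missing context clears the flag. */
static unsigned sketch_apply_delegate(unsigned flags, const struct sketch_cgroup_context *c) {
        SKETCH_SET_FLAG(flags, SKETCH_CGROUP_DELEGATE, c && c->delegate);
        return flags;
}
```

Unlike a plain `|=`, this form also clears a previously set bit, so the result depends only on the current condition and not on what was in the flags before.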

View file

@ -659,6 +659,8 @@ int unit_acquire_invocation_id(Unit *u);
bool unit_shall_confirm_spawn(Unit *u);
void unit_set_exec_params(Unit *s, ExecParameters *p);
/* Macros which append UNIT= or USER_UNIT= to the message */
#define log_unit_full(unit, level, error, ...) \

View file

@ -67,13 +67,18 @@ int main(int argc, char *argv[]) {
goto finish;
}
/* Drop privileges, but keep three caps. Note that we drop those too, later on (see below) */
r = drop_privileges(uid, gid,
(UINT64_C(1) << CAP_NET_RAW)| /* needed for SO_BINDTODEVICE */
(UINT64_C(1) << CAP_NET_BIND_SERVICE)| /* needed to bind on port 53 */
(UINT64_C(1) << CAP_SETPCAP) /* needed in order to drop the caps later */);
if (r < 0)
goto finish;
/* Drop privileges, but only if we have been started as root. If we are not running as root we assume all
* privileges are already dropped. */
if (getuid() == 0) {
/* Drop privileges, but keep three caps. Note that we drop those too, later on (see below) */
r = drop_privileges(uid, gid,
(UINT64_C(1) << CAP_NET_RAW)| /* needed for SO_BINDTODEVICE */
(UINT64_C(1) << CAP_NET_BIND_SERVICE)| /* needed to bind on port 53 */
(UINT64_C(1) << CAP_SETPCAP) /* needed in order to drop the caps later */);
if (r < 0)
goto finish;
}
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, SIGUSR1, SIGUSR2, -1) >= 0);

View file

@ -639,6 +639,25 @@ const SyscallFilterSet syscall_filter_sets[_SYSCALL_FILTER_SET_MAX] = {
"sched_setattr\0"
"prlimit64\0"
},
[SYSCALL_FILTER_SET_SETUID] = {
.name = "@setuid",
.help = "Operations for changing user/group credentials",
.value =
"setgid32\0"
"setgid\0"
"setgroups32\0"
"setgroups\0"
"setregid32\0"
"setregid\0"
"setresgid32\0"
"setresgid\0"
"setresuid32\0"
"setresuid\0"
"setreuid32\0"
"setreuid\0"
"setuid32\0"
"setuid\0"
},
[SYSCALL_FILTER_SET_SWAP] = {
.name = "@swap",
.help = "Enable/disable swap devices",
@ -1345,3 +1364,41 @@ int parse_syscall_archs(char **l, Set **archs) {
return 0;
}
int seccomp_filter_set_add(Set *filter, bool add, const SyscallFilterSet *set) {
const char *i;
int r;
assert(set);
NULSTR_FOREACH(i, set->value) {
if (i[0] == '@') {
const SyscallFilterSet *more;
more = syscall_filter_set_find(i);
if (!more)
return -ENXIO;
r = seccomp_filter_set_add(filter, add, more);
if (r < 0)
return r;
} else {
int id;
id = seccomp_syscall_resolve_name(i);
if (id == __NR_SCMP_ERROR)
return -ENXIO;
if (add) {
r = set_put(filter, INT_TO_PTR(id + 1));
if (r < 0)
return r;
} else
(void) set_remove(filter, INT_TO_PTR(id + 1));
}
}
return 0;
}
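Reviewer note: `seccomp_filter_set_add()` walks a list of NUL-separated names that ends with an empty string, which is the layout of the `.value` strings in the filter set table. The sketch below shows one way such a list can be iterated in plain C. `sketch_nulstr_count()` and `sketch_nulstr_contains()` are hypothetical helpers and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counts the entries of a list laid out as "a\0b\0" (terminated by an empty string). */
static size_t sketch_nulstr_count(const char *list) {
        size_t n = 0;

        assert(list);

        /* Each step skips one entry and its terminating NUL byte. */
        for (const char *i = list; *i; i += strlen(i) + 1)
                n++;

        return n;
}

/* Returns true if 'name' is one of the entries of the list. */
static bool sketch_nulstr_contains(const char *list, const char *name) {
        assert(list);
        assert(name);

        for (const char *i = list; *i; i += strlen(i) + 1)
                if (strcmp(i, name) == 0)
                        return true;

        return false;
}
```

A string literal such as `"setuid\0" "setgid\0"` already carries the final empty entry, because the compiler appends one more NUL after the explicit one.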

View file

@ -58,6 +58,7 @@ enum {
SYSCALL_FILTER_SET_RAW_IO,
SYSCALL_FILTER_SET_REBOOT,
SYSCALL_FILTER_SET_RESOURCES,
SYSCALL_FILTER_SET_SETUID,
SYSCALL_FILTER_SET_SWAP,
_SYSCALL_FILTER_SET_MAX
};
@ -66,6 +67,8 @@ extern const SyscallFilterSet syscall_filter_sets[];
const SyscallFilterSet *syscall_filter_set_find(const char *name);
int seccomp_filter_set_add(Set *s, bool b, const SyscallFilterSet *set);
int seccomp_load_syscall_filter_set(uint32_t default_action, const SyscallFilterSet *set, uint32_t action);
int seccomp_load_syscall_filter_set_raw(uint32_t default_action, Set* set, uint32_t action);

View file

@ -205,6 +205,8 @@ int main(int argc, char *argv[]) {
log_parse_environment();
log_open();
log_info("have ambient caps: %s", yes_no(ambient_capabilities_supported()));
if (getuid() != 0)
return EXIT_TEST_SKIP;

View file

@ -93,7 +93,7 @@ static void check_execcommand(ExecCommand *c,
assert_se(streq_ptr(c->argv[1], argv1));
if (n > 1)
assert_se(streq_ptr(c->argv[2], argv2));
assert_se(c->ignore == ignore);
assert_se(!!(c->flags & EXEC_COMMAND_IGNORE_FAILURE) == ignore);
}
static void test_config_parse_exec(void) {

View file

@ -19,9 +19,11 @@ Wants=nss-lookup.target
Type=notify
Restart=always
RestartSec=0
ExecStart=@rootlibexecdir@/systemd-resolved
ExecStart=!!@rootlibexecdir@/systemd-resolved
WatchdogSec=3min
CapabilityBoundingSet=CAP_SETUID CAP_SETGID CAP_SETPCAP CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_NET_RAW CAP_NET_BIND_SERVICE
User=systemd-resolve
CapabilityBoundingSet=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE
PrivateTmp=yes
PrivateDevices=yes
ProtectSystem=strict
@ -34,7 +36,8 @@ RestrictRealtime=yes
RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6
SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @swap
SystemCallArchitectures=native
ReadWritePaths=/run/systemd
RuntimeDirectory=systemd/resolve
RuntimeDirectoryPreserve=yes
[Install]
WantedBy=multi-user.target