/* SPDX-License-Identifier: LGPL-2.1-or-later */

#include <errno.h>
#include <linux/netlink.h>
#include <sys/capability.h>
#include <sys/socket.h>
#include <sys/types.h>

#if HAVE_SECCOMP
#include <seccomp.h>
#endif

#include "alloc-util.h"
#include "log.h"
#include "nspawn-seccomp.h"
#if HAVE_SECCOMP
#include "seccomp-util.h"
#endif
#include "string-util.h"
#include "strv.h"

#if HAVE_SECCOMP

static int add_syscall_filters(
                scmp_filter_ctx ctx,
                uint32_t arch,
                uint64_t cap_list_retain,
                char **syscall_allow_list,
                char **syscall_deny_list) {

        static const struct {
                uint64_t capability;
                const char* name;
        } allow_list[] = {
                /* Let's use set names where we can */
                { 0, "@aio" },
                { 0, "@basic-io" },
                { 0, "@chown" },
                { 0, "@default" },
                { 0, "@file-system" },
                { 0, "@io-event" },
                { 0, "@ipc" },
                { 0, "@mount" },
                { 0, "@network-io" },
                { 0, "@process" },
                { 0, "@resources" },
                { 0, "@setuid" },
                { 0, "@signal" },
                { 0, "@sync" },
                { 0, "@timer" },

                /* The following four are sets we optionally enable, in case the caps have been configured for it */
                { CAP_SYS_TIME, "@clock" },
                { CAP_SYS_MODULE, "@module" },
                { CAP_SYS_RAWIO, "@raw-io" },
                { CAP_IPC_LOCK, "@memlock" },

                /* Plus a good set of additional syscalls which are not part of any of the groups above */
                { 0, "brk" },
                { 0, "capget" },
                { 0, "capset" },
                { 0, "copy_file_range" },
                { 0, "fadvise64" },
                { 0, "fadvise64_64" },
                { 0, "flock" },
                { 0, "get_mempolicy" },
                { 0, "getcpu" },
                { 0, "getpriority" },
                { 0, "getrandom" },
                { 0, "ioctl" },
                { 0, "ioprio_get" },
                { 0, "kcmp" },
                { 0, "madvise" },
                { 0, "mincore" },
                { 0, "mprotect" },
                { 0, "mremap" },
                { 0, "name_to_handle_at" },
                { 0, "oldolduname" },
                { 0, "olduname" },
                { 0, "personality" },
                { 0, "readahead" },
                { 0, "readdir" },
                { 0, "remap_file_pages" },
                { 0, "sched_get_priority_max" },
                { 0, "sched_get_priority_min" },
                { 0, "sched_getaffinity" },
                { 0, "sched_getattr" },
                { 0, "sched_getparam" },
                { 0, "sched_getscheduler" },
                { 0, "sched_rr_get_interval" },
                { 0, "sched_yield" },
                { 0, "seccomp" },
                { 0, "sendfile" },
                { 0, "sendfile64" },
                { 0, "setdomainname" },
                { 0, "setfsgid" },
                { 0, "setfsgid32" },
                { 0, "setfsuid" },
                { 0, "setfsuid32" },
                { 0, "sethostname" },
                { 0, "setpgid" },
                { 0, "setsid" },
                { 0, "splice" },
                { 0, "sysinfo" },
                { 0, "tee" },
                { 0, "umask" },
                { 0, "uname" },
                { 0, "userfaultfd" },
                { 0, "vmsplice" },

                /* The following individual syscalls are added depending on specified caps */
                { CAP_SYS_PACCT, "acct" },
                { CAP_SYS_PTRACE, "process_vm_readv" },
                { CAP_SYS_PTRACE, "process_vm_writev" },
                { CAP_SYS_PTRACE, "ptrace" },
                { CAP_SYS_BOOT, "reboot" },
                { CAP_SYSLOG, "syslog" },
                { CAP_SYS_TTY_CONFIG, "vhangup" },

                /*
                 * The following syscalls and groups are knowingly excluded:
                 *
                 * @cpu-emulation
                 * @keyring (NB: keyring is not namespaced!)
                 * @obsolete
                 * @pkey
                 * @swap
                 *
                 * bpf
                 * fanotify_init
                 * fanotify_mark
                 * kexec_file_load
                 * kexec_load
                 * lookup_dcookie
                 * nfsservctl
                 * open_by_handle_at
                 * perf_event_open
                 * quotactl
                 */
        };

        _cleanup_strv_free_ char **added = NULL;
        char **p;
        int r;

        for (size_t i = 0; i < ELEMENTSOF(allow_list); i++) {
                if (allow_list[i].capability != 0 && (cap_list_retain & (1ULL << allow_list[i].capability)) == 0)
                        continue;

                r = seccomp_add_syscall_filter_item(ctx,
                                                    allow_list[i].name,
                                                    SCMP_ACT_ALLOW,
                                                    syscall_deny_list,
                                                    false,
                                                    &added);
                if (r < 0)
                        return log_error_errno(r, "Failed to add syscall filter item %s: %m", allow_list[i].name);
        }

        STRV_FOREACH(p, syscall_allow_list) {
                r = seccomp_add_syscall_filter_item(ctx, *p, SCMP_ACT_ALLOW, syscall_deny_list, true, &added);
                if (r < 0)
                        log_warning_errno(r, "Failed to add rule for system call %s on %s, ignoring: %m",
                                          *p, seccomp_arch_to_string(arch));
        }

        /* The default action is ENOSYS. Respond with EPERM to all other "known" but not allow-listed
         * syscalls. */
        r = seccomp_add_syscall_filter_item(ctx, "@known", SCMP_ACT_ERRNO(EPERM), added, true, NULL);
        if (r < 0)
                log_warning_errno(r, "Failed to add rule for @known set on %s, ignoring: %m",
                                  seccomp_arch_to_string(arch));

#if (SCMP_VER_MAJOR == 2 && SCMP_VER_MINOR >= 5) || SCMP_VER_MAJOR > 2
        /* We have a large filter here, so let's turn on the binary tree mode if possible. */
        r = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_OPTIMIZE, 2);
        if (r < 0)
                return r;
#endif

        return 0;
}

int setup_seccomp(uint64_t cap_list_retain, char **syscall_allow_list, char **syscall_deny_list) {
        uint32_t arch;
        int r;

        if (!is_seccomp_available()) {
                log_debug("SECCOMP features not detected in the kernel or disabled at runtime, disabling SECCOMP filtering");
                return 0;
        }

        SECCOMP_FOREACH_LOCAL_ARCH(arch) {
                _cleanup_(seccomp_releasep) scmp_filter_ctx seccomp = NULL;

                log_debug("Applying allow list on architecture: %s", seccomp_arch_to_string(arch));

                /* We install ENOSYS as the default action, but it will only apply to syscalls which are not
                 * in the @known set, see above. */
                r = seccomp_init_for_arch(&seccomp, arch, SCMP_ACT_ERRNO(ENOSYS));
                if (r < 0)
                        return log_error_errno(r, "Failed to allocate seccomp object: %m");

                r = add_syscall_filters(seccomp, arch, cap_list_retain, syscall_allow_list, syscall_deny_list);
                if (r < 0)
                        return r;

                r = seccomp_load(seccomp);
                if (ERRNO_IS_SECCOMP_FATAL(r))
                        return log_error_errno(r, "Failed to install seccomp filter: %m");
                if (r < 0)
                        log_debug_errno(r, "Failed to install filter set for architecture %s, skipping: %m", seccomp_arch_to_string(arch));
        }

        SECCOMP_FOREACH_LOCAL_ARCH(arch) {
                _cleanup_(seccomp_releasep) scmp_filter_ctx seccomp = NULL;

                log_debug("Applying NETLINK_AUDIT mask on architecture: %s", seccomp_arch_to_string(arch));

                r = seccomp_init_for_arch(&seccomp, arch, SCMP_ACT_ALLOW);
                if (r < 0)
                        return log_error_errno(r, "Failed to allocate seccomp object: %m");

                /*
                  Audit is broken in containers, much of the userspace audit hookup will fail if running inside a
                  container. We don't care and just turn off creation of audit sockets.

                  This will make socket(AF_NETLINK, *, NETLINK_AUDIT) fail with EAFNOSUPPORT which audit userspace uses
                  as indication that audit is disabled in the kernel.
                */

                r = seccomp_rule_add_exact(
                                seccomp,
                                SCMP_ACT_ERRNO(EAFNOSUPPORT),
                                SCMP_SYS(socket),
                                2,
                                SCMP_A0(SCMP_CMP_EQ, AF_NETLINK),
                                SCMP_A2(SCMP_CMP_EQ, NETLINK_AUDIT));
                if (r < 0) {
                        log_debug_errno(r, "Failed to add audit seccomp rule, ignoring: %m");
                        continue;
                }

                r = seccomp_load(seccomp);
                if (ERRNO_IS_SECCOMP_FATAL(r))
                        return log_error_errno(r, "Failed to install seccomp audit filter: %m");
                if (r < 0)
                        log_debug_errno(r, "Failed to install filter set for architecture %s, skipping: %m", seccomp_arch_to_string(arch));
        }

        return 0;
}

#else

int setup_seccomp(uint64_t cap_list_retain, char **syscall_allow_list, char **syscall_deny_list) {
        return 0;
}

#endif