seccomp: add clock query and sleeping syscalls to "@default" group

Querying the time and sleeping are such basic operations that it makes very
little sense to ever block them, hence don't.
Lennart Poettering 2016-10-25 15:38:36 +02:00
parent 67234d218b
commit c79aff9a82
2 changed files with 22 additions and 25 deletions

@@ -1255,30 +1255,20 @@
 <varlistentry>
 <term><varname>SystemCallFilter=</varname></term>
-<listitem><para>Takes a space-separated list of system call
-names. If this setting is used, all system calls executed by
-the unit processes except for the listed ones will result in
-immediate process termination with the
-<constant>SIGSYS</constant> signal (whitelisting). If the
-first character of the list is <literal>~</literal>, the
-effect is inverted: only the listed system calls will result
-in immediate process termination (blacklisting). If running in
-user mode, or in system mode, but without the
-<constant>CAP_SYS_ADMIN</constant> capability (e.g. setting
-<varname>User=nobody</varname>),
-<varname>NoNewPrivileges=yes</varname> is implied. This
-feature makes use of the Secure Computing Mode 2 interfaces of
-the kernel ('seccomp filtering') and is useful for enforcing a
-minimal sandboxing environment. Note that the
-<function>execve</function>,
-<function>rt_sigreturn</function>,
-<function>sigreturn</function>,
-<function>exit_group</function>, <function>exit</function>
-system calls are implicitly whitelisted and do not need to be
-listed explicitly. This option may be specified more than once,
-in which case the filter masks are merged. If the empty string
-is assigned, the filter is reset, all prior assignments will
-have no effect. This does not affect commands prefixed with <literal>+</literal>.</para>
+<listitem><para>Takes a space-separated list of system call names. If this setting is used, all system calls
+executed by the unit processes except for the listed ones will result in immediate process termination with the
+<constant>SIGSYS</constant> signal (whitelisting). If the first character of the list is <literal>~</literal>,
+the effect is inverted: only the listed system calls will result in immediate process termination
+(blacklisting). If running in user mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant>
+capability (e.g. setting <varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is
+implied. This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp filtering')
+and is useful for enforcing a minimal sandboxing environment. Note that the <function>execve</function>,
+<function>exit</function>, <function>exit_group</function>, <function>getrlimit</function>,
+<function>rt_sigreturn</function>, <function>sigreturn</function> system calls and the system calls for
+querying time and sleeping are implicitly whitelisted and do not need to be listed explicitly. This option may
+be specified more than once, in which case the filter masks are merged. If the empty string is assigned, the
+filter is reset, all prior assignments will have no effect. This does not affect commands prefixed with
+<literal>+</literal>.</para>
 <para>If you specify both types of this option (i.e.
 whitelisting and blacklisting), the first encountered will

@@ -253,15 +253,22 @@ const SyscallFilterSet syscall_filter_sets[_SYSCALL_FILTER_SET_MAX] = {
 "sys_debug_setcontext\0"
 },
 [SYSCALL_FILTER_SET_DEFAULT] = {
-/* Default list */
+/* Default list: the most basic of operations */
 .name = "@default",
 .value =
+"clock_getres\0"
+"clock_gettime\0"
+"clock_nanosleep\0"
 "execve\0"
 "exit\0"
 "exit_group\0"
 "getrlimit\0" /* make sure processes can query stack size and such */
+"gettimeofday\0"
+"nanosleep\0"
+"pause\0"
 "rt_sigreturn\0"
 "sigreturn\0"
+"time\0"
 },
 [SYSCALL_FILTER_SET_IO_EVENT] = {
 /* Event loop use */
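
The `.value` field in the hunk above packs the syscall names into one string: each literal ends in an explicit `\0`, adjacent literals are concatenated, and the compiler appends a final NUL, so the list ends with an empty string. The following is a minimal, standalone sketch of how such a NUL-separated list can be walked. The names `default_set`, `nulstr_count` and `nulstr_contains` are hypothetical and only for illustration; the diff does not show how systemd itself iterates the list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of the "@default" value as it looks after this commit. */
static const char default_set[] =
        "clock_getres\0"
        "clock_gettime\0"
        "clock_nanosleep\0"
        "execve\0"
        "exit\0"
        "exit_group\0"
        "getrlimit\0"
        "gettimeofday\0"
        "nanosleep\0"
        "pause\0"
        "rt_sigreturn\0"
        "sigreturn\0"
        "time\0";

/* Count the entries: hop from one NUL terminator to the next until an
 * empty string (the implicit trailing NUL) marks the end of the list. */
static size_t nulstr_count(const char *s) {
        size_t n = 0;

        for (const char *p = s; *p; p += strlen(p) + 1)
                n++;

        return n;
}

/* Return 1 if needle matches one entry exactly, 0 otherwise. */
static int nulstr_contains(const char *s, const char *needle) {
        for (const char *p = s; *p; p += strlen(p) + 1)
                if (strcmp(p, needle) == 0)
                        return 1;

        return 0;
}
```

With these helpers, `nulstr_contains(default_set, "nanosleep")` is true, while a name such as `"ptrace"` that is not in the list yields 0. Matching is exact, so `"exit"` and `"exit_group"` are distinct entries.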