man: beef up systemd.exec(5)

Prompted by:

https://lists.freedesktop.org/archives/systemd-devel/2019-May/042773.html
Lennart Poettering 2019-05-28 16:50:10 +02:00 committed by Zbigniew Jędrzejewski-Szmek
parent b070c7c0e1
commit 330703fb22


@@ -1540,24 +1540,29 @@ RestrictNamespaces=~cgroup net</programlisting>
 <varlistentry>
 <term><varname>SystemCallFilter=</varname></term>
-<listitem><para>Takes a space-separated list of system call names. If this setting is used, all system calls
-executed by the unit processes except for the listed ones will result in immediate process termination with the
-<constant>SIGSYS</constant> signal (whitelisting). If the first character of the list is <literal>~</literal>,
-the effect is inverted: only the listed system calls will result in immediate process termination
-(blacklisting). Blacklisted system calls and system call groups may optionally be suffixed with a colon
-(<literal>:</literal>) and <literal>errno</literal> error number (between 0 and 4095) or errno name such as
-<constant>EPERM</constant>, <constant>EACCES</constant> or <constant>EUCLEAN</constant>. This value will be
-returned when a blacklisted system call is triggered, instead of terminating the processes immediately. This
-value takes precedence over the one given in <varname>SystemCallErrorNumber=</varname>. If running in user
-mode, or in system mode, but without the <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting
-<varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is implied. This feature makes use of
-the Secure Computing Mode 2 interfaces of the kernel ('seccomp filtering') and is useful for enforcing a
-minimal sandboxing environment. Note that the <function>execve</function>, <function>exit</function>,
-<function>exit_group</function>, <function>getrlimit</function>, <function>rt_sigreturn</function>,
-<function>sigreturn</function> system calls and the system calls for querying time and sleeping are implicitly
-whitelisted and do not need to be listed explicitly. This option may be specified more than once, in which case
-the filter masks are merged. If the empty string is assigned, the filter is reset, all prior assignments will
-have no effect. This does not affect commands prefixed with <literal>+</literal>.</para>
+<listitem><para>Takes a space-separated list of system call names. If this setting is used, all
+system calls executed by the unit processes except for the listed ones will result in immediate
+process termination with the <constant>SIGSYS</constant> signal (whitelisting). (See
+<varname>SystemCallErrorNumber=</varname> below for changing the default action). If the first
+character of the list is <literal>~</literal>, the effect is inverted: only the listed system calls
+will result in immediate process termination (blacklisting). Blacklisted system calls and system call
+groups may optionally be suffixed with a colon (<literal>:</literal>) and <literal>errno</literal>
+error number (between 0 and 4095) or errno name such as <constant>EPERM</constant>,
+<constant>EACCES</constant> or <constant>EUCLEAN</constant> (see <citerefentry
+project='man-pages'><refentrytitle>errno</refentrytitle><manvolnum>3</manvolnum></citerefentry> for a
+full list). This value will be returned when a blacklisted system call is triggered, instead of
+terminating the processes immediately. This value takes precedence over the one given in
+<varname>SystemCallErrorNumber=</varname>, see below. If running in user mode, or in system mode,
+but without the <constant>CAP_SYS_ADMIN</constant> capability (e.g. setting
+<varname>User=nobody</varname>), <varname>NoNewPrivileges=yes</varname> is implied. This feature
+makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp filtering') and is useful
+for enforcing a minimal sandboxing environment. Note that the <function>execve</function>,
+<function>exit</function>, <function>exit_group</function>, <function>getrlimit</function>,
+<function>rt_sigreturn</function>, <function>sigreturn</function> system calls and the system calls
+for querying time and sleeping are implicitly whitelisted and do not need to be listed
+explicitly. This option may be specified more than once, in which case the filter masks are
+merged. If the empty string is assigned, the filter is reset, all prior assignments will have no
+effect. This does not affect commands prefixed with <literal>+</literal>.</para>
 <para>Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off
 alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
@@ -1717,6 +1722,22 @@ RestrictNamespaces=~cgroup net</programlisting>
 SystemCallFilter=@system-service
 SystemCallErrorNumber=EPERM</programlisting>
+<para>Note that various kernel system calls are defined redundantly: there are multiple system calls
+for executing the same operation. For example, the <function>pidfd_send_signal()</function> system
+call may be used to execute operations similar to what can be done with the older
+<function>kill()</function> system call, hence blocking the latter without the former only provides
+weak protection. Since new system calls are added regularly to the kernel as development progresses,
+keeping system call blacklists comprehensive requires constant work. It is thus recommended to use
+whitelisting instead, which offers the benefit that new system calls are by default implicitly
+blocked until the whitelist is updated.</para>
+<para>Also note that a number of system calls are required to be accessible for the dynamic linker to
+work. The dynamic linker is required for running most regular programs (specifically: all dynamic ELF
+binaries, which is how most distributions build packaged programs). This means that blocking these
+system calls (which include <function>open()</function>, <function>openat()</function> or
+<function>mmap()</function>) will make most programs typically shipped with generic distributions
+unusable.</para>
 <para>It is recommended to combine the file system namespacing related options with
 <varname>SystemCallFilter=~@mount</varname>, in order to prohibit the unit's processes to undo the
 mappings. Specifically these are the options <varname>PrivateTmp=</varname>,
@@ -1729,11 +1750,13 @@ SystemCallErrorNumber=EPERM</programlisting>
 <varlistentry>
 <term><varname>SystemCallErrorNumber=</varname></term>
-<listitem><para>Takes an <literal>errno</literal> error number (between 1 and 4095) or errno name such as
-<constant>EPERM</constant>, <constant>EACCES</constant> or <constant>EUCLEAN</constant>, to return when the
-system call filter configured with <varname>SystemCallFilter=</varname> is triggered, instead of terminating
-the process immediately. When this setting is not used, or when the empty string is assigned, the process will
-be terminated immediately when the filter is triggered.</para></listitem>
+<listitem><para>Takes an <literal>errno</literal> error number (between 1 and 4095) or errno name
+such as <constant>EPERM</constant>, <constant>EACCES</constant> or <constant>EUCLEAN</constant>, to
+return when the system call filter configured with <varname>SystemCallFilter=</varname> is triggered,
+instead of terminating the process immediately. See <citerefentry
+project='man-pages'><refentrytitle>errno</refentrytitle><manvolnum>3</manvolnum></citerefentry> for a
+full list of error codes. When this setting is not used, or when the empty string is assigned, the
+process will be terminated immediately when the filter is triggered.</para></listitem>
 </varlistentry>
 <varlistentry>