Merge pull request #8077 from sourcejedi/seccomp_cosmetic

seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060)
Lennart Poettering 2018-02-05 13:52:23 +01:00 committed by GitHub
commit cb51f86af8
2 changed files with 29 additions and 9 deletions


@@ -1429,17 +1429,19 @@ CapabilityBoundingSet=~CAP_B CAP_C</programlisting>
filter. The known architecture identifiers are the same as for <varname>ConditionArchitecture=</varname>
described in <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
as well as <constant>x32</constant>, <constant>mips64-n32</constant>, <constant>mips64-le-n32</constant>, and
- the special identifier <constant>native</constant>. If this setting is used, processes of this unit will only
- be permitted to call native system calls, and system calls of the specified architectures. This is an
- effective way to disable compatibility with non-native architectures for processes, for example to prohibit
- execution of 32-bit x86 binaries on 64-bit x86-64 systems. The special <constant>native</constant> identifier
+ the special identifier <constant>native</constant>. The special identifier <constant>native</constant>
implicitly maps to the native architecture of the system (or more precisely: to the architecture the system
manager is compiled for). If running in user mode, or in system mode, but without the
<constant>CAP_SYS_ADMIN</constant> capability (e.g. setting <varname>User=nobody</varname>),
<varname>NoNewPrivileges=yes</varname> is implied. By default, this option is set to the empty list, i.e. no
system call architecture filtering is applied.</para>
- <para>Note that system call filtering is not equally effective on all architectures. For example, on x86
+ <para>If this setting is used, processes of this unit will only be permitted to call native system calls, and
+ system calls of the specified architectures. For the purposes of this option, the x32 architecture is treated
+ as including x86-64 system calls. This setting nevertheless still fulfills its purpose on x32, as explained
+ below.</para>
+ <para>System call filtering is not equally effective on all architectures. For example, on x86
filtering of network socket-related calls is not possible, due to ABI limitations — a limitation that x86-64
does not have, however. On systems supporting multiple ABIs at the same time — such as x86/x86-64 — it is hence
recommended to limit the set of permitted system call architectures so that secondary ABIs may not be used to
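
The rule documented in the hunk above (native system calls are always permitted, listed architectures are added, and x32 is treated as including x86-64) can be sketched as a small set computation. This is only an illustration under stated assumptions: the ARCH_* constants and both function names are hypothetical and are not systemd or libseccomp identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical architecture bits for this sketch only; these are NOT
 * libseccomp's SCMP_ARCH_* values. */
enum {
    ARCH_X86    = 1,
    ARCH_X86_64 = 2,
    ARCH_X32    = 4,
    ARCH_ALL    = 7,
};

/* Compute which architectures' system calls stay permitted: the native
 * one, every listed one, and x86-64 whenever x32 is native or listed. */
static uint32_t effective_archs(uint32_t native, uint32_t listed) {
    /* Precondition: only known architecture bits may be passed in. */
    assert(((native | listed) & ~(uint32_t) ARCH_ALL) == 0);

    uint32_t allowed = native | listed;

    if (allowed & ARCH_X32)
        allowed |= ARCH_X86_64;

    return allowed;
}

/* True if system calls of architecture "arch" are in the permitted set. */
static bool arch_permitted(uint32_t allowed, uint32_t arch) {
    return (allowed & arch) != 0;
}
```

With these assumed names, effective_archs(ARCH_X32, 0) permits x32 and x86-64 but still excludes the 32-bit x86 system calls, which is the case the documentation says the setting continues to cover.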


@@ -1534,17 +1534,35 @@ int seccomp_restrict_archs(Set *archs) {
int r;
/* This installs a filter with no rules, but that restricts the system call architectures to the specified
- * list. */
+ * list.
+ *
+ * There are some qualifications. However, the most important use is to stop processes from bypassing
+ * system call restrictions, in case they used a broader (multiplexing) syscall which is only available
+ * in a non-native architecture. There are no holes in this use case, at least so far. */
+ /* Note libseccomp includes our "native" (current) architecture in the filter by default.
+ * We do not remove it. For example, our callers expect to be able to call execve() afterwards
+ * to run a program with the restrictions applied. */
seccomp = seccomp_init(SCMP_ACT_ALLOW);
if (!seccomp)
return -ENOMEM;
SET_FOREACH(id, archs, i) {
r = seccomp_arch_add(seccomp, PTR_TO_UINT32(id) - 1);
- if (r == -EEXIST)
- continue;
- if (r < 0)
+ if (r < 0 && r != -EEXIST)
return r;
}
+ /* The vdso for x32 assumes that x86-64 syscalls are available. Let's allow them, since
+ * x32 syscalls should basically match x86-64 for everything except the pointer type.
+ * The important thing is that you can block the old 32-bit x86 syscalls.
+ * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850047 */
+ if (seccomp_arch_native() == SCMP_ARCH_X32 ||
+ set_contains(archs, UINT32_TO_PTR(SCMP_ARCH_X32 + 1))) {
+ r = seccomp_arch_add(seccomp, SCMP_ARCH_X86_64);
+ if (r < 0 && r != -EEXIST)
+ return r;
+ }
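
The error handling used in the hunk above, where an "already present" result counts as success and any other negative return is propagated, can be tried in isolation with a stand-in. This is a minimal sketch, not the systemd code: struct filter, filter_arch_add() and filter_add_all() are hypothetical names, and the stand-in only mimics the -EEXIST convention that the diff relies on from seccomp_arch_add().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a filter context: a bitmask of added architectures. */
struct filter {
    uint32_t archs;
};

/* Hypothetical stand-in for an "add architecture" call. Returns 0 on success,
 * -EEXIST if the architecture is already present, -EINVAL if it is unknown. */
static int filter_arch_add(struct filter *f, unsigned arch) {
    assert(f);

    if (arch >= 32)
        return -EINVAL;
    if (f->archs & (UINT32_C(1) << arch))
        return -EEXIST;

    f->archs |= UINT32_C(1) << arch;
    return 0;
}

/* Add every requested architecture. A duplicate is not an error, so -EEXIST
 * is tolerated; any other negative return aborts the loop and is propagated. */
static int filter_add_all(struct filter *f, const unsigned *archs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int r = filter_arch_add(f, archs[i]);
        if (r < 0 && r != -EEXIST)
            return r;
    }

    return 0;
}
```

The single condition behaves the same as the removed two-step check (continue on -EEXIST, then return on any negative value), and the same form is reused in the diff for the extra x86-64 entry added on x32.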