Commit Graph

162 Commits

Author SHA1 Message Date
Greg Depoire--Ferrer 6597686865 seccomp: don't install filters for archs that can't use syscalls
When seccomp_restrict_archs is called, architectures that are blocked
are replaced by the SECCOMP_LOCAL_ARCH_BLOCKED marker so that they are
not disabled again and filters are not installed for them.

This can make some service that use SystemCallArchitecture= and
SystemCallFilter= start faster.
2020-12-10 16:13:02 +01:00
Zbigniew Jędrzejewski-Szmek d5923e38bc shared/seccomp-util: address family filtering is broken on ppc
This reverts the gist of da1921a5c3 and
0d9fca76bb (for ppc).

Quoting #17559:
> libseccomp 2.5 added socket syscall multiplexing on ppc64(el):
> https://github.com/seccomp/libseccomp/pull/229
>
> Like with i386, s390 and s390x this breaks socket argument filtering, so
> RestrictAddressFamilies doesn't work.
>
> This causes the unit test to fail:
> /* test_restrict_address_families */
> Operating on architecture: ppc
> Failed to install socket family rules for architecture ppc, skipping: Operation canceled
> Operating on architecture: ppc64
> Failed to add socket() rule for architecture ppc64, skipping: Invalid argument
> Operating on architecture: ppc64-le
> Failed to add socket() rule for architecture ppc64-le, skipping: Invalid argument
> Assertion 'fd < 0' failed at src/test/test-seccomp.c:424, function test_restrict_address_families(). Aborting.
>
> The socket filters can't be added so `socket(AF_UNIX, SOCK_DGRAM, 0);` still
> works, triggering the assertion.

Fixes #17559.
2020-11-26 14:23:15 +01:00
Yu Watanabe 11b9105dfd seccomp: also move munmap into @default syscall filter set
Follow-up for 5abede3247.
2020-11-24 16:18:34 +01:00
Lennart Poettering 5abede3247 seccomp: move brk+mmap+mmap2 into @default syscall filter set
These three syscalls are internally used by libc's memory allocation
logic, i.e. ultimately back malloc(). Allocating a bit of memory is so
basic, it should just be in the default set.

This fixes a couple of issues with asan/msan and the seccomp tests: when
asan/msan is used some additional, large memory allocations take place
in the background, and unless mmap/mmap2/brk are allowlisted these will
fail, aborting the test prematurely.
2020-11-19 16:44:50 +01:00
Yu Watanabe db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Lennart Poettering ce8f6d478e seccomp: allow turning off of seccomp filtering via env var
Fixes: #17504

(While we are it, also move $SYSTEMD_SECCOMP_LOG= env var description
into the right document section)

Also suggested in: https://github.com/systemd/systemd/issues/17245#issuecomment-704773603
2020-11-05 20:22:19 +01:00
Topi Miettinen ae5e9bf46f shared/seccomp-util: move stime() to @obsolete
Quoting the manual page of stime(2): "Starting with glibc 2.31, this function
is no longer available to newly linked applications and is no longer declared
in <time.h>."
2020-11-04 09:48:33 +01:00
Lennart Poettering 6ea0d25c57 seccomp: allowlist close_range() by default in @basic-io 2020-10-14 10:40:06 +02:00
Frantisek Sumsal d7a0f1f4f9 tree-wide: assorted coccinelle fixes 2020-10-09 15:02:23 +02:00
Samanta Navarro 7b121df640 seccomp-util: fix typo in help message 2020-10-03 11:56:40 +00:00
Lennart Poettering 8e24b1d23f seccomp-util: add cacheflush() syscall to @default syscall set
This is like membarrier() I guess and basically just exposes CPU
functionality via kernel syscall on some archs. Let's whitelist it for
everyone.

Fixes: #17197
2020-09-30 10:08:15 +02:00
Topi Miettinen 9df2cdd8ec exec: SystemCallLog= directive
With new directive SystemCallLog= it's possible to list system calls to be
logged. This can be used for auditing or temporarily when constructing system
call filters.

---
v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
v4: skip useless debug messages, actually parse directive
v3: don't declare unused variables with old libseccomp
v2: fix build without seccomp or old libseccomp
2020-09-15 12:54:17 +03:00
Topi Miettinen 005bfaf118 exec: Add kill action to system call filters
Define explicit action "kill" for SystemCallErrorNumber=.

In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.

---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
 init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
2020-09-15 12:54:17 +03:00
Zbigniew Jędrzejewski-Szmek 90e74a66e6 tree-wide: define iterator inside of the macro 2020-09-08 12:14:05 +02:00
fangxiuning 6d95e7d9b2
tree-wide: drop pointless zero initialization (#16900) 2020-08-30 06:21:20 +09:00
Zbigniew Jędrzejewski-Szmek 9f56c88aeb
Merge pull request #16819 from keszybz/seccomp-enosys
Return ENOSYS in nspawn for "unknown" syscalls
2020-08-25 09:18:46 +02:00
Zbigniew Jędrzejewski-Szmek 000c05207d shared/seccomp-util: added functionality to make list of filtred syscalls
While at it, start removing the "seccomp_" prefix from our
own functions. It is used by libseccomp.
2020-08-24 20:05:09 +02:00
Zbigniew Jędrzejewski-Szmek 077e8fc0ca shared/seccomp: reduce scope of indexing variables 2020-08-24 20:04:54 +02:00
Zbigniew Jędrzejewski-Szmek 95aac01259 shared: add @known syscall list 2020-08-24 20:04:17 +02:00
Steve Dodd 44aaddad06 Request seccomp logging if SYSTEMD_LOG_SECCOMP environment variable is set. 2020-08-21 11:24:53 +02:00
Aurelien Jarno f9252236c8 seccomp: add support for riscv64
This patch adds seccomp support to the riscv64 architecture. seccomp
support is available in the riscv64 kernel since version 5.5, and it
has just been added to the libseccomp library.

riscv64 uses generic syscalls like aarch64, so I used that architecture
as a reference to find which code has to be modified.

With this patch, the testsuite passes successfully, including the
test-seccomp test. The system boots and works fine with kernel 5.4 (i.e.
without seccomp support) and kernel 5.5 (i.e. with seccomp support). I
have also verified that the "SystemCallFilter=~socket" option prevents a
service to use the ping utility when running on kernel 5.5.
2020-08-21 10:10:29 +02:00
Zbigniew Jędrzejewski-Szmek b4eaa6cc99 shared/seccomp: use _cleanup_ in one more place
(cherry picked from commit 27605d6a836d85563faf41db9f7a72883d44c0ff)
2020-08-19 10:57:30 +02:00
Zbigniew Jędrzejewski-Szmek 6da432fd54 shared/seccomp: do not use ifdef guards around textual syscall names
It is possible that we will be running with an upgraded libseccomp, in which
case libseccomp might know the syscall name, even if the number is not known at
the time when systemd is being compiled. The guard only serves to break such
upgrades, by requiring that we also recompile systemd.

For s390-specific syscalls, use a define to exclude them, so that that we don't
try to filter them on other arches.

(cherry picked from commit 6cf852e79eb0eced2f77653941f9c75c3bd79386)
2020-08-19 10:57:18 +02:00
Michael Scherer bcf08acbff Newer Glibc use faccessat2 to implement faccessat
cf https://repo.or.cz/glibc.git/commit/3d3ab573a5f3071992cbc4f57d50d1d29d55bde2

This cause breakage on Fedora Rawhide: https://bugzilla.redhat.com/show_bug.cgi?id=1869030
2020-08-16 15:10:13 +02:00
Lennart Poettering 6b000af4f2 tree-wide: avoid some loaded terms
https://tools.ietf.org/html/draft-knodel-terminology-02
https://lwn.net/Articles/823224/

This gets rid of most but not occasions of these loaded terms:

1. scsi_id and friends are something that is supposed to be removed from
   our tree (see #7594)

2. The test suite defines an API used by the ubuntu CI. We can remove
   this too later, but this needs to be done in sync with the ubuntu CI.

3. In some cases the terms are part of APIs we call or where we expose
   concepts the kernel names the way it names them. (In particular all
   remaining uses of the word "slave" in our codebase are like this,
   it's used by the POSIX PTY layer, by the network subsystem, the mount
   API and the block device subsystem). Getting rid of the term in these
   contexts would mean doing some major fixes of the kernel ABI first.

Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
2020-06-25 09:00:19 +02:00
Zbigniew Jędrzejewski-Szmek de7fef4b6e tree-wide: use set_ensure_put()
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.

Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
2020-06-22 16:32:37 +02:00
Lennart Poettering ecc04067f9 seccomp: filter openat2() entirely in seccomp_restrict_sxid() 2020-06-03 18:26:34 +02:00
Benjamin Robin b9c54c4665 tree-wide: Initialize _cleanup_ variables if needed 2020-05-13 22:56:42 +02:00
Lennart Poettering 8270e3d8ed seccomp-util: add new syscalls from kernel 5.6 to syscall filter table 2020-05-11 06:24:02 +00:00
Zbigniew Jędrzejewski-Szmek b069c2a3f2 shared/seccomp: avoid possibly writing bogus errno code in debug log
CID 1409488.

This code was added in 903659e7b2. The change
that is done here is a simple fix to avoid use of a
unitialized/wrongly-initialized variable, but the bigger issue is that nothing
looks at the returned result to distinguish between 0 and a positive return
value.
2019-12-06 15:12:40 +01:00
Christian Ehrhardt 5ef3ed97e3
seccomp: use per arch shmat_syscall
At the beginning of seccomp_memory_deny_write_execute architectures
can set individual filter_syscall, block_syscall, shmat_syscall values.
The former two are then used in the call to add_seccomp_syscall_filter
but shmat_syscall is not.

Right now all shmat_syscall values are the same, so the change is a
no-op, but if ever an architecture is added/modified this would be a
subtle source for a mistake so fix it by using shmat_syscall later.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
2019-12-05 07:19:12 +01:00
Christian Ehrhardt 903659e7b2
seccomp: ensure rules are loaded in seccomp_memory_deny_write_execute
If seccomp_memory_deny_write_execute was fatally failing to load rules it
already returned a bad retval.
But if any adding filters failed it skipped the subsequent seccomp_load and
always returned an rc of 0 even if no rule was loaded at all.

Lets fix this requiring to (non fatally-failing) load at least one rule set.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
2019-12-05 07:19:12 +01:00
Christian Ehrhardt bed4668d1d
seccomp: fix multiplexed system calls
Since libseccomp 2.4.2 more architectures have shmat handled as multiplexed
call. Those will fail to be added due to seccomp_rule_add_exact failing
on them since they'd need to add multiple rules [1].
See the discussion at https://github.com/seccomp/libseccomp/issues/193

After discussions about the options rejected [2][3] the initial thought of
a fallback to the non '_exact' version of the seccomp rule adding the next
option is to handle those now affected (i386, s390, s390x) the same way as
ppc which ignores and does not block shmat.

[1]: https://github.com/seccomp/libseccomp/issues/193
[2]: https://github.com/systemd/systemd/pull/14167#issuecomment-559136906
[3]: https://github.com/systemd/systemd/commit/469830d1
2019-12-05 07:19:07 +01:00
Kevin Kuehler 620dbdd248 shared: Add ProtectKernelLogs property
Add seccomp_protect_syslog, which adds a filter rule for the syslog
system call.
2019-11-11 12:11:56 -08:00
Zbigniew Jędrzejewski-Szmek 9493b16871 Add @pkey syscall group
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1769299.
This change doesn't solve the issue, but makes it easier to whitelist the
syscall group.
2019-11-08 14:41:22 +01:00
Zbigniew Jędrzejewski-Szmek 6ca6771069 seccomp: add all *time64 syscalls
From https://bugzilla.redhat.com/show_bug.cgi?id=1770154:
> utime is an obsolete system call. The current kernel interface is
> utimensat_time64. New 32-bit architectures do not even provide the utime
> system call.

Also add all other *time64 syscalls listed in
https://fedora.juszkiewicz.com.pl/syscalls.html.
2019-11-08 14:40:49 +01:00
Lennart Poettering 9e48626571 seccomp: add new Linux 5.3 syscalls to syscall filter lists
Many syscalls added and all fit nicely into existing groups, hence lets
add them there.
2019-10-30 15:42:49 +01:00
Zbigniew Jędrzejewski-Szmek a8fb09f573 shared/seccomp: add sync_file_range2
Some architectures need the arguments to be reordered because of alignment
issues. Otherwise, it's the same as sync_file_range.
2019-08-19 11:10:40 +02:00
Dan Streetman 57311925aa src/shared/seccomp-util.c: Add mmap definitions for s390 2019-08-13 15:40:36 -04:00
Lennart Poettering 46fcf95dbe seccomp: add new 5.1 syscall pidfd_send_signal() to filter set list 2019-05-28 17:01:05 +02:00
Lennart Poettering 915fb32438 seccomp: add scmp_act_kill_process() helper that returns SCMP_ACT_KILL_PROCESS if supported 2019-05-24 10:48:28 +02:00
Anita Zhang 7bc5e0b12b seccomp: check more error codes from seccomp_load()
We noticed in our tests that occasionally SystemCallFilter= would
fail to set and the service would run with no syscall filtering.
Most of the time the same tests would apply the filter and fail
the service as expected. While it's not totally clear why this happens,
we noticed seccomp_load() in the systemd code base would fail open for
all errors except EPERM and EACCES.

ENOMEM, EINVAL, and EFAULT seem like reasonable values to add to the
error set based on what I gather from libseccomp code and man pages:

-ENOMEM: out of memory, failed to allocate space for a libseccomp structure, or would exceed a defined constant
-EINVAL: kernel isn't configured to support the operations, args are invalid (to seccomp_load(), seccomp(), or prctl())
-EFAULT: addresses passed as args are invalid
2019-04-12 10:23:07 +02:00
Zbigniew Jędrzejewski-Szmek b3e8032bb4
Merge pull request #12198 from keszybz/seccomp-parsing-logging
Seccomp parsing logging cleanup
2019-04-03 17:19:14 +02:00
Zbigniew Jędrzejewski-Szmek da4dc9a674 seccomp: rework how the S[UG]ID filter is installed
If we know that a syscall is undefined on the given architecture, don't
even try to add it.

Try to install the filter even if some syscalls fail. Also use a helper
function to make the whole a bit less magic.

This allows the S[UG]ID test to pass on arm64.
2019-04-03 13:33:06 +02:00
Zbigniew Jędrzejewski-Szmek 58f6ab4454 pid1: pass unit name to seccomp parser when we have no file location
Building on previous commit, let's pass the unit name when parsing
dbus message or builtin whitelist, which is better than nothing.

seccomp_parse_syscall_filter() is not needed anymore, so it is removed,
and seccomp_parse_syscall_filter_full() is renamed to take its place.
2019-04-03 09:17:42 +02:00
Lennart Poettering 3c27973b13 seccomp: introduce seccomp_restrict_suid_sgid() for blocking chmod() for suid/sgid files 2019-04-02 16:56:48 +02:00
Lennart Poettering 9e6e543c17 seccomp: add debug messages to seccomp_protect_hostname() 2019-04-02 16:56:48 +02:00
Lennart Poettering 6fee3be0b4 seccomp: add rseq() to default list of syscalls to whitelist
Apparently glibc is going to call this implicitly soon, hence let's
whitelist this by default.

Fixes: #12127
2019-03-28 12:09:38 +01:00
Zbigniew Jędrzejewski-Szmek 67fb5f338f seccomp: allow shmat to be a separate syscall on architectures which use a multiplexer
After
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817,
those syscalls have their separate numbers and we can block them.
But glibc might still use the old ones. So let's just do a best-effort
block and not assume anything about how effective it is.
2019-03-15 15:46:41 +01:00
Zbigniew Jędrzejewski-Szmek e55bdf9b6c seccomp: shm{get,at,dt} now have their own numbers everywhere
E.g. on i686:

(previously)
arch x86: SCMP_SYS(mmap) = 90
arch x86: SCMP_SYS(mmap2) = 192
arch x86: SCMP_SYS(shmat) = -221
arch x86: SCMP_SYS(shmat) = -221
arch x86: SCMP_SYS(shmdt) = -222

(now)
arch x86: SCMP_SYS(mmap) = 90
arch x86: SCMP_SYS(mmap2) = 192
arch x86: SCMP_SYS(shmat) = 397
arch x86: SCMP_SYS(shmat) = 397
arch x86: SCMP_SYS(shmdt) = 398

The relevant commit seems to be
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817.
2019-03-15 15:28:43 +01:00