Systemd

Author	SHA1	Message	Date
Zbigniew Jędrzejewski-Szmek	0af05e485a	test-seccomp: accept ENOSYS from sysctl(2) too It seems that kernel 5.9 started returning that.	2020-09-24 17:02:20 +02:00
Zbigniew Jędrzejewski-Szmek	9f56c88aeb	Merge pull request #16819 from keszybz/seccomp-enosys Return ENOSYS in nspawn for "unknown" syscalls	2020-08-25 09:18:46 +02:00
Zbigniew Jędrzejewski-Szmek	95aac01259	shared: add @known syscall list	2020-08-24 20:04:17 +02:00
Aurelien Jarno	f9252236c8	seccomp: add support for riscv64 This patch adds seccomp support to the riscv64 architecture. seccomp support is available in the riscv64 kernel since version 5.5, and it has just been added to the libseccomp library. riscv64 uses generic syscalls like aarch64, so I used that architecture as a reference to find which code has to be modified. With this patch, the testsuite passes successfully, including the test-seccomp test. The system boots and works fine with kernel 5.4 (i.e. without seccomp support) and kernel 5.5 (i.e. with seccomp support). I have also verified that the "SystemCallFilter=~socket" option prevents a service from using the ping utility when running on kernel 5.5.	2020-08-21 10:10:29 +02:00
Zbigniew Jędrzejewski-Szmek	604b163a31	test-seccomp: minor simplification	2020-08-05 10:49:46 +02:00
Lennart Poettering	6b000af4f2	tree-wide: avoid some loaded terms https://tools.ietf.org/html/draft-knodel-terminology-02 https://lwn.net/Articles/823224/ This gets rid of most but not all occasions of these loaded terms: 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594) 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI. 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first. Regarding the replacements: when whitelist/blacklist is used as a noun we replace it with allow list/deny list, and when used as a verb with allow-list/deny-list.	2020-06-25 09:00:19 +02:00
Topi Miettinen	3c14dc61f7	tests: various small fixes for strict systems Don't assume that 4MB can be allocated from stack since there could be smaller DefaultLimitSTACK= in force, so let's use malloc(). NUL terminate the huge strings by hand, also ensure termination in test_lz4_decompress_partial() and optimize the memset() for the string. Some items in /proc and /etc may not be accessible to poor unprivileged users due to e.g. SELinux, BOFH or both, so check for EACCES and EPERM. /var/tmp may be a symlink to /tmp and then path_compare() will always fail, so let's stick to /tmp like elsewhere. /tmp may be mounted with noexec option and then trying to execute scripts from there would fail. Detect and warn if seccomp is already in use, which could make seccomp test fail if the syscalls are already blocked. Unset $TMPDIR so it will not break specifier tests where %T is assumed to be /tmp and %V /var/tmp.	2020-04-26 20:18:48 +02:00
Yu Watanabe	dd0395b565	make namespace_flags_to_string() not return empty string This improves the following debug log. Before: systemd[1162]: Restricting namespace to: . After: systemd[1162]: Restricting namespace to: n/a.	2020-03-03 21:17:38 +01:00
Mike Gilbert	fb4b0465ab	seccomp: real syscall numbers are >= 0 Real syscall numbers start at 0. The fake seccomp values seem to be strictly less than 0. Fixes: `4df8fe8415`	2019-12-09 11:29:06 +01:00
Christian Ehrhardt	49219b5c2a	seccomp: mmap test results depend on kernel/libseccomp/glibc As with shmat already, the actual results of the test test_memory_deny_write_execute_mmap depend on the kernel/libseccomp/glibc of the platform it is running on. There are known-good platforms, but on the others do not assert success (which implies the test has actually failed as no seccomp blocking was achieved), but instead make the check dependent on the success of the mmap call on those platforms. Finally, the munmap on that valid pointer should return 0, so that is what the check should be for in case of p != MAP_FAILED. Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>	2019-12-05 07:19:12 +01:00
Lennart Poettering	8af381679d	Merge pull request #13940 from keur/protect_kernel_logs Add ProtectKernelLogs to systemd.exec	2019-11-15 16:26:10 +01:00
Lennart Poettering	4df8fe8415	seccomp: more comprehensive protection against libseccomp's __NR_xyz namespace invasion A follow-up for `59b657296a`, adding the same conditioning for all cases of our __NR_xyz use. Fixes: #14031	2019-11-15 08:13:36 +01:00
Kevin Kuehler	97d05f3b70	test/test-seccomp: add test_protect_syslog	2019-11-14 13:31:03 -08:00
Yu Watanabe	df26692947	tree-wide: drop sched.h when missing_sched.h is included	2019-11-04 00:30:32 +09:00
Yu Watanabe	f5947a5e92	tree-wide: drop missing.h	2019-10-31 17:57:03 +09:00
Lennart Poettering	7bbc229cf7	test: use the new action in our tests This way, we know that it works as intended.	2019-05-24 10:48:28 +02:00
Zbigniew Jędrzejewski-Szmek	dff6c6295b	test-seccomp: fix compilation on arm64 It has no open().	2019-04-03 13:24:43 +02:00
Lennart Poettering	167fc10cb3	test: add test case for restrict_suid_sgid()	2019-04-02 16:56:48 +02:00
Zbigniew Jędrzejewski-Szmek	67fb5f338f	seccomp: allow shmat to be a separate syscall on architectures which use a multiplexer After https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817, those syscalls have their separate numbers and we can block them. But glibc might still use the old ones. So let's just do a best-effort block and not assume anything about how effective it is.	2019-03-15 15:46:41 +01:00
Zbigniew Jędrzejewski-Szmek	e55bdf9b6c	seccomp: shm{get,at,dt} now have their own numbers everywhere E.g. on i686: (previously) arch x86: SCMP_SYS(mmap) = 90 arch x86: SCMP_SYS(mmap2) = 192 arch x86: SCMP_SYS(shmat) = -221 arch x86: SCMP_SYS(shmat) = -221 arch x86: SCMP_SYS(shmdt) = -222 (now) arch x86: SCMP_SYS(mmap) = 90 arch x86: SCMP_SYS(mmap2) = 192 arch x86: SCMP_SYS(shmat) = 397 arch x86: SCMP_SYS(shmat) = 397 arch x86: SCMP_SYS(shmdt) = 398 The relevant commit seems to be https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817.	2019-03-15 15:28:43 +01:00
Lennart Poettering	d8b4d14df4	util: split out nulstr related stuff to nulstr-util.[ch]	2019-03-14 13:25:52 +01:00
Lennart Poettering	0a9707187b	util: split out memcmp()/memset() related calls into memory-util.[ch] Just some source rearranging.	2019-03-13 12:16:43 +01:00
Lennart Poettering	5f00dc4df6	test: skip various tests if namespacing is not available Apparently on Debian LXC/AppArmor doesn't allow namespacing to container payloads. Deal with it. Fixes: #9700	2018-10-24 19:40:24 +02:00
Zbigniew Jędrzejewski-Szmek	b54f36c604	seccomp: reduce logging about failure to add syscall to seccomp Our logs are full of: Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain ... This is pointless and makes debug logs hard to read. Let's keep the logs in test code, but disable them in nspawn and pid1. This is done through a function parameter because those functions operate recursively and it's not possible to make the caller log meaningfully. There should be no functional change, except the skipped debug logs.	2018-09-24 17:21:09 +02:00
Zbigniew Jędrzejewski-Szmek	f09da7ccbc	test-seccomp: log function names Various tests produce similar output, and the function names make it easier to see where the output is generated.	2018-09-24 17:21:09 +02:00
Zbigniew Jędrzejewski-Szmek	23e12f8e6c	test-seccomp: move two similar tests closer	2018-09-24 17:19:11 +02:00
Yu Watanabe	cd90ec7544	test-seccomp: add log messages when skipping tests	2018-09-21 00:32:44 +09:00
Zbigniew Jędrzejewski-Szmek	6d7c403324	tests: use a helper function to parse environment and open logging The advantages are that we save a few lines, and that we can override logging using environment variables in more test executables.	2018-09-14 09:29:57 +02:00
Lennart Poettering	705268414f	seccomp: add new system call filter, suitable as default whitelist for system services Currently we employ mostly system call blacklisting for our system services. Let's add a new system call filter group @system-service that helps turning this around into a whitelist by default. The new group is very similar to nspawn's default filter list, but in some ways more restricted (as sethostname() and suchlike shouldn't be available to most system services just like that) and in others more relaxed (for example @keyring is blocked in nspawn since it's not properly virtualized yet in the kernel, but is fine for regular system services).	2018-06-14 17:44:20 +02:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. Hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Yu Watanabe	86c2a9f1c2	nsflags: drop namespace_flag_{from,to}_string() This also drops namespace_flag_to_string_many_with_check(), and renames namespace_flag_{from,to}_string_many() to namespace_flags_{from,to}_string().	2018-05-05 11:07:37 +09:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Lennart Poettering	7d4904fe7a	process-util: rework wait_for_terminate_and_warn() to take a flags parameter This renames wait_for_terminate_and_warn() to wait_for_terminate_and_check(), and adds a flags parameter, that controls how much to log: there's one flag that means we log about abnormal stuff, and another one that controls whether we log about non-zero exit codes. Finally, there's a shortcut flag value for logging in both cases, as that's what we usually use. All callers are accordingly updated. At three occasions duplicate logging is removed, i.e. where the old function was called but logged in the caller, too.	2018-01-04 13:27:27 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Yu Watanabe	b4891260b9	test: add tests for syscall:errno style in SystemCallFilter=	2017-11-11 21:54:20 +09:00
Matija Skala	d7e454ba9c	fix includes sys/wait.h is needed for WEXITED macro poll.h is more portable than sys/poll.h	2017-10-30 10:32:45 +01:00
Lennart Poettering	25e94f8c75	tests: let's make sure the seccomp filter lists remain properly ordered It's too easy to corrupt the order, hence let's check for the right order automatically as part of testing.	2017-09-14 15:45:21 +02:00
Lennart Poettering	21022b9dde	util-lib: wrap personality() to fix up broken glibc error handling (#6766) glibc appears to propagate different errors in different ways, let's fix this up, so that our own code doesn't get confused by this. See #6752 + #6737 for details. Fixes: #6755	2017-09-08 17:16:29 +03:00
Evgeny Vereshchagin	48fa42d4ef	tests: check the return value of personality when errno is not set (#6752) The `personality` wrapper might not set errno, so in that case the return value should be checked instead. For details, see https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=e0043e17dfc52fe1702746543127cb4a87232bcd. Closes #6737.	2017-09-06 06:08:04 +02:00
Lennart Poettering	72eafe7159	seccomp: rework seccomp_lock_personality() to apply filter to all archs	2017-08-29 15:58:13 +02:00
Lennart Poettering	e8132d63fe	seccomp: default to something resembling the current personality when locking it Let's lock the personality to the currently set one, if nothing is specifically specified. But do so with a grain of salt, and never default to any exotic personality here, but only PER_LINUX or PER_LINUX32.	2017-08-29 15:56:57 +02:00
Topi Miettinen	78e864e5b3	seccomp: LockPersonality boolean (#6193) Add LockPersonality boolean to allow locking down the personality(2) system call so that the execution domain can't be changed. This may be useful to improve security because odd emulations may be poorly tested and a source of vulnerabilities, while system services shouldn't need any weird personalities.	2017-08-29 15:54:50 +02:00
Zbigniew Jędrzejewski-Szmek	f60a865a49	test-seccomp: arm64 does not have access() and poll() glibc uses faccessat and ppoll, so just add filters for those. (cherry picked from commit abc0213839fef92e2e2b98a434914f22ece48490)	2017-07-15 17:18:22 -04:00
Zbigniew Jędrzejewski-Szmek	2e64e8f46d	seccomp: arm64/x32 do not have _sysctl So don't even try to add the filter to reduce noise. The test is updated to skip calling _sysctl because the kernel prints an oops-like message that is confusing and unhelpful: Jul 15 21:07:01 rpi3 kernel: test-seccomp[8448]: syscall -10080 Jul 15 21:07:01 rpi3 kernel: Code: aa0503e4 aa0603e5 aa0703e6 d4000001 (b13ffc1f) Jul 15 21:07:01 rpi3 kernel: CPU: 3 PID: 8448 Comm: test-seccomp Tainted: G W 4.11.8-300.fc26.aarch64 #1 Jul 15 21:07:01 rpi3 kernel: Hardware name: raspberrypi rpi/rpi, BIOS 2017.05 06/24/2017 Jul 15 21:07:01 rpi3 kernel: task: ffff80002bb0bb00 task.stack: ffff800036354000 Jul 15 21:07:01 rpi3 kernel: PC is at 0xffff8669c7c4 Jul 15 21:07:01 rpi3 kernel: LR is at 0xaaaac64b6750 Jul 15 21:07:01 rpi3 kernel: pc : [<0000ffff8669c7c4>] lr : [<0000aaaac64b6750>] pstate: 60000000 Jul 15 21:07:01 rpi3 kernel: sp : 0000ffffdc640fd0 Jul 15 21:07:01 rpi3 kernel: x29: 0000ffffdc640fd0 x28: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x27: 0000000000000000 x26: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x25: 0000000000000000 x24: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x23: 0000000000000000 x22: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x21: 0000aaaac64b4940 x20: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x19: 0000aaaac64b88f8 x18: 0000000000000020 Jul 15 21:07:01 rpi3 kernel: x17: 0000ffff8669c7a0 x16: 0000aaaac64d2ee0 Jul 15 21:07:01 rpi3 kernel: x15: 0000000000000000 x14: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x13: 203a657275746365 x12: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x11: 0000ffffdc640418 x10: 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x9 : 0000000000000005 x8 : 00000000ffffd8a0 Jul 15 21:07:01 rpi3 kernel: x7 : 7f7f7f7f7f7f7f7f x6 : 7f7f7f7f7f7f7f7f Jul 15 21:07:01 rpi3 kernel: x5 : 65736d68716f7277 x4 : 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x3 : 0000000000000008 x2 : 0000000000000000 Jul 15 21:07:01 rpi3 kernel: x1 : 0000000000000000 x0 : 0000000000000000 Jul 15 21:07:01 rpi3 kernel: (cherry picked from commit 1e20e640132c700c23494bb9e2619afb83878380)	2017-07-15 17:18:22 -04:00
Zbigniew Jędrzejewski-Szmek	da1921a5c3	seccomp: enable RestrictAddressFamilies on ppc64, autodetect SECCOMP_RESTRICT_ADDRESS_FAMILIES_BROKEN We expect that if the socket() syscall is available, seccomp works for that architecture. So instead of explicitly listing all architectures where we know it is not available, just assume it is broken if the number is not defined. This should have the same effect, except that other architectures where it is also broken will pass tests without further changes. (Architectures where the filter should work, but does not work because of missing entries in seccomp-util.c, will still fail.) i386, s390, s390x are the exception: setting the filter fails, even though socket() is available, so it needs to be special-cased (https://github.com/systemd/systemd/issues/5215#issuecomment-277241488). This removes the last define in seccomp-util.h that was only used in test-seccomp.c. Porting the seccomp filter to new architectures should be simpler because now only two places need to be modified. RestrictAddressFamilies seems to work on ppc64[bl]e, so enable it (the tests pass).	2017-05-10 09:21:16 -04:00
Zbigniew Jędrzejewski-Szmek	511ceb1f8d	seccomp: assume clone() arg order is known on all architectures While adding the defines for arm, I realized that we have pretty much all known architectures covered, so SECCOMP_RESTRICT_NAMESPACES_BROKEN is not necessary anymore. clone(2) is adamant that the order of the first two arguments is only reversed on s390/s390x. So let's simplify things and remove the #if.	2017-05-07 20:01:04 -04:00
Zbigniew Jędrzejewski-Szmek	4278d1f531	seccomp: add mmap/shmat defines for arm and arm64	2017-05-07 20:01:04 -04:00
Zbigniew Jędrzejewski-Szmek	2a8d6e6395	seccomp: add mmap/shmat defines for ppc64	2017-05-07 20:01:04 -04:00
Zbigniew Jędrzejewski-Szmek	2a65bd94e4	seccomp: drop SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN, add test for shmat SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN was conflating two separate things: 1. whether shmat/shmdt/shmget can be filtered (if ipc multiplexer is used, they can not) 2. whether we know this for the current architecture For i386, shmat is implemented as ipc, so seccomp filter is "broken" for shmat, but not for mmap, and SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN cannot be used to cover both cases. The define was only used for tests — not in the implementation in seccomp-util.c. So let's get rid of SECCOMP_MEMORY_DENY_WRITE_EXECUTE_BROKEN and encode the right condition directly in tests.	2017-05-07 18:59:37 -04:00

58 commits