This adds a number of entries nspawn already applies to regular service
namespacing too. Most importantly let's mask /proc/kcore and
/proc/kallsyms too.
We need to do this in all cases, including on cgroupsv1 in order to
ensure the host systemd and any systemd in the payload won't fight for
the cgroup attributes of the top-level cgroup of the payload.
This is because systemd for Delegate=yes units will only delegate the
right to create children as well as their attributes. However, nspawn
expects that the cgroup delegated covers both the right to create
children and the attributes of the cgroup itself. Hence, to clear this
up, let's unconditionally insert a intermediary cgroup, on cgroupsv1 as
well as cgroupsv2, unconditionally.
This is also nice as it reduces the differences in the various setups
and exposes very close behaviour everywhere.
Let's not make /run too special and let's make sure the source file is
not guessable: let's use our regular temporary file helper calls to
create the source node.
This tightens security on /proc: a couple of files exposed there are now
made inaccessible. These files might potentially leak kernel internals
or expose non-virtualized concepts, hence lock them down by default.
Moreover, a couple of dirs in /proc that expose stuff also exposed in
/sys are now marked read-only, similar to how we handle /sys.
The list is taken from what docker/runc based container managers
generally apply, but slightly extended.
if we lack privs to create device nodes that's fine, and creating
/run/systemd/inaccessible/chr or /run/systemd/inaccessible/blk won't
work then. Document this in longer comments.
Fixes: #4484
This is based on @jsynacek's patch from #8837, but adds the new URL in
two flavours instead of replacing the old, also making @keszybz happy.
Replaces: #8837
Before this, `signal_from_string()` accepts simple signal name
or RTMIN+n. This makes the function also accept RTMIN, RTMAX,
and RTMAX-n.
Note that RTMIN+0 is equivalent to RTMIN, and RTMAX-0 is to RTMAX.
This also fixes the integer overflow reported by oss-fuzz #8064.
https://oss-fuzz.com/v2/testcase-detail/5648573352902656
The leak can be reproduced by running systemd-path --suffix .tmp under valgrind or asan:
$ ./build/systemd-path --suffix .tmp search-binaries
/usr/local/bin/.tmp:/usr/bin/.tmp:/usr/local/sbin/.tmp:/usr/sbin/.tmp:/home/vagrant/.local/bin/.tmp:/home/vagrant/bin/.tmp
=================================================================
==19177==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 56 byte(s) in 1 object(s) allocated from:
*0 0x7fd6adf72850 in malloc (/lib64/libasan.so.4+0xde850)
*1 0x7fd6ad2b93d2 in malloc_multiply ../src/basic/alloc-util.h:69
*2 0x7fd6ad2bafd2 in strv_split ../src/basic/strv.c:269
*3 0x7fd6ad42ba67 in search_from_environment ../src/libsystemd/sd-path/sd-path.c:409
*4 0x7fd6ad42bffe in get_search ../src/libsystemd/sd-path/sd-path.c:482
*5 0x7fd6ad42c55b in sd_path_search ../src/libsystemd/sd-path/sd-path.c:607
*6 0x7fd6ad42b3a2 in sd_path_home ../src/libsystemd/sd-path/sd-path.c:348
*7 0x55f59c65ebea in print_home ../src/path/path.c:97
*8 0x55f59c65f157 in main ../src/path/path.c:177
*9 0x7fd6abaea009 in __libc_start_main (/lib64/libc.so.6+0x21009)
Indirect leak of 68 byte(s) in 5 object(s) allocated from:
*0 0x7fd6adf72850 in malloc (/lib64/libasan.so.4+0xde850)
*1 0x7fd6abb5f689 in strndup (/lib64/libc.so.6+0x96689)
Indirect leak of 25 byte(s) in 1 object(s) allocated from:
*0 0x7fd6adf72850 in malloc (/lib64/libasan.so.4+0xde850)
*1 0x7fd6abb5f689 in strndup (/lib64/libc.so.6+0x96689)
*2 0x6c2e2f746e617266 (<unknown module>)
SUMMARY: AddressSanitizer: 149 byte(s) leaked in 7 allocation(s).
Currently, "reboot" behaves differently in setups with and without logind.
If logind is used (which is probably the more common case) the operation
is asynchronous, we should behave in the same way as "systemctl <verb>".
Let's clean this up, and always expose the same behaviour, regardless if
logind is used or not: let's always make it asynchronous.
See: #6479
Fixes: commit 130246d2e8
Let's simplify the code a bit. Let's reduce the number of redundant if
checks a bit, (i.e. if we want to check for equality with
VIRTUALIZATION_VM_OTHER there's no need to check for non-equality with
VIRTUALIZATION_NONE first). As a very welcome side-effect this means we
lose some lines of code and our level of indentation is reduced.
No changes in behaviour.
Scope units are populated from PIDs specified by the bus client. We do
that when a scope is started. We really shouldn't allow scopes to be
started multiple times, as the PIDs then might be heavily out of date.
Moreover, clients should have the guarantee that any scope they allocate
has a clear runtime cycle which is not repetitive.
Let's properly terminate on SIGTERM or SIGINT. Previously we'd just rely
on the implicit process clean-up logic on UNIX. By shutting down
properly on SIGTERM/SIGINT we make it easier to track down memory leaks
by employing valgrind.
Let's propagate errors correctly, and stick to the usual naming and
behaviour of these functions. Or in other words, make this closer to the
matching code in machined.
This extends the change done in b29f6480ec to other logging functions.
This actually fixes some bugs in callers of log_struct(), for example
config_parse_alias() called 'return log_syntax(..., 0, ...)' which could result
in a bogus non-zero return value.
Calls to log_object() and log_format_iovec() — which is only used by
server_driver_message() — appear correct.
Let's optionally translate BSD exit codes to error strings too.
My first approach on adding this was to turn ExitStatusLevel into a
bitmask rather than a linear level, with one bit for the various feature
bits. However, the exit code ranges are generally not defined
independently from each other, i.e. our own ones are defined with the
LSB ones in mind, and most sets are defined with the ISO C ones.
Hence, instead I changed the existing hierarchy of MINIMAL, SYSTEMD, LSB
with an alias of FULL == LSB, only slightly by seperating FULL and LSB
into two separate levels, so that there's now:
1. MINIMAL (only EXIT_SUCCESS/EXIT_FAILURE)
2. SYSTEMD (incorporating our own exit codes)
3. LSB (like SYSTEMD but adding in LSB service exit codes)
4. FULL (like FULL but adding BSD exit codes)
Note that across the codebase only FULL, SYSTEMD, and MINIMAL are used,
depending on context, how much we know about the process and whether we
are logging for debugging purposes or not. This means the LSB level
wouldn't really have to be separate, but it appeared careless to me to
fold it into FULL along with the BSD exit codes.
Note that this commit doesn't change much for regular codepaths: the
FULL exit status level is only used during debug logging, as a helper to
the user reading the debug logs.
Of course, alloca() shouldn't be used with anything that can grow
without bounds anyway, but let's better safe than sorry, and catch this
early.
Since alloca() is not supposed to return an error we trigger an
assert() instead, which is still better than heap trickery.
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()' return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
Rather than choosing to set or unset any of these flag
use kernel defaults. This patch makes following properties to unset.
UseBPDU = unset
HairPin = unset
FastLeave = unset
AllowPortToBeRoot = unset
UnicastFlood = unset
If an interface name is changed, then the link state, especially
managed or not, may need to be updated, as its corresponding
.link or .network files may be different. So, let's once drop
the link and recreate a new link object.
Fixes#8794.