This also changes that .netdev files are loaded in ascending order.
Otherwise, when NetDev.ifname= setting conflicts with other .netdev file,
then .netdev file with large prefix number wins.
We nowadays rename our child processes, hence argv[] will be clobbered,
let's hence copy the device path to dynamic memory before forking.
This is fall-out from 60ffa37a65 since we
now a lot more often end up overriding the argv[] buffer than before,
simple because we know what to override.
These kind of bugs kinda suck. THere are only two options here: stop
overriding argv[] for all cases (or just these cases) or explicitly
copying out everything we need in child processes before forking. With
this patch I opt for the latter, though I am not 100% convinced this is
a great solution. Just a better solution than everything else, i.e.
allowing argv[] to remain out of sync with what others see.
Fixes: #12135
Apparently by the C standard "int" bitfields can have any signedness
(unlike non-bitfield declarations which are "signed" if the signedness
is not specified).
Let's fix the LGTM warning about this hence and be explicit that we mean
"signed" here.
Some chattrs only work sensible if you set them right after opening a
file for create (think: FS_NOCOW_FL). Others only work when they are
applied when the file is fully written (think: FS_IMMUTABLE_FL). Let's
take that into account when copying files and applying a chattr to them.
With new LUKS2 header format it is possible to use Argon2 key derivation
function. This function is "memory-hard" hence keyslot unlocking can
potentially use a lot of RAM as this increases resistance to massively
parallel GPU based password cracking.
However, when multiple systemd-cryptsetup binaries run at the same
time it is very likely that system using Argon2 (e.g. Fedora 30)
will encounter memory-pressure during early boot, following OOM killing
spree.
This patch aims to lower the damage done by OOM killer and sets OOMScore
for systemd-cryptsetup units to 500. Hopefully OOM killer will then
shoot us down and leave rest of the system services alone.
We are about to add system calls (rseq()) not available on old
libseccomp/old kernels, and hence we need to be permissive when parsing
our definitions.
I couldn't figure out what is going on here, because LTO inlines everything and
then the backtrace reported a different spot. But when compiled with NDEBUG but
no LTO, it's fairly obvious ;)
C.f. #12008.
When compiled with -DNDEBUG, we get warnings about set-but-unused variables.
In general, it's not something we care about, but since removing those
variables arguably makes the code nicer, let's just to it in this case.
Let's be helpful to static analyzers which care about whether we
knowingly ignore return values. We do in these cases, since they are
usually part of error paths.
Let's unify the two similar code paths to watch /run/systemd/journal.
The code in manager.c is similar, but it uses mkdir_p_label(), and unifying
that would be too much trouble, so let's just adjust the error messages to
be the same.
CID #1400224.
Coverity is unhappy because we use "line" in the assert that checks
the return value. It doesn't matter much, but let's clean this up.
Also, let's not assume that /proc/cmdline contains anything.
CID #1400219.
When running in Fedora "mock", / is a tmpfs and /home is not mounted. The test
assumes that /home will be a tmpfs only and only if we can unshare. Obviously,
this does not hold in this case, because unsharing is not possible, but /home
is still a tmpfs. Let's just skip the test, since it's fully legitimate to
mount either or both of / and /home as tmpfs.
test_exec_ambientcapabilities: exec-ambientcapabilities-nobody.service: exit status 0, expected 1
Sometimes we get just the last line, for example from the failure summary,
so make it as useful as possible.
This adds some extra paranoia: when we recursively chown a directory for
use with DynamicUser=1 services we'll now drop suid/sgid from all files
we chown().
Of course, such files should not exist in the first place, and noone
should get access to those dirs who isn't root anyway, but let's better
be safe than sorry, and drop everything we come across.
Add even more suid/sgid protection to DynamicUser= envionments: the
state directories we bind mount from the host will now have the nosuid
flag set, to disable the effect of nosuid on them.
The host mounts it like that, nspawn hence should do too.
Moreover, mount the file system after doing CLONEW_NEWIPC so that it
actually reflects the right mqueues. Finally, mount it wthout
considering it fatal, since POSIX mqueue support is little used and it
should be fine not to support it in the kernel.
The function is otherwise generic enough to toggle other bind mount
flags beyond MS_RDONLY (for example: MS_NOSUID or MS_NODEV), hence let's
beef it up slightly to support that too.
Humans are susceptible to making orthographic errors sometimes. A
misspelled "%systemd_post caek.service" would go unnoticed due to all
output from systemctl being discarded if and when %post runs.
To alleviate this, cease hiding outputs. Then, to account for the
potential absence of systemd from the system, add file checks so as
to not generate a "command not found" error.
This adds support for user to set & get reboot parameter for reboot.
As callee would be next issuing Reboot call same policy checks are being used.
If unit file issuing the reboot action defines RebootArgument (or similar) that
setting takes precedence.
Previously both run() and run_container() would free 'fds'. Let's fix
that, and let run() free it but make run_container() already remove all
fds from it, because that's what we actually want to do.
Fixes: #12073
Commit d85515edcf changed logic how reboot is
executed. That commit changed behavior to use emergency action reboot code path
to perform the reboot.
This inadvertently broke rebooting with argument:
$ systemctl reboot custom-reason
Restore original behavior so that if reboot service unit similar to
systemd-reboot.service is executed it is possible to override reboot reason
with "systemctl reboot ARG".
When "systemctl reboot ARG" is executed ARG is placed in file
/run/systemd/reboot-param and reboot is issued using logind's Reboot
dbus-service.
If RebootArgument is specified in systemd-reboot.service it takes precedence
over what systemctl sets.
Fixes: #11828
Invoking %systemd_tmpfiles (in %post) without any arguments, while
possible, will cause systemd-tmpfiles to process the entire system
configuration, rather than just the newly installed configuration
files. In https://github.com/systemd/systemd/pull/12048, it was
established that processing everything constitutes unusual practice,
and should be flagged as a mistake at build time.
Furthermore, invoking %systemd_post without any arguments will cause
the underlying `systemctl preset` to outright return an error ("Too
few arguments") when run. This can be flagged during build time in
the same manner.
As I have found no ways to successfully nest %if clauses inside a
macro[1], I am helping myself by reusing the recursive variable
expansion technique pioneered in [2].
Now, when %systemd_post or %systemd_tmpfiles is incorrectly used,
rpm gives accurate line number reporting, too:
error: This macro requires some arguments
error: line 11: %{systemd_post}
error: This macro requires two arguments
error: line 13: %{tmpfiles_create_package meh more more}
[1] what has been tried: %{expand:%%if "%#" == 0 \\\
%%{error:you have given me %# args} \\\
%%endif}
[2] http://git.savannah.gnu.org/cgit/automake.git/commit/?id=e0bd4af16da88e4c2c61bde42675660eff7dff51
As general principle, we generally check command line args first, the
enviroment second, and external configuration and system state only later.
In case of the invocation ID, checking the keyring before the environment
was implemented as a poor-man's security measure. But this is not really
useful, since we're moving within the same security boundary. So let's just
do the expected thing, and check environment first.
Prompted by https://github.com/systemd/systemd/pull/11991#issuecomment-474647652.
This makes two changes:
1. Instead of resetting the configured service TTY each time after a
process exited, let's do so only when the service goes back to "dead"
state. This should be preferable in case the started processes leave
background child processes around that still reference the TTY.
2. chmod() and chown() the TTY at the same time. This should make it
safe to run "systemd-run -p DynamicUser=1 -p StandardInput=tty -p
TTYPath=/dev/tty8 /bin/bash" without leaving a TTY owned by a dynamic
user around.
Since switching to extract_first_word with no flags for parsing
unit names in 4c9565eea5, escape
characters will be stripped from escaped unit names such as
"mnt-persistent\x2dvolume.mount" resulting in the unit not being
configured as defined. Preserve escape characters again for
compatibility with existing preset definitions.
On arm64 with gcc-8.2.1-5.fc29.aarch64:
../src/test/test-fileio.c:645:29: warning: comparison is always false due to limited range of data type [-Wtype-limits]
assert_se(c == EOF || safe_fgetc(f, &c) == 1);
^~
Casting c to int is not enough, gcc is able to figure out that the original
type was unsigned and still warns. So let's just silence the warning like
in test-sizeof.c.
Fixes#12036.
../../../src/systemd/src/libsystemd/sd-bus/bus-objects.c: In function ‘add_object_vtable_internal’:
../../../src/systemd/src/basic/macro.h:423:19: error: duplicate case value
The thread launched in journal_file_set_offline() accesses a memory
mapped file, so it needs to handle SIGBUS. Leave SIGBUS unblocked on the
offlining thread so that it uses the same handler as the main thread.
The result of triggering SIGBUS in a thread where it's blocked is
undefined in Linux. The tested implementations were observed to cause
the default handler to run, taking down the whole journald process.
We can leave SIGBUS unblocked in multiple threads since it's handler is
thread-safe. If SIGBUS is sent to the journald process asynchronously
(i.e. with kill, sigqueue, or raise), either thread handling it will
result in the same behavior: it will install the default handler and
reraise the signal, killing the process.
Fixes: #12042
If we know that main pid is our child then it's unnecessary to watch all
other processes of a unit since in this case we will get SIGCHLD when the main
process will exit and will act upon accordingly.
So let's watch all processes only if the main process is not our child since in
this case we need to detect when the cgroup will become empty in order to
figure out when the service becomes dead. This is only needed by cgroupv1.
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.
However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.
If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
It's a glibc-specific API, but supported on FreeBSD and musl too at
least, hence fairly common. This way we can reduce our calls to
realloc() as much as possible.
The loop around ttyname_r() makes it look like we use unbounded stack
allocations. We know that that paths have a maximum size, so let's simplify
the whole thing.
Replaces #12043.
It's probably unexpected if we do a recursive chown() when dynamic users
are used but not on static users.
hence, let's tweak the logic slightly, and recursively chown in both
cases, except when operating on the configuration directory.
Fixes: #11842
Let's reduce the chance of failure: if we can't apply the chmod/chown
requested, check if it's applied anyway, and if so, supress the error.
This is even race-free since we operate on an O_PATH fd anyway.