Split out of #15457, let's see if this is the culprit of the CI failure.
(also setting green label here, since @keszybz already greenlit it in that other PR)
We unregister binfmt_misc twice during shutdown with this change:
1. A previous commit added support for doing that in the final shutdown
phase, i.e. when we do the aggressive umount loop. This is the robust
thing to do, in case the earlier ("clean") shutdown phase didn't work
for some reason.
2. This commit adds support for doing that when systemd-binfmt.service
is stopped. This is a good idea so that people can order mounts
before the service if they want to register binaries from such
mounts, as in that case we'll undo the registration on shutdown
again, before unmounting those mounts.
And all that, just because of that weird "F" flag the kernel introduced
that can pin files...
Fixes: #14981
Let's just copy out the bit of the string we need, and let's make sure
we refuse rules called "status" and "register", since those are special
files in binfmt_misc's file system.
Apparently if the new "F" flag is used they might pin files, which
blocks us from unmounting things. Let's hence clear this up explicitly.
Before entering our umount loop.
Fixes: #14981
let's return ENOSYS in that case, to make things a bit less confusng.
Previously we'd just propagate ENOENT, which people might mistake as
applying to the object being modified rather than /proc/ just not being
there.
Let's return ENOSYS instead, i.e. an error clearly indicating that some
kernel API is not available. This hopefully should put people on a
better track.
Note that we only do the procfs check in the error path, which hopefully
means it's the less likely path.
We probably can add similar bits to more suitable codepaths dealing with
/proc/self/fd, but for now, let's pick to the ones noticed in #14745.
Fixes: #14745
Otherwise we'd not read the services input while waiting for the job to
wait, and there's no point in waiting for the job anyway if we wait for
the unit to stop ultimately.
Fixes: #15395
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
ubsan complains that we add an offset to a NULL ptr here in some cases.
Which isn't really a bug though, since we only use it as the end
condition for a for loop, but we can still fix it...
Fixes: #15522
Let's add flavours for copying stub/uplink resolv.conf versions.
Let's add a more brutal "replace" mode, where we'll replace any existing
destination file.
Let's also change what "auto" means: instead of copying the static file,
let's use the stub file, so that DNS search info is copied over.
Fixes: #15340
When a job is skipped due its dependencies not being ready, log
a debug message saying what is holding it back.
This was very useful with transient units timing out to figure
out where the problem was.
Anyone previously using the UseRoutes=false parameter expected their
dhcp4-provided gateway route to be ignored, as well. However, with
the introduction of the UseGateway= parameter, this is no longer true.
In order to keep backwards compatibility, this sets the UseGateway=
default value to whatever UseRoutes= has been set to.
Just as log_full already does, check if the log level would result in
logging immediately in the macro in order to avoid doing
unnecessary work that adds up in hot spots.
Let's define a new, generic bus interface that any daemon can implement
for querying/setting the log level.
We can turn this into something more powerful later on, but for now,
only expose three properties: the log level, log target and the syslog
identifier (with the former two being writable).
This is supposed to be generic, so that it can be implemented by 3rd
party daemons too, eventually.
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.
Fixes: #14972
When the stub listener is disabled, stub-resolv.conf is useless. Instead of
warning about this, let's just make stub-resolv.conf point to the private
resolv.conf file. (The original bug report asked for "mirroring", but I think
a symlink is nicer than a copy because it is easier to see that a redirection
was made.)
Fixes#14700.
All callers ignore the return value.
This is almost entirely theoretical, since writing to /run is unlikely to
fail..., but the user is almost certainly better off keeping the old copy
around and having working dns resolution with an out-of-date dns server list
than having having a dangling /etc/resolv.conf symlink.
An error with a full path is immediately clear. OTOH, a user might not be
familiar with concenpt like "private resolv.conf".
I opted to use %s-formatting for the path, because the code is much easier to
read this way. Any difference in t speed of execution is not important.
This is useful to raise the log level for a single transaction or a few,
without affecting other state of the resolved as a restart would.
The log level can only be set, I didn't bother with having the ability
to restore the original as in pid1.
Hiding the first column, which may contain bullet circles, with --no-legend
is undocumented and potentially unexpected. On the other hand, not printing
bullet circles with --plain is documented so hiding the column with that
switch is sensible.
The combination "--full --no-legend --no-pager --plain" is appropriate for
automated processing of systemctl output.
ebf963c551 changed the 'sep' argument to always
be either " " or "\n", which broke the indentation logic for the first line
in base64_append_width(). Since it now always is one character, and never NULL,
let's change the type to char and simplify the logic a bit.
$ COLUMNS=30 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1
AwEAAcLPVEcg0hFBheXQf
QOqqLiRgckk69o2KTAsq3
lNRY0c9mnEjzZDGsGmXNy
2EQ6yelkIYYus7KLor2Fz
x59hEqcM82zqkdHV6hXvZ
yjxxSHG3nl8xQS6gF8mdI
YouDTWWhTInfjSKoIeDok
Hq3S67EjSngV7/wVCMTbI
amS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
$ COLUMNS=120 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1 AwEAAcLPVEcg0hFBheXQfQOqqLiRgckk69o2KTAsq3lNRY0c9mnEjzZDGsGmXNy2EQ6yelkIYYus7KLor
2Fzx59hEqcM82zqkdHV6hXvZyjxxSHG3nl8xQS6gF8mdIYouDTWWhTInfjSKoIeDokHq3S67EjSngV7/w
VCMTbIamS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
Let's make it optional whether auditing is enabled at journald start-up
or not.
Note that this only controls whether audit is enabled/disabled in the
kernel. Either way we'll still collect the audit data if it is
generated, i.e. if some other tool enables it, we'll collect it.
Fixes: #959
There are legitimate reasons to access the file directly, as currently
discussed on fedora-devel. Hence tone things down from "must" to "should
typically not".
Also, let's use fputs() instead of fputs_unlocked() here,
fopen_temporary_label() turns off stdio locking anyway for the whole
FILE*, hence no need to do this manually each time.
Builds with recent glibc would fail with:
../src/network/netdev/fou-tunnel.c: In function ‘config_parse_ip_protocol’:
../src/basic/macro.h:380:9: error: static assertion failed: "IPPROTO_MAX-1 <= UINT8_MAX"
380 | static_assert(expr, #expr)
| ^~~~~~~~~~~~~
../src/network/netdev/fou-tunnel.c:161:9: note: in expansion of macro ‘assert_cc’
161 | assert_cc(IPPROTO_MAX-1 <= UINT8_MAX);
| ^~~~~~~~~
This is because f9ac84f92f151e07586c55e14ed628d493a5929d (present in
glibc-2.31.9000-9.fc33.x86_64) added IPPROTO_MPTCP=262, following
v5.5-rc5-1002-gfaf391c382 in the kernel.
The watchdog ping is performed for every iteration of manager event
loop. This results in a lot of ioctls on watchdog device driver
especially during boot or if services are aggressively using sd_notify.
Depending on the watchdog device driver this may have performance
impact on embedded systems.
The patch skips sending the watchdog to device driver if the ping is
requested before half of the watchdog timeout.
proot provides userspace-powered emulation of chroot and mount --bind,
lending it to be used on environments without unprivileged user
namespaces, or in otherwise restricted environments like Android.
In order to achieve this, proot makes use of the kernel's ptrace()
facility, which we can use in order to detect its presence. Since it
doesn't use any kind of namespacing, including PID namespacing, we don't
need to do any tricks when trying to get the tracer's metadata.
For our purposes, proot is listed as a "container", since we mostly use
this also as the bucket for non-container-but-container-like
technologies like WSL. As such, it seems like a good fit for this
section as well.
It's pretty, and it highlights that the pw prompt is kinda special and
needs user input.
We suppress the emoji entirel if there's no emoji support (i.e. this
means we suppress the ASCII replacement), since it carries no additional
information, it is just decoration to highlight a line.
We provide a way via the '-' symbol to ignore errors when nonexistent
executable files are passed to Exec* parameters & so on. In such a case,
the flag `EXEC_COMMAND_IGNORE_FAILURE` is set and we go on happily with
our life if that happens. However, `systemd-analyze verify` complained
about missing executables even in such a case. In such a case it is not
an error for this to happen so check if the flag is set before checking
if the file is accessible and executable.
Add some small tests to check this condition.
Closes#15218.
This commit introduces new meson build option "kernel-install" to prevent kernel-install from building if the user
sets the added option as "false".
Signed-off-by: Jakov Smolic <jakov.smolic@sartura.hr>
Signed-off-by: Luka Perkov <luka.perkov@sartura.hr>
An stdio FILE* stream usually refers to something with a file
descriptor, but that's just "usually". It doesn't have to, when taking
fmemopen() and similar into account. Most of our calls to fileno()
assumed the call couldn't fail. In most cases this was correct, but in
some cases where we didn't know whether we work on files or memory we'd
use the returned fd as if it was unconditionally valid while it wasn't,
and passed it to a multitude of kernel syscalls. Let's fix that, and do
something reasonably smart when encountering this case.
(Running test-fileio with this patch applied will remove tons of ioctl()
calls on -1).
This should hopefully help us avoid c&p mistakes. And there are plans to
add more settings like this, which should then be rather straightforward.
There is a slight functional change: the code got uplink handling wrong
and run manager_find_uplink() repeatedly. That part is fixed.
Let's not trigger MACs needlessly.
Ideally everybody would turn on userdb, but if people insist in not
doing so, then let's not attempt to open shadow.
It's a bit ugly to implement this, since shadow information is more than
just passwords (but accound validity metadata), and thus userdb's own
"privieleged" scheme is orthogonal to this, but let's still do this for
the client side.
Fixes: #15105
This patch changes the way user managers set the default umask for the units it
manages.
Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.
Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.
Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.
Fixes#6077.
When managing a home directory as LUKS image we currently place a
directory at the top that contains the actual home directory (so that
the home directory of the user won't be cluttered by lost-found and
suchlike). On btrfs let's make that a subvol though. This is a good idea
so that possibly later on we can make use of this for automatic history
management.
Fixes: #15121
The commit b3ac5f8cb9 has changed the system mount propagation to
shared by default, and according to the following patch:
https://github.com/opencontainers/runc/pull/208
When starting the container, the pouch daemon will call runc to execute
make-private.
However, if the systemctl daemon-reexec is executed after the container
has been started, the system mount propagation will be changed to share
again by default, and the make-private operation above will have no chance
to execute.
This makes the Environment entries more round-trippable: a similar format is
used for input and output. It is certainly more useful for users, because
showing [unprintable] on anything non-trivial makes systemctl show -p Environment
useless in many cases.
Fixes: #14723 and https://bugzilla.redhat.com/show_bug.cgi?id=1525593.
$ systemctl --user show -p Environment run-*.service
Environment=ASDF=asfd "SPACE= "
Environment=ASDF=asfd "SPACE=\n\n\n"
Environment=ASDF=asfd "TAB=\t\\" "FOO=X X"
non-underlined yellow uses RGB ANSI sequences while the underlined
version uses the paletted ANSI sequences. Let's unify that and use the
RGB sequence for both cases, so that underlined or not doesn't alter the
color.
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.
The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)
The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…
This effectively liberaralizes a lot what we expect from usernames.
The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.
Fixes: #15149#15090
The userdb_by_name() invocation immediately following does the same check
anyway, no need to do this twice.
(Also, make sure we exit the function early on failure)
And similar for other settings that require a writable /var/.
Rationale: if these options are used for early-boot services (such as
systemd-pstore.service) we need /var/ writable. And if /var/ is on the
root fs, then systemd-remount-fs.service is the service that ensures
that /var/ is writable.
This allows us to remove explicit deps in services such as
systemd-pstore.service.
A warning is emitted from sd_bus_message_{get,set}_priority. Those functions
are exposed by pystemd, so we have no easy way of checking if anything is
calling them.
Just making the functions always return without doing anything would be an
option, but then we could leave the caller with an undefined variable. So I
think it's better to make the functions emit a warnings and return priority=0
in the get operation.
Many of the convenience functions from sd-bus operate on verbose sets
of discrete strings for destination/path/interface/member.
For most callers, destination/path/interface are uniform, and just the
member is distinct.
This commit introduces a new struct encapsulating the
destination/path/interface pointers called BusAddress, and wrapper
functions which take a BusAddress* instead of three strings, and just
pass the encapsulated strings on to the sd-bus convenience functions.
Future commits will update call sites to use these helpers throwing
out a bunch of repetitious destination/path/interface strings littered
throughout the codebase, replacing them with some appropriately named
static structs passed by pointer to these new helpers.
From https://github.com/microsoft/WSL/issues/423#issuecomment-221627364:
> it's unlikely we'll change it to something that doesn't contain "Microsoft"
> or "WSL".
... but well, it happened. If they change it incompatibly w/o adding an stable
detection mechanism, I think we should not add yet another detection method.
But adding a different casing of "microsoft" is not a very big step, so let's
do that.
Follow-up for #11932.
On certain distributions such as NixOS the mtime of `/etc/hosts` is
locked to a fixed value. In such cases, only checking the last mtime of
`/etc/hosts` is not enough - we also need to check if the st_ino/st_dev
match up. Thus, let's make sure make sure that systemd-resolved also
rereads `/etc/hosts` if the inode or the device containing `/etc/hosts` changes.
Test script:
```bash
hosts="/etc/hosts"
echo "127.0.0.1 testpr" > "hosts_new"
mv "hosts_new" "$hosts"
resolvectl query testpr || exit 1
mtime="$(stat -c %y "$hosts")"
echo "127.0.0.1 newhost" > "hosts_tmp"
touch -d "$mtime" "hosts_tmp"
install -p "hosts_tmp" "$hosts"
sleep 10
resolvectl query newhost || exit 1
rm -f "hosts_tmp"
```
Closes#14456.
split() and FOREACH_WORD really should die, and everything be moved to
extract_first_word() and friends, but let's at least make sure that for
the remaining code using it we can't deadlock by not progressing in the
word iteration.
Fixes: #15305
Merging by hand because github refuses merging because "Rebasing the commits of
this branch on top of the base branch cannot be performed automatically as this
would create a different result than a regular merge.".
Consumers of the sd-bus convenience API can't make convenience
helpers of their own without va_list variants.
This commit is a mechanical change splitting out the existing function
bodies into bare va_list variants having a 'v' suffixed to the names.
The original functions now simply create the va_list before forwarding
the call on to the va_list variant, and the va_list variants dispense
with those steps.
The function sd_device_get_property_value has some paths where it exits without
touching the n pointer. In those cases, n remained uninitialized until it was
eventually read inside isempty where it caused the segmentation fault.
Fixes#15078
For now, this function is nearly equivalent to the si_uint64 parser, except for
an additional range check as Linux only takes 32-bit values as bitrates. In
future, this may also be used to introduce fancier bitrate config formats.
sd_bus_reply_method_errno already does the same two checks
(sd_bus_error_is_set(error), r < 0) internally. But it did them in opposite
order. The effect is the same, because sd_bus_reply_method_errno falls back to
sd_bus_reply_method_error, but it seems inelegant. So let's simplify
bus_maybe_reply_error() to offload the job fully to sd_bus_reply_method_errno().
No functional change.
Some fdopendir() calls remain where safe_close() is manually
performed, those could be simplified as well by converting to
use the _cleanup_close_ machinery, but makes things less trivial
to review so left for a future cleanup.
With the addition of _cleanup_close_ there's a repetitious
pattern of assigning -1 to the fd after a successful fdopen
to prevent its close on cleanup now that the FILE * owns the
fd.
This introduces a wrapper that instead takes a pointer to the
fd being opened, and always overwrites the fd with -1 on success.
A future commit will cleanup all the fdopen call sites to use the
wrapper and elide the manual -1 fd assignment.
When we are supposed to accept numeric UIDs formatted as string, then
let's check that first, before passing things on to
valid_user_group_name_full(), since that might log about, and not the
other way round.
See: #15201
Follow-up for: 93c23c9297
https://github.com/systemd/systemd/pull/14133 made
capability_ambient_set_apply() acquire capabilities that were explicitly
asked for and drop all others. This change means the function is called
even with an empty capability set, opening up a code path for users
without ambient capabilities to call this function. This function will
error with EINVAL out on kernels < 4.3 because PR_CAP_AMBIENT is not
understood. This turns capability_ambient_set_apply() into a noop for
kernels < 4.3
Fixes https://github.com/systemd/systemd/issues/15225
Functions called from device_setup_unit() already make sure that unit is
enqueued in case it is a new unit or properties exported on the bus have
changed.
This should prevent unnecessary DBus wakeups and associated DBus traffic
when device_setup_unit() was called while reparsing /proc/self/mountinfo
due to the mountinfo notifications. Note that we parse
/proc/self/mountinfo quite often on the busy systems (e.g. k8s container
hosts) but majority of the time mounts didn't change, only some mount
got added. Thus we don't need to generate PropertiesChanged for devices
associated with the mounts that didn't change.
Thanks to Renaud Métrich <rmetrich@redhat.com> for debugging the
problem and providing draft version of the patch.