LOOP_CONFIGURE allows us to configure a loopback device in one ioctl
instead of two, which is not just faster but also removes the race that
udev might start probing the device before we adjusted things properly.
Unfortunately LOOP_CONFIGURE is broken in regards to LO_FLAGS_PARTSCAN
as of kernel 5.8.0. This patch contains a work-around for that, to
fallback to old behaviour if partition scanning is requested but does
not work. Sucks a bit.
Proposed upstream fix for that issue:
https://lkml.org/lkml/2020/8/6/97
Given a string in the format 'one:two three four:five', returns a string
vector with each word. If the second element of the tuple is not
present, an empty string is returned in its place, so that the vector
can be processed in pairs.
[zjs: use EXTRACT_UNESCAPE_SEPARATORS instead of EXTRACT_CUNESCAPE_RELAX.
This way we do escaping exactly once and in normal strict mode.]
We would add and remove definitions based on which colors were used by other
code. Let's just define all of them to simplify tests and allow easy comparisons
which colors look good.
Let's split this out into its own helper function we can reuse at
various places.
Also, let's avoid signed values where we can so that we can cover more
of the available time range.
We never return anything higher than 63, so using "long unsigned"
as the type only confused the reader. (We can still use "long unsigned"
and safe_atolu() to parse the kernel file.)
We would refuse to print capabilities which were didn't have a name
for. The kernel adds new capabilities from time to time, most recently
cap_bpf. 'systmectl show -p CapabilityBoundingSet ...' would fail with
"Failed to parse bus message: Invalid argument" because
capability_set_to_string_alloc() would fail with -EINVAL. So let's
print such capabilities in hexadecimal:
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search
cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap
cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin
cap_net_raw cap_ipc_lock cap_ipc_owner 0x10 0x11 0x12 0x13 0x14 0x15 0x16
0x17 0x18 0x19 0x1a ...
For symmetry, also allow capabilities that we don't know to be specified.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1853736.
https://tools.ietf.org/html/draft-knodel-terminology-02https://lwn.net/Articles/823224/
This gets rid of most but not occasions of these loaded terms:
1. scsi_id and friends are something that is supposed to be removed from
our tree (see #7594)
2. The test suite defines an API used by the ubuntu CI. We can remove
this too later, but this needs to be done in sync with the ubuntu CI.
3. In some cases the terms are part of APIs we call or where we expose
concepts the kernel names the way it names them. (In particular all
remaining uses of the word "slave" in our codebase are like this,
it's used by the POSIX PTY layer, by the network subsystem, the mount
API and the block device subsystem). Getting rid of the term in these
contexts would mean doing some major fixes of the kernel ABI first.
Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
Presently, CLI utilities such as systemctl will check whether they have a tty
attached or not to decide whether to parse /proc/cmdline or EFI variable
SystemdOptions looking for systemd.log_* entries.
But this check will be misleading if these tools are being launched by a
daemon, such as a monitoring daemon or automation service that runs in
background.
Make log handling of CLI tools uniform by never checking /proc/cmdline or EFI
variables to determine the logging level.
Furthermore, introduce a new log_setup_cli() shortcut to set up common options
used by most command-line utilities.
Also use double space before the tracking args at the end. Without
the comma this looks ugly, but it's a bit better with the double space.
At least it doesn't look like a variable with a type.
This combines set_ensure_allocated() with set_consume(). The cool thing is that
because we know the hash ops, we can correctly free the item if appropriate.
Similarly to set_consume(), the goal is to simplify handling of the case where
the item needs to be freed on error and if already present in the set.
* Drop mac_selinux_use() condition from mac_selinux_free(): if the
passed pointer holds memory we want to free it even if SELinux is
disabled
* Drop NULL-check cause man:freecon(3) states that freecon(NULL) is a
well-defined NOP
* Assert that on non-SELinux builds the passed pointer is always NULL,
to avoid memory leaks
This just adds a _cleanup_ helper call encapsulating dlclose().
This also means libsystemd-shared is linked against libdl now. I don't
think this is much of an issue, since libdl is part of glibc anyway, and
anything from exotic. It's not an optional part of the OS (think: NSS
requires dynamic linking), hence this pulls in no deps and is almost
certainly loaded into all process' memory anyway.
[zj: use DEFINE_TRIVIAL_CLEANUP_FUNC().]
It's such a common operation to allocate the set and put an item in it,
that it deserves a helper. set_ensure_put() has the same return values
as set_put().
Comes with tests!
../src/core/main.c: In function 'main':
../src/core/main.c:2637:32: error: implicit declaration of function 'cache_efi_options_variable'; did you mean 'systemd_efi_options_variable'? [-Werror=implicit-function-declaration]
(void) cache_efi_options_variable();
^~~~~~~~~~~~~~~~~~~~~~~~~~
systemd_efi_options_variable
The original logic was logging an "ignored" debug message, but it was still
going ahead and calling proc_cmdline_parse_given() on the NULL line. Fix that
to skip that explicitly when the EFI variable wasn't really read.
Cache it early in startup of the system manager, right after `/run/systemd` is
created, so that further access to it can be done without accessing the EFI
filesystem at all.
Prompted by the discussion on #16110, let's migrate more code to
fd_wait_for_event().
This only leaves 7 places where we call into poll()/poll() directly in
our entire codebase. (one of which is fd_wait_for_event() itself)
"less" doesn't properly reset its terminal on SIGTERM, it does so only
on SIGINT. Let's thus configure SIGINT instead of SIGTERM.
I think this is something less should fix too, and clean up things
correctly on SIGTERM, too. However, given that we explicitly enable
SIGINT behaviour by passing "K" to $LESS I figure it makes sense if we
also send SIGINT instead of SIGTERM to match it.
Fixes: #16084
poll() sets POLLNVAL inside of the poll structures if an invalid fd is
passed. So far we generally didn't check for that, thus not taking
notice of the error. Given that this specific kind of error is generally
indication of a programming error, and given that our code is embedded
into our projects via NSS or because people link against our library,
let's explicitly check for this and convert it to EBADF.
(I ran into a busy loop because of this missing check when some of my
test code accidentally closed an fd it shouldn't close, so this is a
real thing)
The usual behaviour when a timeout expires is to terminate/kill the
service. This is what user usually want in production systems. To debug
services that fail to start/stop (especially sporadic failures) it
might be necessary to trigger the watchdog machinery and write core
dumps, though. Likewise, it is usually just a waste of time to
gracefully stop a stuck service. Instead it might save time to go
directly into kill mode.
This commit adds two new options to services: TimeoutStartFailureMode=
and TimeoutStopFailureMode=. Both take the same values and tweak the
behavior of systemd when a start/stop timeout expires:
* 'terminate': is the default behaviour as it has always been,
* 'abort': triggers the watchdog machinery and will send SIGABRT
(unless WatchdogSignal was changed) and
* 'kill' will directly send SIGKILL.
To handle the stop failure mode in stop-post state too a new
final-watchdog state needs to be introduced.
Let's allow "-0" as alternative to "+0" and "0" when parsing integers,
unless the new SAFE_ATO_REFUSE_PLUS_MINUS flag is specified.
In cases where allowing the +/- syntax shall not be allowed
SAFE_ATO_REFUSE_PLUS_MINUS is the right flag to use, but this also means
that -0 as only negative integer that fits into an unsigned value should
be acceptable if the flag is not specified.
Quoting https://github.com/systemd/systemd/issues/14828#issuecomment-635212615:
> [kernel uses] msleep_interruptible() and that means when the process receives
> any kind of signal masked or not this will abort with EINTR. systemd-logind
> gets signals from the TTY layer all the time though.
> Here's what might be happening: while logind reads the EFI stuff it gets a
> series of signals from the TTY layer, which causes the read() to be aborted
> with EINTR, which means logind will wait 50ms and retry. Which will be
> aborted again, and so on, until quite some time passed. If we'd not wait for
> the 50ms otoh we wouldn't wait so long, as then on each signal we'd
> immediately retry again.
We would parse numbers with base prefixes as user identifiers. For example,
"0x2b3bfa0" would be interpreted as UID==45334432 and "01750" would be
interpreted as UID==1000. This parsing was used also in cases where either a
user/group name or number may be specified. This means that names like
0x2b3bfa0 would be ambiguous: they are a valid user name according to our
documented relaxed rules, but they would also be parsed as numeric uids.
This behaviour is definitely not expected by users, since tools generally only
accept decimal numbers (e.g. id, getent passwd), while other tools only accept
user names and thus will interpret such strings as user names without even
attempting to convert them to numbers (su, ssh). So let's follow suit and only
accept numbers in decimal notation. Effectively this means that we will reject
such strings as a username/uid/groupname/gid where strict mode is used, and try
to look up a user/group with such a name in relaxed mode.
Since the function changed is fairly low-level and fairly widely used, this
affects multiple tools: loginctl show-user/enable-linger/disable-linger foo',
the third argument in sysusers.d, fourth and fifth arguments in tmpfiles.d,
etc.
Fixes#15985.
"internal" is a lot of characters. Let's take a leaf out of the Python's book
and simply use _ to mean private. Much less verbose, but the meaning is just as
clear, or even more.
This is a safey net anyway, let's make it fully safe: if the data ends
on an uneven byte, then we need to complete the UTF-16 codepoint first,
before adding the final NUL byte pair. Hence let's suffix with three
NULs, instead of just two.
Tracking down #15931 confused the hell out of me, since running homed in
gdb from the command line worked fine, but doing so as a service failed.
Let's make this more debuggable and check if we live in the host netns
when allocating a new udev monitor.
This is just debug stuff, so that if things don't work, a quick debug
run will reveal what is going on.
That said, while we are at it, also fix unexpected closing of passed in
fd when failing.
Possibly fixes#15220. (There might be another leak. I'm still investigating.)
The leak would occur when the path cache was rebuilt. So in normal circumstances
it wouldn't be too bad, since usually the path cache is not rebuilt too often. But
the case in #15220, where new unit files are created in a loop and started, the leak
occurs once for each unit file:
$ for i in {1..300}; do cp ~/.config/systemd/user/test0001.service ~/.config/systemd/user/test$(printf %04d $i).service; systemctl --user start test$(printf %04d $i).service;done
Let's be more thorough that whenever we build a unit name based on
parameters, that the result is actually a valid user name. If it isn't
fail early.
This should allows us to catch various issues earlier, in particular
when we synthesize mount units from /proc/self/mountinfo: instead of
actually attempting to allocate a mount unit we will fail much earlier
when we build the name to synthesize the unit under. Failing early is a
good thing generally.
And do not use it in the IMPORT{cmdline} udev code. Wherever we expose
direct interfaces to check the kernel cmdline, let's not consult our
systemd-specific EFI variable, but strictly use the actual kernel
variable, because that's what we claim we do. i.e. it's fine to use the
EFI variable for our own settings, but for the generic APIs to the
kernel cmdline we should not use it.
Specifically, this applies to IMPORT{cmdline} and
ConditionKernelCommandLine=. In the latter case we weren#t checking the
EFI variable anyway, hence let's do the same for the udev case, too.
Fixes: #15739
It annoyed me for quite a while that running "journalctl --file=…" on a
file that is not readable failed with a "File not found" error instead
of a permission error. Let's fix that.
We make this work by using the GLOB_NOCHECK flag for glob() which means
that files are not accessible will be returned in the array as they are
instead of being filtered away. This then means that our later attemps
to open the files will fail cleanly with a good error message.
kernel 5.6 added support for a new flag for getrandom(): GRND_INSECURE.
If we set it we can get some random data out of the kernel random pool,
even if it is not yet initializated. This is great for us to initialize
hash table seeds and such, where it is OK if they are crap initially. We
used RDRAND for these cases so far, but RDRAND is only available on
newer CPUs and some archs. Let's now use GRND_INSECURE for these cases
as well, which means we won't needlessly delay boot anymore even on
archs/CPUs that do not have RDRAND.
Of course we never set this flag when generating crypto keys or uuids.
Which makes it different from RDRAND for us (and is the reason I think
we should keep explicit RDRAND support in): RDRAND we don't trust enough
for crypto keys. But we do trust it enough for UUIDs.
As described in #15603, it is a fairly common setup to use a fqdn as the
configured hostname. But it is often convenient to use just the actual
hostname, i.e. until the first dot. This adds support in tmpfiles, sysusers,
and unit files for %l which expands to that.
Fixes#15603.
This new helper checks whether the specified locale is installed. It's
distinct from locale_is_valid() which just superficially checks if a
string looks like something that could be a valid locale.
Heavily inspired by @jsynacek's #13964.
Replaces: #13964
We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.
This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
EFI variable access is nowadays subject to rate limiting by the kernel.
Thus, let's cache the results of checking them, in order to minimize how
often we access them.
Fixes: #14828
Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the
missing keyes in cgroup attribute file are OK or not. Also the wrappers for both
strict and graceful version are provided.
WSL2 will soon (TM) include the "WSL2" string in /proc/sys/kernel/osrelease
so the workaround will no longer be necessary.
We have several different cloud images which do include the "microsoft"
string already, which would break this detection. They are for internal
usage at the moment, but the userspace side can come from all over the
place so it would be quite hard to track and downstream-patch to avoid
breakages.
This reverts commit a2f838d590.
On my laptop (Lenovo X1carbo 4th) I very occasionally see test-boot-timestamps
fail with this tb:
262/494 test-boot-timestamps FAIL 0.7348453998565674 s (killed by signal 6 SIGABRT)
08:12:48 SYSTEMD_LANGUAGE_FALLBACK_MAP='/home/zbyszek/src/systemd/src/locale/language-fallback-map' SYSTEMD_KBD_MODEL_MAP='/home/zbyszek/src/systemd/src/locale/kbd-model-map' PATH='/home/zbyszek/src/systemd/build:/home/zbyszek/.local/bin:/usr/lib64/qt-3.3/bin:/usr/share/Modules/bin:/usr/condabin:/usr/lib64/ccache:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/zbyszek/bin:/var/lib/snapd/snap/bin' /home/zbyszek/src/systemd/build/test-boot-timestamps
--- stderr ---
Failed to read $container of PID 1, ignoring: Permission denied
Found container virtualization none.
Failed to get SystemdOptions EFI variable, ignoring: Interrupted system call
Failed to read ACPI FPDT: Permission denied
Failed to read LoaderTimeInitUSec: Interrupted system call
Failed to read EFI loader data: Interrupted system call
Assertion 'q >= 0' failed at src/test/test-boot-timestamps.c:84, function main(). Aborting.
Normally it takes ~0.02s, but here there's a slowdown to 0.73 and things fail with EINTR.
This happens only occasionally, and I haven't been able to capture a strace.
It would be to ignore that case in test-boot-timestamps or always translate
EINTR to -ENODATA. Nevertheless, I think it's better to retry, since this gives
as more resilient behaviour and avoids a transient failure.
See
https://github.com/torvalds/linux/blob/master/fs/efivarfs/file.c#L75
and
bef3efbeb8.
When nothing at all is mounted at /sys/fs/cgroup, the fs.f_type is
SYSFS_MAGIC (0x62656572) which results in the confusing debug log:
"Unknown filesystem type 62656572 mounted on /sys/fs/cgroup."
Instead, if the f_type is SYSFS_MAGIC, a more accurate message is:
"No filesystem is currently mounted on /sys/fs/cgroup."
let's return ENOSYS in that case, to make things a bit less confusng.
Previously we'd just propagate ENOENT, which people might mistake as
applying to the object being modified rather than /proc/ just not being
there.
Let's return ENOSYS instead, i.e. an error clearly indicating that some
kernel API is not available. This hopefully should put people on a
better track.
Note that we only do the procfs check in the error path, which hopefully
means it's the less likely path.
We probably can add similar bits to more suitable codepaths dealing with
/proc/self/fd, but for now, let's pick to the ones noticed in #14745.
Fixes: #14745
When accessing journal files we generally are fine when values change
beneath our feet, while we are looking at them, as long as they change
from something valid to zero. This is required since we nowadays
forcibly unallocate journal files on vacuuming, to ensure they are
actually released.
However, we need to make sure that the validity checks we enforce are
done on suitable copies of the fields in the file. Thus provide a macro
that forces a copy, and disallows the compiler from merging our copy
with the actually memory where it is from.
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
ubsan complains that we add an offset to a NULL ptr here in some cases.
Which isn't really a bug though, since we only use it as the end
condition for a for loop, but we can still fix it...
Fixes: #15522
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.
Fixes: #14972
ebf963c551 changed the 'sep' argument to always
be either " " or "\n", which broke the indentation logic for the first line
in base64_append_width(). Since it now always is one character, and never NULL,
let's change the type to char and simplify the logic a bit.
$ COLUMNS=30 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1
AwEAAcLPVEcg0hFBheXQf
QOqqLiRgckk69o2KTAsq3
lNRY0c9mnEjzZDGsGmXNy
2EQ6yelkIYYus7KLor2Fz
x59hEqcM82zqkdHV6hXvZ
yjxxSHG3nl8xQS6gF8mdI
YouDTWWhTInfjSKoIeDok
Hq3S67EjSngV7/wVCMTbI
amS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
$ COLUMNS=120 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1 AwEAAcLPVEcg0hFBheXQfQOqqLiRgckk69o2KTAsq3lNRY0c9mnEjzZDGsGmXNy2EQ6yelkIYYus7KLor
2Fzx59hEqcM82zqkdHV6hXvZyjxxSHG3nl8xQS6gF8mdIYouDTWWhTInfjSKoIeDokHq3S67EjSngV7/w
VCMTbIamS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
proot provides userspace-powered emulation of chroot and mount --bind,
lending it to be used on environments without unprivileged user
namespaces, or in otherwise restricted environments like Android.
In order to achieve this, proot makes use of the kernel's ptrace()
facility, which we can use in order to detect its presence. Since it
doesn't use any kind of namespacing, including PID namespacing, we don't
need to do any tricks when trying to get the tracer's metadata.
For our purposes, proot is listed as a "container", since we mostly use
this also as the bucket for non-container-but-container-like
technologies like WSL. As such, it seems like a good fit for this
section as well.
An stdio FILE* stream usually refers to something with a file
descriptor, but that's just "usually". It doesn't have to, when taking
fmemopen() and similar into account. Most of our calls to fileno()
assumed the call couldn't fail. In most cases this was correct, but in
some cases where we didn't know whether we work on files or memory we'd
use the returned fd as if it was unconditionally valid while it wasn't,
and passed it to a multitude of kernel syscalls. Let's fix that, and do
something reasonably smart when encountering this case.
(Running test-fileio with this patch applied will remove tons of ioctl()
calls on -1).
This patch changes the way user managers set the default umask for the units it
manages.
Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.
Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.
Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.
Fixes#6077.
non-underlined yellow uses RGB ANSI sequences while the underlined
version uses the paletted ANSI sequences. Let's unify that and use the
RGB sequence for both cases, so that underlined or not doesn't alter the
color.
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.
The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)
The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…
This effectively liberaralizes a lot what we expect from usernames.
The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.
Fixes: #15149#15090
From https://github.com/microsoft/WSL/issues/423#issuecomment-221627364:
> it's unlikely we'll change it to something that doesn't contain "Microsoft"
> or "WSL".
... but well, it happened. If they change it incompatibly w/o adding an stable
detection mechanism, I think we should not add yet another detection method.
But adding a different casing of "microsoft" is not a very big step, so let's
do that.
Follow-up for #11932.
split() and FOREACH_WORD really should die, and everything be moved to
extract_first_word() and friends, but let's at least make sure that for
the remaining code using it we can't deadlock by not progressing in the
word iteration.
Fixes: #15305
Some fdopendir() calls remain where safe_close() is manually
performed, those could be simplified as well by converting to
use the _cleanup_close_ machinery, but makes things less trivial
to review so left for a future cleanup.
With the addition of _cleanup_close_ there's a repetitious
pattern of assigning -1 to the fd after a successful fdopen
to prevent its close on cleanup now that the FILE * owns the
fd.
This introduces a wrapper that instead takes a pointer to the
fd being opened, and always overwrites the fd with -1 on success.
A future commit will cleanup all the fdopen call sites to use the
wrapper and elide the manual -1 fd assignment.
When we are supposed to accept numeric UIDs formatted as string, then
let's check that first, before passing things on to
valid_user_group_name_full(), since that might log about, and not the
other way round.
See: #15201
Follow-up for: 93c23c9297
https://github.com/systemd/systemd/pull/14133 made
capability_ambient_set_apply() acquire capabilities that were explicitly
asked for and drop all others. This change means the function is called
even with an empty capability set, opening up a code path for users
without ambient capabilities to call this function. This function will
error with EINVAL out on kernels < 4.3 because PR_CAP_AMBIENT is not
understood. This turns capability_ambient_set_apply() into a noop for
kernels < 4.3
Fixes https://github.com/systemd/systemd/issues/15225
I put the helper functions in a separate header file, because they don't fit
anywhere else. pthread_mutex_{lock,unlock} is used in two places: nss-systemd
and hashmap. I don't indent to convert hashmap to use the helpers, because
there it'd make the code more complicated. Is it worth to create a new header
file even if the only use is in nss-systemd.c? I think yes, because it feels
clean and also I think it's likely that pthread_mutex_{lock,unlock} will be
used in other places later.