As described in #15603, it is a fairly common setup to use a fqdn as the
configured hostname. But it is often convenient to use just the actual
hostname, i.e. until the first dot. This adds support in tmpfiles, sysusers,
and unit files for %l which expands to that.
Fixes#15603.
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.
This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
EFI variable access is nowadays subject to rate limiting by the kernel.
Thus, let's cache the results of checking them, in order to minimize how
often we access them.
Fixes: #14828
Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the
missing keyes in cgroup attribute file are OK or not. Also the wrappers for both
strict and graceful version are provided.
WSL2 will soon (TM) include the "WSL2" string in /proc/sys/kernel/osrelease
so the workaround will no longer be necessary.
We have several different cloud images which do include the "microsoft"
string already, which would break this detection. They are for internal
usage at the moment, but the userspace side can come from all over the
place so it would be quite hard to track and downstream-patch to avoid
breakages.
This reverts commit a2f838d590.
On my laptop (Lenovo X1carbo 4th) I very occasionally see test-boot-timestamps
fail with this tb:
262/494 test-boot-timestamps FAIL 0.7348453998565674 s (killed by signal 6 SIGABRT)
08:12:48 SYSTEMD_LANGUAGE_FALLBACK_MAP='/home/zbyszek/src/systemd/src/locale/language-fallback-map' SYSTEMD_KBD_MODEL_MAP='/home/zbyszek/src/systemd/src/locale/kbd-model-map' PATH='/home/zbyszek/src/systemd/build:/home/zbyszek/.local/bin:/usr/lib64/qt-3.3/bin:/usr/share/Modules/bin:/usr/condabin:/usr/lib64/ccache:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/zbyszek/bin:/var/lib/snapd/snap/bin' /home/zbyszek/src/systemd/build/test-boot-timestamps
--- stderr ---
Failed to read $container of PID 1, ignoring: Permission denied
Found container virtualization none.
Failed to get SystemdOptions EFI variable, ignoring: Interrupted system call
Failed to read ACPI FPDT: Permission denied
Failed to read LoaderTimeInitUSec: Interrupted system call
Failed to read EFI loader data: Interrupted system call
Assertion 'q >= 0' failed at src/test/test-boot-timestamps.c:84, function main(). Aborting.
Normally it takes ~0.02s, but here there's a slowdown to 0.73 and things fail with EINTR.
This happens only occasionally, and I haven't been able to capture a strace.
It would be to ignore that case in test-boot-timestamps or always translate
EINTR to -ENODATA. Nevertheless, I think it's better to retry, since this gives
as more resilient behaviour and avoids a transient failure.
See
https://github.com/torvalds/linux/blob/master/fs/efivarfs/file.c#L75
and
bef3efbeb8.
When nothing at all is mounted at /sys/fs/cgroup, the fs.f_type is
SYSFS_MAGIC (0x62656572) which results in the confusing debug log:
"Unknown filesystem type 62656572 mounted on /sys/fs/cgroup."
Instead, if the f_type is SYSFS_MAGIC, a more accurate message is:
"No filesystem is currently mounted on /sys/fs/cgroup."
let's return ENOSYS in that case, to make things a bit less confusng.
Previously we'd just propagate ENOENT, which people might mistake as
applying to the object being modified rather than /proc/ just not being
there.
Let's return ENOSYS instead, i.e. an error clearly indicating that some
kernel API is not available. This hopefully should put people on a
better track.
Note that we only do the procfs check in the error path, which hopefully
means it's the less likely path.
We probably can add similar bits to more suitable codepaths dealing with
/proc/self/fd, but for now, let's pick to the ones noticed in #14745.
Fixes: #14745
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
ubsan complains that we add an offset to a NULL ptr here in some cases.
Which isn't really a bug though, since we only use it as the end
condition for a for loop, but we can still fix it...
Fixes: #15522
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.
Fixes: #14972
ebf963c551 changed the 'sep' argument to always
be either " " or "\n", which broke the indentation logic for the first line
in base64_append_width(). Since it now always is one character, and never NULL,
let's change the type to char and simplify the logic a bit.
$ COLUMNS=30 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1
AwEAAcLPVEcg0hFBheXQf
QOqqLiRgckk69o2KTAsq3
lNRY0c9mnEjzZDGsGmXNy
2EQ6yelkIYYus7KLor2Fz
x59hEqcM82zqkdHV6hXvZ
yjxxSHG3nl8xQS6gF8mdI
YouDTWWhTInfjSKoIeDok
Hq3S67EjSngV7/wVCMTbI
amS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
$ COLUMNS=120 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1 AwEAAcLPVEcg0hFBheXQfQOqqLiRgckk69o2KTAsq3lNRY0c9mnEjzZDGsGmXNy2EQ6yelkIYYus7KLor
2Fzx59hEqcM82zqkdHV6hXvZyjxxSHG3nl8xQS6gF8mdIYouDTWWhTInfjSKoIeDokHq3S67EjSngV7/w
VCMTbIamS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
proot provides userspace-powered emulation of chroot and mount --bind,
lending it to be used on environments without unprivileged user
namespaces, or in otherwise restricted environments like Android.
In order to achieve this, proot makes use of the kernel's ptrace()
facility, which we can use in order to detect its presence. Since it
doesn't use any kind of namespacing, including PID namespacing, we don't
need to do any tricks when trying to get the tracer's metadata.
For our purposes, proot is listed as a "container", since we mostly use
this also as the bucket for non-container-but-container-like
technologies like WSL. As such, it seems like a good fit for this
section as well.
An stdio FILE* stream usually refers to something with a file
descriptor, but that's just "usually". It doesn't have to, when taking
fmemopen() and similar into account. Most of our calls to fileno()
assumed the call couldn't fail. In most cases this was correct, but in
some cases where we didn't know whether we work on files or memory we'd
use the returned fd as if it was unconditionally valid while it wasn't,
and passed it to a multitude of kernel syscalls. Let's fix that, and do
something reasonably smart when encountering this case.
(Running test-fileio with this patch applied will remove tons of ioctl()
calls on -1).