This effectively makes little difference because we exit soon later
anyway, which will close the fds, too. However, it's still useful since
it means the parent will get EOF events on them in the order we process
things and isn't delayed to process the data from the pipes until the
child dies.
Let's use a proper table for outputting partition information. Let's
also put the general information about the image first, and the table
after that.
Moreover, dissect the image before showing any output, so that we can
early on return an error if the image is not valid.
That way we can turn off kernel partition scanning if verity data is
available (as we don't support verity for full GPT images, only for
simple file system images).
LOOP_CONFIGURE allows us to configure a loopback device in one ioctl
instead of two, which is not just faster but also removes the race that
udev might start probing the device before we adjusted things properly.
Unfortunately LOOP_CONFIGURE is broken in regards to LO_FLAGS_PARTSCAN
as of kernel 5.8.0. This patch contains a work-around for that, to
fallback to old behaviour if partition scanning is requested but does
not work. Sucks a bit.
Proposed upstream fix for that issue:
https://lkml.org/lkml/2020/8/6/97
The machine-id is used to create a few directories and setup a default
loader entry in loader.conf. Having bootctl create the directories
itself is not particularly useful as it does not put anything in them
and bootctl install is not guaranteed to be called before an initramfs
tool like kernel-install so other programs will always need to have
logic to create the directories themselves if they happen to be called
before bootctl install is called.
On top of this, when using unified kernel images, these are installed to
$BOOT/EFI/Linux which removes the need to have the directories created
by bootctl at all. This further indicates that these directories should
be created by the program that puts something in them rather than by
bootctl.
Removing the machine-id dependency allows bootctl install to be called
even when there's no machine-id in the image. This is useful for image
builders such as mkosi which don't have a machine-id when
installing systemd-boot (via bootctl) because it should only be
generated by systemd when the final image is booted.
The default entry in loader.conf based on the machine-id in loader.conf
is also removed which shouldn't be a massive loss in usability overall.
This commit reverts commit 341890d.
Let's fix up invalid GECOS fields both when we convert from NSS to JSON
and the other way round.
Kinda sucks we have to do that, but NSS does it when writing data to
/etc/passwd, so let's do the same.
Fixes: #16668
User records have the realname/gecos fields, groups never had that, but
it would really be useful to have it, hence let's add it with similar
semantics.
We enforce the same syntax as for GECOS, since it's better to start with
strict rules and losen them later instead of the opposite.
The commit 1070d271fa which was supposed
too fix this does not seem to take effect any more. We get again 34%
compilation success rate while scanning systemd itself. Moreover, the
installed header file breaks compilation of programs that include it:
"/usr/include/systemd/_sd-common.h", line 23: error #35: #error directive: "Do
not include _sd-common.h directly; it is a private header."
# error "Do not include _sd-common.h directly; it is a private header."
^
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.
Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
Given a string in the format 'one:two three four:five', returns a string
vector with each word. If the second element of the tuple is not
present, an empty string is returned in its place, so that the vector
can be processed in pairs.
[zjs: use EXTRACT_UNESCAPE_SEPARATORS instead of EXTRACT_CUNESCAPE_RELAX.
This way we do escaping exactly once and in normal strict mode.]
This code was changed in this pull request:
https://github.com/systemd/systemd/pull/16571
After some discussion and more investigation, we better understand
what's going on. So, update the comment, so things are more clear
to future readers.
It is OK for some symbols to be missing. With this change, "test-nss sss" can
be used to test nss-sss without crashing.
$ build-rawhide/test-nss sss fedoraproject.org
======== sss ========
_nss_sss_gethostbyname4_r not defined
_nss_sss_gethostbyname3_r not defined
_nss_sss_gethostbyname3_r not defined
_nss_sss_gethostbyname3_r not defined
_nss_sss_gethostbyname3_r not defined
_nss_sss_gethostbyname2_r("fedoraproject.org", AF_INET) → status=NSS_STATUS_NOTFOUND
errno=0/--- h_errno=-1/Resolver internal error
_nss_sss_gethostbyname2_r("fedoraproject.org", AF_INET6) → status=NSS_STATUS_NOTFOUND
errno=0/--- h_errno=-1/Resolver internal error
_nss_sss_gethostbyname2_r("fedoraproject.org", *) → status=NSS_STATUS_UNAVAIL
errno=97/EAFNOSUPPORT h_errno=-1/Resolver internal error
_nss_sss_gethostbyname2_r("fedoraproject.org", AF_UNIX) → status=NSS_STATUS_UNAVAIL
errno=97/EAFNOSUPPORT h_errno=-1/Resolver internal error
_nss_sss_gethostbyname_r("fedoraproject.org") → status=NSS_STATUS_NOTFOUND
errno=0/--- h_errno=-1/Resolver internal error
We talked about the verification key, then about sealing keys, and then
about the verification key again. Let's shorten things a bit, and divide
the output in three paragraphs: one about the machine, one about the sealing
keys, and one about verification keys and the qr code with them.
From a report in https://bugzilla.redhat.com/show_bug.cgi?id=1861463:
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Trying to enqueue job usb-gadget.target/start/fail
usb-gadget.target: Failed to load configuration: No such file or directory
Assertion '!bus_error_is_dirty(e)' failed at src/libsystemd/sd-bus/bus-error.c:239, function bus_error_setfv(). Ignoring.
sys-devices-platform-soc-2100000.bus-2184000.usb-ci_hdrc.0-udc-ci_hdrc.0.device: Failed to enqueue SYSTEMD_WANTS= job, ignoring: Unit usb-gadget.target not found.
I *think* this is the place where the reuse occurs: we call
bus_unit_validate_load_state(unit, e) twice in a row.
This allows more granular access control in PolicyKit rules, similar to
/etc/sudoers, for polkit actions:
* org.freedesktop.machine1.host-shell
* org.freedesktop.machine1.shell
Example configuration, place in /etc/polkit-1/rules.d/
polkit.addRule(function(action, subject) {
if (action.id == "org.freedesktop.machine1.host-shell"
&& subject.user == "my-user"
&& action.lookup("user") == "target-user") {
return polkit.Result.YES;
}
});
I happen to have a machine where /boot is not a separate mountpoint,
but rather just a directory under /. After upgrade to recent Fedora,
I found out that grub2 can't find any new kernels.
This happens because loadentry script generates kernel and initrd file
paths relative to /boot, while grub2 expects path to be relative to the
root of filesystem on which they are residing.
This commit fixes this issue by using stat's %m to find the mount point
of a partition holding the images, and using it as a prefix to be
removed from ENTRY_DIR_ABS.
Note that %m for stat requires coreutils 8.6, released in Oct 2010.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Apparently, if IO is still in flight at the moment we invoke LOOP_CLR_FD
it is likely simply dropped (probably because yanking physical storage,
such as a USB stick would drop it too). Let's protect ourselves against
that and always sync explicitly before we invoke it.
Fixed below systemd codesonar warning.
isprint() is invoked here with an argument of signed
type char, but only has defined behavior for int arguments that are
either representable as unsigned char or equal to the value
of macro EOF(-1).
As per codesonar report, in a number of libc implementations, isprint()
function implemented using lookup tables (arrays): passing in a
negative value can result in a read underrun.
The explicit limit is dropped, which means that we return to the kernel default
of 50% of RAM. See 362a55fc14 for a discussion why that is not as much as it
seems. It turns out various applications need more space in /dev/shm and we
would break them by imposing a low limit.
While at it, rename the define and use a single macro for various tmpfs mounts.
We don't really care what the purpose of the given tmpfs is, so it seems
reasonable to use a single macro.
This effectively reverts part of 7d85383edb. Fixes#16617.
The new retry intervals are [15, 20, 26, 34, 45, 60, 80, 106, 141, 188, 250,
333, 360, ...]. This should allow graceful response if a transient network
failure is encountered. Growth is exponential, but with a small power and
capped to a non-too-large value so that we resynchronize within a few minutes
after network is restored. I made the minimum 15 s to make sure that we never
send packets more often than that.
Fixes#16492.
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:
RootImageOptions=1:ro,dev 2:nosuid nodev
In absence of a partition number, 0 is assumed.
This should be enough to fix https://bugzilla.redhat.com/show_bug.cgi?id=1856514.
But the limit should be significantly higher than 10% anyway. By setting a
limit on /tmp at 10% we'll break many reasonable use cases, even though the
machine would deal fine with a much larger fraction devoted to /tmp.
(In the first version of this patch I made it 25% with the comment that
"Even 25% might be too low.". The kernel default is 50%, and we have been using
that seemingly without trouble since https://fedoraproject.org/wiki/Features/tmp-on-tmpfs.
So let's just make it 50% again.)
See 7d85383edb.
(Another consideration is that we learned from from the whole initiative with
zram in Fedora that a reasonable size for zram is 0.5-1.5 of RAM, and that pretty
much all systems benefit from having zram or zswap enabled. Thus it is reasonable
to assume that it'll become widely used. Taking the usual compression effectiveness
of 0.2 into account, machines have effective memory available of between
1.0 - 0.2*0.5 + 0.5 = 1.4 (for zram sized to 0.5 of RAM) and
1.0 - 0.2*1.5 + 1.5 = 2.2 (for zram 1.5 sized to 1.5 of RAM) times RAM size.
This means that the 10% was really like 7-4% of effective memory.)
A comment is added to mount-util.h to clarify that tmp.mount is separate.
Previously, on DHCPv4 address renewal, the old address may be removed
while the new address is not ready yet.
This also simplifies the logic of removing address and routes.
When we're disabling controller on a direct child of root cgroup, we
forgot to add root slice into cgroup realization queue, which prevented
proper disabling of the controller (on unified hierarchy).
The mechanism relying on "bounce from bottom and propagate up" in
unit_create_cgroup doesn't work on unified hierarchy (leaves needn't be
enabled). Drop it as we rely on the ancestors to be queued -- that's now
intentional but was artifact of combining the two patches:
cb5e3bc37d ("cgroup: Don't explicitly check for member in UNIT_BEFORE") v240~78
65f6b6bdcb ("core: fix re-realization of cgroup siblings") v245-rc1~153^2
Fixes: #14917
The current implementation is LIFO, which is a) confusing b) prevents
some ordered operations on the cgroup tree (e.g. removing children
before parents).
Fix it quickly. Current list implementation turns this from O(1) to O(n)
operation. Rework the lists later.
We frequently want to set a timer relative to the current time. Let's
add an explicit API for this. This not only saves us a few lines of code
everywhere and simplifies things, but also allows us to do correct
overflow checking.
The tests fail in Fedora's koji with a timeout. Let's just bump
the timeout:
--- stderr ---
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-exists.service: Failed to create cgroup /system.slice/kojid.service/path-exists.service: Permission denied
path-exists.service: Succeeded.
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-exists.service: Failed to create cgroup /system.slice/kojid.service/path-exists.service: Permission denied
path-exists.service: Succeeded.
path-exists.path: Succeeded.
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-existsglob.service: Failed to create cgroup /system.slice/kojid.service/path-existsglob.service: Permission denied
path-existsglob.service: Succeeded.
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-existsglob.service: Failed to create cgroup /system.slice/kojid.service/path-existsglob.service: Permission denied
path-existsglob.service: Succeeded.
path-existsglob.path: Succeeded.
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-changed.service: Failed to create cgroup /system.slice/kojid.service/path-changed.service: Permission denied
path-changed.service: Succeeded.
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-changed.service: Failed to create cgroup /system.slice/kojid.service/path-changed.service: Permission denied
path-changed.service: Succeeded.
path-changed.path: Succeeded.
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-modified.service: Failed to create cgroup /system.slice/kojid.service/path-modified.service: Permission denied
path-modified.service: Succeeded.
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-modified.service: Failed to create cgroup /system.slice/kojid.service/path-modified.service: Permission denied
path-modified.service: Succeeded.
path-modified.path: Succeeded.
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-mycustomunit.service: Failed to create cgroup /system.slice/kojid.service/path-mycustomunit.service: Permission denied
path-mycustomunit.service: Succeeded.
path-unit.path: Succeeded.
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-directorynotempty.service: Failed to create cgroup /system.slice/kojid.service/path-directorynotempty.service: Permission denied
path-directorynotempty.service: Succeeded.
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-directorynotempty.service: Failed to create cgroup /system.slice/kojid.service/path-directorynotempty.service: Permission denied
path-directorynotempty.service: Failed to attach to cgroup /system.slice/kojid.service/path-directorynotempty.service: No such file or directory
path-directorynotempty.service: Failed at step CGROUP spawning /bin/true: No such file or directory
path-directorynotempty.service: Main process exited, code=exited, status=219/CGROUP
path-directorynotempty.service: Failed with result 'exit-code'.
Test timeout when testing path-directorynotempty.path
When a prefix is delegated to an interface that is already sending
RAs, send an RA immediately to inform clients of the new prefix.
This allows them to start using it immediately instead of waiting
up to nearly 10 minutes (depending on when the last timed RA was
sent). This type of situation might occur if, for example, an
outage of the WAN connection caused the addresses and prefixes to
be lost and later regained after service was restored. The
condition for the number of RAs sent being above 0 simultaneously
ensures that RADV is already running and that this code doesn't
send any RAs before the timed RAs have started when the interface
first comes up.
Previously, we assumed that success meant we definitely got a valid
pointer. There is at least one edge case where this is not true (i.e.,
we can get both a 0 return value, and *also* a NULL pointer):
4246bb550d/libselinux/src/procattr.c (L175)
When this case occurrs, if we don't check the pointer we SIGSEGV in
early initialization.
Let's find the right os-release file on the host side, and only mount
the one that matters, i.e. /etc/os-release if it exists and
/usr/lib/os-release otherwise. Use the fixed path /run/host/os-release
for that.
Let's also mount /run/host as a bind mount on itself before we set up
/run/host, and let's mount it MS_RDONLY after we are done, so that it
remains immutable as a whole.
It needs to be world readable (unlike /etc/shadow) when created anew.
This fixes systems that boot with "systemd-nspawn --volatile=yes", i.e.
come up with an entirely empty /etc/ and thus no existing /etc/passwd
file when firstboot runs.
We want our OS trees to be MS_SHARED by default, so that our service
namespacing logic can work correctly. Thus in nspawn we mount everything
MS_SHARED when organizing our tree. We do this early on, before changing
the user namespace (if that's requested). However CLONE_NEWUSER actually
resets MS_SHARED to MS_SLAVE for all mounts (so that less privileged
environments can't affect the more privileged ones). Hence, when
invoking it we have to reset things to MS_SHARED afterwards again. This
won't reestablish propagation, but it will make sure we get a new set of
mount peer groups everywhere that then are honoured for the mount
namespaces/propagated mounts set up inside the container further down.
This API is a complete mess. We forgot to do a hashed comparison for duplicate
entries and we use a direct pointer comparison. For trivial_hash_ops the result
is the same. For all other case, it's not. Fixing this properly will require
auditing all the uses of set_put() and ordered_set_put(). For now, let's just
acknowledge the breakage.
We would add and remove definitions based on which colors were used by other
code. Let's just define all of them to simplify tests and allow easy comparisons
which colors look good.
There are a lot of edge cases that the current implementation
doesn't handle, especially in cases where one of passwd/shadow
exists and the other doesn't exist. For example, if
--root-password is specified, we will write /etc/shadow but
won't add a root entry to /etc/passwd if there is none.
To fix some of these issues, we constrain systemd-firstboot to
only modify /etc/passwd and /etc/shadow if both do not exist
already (or --force) is specified. On top of that, we calculate
all necessary information for both passwd and shadow upfront so
we can take it all into account when writing the actual files.
If no root password options are given --force is specified or both
files do not exist, we lock the root account for security purposes.
Fixes#16401.
c80a9a33d0 introduced the .can_fail field,
but didn't set it on .targets. Targets can fail through dependencies.
This leaves .slice and .device units as the types that cannot fail.
$ systemctl cat bad.service bad.target bad-fallback.service
[Service]
Type=oneshot
ExecStart=false
[Unit]
OnFailure=bad-fallback.service
[Service]
Type=oneshot
ExecStart=echo Fixing everythign!
$ sudo systemctl start bad.target
systemd[1]: Starting bad.service...
systemd[1]: bad.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: bad.service: Failed with result 'exit-code'.
systemd[1]: Failed to start bad.service.
systemd[1]: Dependency failed for bad.target.
systemd[1]: bad.target: Job bad.target/start failed with result 'dependency'.
systemd[1]: bad.target: Triggering OnFailure= dependencies.
systemd[1]: Starting bad-fallback.service...
echo[46901]: Fixing everythign!
systemd[1]: bad-fallback.service: Succeeded.
systemd[1]: Finished bad-fallback.service.
The Address objects in the set generated by ndisc_router_generate_addresses()
have the equivalent prefixlen, flags, prefered lifetime.
This commit makes ndisc_router_generate_addresses() return Set of
in6_addr.
The CI occasionally fail in test-path with a timeout. test-path loads
units from the filesystem, and this conceivably might take more than
the default limit of 3 s. Increase the timeout substantially to see if
this helps.
Opening a verity device is an expensive operation. The kernelspace operations
are mostly sequential with a global lock held regardless of which device
is being opened. In userspace jumps in and out of multiple libraries are
required. When signatures are used, there's the additional cryptographic
checks.
We know when two devices are identical: they have the same root hash.
If libcrypsetup returns EEXIST, double check that the hashes are really
the same, and that either both or none have a signature, and if everything
matches simply remount the already open device. The kernel will do
reference counting for us.
In order to quickly and reliably discover if a device is already open,
change the node naming scheme from '/dev/mapper/major:minor-verity' to
'/dev/mapper/$roothash-verity'.
Unfortunately libdevmapper is not 100% reliable, so in some case it
will say that the device already exists and it is active, but in
reality it is not usable. Fallback to an individually-activated
unique device name in those cases for robustness.
../src/home/homectl-pkcs11.c:19:13: warning: ‘pkcs11_callback_data_release’ defined but not used [-Wunused-function]
19 | static void pkcs11_callback_data_release(struct pkcs11_callback_data *data) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
decompress_blob_zstd() would allocate ever bigger buffers in a loop trying to
get a buffer big enough to decompress the input data. This is wasteful, since
we can just query the size of the decompressed data from the compressed header.
Worse, it doesn't work when the output size is capped, i.e. when dst_max != 0.
If the decompressed blob happened to be bigger than dst_max, decompression
would fail with -ENOBUFS. We need to use "stream decompression" instead, and
only get min(uncompressed size, dst_max) bytes of output.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856037 in a second way.
SD_JOURNAL_FOREACH_DATA() and SD_JOURNAL_FOREACH_UNIQUE() would immediately
terminate when a field couldn't be accessed. This can happen for example when a
field is compressed with an unavailable compression format. But it's likely
that this is the wrong thing to do: the caller for example might want to
iterate over the fields but isn't interested in all of them. coredumpctl is
like this: it uses SD_JOURNAL_FOREACH_DATA() but only uses a subset of the
fields.
Add two new functions sd_journal_enumerate_good_data() and
sd_journal_enumerate_good_unique() that retry sd_journal_enumerate_data() and
sd_journal_enumerate_unique() if the return value is something that applies to
a single field: ENOBUS, E2BIG, EOPNOTSUPP.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1856037.
An alternative would be to make the macros themselves smarter instead of adding
new symbols, and do the looping internally in the macro. I don't like that
approach for two reasons. First, it would embed the logic in the macro, so
recompilation would be required if we decide to update the logic. With the
current version of the patch, recompilation is required to use the new symbols,
but after that, library upgrades are enough. So the current approach is safer
in case further updates are needed. Second, our headers use primitive C, and it
is hard to do the macros without using newer features.
Let's split this out into its own helper function we can reuse at
various places.
Also, let's avoid signed values where we can so that we can cover more
of the available time range.
Let's use the new flag wherever we read key material/passphrases/hashes
off disk, so that people can plug in their own IPC service as backend if
they like, easily.
(My main goal was actually to support this for crypttab key files — i.e.
that you can specify AF_UNIX sockets as third column in crypttab — but
that's harder to implement, since the keys are read via libcryptsetup's
API, not ours.)
According to the docs, and to the
org.freedesktop.login1.get-reboot-to-boot-loader-menu code, the
(oneshot) boot-loader-menu timeout should be stored in
/run/systemd/reboot-to-boot-loader-menu, but the set method was storing it
in /run/systemd/reboot-to-loader-menu.
This commit fixes this. Note that the fixed name also is a better match
for the dbus call names and matches the related
/run/systemd/reboot-to-boot-loader-entry structure, so fixing the set code,
rather then the get code + docs seems like the right thing to do here.