There are downsides to using fexecve:
when fexecve is used (for normal executables), /proc/pid/status shows Name: 3,
which means that ps -C foobar doesn't work. pidof works, because it checks
/proc/self/cmdline. /proc/self/exe also shows the correct link, but requires
privileges to read. /proc/self/comm also shows "3".
I think this can be considered a kernel deficiency: when O_CLOEXEC is used, this
"3" is completely meaningless. It could be any number. The kernel should use
argv[0] instead, which at least has *some* meaning.
I think the approach with fexecve/execveat is instersting, so let's provide it
as opt-in.
For scripts, when we call fexecve(), on new kernels glibc calls execveat(),
which fails with ENOENT, and then we fall back to execve() which succeeds:
[pid 63039] execveat(3, "", ["/home/zbyszek/src/systemd/test/test-path-util/script.sh", "--version"], 0x7ffefa3633f0 /* 0 vars */, AT_EMPTY_PATH) = -1 ENOENT (No such file or directory)
[pid 63039] execve("/home/zbyszek/src/systemd/test/test-path-util/script.sh", ["/home/zbyszek/src/systemd/test/test-path-util/script.sh", "--version"], 0x7ffefa3633f0 /* 0 vars */) = 0
But on older kernels glibc (some versions?) implement a fallback which falls
into the same trap with bash $0:
[pid 13534] execve("/proc/self/fd/3", ["/home/test/systemd/test/test-path-util/script.sh", "--version"], 0x7fff84995870 /* 0 vars */) = 0
We don't want that, so let's call execveat() ourselves. Then we can do the
execve() fallback as we want.
Previously it was very likely, when multiple contenders for the symlink
appear in parallel, that algorithm would select wrong symlink (i.e. one
with lower-priority).
Now the algorithm is much more defensive and when we detect change in
set of contenders for the symlink we reevaluate the selection. Same
happens when new symlink replaces already existing symlink that points
to different device node.
Note that st_mtime member of struct stat is defined as follows,
#define st_mtime st_mtim.tv_sec
Hence we omitted checking nanosecond part of the timestamp (struct
timespec) and possibly would miss modifications that happened within the
same second.
FixedRandomDelay=yes will use
`siphash24(sd_id128_get_machine() || MANAGER_IS_SYSTEM(m) || getuid() || u->id)`,
where || is concatenation, instead of a random number to choose a value between
0 and RandomizedDelaySec= as the timer delay.
This essentially sets up a fixed, but seemingly random, offset for each timer
iteration rather than having a random offset recalculated each time it fires.
Closes#10355
Co-author: Anita Zhang <the.anitazha@gmail.com>
Quoting the manual page of stime(2): "Starting with glibc 2.31, this function
is no longer available to newly linked applications and is no longer declared
in <time.h>."
If we encounter an RR that has no matching signature, then we don't know
whether it was expanded from a wildcard or not. We need to accept that
and not make the NSEC test fail, just skip over the RR.
The three answer sections can only carry up to UINT16_MAX entries, hence
put a hard upper limit on how far DnsAnswer can grow. The three count
fields in the DNS packet header are 16 bit only, hence the limit.
If code actually tries to add more than 64K RRs it will get ENOSPC with
this new checking.
And similar to DnsQuestion.
The current overflow checking is broken in the corner case of the strings'
combined length being exactly SIZE_MAX: After the loop, l would be SIZE_MAX,
but we're not testing whether the l+1 expression overflows.
Fix it by simply pre-accounting for the final '\0': initialize l to 1 instead
of 0.
The loops over (x, then all varargs, until a NULL is found) can be written much
simpler with an ordinary for loop. Just initialize the loop variable to x, test
that, and in the increment part, fetch the next va_arg(). That removes a level
of indentation, and avoids doing a separate strlen()/stpcpy() call for x.
While touching this code anyway, change (size_t)-1 to the more readable
SIZE_MAX.
This beefs up the READ_FULL_FILE_CONNECT_SOCKET logic of
read_full_file_full() a bit: when used a sender socket name may be
specified. If specified as NULL behaviour is as before: the client
socket name is picked by the kernel. But if specified as non-NULL the
client can pick a socket name to use when connecting. This is useful to
communicate a minimal amount of metainformation from client to server,
outside of the transport payload.
Specifically, these beefs up the service credential logic to pass an
abstract AF_UNIX socket name as client socket name when connecting via
READ_FULL_FILE_CONNECT_SOCKET, that includes the requesting unit name
and the eventual credential name. This allows servers implementing the
trivial credential socket logic to distinguish clients: via a simple
getpeername() it can be determined which unit is requesting a
credential, and which credential specifically.
Example: with this patch in place, in a unit file "waldo.service" a
configuration line like the following:
LoadCredential=foo:/run/quux/creds.sock
will result in a connection to the AF_UNIX socket /run/quux/creds.sock,
originating from an abstract namespace AF_UNIX socket:
@$RANDOM/unit/waldo.service/foo
(The $RANDOM is replaced by some randomized string. This is included in
the socket name order to avoid namespace squatting issues: the abstract
socket namespace is open to unprivileged users after all, and care needs
to be taken not to use guessable names)
The services listening on the /run/quux/creds.sock socket may thus
easily retrieve the name of the unit the credential is requested for
plus the credential name, via a simpler getpeername(), discarding the
random preifx and the /unit/ string.
This logic uses "/" as separator between the fields, since both unit
names and credential names appear in the file system, and thus are
designed to use "/" as outer separators. Given that it's a good safe
choice to use as separators here, too avoid any conflicts.
This is a minimal patch only: the new logic is used only for the unit
file credential logic. For other places where we use
READ_FULL_FILE_CONNECT_SOCKET it is probably a good idea to use this
scheme too, but this should be done carefully in later patches, since
the socket names become API that way, and we should determine the right
amount of info to pass over.
When selinux is enabled, the call of
manager_rtnl_enumerate_nexthop() fails.
This fix is to facilitate selinux hook handling for enumerating
nexthop.
In manager_rtnl_enumerate_nexthop() there is a check
if "Not supported" is returned by the send_netlink() call.
This check expects that -EOPNOTSUPP is returned,
the selinux hook seems to return -EINVAL instead.
This happens in kernel older than 5.3
(more specificallytorvalds/linux@65ee00a) as it does not support
nexthop handling through netlink.
And if SELinux is enforced in the order kernel, callingRTM_GETNEXTHOP
returns -EINVAL.
Thus adding a call in the manager_rtnl_enumerate_nexthop for the
extra return -EINVAL.
* Existing valid rule files written with KEY="value" are not affected
* Now, KEY=e"value\n" becomes valid. Where `\n` is a newline character
* Escape sequences supported by src/basic/escape.h:cunescape() is
supported
With these patches applied, networkd is successfully able to get an
address from a DHCP server on an IPoIB interface.
1)
Makes networkd pass the actual interface type to the dhcp client,
instead of hardcoding it to Ethernet.
2)
Fixes some issues in handling the larger (20 Byte) IB MAC addresses in
the dhcp code.
3)
Add a new field to networkds Link struct, which holds the interface
broadcast address.
3.1)
Modify the DHCP code to also expect the broadcast address as parameter.
On an Ethernet-Interface the Broadcast address never changes and is always
all 6 bytes set to 0xFF.
On an IB one however it is not neccesarily always the same, thus
fetching the actual address from the interface is neccesary.
4)
Only the last 8 bytes of an IB MAC are stable, so when using an IB MAC to
generate a client ID, only pass those 8 bytes.
This passes the legacy ethernet address to functions in a lot of places,
which all will need migrated to handle arbitrary size hardware addresses
eventually.
Hardware addresses come in various shapes and sizes, these new functions
and accomapying data structures account for that instead of hard-coding
a hardware address to the 6 bytes of an ethernet MAC.
The link state file is always removed when networkd is stopping. So,
the deserialization logic does not work. Moreover, the ADDRESSES=
entry is not used by sd-network, so serialization is also not necessary.
The link state file is always removed on stop. So, we cannot deserialize
the address from the file. Moreover, currently the IPv4 link-local address
is always dropped by link_drop_foreign_addresses() on restart. Let's
drop the serialize/deserialize logic for IPv4 LL address.
Previously, the address was taken from the state file, but DHCP4_ADDRESS=
entry was dropped by 46986251d6.
Moreover, the link state file is always removed when networkd is
stopping. Let's take the address from the list of enumerated addresses.
Ideally, we'd read back what we wrote, but that would have been
much more complicated. But just writing stuff is useful to test under
valgrind or manually.
We had two of each: both homectl and journalctl had the whole dlopen()
wrapper, and journalctl had two implementations (slightly different) of the
code to print the fss:// pattern.
print_qrcode() now returns -EOPNOTSUPP when compiled with qrcode support. Both
callers ignore the return value, so this changes nothing.
No functional change.
This adds a way to control SO_TIMESTAMP/SO_TIMESTAMPNS socket options
for sockets PID 1 binds to.
This is useful in journald so that we get proper timestamps even for
ingress log messages that are submitted before journald is running.
We recently turned on packet info metadata from PID 1 for these sockets,
but the timestamping info was still missing. Let's correct that.
Let's not have #ifdeffery both in the consumers and the providers of the
selinux glue code. Unless the code is particularly complex, let's do the
ifdeffery only in the provider of the selinux glue code, and let's keep
the consumers simple and just invoke it.
So far half of sd_event_source_set_enabled() was doing enabling, the
other half was doing disabling. Let's split that into two separate
calls.
(This also adds a new shortcut to sd_event_source_set_enabled(): if the
caller toggles between "ON" and "ONESHOT" we'll now shortcut this, since
the event source is already enabled in that case and shall remain
enabled.)
This heavily borrows and is inspired from Michal Sekletár's #17284
refactoring.
We typically don't just reshuffle a single prioq at once, but always
two. Let's add two helper functions that do this, and reuse them
everywhere.
(Note that this drops one minor optimization:
sd_event_source_set_time_accuracy() previously only reshuffled the
"latest" prioq, since changing the accuracy has no effect on the
earliest time of an event source, just the latest time an event source
can run. This optimization is removed to simplify things, given that
it's not really worth the effort as prioq_reshuffle() on properly
ordered prioqs has practically zero cost O(1)).
(Slightly generalized, commented and split out of #17284 by Lennart)
This reverts commit 6c5496c492.
sysinit.target is shared between the initrd and the host system. Pulling in
initrd-cryptsetup.target into sysinit.target causes the following warning at
boot:
Oct 27 10:42:30 workstation-uefi systemd[1]: initrd-cryptsetup.target: Starting requested but asserts failed.
Oct 27 10:42:30 workstation-uefi systemd[1]: Assertion failed for initrd-cryptsetup.target.
If processes remain in the unit's cgroup after the final SIGKILL is
sent and the unit has exceeded stop timeout, don't release the unit's
cgroup information. Pid1 will have failed to `rmdir` the cgroup path due
to processes remaining in the cgroup and releasing would leave the cgroup
path on the file system with no tracking for pid1 to clean it up.
Instead, keep the information around until the last process exits and pid1
sends the cgroup empty notification. The service/scope can then prune
the cgroup if the unit is inactive/failed.
Ubuntu builds on the Launchpad infrastructure run inside a chroot that does
not have the sysfs cgroup dirs mounted, so this call will return ENOMEDIUM
from cg_unified_cached() during the build-time testing, for example when
building the package in a Launchpad PPA.
This effectively reverts the commit 22fc2420b2.
The function `asynchronous_close()` confuses valgrind. Before this commit,
valgrind may report the following:
```
HEAP SUMMARY:
in use at exit: 384 bytes in 1 blocks
total heap usage: 4,787 allocs, 4,786 frees, 1,379,191 bytes allocated
384 bytes in 1 blocks are possibly lost in loss record 1 of 1
at 0x483CAE9: calloc (vg_replace_malloc.c:760)
by 0x401456A: _dl_allocate_tls (in /usr/lib64/ld-2.31.so)
by 0x4BD212E: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.31.so)
by 0x499B662: asynchronous_job (async.c:47)
by 0x499B7DC: asynchronous_close (async.c:102)
by 0x4CFA8B: client_initialize (sd-dhcp-client.c:696)
by 0x4CFC5E: client_stop (sd-dhcp-client.c:725)
by 0x4D4589: sd_dhcp_client_stop (sd-dhcp-client.c:2134)
by 0x493C2F: link_stop_clients (networkd-link.c:620)
by 0x4126DB: manager_free (networkd-manager.c:867)
by 0x40D193: manager_freep (networkd-manager.h:97)
by 0x40DAFC: run (networkd.c:20)
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 384 bytes in 1 blocks
still reachable: 0 bytes in 0 blocks
suppressed: 0 bytes in 0 blocks
For lists of detected and suppressed errors, rerun with: -s
ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
This reverts commit b45c068dd8.
I think the idea was generally sound, but didn't take into account the
limitations of show-environment and how it is used. People expect to be able to
eval systemctl show-environment output in bash, and no escaping syntax is
defined for environment *names* (we only do escaping for *values*). We could
skip such problematic variables in 'systemctl show-environment', and only allow
them to be inherited directly. But this would be confusing and ugly.
The original motivation for this change was that various import operations
would fail. a4ccce22d9 changed systemctl to filter
invalid variables in import-environment.
https://gitlab.gnome.org/GNOME/gnome-session/-/issues/71 does a similar change
in GNOME. So those problematic variables should not cause failures, but just
be silently ignored.
Finally, the environment block is becoming a dumping ground. In my gnome
session 'systemctl show-environment --user' includes stuff like PWD, FPATH
(from zsh), SHLVL=0 (no idea what that is). This is not directly related to
variable names (since all those are allowed under the stricter rules too), but
I think we should start pushing people away from running import-environment and
towards importing only select variables.
https://github.com/systemd/systemd/pull/17188#issuecomment-708676511
This changes the default from putting all units into the root slice to
placing them into the app slice in the user manager. The advantage is
that we get the right behaviour in most cases, and we'll need special
case handling in all other cases anyway.
Note that we have currently defined that applications *should* start
their unit names with app-, so we could also move only these by creating
a drop-in for app-.scope and app-.service.
However, that would not answer the question on how we should manage
session.slice. And we would end up placing anything that does not fit
the system (e.g. anything started by dbus-broker currently) into the
root slice.
These should not be bundled into the link_up() operation, as that is
not (currently) called during interface configuration if the interface
already is IFF_UP, which is unrelated to the need to change the mac
to a user-defined value, or set 'nomaster' on the interface.
Additionally, there is no need to re-set the mac or re-assert nomaster
every time the interface is brought up; those should be only part of
normal initial interface configuration.
Fixes: #17391
Let's try to make collisions when multiple clients want to use the same
device less likely, by sleeping a random time on collision.
The loop device allocation protocol is inherently collision prone:
first, a program asks which is the next free loop device, then it tries
to acquire it, in a separate, unsynchronized setp. If many peers do this
all at the same time, they'll likely all collide when trying to
acquire the device, so that they need to ask for a free device again and
again.
Let's make this a little less prone to collisions, reducing the number
of failing attempts: whenever we notice a collision we'll now wait
short and randomized time, making it more likely another peer succeeds.
(This also adds a similar logic when retrying LOOP_SET_STATUS64, but
with a slightly altered calculation, since there we definitely want to
wait a bit, under all cases)
On systems that have a udev before
a7fdc6cbd3 uevents would sometimes be
eaten because of device node collisions that caused the ruleset to fail.
Let's add an (ugly) work-around for this, so that we can even work with
such an older udev.
On current kernels (5.8 for example) under some conditions I don't fully
grok it might happen that a detached loopback block device still has
partition block devices around. Accessing these partition block devices
results in EIO errors (that also fill up dmesg). These devices cannot be
claned up with LOOP_CLR_FD (since the main device already is officially
detached), nor with LOOP_CTL_DELETE (returns EBUSY as long as the
partitions still exist). This is a kernel bug. But it appears to apply
to all recent kernels. I cannot really pin down what triggers this,
suffice to say our heavy-duty test can trigger it.
Either way, let's do something about it: when we notice this state we'll
attach an empty file to it, which is guaranteed to have to part table.
This makes the partitions go away. After closing/reoping the device we
hence are good to go again. ugly workaround, but I think OK enough to
use.
The net result is: with this commit, we'll guarantee that by the time we
attach a file to the loopback device we have zero kernel partitions
associated with it. Thus if we then wait for the kernel partitions we
need to appear we should have entirely reliable behaviour even if
loopback devices by the name are heavily recycled and udev events reach
us very late.
Fixes: #16858
Previously, we'd just wait for the first moment where the kernel exposes
the same numbre of partitions as libblkid tells us. After that point we
enumerate kernel partitions and look for matching libblkid partitions.
With this change we'll instead enumerate with libblkid only, and then
wait for each kernel partition to show up with the exact parameters we
expect them to show up. Once that happens we are happy.
Care is taken to use the udev device notification messages only as hint
to recheck what the kernel actually says. That's because we are
otherwise subject to a race: we might see udev events from an earlier
use of a loopback device. After all these devices are heavily recycled.
Under the assumption that we'll get udev events for *at least* all
partitions we care about (but possibly more) we can fix the race
entirely with one more fix coming in a later commit: if we make sure
that a loopback block device has zero kernel partitions when we take
possession of it, it doesn't matter anymore if we get spurious udev
events from a previous use. All we have to do is notice when the devices
we need all popped up.
When we fall back to classic LOOP_SET_FD logic in case LOOP_CONFIGURE
didn't work we issue LOOP_CLR_FD first. But that call turns out to be
potentially async in the kernel: if something else (let's say
udev/blkid) is accessing the device the ioctl just sets the autoclear
flag and exits. Hence quite often the LOOP_SET_FD will subsequently
fail. Let's avoid the trouble, and immediately exit with EBUSY if
LOOP_CONFIGURE fails, and but remember that LOOP_CONFIGURE is not
available so that on the next iteration we go directly for LOOP_SET_FD
instead.
This adds tests to make sure that basic/env-util considers environment
variables containing \r characters invalid, and that it removes such
variables during environment cleanup in strv_env_clean*().
test-env-util has not verified this behaviour before.
As \r characters can be used to hide information, disallowing them
helps with systemd's security barrier role, even when the \r
character comes as part of a DOS style (\r\n) line ending.
Prompted-by: https://github.com/systemd/systemd/issues/17378
v246 is long released. Hence the new scheme should be named v247.
(Interesting, how we pretty systematically for the last releases changed
the scheme only every second release)
The idea is that we have strvs like list of server names or addresses, where
the majority of strings is rather short, but some are long and there can
potentially be many strings. So formattting them either all on one line or all
in separate lines leads to output that is either hard to read or uses way too
many rows. We want to wrap them, but relying on the pager to do the wrapping is
not nice. Normal text has a lot of redundancy, so when the pager wraps a line
in the middle of a word the read can understand what is going on without any
trouble. But for a high-density zero-redundancy text like an IP address it is
much nicer to wrap between words. This also makes c&p easier.
This adds a variant of TABLE_STRV which is wrapped on output (with line breaks
inserted between different strv entries).
The change table_print() is quite ugly. A second pass is added to re-calculate
column widths. Since column size is now "soft", i.e. it can adjust based on
available columns, we need to two passes:
- first we figure out how much space we want
- in the second pass we figure out what the actual wrapped columns
widths will be.
To avoid unnessary work, the second pass is only done when we actually have
wrappable fields.
A test is added in test-format-table.
We check that the binary exists before writing the service file, but
let's also not consider the service started until the fork has happened.
This is still relatively new stuff, so we're can change the implementation
details like this.
The test was failing because it couldn't start the service:
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
path-modified.path: state = waiting; result = success
path-modified.service: state = failed; result = exit-code
Failed to connect to system bus: No such file or directory
-.slice: Failed to enable/disable controllers on cgroup /system.slice/kojid.service, ignoring: Permission denied
path-modified.service: Failed to create cgroup /system.slice/kojid.service/path-modified.service: Permission denied
path-modified.service: Failed to attach to cgroup /system.slice/kojid.service/path-modified.service: No such file or directory
path-modified.service: Failed at step CGROUP spawning /bin/true: No such file or directory
path-modified.service: Main process exited, code=exited, status=219/CGROUP
path-modified.service: Failed with result 'exit-code'.
Test timeout when testing path-modified.path
In fact any of the services that we try to start may fail, especially
considering that we're doing some rogue cgroup operations. See
https://github.com/systemd/systemd/pull/16603#issuecomment-679133641.
Just to make it easier to grok what happens when test-path fails.
Change printf→log_info so that output is interleaved and not split in two
independent parts in log files.
A comment indicates the start of the new contents of the override file,
and another indicates that lines following it will be discarded once
editing is finished.
The contents of the unit file and drop-ins are listed out after this
last marker.
Adds WRITE_STRING_FILE_TRUNCATE to set O_TRUNC when opening a file.
Thanks to cgzones for providing the required SELinux function calls.
Co-authored-by: Christian Göttsche <cgzones@googlemail.com>
For encrypted block devices that we need to unlock from the initramfs,
we currently rely on dracut shipping `cryptsetup.target`. This works,
but doesn't cover the case where the encrypted block device requires
networking (i.e. the `remote-cryptsetup.target` version). That target
however is traditionally dynamically enabled.
Instead, let's rework things here by adding a `initrd-cryptsetup.target`
specifically for initramfs encrypted block device setup. This plays the
role of both `cryptsetup.target` and `remote-cryptsetup.target` in the
initramfs.
Then, adapt `systemd-cryptsetup-generator` to hook all generated
services to this new unit when running from the initrd. This is
analogous to `systemd-fstab-generator` hooking all mounts to
`initrd-fs.target`, regardless of whether they're network-backed or not.
Why this change
---------------
Assumption - PAM's auth stack is properly configured.
Currently account pam_systemd_home.so returns PAM_SUCCESS for non
systemd-homed users, and a variety of return values (including
PAM_SUCCESS) for homed users.
account pam_unix returns PAM_AUTHINFO_UNAVAIL for systemd-homed
users, and a variety of return values (including PAM_AUTHINFO_UNAVAIL)
for normal users.
No possible combination in the pam stack can let us preserve the
various return values of the modules. For example, the configuration
mentioned in the manpage causes account pam_unix to never be reached
since pam_systemd_home just returns a success for ordinary users. Users
with expired passwords are allowed to log in because a check cannot be
made.
More configuration examples and why they don't work are mentioned
in #16906 and the downstream discussion linked there.
After this change
-----------------
account pam_unix will continue to return wrong value for homed users.
But we can skip the module conditionally using the return value from
account pam_systemd_home. We can already do this with the auth and
password modules.
This creates a private mount namespace for test-mountpint-util, with all
propagation from the host turned off. This gives us the guarantee that
/proc/self/mountinfo remains fixed and constant while we operate,
removing potential races against other unrelated stuff running on the
system that changes the mount table.
Prompted-by: #17050
(I doubt this actually fixes 17050, this is mostly to make sure that we
aren't possibly affected by such races in our test)
The status string is modeled after our --version output: +enabled -disabled equals=more-info
For example:
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=allow-downgrade/supported
We would print the whole string as a single super-long line. Let's nicely
break the text into lines that fit on the screen.
$ COLUMNS=70 build/resolvectl --no-pager nta
Global: home local intranet 23.172.in-addr.arpa lan
18.172.in-addr.arpa 16.172.in-addr.arpa 19.172.in-addr.arpa
25.172.in-addr.arpa 21.172.in-addr.arpa d.f.ip6.arpa
20.172.in-addr.arpa 30.172.in-addr.arpa 17.172.in-addr.arpa
internal 168.192.in-addr.arpa 28.172.in-addr.arpa
22.172.in-addr.arpa 24.172.in-addr.arpa 26.172.in-addr.arpa
corp 10.in-addr.arpa private 29.172.in-addr.arpa test
27.172.in-addr.arpa 31.172.in-addr.arpa
Link 2 (hub0):
Link 4 (enp0s31f6):
Link 5 (wlp4s0):
Link 7 (virbr0): adsfasdfasdfasd.com 21.172.in-addr.arpa lan j b
a.com home d.f.ip6.arpa b.com local 16.172.in-addr.arpa
19.172.in-addr.arpa 18.172.in-addr.arpa 25.172.in-addr.arpa
20.172.in-addr.arpa k i h 23.172.in-addr.arpa
168.192.in-addr.arpa d g intranet 17.172.in-addr.arpa c e.com
30.172.in-addr.arpa a f d.com e internal
Link 8 (virbr0-nic):
Link 9 (vnet0):
Link 10 (vb-rawhide):
Link 15 (wwp0s20f0u2i12):
Failed to enter shared memory directory multipath: Permission denied
→
Failed to enter shared memory directory /dev/shm/multipath: Permission denied
When looking at nested directories, we will print only the final two elements
of the path. That is still more useful than just the last component of the
path. To print the full path, we'd have to allocate the string, and since the
error occurs so very rarely, I think the current best-effort approach is
enough.
So far we didn't allow empty properties, but it makes sense to do so, for
example to distinguish empty data from lack of data. It also makes it easy to
override properties (back to the empty) value for specific cases.
Using `bootctl set-default @current` will set the default loader entry
to the currently booted entry as read from the `LoaderEntrySelected` EFI
variable.
Also `bootctl set-oneshot @current` will set the oneshot loader entry to
the current booted entry.
Correspondingly `@default` and `@oneshot` can be used to read from the
LoaderEntryDefault and LoaderEntryOneshot EFI variables.
Previously, .device units generated by SYSTEMD_ALIAS= udev properties
are not supported to specify devices for e.g. 'udevadm info'.
Before:
```
$ udevadm info sys-subsystem-net-devices-enp0s31f6.device
Unknown device "sys-subsystem-net-devices-enp0s31f6.device": No such device
```
After:
```
$ ./udevadm info sys-subsystem-net-devices-enp0s31f6.device
P: /devices/pci0000:00/0000:00:1f.6/net/enp0s31f6
L: 0
E: DEVPATH=/devices/pci0000:00/0000:00:1f.6/net/enp0s31f6
E: INTERFACE=enp0s31f6
E: IFINDEX=2
E: SUBSYSTEM=net
E: USEC_INITIALIZED=25317523
E: ID_NET_NAMING_SCHEME=v245
(snip)
```
By making them unsigned comparing them with other sizes is less likely
to trigger compiler warnings regarding signed/unsigned comparisons.
After all sizes (i.e. size_t) are generally assumed to be unsigned, so
these should be too.
Prompted-by: https://github.com/systemd/systemd/pull/17345#issuecomment-709402332
`route_get()` compares input with existing routes, however previously,
the input may did not have information about gateway. So, the
comparison result might be incorrect, and the foregoing set_put() might
return -EEXIST.
With the switch from log_debug() to log_debug_errno() in commit c413bb28df
systemd-update-done would fail without any error message if /etc
or /var were read-only. This restores the previous behaviour to
silently ignore these directories again.
sync() before committing a transient machine-id to disk. This will
ensure that any filesystem changes made by first-boot units will have
been persisted before the first boot is marked as completed.
Currently, a loss of power after the machine-id was written but before
all units with ConditionFirstBoot=yes ran would lead to the next boot
finding a valid machine-id, thus not being marked first boot and not
re-running these units.
To make the first boot mechanism more robust, instead of writing
/etc/machine-id very early, fill it with a marker value "uninitialized"
and overmount it with a transiently provisioned machine-id. Then, after
the first boots completes (when systemd-machine-id-commit.service runs),
write the real machine-id to disk.
This mechanism is of course only invoked on first boot. If a first boot
is not detected, the machine-id is handled as previously.
Fixes: #4511
If the first boot was aborted, /etc/machine-id might read as
"uninitialized" in some cases. Add a separate case for this
instead of printing a confusing error message.
When systemd-repart runs from initramfs, it reads out /etc/machine-id
from the rootfs as a seed for partition UUIDs. However, the machine-id
could be in an "uninitialized" state from a previous failed first boot.
In this situation the -ENOMEDIUM code-path (no machine-id set) should be
taken.
When nspawn starts an image, this image could be in any state, including
an aborted first boot. For this case, it needs to correctly handle the
situation like there was no machine-id at all.
Add a new ID128_PLAIN_OR_UNINIT format which treats the string
"uninitialized" like the file was empty and return -ENOMEDIUM. This
format should be used when reading an /etc/machine-id file from an image
that is not currently running.
p itself is never null. Because of this, we would always
call sd_notify() in cleanup, even though the intention was to only
call it if notify_start() was executed.
When /etc/machine-id contains the string "uninitialized" instead of
a valid machine-id, treat this like the file was missing and mark this
boot as the first (-> units with ConditionFirstBoot=yes will run).
Negative value means there is no match between a PCI device and any of
the slots. In the following commit we will extend this and value of 0
will indicate that there is a match between some slot and PCI device,
but that device is a PCI bridge.
let's add informational logging about each client requested signal
sending. While we are at, let's beef up error handling/log messages in
this case quite a bit: let's log errors both to syslog and report errors
back to client.
Fixes: #17254
--kill-who=main-fail never worked correctly, due to a copy and paste
mistake in ac5e3a505e, where the same item
was listed twice. The mistake was
later noticed, but fixed incorrectly, in
201f0c916d.
Let's list all *-fail types correctly, finally.
And while we are at it, add a nice comment and generate a prettier D-Bus
error about this.
Let's mark the whole /run/host hierarchy as something to ignore by PID 1
for generation of .mount units, i.e. consider it as "extrinsic".
By unifying container mgr supplied resources in one dir it's also easy
to exclude the whole lot from PID1's management inside the container.
This is the right thing to do, since from the payload's PoV these mounts
are just API and not manipulatable as they are established, managed and
owned by the container manager, not the payload.
(While we are it, also add the boot ID mount to the existing list, as
nspawn and other container managers overmount that too, typically, and
it is thus owned by the container manager and not the payload
typically.)
I can't think of any real vulnerability about this, but it still feels
better to check a variable with "secure" in its name with
secure_getenv() rather than plain getenv().
Paranoia FTW!
This might fix#17025:
> the call trace is
> bus_ensure_running -> sd_bus_process -> bus_process_internal -> process_closeing --> sd_bus_close
> |
> \-> process_match
We ended doing callouts to the Disconnected matches from bus_ensure_running()
and shouldn't. bus_ensure_running() should never do callouts. This change
should fix this however: once we notice that the connection is going down we
will now fail instantly with ENOTOCONN instead of calling any callbacks.
This should not change any behavior, as currently link_free_engines() is
always called after all addresses are dropped. But the function may be
used in other places in the future. So, let's also stop the clients.
Before this commit, event when Gateway=_dhcp4 or _ra is set, the
route was configured with 'protocol static', and other properties
specified by RouteTable=, RouteMTU=, or etc, were ignored.
This commit makes set the route protocol based on the protocol the
gateway address is obtained, and apply other settings if it is not
explicitly specified in the [Route] section.
bootctl implements three types of operation: those that work with an EFI
boot loader, those which work with any EFI boot loader that implements
the boot loader spec + interface, and finally those specific to sd-boot.
Previously the --help text and the man page mixed them all up. Let's put
them clearly in three separate sections however, to communicate clearly
what is supposed to work everywhere, and what is specific to
systemd-boot or boot loaders implementing the two specs.
This adjusts wording here and there, but is mostly just about
re-ordering existing docs, and putting them under new sections.
We close fds in two phases, first some and then the some more. When passing
a list of fds to exclude from closing to the closing function, we would
pass some in an array and the rest as separate arguments. For the fds which
should be excluded in both closing phases, let's always create the array
and put the relevant fds there. This has the advantage that if more fds to
exclude in both phases are added later, we don't need to add more positional
arguments.
The list passed to setup_pam() is not changed. I think we could pass more fds
to close there, but I'm leaving that unchanged.
The setting of FD_CLOEXEC on an already open fds is dropped. The fd is opened
in service_allocate_exec_fd() and there is no reason to suspect that it might
have been opened incorrectly. If some rogue code is unsetting our FD_CLOEXEC
bits, then it might flip any fd, no reason to single this one out.
We would return ENOENT, which is extremely confusing. Strace is not helpful because
no *file* is actually missing. So let's add some logs at debug level and also use
a custom return code. Let all user-facing utilities print a custom error message
in that case.