If /sys/class/OOO node is created and destroyed during booting (kernle driver initialization fails),
systemd-udev-trigger.service fails due to race condition.
***** race condition ***********************************************************************************
1. kernel driver create /sys/class/OOO
2. systemd-udev-trigger.service execues "/usr/bin/udevadm trigger --type=devices --action=add"
3. device_enumerator_scan_devices() => enumerator_scan_devices_all() => enumerator_scan_dir("class") =>
opendir("/sys/class") and iterate all subdirs ==> enumerator_scan_dir_and_add_devices("/sys/class/OOO")
4. kernel driver fails and destroy /sys/class/OOO
5. enumerator_scan_dir_and_add_devices("/sys/class/OOO") fails in opendir("/sys/class/OOO")
6. "systemd-udev-trigger.service" fails
7. udev coldplug fails and some device units not ready
8. mount units asociated with device units fail
9. local-fs.target fails
10. enters emergency mode
********************************************************************************************************
***** status of systemd-udev-trigger.service unit ******************************************************
$ systemctl status systemd-udev-trigger.service
systemd-udev-trigger.service - udev Coldplug all Devices
Loaded: loaded (/usr/lib/systemd/system/systemd-udev-trigger.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2020-01-02 13:16:54 KST; 22min ago
Docs: man:udev(7)
man:systemd-udevd.service(8)
Process: 2162 ExecStart=/usr/bin/udevadm trigger --type=subsystems --action=add (code=exited, status=0/SUCCESS)
Process: 2554 ExecStart=/usr/bin/udevadm trigger --type=devices --action=add (code=exited, status=1/FAILURE)
Main PID: 2554 (code=exited, status=1/FAILURE)
Jan 02 13:16:54 localhost udevadm[2554]: Failed to scan devices: No such file or directory
Jan 02 13:16:54 localhost systemd[1]: systemd-udev-trigger.service: Main process exited, code=exited, status=1/FAILURE
Jan 02 13:16:54 localhost systemd[1]: systemd-udev-trigger.service: Failed with result 'exit-code'.
Jan 02 13:16:54 localhost systemd[1]: Failed to start udev Coldplug all Devices.
*******************************************************************************************************
***** journal log with Environment=SYSTEMD_LOG_LEVEL=debug in systemd-udev-trigger.service ***********
Jan 01 21:57:20 localhost udevadm[2039]: sd-device-enumerator: Scanning /sys/bus
Jan 01 21:57:20 localhost udevadm[2522]: sd-device-enumerator: Scan all dirs
Jan 01 21:57:20 localhost udevadm[2522]: sd-device-enumerator: Scanning /sys/bus
Jan 01 21:57:21 localhost udevadm[2522]: sd-device-enumerator: Scanning /sys/class
Jan 01 21:57:21 localhost udevadm[2522]: sd-device-enumerator: Failed to scan /sys/class: No such file or directory
Jan 01 21:57:21 localhost udevadm[2522]: Failed to scan devices: No such file or directory
*******************************************************************************************************
Follow-up for 1cdbff1c84.
After the commit 1cdbff1c84, each entry .conf contains
redundant slash like the following:
```
$ cat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-5.9.8-200.fc33.x86_64.conf
title Fedora 33 (Thirty Three)
version 5.9.8-200.fc33.x86_64
machine-id xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
options root=/dev/nvme0n1p2 ro rootflags=subvol=system/fedora selinux=0 audit=0
linux //xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/5.9.8-200.fc33.x86_64/linux
initrd //xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/5.9.8-200.fc33.x86_64/initrd
```
Devices with multicast but without mac addresses i.e. tun devices
are not getting setuped correctly:
$ ip tuntap add mode tun dev tun0
$ ip addr show tun0
16: tun0: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
$ cat /etc/systemd/network/tun0.network
[Match]
Name = tun0
[Network]
Address=192.168.1.1/32
$ ./systemd-networkd
tun0: DHCP6 CLIENT: Failed to set identifier: Invalid argument
tun0: Failed
Otherwise if a daemon-reload happens somewhere between the enqueue of the job
start for the scope unit and scope_start() then u->pids might be lost and none
of the processes specified by "PIDs=" will be moved into the scope cgroup.
These three syscalls are internally used by libc's memory allocation
logic, i.e. ultimately back malloc(). Allocating a bit of memory is so
basic, it should just be in the default set.
This fixes a couple of issues with asan/msan and the seccomp tests: when
asan/msan is used some additional, large memory allocations take place
in the background, and unless mmap/mmap2/brk are allowlisted these will
fail, aborting the test prematurely.
This reverts commit 34136e1503.
Having the "%H" host name specifier in a DNSSD service name template
triggers a failed assertion during name template instantiation as
specifier_dnssd_host_name expects DnssdService in its userdata
pointer but finds NULL instead.
test_oomd_cgroup_context_acquire_and_insert reads the live cgroup data used
by the unit test. Under certain conditions, the memory pressure for the cgroup
can be non-zero (although most of the time it is 0 since these tests don't
generate much pressure).
Since these values are too dependent on the state of the system, remove the
checks. The type used is always >= 0 and test-psi-util already unit tests that
PSI values are parsed correctly from files so this test is redundant anyways.
Follow-up for ac24e418d9.
The original motivation of the commit and RFE #15339 is to start dhcpv6
client in managed mode when neither M nor O flag is set in the RA.
But, previously, if the setting is set to "always", then the DHCPv6
client is always started in managed mode even if O flag is set in the
RA. Such the behavior breaks RFC 7084.
The configuration of networkd has a DHCPv6Client setting in its
[IPv6AcceptRA] section, which, according to the man page, can be
a boolean, or the special value "always". The man page states
that "true" is the default.
The default value is implemented in src/network/networkd-network.c
by setting field ipv6_accept_ra_start_dhcp6_client of network to
true. However, this field is not a boolean, but an enum type
IPv6AcceptRAStartDHCP6Client (src/network/networkd-ndisc.h).
Setting ipv6_accept_ra_start_dhcp6_client to true effectively
corresponds to the enum value IPV6_ACCEPT_RA_START_DHCP6_CLIENT_ALWAYS,
resulting in the DHCPv6Client setting having the default value
"always".
This patch changes the initialisation to the correct enum value
IPV6_ACCEPT_RA_START_DHCP6_CLIENT_YES.
This is useful for development where overwriting files out side
the configured prefix will affect the host as well as stateless
systems such as NixOS that don't let packages install to /etc but handle
configuration on their own.
Alternative to https://github.com/systemd/systemd/pull/17501
tested with:
$ mkdir inst build && cd build
$ meson \
-Dcreate-log-dirs=false \
-Dsysvrcnd-path=$(realpath ../inst)/etc/rc.d \
-Dsysvinit-path=$(realpath ../inst)/etc/init.d \
-Drootprefix=$(realpath ../inst) \
-Dinstall-sysconfdir=false \
--prefix=$(realpath ../inst) ..
$ ninja install
":" is prettier, but meson 0.56+ doesn't like it:
src/systemd/meson.build:73: DEPRECATION: ":" is not allowed in test name "cc-sd-bus.h:c", it has been replaced with "_"
src/systemd/meson.build:73: DEPRECATION: ":" is not allowed in test name "cc-sd-bus.h:c-ansi", it has been replaced with "_"
...
Fixes#17568.
With the grandparent change to move most units to app.slice,
those units would be ordered After=app.slice which doesn't make any sense.
Actually they appear earlier, before the manager is even started, and
conceputally it doesn't seem useful to put them under any slice.
... when called with a valid environment variable name. This means that
any time we call it with a fixed string, it is guaranteed to return 0.
(Also when the variable is not present in the environment block.)
Coverity in CID#1435966 was complaining that s->enabled is not "restored" in
all cases. But the code was actually correct, since it should only be
"restored" in the error paths. But let's still make this prettier by not setting
the state before all operations that may fail are done.
We need to set .enabled for the prioq reshuffling operations, so move those down.
No functional change intended.
Previously it was very likely, when multiple contenders for the symlink
appear in parallel, that algorithm would select wrong symlink (i.e. one
with lower-priority).
Now the algorithm is much more defensive and when we detect change in
set of contenders for the symlink we reevaluate the selection. Same
happens when new symlink replaces already existing symlink that points
to different device node.
Note that st_mtime member of struct stat is defined as follows,
#define st_mtime st_mtim.tv_sec
Hence we omitted checking nanosecond part of the timestamp (struct
timespec) and possibly would miss modifications that happened within the
same second.
FixedRandomDelay=yes will use
`siphash24(sd_id128_get_machine() || MANAGER_IS_SYSTEM(m) || getuid() || u->id)`,
where || is concatenation, instead of a random number to choose a value between
0 and RandomizedDelaySec= as the timer delay.
This essentially sets up a fixed, but seemingly random, offset for each timer
iteration rather than having a random offset recalculated each time it fires.
Closes#10355
Co-author: Anita Zhang <the.anitazha@gmail.com>
Quoting the manual page of stime(2): "Starting with glibc 2.31, this function
is no longer available to newly linked applications and is no longer declared
in <time.h>."
If we encounter an RR that has no matching signature, then we don't know
whether it was expanded from a wildcard or not. We need to accept that
and not make the NSEC test fail, just skip over the RR.
The three answer sections can only carry up to UINT16_MAX entries, hence
put a hard upper limit on how far DnsAnswer can grow. The three count
fields in the DNS packet header are 16 bit only, hence the limit.
If code actually tries to add more than 64K RRs it will get ENOSPC with
this new checking.
And similar to DnsQuestion.
The current overflow checking is broken in the corner case of the strings'
combined length being exactly SIZE_MAX: After the loop, l would be SIZE_MAX,
but we're not testing whether the l+1 expression overflows.
Fix it by simply pre-accounting for the final '\0': initialize l to 1 instead
of 0.
The loops over (x, then all varargs, until a NULL is found) can be written much
simpler with an ordinary for loop. Just initialize the loop variable to x, test
that, and in the increment part, fetch the next va_arg(). That removes a level
of indentation, and avoids doing a separate strlen()/stpcpy() call for x.
While touching this code anyway, change (size_t)-1 to the more readable
SIZE_MAX.
This beefs up the READ_FULL_FILE_CONNECT_SOCKET logic of
read_full_file_full() a bit: when used a sender socket name may be
specified. If specified as NULL behaviour is as before: the client
socket name is picked by the kernel. But if specified as non-NULL the
client can pick a socket name to use when connecting. This is useful to
communicate a minimal amount of metainformation from client to server,
outside of the transport payload.
Specifically, these beefs up the service credential logic to pass an
abstract AF_UNIX socket name as client socket name when connecting via
READ_FULL_FILE_CONNECT_SOCKET, that includes the requesting unit name
and the eventual credential name. This allows servers implementing the
trivial credential socket logic to distinguish clients: via a simple
getpeername() it can be determined which unit is requesting a
credential, and which credential specifically.
Example: with this patch in place, in a unit file "waldo.service" a
configuration line like the following:
LoadCredential=foo:/run/quux/creds.sock
will result in a connection to the AF_UNIX socket /run/quux/creds.sock,
originating from an abstract namespace AF_UNIX socket:
@$RANDOM/unit/waldo.service/foo
(The $RANDOM is replaced by some randomized string. This is included in
the socket name order to avoid namespace squatting issues: the abstract
socket namespace is open to unprivileged users after all, and care needs
to be taken not to use guessable names)
The services listening on the /run/quux/creds.sock socket may thus
easily retrieve the name of the unit the credential is requested for
plus the credential name, via a simpler getpeername(), discarding the
random preifx and the /unit/ string.
This logic uses "/" as separator between the fields, since both unit
names and credential names appear in the file system, and thus are
designed to use "/" as outer separators. Given that it's a good safe
choice to use as separators here, too avoid any conflicts.
This is a minimal patch only: the new logic is used only for the unit
file credential logic. For other places where we use
READ_FULL_FILE_CONNECT_SOCKET it is probably a good idea to use this
scheme too, but this should be done carefully in later patches, since
the socket names become API that way, and we should determine the right
amount of info to pass over.
When selinux is enabled, the call of
manager_rtnl_enumerate_nexthop() fails.
This fix is to facilitate selinux hook handling for enumerating
nexthop.
In manager_rtnl_enumerate_nexthop() there is a check
if "Not supported" is returned by the send_netlink() call.
This check expects that -EOPNOTSUPP is returned,
the selinux hook seems to return -EINVAL instead.
This happens in kernel older than 5.3
(more specificallytorvalds/linux@65ee00a) as it does not support
nexthop handling through netlink.
And if SELinux is enforced in the order kernel, callingRTM_GETNEXTHOP
returns -EINVAL.
Thus adding a call in the manager_rtnl_enumerate_nexthop for the
extra return -EINVAL.
* Existing valid rule files written with KEY="value" are not affected
* Now, KEY=e"value\n" becomes valid. Where `\n` is a newline character
* Escape sequences supported by src/basic/escape.h:cunescape() is
supported