... when called with a valid environment variable name. This means that
any time we call it with a fixed string, it is guaranteed to return 0.
(Also when the variable is not present in the environment block.)
This fixes a race where a block device that pops up and is immediately
locked (such as a loopback device in preparation) might result in
udev never running any rules for it, and thus never turning on inotify
watching for it (as inotify watching is controlled via an option set via
udev rules), and thus not noticing when the device is unlocked/closed again
(which is noticed via IN_CLOSE_WRITE inotify events).
This changes two things:
1. Whenever we encounter a locked block device we'll now inotify watch
it, so that it is guaranteed we'll notice when the BSD lock fd is
closed again, and will reprobe.
2. We'll now turn off inotify watching again once we realise the
udev rules don't actually want that. Previously, once watching a
device was enabled via a udev rule, it would be watched forever until
the device disappeared, even if the option was dropped by the rules
for later events.
Together this will make sure that we'll watch the device via inotify
in both of the following cases:
a) The block device has been BSD locked when udev wanted to look at it
b) The udev rules run for the last seen event for the device say so
In all other cases inotify is off for block devices.
This new behaviour both fixes the race and makes the most sense,
as the rules (when they are run) actually really control the watch state
now. And if someone BSD locks a block device then it should be OK to
inotify watch it briefly until the lock is released again as the user
this way more or less opts into the locking protocol.
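
As a rough illustration of the BSD locking behaviour described above, the following sketch takes an flock() lock through one file descriptor and shows that a second opener cannot lock the node until the first fd is closed. It is not udev code: a temporary file stands in for the block device node, and the helper names try_bsd_lock() and lock_demo() are made up for this example.

```python
import fcntl
import os
import tempfile


def try_bsd_lock(fd):
    """Try to take an exclusive BSD lock without blocking; True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def lock_demo():
    # Assumption: a temporary file stands in for the block device node.
    with tempfile.NamedTemporaryFile() as node:
        holder = os.open(node.name, os.O_RDONLY)  # e.g. a tool preparing the device
        prober = os.open(node.name, os.O_RDONLY)  # e.g. whoever wants to probe it
        try:
            held = try_bsd_lock(holder)
            busy = not try_bsd_lock(prober)
            # Closing the last fd of the locking open file description
            # releases the lock; this is the moment a watcher wants to see.
            os.close(holder)
            holder = -1
            free_again = try_bsd_lock(prober)
            return held, busy, free_again
        finally:
            if holder >= 0:
                os.close(holder)
            os.close(prober)
```

The lock belongs to the open file description, which is why two separate open() calls conflict even within one process.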
The parent process may not perform any label operation, so the
database might not get updated on a SELinux policy change on its own.
Reload the label database once on a policy change, instead of n times
in every started child.
On systemd systems we generally don't need to chdir() to root, we don't
need to set up /dev/ ourselves (as PID 1 does that during earliest boot),
and we don't need to set the OOM adjustment values, as that's done via
unit files.
Hence, drop this. If people want to use udev from other init systems
they should do this on their own. I am very sure it's a good thing to do
it from outside of udevd, so that fewer privileges are required by udevd. In
particular the dev_setup() stuff is something that people who build
their own non-systemd distros want to set up themselves anyway, in
particular as they already have to mount devtmpfs themselves anyway.
Note that this only drops stuff that isn't really necessary for testing
stuff, i.e. process properties and settings that don't matter if you
quickly want to invoke udev from a terminal session to test something.
Since the separate binaries contain mostly the same code,
this almost halves the size of the installation.
before:
398K /bin/udevadm
391K /lib/systemd/systemd-udevd
after:
431K /bin/udevadm
0 /lib/systemd/systemd-udevd -> ../../bin/udevadm
Fixes: #14200
If someone implements https://systemd.io/BLOCK_DEVICE_LOCKING/ then we
shouldn't loudly complain about that.
This reverts back to the original behaviour from
3ebdb81ef0: when the lock is taken we
silently skip processing the device and sending out the messages for it.
We always need to make them unions with a "struct cmsghdr" in them, so
that things are properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing are mentioned in the
cmsg(3) man page.
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
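
The MSG_CTRUNC handling described above can be sketched roughly as follows. This is a Python illustration and not the C code of the patch: recv_fds_checked(), send_fds() and ControlDataTruncated are hypothetical names, and the control buffer is deliberately sized with CMSG_LEN() so that truncation is easy to provoke.

```python
import array
import os
import socket


class ControlDataTruncated(Exception):
    """Raised when the kernel reports MSG_CTRUNC for a received message."""


def send_fds(sock, payload, fds):
    # Pass the fds as a single SCM_RIGHTS control message.
    sock.sendmsg([payload],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))])


def recv_fds_checked(sock, max_fds, bufsize=1):
    fds = array.array("i")
    data, ancdata, flags, _ = sock.recvmsg(
        bufsize, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a trailing partial int if the data was cut short.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if flags & socket.MSG_CTRUNC:
        # The buffer was too small: close whatever did arrive and fail
        # loudly instead of continuing one fd short.
        for fd in fds:
            os.close(fd)
        raise ControlDataTruncated("control buffer too small for received fds")
    return data, list(fds)
```

Propagating the error rather than returning a partial fd list keeps the caller from silently getting out of sync with the sender.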
Up to now each uevent logs the following things at debug level:
- Device is queued
- Processing device
- Device processed
However when the device is queued it might still have to wait for
earlier devices to be processed before being able to start being
processed itself. When analysing logs this dependency information is
quite crucial, so add respective debug log calls.
If udevd receives an exit signal, it releases its reference on the udev
monitor in manager_exit(). If at this time a worker is hanging, and if
the event timeout for this worker expires before udevd exits, udevd
crashes in on_sigchld()->udev_monitor_send_device(), because the monitor
has already been freed.
Fix this by testing the validity of manager->monitor in on_sigchld().
If udevd receives an exit signal, it releases its reference on the udev
monitor in manager_exit(). If at this time a worker is hanging, and if
the event timeout for this worker expires before udevd exits, udevd
crashes in on_sigchld()->udev_monitor_send_device(), because the monitor
has already been freed.
Fix this by releasing the main process's monitor ref later, in
manager_free().
On some systems with lots of devices, device probing for certain drivers can
take a very long time. If systemd-udevd detects a timeout and kills the worker
running modprobe using SIGKILL, some devices will not be probed, or end up in an
unusable state. The --event-timeout option can be used to modify the maximum
time spent in a uevent handler. But if systemd-udevd exits, it uses a
different timeout, hard-coded to 30s, and exits when this timeout expires,
causing all workers to be KILLed by systemd afterwards. In practice, this may
lead to workers being killed after significantly less time than specified with
the event-timeout. This is particularly significant during initrd processing:
systemd-udevd will be stopped by systemd when initrd-switch-root.target is
about to be isolated, which usually happens quickly after finding and mounting
the root FS.
If systemd-udevd is started by PID 1 (i.e. basically always), systemd will
kill both udevd and the workers after expiry of TimeoutStopSec. This is
actually better than the built-in udevd timeout, because it's more transparent
and configurable for users. This way users can avoid the mentioned boot problem
by simply increasing TimeoutStopSec= in systemd-udevd.service.
If udevd is not started by systemd (standalone), this is still an
improvement. udevd will kill hanging workers when the event timeout is
reached, which is configurable via the udev.event_timeout= kernel
command line parameter. Before this patch, udevd would simply exit with
workers still running, which would then become zombie processes.
With the timeout removed, the sd_event_now() assertion in manager_exit() can be
dropped.
We'd log to the "console", losing structured logs during configuration file parsing.
Let's be nice to journalctl users, and log to the journal immediately.
# udevadm control --property=HELLO=WORLD
Received udev control message (ENV), unsetting 'HELLO'
# udevadm control --property=HELLO=
Received udev control message (ENV), setting 'HELLO='
Oh no, it's busted. Let's try removing this one little negation real quick
to see if it helps...
# udevadm control --property=HELLO=WORLD
Received udev control message (ENV), setting 'HELLO=WORLD'
# udevadm control --property=HELLO=
Received udev control message (ENV), unsetting 'HELLO'
Feels much better now.
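
The intended semantics shown in the transcript (a non-empty value sets the property, an empty value unsets it) can be sketched like this. This is an illustration only; parse_property() is a made-up helper and not the function touched by the patch.

```python
def parse_property(arg):
    """Turn 'KEY=VALUE' into an action tuple; an empty value means unset."""
    key, sep, value = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {arg!r}")
    if value:
        return ("set", key, value)
    # 'KEY=' with nothing after the equals sign removes the property.
    return ("unset", key, None)
```

The bug amounted to having the two branches swapped, which a check on both inputs from the transcript would have caught.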
This does the following:
- rename enum udev_builtin_cmd -> UdevBuiltinCmd
- rename struct udev_builtin -> UdevBuiltin
- move type definitions to udev-rules.h
- move prototypes of functions defined in udev-rules.c to udev-rules.h
- stop using strbuf
- propagate critical errors in applying rules
- drop the limit on the number of tokens per line
Follow-up for faae64fa3d, which increased the
default number of udev workers per cpu regardless of how big the system is.
It's not really clear from the commit message if the new number of workers
improved the overall time for the boot process or only reduced the number of
times the max number of children limit was reached (and in this case
5406c36844 commit might have been more appropriate in the first place).
But systems with ~1000 CPUs are not rare these days and the worker numbers get
quite large with CPU factor of 8. Spawning more than 2000 workers can't be
healthy on any system, no matter how big.
Indeed the main mistake is the belief that udev is CPU-intensive, and thus the
number of allowed workers has to increase with the number of CPUs. It is not,
and probably never has been. It's I/O bound, and sometimes bound by resources
such as locks.
This is an argument to:
- scale only weakly with the number of CPUs, hence the switch back to a scale
factor of C=2, but with a higher offset, which should only affect systems
with a small number of CPUs. With this patch applied the offset is increased
from O=8 to O=16.
- put an absolute maximum limit to make sure no more than 2048 workers are
spawned no matter how big the system is.
This still provides more workers for the laptop cases (where the number of CPUs
is limited), while avoiding sky-rocketing numbers for big systems.
Note that on most desktop systems, the memory limit will kick in. The following
table collects numbers about children-max. For each scenario, the first column
is the "cpu_limit" limit, and the second number is the minimum amount of memory
for the "cpu_limit" limit to become relevant (with less RAM, memory will limit
the number of children thus "mem_limit" will become the active limit).
     |    > v240     |    < v240     |  this patch   |
CPUs | C = 8, O = 8  | C = 2, O = 8  | C = 2, O = 16 |
------------------------------------------------------
   1 |    16       2 |    10     1.3 |    18       2 |
   2 |    24       3 |    12     1.5 |    20       2 |
   4 |    40       5 |    16       2 |    24       3 |
   8 |    72       9 |    24       3 |    32       4 |
  16 |   136      17 |    40       5 |    48       5 |
  64 |   520      65 |   136      17 |   144      18 |
1024 |  8200    1025 |  2056     263 |  2048     256 |
2048 | 16392    2049 |  4104     513 |  2048     256 |
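
The "cpu_limit" column of the table can be reproduced with a small sketch, assuming the limit is simply CPUs * C + O with an optional absolute cap. The function name and parameters are illustrative and not taken from the udevd source, and the memory-based limit is not modelled here.

```python
def cpu_limit(n_cpus, scale=2, offset=16, cap=2048):
    """CPU-based children-max: n_cpus * C + O, optionally capped."""
    limit = n_cpus * scale + offset
    return limit if cap is None else min(limit, cap)
```

For example, cpu_limit(64) gives the 144 shown for this patch, while cpu_limit(1024, scale=8, offset=8, cap=None) gives the uncapped 8200 of the C = 8 scheme.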
This patch is mainly based on Martin Wilck's analysis and comments.
When booting with "udev.log-priority=debug" for example, the output might be
spammed with messages like this:
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
systemd-udevd[23545]: maximum number (248) of children reached
While the message itself is useful, printing it per batch of events should be
enough.
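
One possible way to log the message only once per batch is a flag that is reset whenever a new batch starts. The sketch below is an assumption about the approach, not the actual change; ChildLimitLogger and its methods are hypothetical names.

```python
class ChildLimitLogger:
    """Emit the 'children reached' message at most once per batch of events."""

    def __init__(self, emit=print):
        self._emit = emit
        self._logged = False

    def start_batch(self):
        # A new batch of events may log the message again.
        self._logged = False

    def limit_reached(self, children_max):
        if self._logged:
            return
        self._emit(f"maximum number ({children_max}) of children reached")
        self._logged = True
```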
Originally commented as "devices names might have changed/swapped in the meantime",
but that may not be the case. For safety, let's block the following events with the same
devpath.
This may fix #6514.