We defined both $(VERSION) and $(PACKAGE_VERSION) with the same contents.
$(PACKAGE_VERSION) is slightly more descriptive, so settle on that, and
drop the other define.
For IBM PowerVM Virtual I/O network devices, we can build predictable names
based on the slot number passed as part of the OF "reg" property. Valid slot
numbers range from 2 to 32767, so we only need the bottom half of the unit
address passed.
For example:
/proc/device-tree/vdevice/l-lan@30000002
/proc/device-tree/vdevice/vnic@30000005
would initially map to something like:
/sys/devices/vio/30000002/net/eth0
/sys/devices/vio/30000005/net/eth1
and would then translate to env2 and env5.
This patch ignores the bus number, as there should only ever be one bus, and
then removes leading zeros.
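A minimal C sketch of the naming rule above. It assumes the unit address arrives as the hex string after the "@" in the device-tree node name and that the slot is printed in decimal. `vio_slot_name()` is a hypothetical helper for illustration, not the udev implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not udev's code): build an "env<slot>" name from a
 * VIO unit address such as "30000002". Only the low 16 bits, the slot, are
 * used; the bus number in the upper half is ignored. Returns 0 on success,
 * or -1 if the address does not parse or the slot is outside 2..32767. */
int vio_slot_name(const char *unit_address, char *buf, size_t size) {
        unsigned long addr, slot;
        char *end;
        int n;

        assert(unit_address);
        assert(buf);

        addr = strtoul(unit_address, &end, 16);   /* unit address is hex */
        if (end == unit_address || *end != '\0')
                return -1;

        slot = addr & 0xFFFFUL;                   /* bottom half only */
        if (slot < 2 || slot > 32767)
                return -1;

        /* decimal output drops the leading zeros: 0x0002 -> "2" */
        n = snprintf(buf, size, "env%lu", slot);
        if (n < 0 || (size_t) n >= size)
                return -1;
        return 0;
}
```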
The builtin path id for virtio block devices has been changed
to use the bus id without a prefix "virtio-pci" to be
compatible with all virtio transport types.
In order to not break existing setups, the by-path symlinks for
virtio block devices on the PCI bus are reintroduced by udev rules.
The virtio-pci symlinks are considered to be deprecated and
should be replaced by the native PCI symlinks.
Example output for a virtio disk in PCI slot 7:
$ ls /dev/disk/by-path
pci-0000:00:07.0
pci-0000:00:07.0-part1
virtio-pci-0000:00:07.0
virtio-pci-0000:00:07.0-part1
See also
[1] https://lists.freedesktop.org/archives/systemd-devel/2017-February/038326.html
[2] https://lists.freedesktop.org/archives/systemd-devel/2017-March/038397.html
This reverts f073b1b but keeps the same symlinks for compatibility.
The CCW id_net_name_path detection didn't account for virtio
interfaces on the CCW bus. As a result the default interface
names for virtio-ccw interfaces would use the old eth<x>
format instead of enc<busid>.
Since virtio-pci interface naming follows the naming rules
of the parent bus, the names_ccw() logic was changed to apply
the CCW interface naming rules to virtio interfaces as well,
e.g. enc2000 for an interface with a CCW bus id 0.0.2000.
As virtio interfaces are apt to get the otherwise unusual
CCW bus id 0.0.0000, the last '0' is now preserved in this
case.
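The zero stripping can be sketched as follows. `ccw_bus_id_short()` is a hypothetical helper; it assumes the bus id is the usual "cssid.ssid.devno" string and only illustrates the rule, including the special case of an all-zero id.

```c
#include <assert.h>
#include <string.h>

/* Skip leading '0' and '.' characters of a CCW bus id. If nothing else is
 * left (e.g. "0.0.0000", which virtio-ccw devices may get), keep the final
 * '0' so the shortened id is never empty. */
const char *ccw_bus_id_short(const char *bus_id) {
        size_t n;

        assert(bus_id);
        n = strspn(bus_id, ".0");
        if (bus_id[n] == '\0' && n > 0)
                n--;                    /* preserve the last '0' */
        return bus_id + n;
}
```

Prefixing the result with "enc" gives enc2000 for 0.0.2000 and enc0 for 0.0.0000.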
The virtio subsystem skipping loop has been moved from
names_pci() into a function skip_virtio() that can be reused
for all bus types with virtio network devices.
Since virtio-ccw interfaces use single CCW addresses, the ccwgroup
requirement was relaxed and the C definitions were changed
accordingly.
strsignal() sucks, as it tries to generate human readable strings from
something that isn't really human readable by concept. Let's use
signal_to_string() instead, making this more grokkable. The difference:
SIGINT is now rendered as "SIGINT" rather than "Interrupted".
Instead of using a temp buffer to replace whitespace in variable
substitutions, just allow util_replace_whitespace to replace in-place.
Add a comment to util_replace_whitespace indicating it is used to replace
in-place, to prevent accidental future breakage.
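Why in-place replacement is safe can be seen from a sketch of such a function. `replace_whitespace_sketch()` is modeled loosely on udev's util_replace_whitespace(); the behaviour assumed here (drop leading and trailing whitespace, collapse each inner run to a single '_') is an assumption. Because the write index never overtakes the read index, passing the same buffer as source and destination works.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy `from` to `to` (at most len - 1 chars plus NUL), dropping leading and
 * trailing whitespace and turning each inner whitespace run into one '_'.
 * Safe with to == from: the write index j never passes the read index i,
 * because an extra '_' is only written after at least one skipped char. */
size_t replace_whitespace_sketch(const char *from, char *to, size_t len) {
        bool pending = false;   /* whitespace seen since the last copied char */
        size_t i, j = 0;

        assert(from);
        assert(to);
        if (len == 0)
                return 0;

        for (i = 0; from[i] != '\0' && j < len - 1; i++) {
                if (isspace((unsigned char) from[i])) {
                        if (j > 0)
                                pending = true;   /* leading whitespace is ignored */
                        continue;
                }
                if (pending) {
                        to[j++] = '_';
                        pending = false;
                        if (j >= len - 1)
                                break;
                }
                to[j++] = from[i];
        }
        to[j] = '\0';
        return j;
}
```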
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
If the string_escape option is either unset or 'replace' (i.e. if it is
not 'none'), then enable whitespace replacement in SYMLINK variable
substitution values, as added in the last patch.
This will keep any whitespace that is directly contained in a SYMLINK
value, but will replace any whitespace that is added to the SYMLINK
value as a result of variable substitution (except $result/%c).
This fixes bug 4833.
If replace_whitespace is true, each substitution value has all its
whitespace removed/replaced by util_replace_whitespace (except the
SUBST_RESULT substitution - $result{} or %c{} - which handles spaces
itself as field separators). All existing callers are updated to
pass false, so no functional change is made by this patch.
This is needed so the SYMLINK assignment can replace any spaces
introduced through variable substitution, because the SYMLINK value is
a space-separated list of symlinks to create. Any variables that
contain spaces will thus unexpectedly change the symlink value from
a single symlink to multiple incorrectly-named symlinks.
This is used in the next patch, which enables the whitespace
replacement for SYMLINK variable substitution.
This improves kernel command line parsing in a number of ways:
a) A kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar=xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_", which
was a source of confusion for users (well, at least for one user: myself, I
just couldn't remember that it's systemd.debug-shell, not
systemd.debug_shell). Considering both as equivalent is inspired by how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically, this means systemd.debug-shell and
systemd.debug_shell=1 are now entirely equivalent.
d) Specifying a kernel command line option that takes an argument without
one will now result in a log message. E.g. passing just "systemd.unit" will
now result in a complaint that it needs an argument. This is implemented in
the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
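Points a) and c) can be illustrated with two small helpers. Both are hypothetical sketches, not the proc_cmdline_ API: `cmdline_key_equal()` compares option names only (the caller is assumed to have split off the value at '='), and the set of boolean spellings accepted by `cmdline_value_bool()` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Compare two option names, treating '-' and '_' as the same character. */
bool cmdline_key_equal(const char *a, const char *b) {
        assert(a);
        assert(b);
        for (;; a++, b++) {
                char x = *a == '-' ? '_' : *a;
                char y = *b == '-' ? '_' : *b;

                if (x != y)
                        return false;
                if (x == '\0')
                        return true;
        }
}

/* Parse a boolean-like option value. A missing value (bare "foobar", passed
 * as NULL) counts as true. Returns 1 or 0, or -1 if the value is unknown. */
int cmdline_value_bool(const char *value) {
        static const char *const yes[] = { "1", "yes", "y", "true", "t", "on" };
        static const char *const no[] = { "0", "no", "n", "false", "f", "off" };

        if (!value)
                return 1;               /* "foobar" alone means "foobar=1" */
        for (size_t i = 0; i < sizeof yes / sizeof yes[0]; i++)
                if (strcasecmp(value, yes[i]) == 0)
                        return 1;
        for (size_t i = 0; i < sizeof no / sizeof no[0]; i++)
                if (strcasecmp(value, no[i]) == 0)
                        return 0;
        return -1;
}
```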
It is possible to specify only one quote in udev rules (" instead of "" for
an empty string), which is not detected as invalid quoting.
Technically this doesn't lead to a bug, because the string ends in two
terminating NUL characters at this position, but a user should still be
reminded that their configuration is invalid.
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings callbacks.
This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).
commit 0527f1c
commit 3f1ac7a700
commit 35afb33
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
This reverts some changes introduced in d054f0a4d4.
xsprintf should be used in cases where we calculated the right buffer
size by hand (using DECIMAL_STRING_MAX and such), and never in cases where
we are printing externally specified strings of arbitrary length.
Fixes #4534.
Switch drivers use the phys_port_name attribute to pass the front panel port
name to userspace. Use it to generate netdev names.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
This stripping is controlled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
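The effect of the new parameter can be sketched as a function that maps an option name to the name a caller should match, or to NULL if the option does not apply. `cmdline_effective_key()` and its parameters are hypothetical and only mirror the rule described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* in_initrd: whether we are running in the initramfs.
 * strip_rd:  the new boolean parameter; when false, names pass through
 *            unchanged (behaviour as before). */
const char *cmdline_effective_key(const char *key, bool in_initrd, bool strip_rd) {
        assert(key);

        if (!strip_rd)
                return key;

        if (strncmp(key, "rd.", 3) == 0)
                return in_initrd ? key + 3 : NULL;   /* rd.* only in the initrd */

        return key;                                  /* unprefixed: valid in both */
}
```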
- do not crash if an option without value is specified on the kernel command
line, e.g. "udev.log-priority" :P
- simplify the code a bit
- warn about unknown "udev.*" options — this should make it easier to spot
typos and reduce user confusion
As suggested here:
https://github.com/systemd/systemd/pull/4296#issuecomment-251911349
Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that
we can use a protocol-independent socket here if possible. This has the benefit
that our code will still work even if AF_INET/AF_INET6 is made unavailable (for
example via seccomp), at least on current kernels.
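A minimal sketch of that fallback. `ioctl_socket_fd()` is a hypothetical name, and NETLINK_GENERIC is assumed as the protocol for the fallback socket; the text above relies on the netlink socket also accepting the ioctl()s in question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/netlink.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a socket to use as an ioctl() handle (e.g. for SIOCETHTOOL).
 * Try AF_INET first; if that is unavailable (for example blocked via
 * seccomp), fall back to a generic netlink socket.
 * Returns the fd, or -1 with errno set. */
int ioctl_socket_fd(void) {
        int fd;

        fd = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
                fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_GENERIC);
        return fd;
}
```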
This appends the nvme name and namespace identifier attribute to the
PCI path for by-path links. Symlinks like the following are now present:
lrwxrwxrwx. 1 root root 13 Sep 16 12:12 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Sep 16 12:12 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
Cc: Michal Sekletar <sekletar.m@gmail.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
ethtool_sset_info: add some extra space to it.
Also fix a valgrind warning:
```
Unloaded link configuration context.
==31690==
==31690== HEAP SUMMARY:
==31690== in use at exit: 8,192 bytes in 2 blocks
==31690== total heap usage: 431 allocs, 429 frees, 321,164 bytes allocated
==31690==
==31690== 4,096 bytes in 1 blocks are still reachable in loss record 1 of 2
==31690== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==31690== by 0x166B32: mempool_alloc_tile (mempool.c:62)
==31690== by 0x166BBC: mempool_alloc0_tile (mempool.c:81)
==31690== by 0x15B8FC: hashmap_base_new (hashmap.c:732)
==31690== by 0x15B9F7: internal_hashmap_new (hashmap.c:766)
==31690== by 0x151291: conf_files_list_strv_internal (conf-files.c:103)
==31690== by 0x1514BA: conf_files_list_strv (conf-files.c:135)
==31690== by 0x13A1CF: link_config_load (link-config.c:227)
==31690== by 0x135B68: builtin_net_setup_link_init (udev-builtin-net_setup_link.c:77)
==31690== by 0x1306B3: udev_builtin_init (udev-builtin.c:57)
==31690== by 0x11E984: adm_builtin (udevadm-test-builtin.c:72)
==31690== by 0x117B4D: run_command (udevadm.c:75)
```
Fixes #4080
Let's log at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.
This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter list shouldn't get too
long.
Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG is passed instead of
KILL_TERMINATE. This isn't used yet in this patch, but is made use of in a later
patch.
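Folding several bool parameters into one flags argument might look like the sketch below. The flag names and `kill_should_log()` are hypothetical; in the text above, explicit logging is requested via KILL_TERMINATE_AND_LOG rather than a flag bit, so `KILL_FLAG_LOG` only stands in for that request here.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flags replacing a row of bool parameters. A call like
 * f(path, sig, true, false, true) becomes
 * f(path, sig, KILL_FLAG_SIGCONT | KILL_FLAG_REMOVE). */
typedef enum KillFlags {
        KILL_FLAG_SIGCONT     = 1 << 0,  /* send SIGCONT after the signal */
        KILL_FLAG_IGNORE_SELF = 1 << 1,  /* skip the calling process */
        KILL_FLAG_REMOVE      = 1 << 2,  /* remove the cgroup afterwards */
        KILL_FLAG_LOG         = 1 << 3,  /* log every process we signal */
} KillFlags;

/* Log for SIGKILL/SIGABRT always, otherwise only on explicit request. */
bool kill_should_log(int sig, KillFlags flags) {
        return sig == SIGKILL || sig == SIGABRT || (flags & KILL_FLAG_LOG);
}
```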
Callers of the 'udevadm monitor' tool expect to see output when
an event occurs. The stdio buffering defeats that. This patch
switches it to line buffering.
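The change amounts to one call made before any output is written. `setvbuf()` with `_IOLBF` is standard C and equivalent to glibc's `setlinebuf()`; the wrapper name below is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Make stdout line buffered, so every event line is flushed as soon as its
 * '\n' is written, even when stdout is a pipe rather than a terminal.
 * Must be called before anything is written to stdout. */
int monitor_setup_stdout(void) {
        return setvbuf(stdout, NULL, _IOLBF, 0);
}
```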
The macro determines the right length of an AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
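A function version of that length computation, as a sketch. The real macro may differ in detail (for instance in whether a trailing NUL is counted for file-system paths, which this sketch does not do), and `sockaddr_un_len()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Abstract sockets (sun_path[0] == 0): count the leading NUL plus the name.
 * File-system sockets: count the path. strnlen() keeps us inside sun_path
 * even if a path fills the whole field without a terminating NUL. */
socklen_t sockaddr_un_len(const struct sockaddr_un *sa) {
        assert(sa);
        assert(sa->sun_family == AF_UNIX);

        if (sa->sun_path[0] == '\0')
                return offsetof(struct sockaddr_un, sun_path) + 1 +
                        strnlen(sa->sun_path + 1, sizeof(sa->sun_path) - 1);

        return offsetof(struct sockaddr_un, sun_path) +
                strnlen(sa->sun_path, sizeof(sa->sun_path));
}
```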
Since glibc is moving away from implicitly including sys/sysmacros.h
all the time via sys/types.h, include the header directly in more
places. This seems to cover most makedev/major/minor usage.
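For reference, the three macros and the header that declares them; `devnum()` is a trivial hypothetical wrapper.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor(): include explicitly */

/* Build a device number from its major and minor parts. */
dev_t devnum(unsigned maj, unsigned min) {
        return makedev(maj, min);
}
```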
Also downgrade non-fatal warnings to log_warning.
Previously rule_add_key() would check the output array and log a cryptic
error and return -1. Most of the time the return value was ignored. This
does not seem right, because the buffer can overflow with enough rules.
It would also check if we have enough space for the *next* rule, even if
there might be no next rule, i.e. off-by-one.
Replace this with a check that we have enough space for a next rule before
we start parsing.
Normally using macros to alter flow is not allowed, but in this case I
think it is worth it, because it allows lots of boilerplate code to be
removed and hides repeated boring parameters, making function logic much
easier to follow.
If the attribute wasn't found, the last filename looked at was returned in
the input/output argument. This just seems bad style.
The return value was ignored, so change function to return void.
Usually, we place the #pragma once before the copyright blurb in header files,
but in a few cases we didn't. Move those around, so that we do the same thing
everywhere.
Running "udevadm test-builtin path_id /sys/devices/platform/" results
in a segmentation fault.
The problem is that udev_device_get_subsystem(dev) might return NULL
in a streq() call. Solve this problem by using streq_ptr() instead.
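A NULL-safe comparison in the spirit of streq_ptr() might look like this (a sketch; `streq_ptr_sketch()` is a hypothetical name). With such a helper, a comparison against the result of udev_device_get_subsystem(dev) simply evaluates to false for a device without a subsystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Like strcmp() == 0, but NULL-safe: two NULLs compare equal, and NULL
 * never equals a non-NULL string. */
bool streq_ptr_sketch(const char *a, const char *b) {
        if (a && b)
                return strcmp(a, b) == 0;
        return !a && !b;
}
```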
Enumeration of virtio buses is global and hence
non-deterministic. However, we are guaranteed there is never going to be
more than one virtio bus per parent PCI device. While populating
ID_PATH we simply skip the virtio part of the syspath and we extend the path
using the sysname of the parent PCI device.
With this patch udev creates the following by-path links for virtio-blk
device /dev/vda which contains two partitions.
ls -l /dev/disk/by-path/
total 0
lrwxrwxrwx 1 root root 9 Feb 9 10:47 virtio-pci-0000:00:05.0 -> ../../vda
lrwxrwxrwx 1 root root 10 Feb 9 10:47 virtio-pci-0000:00:05.0-part1 -> ../../vda1
lrwxrwxrwx 1 root root 10 Feb 9 10:47 virtio-pci-0000:00:05.0-part2 -> ../../vda2
See:
http://lists.linuxfoundation.org/pipermail/virtualization/2015-August/030328.html
Fixes #2501
The common-case default qeth link name, enccw0.0.0600, is rather long.
Thus strip leading zeros (which doesn't make the bus_id unstable),
similar to the PCI domain case.
Also 'ccw' is redundant on S/390, as there aren't really other buses
available which could have qeth driver interfaces. Not sure why this
code is even compiled on non-s390[x] platforms. But to distinguish from
e.g. MAC stable names shorten the suffix to just 'c'.
Thus enccw0.0.0600 becomes enc600.
fds will also be closed during manager cleanup in run(), leading
to an error when we try to close them again. It is now possible
to "leak" the fds on error, but it's an unlikely event and we
will exit immediately anyway.
Fixes #2418.
Little change in practice, because the program will exit soon
afterwards, but the standard style of closing all fds is now followed.
Also gets rid of gcc warning about fd_ctrl and fd_uevent being
uninitialized.
gcc is confused by the common idiom of
return errno ? -errno : -ESOMETHING
and thinks a positive value may be returned. Replace this condition
with errno > 0 to help gcc and avoid many spurious warnings. I filed
a gcc RFE a long time ago, but it's hard to say if it will ever be
implemented [1].
Both conventions were used in the codebase, this change makes things
more consistent. This is a follow up to bcb161b023.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61846
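The idiom in isolation, as a sketch with a hypothetical helper name; the fallback code is passed as a positive errno value.

```c
#include <assert.h>
#include <errno.h>

/* Turn the errno left by a failed libc call into a negative return code.
 * Testing errno > 0 (rather than plain errno) lets gcc see that the result
 * is always negative, and still falls back if errno was left unset. */
int negative_errno_or(int fallback) {
        return errno > 0 ? -errno : -fallback;
}
```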
We quite obviously check whether event->dev_db is nonnull, and
right after that call a function which asserts the same. Move
the call under the same if.
https://bugzilla.redhat.com/show_bug.cgi?id=1283971
GLib has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
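A self-contained sketch of the xyz_unrefp() pattern together with the client-side macro suggested above. `Foo`, `foo_new()`, `foo_unref()` and `foo_unrefp()` are hypothetical stand-ins for a public type such as sd_bus, and `foo_freed` exists only to make the cleanup observable. It relies on the gcc `cleanup` attribute, which clang also supports.

```c
#include <assert.h>
#include <stdlib.h>

#define _cleanup_(function) __attribute__((cleanup(function)))

typedef struct Foo { unsigned n_ref; } Foo;

static unsigned foo_freed;   /* demonstration only: counts frees */

Foo *foo_new(void) {
        Foo *f = calloc(1, sizeof(Foo));
        if (f)
                f->n_ref = 1;
        return f;
}

Foo *foo_unref(Foo *f) {
        if (!f)
                return NULL;
        assert(f->n_ref > 0);
        if (--f->n_ref == 0) {
                free(f);
                foo_freed++;
        }
        return NULL;
}

/* Takes a pointer to the variable, which is the signature that
 * __attribute__((cleanup())) passes to its function. */
void foo_unrefp(Foo **f) {
        if (f)
                foo_unref(*f);
}

int use_foo(void) {
        _cleanup_(foo_unrefp) Foo *f = foo_new();

        if (!f)
                return -1;
        return 0;          /* foo_unrefp(&f) runs automatically here */
}
```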
Change the "out" parameter from uint8_t[8] to uint64_t. On architectures which
enforce pointer alignment this fixes crashes when we previously cast an
unaligned array to uint64_t*, and on others this should at least improve
performance as the compiler now aligns these properly.
This also simplifies the code in most cases by getting rid of typecasts. The
only place which we can't change is struct duid's en.id, as that is _packed_
and public API, so we can't enforce alignment of the "id" field and have to
use memcpy instead.
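The memcpy() approach mentioned for the packed field, as a sketch; `load_u64()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from a possibly unaligned address (e.g. a field inside a
 * packed struct) into a properly aligned uint64_t. Casting the address to
 * uint64_t * and dereferencing it is undefined and can trap on
 * strict-alignment architectures; memcpy() is always safe. */
uint64_t load_u64(const void *p) {
        uint64_t v;

        assert(p);
        memcpy(&v, p, sizeof v);
        return v;
}
```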
Improve and enhance the path_id udev builtin to correctly handle the buses
available on Linux on z Systems (s390).
Previously, the CCW bus and, in particular, any FCP devices on it, have
been treated separately. This commit integrates the CCW bus into the
device chain loop. FCP devices and their associated SCSI disks are now
handled through the common SCSI handling functions in path_id.
This implies also a change in the naming of the symbolic links created
by udev. So any backports of this commit to existing Linux distributions
must be done with care. If a backport is required, a udev rule must be
created to also create the "old-style" symbolic links.
Apart from the CCW bus, this commit adds bus support for the:
- ccwgroup bus, which manages network devices,
- ap bus, which manages cryptographic adapters, and
- iucv bus, which manages IUCV devices on z/VM
There are more than enough calls doing string manipulations to deserve
their own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
We don't use that anywhere any more. With the introduction of alias names it
also is not a proper mapping any more as several keys (e.g. KEY_COFFEE and
KEY_SCREENLOCK) have the same numerical mapping.
While it is currently possible to either not set MACAddressPolicy or set
it to a value different from "persistent" or "random", it is not obvious
that a user can do so. Add a policy, "none", which simply retains kernel
MAC addresses (same as not filling in the policy at all) and document it
so that users are aware of this setting.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Use %m where previously %s was used together with strerror().
Fixes: e53fc357a9 "tree-wide: remove a number of invocations of
strerror() and replace by %m"
The TAG key can be used in rules for event matching. At the moment, it
does not support inequality tests. This patch enhances the key test to
validate the rule if it does not contain a given TAG (by TAG!="value").
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Turns this:
r = -errno;
log_error_errno(errno, "foo");
into this:
r = log_error_errno(errno, "foo");
and this:
r = log_error_errno(errno, "foo");
return r;
into this:
return log_error_errno(errno, "foo");
This seems to be an oversight from:
707b66c663
We have to return ENODATA instead of ENOENT if a requested entry is
non-present. Also fix the call-site in udev to check for these errors.
The recent cgroup-rework changed the error code for un-mounted cgroupfs to
ENOEXEC. Make sure udev ignores it just like ENOENT and does not spill
warnings on the screen.
Virtio buses are non-deterministically enumerated, so we cannot use them as a basis
for deterministic naming (see bf81e792f3). However, we are guaranteed that there
is only ever one virtio bus for every parent device, so we can simply skip over
the virtio buses when naming the devices.
The partition-type flags are defined independently for every partition-type. Apply
them only to the types where they are defined, and not to the ESP, which does not
appear to share the same set of flags.
https://github.com/systemd/systemd/issues/920
- Add smack xattr lookup table
- Unify all of mac_smack_apply_xxx{_fd}() to mac_smack_apply() and
mac_smack_apply_fd().
- Add smack xattr read APIs, mac_smack_read{_fd}(), similar to the
apply APIs.
Previously, if the event loop had never run before, sd_event_now() would
fail. With this change it will instead fall back to invoking now(). This
way, the function cannot fail anymore, except for programming error when
invoking it with wrong parameters.
This takes into account the fact that many callers did not handle the
error condition correctly, and if the callers did, then they kept simply
invoking now() as fall back on their own. Hence let's shorten the code
using this call, and make things more robust, and let's just fall back
to now() internally.
Whether now() is used or the cached timestamp may still be detected via
the return value of sd_event_now(). If > 0 is returned, then the fall
back to now() was used, if == 0 is returned, then the cached value was
returned.
This patch also simplifies many of the invocations of sd_event_now():
the manual fall back to now() can be removed. Also, in cases where the
call is invoked within void functions we can now protect the invocation
via assert_se(), acknowledging the fact that the call cannot fail
anymore except for programming errors with the parameters.
This change is inspired by #841.
free() cannot be used with const pointers. However, our _cleanup_free_
handler features cast logic that hides that qualifier, so we don't get a
warning.
The latest consolidation cleanup of write_string_file() revealed some users
of that helper which should have used write_string_file_no_create() in the
past but didn't. Basically, all existing users that write to files in /sys
and /proc should not expect to write to a file which does not exist yet.
Merge write_string_file(), write_string_file_no_create() and
write_string_file_atomic() into write_string_file() and provide a flags mask
that allows combinations of atomic writing, newline appending and automatic
file creation. Change all users accordingly.
Due to our _cleanup_ usage for the udev manager, it will be destroyed
after the "exit:" label has finished. Therefore, it is the last
destruction done in main(). This has two side-effects:
- mac_selinux is destroyed before the udev manager is, possibly causing
use-after-free if the manager-cleanup accesses selinux data
- log_close() is called *before* the manager is destroyed, possibly
re-opening the log if you use --debug (and thus not re-applying the
--debug option)
Avoid this by moving the manager-handling into a new function called
run(). This function will be left before we enter the "exit:" label in
main(), hence, the manager object will be destroyed early.
Push the extraction of the envp + argv as close as possible to their use, to avoid code
duplication. As a side effect, fix logging when delaying execution.
Commit v218-247-g11c6f69 broke the output of the utility. "%1$" PRIu64
"x" expands to "%1$lux", essentially "%lux", which shows the problem.
u and x cannot be combined, u wins as the type character, and x gets
emitted verbatim to stdout.
References: https://bugzilla.redhat.com/show_bug.cgi?id=1227503
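The effect is easy to reproduce. This sketch (hypothetical helper name) formats the same value with the broken and the correct format string; it omits the positional `%1$` part, which does not change the outcome.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* "%" PRIu64 "x" expands to "%lux" (or "%llux"): the conversion ends at 'u',
 * so the value is printed in decimal and 'x' follows as a literal character.
 * PRIx64 is the right macro for hexadecimal output. */
void format_both(uint64_t v, char *wrong, char *right, size_t size) {
        snprintf(wrong, size, "%" PRIu64 "x", v);
        snprintf(right, size, "%" PRIx64, v);
}
```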
Make sure we never close fds before we drop their related event-source.
This will cause horrible disruptions if the fd-num is re-used by someone
else. Under normal conditions, this should not cause any problems as the
close() will drop the fd from the epoll-set automatically. However, this
changes if you have any child processes with a copy of that fd.
This fixes issue #163.
Background:
If you create an epoll-set via epoll_create() (let's call it 'EFD')
you can add file-descriptors to it to watch for events. Whenever
you call EPOLL_CTL_ADD on a file-descriptor you want to watch, the
kernel looks up the attached "struct file" pointer, that this FD
refers to. This combination of the FD-number and the "struct file"
pointer is used as key to link it into the epoll-set (EFD).
This means, if you duplicate your file-descriptor, you can watch
this file-descriptor, too (because the duplicate will have a
different FD-number, hence, the combination of FD-number and
"struct file" is different from before).
If you want to stop watching an FD, you use EPOLL_CTL_DEL and pass
the FD to the kernel. The kernel again looks up your
file-descriptor in your FD-table to find the linked "struct file".
This FD-number and "struct file" combination is then dropped from
the epoll-set (EFD).
Last, but not least: If you close a file-descriptor that is linked
to an epoll-set, the kernel does *NOTHING* regarding the
epoll-set. This is a vital observation! Because this means, your
epoll_wait() calls will still return the metadata you used to
watch/subscribe your file-descriptor to events.
There is one exception to this rule: If the file-descriptor that
you just close()ed was the last FD that referred to the underlying
"struct file", then _all_ epoll-set watches/subscriptions are
destroyed. Hence, if you never dup()ed your FD, then a simple
close() will also unsubscribe it from any epoll-set.
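The two rules above can be checked with a small Linux-only program. `stale_epoll_demo()` is a hypothetical helper, and its error paths skip cleanup for brevity. It registers a pipe's read end, dup()s it, closes the original, and then shows that the subscription still fires with the old metadata while EPOLL_CTL_DEL via the closed fd number fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events reported after close(); stores the event's
 * data in *data and the errno of the failed EPOLL_CTL_DEL in *del_errno. */
int stale_epoll_demo(uint64_t *data, int *del_errno) {
        struct epoll_event ev = { .events = EPOLLIN, .data.u64 = 42 };
        struct epoll_event out = { 0 };
        int p[2], efd, dupfd, n;

        assert(data);
        assert(del_errno);

        if (pipe(p) < 0)
                return -1;
        efd = epoll_create1(EPOLL_CLOEXEC);
        if (efd < 0 || epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &ev) < 0)
                return -1;

        dupfd = dup(p[0]);      /* a second fd for the same "struct file" */
        if (dupfd < 0)
                return -1;
        close(p[0]);            /* subscription survives: dupfd keeps the file alive */

        if (write(p[1], "x", 1) != 1)
                return -1;
        n = epoll_wait(efd, &out, 1, 0);   /* the event is still reported ... */
        *data = out.data.u64;              /* ... with the metadata registered earlier */

        /* The closed fd number no longer resolves to a "struct file",
         * so the stale entry cannot be removed this way (EBADF). */
        *del_errno = epoll_ctl(efd, EPOLL_CTL_DEL, p[0], NULL) < 0 ? errno : 0;

        close(dupfd);           /* last reference gone: subscription dropped */
        close(p[1]);
        close(efd);
        return n;
}
```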
With this in mind, let's look at fork():
Assume you have an epoll-set (EFD) and a bunch of FDs
subscribed to events on that EFD. If you now call fork(),
the new process gets a copy of your file-descriptor table.
This means, the whole table is copied and the "struct
file" reference of each FD is increased by 1. It is
important to notice that the FD-numbers in the child are
exactly the same as in the parent (e.g., FD #5 in the child
refers to the same "struct file" as FD #5 in the parent).
This means, if the child calls EPOLL_CTL_DEL on an FD, the
kernel will look up the linked "struct file" and drop the
FD-number and "struct file" combination from the epoll-set
(EFD). However, this will effectively drop the
subscription that was installed by the parent.
To sum up: even though the child gets a duplicate of the
EFD and all FDs, the subscriptions in the EFD are *NOT*
duplicated!
Now, with this in mind, let's look at what udevd does:
Udevd has a bunch of file-descriptors that it watches in its
sd-event main-loop. Whenever a uevent is received, the event is
dispatched on its workers. If no suitable worker is present, a new
worker is fork()ed to handle the event. Inside of this worker, we
try to free all resources we inherited. However, the fork() call
is done from a call-stack that is never rewound. Therefore, this
call stack might own references that it drops once it is left.
Those references we cannot deduce from the fork()'ed process;
effectively causing us to leak objects in the worker (e.g., the
call to sd_event_dispatch() that dispatched our uevent owns a
reference to the sd_event object it used; and drops it again once
the function is left).
(Another example is udev_monitor_ref() for each 'worker' that is
also inherited by all children; thus keeping the udev-monitor and
the uevent-fd alive in all children (which is the real cause for
bug #163))
(The extreme variant is sd_event_source_unref(), which explicitly
keeps event-sources alive, if they're currently dispatched,
knowing that the dispatcher will free the event once done. But
if the dispatcher is in the parent, the child will never ever
free that object, thus leaking it)
This is usually not an issue. However, if such an object has a
file-descriptor embedded, this FD is left open and never closed in
the child.
In manager_exit(), if we now destroy an object (i.e., close its embedded
file-descriptor) before we destroy its related sd_event_source, then
sd-event will not be able to drop the FD from the epoll-set (EFD). This
is, because the FD is no longer valid at the time we call EPOLL_CTL_DEL.
Hence, the kernel cannot figure out the linked "struct file" and thus
cannot remove the FD-number plus "struct file" combination; effectively
leaving the subscription in the epoll-set.
Since we leak the uevent-fd in the children, they retain a copy of the FD
pointing to the same "struct file". Thus, the EFD subscriptions are not
automatically removed by close() (as described above). Therefore, the main
daemon will still get its metadata back on epoll_wait() whenever an event
occurs (even though it already freed the metadata). This then causes the
use-after-free bug described in #163.
This patch fixes the order in which we destruct objects and related
sd-event-sources. Some open questions remain:
* Why does source_io_unregister() not warn on EPOLL_CTL_DEL failures?
This really needs to be turned into an assert_return().
* udevd really should not leak file-descriptors into its children. Fixing
this would *not* have prevented this bug, though (since the child-setup
is still async).
It's non-trivial to fix this, though. The stack-context of the caller
cannot be rewound, so we cannot figure out temporary refs. Maybe it's
time to exec() the udev-workers?
* Why does the kernel not copy FD-subscriptions across fork()?
Or at least drop subscriptions if you close() your FD (it uses the
FD-number as key, so it better subscribe to it)?
Or it better used
FD+"struct file_table*"+"struct file*"
as key to not allow the children to share the subscription table.
*sigh*
Seems like we have to live with that API forever.
This ports a lot of manual code over to sigprocmask_many() and friends.
Also, we now consistently check for sigprocmask() failures with
assert_se(), since the call cannot realistically fail unless there's a
programming error.
Also encloses a few sd_event_add_signal() calls with (void) when we
ignore the return values for it knowingly.
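A sigprocmask_many()-style helper can be sketched as a variadic wrapper around sigprocmask(). The name `sigprocmask_list()` and the -1 terminator convention are assumptions for this sketch, not a copy of systemd's helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdarg.h>

/* Apply `how` (SIG_BLOCK, SIG_UNBLOCK or SIG_SETMASK) to a -1-terminated
 * list of signals, optionally returning the previous mask in *old.
 * Returns 0 on success, -1 on failure. */
int sigprocmask_list(int how, sigset_t *old, ...) {
        sigset_t ss;
        va_list ap;
        int sig, r = 0;

        if (sigemptyset(&ss) < 0)
                return -1;

        va_start(ap, old);
        while ((sig = va_arg(ap, int)) >= 0)
                if (sigaddset(&ss, sig) < 0) {
                        r = -1;
                        break;
                }
        va_end(ap);

        if (r < 0)
                return r;
        return sigprocmask(how, &ss, old);
}
```

In the style described above, a caller would wrap such a call in assert_se(), e.g. blocking SIGCHLD and SIGTERM with `sigprocmask_list(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, -1)`.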
PROGRAM and IMPORT{program} use the exit code of the spawned process to decide if a rule matches or not,
a failing process is hence normal operation and not something we should warn about.
We still warn about other types of failing processes.
Now that listen_fds() have been split out, we can safely move the allocation
of the manager object after doing the forking (the fork is done to notify legacy
init-systems that the fds are ready).
Subsequently, we can merge manager_listen() back into manager_new().
This entails a minor behaviour change: the application of permissions to
static device nodes now happens after the fork (but still before notifying
systemd about being ready).
This will simply silently fail on non-systemd systems, so there is no reason
to make it conditional.
Also make it clear that we notify systemd about being ready as the last step
before starting the event loop, whereas the forking might need to happen
earlier.
This should have no behavioural change, but it is odd to tie the cgroup cleaning to
whether or not we are passed sockets.
The point really is if we are guaranteed to be in a dedicated cgroup, so instead
check for our parent being PID1 (we already implicitly only do this on systemd
systems).
We used to block all signals, and restore the original signal mask before exec'ing
external processes.
Now we just block the signals we care about and unconditionally unblock all signals
before exec'ing.
A lot of touch screens use INPUT_PROP_DIRECT to indicate that touch input
maps directly to the underlying screen, while the BTN_TOUCH bit might not be
set.
This change switches to bools and separates bit flag evaluation from
decision making and application of udev properties, while hopefully
keeping the same semantics, apart from using BTN_LEFT instead of BTN_MOUSE
for mouse detection.