This tests a couple of corner cases of the sd-event API including
changing priorities of existing event sources, as well as overflow
conditions of the inotify queue.
This adds a new call sd_event_add_inotify() which allows watching for
inotify events on specified paths.
sd-event will try to minimize the number of inotify fds allocated, and
will try to add file watches to the same inotify fd objects as far as
that's possible. Doing this kind of inotify object should optimize
behaviour in programs that watch a limited set of mostly independent
files as in most cases a single inotify object will suffice for watching
all files.
Traditionally, this kind of coalescing logic (i.e. that multiple event
sources are implemented on top of a single inotify object) was very hard
to do, as the inotify API had serious limitations: it only allowed
adding watches by path, and would implicitly merge watches installed on
the same node via different path, without letting the caller know about
whether such merging took place or not.
With the advent of O_PATH this issue can be dealt with to some point:
instead of adding a path to watch to an inotify object with
inotify_add_watch() right away, we can open the path with O_PATH first,
call fstat() on the fd, and check the .st_dev/.st_ino fields of that
against a list of watches we already have in place. If we find one we
know that the inotify_add_watch() will update the watch mask of the
existing watch, otherwise it will create a new watch. To make this
race-free we use inotify_add_watch() on the /proc/self/fd/ path of the
O_PATH fd, instead of the original path, so that we do the checking and
watch updating with guaranteed the same inode.
This approach let's us deal safely with inodes that may appear under
various different paths (due to symlinks, hardlinks, bind mounts, fs
namespaces). However it's not a perfect solution: currently the kernel
has no API for changing the watch mask of an existing watch -- unless
you have a path or fd to the original inode. This means we can "merge"
the watches of the same inode of multiple event sources correctly, but
we cannot "unmerge" it again correctly in many cases, as access to the
original inode might have been lost, due to renames, mount/unmount, or
deletions. We could in theory always keep open an O_PATH fd of the inode
to watch so that we can change the mask anytime we want, but this is
highly problematics, as it would consume too many fds (and in fact the
scarcity of fds is the reason why watch descriptors are a separate
concepts from fds) and would keep the backing mounts busy (wds do not
keep mounts busy, fds do). The current implemented approach to all this:
filter in userspace and accept that the watch mask on some inode might
be higher than necessary due to earlier installed event sources that
might have ceased to exist. This approach while ugly shouldn't be too
bad for most cases as the same inodes are probably wacthed for the same
masks in most implementations.
In order to implement priorities correctly a seperate inotify object is
allocated for each priority that is used. This way we get separate
per-priority event queues, of which we never dequeue more than a few
events at a time.
Fixes: #3982
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, then it
behaves as the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
We currently return -ENOMEDIUM when /etc/machine-id is empty, and -EINVAL when
it is all zeros. But -EINVAL is also used for invalid args. The distinction
between empty and all-zero is not very important, let's use the same return
code.
Also document -ENOENT and -ENOMEDIUM since they can be a bit surprising.
When we allocate an asynchronous match object we will allocate an
asynchronous bus call object to install the match server side.
Previously the call slot would be created as regular slot, i.e.
non-floating which meant installing the match even if it was itself
floating would result in a non-floating slot to be created internally,
which ultimately would mean the sd_bus object would be referenced by it,
and thus never be freed.
Let's fix that by making the match method callback floating in any case
as we have no interest in leaving the bus allocated beyond the match
slot.
Fixes: #8551
This new call allows explicit control of the "floating" state of a bus
slot object. This is useful for creating a bus slot object first,
retaining a reference to it, using it for making changes to the slot
object (for example, set a description) and then handing it over to
sd-bus for lifecycle management.
It's also useful to fix#8551.
This adds a small service "systemd-portabled" and a matching client
"portablectl", which implement the "portable service" concept.
The daemon implements the actual operations, is PolicyKit-enabled and is
activated on demand with exit-on-idle.
Both the daemon and the client are an optional build artifact, enabled
by default rhough.
Most our other parsing functions do this, let's do this here too,
internally we accept that anyway. Also, the closely related
load_env_file() and load_env_file_pairs() also do this, so let's be
systematic.
Let's tweak the assignment of errors a bit, and automatically abs()
errnos, similar to how log_error_errno() and friends does it.
Macros are fine to use, but regular functions usually preferable if
there's no reason to use macros, because they avoid multiple evaluation
and suchlike. Hence, let's just use a regular funciton for assigning
errors, instead of macros.
Follow-up for #8993
The protocol is that a string is serialized with the nul byte at the end, and
the terminator is included in length. We'd call strndup with offset 0, length
len1-1, and then a second time with offset len1, length len2-1, so in the end
the check was off by one. But let's require the terminating nul too, even if
we don't access it.
CID #1383035.
This is external data, even if trusted. We should not assert on it, but verify
and return proper error instead, which assert_return does. In particular,
write(2) says that a partial write could occur when interupted by a signal.
When compiled with asserts disabled, we could access memory outside of the
allocated buffer.
CID #1237671.
Follow-up for 1a96c8e1cc.
We were inconsitently using them in some cases, but in majority not.
Using assignment in assert_se is very common, not an exception like in
'if', so let's drop the extra parens everywhere.
The leak can be reproduced by running systemd-path --suffix .tmp under valgrind or asan:
$ ./build/systemd-path --suffix .tmp search-binaries
/usr/local/bin/.tmp:/usr/bin/.tmp:/usr/local/sbin/.tmp:/usr/sbin/.tmp:/home/vagrant/.local/bin/.tmp:/home/vagrant/bin/.tmp
=================================================================
==19177==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 56 byte(s) in 1 object(s) allocated from:
*0 0x7fd6adf72850 in malloc (/lib64/libasan.so.4+0xde850)
*1 0x7fd6ad2b93d2 in malloc_multiply ../src/basic/alloc-util.h:69
*2 0x7fd6ad2bafd2 in strv_split ../src/basic/strv.c:269
*3 0x7fd6ad42ba67 in search_from_environment ../src/libsystemd/sd-path/sd-path.c:409
*4 0x7fd6ad42bffe in get_search ../src/libsystemd/sd-path/sd-path.c:482
*5 0x7fd6ad42c55b in sd_path_search ../src/libsystemd/sd-path/sd-path.c:607
*6 0x7fd6ad42b3a2 in sd_path_home ../src/libsystemd/sd-path/sd-path.c:348
*7 0x55f59c65ebea in print_home ../src/path/path.c:97
*8 0x55f59c65f157 in main ../src/path/path.c:177
*9 0x7fd6abaea009 in __libc_start_main (/lib64/libc.so.6+0x21009)
Indirect leak of 68 byte(s) in 5 object(s) allocated from:
*0 0x7fd6adf72850 in malloc (/lib64/libasan.so.4+0xde850)
*1 0x7fd6abb5f689 in strndup (/lib64/libc.so.6+0x96689)
Indirect leak of 25 byte(s) in 1 object(s) allocated from:
*0 0x7fd6adf72850 in malloc (/lib64/libasan.so.4+0xde850)
*1 0x7fd6abb5f689 in strndup (/lib64/libc.so.6+0x96689)
*2 0x6c2e2f746e617266 (<unknown module>)
SUMMARY: AddressSanitizer: 149 byte(s) leaked in 7 allocation(s).
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()' return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
This cleans up handling of MTU values across the codebase. Previously
MTU values where stored sometimes in uint32_t, sometimes in uint16_t,
sometimes unsigned and sometimes in size_t. This now unifies this to
uint32_t across the codebase, as that's what netlink spits out, and what
the majority was already using.
Also, all MTU parameters are now parsed with config_parse_mtu() and
config_parse_ipv6_mtu() is dropped as it is now unneeded.
(Note there is one exception for the MTU typing: in the DCHPv4 code we
continue to process the MTU as uint16_t value, as it is encoded like
that in the protocol, and it's probably better stay close to the
protocol there.)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.