While investigating why some of my netlink calls would timeout I
stumbled upon the definition of the max write queue length. Finding this
constant made me believe we still had a write queue in the code - which
isn't true. The netlink write queue code was removed in #189.
When set, the offset specified for the vtable entry is passed to the
handler as-is, and is not added to the userdata pointer. This is useful
in case methods/properties are mixed on the same vtable, that expect to
operate relative to some object in memory and that expect pointers to
absolute memory, or that just want a number passed.
This also makes sure the control buffer is properly aligned. This
matters, as otherwise the control buffer might not be aligned and the
cmsg buffer counting might be off. The incorrect alignment is becoming
visible by using recvmsg_safe() as we suddenly notice the MSG_CTRUNC bit
set because of this.
That said, apparently this isn't enough to make this work on all
kernels. Since I couldn't figure this out, we now add 1K to the buffer
to be sure. We do this once already, also for a pktinfo structure
(though an IPv4/IPv6) one. I am puzzled by this, but this shouldn't
matter much. it works locally just fine, except for those ubuntu CI
kernels...
While we are at it, make some other changes too, to simplify and
modernize the function.
We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
We need to use the CMSG_SPACE() macro to size the control buffers, not
CMSG_LEN(). The former is rounded up to next alignment boundary, the
latter is not. The former should be used for allocations, the latter for
encoding how much of it is actually initialized. See cmsg(3) man page
for details about this.
Given how confusing this is, I guess we don't have to be too ashamed
here, in most cases we actually did get this right.
Our hashmap and set helpers return a different code whenever an entry
already exists, so let's use this to avoid unsetting scan_uptodate when
not necessary.
Thus, the return convention for
sd_device_enumerator_add_match_subsystem,
sd_device_enumerator_add_match_sysattr,
sd_device_enumerator_add_match_property,
sd_device_enumerator_add_match_sysname,
sd_device_enumerator_add_match_tag,
device_enumerator_add_match_parent_incremental,
sd_device_enumerator_add_match_parent,
sd_device_enumerator_allow_uninitialized,
device_enumerator_add_match_is_initialized
is that "1" is returned if action was taken, and "0" on noop.
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return gurantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.
systemd-notify now uses this function for proper notification delivery and be
useful for NotifyAccess=all units again in user mode, or in cases where it
doesn't have a control process as parent.
Fixes: #2739
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.
This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
ubsan complains that we add an offset to a NULL ptr here in some cases.
Which isn't really a bug though, since we only use it as the end
condition for a for loop, but we can still fix it...
Fixes: #15522
A warning is emitted from sd_bus_message_{get,set}_priority. Those functions
are exposed by pystemd, so we have no easy way of checking if anything is
calling them.
Just making the functions always return without doing anything would be an
option, but then we could leave the caller with an undefined variable. So I
think it's better to make the functions emit a warnings and return priority=0
in the get operation.
Consumers of the sd-bus convenience API can't make convenience
helpers of their own without va_list variants.
This commit is a mechanical change splitting out the existing function
bodies into bare va_list variants having a 'v' suffixed to the names.
The original functions now simply create the va_list before forwarding
the call on to the va_list variant, and the va_list variants dispense
with those steps.
sd_bus_reply_method_errno already does the same two checks
(sd_bus_error_is_set(error), r < 0) internally. But it did them in opposite
order. The effect is the same, because sd_bus_reply_method_errno falls back to
sd_bus_reply_method_error, but it seems inelegant. So let's simplify
bus_maybe_reply_error() to offload the job fully to sd_bus_reply_method_errno().
No functional change.
So far we had various ad hoc APIs to query search paths:
systemd-analyze unit-paths, lookup_paths_log(), the pkgconfig file,
debug logs emitted by systemd-analyze cat-config.
But answering a simple question "what is the search path for tmpfiles,
sysusers, .network files, ..." is surprisingly hard.
I think we should have an api that makes it easy to query this. Pkgconfig is
not bad, but it is primarily a development tool, so it's not available in many
context. Also it can't provide support for paths which are influenced by
environment variables, and I'd like to be able to answer the question "what is
the search path for ..., assuming that VAR_FOO=... is set?".
Extending sd-path to support more of our internal paths seems to be most
flexible solution. We already have systemd-path which provides a nice
way to query, and we can add stuff like optional descriptions later on.
We we essentially get a nice programmatic and commmandline apis for the price
of one.
The two functions were duplicating a lot of functionality and more
importantly, they both had explicit lists of types which are search
paths. I want to add more types, and I don't want to have to remember
to add them to both lists.