If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return gurantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.
systemd-notify now uses this function for proper notification delivery and be
useful for NotifyAccess=all units again in user mode, or in cases where it
doesn't have a control process as parent.
Fixes: #2739
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.
This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
ubsan complains that we add an offset to a NULL ptr here in some cases.
Which isn't really a bug though, since we only use it as the end
condition for a for loop, but we can still fix it...
Fixes: #15522
A warning is emitted from sd_bus_message_{get,set}_priority. Those functions
are exposed by pystemd, so we have no easy way of checking if anything is
calling them.
Just making the functions always return without doing anything would be an
option, but then we could leave the caller with an undefined variable. So I
think it's better to make the functions emit a warnings and return priority=0
in the get operation.
Consumers of the sd-bus convenience API can't make convenience
helpers of their own without va_list variants.
This commit is a mechanical change splitting out the existing function
bodies into bare va_list variants having a 'v' suffixed to the names.
The original functions now simply create the va_list before forwarding
the call on to the va_list variant, and the va_list variants dispense
with those steps.
sd_bus_reply_method_errno already does the same two checks
(sd_bus_error_is_set(error), r < 0) internally. But it did them in opposite
order. The effect is the same, because sd_bus_reply_method_errno falls back to
sd_bus_reply_method_error, but it seems inelegant. So let's simplify
bus_maybe_reply_error() to offload the job fully to sd_bus_reply_method_errno().
No functional change.
So far we had various ad hoc APIs to query search paths:
systemd-analyze unit-paths, lookup_paths_log(), the pkgconfig file,
debug logs emitted by systemd-analyze cat-config.
But answering a simple question "what is the search path for tmpfiles,
sysusers, .network files, ..." is surprisingly hard.
I think we should have an api that makes it easy to query this. Pkgconfig is
not bad, but it is primarily a development tool, so it's not available in many
context. Also it can't provide support for paths which are influenced by
environment variables, and I'd like to be able to answer the question "what is
the search path for ..., assuming that VAR_FOO=... is set?".
Extending sd-path to support more of our internal paths seems to be most
flexible solution. We already have systemd-path which provides a nice
way to query, and we can add stuff like optional descriptions later on.
We we essentially get a nice programmatic and commmandline apis for the price
of one.
The two functions were duplicating a lot of functionality and more
importantly, they both had explicit lists of types which are search
paths. I want to add more types, and I don't want to have to remember
to add them to both lists.
I think the two names were both pretty bad. They did not give a proper hint
what the difference between the two functions is, and sd_path_home sounds like
it is somehow related to /home or home directories or whatever, when in fact
both functions return the same set of paths as either a colon-delimited string
or a strv. "_strv" suffix is used by various functions in sd-bus, so let's
reuse that.
Those functions are not public yet, so let's rename.
In those functions where bus defaults to the m->bus, we should also
resolve the magic parameters. And if neither called with bus=NULL
and an unattached message, return properly instead of crashing in assert
later.
It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().
socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
When authorizing via PolicyKit we want to process incoming method calls
twice: once to process and figure out that we need PK authentication,
and a second time after we aquired PK authentication to actually execute
the operation. With this new call sd_bus_enqueue_for_read() we have a
way to put an incoming message back into the read queue for this
purpose.
This might have other uses too, for example debugging.
Let's remove a function of questionnable utility.
strv_clear() frees the items of a string array, but not the array
itself. i.e. it half-drestructs a string array and makes it empty. This
is not too useful an operation since we almost never need to just do
that, we also want to free the whole thing. In fact, strv_clear() is
only used in one of our .c file, and there it appears like unnecessary
optimization, given that for each array with n elements it leaves the
number of free()s we need to at O(n) which is not really an optimization
at all (it goes from n+1 to n, that's all).
Prompted by the discussions on #14605
We don't need a seperate output parameter that is of type int. glibc() says
that the type is "unsigned", but the kernel thinks it's "int". And the
"alternative names" interface also uses ints. So let's standarize on ints,
since it's clearly not realisitic to have interface numbers in the upper half
of unsigned int range.
This only matters for the case where new networkctl is running against older
networkd. We should still handle the old error to avoid unnecessary warning
about speedmeeter being disabled.
This partially reverts commit e813de549b.
The combination of sd_netlink_message_enter_container() and
sd_netlink_message_read_string() only reads the last element if the attribute is
duplicated, such a situation easily happens for IFLA_ALT_IFNAME.
The function introduced here reads all matched attributes.
Properties marked this way really shouldn't be sent around willy-nilly,
that's what the flag is about, hence exclude it from InterfacesAdded
signals (and in fact anything that is a signal).
This allows marking messages that contain "sensitive" data with a flag.
If it's set then the messages are erased from memory when the message is
freed.
Similar, a flag may be set on vtable entries: incoming/outgoing message
matching the entry will then automatically be flagged this way.
This is supposed to be an easy method to mark messages containing
potentially sensitive data (such as passwords) for proper destruction.
(Note that this of course is only is as safe as the broker in between is
doing something similar. But let's at least not be the ones at fault
here.)
We already refuse sd_event_add_signal() if the specified signal is not
blocked, let's do this also for sd_event_add_child(), since we might
need signalfd() to implement this, and this means the signal needs to be
blocked.
This adds support for watching for process exits via Linux new pidfd
concept. This makes watching processes and killing them race-free if
properly used, fixing a long-standing UNIX misdesign.
This patch adds implicit and explicit pidfd support to sd-event: if a
process shall be watched and is specified by PID we will now internally
create a pidfd for it and use that, if available. Alternatively a new
constructor for child process event sources is added that takes pidfds
as input.
Besides mere watching of child processes via pidfd two additional
features are added:
→ sd_event_source_send_child_signal() allows sending a signal to the
process being watched in the safest way possible (wrapping
the new pidfd_send_signal() syscall).
→ sd_event_source_set_child_process_own() allows marking a process
watched for destruction as soon as the event source is freed. This is
currently implemented in userspace, but hopefully will become a kernel
feature eventually.
Altogether this means an sd_event_source object is now a safe and stable
concept for referencing processes in race-free way, with automatic
fallback to pre-pidfd kernels.
Note that this patch adds support for this only to sd-event, not to PID
1. That's because PID 1 needs to use waitid(P_ALL) for reaping any
process that might get reparented to it. This currently semantically
conflicts with pidfd use for watching processes since we P_ALL is
undirected and thus might reap process earlier than the pidfd notifies
process end, which is hard to handle. The kernel will likely gain a
concept for excluding specific pidfds from P_ALL watching, as soon as
that is around we can start making use of this in PID 1 too.
From v4.12 the kernel appends some attributes to netlink acks
containing a textual description of the error and other fields.
This makes sd-netlink parse the attributes.
Virtual filesystems such as sysfs or procfs use kernfs, and kernfs can work
with two sorts of virtual files.
One sort uses "seq_file", and the results of the first read are buffered for
the second read. The other sort uses "raw" reads which always go direct to the
device.
In the later case, the content of the virtual file must be retrieved with a
single read otherwise subsequent read might get the new value instead of
finding EOF immediately. That's the reason why the usage of fread(3) is
prohibited in this case as it always performs a second call to read(2) looking
for EOF which is subject to the race described previously.
Fixes: #13585.
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
sd-netlink is not public yet, so we can change the interface.
I did not touch interfaces of functions like sd_netlink_wait() and
sd_rtnl_message_new_link() which do not modify the object that is passed in,
because in the future we might want to change the code to e.g. take a
reference to the parent object or otherwise require a non-const reference.
It is natural that n_attiributes is less than type. But in that case,
the message does not contain any message about the type. So, we should
not abort execution with assertion, but just return -ENODATA.
This avoid the use of the global variable.
Also rename cgroup_unified_update() to cgroup_unified_cached() and
cgroup_unified_flush() to cgroup_unified() to better reflect their new roles.
This code is not part of the public API of sd-netlink, nor used by it
internally and hence should not be in the sd-netlink directory.
Also, move the test case for it to src/test/.
This tweaks match installation a bit: the match callbacks are now only
called for messages read after the AddMatch() reply was received and
never anything already read before. Thus, installing a match gives you a
time guarantee: only messages received after it will be matched.
This is useful when listening to PropertiesChanged signals as an example
to ensure that only changes after the point the match was installed are
honoured, nothing before.
If AddMatch() doesn't work, let's destroy the slot for it too as soon as
we received the failure for it.
This way the mere existance of the slot tells us whether the AddMatch()
method call is still pending or is complete.
Let's count incoming messages and attach the current counter when we
first read them to the message objects. This allows us to nicely order
messages later on.
Because it's not a device path and (slightly) bad things happen if it
gets confused with one:
$ udevadm info /sys/
Assertion 'device->devpath[0] == '/'' failed at
../src/libsystemd/sd-device/sd-device.c:958,
function sd_device_get_devpath(). Aborting.
Aborted (core dumped)