Before this commit bus messages had a single reference count: when it
reached zero the message would be freed. This simple approach meant a
cyclic dependency was typically seen: a message that was enqueued in a
bus connection object would reference the bus connection object but also
itself be referenced by the bus connection object. So far out strategy
to avoid cases like this was: make sure to process the bus connection
regularly so that messages don#t stay queued, and at exit flush/close
the connection so that the message queued would be emptied, and thus the
cyclic dependencies resolved. Im many cases this isn't done properly
however.
With this change, let's address the issue more systematically: let's
break the reference cycle. Specifically, there are now two types of
references to a bus message:
1. A regular one, which keeps both the message and the bus object it is
associated with pinned.
2. A "queue" reference, which is weaker: it pins the message, but not
the bus object it is associated with.
The idea is then that regular user handling uses regular references, but
when a message is enqueued on its connection, then this takes a "queue"
reference instead. This then means that a queued message doesn't imply
the connection itself remains pinned, only regular references to the
connection or a message associated with it do. Thus, if we end up in the
situation where a user allocates a bus and a message and enqueues the
latter in the former and drops all refs to both, then this will detect
this case and free both.
Note that this scheme isn't perfect, it only covers references between
messages and the busses they are associated with. If OTOH a bus message
is enqueued on a different bus than it is associated with cyclic deps
cannot be recognized with this simple algorithm, and thus if you enqueue
a message associated with a bus A on a bus B, and another message
associated with bus B on a bus A, a cyclic ref will be in effect and not
be discovered. However, given that this is an exotic case (though one
that happens, consider systemd-bus-stdio-bridge), it should be OK not to
cover with this, and people have to explicit flush all queues on exit in
that case.
Note that this commit only establishes the separate reference counters
per message. A follow-up commit will start making use of this from the
bus connection object.
Don't try to be smart, don't bypass the ref counting logic if there's no
real reason to.
This matters if we want to tweak the ref counting logic later.
Let's make sure our own code follows coding style and initializes all
return values on all types of success (and leaves it uninitialized in
all types of failure).
Let's do this like we usually do and size arrays with size_t.
We already do this for the "allocated" counter correctly, and externally
we expose the queue sizes as uint64_t anyway, hence there's really no
point in usigned "unsigned" internally.
Apparently this happens IRL. Let's carefully deal with issues like this:
when we overrun, let's not go back to zero but instead leave the highest
cookie bit set. We use that as indication that we are in "overrun
territory", and then are particularly careful with checking cookies,
i.e. that they haven't been used for still outstanding replies yet. This
should retain the quick cookie generation behaviour we used to have, but
permits dealing with overruns.
Replaces: #11804Fixes: #11809
Paths are limited to BUS_PATH_SIZE_MAX but the maximum size is anyway too big
to be allocated on the stack, so let's switch to the heap where there is a
clear way to understand if the allocation fails.
Even though the dbus specification does not enforce any length limit on the
path of a dbus message, having to analyze too long strings in PID1 may be
time-consuming and it may have security impacts.
In any case, the limit is set so high that real-life applications should not
have a problem with it.
FRA_TUN_ID is a 64 big endian integer. Fix the policy.
FRA_TUN_ID is unused by networkd, hence I think this bug
has no actual consequences.
Fixes: bce67bbee3
- RTA_OIF has no business in the routing-rule policy. It is numerical
identical to FRA_GOTO. Fix using the correct enum value. Note that
RTA_OIF/FRA_GOTO was not used by networkd, and the type was already
correct at uint32. So, there is no change in behavior.
- RTA_GATEWAY also does not belong to the routing-rules. It is numerical
identical to FRA_UNUSED2. Obviously, that value is unused as well,
so there is no actual change in behavior either. In particular
that is because:
- kernel would not send messages with FRA_UNUSED2 attribute.
- networkd would not try to parse/send RTA_GATEWAY/FRA_UNUSED2
attributes.
Fixes: bce67bbee3
Follow-up for a3ce813697 and
5ce41697bd.
Before a3ce813697, all properties in
src->properties and src->properties_db are mixed and copied to
dst->properties_db by device_copy_properties().
So, it is not necessary to store data from udev database file to
sd_device::properties_db before copying properties.
But now, properties are not mixed. So, the read data need to be
stored to also ::properties_db.
Fixes#11721.
dbus-daemon might have a slightly different idea of what a valid msg is
than us (for example regarding valid msg and field sizes). Let's hence
try to proceed if we can and thus drop messages rather than fail the
connection if we fail to validate a message.
Hopefully the differences in what is considered valid are not visible
for real-life usecases, but are specific to exploit attempts only.
Devices like the "Microsoft Microsoft® 2.4GHz Transceiver v9.0 Mouse" contain
characters higher than 127. That ® is correctly stored in the hwdb and passed
into the search field during query, but the comparison fails.
Our search string is a const char *, trie_string() returns a const char * but
the current character is cast to uint8_t. This causes anything over 127 to
fail the match. Fix this, we're dealing with characters everywhere here after
all.
When sd_device_enumerator_add_match_parent() is called
multiple times, then previously set parents are discarded.
This adds device_enumerator_add_match_parent_incremental() to make
sd-device-enumerator scan devices under all specified parents.
Note that for backward compatibility, sd_device_enumerator_add_match_parent()
and udev_enumerate_add_match_parent() still discard previous assignments.
Previously, device_copy_properties() copies all properties to both
sd_device::properties and ::properties_db. Thus, on move uevent,
also tentative properties, e.g. DEVPATH or INTERFACE, are stored to
::properties_db, and saved to udev database.
This makes such tentative properties be copied to only ::properties,
and thus not saved to udev database.
Fixes#9426.
It's similar to sd_bus_flush_close_unref() but doesn't do the flushing.
This is useful since this will still discnnect the connection properly
but not synchronously wait for the peer to take our messages.
Primary usecase is within _cleanup_() expressions where synchronously
waiting on the peer is not OK.
Found by inspecting results of running this small program:
int main(int argc, const char **argv) {
for (int i = 1; i < argc; i++) {
FILE *f;
char line[1024], prev[1024], *r;
int lineno;
prev[0] = '\0';
lineno = 1;
f = fopen(argv[i], "r");
if (!f)
exit(1);
do {
r = fgets(line, sizeof(line), f);
if (!r)
break;
if (strcmp(line, prev) == 0)
printf("%s:%d: error: dup %s", argv[i], lineno, line);
lineno++;
strcpy(prev, line);
} while (!feof(f));
fclose(f);
}
}
By b1c097af8d (#10239), the receive buffer
size for uevents was set by SO_RCVBUF at first, and fallback to
use SO_RCVBUFFORCE. So, as SO_RCVBUF limits to the buffer size
net.core.rmem_max, which is usually much smaller than 128MB udevd requests,
uevents buffer size was not sufficient.
This fixes the ordering of the request: SO_RCVBUFFORCE first, and
fallback to SO_RCVBUF. Then, udevd's uevent buffer size can be set to
128MB.
This also revert 903893237a.
Fixes#11314 and #10754.
Empty line between setting the output parameter and return is removed. I like
to think about both steps as part of returning from the function, and there's
no need to separate them.
Similarly, if we need to unset a pointer after successfully passing ownership,
use TAKE_PTR and do it immediately after the ownership change, without an empty
line inbetween.
It seems quite useful to provide this additional information in public exported
functions.
This is a c99 feature, not supported in C++. Without the check in _sd-common.h:
FAILED: test-bus-vtable-cc@exe/src_libsystemd_sd-bus_test-bus-vtable-cc.cc.o
...
In file included from ../src/libsystemd/sd-bus/test-bus-vtable-cc.cc:9:
In file included from ../src/systemd/sd-bus-vtable.h:26:
In file included from ../src/systemd/sd-bus.h:26:
../src/systemd/sd-id128.h:38:47: error: static array size is a C99 feature, not permitted in C++
char *sd_id128_to_string(sd_id128_t id, char s[static SD_ID128_STRING_MAX]);
^
In .c files, I opted to use the define for consistency, even though we don't support
compilation with a C++ compiler, so the unconditional keyword would work too.
As devpath may not be set yet.
When debug logging is enabled, log_device_*() calls
sd_device_get_sysname(). So, we should not assume that devpath is always
set.
Fixes#11258.
Continuation of a3ebe5eb620e49f0d24082876cafc7579261e64f:
in other places we sometimes use assert_se(), and sometimes normal error
handling. sigfillset and sigaddset can only fail if mask is NULL (which cannot
happen if we are passing in a reference), or if the signal number is invalid
(which really shouldn't happen when we are using a constant like SIGCHLD. If
SIGCHLD is invalid, we have a bigger problem). So let's simplify things and
always use assert_se() in those cases.
In sigset_add_many() we could conceivably pass an invalid signal, so let's keep
normal error handling here. The caller can do assert_se() around the
sigprocmask_many() call if appropriate.
'>= 0' is used for consistency with the rest of the codebase.
The idea was that those vars could be configured to 'no' to not install the .pc
files, or they could be set to '', and then they would be built but not
installed. This was inherited from the autoconf build system. This couldn't
work because '' is replaced by the default value. Also, having this level of
control doesn't seem necessary, since creating those files is very
quick. Skipping with 'no' was implemented only for systemd.pc and not the other
.pc files. Let's simplify things and skip installation if the target dir
is configured as 'no' for all .pc files.
Let's not use atoi() if we can simply provide the project version as a number.
In C code, this is the numerical project version. In substitutions in other
files, this is just the bare substitution.
The "PACKAGE_" prefix is from autotools, and is strange. We call systemd a
"project", and "package" is something that distros build. Let's rename.
PACKAGE_URL is renamed to PROJECT_URL for the same reasons and for consistency.
(This leave PACKAGE_VERSION as the stringified define for C code.)
_c is misleading because .h files should be included in those lists too
(this tells meson that the build outputs should be rebuilt if the header
files change).
Follow-up for 1437822638.
Normally, we don't care too much about what pahole reports. But this structure
could potentially be allocated for every device on the system, i.e. in a large
number of copies. 5 vs 7 cache lines is nice.
/* size: 400, cachelines: 7, members: 53 */
/* sum members: 330, holes: 12, sum holes: 70 */
/* last cacheline: 16 bytes */
/* size: 320, cachelines: 5, members: 53 */
/* bit holes: 1, sum bit holes: 6 bits */
/* bit_padding: 5 bits */
We had two very similar functions: device_read_db_aux and device_read_db,
and a number of wrappers for them:
device_read_db_aux
← device_read_db (in sd-device.c)
← all functions in sd-device.c, including sd_device_is_initialized
← device_read_db_force
← event_execute_rules_on_remove (in udev-event.c)
device_read_db (in device-private.c)
← functions in device_private.c (but not device_read_db_force):
device_get_devnode_{mode,uid,gid}
device_get_devlink_priority
device_get_watch_handle
device_clone_with_db
← called from udevadm, udev-{node,event,watch}.c
Before 7141e4f62c (sd-device: don't retry loading
uevent/db files more than once), the two implementations were the same. In that
commit, device_read_db_aux was changed. Those changes were reverted in the parent
commit, so the two implementations are now again the same except for superficial
differences. This commit removes device_read_db (in sd-device.c), and renames
device_read_db_aux to device_read_db_internal and makes everyone use this one
implementation. There should be no functional change.
This mostly reverts "sd-device: don't retry loading uevent/db files more than
once", 7141e4f62c. We will retry if we couldn't
access the file, but not if parsing failed.
Not re-reading the database at all just doesn't seem like a good idea. We have
two implementations of device_read_db, and one does that, and the other retries
to read the db. Re-reading seems more useful, since we can create the object
and then access properties as some later time when we know that the device has
been initialized and we can get useful results. Otherwise, we force the user to
destroy this object and create a new one.
This changes device_read_uevent_file() and device_read_db_aux(). See next
commit for description of where those functions are used.
From WordNet (r) 3.0 (2006) [wn]:
time-out
n 1: a brief suspension of play; "each team has two time-outs left"
From The Free On-line Dictionary of Computing (18 March 2015) [foldoc]:
timeout
A period of time after which an error condition is raised if
some event has not occured. A common example is sending a
message. If the receiver does not acknowledge the message
within some preset timeout period, a transmission error is
assumed to have occured.
This has been irritating me for quite a while: let's prefix these enum
values with a common prefix, like we do for almost all other enums.
No change in behaviour, just some renaming.
From the results of CIs in #11076, changing buffer size may cause
issue #10754. So, let's prohibit to change the size if it is already
bound.
This also reverts commit 986ab0d2dc.
If the socket fd is passed by PID1, then it is created by .socket unit
and we have already set sufficient option(s) for the socket.
So, let's not touch the passed socket.
This cleans up a bit how we set up things for the ELF section magic:
1. Let's always use our gcc macros, instead of __attribute__ directly
2. Align our structures to sizeof(void*), i.e. the pointer size, rather
than a fixed 8 or __BIGGEST_ALIGNMENT__. The former is unnecessarily
high for 32bit systems, the latter too high for 64bit systems. gcc
seems to use ptr alignment for static variables itself, hence this
should be good enough for us too. Moreover, the Linux kernel also
uses pointer alginment for all its ELF section registration magic,
hence this should be good enough for us too.
3. Let's always prefix the sections we create ourself with SYSTEMD_,
just to make clear where they come from.
4. Always align the pointer we start from when iterating through these
lists. This should be unnecessary, but makes things nicely
systematic, as we'll align all pointers we use to access these
sections properly.
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.
No code changes, just some rearranging of source files.
Whenever we invoke external, foreign code from code that has
RLIMIT_NOFILE's soft limit bumped to high values, revert it to 1024
first. This is a safety precaution for compatibility with programs using
select() which cannot operate with fds > 1024.
This commit adds the call to rlimit_nofile_safe() to all invocations of
exec{v,ve,l}() and friends that either are in code that we know runs
with RLIMIT_NOFILE bumped up (which is PID 1 and all journal code for
starters) or that is part of shared code that might end up there.
The calls are placed as early as we can in processes invoking a flavour
of execve(), but after the last time we do fd manipulations, so that we
can still take benefit of the high fd limits for that.