This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
As suggested here:
https://github.com/systemd/systemd/pull/4296#issuecomment-251911349
Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that
we can use a protocol-independent socket here if possible. This has the benefit
that our code will still work even if AF_INET/AF_INET6 is made unavailable (for
exmple via seccomp), at least on current kernels.
If we find duplicates in a property-lookup, make sure to order them by
their origin. That is, matches defined "later" take precedence over
earlier matches. The "later"-order is defined by file-name + line-number
combination. That is, if a match is defined below another one in the
same hwdb file, it takes precedence, same as if it is defined in a file
ordered after another one.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Extend the hwdb to store the source file-name and file-number for each
property. We simply extend the stored value struct with the new
information. It is fully backwards compatible and old readers will
continue to work.
The libudev/sd-hwdb reader is updated in a followup.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
It is not legal to use hard-coded types to calculate offsets. We must
always use the offsets of the hwdb header to calculate those. Otherwise,
we will break horribly if run on hwdb files written by other
implementations or written with future extensions.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Let's bump it further, as this the current limit turns out to be problematic
IRL. Let's bump it to more than twice what we know of is needed.
Fixes: #4068
Old libdbus has a feature that the process is terminated whenever the the bus
connection receives a disconnect. This is pretty useful on desktop apps (where
a disconnect indicates session termination), as well as on command line apps
(where we really shouldn't stay hanging in most cases if dbus daemon goes
down).
Add a similar feature to sd-bus, but make it opt-in rather than opt-out, like
it is on libdbus. Also, if the bus is attached to an event loop just exit the
event loop rather than the the whole process.
If the server side kicks us from the bus, from our view no names are on the bus
anymore, hence let's make sure to dispatch all tracking objects immediately.
In order to add a name to a bus tracking object we need to do some bus
operations: we need to check if the name already exists and add match for it.
Both are synchronous bus calls. While processing those we need to make sure
that the tracking object is not dispatched yet, as it might still be empty, but
is not going to be empty for very long.
hence, block dispatching by removing the object from the dispatch queue while
adding it, and readding it on error.
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface.
The two calls may be used by clients to pin a unit into memory, so that various
runtime properties aren't flushed out by the automatic GC. This is necessary
to permit clients to race-freely acquire runtime results (such as process exit
status/code or accumulated CPU time) on successful service termination.
Ref() and Unref() are fully recursive, hence act like the usual reference
counting concept in C. Taking a reference is a privileged operation, as this
allows pinning units into memory which consumes resources.
Transient units may also gain a reference at the time of creation, via the new
AddRef property (that is only defined for transient units at the time of
creation).
This adds an optional "recursive" counting mode to sd_bus_track. If enabled
adding the same name multiple times to an sd_bus_track object is counted
individually, so that it also has to be removed the same number of times before
it is gone again from the tracking object.
This functionality is useful for implementing local ref counted objects that
peers make take references on.
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies. This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is. In preparation, this patch renames cg_unified() to
cg_all_unified().
This patch doesn't cause any functional changes.
Accept both files with and without trailing newlines. Apparently some rkt
releases generated them incorrectly, missing the trailing newlines, and we
shouldn't break that.
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).
For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.
A simple way to test this is:
systemd-run -p DynamicUser=1 /bin/sleep 99999
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.
The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().
In follow-up patches we can use this reduce the code in nspawn and
machine-id-setup by adopted the common implementation.
This extends the existing event loop iteration counter to 64bit, and exposes it
via a new function sd_event_get_iteration(). This is helpful for cases like
issue #3612. After all, since we maintain the counter anyway, we might as well
expose it.
(This also fixes an unrelated issue in the man page for sd_event_wait() where
micro and milliseconds got mixed up)
The 'drivers' pseudo-subsystem needs special treatment. These pseudo-devices are
found under /sys/bus/drivers/, so needs the real subsystem encoded
in the device_id in order to be resolved.
The reader side already assumed this to be the case.
If "systemctl -H" is used, let's make sure we first terminate the bus
connection, and only then close the pager. If done in this order ssh will get
an EOF on stdin (as we speak D-Bus through ssh's stdin/stdout), and then
terminate. This makes sure the standard error we were invoked on is released by
ssh, and only that makes sure we don't deadlock on the pager which waits for
all clients closing its input pipe.
(Similar fixes for the various other xyzctl tools that support both pagers and
-H)
Fixes: #3543
On larger systems we might very well see messages with thousands of parts.
When we free them, we must avoid recursing into each part, otherwise we
very likely get stack overflows.
Fix sd_netlink_message_unref() to use an iterative approach rather than
recursion (also avoid tail-recursion in case it is not optimized by the
compiler).
The statemachine was unable to parse properties with empty values,
reported in [0].
When reaching the start of the KEY, we would unconditionally read
one more character before starting to look for the end-of-line.
Simply look for the end-of-line from the first character.
[0]: <https://bugzilla.redhat.com/show_bug.cgi?id=1338823>
This is now the recommended way to do monitoring by upstream D-Bus.
It's also allowed in the default policy, whereas eavesdrop is not
anymore, which effectively broke busctl on many systems.
That function doesn't draw anything on it's own, just returns a string, which
sometimes is more than one character. Also remove "DRAW_" prefix from character
names, TREE_* and ARROW and BLACK_CIRCLE are unambigous on their own, don't
draw anything, and are always used as an argument to special_glyph().
Rename "DASH" to "MDASH", as there's more than one type of dash.
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
Introduce
1. sd_rtnl_message_route_set_table to set table ID
2. sd_rtnl_message_route_set_family to set family
Both required to configure route properties.
* sd-netlink: permit RTM_DELLINK messages with no ifindex
This is useful for removing network interfaces by name.
* nspawn: explicitly remove veth links we created after use
Sometimes the kernel keeps veth links pinned after the namespace they have been
joined to died. Let's hence explicitly remove veth links after use.
Fixes: #2173
Also, expose this via the "journalctl --file=-" syntax for STDIN. This feature
remains undocumented though, as it is probably not too useful in real-life as
this still requires fds that support mmaping and seeking, i.e. does not work
for pipes, for which reading from STDIN is most commonly used.
Before we invoke now(CLOCK_BOOTTIME), let's make sure we actually have that
clock, since now() will otherwise hit an assert.
Specifically, let's refuse CLOCK_BOOTTIME early in sd-event if the kernel
doesn't actually support it.
This is a follow-up for #3037, and specifically:
https://github.com/systemd/systemd/pull/3037#issuecomment-210199167
It was added in 2.6.39, and causes an assertion to fail when running in mock
hosted on 2.6.32-based RHEL-6:
Assertion 'clock_gettime(map_clock_id(clock_id), &ts) == 0' failed at systemd/src/basic/time-util.c:70, function now(). Aborting.
If the SD_BUS_CREDS_SUPPLEMENTARY_GIDS value is requested, the pid is
queried to find out the supplementary gids value from /proc/pid/status.
Otherwise sd_bus_creds_get_supplementary_gids() won't work unless some
other value in mask triggered fetching the pid information.
We don't allow using config symlinks to enable units, but the error message we
printed was awful. Fix that, and generate a more readable error.
Fixes#3010.
strjoina() is unsafe to be used in an unbounded loop as alloca() has no error
reporting. Thus devices with a large number of tags or devlinks trigger a
segfault in device_properties_prepare() due to overflowing the stack.
Rewrite the building of the "tags" and "devlinks" strings using
GREEDY_REALLOC() and strpcpy() to work with arbitrarily long strings. This also
avoids re-copying the entire string in each loop iteration.
Before this commit we always appended one final ":" to "tags". Change this to
start with an iniital ":" and for each tag append instead of prepend a ":".
This unifies what happens for the first and all subsequent tags so that we can
use a for loop.
Fixes#2954
Many subsystems define own pager_open_if_enabled() function which
checks '--no-pager' command line argument and open pager depends
on its value. All implementations of pager_open_if_enabled() are
the same. Let's merger this function with pager_open() from the
shared/pager.c and remove pager_open_if_enabled() from all subsytems
to prevent code duplication.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
This reworks the sd-lldp substantially, simplifying things on one hand, and
extending the logic a bit on the other.
Specifically:
- Besides the sd_lldp object only one other object is maintained now,
sd_lldp_neighbor. It's used both as storage for literal LLDP packets, and for
maintainging info about peers in the database. Separation between packet, TLV
and chassis data is not maintained anymore. This should be a major
simplification.
- The sd-lldp API has been extended so that a couple of per-neighbor fields may
be queried directly, without iterating through the object. Other fields that
may appear multiple times, OTOH have to be iterated through.
- The maximum number of entries in the neighbor database is now configurable
during runtime.
- The generation of callbacks from sd_lldp objects is more restricted:
callbacks are only invoked when actual data changed.
- The TTL information is now hooked with a timer event, so that removals from
the neighbor database due to TTLs now result in a callback event.
- Querying LLDP neighbor database will now return a strictly ordered array, to
guarantee stability.
- A "capabilities" mask may now be configured, that selects what type of LLDP
neighbor data is collected. This may be used to restrict collection of LLDP
info about routers instead of all neighbors. This is now exposed via
networkd's LLDP= setting.
- sd-lldp's API to serialize the collected data to text files has been removed.
Instead, there's now an API to extract the raw binary data from LLDP neighbor
objects, as well as one to convert this raw binary data back to an LLDP
neighbor object. networkd will save this raw binary data to /run now, and the
client side can simply parse the information.
- support for parsing the more exotic TLVs has been removed, since we are not
using that. Instead there are now APIs to extract the raw data from TLVs.
Given how easy it is to parse the TLVs clients should do so now directly
instead of relying on our APIs for that.
- A lot of the APIs that parse out LLDP strings have been simplified so that
they actually return strings, instead of char arrays with a length. To deal
with possibly dangerous characters the strings are escaped if needed.
- APIs to extract and format the chassis and port IDs as strings has been
added.
- lldp.h has been simplified a lot. The enums are anonymous now, since they
were never used as enums, but simply as constants. Most definitions we don't
actually use ourselves have eben removed.
Usually, we place the #pragma once before the copyright blurb in header files,
but in a few cases we didn't. Move those around, so that we do the same thing
everywhere.
As kdbus won't land in the anticipated way, the bus-proxy is not needed in
its current form. It can be resurrected at any time thanks to the history,
but for now, let's remove it from the sources. If we'll have a similar tool
in the future, it will look quite differently anyway.
Note that stdio-bridge is still available. It was restored from a version
prior to f252ff17, and refactored to make use of the current APIs.
ISO/IEC 9899:1999 §7.21.1/2 says:
Where an argument declared as size_t n specifies the length of the array
for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a call
shall still have valid values, as described in 7.1.4.
In base64_append_width memcpy was called as memcpy(x, NULL, 0). GCC 4.9
started making use of this and assumes This worked fine under -O0, but
does something strange under -O3.
This patch fixes a bug in base64_append_width(), fixes a possible bug in
journal_file_append_entry_internal(), and makes use of the new function
to simplify the code in other places.
This commit changes the mapping of the BUS_ERROR_UNIT_MASKED error to ESHUTDOWN. This error is used whenever the
transaction engine is asked to operate on a masked unit. ESHUTDOWN is what is used for the similar case when the unit
file enable/disable logic hits a masked unit file, hence is a natural candidate to be used here too.
Background: before this patch both "job type not applicable" and "unit masked" where mapped to EBADR, which
transaction_add_job_and_dependencies() then checked for. It actually wanted to check exclusively for the former error
condition, not the latter but due to the same mapping this failed to work.
This patch semi-undoes an accidental change made in caffa4ef70, however restores the
error number to ESHUTDOWN instead of the original ENOSYS (for the reasons indicated above).
To make this easier to grok for the future, I added comments to explaining which error conditions are checked for.
Fixes: #2315
This adds two new calls to get the list of all journal fields names currently in use.
This is the low-level support to implement the feature requested in #2176 in a more optimized way.
This should simplify handling of time events in clients and is in-line with the USEC_INFINITY macro we already have.
This way setting a timeout to 0 indicates "elapse immediately", and a timeout of USEC_INFINITY "elapse never".
Previously, .network files only knew a vaguely defined "Domains=" concept, for which the documentation declared it was
the "DNS domain" for the network connection, without specifying what that means.
With this the Domains setting is reworked, so that there are now "routing" domains and "search" domains. The former are
to be used by resolved to route DNS request to specific network interfaces, the latter is to be used for searching
single-label hostnames with (in addition to being used for routing). Both settings are configured in the "Domains="
setting. Normal domain names listed in it are now considered search domains (for compatibility with existing setups),
while those prefixed with "~" are considered routing domains only. To route all lookups to a specific interface the
routing domain "." may be used, referring to the root domain. An alternative syntax for this is the "*", as was already
implemented before using the "wildcard" domain concept.
This commit adds proper parsers for this new logic, and exposes this via the sd-network API. This information is not
used by resolved yet, this will be added in a later commit.
This is useful for alternative network management solutions (such as NetworkManager) to push DNS configuration data
into resolved.
The calls will fail should networkd already have taken possesion of a link, so that the bus API is only available if
we don't get the data from networkd.
Setting of dst_id was based on interplay of two booleans,
making the logic hard to follow (for humans and compilers alike).
gcc was confused and emmitted a warning about an uninitialized
variable. Rework the code to make it obvious that dst_id is
set properly.
sd_event_now() is a public function, so we must check all
arguments for validity. Update man page and add tests.
Sample debug message:
Assertion 'IN_SET(clock, CLOCK_REALTIME, CLOCK_REALTIME_ALARM, CLOCK_MONOTONIC, CLOCK_BOOTTIME, CLOCK_BOOTTIME_ALARM)' failed at src/libsystemd/sd-event/sd-event.c:2719, function sd_event_now(). Ignoring.
Go over the entries in the map and check that they make sense.
Tests are added. In the future we might want to do additional
checks, e.g. verifying that the error names are in the expected
format.
errno_from_name used an unusual return convention where 0 meant
"not found". This tripped up config_parse_syscall_errno(),
which would treat that as success. Return -EINVAL instead,
and adjust bus_error_name_to_errno() for the new convention.
Also remove a goto which was used as a simple if and clean
up surroudning code a bit.
Set SD_EVENT_PROFILE_DELAYS to activate accounting and periodic logging
of the distribution of delays between sd_event_run() calls.
Time spent in dispatching as well as time spent outside of
sd_event_run() is measured and accounted for. Every 5 seconds a
logarithmic histogram loop iteration delays since 5 seconds previous is
logged.
This is useful in identifying the frequency and magnitude of latencies
affecting the event loop, which should be kept to a minimum.
If we already degraded the feature level below DO don't bother with sending requests for DS, DNSKEY, RRSIG, NSEC, NSEC3
or NSEC3PARAM RRs. After all, we cannot do DNSSEC validation then anyway, and we better not press a legacy server like
this with such modern concepts.
This also has the benefit that when we try to validate a response we received using DNSSEC, and we detect a limited
server support level while doing so, all further auxiliary DNSSEC queries will fail right-away.
Fixes:
$ make valgrind-tests TESTS=test-bus-cleanup
==6363== 9 bytes in 1 blocks are possibly lost in loss record 1 of 28
==6363== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
==6363== by 0x197D12: hexmem (hexdecoct.c:79)
==6363== by 0x183083: bus_socket_start_auth_client (bus-socket.c:639)
==6363== by 0x1832A0: bus_socket_start_auth (bus-socket.c:678)
==6363== by 0x183438: bus_socket_connect (bus-socket.c:705)
==6363== by 0x14B0F2: bus_start_address (sd-bus.c:1053)
==6363== by 0x14B592: sd_bus_start (sd-bus.c:1134)
==6363== by 0x14B95E: sd_bus_open_system (sd-bus.c:1235)
==6363== by 0x1127E2: test_bus_open (test-bus-cleanup.c:42)
==6363== by 0x112AAE: main (test-bus-cleanup.c:87)
==6363==
...
$ ./libtool --mode=execute valgrind ./test-bus-cleanup
==6584== LEAK SUMMARY:
...
==6584== possibly lost: 10,566 bytes in 27 blocks
Since we honour RFC5011 revoked keys it might happen we end up with an
empty trust anchor, or one where there's no entry for the root left.
With this patch the logic is changed what to do in this case.
Before this patch we'd end up requesting the root DS, which returns with
NODATA but a signed NSEC we cannot verify, since the trust anchor is
empty after all. Thus we'd return a DNSSEC result of "missing-key", as
we lack a verified version of the key.
With this patch in place, look-ups for the root DS are explicitly
recognized, and not passed on to the DNS servers. Instead, if
downgrade-ok mode is on an unsigned NODATA response is synthesized, so
that the validator code continues under the assumption the root zone was
unsigned. If downgrade-ok mode is off a new transaction failure is
generated, that makes this case recognizable.
Fixes:
```
$ ./configure ... --enable-dbus
$ make
$ make valgrind-tests TESTS=test-bus-marshal
...
==25301== 51 bytes in 1 blocks are definitely lost in loss record 7 of 18
==25301== at 0x4C2DD9F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25301== by 0x5496B8C: ??? (in /lib/x86_64-linux-gnu/libdbus-1.so.3.14.3)
==25301== by 0x54973E3: _dbus_string_append_printf_valist (in /lib/x86_64-linux-gnu/libdbus-1.so.3.14.3)
==25301== by 0x547E5C2: _dbus_set_error_valist (in /lib/x86_64-linux-gnu/libdbus-1.so.3.14.3)
==25301== by 0x547E73E: dbus_set_error (in /lib/x86_64-linux-gnu/libdbus-1.so.3.14.3)
==25301== by 0x548969A: dbus_message_demarshal (in /lib/x86_64-linux-gnu/libdbus-1.so.3.14.3)
==25301== by 0x115C1A: main (test-bus-marshal.c:244)
==25301==
```
Previously, if we couldn't reach a server via UDP we'd generate an
MAX_ATTEMPTS transaction result, but if we couldn't reach it via TCP
we'd generate a RESOURCES transaction result. While it is OK to generate
two different errors I think, "RESOURCES" is certainly a misnomer.
Introduce a new transaction result "CONNECTION_FAILURE" instead.
Printing the pointer variable really doesn't help, so drop that.
Instead, add a string lookup table for the EventSourceType enum, and print
the type of event source in case of errors.
We need to check the same thing in multiple tests. Use a shared
macro to make it easier to update the list of errnos.
Change the errno code for "unitialized cgroup fs" for ENOMEDIUM.
Exec format error looks like something more serious.
This fixes test-execute invocation in mock.
Let's distuingish the cases where our code takes an active role in
selinux management, or just passively reports whatever selinux
properties are set.
mac_selinux_have() now checks whether selinux is around for the passive
stuff, and mac_selinux_use() for the active stuff. The latter checks the
former, plus also checks UID == 0, under the assumption that only when
we run priviliged selinux management really makes sense.
Fixes: #1941
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.