Since commit 36c16a7cdd ("core: rework unit timeout handling, and add
new setting RuntimeMaxSec=") TimeoutSec=0 in mount units has
cause the mount to timeout immediately instead of never as documented.
There is a similar problem with Socket.TimeoutSec and Swap.TimeoutSec.
These are easily fixed using config_parse_sec_fix_0().
Automount.TimeoutIdleSec looks like it could have the same problem,
but doesn't because the kernel treats '0' as 'no timeout'.
It handle USEC_INFINITY correctly only because that constant has
the value '-1', and when round up, it becomes zero.
To avoid possible confusion, use config_parse_sec_fix_0() as well, and
explicitly handle USEC_INFINITY.
If a process accesses an autofs filesystem while systemd is in the
middle of starting the mount unit on top of it, it is possible for the
autofs_ptype_missing_direct request from the kernel to be received after
the mount unit has been fully started:
systemd forks and execs mount ...
... access autofs, blocks
mount exits ...
systemd receives SIGCHLD ...
... kernel sends request
systemd receives request ...
systemd needs to respond to this request, otherwise the kernel will
continue to block access to the mount point.
Otherwise we'll hit an assert sooner or later.
This requires us to initialize ->where even if we come back in "masked"
mode, as otherwise we don't know how to operate on the automount and
detach it.
Fixes: #5441
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
If the corresponding mount unit is deserialized after the automount unit
then the expire event is set up in automount_trigger_notify(). However, if
the mount unit is deserialized first then the automount unit is still in
state AUTOMOUNT_DEAD and automount_trigger_notify() aborts without setting
up the expire event.
Explicitly call automount_start_expire() during coldplug to make sure that
the expire event is set up as necessary.
Fixes#4249.
Previously, the result value of a unit was overriden with each failure that
took place, so that the result always reported the last failure that took
place.
With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchodg" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this from all unit type to store the first, not
the last failure.
This addresses the issue pointed out here:
https://github.com/systemd/systemd/pull/3818#discussion_r73433520
All pending tokens are already serialized correctly and will be handled
when the mount unit is done.
Without this a 'daemon-reload' cancels all pending tokens. Any process
waiting for the mount will continue with EHOSTDOWN.
This can happen when the mount unit waits for it's dependencies, e.g.
network, devices, fsck, etc.
This basically reverts 7b2fd9d512 ("core:
remove duplicate code in automount_update_mount()").
This was not duplicate code. The expire_tokens need to be handled as well:
Send 0 == success for MOUNT_DEAD (umount successful), do nothing for
MOUNT_UNMOUNTING (not yet done) and an error for everything else.
Otherwise the automount logic will assume unmounting is not done and will
not send any new requests for mounting. As a result, the corresponding
mount unit is never mounted.
Without this, automounts with TimeoutIdleSec= are broken. Once the idle
timeout triggered a umount, any access to the corresponding filesystem
hangs forever.
Fixes#3332.
Port the progagation logic to the generic Unit->trigger_notify() callback logic
in the unit vtable, that is called for a unit not only when the triggered unit
of it changes state but also when a job for that unit finishes. This, firstly
allows us to make the code a bit cleaner and more generic, but more
importantly, allows us to notice correctly when a mount job fails, and
propagate that back to autofs client processes.
Fixes: #2181
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in clal like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right before everything has been verified
to be in order.
The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.
Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.
Note that restores the "start-limit-hit" result code that existed before
6bf0f408e4 already in the service code. However,
it's not introduced for all units that have a result code concept.
Fixes#3166.
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.
Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in one or the user context.
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
Now that we don't have RequiresOverridable= and RequisiteOverridable=
dependencies anymore, we can get rid of tracking the "override" boolean
for jobs in the job engine, as it serves no purpose anymore.
While we are at it, fix some error messages we print when invoking
functions that take the override parameter.
It's nicer to hide the check away in the various
xyz_add_default_dependencies() calls, rather than making it explicit in
the caller, and thus require deeper nesing.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
When starting a transient service, allow setting stdin/stdout/stderr fds
for it, by passing them in via the bus.
This also simplifies some of the serialization code for units.
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.
Also ports over machinectl to make use of this.
This reverts commit d4d00020d6. The idea of
the commit is broken and needs to be reworked. We really cannot reduce
the bus-addresses to a single address. We always will have systemd with
native clients and legacy clients at the same time, so we also need both
addresses at the same time.
We should not fall back to dbus-1 and connect to the proxy when kdbus
returns an error that indicates that kdbus is running but just does not
accept new connections because of quota limits or something similar.
Using is_kdbus_available() in libsystemd/ requires it to move from
shared/ to libsystemd/.
Based on a patch from David Herrmann:
https://github.com/systemd/systemd/pull/886
The expire timeout must be started/stopped if the corresponding mount unit
changes its state, e.g. it is started via local-fs.target or stopped by a
manual umount.
Return the token immediately instead. Otherwise the token is never returned
to the kernel, because the umount job is a noop and will not trigger a
state change.
The timer value for automount unit specified with TimeoutIdleSec= is rounded
up to one second if that directive is set to 0.
Fix this by bailing early in automount_enter_runnning() in case no timeout is
requested.
The autofs kernel idle logic requires us to poll the kernel for
idleness. This is of course suboptimal, but cannot be fixed without
kernel change.
Currently the polling frequency is set to 1/10 of the idle timeout. This
is quite high, as seen in #571. Let's lower this to 1/3.
It's primarily just a property of the Manager object after all, and we
try to refer to PID 1 as "manager" instead of "systemd", hence let's to
stick to this here too.
This changes log_unit_info() (and friends) to take a real Unit* object
insted of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed romat strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.
Also contains a couple of other logging improvements:
- Drops a couple of strerror() invocations in favour of using %m.
- Not only .mount units now warn if a symlinks exist for the mount
point already, .automount units do that too, now.
- A few invocations of log_struct() that didn't actually pass any
additional structured data have been replaced by simpler invocations
of log_unit_info() and friends.
- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
that works like LOG_MESSAGE() but prefixes the message with the unit
name. Similar, there's now LOG_LINK_MESSAGE() and
LOG_NETDEV_MESSAGE().
- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
LOG_NETDEV_INTERFACE() macros have been added that generate the
necessary per object fields. The old log_unit_struct() call has been
removed in favour of these new macros used in raw log_struct()
invocations. In addition to removing one more function call this
allows generated structured log messages that contain two object
fields, as necessary for example for network interfaces that are
joined into another network interface, and whose messages shall be
indexed by both.
- The LOG_ERRNO() macro has been removed, in favour of
log_struct_errno(). The latter has the benefit of ensuring that %m in
format strings is properly resolved to the specified error number.
- A number of logging messages have been converted to use
log_unit_info() instead of log_info()
- The client code in sysv-generator no longer #includes core code from
src/core/.
- log_unit_full_errno() has been removed, log_unit_full() instead takes
an errno now, too.
- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
avoid double evaluation of their parameters
A variety of changes:
- Make sure all our calls distuingish OOM from other errors if OOM is
not the only error possible.
- Be much stricter when parsing escaped paths, do not accept trailing or
leading escaped slashes.
- Change unit validation to take a bit mask for allowing plain names,
instance names or template names or an combination thereof.
- Refuse manipulating invalid unit name
Usually when using loop_read(), we want to read the full buffer.
Add a helper that mirrors loop_write(), and returns 0 when full buffer
was read, and an error otherwise.
Use -ENODATA for the short read, to distinguish it from a read error.
Because the order of coldplugging is not defined, we can reference a
not-yet-coldplugged unit and read its state while it has not yet been
set to a meaningful value.
This way, already active units may get started again.
We fix this by deferring such actions until all units have been at
least somehow coldplugged.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
If we scale our buffer to be wide enough for the format string, we
should expect that the calculation was correct.
char_array_0() invocations are removed, since snprintf nul-terminates
the output in any case.
A similar wrapper is used for strftime calls, but only in timedatectl.c.
Unit _start() and _stop() implementations can fail with -EAGAIN to delay
execution temporarily. Thus, we should not output status messages before
invoking these calls, but after, and only when we know that the
invocation actually made a change.
Containers do not really support .device, .automount or .swap units;
Systems compiled without support for swap do not support .swap units;
Systems without kdbus do not support .busname units.
With this change attempts to start a unsupported unit types will result
in an immediate "unsupported" job result, which is a lot more
descriptive then before. Also, attempts to start device units in
containers will now immediately fail instead of causing jobs to be
enqueued that never go away.
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
- Rename log_meta() → log_internal(), to follow naming scheme of most
other log functions that are usually invoked through macros, but never
directly.
- Rename log_info_object() to log_object_info(), simply because the
object should be before any other parameters, to follow OO-style
programming style.
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
safe_close_pair() is more like safe_close(), except that it handles
pairs of fds, and doesn't make and misleading allusion, as it works
similarly well for socketpairs() as for pipe()s...
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
Previously the returned object of constructor functions where sometimes
returned as last, sometimes as first and sometimes as second parameter.
Let's clean this up a bit. Here are the new rules:
1. The object the new object is derived from is put first, if there is any
2. The object we are creating will be returned in the next arguments
3. This is followed by any additional arguments
Rationale:
For functions that operate on an object we always put that object first.
Constructors should probably not be too different in this regard. Also,
if the additional parameters might want to use varargs which suggests to
put them last.
Note that this new scheme only applies to constructor functions, not to
all other functions. We do give a lot of freedom for those.
Note that this commit only changes the order of the new functions we
added, for old ones we accept the wrong order and leave it like that.
It is nicer to predefine patterns using configure time check instead of
using casts everywhere.
Since we do not need to use any flags, include "%" in the format instead
of excluding it like PRI* macros.
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.
This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.
With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.
https://bugs.freedesktop.org/show_bug.cgi?id=69740
When a trigger unit wants to know if a stop is queued for it, we should
just check precisely that and do not check whether it is actually
stopped already. This is because we use these checks usually from state
change calls where the state variables are not updated yet.
This change splits unit_pending_inactive() into two calls
unit_inactive_or_pending() and unit_stop_pending(). The former checks
state and pending jobs, the latter only pending jobs.
Instead of having explicit type-specific callbacks that inform the
triggering unit when a triggered unit changes state, make this generic
so that state changes are forwarded betwee any triggered and triggering
unit.
Also, get rid of UnitRef references from automount, timer, path units,
to the units they trigger and rely exclsuively on UNIT_TRIGGER type
dendencies.
Note: I did s/MANAGER/SYSTEMD/ everywhere, even though it makes the
patch quite verbose. Nevertheless, keeping MANAGER prefix in some
places, and SYSTEMD prefix in others would just lead to confusion down
the road. Better to rip off the band-aid now.
Old: systemd[1]: Got direct mount request for ffff88003bb10c00, triggered by 14476 (fuser)
New: systemd[1]: Got direct mount request on /dev/mqueue, triggered by 2177 (ls)