In 4c2ef32767 we enabled propagating
triggered unit state to the triggering unit for service units in more
load states, so that we don't accidentally stop tracking state
correctly.
Do the same for our other triggering unit states: automounts, paths, and
timers.
Also, make this an assertion rather than a simple test. After all it
should never happen that we get called for half-loaded units or units of
the wrong type. The load routines should already have made this
impossible.
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
From a report in https://bugzilla.redhat.com/show_bug.cgi?id=1861463:
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Failed to load configuration: No such file or directory
usb-gadget.target: Trying to enqueue job usb-gadget.target/start/fail
usb-gadget.target: Failed to load configuration: No such file or directory
Assertion '!bus_error_is_dirty(e)' failed at src/libsystemd/sd-bus/bus-error.c:239, function bus_error_setfv(). Ignoring.
sys-devices-platform-soc-2100000.bus-2184000.usb-ci_hdrc.0-udc-ci_hdrc.0.device: Failed to enqueue SYSTEMD_WANTS= job, ignoring: Unit usb-gadget.target not found.
I *think* this is the place where the reuse occurs: we call
bus_unit_validate_load_state(unit, e) twice in a row.
When a command asks to load a unit directly and it is in state
UNIT_NOT_FOUND, and the cache is outdated, we refresh it and
attempto to load again.
Use the same logic when building up a transaction and a dependency in
UNIT_NOT_FOUND state is encountered.
Update the unit test to exercise this code path.
This reverts commit 097537f07a.
At least Fedora and Debian have already reverted this at the distro
level because it causes more problems than it solves. Arch is debating
reverting it as well [0] but would strongly prefer that this happens
upstream first. Fixes#15188.
[0] https://bugs.archlinux.org/task/66458
After a larger transaction, e.g. after bootup, we're left with an empty hashmap
with hundreds of buckets. Long-term, it'd be better to size hashmaps down when
they are less than 1/4 full, but even if we implement that, jobs hashmap is
likely to be empty almost always, so it seems useful to deallocate it once the
jobs count reaches 0.
We would flip to status=temporary mode on the first error, and then switch back
to status=auto after the initial transaction was done. This isn't very useful,
because usually all the messages about successfully started units and not
related to the original failure. In fact, all those messages most likely cause
the information about the prime error to scroll off screen. And if the user
requested quiet boot, there's no reason to think that they care about those
success messages.
Also, when logging about dependency cycles, treat this similarly to a unit
error and show the message even if the status is "soft disabled" (before we
wouldn't show it in that case).
In the steps given in #13850, the resulting graph looks like:
C (Anchor) -> B -> A
Since B is inactive, it will be flagged as redundant and removed from
the transaction, causing A to get garbage collected. The proposed fix is
to not mark nodes as redundant if doing so causes a relevant node to be
garbage collected.
Fixes#13850
When we are validating a transaction, we take into account declared
ordering between job units. However, since JOB_STOP goes always first
regardless of the ordering constraint between respective units, we may
detect some false cycles in the transaction which would not prevent the
execution though.
Use the same logic in transaction checking as we use for job execution.
In particular, let's not use gotos that jump up, i.e. are loops. gotos
that jump down for the purpose of clean-up are cool, but using them for
loops is evil.
No change in behaviour, just some refactoring.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Since bb28e68477 parsing failures of
certain unit file settings will result in load failures of units. This
introduces a new load state "bad-setting" that is entered in precisely
this case.
With this addition error messages on bad settings should be a lot more
explicit, as we don't have to show some generic "errno" error in that
case, but can explicitly say that a bad setting is at fault.
Internally this unit load state is entered as soon as any configuration
loader call returns ENOEXEC. Hence: config parser calls should return
ENOEXEC now for such essential unit file settings. Turns out, they
generally already do.
Fixes: #9107
Let's use a switch() statement, cover more cases with pretty messages.
Also let's rename it to "validate", as that's more specific that
"check", as it implies checking for a "valid"/"good" state, which is
what this function does.
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.
The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.
Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.
Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:
1. We can fix UDEV_WANTS dependency generation: so far we kept adding
dependencies configured that way, but if a device lost such a
dependency we couldn't them again as there was no scheme for removing
of dependencies in place.
2. We can implement "pin-pointed" reload of unit files. If we know what
dependencies were created as result of configuration in a unit file,
then we know what to flush out when we want to reload it.
3. It's useful for debugging: "systemd-analyze dump" now shows
this information, helping substantially with understanding how
systemd's dependency tree came to be the way it came to be.
Example log:
Jul 22 15:55:21 fedora systemd[1]: a1.service: Found ordering cycle on a2.service/start
Jul 22 15:55:21 fedora systemd[1]: a1.service: Found dependency on a3.service/start
Jul 22 15:55:21 fedora systemd[1]: a1.service: Found dependency on a1.service/start
Jul 22 15:55:21 fedora systemd[1]: a1.service: Job a2.service/start deleted to break ordering cycle starting with a1.service/start
Jul 22 15:55:21 fedora systemd[1]: Starting a1.service...
Jul 22 15:55:21 fedora systemd[1]: Started a1.service.
Example log entry:
Sat 2017-07-22 15:55:21.372389 EDT [s=0004bb6302d94ac3aa69987fb6157338;i=9ae;b=a96eb6153d4f4f3686c7b4
_BOOT_ID=a96eb6153d4f4f3686c7b4db8a432908
_MACHINE_ID=ad18f69b80264b52bb3b766240742383
_HOSTNAME=fedora
PRIORITY=3
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
_UID=0
_GID=0
_PID=1
_TRANSPORT=journal
_CAP_EFFECTIVE=3fffffffff
_COMM=systemd
_EXE=/usr/lib/systemd/systemd
_SYSTEMD_CGROUP=/init.scope
_SYSTEMD_UNIT=init.scope
_SYSTEMD_SLICE=-.slice
_SELINUX_CONTEXT=system_u:system_r:kernel_t:s0
CODE_FILE=../src/core/transaction.c
CODE_FUNC=transaction_verify_order_one
UNIT=a3.service
UNIT=a1.service
UNIT=a2.service
CODE_LINE=430
MESSAGE=a1.service: Job a2.service/start deleted to break ordering cycle starting with a1.service
_CMDLINE=/usr/lib/systemd/systemd --system --deserialize 28
_SOURCE_REALTIME_TIMESTAMP=1500753321372389
This should make it easier to see when any of the units are involved in an
ordering cycle.
Fixes#6336.
v2:
- also update the "Unable to break cycle" message.
Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The
introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts
running (e.g. it groups all Exec* commands of a service or spans waiting for a
device period.)
Unit.JobRunningTimeoutSec is intended to be used by default instead of
Unit.JobTimeoutSec for device units where such behavior causes less confusion
(consider a job for a _netdev mount device, with this change the timeout will
start ticking only after the network is ready).
This is important if a job was queued for a unit but not yet started.
Without this, the job will be canceled and is never executed even though
IgnoreOnIsolate it set to 'true'.
We currently generate log message about unit being started even when
unit was started already and job didn't do anything. This is because job
was requested explicitly and hence became anchor job of the transaction
thus we could not eliminate it. That is fine but, let's not pollute
journal with useless log messages.
$ systemctl start systemd-resolved
$ systemctl start systemd-resolved
$ systemctl start systemd-resolved
Current state:
$ journalctl -u systemd-resolved | grep Started
May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution.
May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution.
After patch applied:
$ journalctl -u systemd-resolved | grep Started
May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution.
Fixes#1723
If the error code ever leaks (we print the strerror error instead of providing
our own), the message for ESHUTDOWN is "Cannot send after transport endpoint
shutdown", which can be misleading. In particular it suggest that some
mishandling of the dbus connection occured. Let's change that to ERFKILL which
has the advantage that a) it sounds implausible as actual error, b) has the
connotation of disabling something manually.
This replaces the old function call manager_is_reloading_or_reexecuting() which
was used only at very few places. Use the new macro wherever we check whether
we are reloading. This should hopefully make things a bit more readable, given
the nature of Manager:n_reloading being a counter.
This commit changes the mapping of the BUS_ERROR_UNIT_MASKED error to ESHUTDOWN. This error is used whenever the
transaction engine is asked to operate on a masked unit. ESHUTDOWN is what is used for the similar case when the unit
file enable/disable logic hits a masked unit file, hence is a natural candidate to be used here too.
Background: before this patch both "job type not applicable" and "unit masked" where mapped to EBADR, which
transaction_add_job_and_dependencies() then checked for. It actually wanted to check exclusively for the former error
condition, not the latter but due to the same mapping this failed to work.
This patch semi-undoes an accidental change made in caffa4ef70, however restores the
error number to ESHUTDOWN instead of the original ENOSYS (for the reasons indicated above).
To make this easier to grok for the future, I added comments to explaining which error conditions are checked for.
Fixes: #2315
If a unit was pulled by a Wants= dependency but its unit file was not
present then we logged this as an error.
However Wants= might be used to configure a soft/optional dependency
on another unit, ie. start an optional service only if it's installed
otherwise simply skip it. In this case emitting an error doesn't look
appropriate.
But it's still an error if the optional dependency exists but its
activation fails for any reasons.