With the grandparent change to move most units to app.slice,
those units would be ordered After=app.slice which doesn't make any sense.
Actually they appear earlier, before the manager is even started, and
conceputally it doesn't seem useful to put them under any slice.
This changes the default from putting all units into the root slice to
placing them into the app slice in the user manager. The advantage is
that we get the right behaviour in most cases, and we'll need special
case handling in all other cases anyway.
Note that we have currently defined that applications *should* start
their unit names with app-, so we could also move only these by creating
a drop-in for app-.scope and app-.service.
However, that would not answer the question on how we should manage
session.slice. And we would end up placing anything that does not fit
the system (e.g. anything started by dbus-broker currently) into the
root slice.
let's add informational logging about each client requested signal
sending. While we are at, let's beef up error handling/log messages in
this case quite a bit: let's log errors both to syslog and report errors
back to client.
Fixes: #17254
--kill-who=main-fail never worked correctly, due to a copy and paste
mistake in ac5e3a505e, where the same item
was listed twice. The mistake was
later noticed, but fixed incorrectly, in
201f0c916d.
Let's list all *-fail types correctly, finally.
And while we are at it, add a nice comment and generate a prettier D-Bus
error about this.
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.
This should help us behave properly in the case when the time jumps backwards
during boot: various files might have mtimes that in the future, but we won't
care. This fixes the following scenario:
We have /etc/systemd/system with T1. T1 is initially far in the past.
We have /run/systemd/generator with time T2.
The time is adjusted backwards, so T2 will be always in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2.
We would consider our cache to be up-to-date, falsely.
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.
Also rename fragment_loadtime to fragment_not_found_time. It only gets set when
we failed to load the unit and the old name was suggesting it is always set.
In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during inital boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
The unit_add_siblings_to_cgroup_realize_queue does more than mere
siblings queueing, hence define a family of a unit as (immediate)
children of the unit and immediate children of all ancestors.
Working with this abstraction simplifies the queuing calls and it
shouldn't change the functionality.
Merge members mask invalidation into
unit_add_siblings_to_cgroup_realize_queue, this way unit_realize_cgroup
needn't be called with members mask invalidation.
We have to retain the members mask invalidation in unit_load -- although
active units would have cgroups (re)realized (unit_load queues for
realization), the realization would happen with potentially stale mask.
unit_free(u) realizes direct parent and invalidates members mask of all
ancestors. This isn't sufficient in v1 controller hierarchies since
siblings of the freed unit may have existed only because of the removed
unit.
We cannot be lazy about the siblings because if parent(u) is also
removed, it'd migrate and rmdir cgroups for siblings(u). However,
realized masks of siblings(u) won't reflect this change.
This was a non-issue earlier, because we weren't removing cgroup
directories properly (effectively matching the stale realized mask),
removal failed because of tasks left by missing migration (see previous
commit).
Therefore, ensure realization of all units necessary to clean up after
the free'd unit.
Fixes: #14149
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.
Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
There's some inconsistency in the what is considered a masked unit:
some places (i.e. load-fragment.c) use `null_or_empty()` while others
check if the file path is symlinked to "/dev/null". Since the latter
doesn't account for things like non-absolute symlinks to "/dev/null",
this commit switches the check for "/dev/null" to use `null_or_empty_path()`
When the system is under heavy load, it can happen that the unit cache
is refreshed for an unrelated reason (in the test I simulate this by
attempting to start a non-existing unit). The new unit is found and
accounted for in the cache, but it's ignored since we are loading
something else.
When we actually look for it, by attempting to start it, the cache is
up to date so no refresh happens, and starting fails although we have
it loaded in the cache.
When the unit state is set to UNIT_NOT_FOUND, mark the timestamp in
u->fragment_loadtime. Then when attempting to load again we can check
both if the cache itself needs a refresh, OR if it was refreshed AFTER
the last failed attempt that resulted in the state being
UNIT_NOT_FOUND.
Update the test so that this issue reproduces more often.
To set up a verity/cryptsetup RootImage the forked child needs to
ioctl /dev/mapper/control and create a new mapper.
If PrivateDevices=yes and/or DevicePolicy=closed are used, this is
blocked by the cgroup setting, so add an exception like it's done
for loop devices (and also add a dependency on the kernel modules
implementing them).
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.
Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
unit_choose_id() is about marking one of the aliases of the unit as the main
name. With the preparatory work in previous patches, all aliases of the unit
must have the same instance, so the operation to update the instance is a noop.
They were added recently in acd1987a18. We can
make them more informative by using unit_type_to_string() and not repeating
unit names as much. Also, %m should not be used together with SYNTHETIC_ERRNO().
We would check that the instance is present in both units (or missing in both).
But when it is defined, it should be the same in both. The comment in the code
was explicitly saying that differing instance strings are allowed, but this
mostly seems to be a left-over from old times. The man page is pretty clear:
> the instance (if any) is always uniquely defined for a given unit and all its
> aliases.
We allocated the names set for each unit, but in the majority of cases, we'd
put only one name in the set:
$ systemctl show --value -p Names '*'|grep .|grep -v ' '|wc -l
564
$ systemctl show --value -p Names '*'|grep .|grep ' '|wc -l
16
So let's add a separate .id field, and only store aliases in the set, and only
create the set if there's at least one alias. This requires a bit of gymnastics
in the code, but I think this optimization is worth the trouble, because we
save one object for many loaded units.
In particular set_complete_move() wasn't very useful because the target
unit would always have at least one name defined, i.e. the optimization to
move the whole set over would never fire.
Only log at LOG_INFO level, i.e. make this informational. During start
let's leave it at LOG_WARNING though.
Of course, it's ugly leaving processes around like that either in start
or in stop, but at start its more dangerous than on stop, so be tougher
there.
Prompted by the discussions in #15180.
This is a bit more complex than I hoped, since for PID 1 we need to pass
in the synethetic environment block in we generate on demand.
Let's go one step further and upgrade implicitly. Usually =syslog
assignments are historic artifacts only. Let's upgrade the lines
automatically, and politely suggest people update their unit
files/configuration (and drop the lines altogether, without
replacement).
Fixes: #15807
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.