When a command asks to load a unit directly and it is in state
UNIT_NOT_FOUND, and the cache is outdated, we refresh it and
attempto to load again.
Use the same logic when building up a transaction and a dependency in
UNIT_NOT_FOUND state is encountered.
Update the unit test to exercise this code path.
Because this was left unset, the unit_write_setting() function was
refusing to write out the automount-specific TimeoutIdleSec= and
DirectoryMode= settings when creating transient automount units.
Set it to the proper value in line with other unit types.
There's some inconsistency in the what is considered a masked unit:
some places (i.e. load-fragment.c) use `null_or_empty()` while others
check if the file path is symlinked to "/dev/null". Since the latter
doesn't account for things like non-absolute symlinks to "/dev/null",
this commit switches the check for "/dev/null" to use `null_or_empty_path()`
When the system is under heavy load, it can happen that the unit cache
is refreshed for an unrelated reason (in the test I simulate this by
attempting to start a non-existing unit). The new unit is found and
accounted for in the cache, but it's ignored since we are loading
something else.
When we actually look for it, by attempting to start it, the cache is
up to date so no refresh happens, and starting fails although we have
it loaded in the cache.
When the unit state is set to UNIT_NOT_FOUND, mark the timestamp in
u->fragment_loadtime. Then when attempting to load again we can check
both if the cache itself needs a refresh, OR if it was refreshed AFTER
the last failed attempt that resulted in the state being
UNIT_NOT_FOUND.
Update the test so that this issue reproduces more often.
To set up a verity/cryptsetup RootImage the forked child needs to
ioctl /dev/mapper/control and create a new mapper.
If PrivateDevices=yes and/or DevicePolicy=closed are used, this is
blocked by the cgroup setting, so add an exception like it's done
for loop devices (and also add a dependency on the kernel modules
implementing them).
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it.
https://tools.ietf.org/html/draft-knodel-terminology-02https://lwn.net/Articles/823224/
This gets rid of most but not occasions of these loaded terms:
1. scsi_id and friends are something that is supposed to be removed from
our tree (see #7594)
2. The test suite defines an API used by the ubuntu CI. We can remove
this too later, but this needs to be done in sync with the ubuntu CI.
3. In some cases the terms are part of APIs we call or where we expose
concepts the kernel names the way it names them. (In particular all
remaining uses of the word "slave" in our codebase are like this,
it's used by the POSIX PTY layer, by the network subsystem, the mount
API and the block device subsystem). Getting rid of the term in these
contexts would mean doing some major fixes of the kernel ABI first.
Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
This fixes commit db2b8d2e28 that
rectified parsing empty values but broke parsing explicit infinity.
Intended parsing semantics will be captured in a testcase in a follow up
commit.
Ref: #16248
On error, we'd just free the object, and not close the fd.
While at it, let's use set_ensure_consume() to make sure we don't leak
the object if it was already in the set. I'm not sure if that condition
can be achieved.
This reverts commit 53aa85af24.
The reason is that that patch changes the dbus api to be different than
the types declared by introspection api.
Replaces #16122.
This reverts commit 097537f07a.
At least Fedora and Debian have already reverted this at the distro
level because it causes more problems than it solves. Arch is debating
reverting it as well [0] but would strongly prefer that this happens
upstream first. Fixes#15188.
[0] https://bugs.archlinux.org/task/66458
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.
Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
Cache it early in startup of the system manager, right after `/run/systemd` is
created, so that further access to it can be done without accessing the EFI
filesystem at all.
The only way to control "ShowStatus" property programmatically was to use the
signal API and wait until the property "ShowStatus" switched to the new value.
This interface is rather cumbersome to use and doesn't allow to temporarily
override the current setting and later restore the overridden value in
race-free manner.
The new method also accepts the empty string as argument which allows to
restore the initial value of ShowStatus, ie the value before it was overridden
by this method.
Fixes: #11447.
unit_choose_id() is about marking one of the aliases of the unit as the main
name. With the preparatory work in previous patches, all aliases of the unit
must have the same instance, so the operation to update the instance is a noop.
Upon an incoming connection for an accepting socket, we'd create a unit like
foo@0.service, then figure out that the instance name should be e.g. "0-41-0",
and then add the name foo@0-41-0.service to the unit. This obviously violates
the rule that any service needs to have a constance instance part.
So let's reverse the order: we first determine the instance name and then
create the unit with the correct name from the start.
There are two cases where we don't know the instance name:
- analyze-verify: we just do a quick check that the instance unit can be
created. So let's use a bogus instance string.
- selinux: the code wants to load the service unit to extract the ExecStart path
and query it for the selinux label. Do the same as above.
Note that in both cases it is possible that the real unit that is loaded could
be different than the one with the bogus instance value, for example if there
is a dropin for a specific instance name. We can't do much about this, since we
can't figure out the instance name in advance. The old code had the same
shortcoming.
They were added recently in acd1987a18. We can
make them more informative by using unit_type_to_string() and not repeating
unit names as much. Also, %m should not be used together with SYNTHETIC_ERRNO().
We would check that the instance is present in both units (or missing in both).
But when it is defined, it should be the same in both. The comment in the code
was explicitly saying that differing instance strings are allowed, but this
mostly seems to be a left-over from old times. The man page is pretty clear:
> the instance (if any) is always uniquely defined for a given unit and all its
> aliases.
We allocated the names set for each unit, but in the majority of cases, we'd
put only one name in the set:
$ systemctl show --value -p Names '*'|grep .|grep -v ' '|wc -l
564
$ systemctl show --value -p Names '*'|grep .|grep ' '|wc -l
16
So let's add a separate .id field, and only store aliases in the set, and only
create the set if there's at least one alias. This requires a bit of gymnastics
in the code, but I think this optimization is worth the trouble, because we
save one object for many loaded units.
In particular set_complete_move() wasn't very useful because the target
unit would always have at least one name defined, i.e. the optimization to
move the whole set over would never fire.
Let systemd load a set of pre-compiled AppArmor profile files from a policy
cache at /etc/apparmor/earlypolicy. Maintenance of that policy cache must be
done outside of systemd.
After successfully loading the profiles systemd will attempt to change to a
profile named systemd.
If systemd is already confined in a profile, it will not load any profile files
and will not attempt to change it's profile.
If anything goes wrong, systemd will only log failures. It will not fail to
start.
Let's make sure $XDG_RUNTIME_DIR for the user instance and /run for the
system instance is always organized the same way: the "inaccessible"
device nodes should be placed in a subdir of either called "systemd" and
a subdir of that called "inaccessible".
This way we can emphasize the common behaviour, and only differ where
really necessary.
Follow-up for #13823
dm-verity support in dissect-image at the moment is restricted to GPT
volumes.
If the image a single-filesystem type without a partition table (eg: squashfs)
and a roothash/verity file are passed, set the verity flag and mark as
read-only.
The usual behaviour when a timeout expires is to terminate/kill the
service. This is what user usually want in production systems. To debug
services that fail to start/stop (especially sporadic failures) it
might be necessary to trigger the watchdog machinery and write core
dumps, though. Likewise, it is usually just a waste of time to
gracefully stop a stuck service. Instead it might save time to go
directly into kill mode.
This commit adds two new options to services: TimeoutStartFailureMode=
and TimeoutStopFailureMode=. Both take the same values and tweak the
behavior of systemd when a start/stop timeout expires:
* 'terminate': is the default behaviour as it has always been,
* 'abort': triggers the watchdog machinery and will send SIGABRT
(unless WatchdogSignal was changed) and
* 'kill' will directly send SIGKILL.
To handle the stop failure mode in stop-post state too a new
final-watchdog state needs to be introduced.
The fact that m->show_status was serialized/deserialized made impossible any
further customisation of this setting via system.conf. IOW the value was
basically always locked unless it was changed via signals.
This patch reworks the handling of m->show_status but also makes sure that if a
new value was changed via the signal API then this value is kept and preserved
accross PID1 reexecuting or reloading.
Note: this effectively means that once the value is set via the signal
interface, it can be changed again only through the signal API.
The name 'manager_get_show_status()' suggests that the function simply reads
the property 'show_status' of the manager and hence returns a 'StatusType'
value.
However it was doing more than that since it contained the logic (based on
'show_status' but also on the state of the manager) to figure out if status
message could be emitted to the console.
Hence this patch renames the function to 'manager_should_show_status()'. The
previous name will be reused in a later patch to effectively return the value
of 'show_status' property.
No functional change.
Actually, it is the same kind of problem as in d910f4c . Basically, we
need to return 1 on success code path in slice_freezer_action().
Otherwise we dispatch DBus return message too soon.
Fixes: #16050
Six years ago we declared it obsolete and removed it from the docs
(c073a0c4a5) and added a note about it in
NEWS. Two years ago we add warning messages about it, indicating the
feature will be removed (41b283d0f1) and
mentioned it in NEWS again.
Let's now kill it for good.
This is a follow-up for 9f83091e3c.
Instead of reading the mtime off the configuration files after reading,
let's do so before reading, but with the fd we read the data from. This
is not only cleaner (as it allows us to save one stat()), but also has
the benefit that we'll detect changes that happen while we read the
files.
This also reworks unit file drop-ins to use the common code for
determining drop-in mtime, instead of reading system clock for that.
Currently, an empty assignment of Memory{Low,Min}= directives would be
interpretted as setting it to global default, i.e. zero. However, if we
set a runtime protection value on a unit that inherits parent's
DefaultMemory{Low,Min}=, it is not possible to revert it back to the
state where the DefaultMemory{Low,Min}= is propagated from parent
slice(s).
This patch changes the semantics of the empty assignments to explicitly
nullify any value set by the user previously. Since DBus API uses
uint64_t where 0 is a valid configuration, the patch modifies DBus API
by exploiting the variant type of property value to pass the NULL value.
When MemoryLow= or MemoryMin= is set, it is interpretted as setting the
values to infinity. This is inconsistent with the default initialization
to 0.
It'd be nice to interpret the empty assignment as fallback to
DefaultMemory* of parent slice, however, current DBus API cannot convey
such a NULL value, so stick to simply interpretting that as hard-wired
default.
The time-based cache allows starting a new unit without an expensive
daemon-reload, unless there was already a reference to it because of
a dependency or ordering from another unit.
If the cache is out of date, check again if we can load the
fragment.
ROOTPREFIX doesn't include the trailing /, hence add it in where needed.
Also, given that sysctl.d/, binfmt.d/, sysusers.d/ are generally
accessed before /var/ is up they should use ROOTPREFIX rather than
PREFIX. Fix that.
After a larger transaction, e.g. after bootup, we're left with an empty hashmap
with hundreds of buckets. Long-term, it'd be better to size hashmaps down when
they are less than 1/4 full, but even if we implement that, jobs hashmap is
likely to be empty almost always, so it seems useful to deallocate it once the
jobs count reaches 0.
Possibly fixes#15220. (There might be another leak. I'm still investigating.)
The leak would occur when the path cache was rebuilt. So in normal circumstances
it wouldn't be too bad, since usually the path cache is not rebuilt too often. But
the case in #15220, where new unit files are created in a loop and started, the leak
occurs once for each unit file:
$ for i in {1..300}; do cp ~/.config/systemd/user/test0001.service ~/.config/systemd/user/test$(printf %04d $i).service; systemctl --user start test$(printf %04d $i).service;done
Suppose a service has WatchdogSec set to 2 seconds in its unit file. I
then start the service and WatchdogUSec is set correctly:
% systemctl --user show psi-notify -p WatchdogUSec
WatchdogUSec=2s
Now I call `sd_notify(0, "WATCHDOG_USEC=10000000")`. The new timer seems
to have taken effect, since I only send `WATCHDOG=1` every 4 seconds,
and systemd isn't triggering the watchdog handler. However, `systemctl
show` still shows WatchdogUSec as 2s:
% systemctl --user show psi-notify -p WatchdogUSec
WatchdogUSec=2s
This seems surprising, since this "original" watchdog timer isn't the
one taking effect any more. This patch makes it so that we instead
display the new watchdog timer after sd_notify(WATCHDOG_USEC):
% systemctl --user show psi-notify -p WatchdogUSec
WatchdogUSec=10s
Fixes#15726.
Only log at LOG_INFO level, i.e. make this informational. During start
let's leave it at LOG_WARNING though.
Of course, it's ugly leaving processes around like that either in start
or in stop, but at start its more dangerous than on stop, so be tougher
there.
Whenever we pick up a new line in /proc/self/mountinfo and want to
synthesize a new mount unit from it, let's say which one it is.
Moreover, downgrade the log message when we encounter a mount point with
an overly long name to LOG_WARNING, since it's generally fine to ignore
such mount points.
Also, attach a catalog entry to explain the situation further.
Prompted-By: #15221
That's reduce the number of functions dealing with configuration
parsing/loading and should make the code simpler especially since this function
was used only once.
No functional change.
Most complexity of this patch is due to the fact that some manager settings
(basically the watchdog properties) can be set at runtime and in this case the
runtime values must be retained over daemon-reload or daemon-reexec.
For consistency sake, all watchdog properties behaves now the same way, that
is:
- Values defined by config files can be overridden by writing the new value
through their respective D-BUS properties. In this case, these values are
preserved over reload/reexec until the special value '0' or USEC_INFINITY
is written, which will then restore the last values loaded from the config
files. If the restored value is '0' or 'USEC_INFINITY', the watchdogs will
be disabled and the corresponding device will be closed.
- Reading the properties from a user instance will return the USEC_INFINITY
value as these properties are only meaningful for PID1.
- Writing to one of the watchdog properties of a user instance's will be a
NOP.
Fixes: #15453