v2:
- if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary
to preserve backwards compatibility (and keep KillSignal= generally useful).
(The interesting bits about the what and why are in a comment in the
patch, please have a look there instead of looking here in the commit
msg).
Fixes: #10872
Our IO handler is only installed for one fd, hence there's no reason to
conditionalize on it again.
Also, split out the draining into a helper function of its own.
This wraps a few common steps. It is defined as inline function instead of in a
.c file to avoid having a .c file. With a .c file, we would have three choices:
- either link it into libshared, but then then libshared would have to be
linked to libmount.
- or compile the .c file into each target separately. This has the disdvantage
that configuration of every target has to be updated and stuff will be compiled
multiple times anyway, which is not too different from keeping this in the
header file.
- or create a new convenience library just for this. This also has the disadvantage
that the every target would have to be updated, and a separate library for a
10 line function seems overkill.
By keeping everything in a header file, we compile this a few times, but
otherwise it's the least painful option. The compiler can optimize most of the
function away, because it knows if 'source' is set or not.
It seems better to use just a single parsing algorithm for /proc/self/mountinfo.
Also, unify the naming of variables in all places that use mnt_table_next_fs().
It makes it easier to compare the different call sites.
Use a trivial header file to share mnt_free_tablep and mnt_free_iterp.
It would be nicer put this in mount-util.h, but libmount.h is not in the
default include path, and the build system would have to be adjusted to pass
pkg-config include path in various places, and it's just not worth the trouble.
A separate header file works nicely.
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.
However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.
If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
Just some renaming, no change in behaviour.
Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
As device units will be reloaded by systemd whenever the corresponding device generates a "changed" event, if the mount unit / cryptsetup service is wanted by its device unit, the former can be restarted by systemd unexpectedly after the user stopped them explicitly. It is not sensible at all and can be considered dangerous. Neither is the behaviour conventional (as `auto` in fstab should only affect behaviour on boot and `mount -a`) or ever documented at all (not even in systemd, see systemd.mount(5) and crypttab(5)).
The problem was introduced in a37422045fbb68ad68f734e5dc00e0a5b1759773:
we have a unit which has a fragment, and when we'd update it based on
/proc/self/mountinfo, we'd say that e.g. What=/dev/loop8 has origin-fragment.
This commit changes two things:
- origin-fragment is changed to origin-mountinfo-implicit
- when we stop a unit, mountinfo information is flushed and all deps based
on it are dropped.
The second step is important, because when we restart the unit, we want to
notice that we have "fresh" mountinfo information. We could keep the old info
around and solve this in a different way, but keeping stale information seems
inelegant.
Fixes#11342.
This reverts commit 89f9752ea0.
This patch causes various problems during boot, where a "mount storm" occurs
naturally. Current approach is flakey, and it seems very risky to push a
feature like this which impacts boot right before a release. So let's revert
for now, and consider a more robust solution after later.
Fixes#11209.
> https://github.com/systemd/systemd/pull/11196#issuecomment-448523186:
"Reverting 89f9752ea0 and fcfb1f775e fixes this test."
The starting of mount units requires that changes to
/proc/self/mountinfo be processed before the SIGCHILD from the
completion of /sbin/mount is processed, as described by the comment
/* Note that due to the io event priority logic, we can be sure the new mountinfo is loaded
* before we process the SIGCHLD for the mount command. */
The recently-added mount-storm protection can defeat this as it
will sometimes deliberately delay processing of /proc/self/mountinfo.
So we need to disable mount-storm protection when a mount unit is starting.
We do this by keeping a counter of the number of pending
mounts, and disabling the protection when this is non-zero.
Thanks to @asavah for finding and reporting this problem.
If we create 2000 mounts (on a 1-CPU qemu VM) with
mkdir -p /MNT/{1..2000}
time for i in {1..2000}; do mount --bind /etc /MNT/$i ; done
it takes around 20 seconds to complete. Much of this time is taken up
by systemd repeatedly processing /proc/self/mountinfo.
If I disable the processing, the time drops to about 4 seconds.
I have reports that on a larger system with multiple active user sessions, each
with it's own systemd, the impact can be higher.
One particular use-case where a large number of mounts can be expected in quick
succession is when the "clearcase" SCM starts up.
This patch modifies the handling up events from /proc/self/mountinfo so
that systemd backs off when a storm is detected. Specifically the time to process
mountinfo is measured, and the process will not be repeated until 10 times
that duration has passed. This ensures systemd won't use more than 10% of
real time processing mountinfo.
With this patch, my test above takes about 5 seconds.
For services (and other units) we generally follow the rule that at the
beginning of each cycle, i.e. when the INACTIVE/FAILED state is left for
ACTIVATING/ACTIVE we flush out various state variables. Mount units
handled this differently so far when the unit state change was effected
outside of systemd: in that case these variables would be flushed out
when going back to INACTIVE/FAILED already.
Let's fix that, and flush out this state always during the activating
transition, not during the deactivating transition.
We never know what the changes triggered by mount_set_state() do to the
unit. Let's be safe and copy the device path into our set, so that we
are safe against that.
It doesn't matter what kind of precise failure we had earlier with
loading the unit, let's report that it loaded successfully now, after
all the kernel is an OK source for that, like any other.
Whenever we notice a change on an existing /proc/self/mountinfo line,
let's update the deps generated from it. For that, let's flush out the
old deps generated this way, and add in the new ones.
This takes benefit of the fact that today (unlike a comment this patch
removes says) we can remove deps in a somewhat reasonable way.
Let's set 'from_proc_self_mountinfo' right away, since we know its from
there. This is important so that when the load queue is dispatched (and
thus mount_load() called) this
fact is already known.
In a previous commit we added logic that mount_add_extras() (or more
precisely mount_add_default_dependencies()) adds in dependencies on
remote-fs.target and local-fs.target, hence we can drop this from
mount_setup_new_unit() and let the usual load queue dispatching take
care of this.
Let's initialize two fields with free_and_strdup() rather than directly
with strdup(). The fields should not be initialized so far, but it's
still nicer to be prepared for futzre code changes and always free
what's stored before replacing it.
If we can't process a specific line in /proc/self/mountinfo we should
log about it (which we do), but this should not affect other lines, nor
further processing of mount units. Let's keep these failures local.
Fixes: #10874
We need to make sure that the slice property is initialized whenever
mount_load() is invoked, even if we fail to load things properly off
disk. This is important since we generally don't allow changing the
slice after a unit has been started. But given that we must track the
state of external objects with mount units we must hence initialize the
property no matter what.
This deps are very similar to the -pre deps, hence establish them at the
same place, in particular as they should only be generated if default
deps are on.
This allows us to later on remove similar code that adds in these deps
whenever /proc/self/mountinfo changes.
This allows clients to follow our internal state changes safely.
Previously, quick state changes (for example, when we restart a unit due
to Restart= after it quickly transitioned through DEAD/FAILED states)
would be coalesced into one bus signal event, with this change there's
the guarantee that all state changes after the unit was announced ones
are reflected on th bus.
Note we only do this kind of guaranteed flushing only for unit state
changes, not for other unit property changes, where clients still have
to expect coalescing. This is because the unit state is a very
important, high-level concept.
Fixes: #10185