If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
In all the other cases, I think the code was clearer with the static table.
Here, not so much. And because of the existing dump code, the vtables cannot
be made static and need to remain exported. I still think it's worth to do the
change to have the cmdline introspection, but I'm disappointed with how this
came out.
This way we shuld be able to order mounts properly against their backing
services in case complex storage is used (i.e. LUKS), even if the device
path used for mounting the devices is different from the expected device
node of the backing service.
Specifically, if we have a LUKS device /dev/mapper/foo that is mounted
by this name all is trivial as the relationship can be established a
priori easily. But if it is mounted via a /dev/disk/by-uuid/ symlink or
similar we only can relate the device node generated to the one mounted
at the moment the device is actually established. That's because the
UUID of the fs is stored inside the encrypted volume and thus not
knowable until the volume is set up. This patch tries to improve on this
situation: a implicit After=blockdev@.target dependency is generated for
all mounts, based on the data from /proc/self/mountinfo, which should be
the actual device node, with all symlinks resolved. This means that as
soon as the mount is established the ordering via blockdev@.target will
work, and that means during shutdown it is honoured, which is what we
are looking for.
Note that specifying /etc/fstab entries via UUID= for LUKS devices still
sucks and shouldn't be done, because it means we cannot know which LUKS
device to activate to make an fs appear, and that means unless the
volume is set up at boot anyway we can't really handle things
automatically when putting together transactions that need the mount.
This extends on d253a45e1c, and instead of
merging just a single flag from previous mount entries of
/proc/self/mountinfo for the same path we merge all three.
This shouldn't change behaviour, but I think make things more readable.
Previously we'd set MOUNT_PROC_IS_MOUNTED unconditionally, we still do.
Previously we'd inherit MOUNT_PROC_JUST_MOUNTED from a previous entry on
the same line, we still do.
MOUNT_PROC_JUST_CHANGED should generally stay set too. Why that? If we
have two mount entries on the same mount point we'd first process one
and then the other, and the almost certainly different mount parameters
of the two would mean we'd set MOUNT_PROC_JUST_CHANGED for the second.
And with this we'll definitely do that still.
This also adds a comment explaining the situation a bit, and why we get
into this situation.
When starting a mount unit, systemd invokes mount command and moves the
unit's internal state to "mounting". Then it watches for updates of
/proc/self/mountinfo. When the expected mount entry newly appears in
mountinfo, the unit internal state is changed to "mounting-done".
Finally, when systemd finds the mount command has finished, it checks
whether the unit internal state is "mounting-done" and changes the state
to "mounted".
If the state was not "mounting-done" in the last step though mount command
was successfully finished, the unit is marked as "failed" with following
log messages:
Mount process finished, but there is no mount.
Failed with result 'protocol'.
If daemon-reload is done in parallel with starting mount unit, it is
possible that things happen in following order and result in above failure.
1. the mount unit state changes to "mounting"
2. daemon-reload saves the unit state
3. kernel completes the mount and /proc/self/mountinfo is updated
4. daemon-reload restores the saved unit state, that is "mounting"
5. systemd notices the mount command has finished but the unit state
is still "mounting" though it should be "mounting-done"
mount_setup_existing_unit() should take into account that MOUNT_MOUNTING
is transitional state and set MOUNT_PROC_JUST_MOUNTED flag if the unit
comes from /proc/self/mountinfo so that mount_process_proc_self_mountinfo()
later can make state transition from "mounting" to "mounting-done".
Fixes: #10872
Similar, refuse triggering deps on units that cannot trigger.
And rework how we ignore After= dependencies on device units, to work
the same way.
See: #14142
No functional change, just adjusting code to follow the same pattern
everywhere. In particular, never call _verify() on an already loaded unit,
but return early from the caller instead. This makes the code a bit easier
to follow.
unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional()
are really the same, with one minor difference in behaviour. Let's drop
the second function.
"_optional" in the name suggests that it's the "dropin" part that is optional.
(Which it is, but in this case, we mean the fragment to be optional.)
I think the new version with a flag is easier to understand.
This is the most basic consumer of the new systemd-vs-kernel checker,
both acting as a reasonable standalone exerciser of the code, and also
as a way for easy inspection of deviations from systemd internal state.
v2:
- if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary
to preserve backwards compatibility (and keep KillSignal= generally useful).
(The interesting bits about the what and why are in a comment in the
patch, please have a look there instead of looking here in the commit
msg).
Fixes: #10872
Our IO handler is only installed for one fd, hence there's no reason to
conditionalize on it again.
Also, split out the draining into a helper function of its own.
This wraps a few common steps. It is defined as inline function instead of in a
.c file to avoid having a .c file. With a .c file, we would have three choices:
- either link it into libshared, but then then libshared would have to be
linked to libmount.
- or compile the .c file into each target separately. This has the disdvantage
that configuration of every target has to be updated and stuff will be compiled
multiple times anyway, which is not too different from keeping this in the
header file.
- or create a new convenience library just for this. This also has the disadvantage
that the every target would have to be updated, and a separate library for a
10 line function seems overkill.
By keeping everything in a header file, we compile this a few times, but
otherwise it's the least painful option. The compiler can optimize most of the
function away, because it knows if 'source' is set or not.
It seems better to use just a single parsing algorithm for /proc/self/mountinfo.
Also, unify the naming of variables in all places that use mnt_table_next_fs().
It makes it easier to compare the different call sites.
Use a trivial header file to share mnt_free_tablep and mnt_free_iterp.
It would be nicer put this in mount-util.h, but libmount.h is not in the
default include path, and the build system would have to be adjusted to pass
pkg-config include path in various places, and it's just not worth the trouble.
A separate header file works nicely.
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.
However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.
If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
Just some renaming, no change in behaviour.
Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
As device units will be reloaded by systemd whenever the corresponding device generates a "changed" event, if the mount unit / cryptsetup service is wanted by its device unit, the former can be restarted by systemd unexpectedly after the user stopped them explicitly. It is not sensible at all and can be considered dangerous. Neither is the behaviour conventional (as `auto` in fstab should only affect behaviour on boot and `mount -a`) or ever documented at all (not even in systemd, see systemd.mount(5) and crypttab(5)).
The problem was introduced in a37422045fbb68ad68f734e5dc00e0a5b1759773:
we have a unit which has a fragment, and when we'd update it based on
/proc/self/mountinfo, we'd say that e.g. What=/dev/loop8 has origin-fragment.
This commit changes two things:
- origin-fragment is changed to origin-mountinfo-implicit
- when we stop a unit, mountinfo information is flushed and all deps based
on it are dropped.
The second step is important, because when we restart the unit, we want to
notice that we have "fresh" mountinfo information. We could keep the old info
around and solve this in a different way, but keeping stale information seems
inelegant.
Fixes#11342.
This reverts commit 89f9752ea0.
This patch causes various problems during boot, where a "mount storm" occurs
naturally. Current approach is flakey, and it seems very risky to push a
feature like this which impacts boot right before a release. So let's revert
for now, and consider a more robust solution after later.
Fixes#11209.
> https://github.com/systemd/systemd/pull/11196#issuecomment-448523186:
"Reverting 89f9752ea0 and fcfb1f775e fixes this test."
The starting of mount units requires that changes to
/proc/self/mountinfo be processed before the SIGCHILD from the
completion of /sbin/mount is processed, as described by the comment
/* Note that due to the io event priority logic, we can be sure the new mountinfo is loaded
* before we process the SIGCHLD for the mount command. */
The recently-added mount-storm protection can defeat this as it
will sometimes deliberately delay processing of /proc/self/mountinfo.
So we need to disable mount-storm protection when a mount unit is starting.
We do this by keeping a counter of the number of pending
mounts, and disabling the protection when this is non-zero.
Thanks to @asavah for finding and reporting this problem.
If we create 2000 mounts (on a 1-CPU qemu VM) with
mkdir -p /MNT/{1..2000}
time for i in {1..2000}; do mount --bind /etc /MNT/$i ; done
it takes around 20 seconds to complete. Much of this time is taken up
by systemd repeatedly processing /proc/self/mountinfo.
If I disable the processing, the time drops to about 4 seconds.
I have reports that on a larger system with multiple active user sessions, each
with it's own systemd, the impact can be higher.
One particular use-case where a large number of mounts can be expected in quick
succession is when the "clearcase" SCM starts up.
This patch modifies the handling up events from /proc/self/mountinfo so
that systemd backs off when a storm is detected. Specifically the time to process
mountinfo is measured, and the process will not be repeated until 10 times
that duration has passed. This ensures systemd won't use more than 10% of
real time processing mountinfo.
With this patch, my test above takes about 5 seconds.
For services (and other units) we generally follow the rule that at the
beginning of each cycle, i.e. when the INACTIVE/FAILED state is left for
ACTIVATING/ACTIVE we flush out various state variables. Mount units
handled this differently so far when the unit state change was effected
outside of systemd: in that case these variables would be flushed out
when going back to INACTIVE/FAILED already.
Let's fix that, and flush out this state always during the activating
transition, not during the deactivating transition.
We never know what the changes triggered by mount_set_state() do to the
unit. Let's be safe and copy the device path into our set, so that we
are safe against that.
It doesn't matter what kind of precise failure we had earlier with
loading the unit, let's report that it loaded successfully now, after
all the kernel is an OK source for that, like any other.
Whenever we notice a change on an existing /proc/self/mountinfo line,
let's update the deps generated from it. For that, let's flush out the
old deps generated this way, and add in the new ones.
This takes benefit of the fact that today (unlike a comment this patch
removes says) we can remove deps in a somewhat reasonable way.
Let's set 'from_proc_self_mountinfo' right away, since we know its from
there. This is important so that when the load queue is dispatched (and
thus mount_load() called) this
fact is already known.
In a previous commit we added logic that mount_add_extras() (or more
precisely mount_add_default_dependencies()) adds in dependencies on
remote-fs.target and local-fs.target, hence we can drop this from
mount_setup_new_unit() and let the usual load queue dispatching take
care of this.