Commit graph

360 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek be32732168 basic/set: let set_put_strdup() create the set with string hash ops
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.

hashmap_put_strdup() was already doing something similar.

(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)

No functional change intended.
2020-05-06 16:54:06 +02:00
Zbigniew Jędrzejewski-Szmek f6e9aa9e45 pid1: convert to the new scheme
In all the other cases, I think the code was clearer with the static table.
Here, not so much. And because of the existing dump code, the vtables cannot
be made static and need to remain exported. I still think it's worth to do the
change to have the cmdline introspection, but I'm disappointed with how this
came out.
2020-05-05 22:40:37 +02:00
Lennart Poettering 44b0d1fd59 core: add implicit ordering dep on blockdev@.target from all mount units
This way we shuld be able to order mounts properly against their backing
services in case complex storage is used (i.e. LUKS), even if the device
path used for mounting the devices is different from the expected device
node of the backing service.

Specifically, if we have a LUKS device /dev/mapper/foo that is mounted
by this name all is trivial as the relationship can be established a
priori easily. But if it is mounted via a /dev/disk/by-uuid/ symlink or
similar we only can relate the device node generated to the one mounted
at the moment the device is actually established. That's because the
UUID of the fs is stored inside the encrypted volume and thus not
knowable until the volume is set up. This patch tries to improve on this
situation: a implicit After=blockdev@.target dependency is generated for
all mounts, based on the data from /proc/self/mountinfo, which should be
the actual device node, with all symlinks resolved. This means that as
soon as the mount is established the ordering via blockdev@.target will
work, and that means during shutdown it is honoured, which is what we
are looking for.

Note that specifying /etc/fstab entries via UUID= for LUKS devices still
sucks and shouldn't be done, because it means we cannot know which LUKS
device to activate to make an fs appear, and that means unless the
volume is set up at boot anyway we can't really handle things
automatically when putting together transactions that need the mount.
2020-01-21 20:23:44 +01:00
Lennart Poettering 5de0acf40d core: let's be defensive, /dev/nfs is also a special mount source, filter it out 2020-01-21 20:23:34 +01:00
Lennart Poettering 219f3cd941 core: drop _pure_ from static functions
For static functions the compiler should figure this out on its own.
2020-01-21 20:23:30 +01:00
Lennart Poettering 0879fbd6fe mount: make checks on perpetual mount units more lax
We don#t really care where perpetual mounts are mounted from, since they
have to exist since before we run anyway.
2020-01-17 15:09:18 +01:00
Lennart Poettering 4bb68f2fee core: on each iteration processing /proc/self/mountinfo merge all discovery flags for each path
This extends on d253a45e1c, and instead of
merging just a single flag from previous mount entries of
/proc/self/mountinfo for the same path we merge all three.

This shouldn't change behaviour, but I think make things more readable.

Previously we'd set MOUNT_PROC_IS_MOUNTED unconditionally, we still do.

Previously we'd inherit MOUNT_PROC_JUST_MOUNTED from a previous entry on
the same line, we still do.

MOUNT_PROC_JUST_CHANGED should generally stay set too. Why that? If we
have two mount entries on the same mount point we'd first process one
and then the other, and the almost certainly different mount parameters
of the two would mean we'd set MOUNT_PROC_JUST_CHANGED for the second.
And with this we'll definitely do that still.

This also adds a comment explaining the situation a bit, and why we get
into this situation.
2020-01-15 17:42:12 +01:00
Zbigniew Jędrzejewski-Szmek 7c286cd6a6
Merge pull request #14505 from poettering/refuse-on-failure
refuse OnFailure= deps on units that have no failure state
2020-01-14 14:19:04 +01:00
Jun'ichi Nomura 1d086a6e59 mount: mark an existing "mounting" unit from /proc/self/mountinfo as "just_mounted"
When starting a mount unit, systemd invokes mount command and moves the
unit's internal state to "mounting".  Then it watches for updates of
/proc/self/mountinfo.  When the expected mount entry newly appears in
mountinfo, the unit internal state is changed to "mounting-done".
Finally, when systemd finds the mount command has finished, it checks
whether the unit internal state is "mounting-done" and changes the state
to "mounted".
If the state was not "mounting-done" in the last step though mount command
was successfully finished, the unit is marked as "failed" with following
log messages:
  Mount process finished, but there is no mount.
  Failed with result 'protocol'.

If daemon-reload is done in parallel with starting mount unit, it is
possible that things happen in following order and result in above failure.
  1. the mount unit state changes to "mounting"
  2. daemon-reload saves the unit state
  3. kernel completes the mount and /proc/self/mountinfo is updated
  4. daemon-reload restores the saved unit state, that is "mounting"
  5. systemd notices the mount command has finished but the unit state
     is still "mounting" though it should be "mounting-done"

mount_setup_existing_unit() should take into account that MOUNT_MOUNTING
is transitional state and set MOUNT_PROC_JUST_MOUNTED flag if the unit
comes from /proc/self/mountinfo so that mount_process_proc_self_mountinfo()
later can make state transition from "mounting" to "mounting-done".

Fixes: #10872
2020-01-14 12:15:09 +01:00
Lennart Poettering c80a9a33d0 core: clearly refuse OnFailure= deps on units that can't fail
Similar, refuse triggering deps on units that cannot trigger.

And rework how we ignore After= dependencies on device units, to work
the same way.

See: #14142
2020-01-09 11:03:53 +01:00
Lennart Poettering bf7eedbf8f mount: do not update exec deps on mountinfo changes
Fixes: #13978
2019-11-16 13:53:48 +01:00
Lennart Poettering b8e5776d38 mount: extend list of extrinsic mounts a bit 2019-11-16 13:53:48 +01:00
Franck Bui d336ba9fa6 core: drop 'wants' parameter from unit_add_node_dependency()
Since Wants dependency is no more automagically added to swap and mount units,
this parameter is no more used hence this patch drops it.
2019-10-28 18:51:23 +01:00
Zbigniew Jędrzejewski-Szmek 75193d4128 core: adjust load functions for other unit types to be more like service
No functional change, just adjusting code to follow the same pattern
everywhere. In particular, never call _verify() on an already loaded unit,
but return early from the caller instead. This makes the code a bit easier
to follow.
2019-10-11 13:46:05 +02:00
Zbigniew Jędrzejewski-Szmek c362077087 core: turn unit_load_fragment_and_dropin_optional() into a flag
unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional()
are really the same, with one minor difference in behaviour. Let's drop
the second function.

"_optional" in the name suggests that it's the "dropin" part that is optional.
(Which it is, but in this case, we mean the fragment to be optional.)
I think the new version with a flag is easier to understand.
2019-10-11 10:45:33 +02:00
Chris Down bc0623df16 cgroup: analyze: Report memory configurations that deviate from systemd
This is the most basic consumer of the new systemd-vs-kernel checker,
both acting as a reasonable standalone exerciser of the code, and also
as a way for easy inspection of deviations from systemd internal state.
2019-10-03 15:06:25 +01:00
Zbigniew Jędrzejewski-Szmek a232ebcc2c core: add support for RestartKillSignal= to override signal used for restart jobs
v2:
- if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary
  to preserve backwards compatibility (and keep KillSignal= generally useful).
2019-10-02 14:01:25 +02:00
Zbigniew Jędrzejewski-Szmek de5ae832f2
Merge pull request #13439 from yuwata/core-support-systemctl-clean-more
core: support systemctl clean more
2019-09-13 16:15:02 +02:00
Yu Watanabe 17e9d53d87 core/mount: support "systemctl clean" for mount units 2019-08-28 23:09:54 +09:00
Yu Watanabe 52a12341f9 core: make RuntimeDirectoryPreserve= works with non-service units 2019-08-23 00:08:16 +09:00
Zbigniew Jędrzejewski-Szmek 5cc2cd1cd8 pid1: always log successfull process termination quietly
Fixes #13372.
2019-08-22 09:09:45 +02:00
Lennart Poettering 9ddaa3e459 mount: rename update_parameters_proc_self_mount_info() → update_parameters_proc_self_mountinfo()
let's name the call like the file in /proc is actually called.
2019-07-18 17:03:11 +02:00
Lennart Poettering 350804867d mount: rescan /proc/self/mountinfo before processing waitid() results
(The interesting bits about the what and why are in a comment in the
patch, please have a look there instead of looking here in the commit
msg).

Fixes: #10872
2019-07-18 17:03:11 +02:00
Lennart Poettering fcd8e119c2 mount: simplify /proc/self/mountinfo handler
Our IO handler is only installed for one fd, hence there's no reason to
conditionalize on it again.

Also, split out the draining into a helper function of its own.
2019-07-18 17:03:10 +02:00
Zbigniew Jędrzejewski-Szmek e2857b3d87 Add helper function for mnt_table_parse_{stream,mtab}
This wraps a few common steps. It is defined as inline function instead of in a
.c file to avoid having a .c file. With a .c file, we would have three choices:
- either link it into libshared, but then then libshared would have to be
  linked to libmount.
- or compile the .c file into each target separately. This has the disdvantage
  that configuration of every target has to be updated and stuff will be compiled
  multiple times anyway, which is not too different from keeping this in the
  header file.
- or create a new convenience library just for this. This also has the disadvantage
  that the every target would have to be updated, and a separate library for a
  10 line function seems overkill.

By keeping everything in a header file, we compile this a few times, but
otherwise it's the least painful option. The compiler can optimize most of the
function away, because it knows if 'source' is set or not.
2019-04-23 23:29:29 +02:00
Zbigniew Jędrzejewski-Szmek 13dcfe4661 shared/mount-util: convert to libmount
It seems better to use just a single parsing algorithm for /proc/self/mountinfo.

Also, unify the naming of variables in all places that use mnt_table_next_fs().
It makes it easier to compare the different call sites.
2019-04-23 23:29:29 +02:00
Zbigniew Jędrzejewski-Szmek 9d1b2b2252 pid1,shutdown: do not cunescape paths from libmount
The test added in previous commit shows that libmount does the unescaping
internally.
2019-04-09 09:07:40 +02:00
Zbigniew Jędrzejewski-Szmek fb36b1339b shared: add a single definition of libmount cleanup functions
Use a trivial header file to share mnt_free_tablep and mnt_free_iterp.
It would be nicer put this in mount-util.h, but libmount.h is not in the
default include path, and the build system would have to be adjusted to pass
pkg-config include path in various places, and it's just not worth the trouble.
A separate header file works nicely.
2019-04-05 10:18:21 +02:00
Zbigniew Jędrzejewski-Szmek ee36fed438 core: avoid unnecessary cast 2019-03-28 09:45:19 +01:00
Franck Bui f75f613d25 core: reduce the number of stalled PIDs from the watched processes list when possible
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.

However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.

If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
2019-03-20 10:51:49 +01:00
Lennart Poettering 97a3f4ee05 core: rename unit_{start_limit|condition|assert}_test() to unit_test_xyz()
Just some renaming, no change in behaviour.

Background: I'd like to add more functions unit_test_xyz() that test
various things, hence let's streamline the naming a bit.
2019-03-18 16:06:36 +01:00
Tom Yan d0fe45cb15 mount: remove unused mount_is_auto and mount_is_automount 2019-02-15 00:16:54 +08:00
Tom Yan 142b8142d7 mount/generators: do not make unit wanted by its device unit
As device units will be reloaded by systemd whenever the corresponding device generates a "changed" event, if the mount unit / cryptsetup service is wanted by its device unit, the former can be restarted by systemd unexpectedly after the user stopped them explicitly. It is not sensible at all and can be considered dangerous. Neither is the behaviour conventional (as `auto` in fstab should only affect behaviour on boot and `mount -a`) or ever documented at all (not even in systemd, see systemd.mount(5) and crypttab(5)).
2019-02-15 00:16:54 +08:00
Stephan E ac8956efa2 Update mount.c
typo in output
2019-02-13 00:41:57 +09:00
Lennart Poettering a90d944359
Merge pull request #11562 from yuwata/fix-11558
core/mount: do not add Before=local-fs.target or remote-fs.target if nofail mount option is set
2019-01-26 14:46:48 +01:00
Zbigniew Jędrzejewski-Szmek c52c2dc64f pid1: fix cleanup of stale implicit deps based on /proc/self/mountinfo
The problem was introduced in a37422045fbb68ad68f734e5dc00e0a5b1759773:
we have a unit which has a fragment, and when we'd update it based on
/proc/self/mountinfo, we'd say that e.g. What=/dev/loop8 has origin-fragment.
This commit changes two things:
- origin-fragment is changed to origin-mountinfo-implicit
- when we stop a unit, mountinfo information is flushed and all deps based
  on it are dropped.

The second step is important, because when we restart the unit, we want to
notice that we have "fresh" mountinfo information. We could keep the old info
around and solve this in a different way, but keeping stale information seems
inelegant.

Fixes #11342.
2019-01-26 14:40:50 +01:00
Yu Watanabe 8c8203db90 core/mount: do not add Before=local-fs.target or remote-fs.target if nofail mount option is set
Follow-up for d54bab90e6.

Fixes #11558.
2019-01-26 12:00:18 +01:00
Zbigniew Jędrzejewski-Szmek b7bbf89025 core/mount: move static function earlier in file
No functional change.
2019-01-18 14:04:26 +01:00
Yu Watanabe d253a45e1c core/mount: make mount_setup_existing_unit() not drop MOUNT_PROC_JUST_MOUNTED flag from units
This fixes a bug introduced by ec88d1ea05.

Fixes #11362.
2019-01-09 12:51:00 +01:00
Zbigniew Jędrzejewski-Szmek ec8126d723 Revert "core/mount: minimize impact on mount storm."
This reverts commit 89f9752ea0.

This patch causes various problems during boot, where a "mount storm" occurs
naturally. Current approach is flakey, and it seems very risky to push a
feature like this which impacts boot right before a release. So let's revert
for now, and consider a more robust solution after later.

Fixes #11209.

> https://github.com/systemd/systemd/pull/11196#issuecomment-448523186:
"Reverting 89f9752ea0 and fcfb1f775e fixes this test."
2018-12-19 11:37:41 +01:00
Zbigniew Jędrzejewski-Szmek e36db50075 Revert "mount: disable mount-storm protection while mount unit is starting."
This reverts commit fcfb1f775e.
2018-12-19 11:32:17 +01:00
NeilBrown fcfb1f775e mount: disable mount-storm protection while mount unit is starting.
The starting of mount units requires that changes to
/proc/self/mountinfo be processed before the SIGCHILD from the
completion of /sbin/mount is processed, as described by the comment
  /* Note that due to the io event priority logic, we can be sure the new mountinfo is loaded
   * before we process the SIGCHLD for the mount command. */

The recently-added mount-storm protection can defeat this as it
will sometimes deliberately delay processing of /proc/self/mountinfo.

So we need to disable mount-storm protection when a mount unit is starting.
We do this by keeping a counter of the number of pending
mounts, and disabling the protection when this is non-zero.

Thanks to @asavah for finding and reporting this problem.
2018-12-19 00:44:19 +01:00
NeilBrown 89f9752ea0 core/mount: minimize impact on mount storm.
If we create 2000 mounts (on a 1-CPU qemu VM) with
  mkdir -p /MNT/{1..2000}
  time for i in {1..2000}; do mount --bind /etc /MNT/$i ; done

it takes around 20 seconds to complete.  Much of this time is taken up
by systemd repeatedly processing /proc/self/mountinfo.
If I disable the processing, the time drops to about 4 seconds.

I have reports that on a larger system with multiple active user sessions, each
with it's own systemd, the impact can be higher.

One particular use-case where a large number of mounts can be expected in quick
succession is when the "clearcase" SCM starts up.

This patch modifies the handling up events from /proc/self/mountinfo so
that systemd backs off when a storm is detected.  Specifically the time to process
mountinfo is measured, and the process will not be repeated until 10 times
that duration has passed.  This ensures systemd won't use more than 10% of
real time processing mountinfo.

With this patch, my test above takes about 5 seconds.
2018-12-16 12:38:40 +01:00
Lennart Poettering 7eba1463de mount: flush out cycle state on DEAD→MOUNTED only, not the other way round
For services (and other units) we generally follow the rule that at the
beginning of each cycle, i.e. when the INACTIVE/FAILED state is left for
ACTIVATING/ACTIVE we flush out various state variables. Mount units
handled this differently so far when the unit state change was effected
outside of systemd: in that case these variables would be flushed out
when going back to INACTIVE/FAILED already.

Let's fix that, and flush out this state always during the activating
transition, not during the deactivating transition.
2018-12-07 17:35:32 +01:00
Lennart Poettering ec88d1ea05 mount: replace three closely related mount flags into a proper flags enum
We pass these flags around, and even created a structure for them. Let's
fix things properly, and make them a flags value of its own.
2018-12-07 17:35:32 +01:00
Lennart Poettering b6418dc94e mount: strdup() device paths we collect
We never know what the changes triggered by mount_set_state() do to the
unit. Let's be safe and copy the device path into our set, so that we
are safe against that.
2018-12-07 17:35:32 +01:00
Lennart Poettering f8064c4fda mount: when the kernel reports a mount to be established reset all kinds of load failures
It doesn't matter what kind of precise failure we had earlier with
loading the unit, let's report that it loaded successfully now, after
all the kernel is an OK source for that, like any other.
2018-12-07 17:35:32 +01:00
Lennart Poettering a37422045f mount: regenerate all deps whenever a mount's parameters changes
Whenever we notice a change on an existing /proc/self/mountinfo line,
let's update the deps generated from it. For that, let's flush out the
old deps generated this way, and add in the new ones.

This takes benefit of the fact that today (unlike a comment this patch
removes says) we can remove deps in a somewhat reasonable way.
2018-12-07 17:35:32 +01:00
Lennart Poettering 6d7e89b070 mount: when allocating a Mount object based on /proc/self/mountinfo mark it so
Let's set 'from_proc_self_mountinfo' right away, since we know its from
there. This is important so that when the load queue is dispatched (and
thus mount_load() called) this
fact is already known.
2018-12-07 17:35:32 +01:00
Lennart Poettering 26e35b164b mount: let mount_add_extras() take care of remote-fs.target deps
In a previous commit we added logic that mount_add_extras() (or more
precisely mount_add_default_dependencies()) adds in dependencies on
remote-fs.target and local-fs.target, hence we can drop this from
mount_setup_new_unit() and let the usual load queue dispatching take
care of this.
2018-12-07 17:34:29 +01:00