This is analogous to 8d3ae2bd4c, except that now
src/core/umount.c not src/core/mount.c is converted.
Might help with https://bugzilla.redhat.com/show_bug.cgi?id=1554943, or not.
In the patch, mnt_free_tablep and mnt_free_iterp are declared twice. It'd
be nicer to define them just once in mount-setup.h, but then libmount.h would
have to be included there. libmount.h seems to be buggy, and declares some
defines which break other headers, and working around this is more pain than
the two duplicate lines. So let's live with the duplication for now.
This fixes memleak of MountPoint in mount_points_list_get() on error, not that
it matters any.
There is little point in logging about unmounting errors if the
exact mountpoint will be successfully unmounted in a later retry
due unmounts below it having been removed.
Additionally, don't log those errors if we are going to switch back
to a initrd, because that one is also likely to finalize the remaining
mountpoints. If not, it will log errors then.
In a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, explicitly reopen the logs hence in the child at the right
time.
This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why #8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they done in the child processes where the log fds are
closed.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
Remount, and subsequent umount, attempts can hang for inaccessible network
based mount points. This can leave a system in a hard hang state that
requires a hard reset in order to recover. This change moves the remount,
and umount attempts into separate child processes. The remount and umount
operations will block for up to 90 seconds (DEFAULT_TIMEOUT_USEC). Should
those waits fail, the parent will issue a SIGKILL to the child and continue
with the shutdown efforts.
In addition, instead of only reporting some additional errors on the final
attempt, failures are reported as they occur.
The linux umount2() systemcall accepts a MNT_FORCE flags
which some filesystems honor, particularly FUSE and various
network filesystems such as NFS.
These filesystems can sometimes wait for an indefinite period
for a response from an external service, and the wait if
sometimes "uninterruptible" meaning that the process cannot be
killed.
Using MNT_FORCE causes any such request that are outstanding to
be aborted. This normally allows the waiting process to
be killed. It will then realease and reference it has to the
filesytem, this allowing the filesystem to be unmounted.
If there remain active references to the filesystem, MNT_FORCE
is *not* forcefull enough to unmount the filesystem anyway.
By the time that umount_all() is run by systemd-shutdown, all
filesystems *should* be unmounted, and sync() will have been
called. Anything that remains cannot be unmounted in a
completely clean manner and just nees to be dealt with as firmly
as possible. So use MNT_FORCE and try to explain why in the
comment.
Also enhance an earlier comment to explain why umount2() is
safe even though mount(MNT_REMOUNT) isn't.
After previous output from systemd-shutdown indicated a bug, my attention
was drawn to redundant output lines. Did they indicate an anomaly?
It turns out to be an expected, harmless result of the current code. But
we don't have much justification to run such redundant operations. Let's
remove the confusing redundant message.
We can stop trying to remount a directory read-only once its mount entry
has successfully been changed to "ro". We can simply let the kernel keep
track of this for us. I don't bother to try and avoid re-parsing the
mountinfo. I appreciate snappy shutdowns, but this code is already
intricate and buggy enough (see issue 7131).
(Disclaimer: At least for the moment, you can't _rely_ on always seeing
suspicious output from systemd-shutdown. By default, you can expect the
kernel to truncate the log output of systemd-shutdown. Ick ick ick!
Because /dev/kmsg is rate-limited by default. Normally it prints a message
"X lines supressed", but we tend to shut down before the timer expires
in this case).
Before:
systemd-shutdown[1]: Remounting '/' read-only with options 'seclabel...
EXT4-fs (vda3): re-mounted. Opts: data=ordered
systemd-shutdown[1]: Remounting '/' read-only with options 'seclabel, ...
EXT4-fs (vda3): re-mounted. Opts: data=ordered
After:
systemd-shutdown[1]: Remounting '/' read-only with options 'seclabel, ...
EXT4-fs (vda3): re-mounted. Opts: data=ordered
I also tested with `systemctl reboot --force`, plus a loopback mount to
cause one of the umounts to fail initially. In this case another 2 lines
of output are removed (out of a larger number of lines).
The assumption was that nothing changes in the final attempt. This
would be confusing if a filesystem with a process in uninterruptible
sleep suddenly became un-stuck for the final attempt, but we still give
up and don't try to e.g. unmount any parent mounts.
I don't know how possible that is. But the code will be easier to read
without an assumption that it does not attempt to justify.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
systemd-shutdown is run after the network is stopped,
so remounting a network filesystem read-only can hang.
A simple umount is the most useful thing that can
be done for a network filesystem once the network is down.
https://bugzilla.redhat.com/show_bug.cgi?id=1227736#c49
We counted how many filesystems could not be unmounted, but only for those
filesystems which we tried to unmount. Since we only remount / ro, without
attempting to unmount, we would emit a confusing error message:
Remounting '/' read-only with options 'seclabel,space_cache,subvolid=5,subvol=/'.
Remounting '/' read-only with options 'seclabel,space_cache,subvolid=5,subvol=/'.
Remounting '/' read-only with options 'seclabel,space_cache,subvolid=5,subvol=/'.
All filesystems unmounted.
Warn when remount-ro fails, and for filesystems which we won't try to unmount,
include the failure to remount-ro in n_failed.
A few minor cleanups:
- remove unecessary goto which jumps to the next line anyway
- always calculate n_failed, even if log_error is false. This causes no change
in behaviour, but I think the code is easier to follow, since the log setting
cannot influence other logic.
Compiling against the dm-ioctl.h header as provided by the Linux kernel
will embed the DM interface version number. Running an older kernel can
result in an error like this on shutdown:
Could not detach DM dm-11: ioctl mismatch, kernel(4.34.4), user(4.35.4)
Work around this by shipping a local copy of dm-ioctl.h. We need at
least the version from 3.13 for DM_DEFERRED_REMOVE [1], so bump the
requirements in README accordingly.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2c140a246dc0bc085b98eddde978060fcec1080cFixes: #5492
A mount within /run/initramfs is indicative that the mount was
created by initramfs init and will be unmounted by initramfs
shutdown.
It is unlikely that such a mount point would even be unmountable
by the the main system, for example in the case of the root file-
system being loop-mounted from a file in a /run/initramfs mount.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
man 2 mount says that the mountflags and data parameteres should
match the original values except for the desired changes. We only
bother with the mount options since the only flags we can change
are MS_RDONLY, MS_SYNCHRONOUS and MS_MANDLOCK; which shouldn't
matter too much.
Fixes: #351
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
Introduce a proper enum, and don't pass around string ids anymore. This
simplifies things quite a bit, and makes virtualization detection more
similar to architecture detection.
Change cunescape() to return a normal error code, so that we can
distuingish OOM errors from parse errors.
This also adds a flags parameter to control whether "relaxed" or normal
parsing shall be done. If set no parse failures are generated, and the
only reason why cunescape() can fail is OOM.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
After all, mounts below these directories are pretty much guaranteed to
be virtual, and it's hence unnecessary to unmount them during shutdown.
Moreover, in less-priviliged containers we might lack the rights to
unmount them, hence don't even try.
http://lists.freedesktop.org/archives/systemd-devel/2015-January/027113.html
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.