The fstab entry may contain comment/application-specific options, like
for example x-systemd.automount or x-initrd.mount.
With the recent switch to libmount, the mount options during remount are
now gathered via mnt_fs_get_options(), which returns the merged fstab
options with the effective options in mountinfo.
Unfortunately if one of these application-specific options are set in
fstab, the remount will fail with -EINVAL.
In systemd 238:
Remounting '/test-x-initrd-mount' read-only in with options
'errors=continue,user_xattr,acl'.
In systemd 239:
Remounting '/test-x-initrd-mount' read-only in with options
'errors=continue,user_xattr,acl,x-initrd.mount'.
Failed to remount '/test-x-initrd-mount' read-only: Invalid argument
So instead of using mnt_fs_get_options(), we're now using both
mnt_fs_get_fs_options() and mnt_fs_get_vfs_options() and merging the
results together so we don't get any non-relevant options from fstab.
Signed-off-by: aszlig <aszlig@nix.build>
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.
In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.
Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.
Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.
Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
example.swaps with "(deleted)" does not cause bogus entries in the list now,
but a memleak in libmount instead. The memleaks is not very important since
this code is run just once.
Reported as https://github.com/karelzak/util-linux/issues/596.
$ build/test-umount
...
/* test_swap_list("/proc/swaps") */
path=/var/tmp/swap o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
/* test_swap_list("/home/zbyszek/src/systemd/test/test-umount/example.swaps") */
path=/some/swapfile o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
==26912==
==26912== HEAP SUMMARY:
==26912== in use at exit: 16 bytes in 1 blocks
==26912== total heap usage: 1,546 allocs, 1,545 frees, 149,008 bytes allocated
==26912==
==26912== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==26912== at 0x4C31C15: realloc (vg_replace_malloc.c:785)
==26912== by 0x55C5D8C: _IO_vfscanf (in /usr/lib64/libc-2.26.so)
==26912== by 0x55D8AEC: vsscanf (in /usr/lib64/libc-2.26.so)
==26912== by 0x55D25C3: sscanf (in /usr/lib64/libc-2.26.so)
==26912== by 0x53236D0: mnt_table_parse_stream (in /usr/lib64/libmount.so.1.1.0)
==26912== by 0x53249B6: mnt_table_parse_file (in /usr/lib64/libmount.so.1.1.0)
==26912== by 0x10D157: swap_list_get (umount.c:194)
==26912== by 0x10B06E: test_swap_list (test-umount.c:34)
==26912== by 0x10B24B: main (test-umount.c:56)
==26912==
==26912== LEAK SUMMARY:
==26912== definitely lost: 16 bytes in 1 blocks
==26912== indirectly lost: 0 bytes in 0 blocks
==26912== possibly lost: 0 bytes in 0 blocks
==26912== still reachable: 0 bytes in 0 blocks
==26912== suppressed: 0 bytes in 0 blocks
This is analogous to 8d3ae2bd4c, except that now
src/core/umount.c not src/core/mount.c is converted.
Might help with https://bugzilla.redhat.com/show_bug.cgi?id=1554943, or not.
In the patch, mnt_free_tablep and mnt_free_iterp are declared twice. It'd
be nicer to define them just once in mount-setup.h, but then libmount.h would
have to be included there. libmount.h seems to be buggy, and declares some
defines which break other headers, and working around this is more pain than
the two duplicate lines. So let's live with the duplication for now.
This fixes memleak of MountPoint in mount_points_list_get() on error, not that
it matters any.
There is little point in logging about unmounting errors if the
exact mountpoint will be successfully unmounted in a later retry
due unmounts below it having been removed.
Additionally, don't log those errors if we are going to switch back
to a initrd, because that one is also likely to finalize the remaining
mountpoints. If not, it will log errors then.
In a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, explicitly reopen the logs hence in the child at the right
time.
This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why #8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they done in the child processes where the log fds are
closed.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
Remount, and subsequent umount, attempts can hang for inaccessible network
based mount points. This can leave a system in a hard hang state that
requires a hard reset in order to recover. This change moves the remount,
and umount attempts into separate child processes. The remount and umount
operations will block for up to 90 seconds (DEFAULT_TIMEOUT_USEC). Should
those waits fail, the parent will issue a SIGKILL to the child and continue
with the shutdown efforts.
In addition, instead of only reporting some additional errors on the final
attempt, failures are reported as they occur.
The linux umount2() systemcall accepts a MNT_FORCE flags
which some filesystems honor, particularly FUSE and various
network filesystems such as NFS.
These filesystems can sometimes wait for an indefinite period
for a response from an external service, and the wait if
sometimes "uninterruptible" meaning that the process cannot be
killed.
Using MNT_FORCE causes any such request that are outstanding to
be aborted. This normally allows the waiting process to
be killed. It will then realease and reference it has to the
filesytem, this allowing the filesystem to be unmounted.
If there remain active references to the filesystem, MNT_FORCE
is *not* forcefull enough to unmount the filesystem anyway.
By the time that umount_all() is run by systemd-shutdown, all
filesystems *should* be unmounted, and sync() will have been
called. Anything that remains cannot be unmounted in a
completely clean manner and just nees to be dealt with as firmly
as possible. So use MNT_FORCE and try to explain why in the
comment.
Also enhance an earlier comment to explain why umount2() is
safe even though mount(MNT_REMOUNT) isn't.
After previous output from systemd-shutdown indicated a bug, my attention
was drawn to redundant output lines. Did they indicate an anomaly?
It turns out to be an expected, harmless result of the current code. But
we don't have much justification to run such redundant operations. Let's
remove the confusing redundant message.
We can stop trying to remount a directory read-only once its mount entry
has successfully been changed to "ro". We can simply let the kernel keep
track of this for us. I don't bother to try and avoid re-parsing the
mountinfo. I appreciate snappy shutdowns, but this code is already
intricate and buggy enough (see issue 7131).
(Disclaimer: At least for the moment, you can't _rely_ on always seeing
suspicious output from systemd-shutdown. By default, you can expect the
kernel to truncate the log output of systemd-shutdown. Ick ick ick!
Because /dev/kmsg is rate-limited by default. Normally it prints a message
"X lines supressed", but we tend to shut down before the timer expires
in this case).
Before:
systemd-shutdown[1]: Remounting '/' read-only with options 'seclabel...
EXT4-fs (vda3): re-mounted. Opts: data=ordered
systemd-shutdown[1]: Remounting '/' read-only with options 'seclabel, ...
EXT4-fs (vda3): re-mounted. Opts: data=ordered
After:
systemd-shutdown[1]: Remounting '/' read-only with options 'seclabel, ...
EXT4-fs (vda3): re-mounted. Opts: data=ordered
I also tested with `systemctl reboot --force`, plus a loopback mount to
cause one of the umounts to fail initially. In this case another 2 lines
of output are removed (out of a larger number of lines).
The assumption was that nothing changes in the final attempt. This
would be confusing if a filesystem with a process in uninterruptible
sleep suddenly became un-stuck for the final attempt, but we still give
up and don't try to e.g. unmount any parent mounts.
I don't know how possible that is. But the code will be easier to read
without an assumption that it does not attempt to justify.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
systemd-shutdown is run after the network is stopped,
so remounting a network filesystem read-only can hang.
A simple umount is the most useful thing that can
be done for a network filesystem once the network is down.
https://bugzilla.redhat.com/show_bug.cgi?id=1227736#c49
We counted how many filesystems could not be unmounted, but only for those
filesystems which we tried to unmount. Since we only remount / ro, without
attempting to unmount, we would emit a confusing error message:
Remounting '/' read-only with options 'seclabel,space_cache,subvolid=5,subvol=/'.
Remounting '/' read-only with options 'seclabel,space_cache,subvolid=5,subvol=/'.
Remounting '/' read-only with options 'seclabel,space_cache,subvolid=5,subvol=/'.
All filesystems unmounted.
Warn when remount-ro fails, and for filesystems which we won't try to unmount,
include the failure to remount-ro in n_failed.
A few minor cleanups:
- remove unecessary goto which jumps to the next line anyway
- always calculate n_failed, even if log_error is false. This causes no change
in behaviour, but I think the code is easier to follow, since the log setting
cannot influence other logic.
Compiling against the dm-ioctl.h header as provided by the Linux kernel
will embed the DM interface version number. Running an older kernel can
result in an error like this on shutdown:
Could not detach DM dm-11: ioctl mismatch, kernel(4.34.4), user(4.35.4)
Work around this by shipping a local copy of dm-ioctl.h. We need at
least the version from 3.13 for DM_DEFERRED_REMOVE [1], so bump the
requirements in README accordingly.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2c140a246dc0bc085b98eddde978060fcec1080cFixes: #5492
A mount within /run/initramfs is indicative that the mount was
created by initramfs init and will be unmounted by initramfs
shutdown.
It is unlikely that such a mount point would even be unmountable
by the the main system, for example in the case of the root file-
system being loop-mounted from a file in a /run/initramfs mount.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
man 2 mount says that the mountflags and data parameteres should
match the original values except for the desired changes. We only
bother with the mount options since the only flags we can change
are MS_RDONLY, MS_SYNCHRONOUS and MS_MANDLOCK; which shouldn't
matter too much.
Fixes: #351
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.