Commit Graph

54 Commits

Author SHA1 Message Date
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Yu Watanabe 6ef8df2ba8 mount-util: call mount_option_mangle() in mount_verbose() 2018-02-21 09:06:53 +09:00
Yu Watanabe 9e7f941acb mount-util: add mount_option_mangle()
This is used in the later commits.
2018-02-21 09:06:47 +09:00
Lennart Poettering 548f69375e tree-wide: use path_hash_ops instead of string_hash_ops whenever we key by a path
Let's make use of our new hash_ops!
2018-02-12 11:07:55 +01:00
Lennart Poettering fbd0b64f44
tree-wide: make use of new STRLEN() macro everywhere (#7639)
Let's employ coccinelle to do this for us.

Follow-up for #7625.
2017-12-14 19:02:29 +01:00
Lennart Poettering 35bbbf85e0 basic: turn off stdio locking for a couple of helper calls
These helper calls are potentially called often, and allocate FILE*
objects internally for a very short period of time, let's turn off
locking for them too.
2017-12-14 10:46:19 +01:00
Lennart Poettering 93719c6b0e mount-util: shorten the loop a bit (#7545)
The loop preparation and part of the loop contents are actually the
same, let's merge this.

Also, it's so much fun tweaking around in the name_to_handle_at() code,
let's do more of it with this patch!

(This also adds two NULL assignments, that aren't strictly necessary.
However, I figured its safer to place them in there, just in case the
for() condition is changed later. After all the freeing of the handle
and the invalidation of the cleanup-controller pointer to it are
otherwise really far away from each other...)
2017-12-06 13:19:03 +09:00
Lennart Poettering 2d3a5a73e0 nspawn: make sure images containing an ESP are compatible with userns -U mode
In -U mode we might need to re-chown() all files and directories to
match the UID shift we want for the image. That's problematic on fat
partitions, such as the ESP (and which is generated by mkosi's
--bootable switch), because fat of course knows no UID/GID file
ownership natively.

With this change we take benefit of the uid= and gid= mount options FAT
knows: instead of chown()ing all files and directories we can just
specify the right UID/GID to use at mount time.

This beefs up the image dissection logic in two ways:

1. First of all support for mounting relevant file systems with
   uid=/gid= is added: when a UID is specified during mount it is used for
   all applicable file systems.

2. Secondly, two new mount flags are added:
   DISSECT_IMAGE_MOUNT_ROOT_ONLY and DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY.
   If one is specified the mount routine will either only mount the root
   partition of an image, or all partitions except the root partition.
   This is used by nspawn: first the root partition is mounted, so that
   we can determine the UID shift in use so far, based on ownership of
   the image's root directory. Then, we mount the remaining partitions
   in a second go, this time with the right UID/GID information.
2017-12-05 13:49:12 +01:00
Lennart Poettering 01a7e0a14d mount-util: do not use the official MAX_HANDLE_SZ (#7523)
If we'd use the system header's version of MAX_HANDLE_SZ then our code
would break on older kernels as soon as the value is increased, as old
kernels refuse larger buffers with EINVAL.
2017-12-03 12:18:33 +01:00
Lennart Poettering 1a2d4d7084
Merge pull request #7237 from keszybz/growfs
Create and grow filesystems
2017-12-01 17:58:58 +01:00
Lennart Poettering 976c047841 mount-util: tape over name_to_handle_at() flakiness (#7517)
Apparently, the kernel returns EINVAL on NFS4 sometimes, even if we do
everything right, let's fallback in that case and find a different
approach to determine if something's a mount point.

See discussion at:

https://github.com/systemd/systemd/issues/7082#issuecomment-348001289
2017-12-01 12:59:16 +01:00
Zbigniew Jędrzejewski-Szmek b12d25a8d6 util-lib: use trailing slash in chase_symlinks, fd_is_mount_point, path_is_mount_point
The kernel will reply with -ENOTDIR when we try to access a non-directory under
a name which ends with a slash. But our functions would strip the trailing slash
under various circumstances. Keep the trailing slash, so that

path_is_mount_point("/path/to/file/") return -ENOTDIR when /path/to/file/ is a file.

Tests are added for this change in behaviour.

Also, when called with a trailing slash, path_is_mount_point() would get
"" from basename(), and call name_to_handle_at(3, "", ...), and always
return -ENOENT. Now it'll return -ENOTDIR if the mount point is a file, and
true if it is a directory and a mount point.

v2:
- use strip_trailing_chars()

v3:
- instead of stripping trailing chars(), do the opposite — preserve them.
2017-11-30 20:43:25 +01:00
Lennart Poettering 4739fc554d mount-util: fix bad indenting 2017-11-23 13:28:06 +01:00
Lennart Poettering c83b20d73b mount-util: EOVERFLOW might have other causes than buffer size issues
When we get EOVERFLOW this might be caused by untriggered nfs4 mounts
(see discussion at
https://github.com/systemd/systemd/pull/7395#issuecomment-346164481 and
further down).

Handle this nicely by falling back to fdinfo-based mntid determination.

Fixes: #7082
2017-11-23 13:28:06 +01:00
Lennart Poettering 0d9bcb7c37 mount-util: fix error propagation in fd_fdinfo_mnt_id() 2017-11-23 13:28:06 +01:00
Lennart Poettering fc010b01e7 mount-util: drop exponential buffer growing in name_to_handle_at_loop()
So, it appears name_to_handle_at() always returns the right buffer size
on EOVERFLOW, when it's returned due to a too small buffer. Let's rely
on that exclusively for sizing the buffer, and let's drop the
exponential buffer growing.

The new logic is now: if we see EOVERFLOW and the returned size has
increased, resize our buffer and try again. But if it didn't increase,
then propagate the EOVERFLOW as it likely has other causes.
2017-11-23 13:28:06 +01:00
Lennart Poettering c2a986d509 mount-util: add new path_get_mnt_id() call that queries the mnt ID of a path
This is a simple wrapper around name_to_handle_at_loop() and
fd_fdinfo_mnt_id() to query the mnt ID of a path. It uses
name_to_handle_at() where it can, and falls back to to
fd_fdinfo_mnt_id() where that doesn't work.

This is a best-effort thing of course, since neither name_to_handle_at()
nor the fdinfo logic work on all kernels.
2017-11-21 11:37:12 +01:00
Lennart Poettering cbfb8679dd mount-util: add name_to_handle_at_loop() wrapper around name_to_handle_at()
As it turns out MAX_HANDLE_SZ is a lie, the handle buffer we pass into
name_to_handle_at() might need to be larger than MAX_HANDLE_SZ, and we
thus need to invoke name_to_handle_at() in a loop, growing the buffer as
needed.

This adds a new wrapper name_to_handle_at_loop() around
name_to_handle_at() that does the necessary looping, and ports over all
users.

Fixes: #7082
2017-11-21 11:37:12 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek 5991ce44dc udevadm,basic: replace nulstr_contains with STR_IN_SET (#6965)
STR_IN_SET is a newer approach which is easier to write and read, and which
seems to result in space savings too:

before:
4949848 build/src/shared/libsystemd-shared-234.so
 350704 build/systemctl
4967184 build/systemd
 826216 build/udevadm

after:
4949848 build/src/shared/libsystemd-shared-234.so
 350704 build/systemctl
4966888 build/systemd
 826168 build/udevadm
2017-10-04 19:32:12 +02:00
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00
Lennart Poettering 7941e2189b mount-util: add fusectl to list of API VFS 2017-09-29 14:36:33 +02:00
Lennart Poettering 154d22695a dissect: split list of discard-supporting fs out into mount-util.c
Let's manage the list of file systems that do a specific thing at one
place, following similar naming.

No functional changes.
2017-09-29 14:36:29 +02:00
Lennart Poettering 896f937f58 dissect: automatically mark partitions read-only that have a read-only file system
Specifically, squashfs and iso9660 are always read-only, hence make sure
we never even think about mounting them writable.
2017-09-29 14:36:29 +02:00
Yu Watanabe e2be442e79 systemd-mount: allow to specify an arbitrary string for arg_mount_what when vfs is used
Fixes #6591.
2017-09-04 10:55:51 +09:00
Timothée Ravier ac9de0b379 core: open /proc/self/mountinfo early to allow mounts over /proc (#5985)
Enable masking the /proc folder using the 'InaccessiblePaths' unit
option.

This also slightly simplify mounts setup as the bind_remount_recursive
function will only open /proc/self/mountinfo once.

This is based on the suggestion at:
https://lists.freedesktop.org/archives/systemd-devel/2017-April/038634.html
2017-05-19 14:38:40 +02:00
Lennart Poettering 059c35f507 mount-util: accept that name_to_handle_at() might fail with EPERM (#5499)
Container managers frequently block name_to_handle_at(), returning
EACCES or EPERM when this is issued. Accept that, and simply fall back
to to fdinfo-based checks.

Note that we accept either EACCES or EPERM here, as container managers
can choose the error code and aren't very good on agreeing on just one.

(note that this is a non-issue with nspawn, as we permit
name_to_handle_at() there, only block open_by_handle_at(), which should
be sufficiently safe).
2017-03-01 11:35:05 -05:00
Lennart Poettering afe682bc7f util-lib: make verbose_mount() grok MS_MOVE
Let's print a proper message if we see MS_MOVE.
2016-12-20 20:00:09 +01:00
Zbigniew Jędrzejewski-Szmek c73838280c Modify mount_propagation_flags_from_string to return a normal int code
This means that callers can distiguish an error from flags==0,
and don't have to special-case the empty string.
2016-12-17 13:57:04 -05:00
Lennart Poettering 835552511e core: hook up MountFlags= to the transient unit logic
This makes "systemd-run -p MountFlags=shared -t /bin/sh" work, by making
MountFlags= to the list of properties that may be accessed transiently.
2016-12-13 21:22:13 +01:00
Lennart Poettering c4f4fce79e fs-util: add flags parameter to chase_symlinks()
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
2016-12-01 00:25:51 +01:00
Lennart Poettering e187369587 tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
2016-12-01 00:25:51 +01:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Brian J. Murrell 67ae43665e Recognise Lustre as a remote file system (#4530)
Lustre is also a remote file system that wants the network to be up before it is mounted.
2016-11-01 04:48:00 +01:00
Evgeny Vereshchagin 548bd57376 basic: fallback to the fstat if we don't have access to the /proc/self/fdinfo
https://github.com/systemd/systemd/pull/4372#discussion_r83354107:
I get `open("/proc/self/fdinfo/13", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)`

327   mkdir("/proc", 0755 <unfinished ...>
327   <... mkdir resumed> )             = -1 EEXIST (File exists)
327   stat("/proc",  <unfinished ...>
327   <... stat resumed> {st_dev=makedev(8, 1), st_ino=28585, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=4, st_size=1024, st_atime=2016/10/14-02:55:32, st_mtime=2016/
327   mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL <unfinished ...>
327   <... mount resumed> )             = 0
327   lstat("/proc",  <unfinished ...>
327   <... lstat resumed> {st_dev=makedev(0, 34), st_ino=1, st_mode=S_IFDIR|0555, st_nlink=75, st_uid=65534, st_gid=65534, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2016/10/14-03:13:35.971031263,
327   lstat("/proc/sys", {st_dev=makedev(0, 34), st_ino=4026531855, st_mode=S_IFDIR|0555, st_nlink=1, st_uid=65534, st_gid=65534, st_blksize=1024, st_blocks=0, st_size=0, st_atime=2016/10/14-03:13:39.1630
327   openat(AT_FDCWD, "/proc", O_RDONLY|O_DIRECTORY|O_CLOEXEC|O_PATH) = 11</proc>
327   name_to_handle_at(11</proc>, "sys", {handle_bytes=128}, 0x7ffe3a238604, AT_SYMLINK_FOLLOW) = -1 EOPNOTSUPP (Operation not supported)
327   name_to_handle_at(11</proc>, "", {handle_bytes=128}, 0x7ffe3a238608, AT_EMPTY_PATH) = -1 EOPNOTSUPP (Operation not supported)
327   openat(11</proc>, "sys", O_RDONLY|O_CLOEXEC|O_PATH) = 13</proc/sys>
327   open("/proc/self/fdinfo/13", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
327   close(13</proc/sys> <unfinished ...>
327   <... close resumed> )             = 0
327   close(11</proc> <unfinished ...>
327   <... close resumed> )             = 0

-bash-4.3# ls -ld /proc/
dr-xr-xr-x 76 65534 65534 0 Oct 14 02:57 /proc/

-bash-4.3# ls -ld /proc/1
dr-xr-xr-x 9 root root 0 Oct 14 02:57 /proc/1

-bash-4.3# ls -ld /proc/1/fdinfo
dr-x------ 2 65534 65534 0 Oct 14 03:00 /proc/1/fdinfo
2016-10-23 23:15:46 -04:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 60e76d4897 nspawn,mount-util: add [u]mount_verbose and use it in nspawn
This makes it easier to debug failed nspawn invocations:

Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11 16:50:07 -04:00
Lennart Poettering 6b7c9f8bce namespace: rework how ReadWritePaths= is applied
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths=
specification, then we'd first recursively apply the ReadOnlyPaths= paths, and
make everything below read-only, only in order to then flip the read-only bit
again for the subdirs listed in ReadWritePaths= below it.

This is not only ugly (as for the dirs in question we first turn on the RO bit,
only to turn it off again immediately after), but also problematic in
containers, where a container manager might have marked a set of dirs read-only
and this code will undo this is ReadWritePaths= is set for any.

With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be
applied to the children listed in ReadWritePaths= in the first place, so that
we do not need to turn off the RO bit for those after all.

This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the
RO bit, but never to turn it off again. Or to say this differently: if some
dirs are marked read-only via some external tool, then ReadWritePaths= will not
undo it.

This is not only the safer option, but also more in-line with what the man page
currently claims:

        "Entries (files or directories) listed in ReadWritePaths= are
        accessible from within the namespace with the same access rights as
        from outside."

To implement this change bind_remount_recursive() gained a new "blacklist"
string list parameter, which when passed may contain subdirs that shall be
excluded from the read-only mounting.

A number of functions are updated to add more debug logging to make this more
digestable.
2016-09-25 10:40:51 +02:00
Lennart Poettering 450442cf93 add a new tool for creating transient mount and automount units
This adds "systemd-mount" which is for transient mount and automount units what
"systemd-run" is for transient service, scope and timer units.

The tool allows establishing mounts and automounts during runtime. It is very
similar to the usual /bin/mount commands, but can pull in additional
dependenices on access (for example, it pulls in fsck automatically), an take
benefit of the automount logic.

This tool is particularly useful for mount removable file systems (such as USB
sticks), as the automount logic (together with automatic unmount-on-idle), as
well as automatic fsck on first access ensure that the removable file system
has a high chance to remain in a fully clean state even when it is unplugged
abruptly, and returns to a clean state on the next re-plug.

This is a follow-up for #2471, as it adds a simple client-side for the
transient automount logic added in that PR.

In later work it might make sense to invoke this tool automatically from udev
rules in order to implement a simpler and safer version of removable media
management á la udisks.
2016-08-18 22:41:19 +02:00
Alban Crequy 98df8089be namespace: don't fail on masked mounts (#3794)
Before this patch, a service file with ReadWriteDirectories=/file...
could fail if the file exists but is not a mountpoint, despite being
listed in /proc/self/mountinfo. It could happen with masked mounts.

Fixes https://github.com/systemd/systemd/issues/3793
2016-07-25 15:39:46 +02:00
Alessandro Puccetti b3d1d51603 namespace: ensure to return a valid inaccessible nodes (#3778)
Because /run/systemd/inaccessible/{chr,blk} are devices with
major=0 and minor=0 it might be possible that these devices cannot be created
so we use /run/systemd/inaccessible/sock instead to map them.
2016-07-22 15:59:14 +02:00
Alessandro Puccetti c4b4170746 namespace: unify limit behavior on non-directory paths
Despite the name, `Read{Write,Only}Directories=` already allows for
regular file paths to be masked. This commit adds the same behavior
to `InaccessibleDirectories=` and makes it explicit in the doc.
This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}`
{dile,device}nodes and mounts on the appropriate one the paths specified
in `InacessibleDirectories=`.

Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
2016-07-19 17:22:02 +02:00
Valentin Vidić 0a86e68147 basic/mount-util: recognize ocfs2 as network fs (#3713) 2016-07-14 07:34:36 +02:00
Torstein Husebø 61233823aa treewide: fix typos and remove accidental repetition of words 2016-07-11 16:18:43 +02:00
Zbigniew Jędrzejewski-Szmek a44cb5cbf7 basic/mount-util: recognize pvfs2 as network fs (#3140)
Added to kernel 4.6.
2016-04-28 19:49:16 +02:00
Alexander Kuleshov c4b6915670 tree-wide: no need to pass excess flags to open()/openat() if O_PATH is passed
As described in the documentation:

When O_PATH is specified in flags, flag bits other than O_CLOEXEC,
O_DIRECTORY, and O_NOFOLLOW are ignored.

So, we can remove unnecessary flags in a case when O_PATH is passed
to the open() or openat().
2016-03-02 00:42:49 +06:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
2016-02-10 13:41:57 +01:00
Thomas Hindoe Paaboel Andersen 93cc7779e0 basic: re-sort includes
My previous patch to only include what we use accidentially placed
the added inlcudes in non-sorted order.
2015-12-01 23:40:17 +01:00
Thomas Hindoe Paaboel Andersen 11c3a36649 basic: include only what we use
This is a cleaned up result of running iwyu but without forward
declarations on src/basic.
2015-11-30 21:51:03 +01:00
Lennart Poettering 3f2c0becc3 automount: move generically userful call repeat_mount() into mount-util.[ch] 2015-10-27 14:25:58 +01:00