This new helper not only removes a file from a directory but also
ensures its space on disk is deallocated, by either punching a hole over
the full file or truncating the file afterwards if the file's link
counter is 0. This is useful in "vacuuming" algorithms to ensure that
client's can't keep the disk space the vacuuming is supposed to recover
pinned simply by keeping an fd open to it.
The new flag returns the O_PATH fd of the final component, which may be
converted into a proper fd by open()ing it again through the
/proc/self/fd/xyz path.
Together with O_SAFE this provides us with a somewhat safe way to open()
files in directories potentially owned by unprivileged code, where we
want to refuse operation if any symlink tricks are played pointing to
privileged files.
When the flag is specified we won't transition to a privilege-owned
file or directory from an unprivileged-owned one. This is useful when
privileged code wants to load data from a file unprivileged users have
write access to, and validates the ownership, but want's to make sure
that no symlink games are played to read a root-owned system file
belonging to a different context.
Let's rework touch_file() so that it works correctly on sockets, fifos,
and device nodes: let's open an O_PATH file descriptor first and operate
based on that, if we can. This is usually the better option as it this
means we can open AF_UNIX nodes in the file system, and update their
timestamps and ownership correctly. It also means we can correctly touch
symlinks and block/character devices without triggering their drivers.
Moreover, by operating on an O_PATH fd we can make sure that we
operate on the same inode the whole time, and it can't be swapped out in
the middle.
While we are at it, rework the call so that we try to adjust as much as
we can before returning on error. This is a good idea as we call the
function quite often without checking its result, and hence it's best to
leave the files around in the most "correct" fashion possible.
The kernel will reply with -ENOTDIR when we try to access a non-directory under
a name which ends with a slash. But our functions would strip the trailing slash
under various circumstances. Keep the trailing slash, so that
path_is_mount_point("/path/to/file/") return -ENOTDIR when /path/to/file/ is a file.
Tests are added for this change in behaviour.
Also, when called with a trailing slash, path_is_mount_point() would get
"" from basename(), and call name_to_handle_at(3, "", ...), and always
return -ENOENT. Now it'll return -ENOTDIR if the mount point is a file, and
true if it is a directory and a mount point.
v2:
- use strip_trailing_chars()
v3:
- instead of stripping trailing chars(), do the opposite — preserve them.
Linux doesn't have faccess(), hence let's emulate it. Linux has access()
and faccessat() but neither allows checking the access rights of an fd
passed in directly.
test_readlink_and_make_absolute switches to a temp directory, and then
removes it.
test_get_files_in_directory calls opendir(".") from a directory that has
been removed from the filesystem.
This call sequence triggers a bug in Gentoo's sandbox library. This
library attempts to resolve the "." to an absolute path, and aborts when
it ultimately fails to do so.
Re-ordering the calls works around the issue until the sandbox library
can be fixed to more gracefully deal with this.
Bug: https://bugs.gentoo.org/590084
If chase_symlinks() encouters an absolute symlink, it resets the todo
buffer to just the newly discovered symlink and discards any of the
remaining previous symlink path. Regardless of whether or not the
symlink is absolute or relative, we need to preserve the remainder of
the path that has not yet been resolved.
Fixes:
```
$ ./libtool --mode=execute valgrind --leak-check=full ./test-fs-util
...
==22871==
==22871== 27 bytes in 1 blocks are definitely lost in loss record 1 of 1
==22871== at 0x4C2FC47: realloc (vg_replace_malloc.c:785)
==22871== by 0x4E86D05: strextend (string-util.c:726)
==22871== by 0x4E8F347: chase_symlinks (fs-util.c:712)
==22871== by 0x109EBF: test_chase_symlinks (test-fs-util.c:75)
==22871== by 0x10C381: main (test-fs-util.c:305)
==22871==
```
Closes#4888
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
As suggested by @keszybz.
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.
(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)
Fixes: #3867
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.