This is similar to TAKE_PTR() but operates on file descriptors, and thus
assigns -1 to the fd parameter after returning it.
Removes 60 lines from our codebase. Pretty good too I think.
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
This usually is very annoying to users who then cannot log in, so
make sure we always warn if that happens (selinux, or whatever other reason).
This reverts a790812cb3.
This new helper not only removes a file from a directory but also
ensures its space on disk is deallocated, by either punching a hole over
the full file or truncating the file afterwards if the file's link
counter is 0. This is useful in "vacuuming" algorithms to ensure that
client's can't keep the disk space the vacuuming is supposed to recover
pinned simply by keeping an fd open to it.
The commit f14f1806e3 introduced CHASE_SAFE
flag. When the flag is set, then `fd_parent` may not be properly closed.
This sets `_cleanup_close_` attribute to `fd_parent`.
Thus, now `fd_parent` is always closed properly.
The commit b1bfb84804 makes chase_symlinks()
recognize empty string for root as an invalid parameter. However,
empty root is often used e.g. systemd-nspawn.
This makes chase_symlinks() support empty string safely.
Fixes#7927.
If we take a relative path we first make it absolute, based on the
current working directory. But if CHASE_PREFIX_ROOT is passe we are
supposed to make the path absolute taking the specified root path into
account, but that makes no sense if we talk about the current working
directory as that is relative to the host's root in any case. Hence,
let's refuse this politely.
The new flag returns the O_PATH fd of the final component, which may be
converted into a proper fd by open()ing it again through the
/proc/self/fd/xyz path.
Together with O_SAFE this provides us with a somewhat safe way to open()
files in directories potentially owned by unprivileged code, where we
want to refuse operation if any symlink tricks are played pointing to
privileged files.
When the flag is specified we won't transition to a privilege-owned
file or directory from an unprivileged-owned one. This is useful when
privileged code wants to load data from a file unprivileged users have
write access to, and validates the ownership, but want's to make sure
that no symlink games are played to read a root-owned system file
belonging to a different context.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
Let's rework touch_file() so that it works correctly on sockets, fifos,
and device nodes: let's open an O_PATH file descriptor first and operate
based on that, if we can. This is usually the better option as it this
means we can open AF_UNIX nodes in the file system, and update their
timestamps and ownership correctly. It also means we can correctly touch
symlinks and block/character devices without triggering their drivers.
Moreover, by operating on an O_PATH fd we can make sure that we
operate on the same inode the whole time, and it can't be swapped out in
the middle.
While we are at it, rework the call so that we try to adjust as much as
we can before returning on error. This is a good idea as we call the
function quite often without checking its result, and hence it's best to
leave the files around in the most "correct" fashion possible.
The kernel will reply with -ENOTDIR when we try to access a non-directory under
a name which ends with a slash. But our functions would strip the trailing slash
under various circumstances. Keep the trailing slash, so that
path_is_mount_point("/path/to/file/") return -ENOTDIR when /path/to/file/ is a file.
Tests are added for this change in behaviour.
Also, when called with a trailing slash, path_is_mount_point() would get
"" from basename(), and call name_to_handle_at(3, "", ...), and always
return -ENOENT. Now it'll return -ENOTDIR if the mount point is a file, and
true if it is a directory and a mount point.
v2:
- use strip_trailing_chars()
v3:
- instead of stripping trailing chars(), do the opposite — preserve them.
Already, path_is_safe() refused paths container the "." dir. Doing that
isn't strictly necessary to be "safe" by most definitions of the word.
But it is necessary in order to consider a path "normalized". Hence,
"path_is_safe()" is slightly misleading a name, but
"path_is_normalize()" is more descriptive, hence let's rename things
accordingly.
No functional changes.
Linux doesn't have faccess(), hence let's emulate it. Linux has access()
and faccessat() but neither allows checking the access rights of an fd
passed in directly.
If we follow an absolute symlink there's no need to prefix the path with
a "/", since by definition it already has one.
This helps suppressing double "/" in resolved paths containing absolute
symlinks.
If a path passes though an autofs filesystem, then accessing
the path might trigger and automount. As systemd-tmpfiles is run before
the network is up, and as automounts are often used for networked
filesystems, this can cause a deadlock.
So chase_symlinks is enhance to accept a new flag which tells it
to check for autofs, and return -EREMOTE if autofs is found.
tmpfiles is changed to check just before acting on a path so that it
can avoid autofs even if a symlink was created earlier by tmpfiles
that would send this path through an autofs.
This fixes a deadlock that happens when /home is listed in /etc/fstab as
x-systemd.automount for an NFS directory.
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
If chase_symlinks() encouters an absolute symlink, it resets the todo
buffer to just the newly discovered symlink and discards any of the
remaining previous symlink path. Regardless of whether or not the
symlink is absolute or relative, we need to preserve the remainder of
the path that has not yet been resolved.
Let's permit invoking chase_symlinks() with a NULL return parameter. If so, the
resolved name is not returned, and call is useful for checking for existance of
a file, without actually returning its ultimate path.
This new flag controls whether to consider a problem if the referenced path
doesn't actually exist. If specified it's OK if the final file doesn't exist.
Note that this permits one or more final components of the path not to exist,
but these must not contain "../" for safety reasons (or, to be extra safe,
neither "./" and a couple of others, i.e. what path_is_safe() permits).
This new flag is useful when resolving paths before issuing an mkdir() or
open(O_CREAT) on a path, as it permits that the file or directory is created
later.
The return code of chase_symlinks() is changed to return 1 if the file exists,
and 0 if it doesn't. The latter is only returned in case CHASE_NON_EXISTING is
set.
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Previously, we'd generate an EINVAL error if it is attempted to escape a root
directory with relative ".." symlinks. With this commit this is changed so that
".." from the root directory is a NOP, following the kernel's own behaviour
where /.. is equivalent to /.
As suggested by @keszybz.
chase_symlinks() currently expects a fully qualified, absolute path, relative
to the host's root as first argument. Which is useful in many ways, and similar
to the paths unlink(), rename(), open(), … expect. Sometimes it's however
useful to first prefix the specified path with the specified root directory.
Add a new call chase_symlinks_prefix() for this, that is a simple wrapper.
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
It's a common pattern, so add a helper for it. A macro is necessary
because a function that takes a pointer to a pointer would be type specific,
similarly to cleanup functions. Seems better to use a macro.
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.
(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)
Fixes: #3867
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
This is slightly nicer, since we actually watch the directories we opened and
enumerate. However, primarily this is preparation for adding support for
opening journal files by fd without specifying any path, to be added in a later
commit.