chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
Previously, we'd mount the ESP to /efi if that existed and was empty,
falling back to /boot if that existed and was empty.
With this change, the XBOOTLDR partition is mounted to /boot
unconditionally. And the EFI is mounted to /efi if that exists (but it
doesn't have to be empty — after all the name is very indicative of what
this is supposed to be), and to /boot as a fallback but only if it
exists and is empty (we insist on emptiness for that, since it might be
used differently than what we assume).
The net effect is that $BOOT should be reliably found under /boot, and
the ESP is either /efi or /boot.
(Note that this commit only is relevant for nspawn and suchlike, i.e.
the codepaths that mount an image without involving udev during boot.)
The boot loader spec supports two places to store boot loader
configuration: the ESP and a generic replacement for it in case the ESP
is not available or not suitable. Let's look for both.
gcc-9 complains that the string may be truncated when written into the output
structure. This shouldn't happen, but if it did, in principle we could remove a
different structure (with a matching name prefix). Let's just refuse the
operation if the name doesn't fit.
Generators run in a context where waiting for udev is not an option,
simply because it's not running there yet. Hence, let's not wait for it
in this case.
This is generally OK to do as we are operating on the root disk only
here, which should have been probed already by the time we come this
far.
An alternative fix might be to remove the udev dependency from image
dissection again in the long run (and thus replace reliance on
/dev/block/x:y somehow with something else).
Fixes: #11205
Fixes#10526.
Even if we waited for the root device to appear, the mount could still fail if
we didn't wait for udev to initalize the device. In particular, the
/dev/block/n:m path used to mount the device is created by udev, and nspawn
would sometimes win the race and the mount would fail with -ENOENT.
The same wait is done for partitions, since if we try to mount them, the same
considerations apply.
Note: I first implemented a version which just does a loop (with a short wait).
In that approach, udev takes on average ~800 µs to initialize the loopback
device. The approach where we set up a monitor and avoid the loop is a bit
nicer. There doesn't seem to be a significant difference in speed.
With 1000 invocations of 'systemd-nspawn -i image.squashfs echo':
loop (previous approach):
real 4m52.625s
user 0m37.094s
sys 2m14.705s
monitor (this patch):
real 4m50.791s
user 0m36.619s
sys 2m14.039s
dissect-image would wait for the root device and paritions to appear. But if we
had an image with no partitions, we'd not wait at all. If the kernel or udev
were slow in creating device nodes or symlinks, subsequent mount attempt might
fail if nspawn won the race.
Calling wait_for_partitions_to_appear() in case of no partitions means that we
verify that the kernel agrees that there are no partitions. We verify that the
kernel sees the same number of partitions as blkid, so let's that also in this
case.
This makes the failure in #10526 much less likely, but doesn't eliminate it
completely. Stay tuned.
fdopen doesn't accept "e", it's ignored. Let's not mislead people into
believing that it actually sets O_CLOEXEC.
From `man 3 fdopen`:
> e (since glibc 2.7):
> Open the file with the O_CLOEXEC flag. See open(2) for more information. This flag is ignored for fdopen()
As mentioned by @jlebon in #11131.
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.
No code changes, just some rearranging of source files.
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just "\n\r" always. This nicely
simplifies our code.
Instead of
Please enter passphrase for disk <disk-name>!
use
Please enter passphrase for disk <disk-name>:
which is more polite and matches Plymouth convention.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
We already have a flag for creating a new mount namespace for the child.
Let's add an extension to that: a new FORK_MOUNTNFS_SLAVE flag. When
used in combination will mark all mounts in the child namespace as
MS_SLAVE so that the child can freely mount or unmount stuff but it
won't leak into the parent.
We already do this kind of validation in nspawn when we operate on a
plain directory, let's also do this on raw images under the same
condition: that we are about too boot the image. Also, do this when we
are about to read OS metadata from it.
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.
In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.
Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.
Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.
Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
When we try to read meta-data from an image, don't bother with mounting
/home or the ESP, as that's not where the metadata is. This not only
speeds things up a bit, but also has the benefit that setups where an
unencrypted root is mixed with an encrypted /home (which I have on one
of my own systems) won't result in errors that the crypto key is needed.
This extends on #8609, and makes two changes:
1. We'll now explicitly check that the child devices of a block device
we are interested in (i.e. the partitions) are block devices themselves.
On newer kernels the mmc rpmb stuff is actually exposed as char rather
than block device as before, and they probably should have been that in
the first place. By adding this check we'll hence filter out these weird
devices through a second rule too, that hopefully makes things a bit
more future-proof, should more devices like this be added eventually,
or other subsystems do a similar thing.
2. When counting partitions we'll now also check the devnum of the
device being non-null, which we already do when matching up the devices
in the second iteration. This should make things more robust, and
prevent other kinds of miscounting, which after all was the main
issue #8609 fixed.
Filter-out RPMB partitions and boot partitions from MMC devices when
counting partitions enumerated by the kernel. Also factor out the now
duplicated code into a separate function.
This complement the previous fixes to the problem reported in
https://github.com/systemd/systemd/issues/5806