Let's suppress the secondary arch data, since we never ever want to
mount it if we found the primary arch.
Previously we only suppressed in the Verity case, but there's little
reason to entertain the idea of a secondary arch in non-Verity
environments either, we are not going to use them, and should not do
decryption or anything like that.
Uppercase first char of log message, and indicate correct program name.
Reindent comment table at one place.
Use correct, specific, enum type at one more place.
Just some refactoring: let's place the various verity related parameters
in a common structure, and pass that around instead of the individual
parameters.
Also, let's load the PKCS#7 signature data when finding metadata
right-away, instead of delaying this until we need it. In all cases we
call this there's not much time difference between the metdata finding
and the loading, hence this simplifies things and makes sure root hash
data and its signature is now always acquired together.
Let's make libcryptsetup a dlopen() style dep for PID 1 (i.e. for
RootImage= and stuff), systemd-growfs and systemd-repart. (But leave to
be a regulra dep in systemd-cryptsetup, systemd-veritysetup and
systemd-homed since for them the libcryptsetup support is not auxiliary
but pretty much at the core of what they do.)
This should be useful for container images that want systemd in the
payload but don't care for the cryptsetup logic since dm-crypt and stuff
isn't available in containers anyway.
Fixes: #8249
"crypt-util.c" is such a generic name, let's avoid that, in particular
as libc's/libcrypt's crypt() function is so generically named too that
one might thing this is about that. Let's hence be more precise, and
make clear that this is about cryptsetup, and nothing else.
We already had cryptsetup-util.[ch] in src/cryptsetup/ doing keyfile
management. To avoid the needless confusion, let's rename that file to
cryptsetup-keyfile.[ch].
N_DEVICE_NODE_LIST_ATTEMPTS is unconditionally used since version 246 and
ac1f3ad05f
However, this variable is only defined if HAVE_BLKID is set resulting in
the following build failure if cryptsetup is enabled but not libblkid:
../src/shared/dissect-image.c:1336:34: error: 'N_DEVICE_NODE_LIST_ATTEMPTS' undeclared (first use in this function)
1336 | for (unsigned i = 0; i < N_DEVICE_NODE_LIST_ATTEMPTS; i++) {
|
Fixes:
- http://autobuild.buildroot.org/results/67782c225c08387c1bbcbea9eee3ca12bc6577cd
This matches how we handle things everywhere else, i.e. in .mount units,
and similar: when a mount point dir is missing, we create it, let's do
so too when dealing with disk images.
This makes things a lot simpler, more robust, and systematic.
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
If we don't succeed on the first try it's because another process is
opening the same device. Do a microsleep for 2ms to increase the
chances it has completed the next time around the loop.
The symlink /dev/mapper/dm_name is created by udev after a mapper
device is set up. So libdevmapper/libcrypsetup might tell us that
a verity device exists, but the symlink we use as the source for
the mount operation might not be there yet.
Instead of falling back to a new unique device set up, wait for
the udev event matching on the expected devlink for at least 100ms
(after which the benefits of sharing a device in terms of setup
time start to disappear - on my production machines, opening a new
verity device seems to take between 150ms and 300ms)
This effectively makes little difference because we exit soon later
anyway, which will close the fds, too. However, it's still useful since
it means the parent will get EOF events on them in the order we process
things and isn't delayed to process the data from the pipes until the
child dies.
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:
RootImageOptions=1:ro,dev 2:nosuid nodev
In absence of a partition number, 0 is assumed.
Opening a verity device is an expensive operation. The kernelspace operations
are mostly sequential with a global lock held regardless of which device
is being opened. In userspace jumps in and out of multiple libraries are
required. When signatures are used, there's the additional cryptographic
checks.
We know when two devices are identical: they have the same root hash.
If libcrypsetup returns EEXIST, double check that the hashes are really
the same, and that either both or none have a signature, and if everything
matches simply remount the already open device. The kernel will do
reference counting for us.
In order to quickly and reliably discover if a device is already open,
change the node naming scheme from '/dev/mapper/major:minor-verity' to
'/dev/mapper/$roothash-verity'.
Unfortunately libdevmapper is not 100% reliable, so in some case it
will say that the device already exists and it is active, but in
reality it is not usable. Fallback to an individually-activated
unique device name in those cases for robustness.
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it.
dm-verity support in dissect-image at the moment is restricted to GPT
volumes.
If the image a single-filesystem type without a partition table (eg: squashfs)
and a roothash/verity file are passed, set the verity flag and mark as
read-only.
Single filesystem images are mounted from the /dev/block/X:Y symlink
rather than /dev/loopZ, so we need to wait for udev to create it or
mounting will be racy and occasionally fail.
In case the dissected image has a filesystem, don't scan for partitions. This
avoids problems with services using a `RootImage=` in early boot when udevd is
not yet started.
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)