No functional change, just moving a bunch of things around. Before
we needed a rather complicated setup to test hostname_setup(), because
the code was in src/core/. When things are moved to src/shared/
we can just test it as any function.
The test is still "unsafe" because hostname_setup() may modify the
hostname.
This beefs up the READ_FULL_FILE_CONNECT_SOCKET logic of
read_full_file_full() a bit: when used a sender socket name may be
specified. If specified as NULL behaviour is as before: the client
socket name is picked by the kernel. But if specified as non-NULL the
client can pick a socket name to use when connecting. This is useful to
communicate a minimal amount of metainformation from client to server,
outside of the transport payload.
Specifically, these beefs up the service credential logic to pass an
abstract AF_UNIX socket name as client socket name when connecting via
READ_FULL_FILE_CONNECT_SOCKET, that includes the requesting unit name
and the eventual credential name. This allows servers implementing the
trivial credential socket logic to distinguish clients: via a simple
getpeername() it can be determined which unit is requesting a
credential, and which credential specifically.
Example: with this patch in place, in a unit file "waldo.service" a
configuration line like the following:
LoadCredential=foo:/run/quux/creds.sock
will result in a connection to the AF_UNIX socket /run/quux/creds.sock,
originating from an abstract namespace AF_UNIX socket:
@$RANDOM/unit/waldo.service/foo
(The $RANDOM is replaced by some randomized string. This is included in
the socket name order to avoid namespace squatting issues: the abstract
socket namespace is open to unprivileged users after all, and care needs
to be taken not to use guessable names)
The services listening on the /run/quux/creds.sock socket may thus
easily retrieve the name of the unit the credential is requested for
plus the credential name, via a simpler getpeername(), discarding the
random preifx and the /unit/ string.
This logic uses "/" as separator between the fields, since both unit
names and credential names appear in the file system, and thus are
designed to use "/" as outer separators. Given that it's a good safe
choice to use as separators here, too avoid any conflicts.
This is a minimal patch only: the new logic is used only for the unit
file credential logic. For other places where we use
READ_FULL_FILE_CONNECT_SOCKET it is probably a good idea to use this
scheme too, but this should be done carefully in later patches, since
the socket names become API that way, and we should determine the right
amount of info to pass over.
On systems that have a udev before
a7fdc6cbd3 uevents would sometimes be
eaten because of device node collisions that caused the ruleset to fail.
Let's add an (ugly) work-around for this, so that we can even work with
such an older udev.
Previously, we'd just wait for the first moment where the kernel exposes
the same numbre of partitions as libblkid tells us. After that point we
enumerate kernel partitions and look for matching libblkid partitions.
With this change we'll instead enumerate with libblkid only, and then
wait for each kernel partition to show up with the exact parameters we
expect them to show up. Once that happens we are happy.
Care is taken to use the udev device notification messages only as hint
to recheck what the kernel actually says. That's because we are
otherwise subject to a race: we might see udev events from an earlier
use of a loopback device. After all these devices are heavily recycled.
Under the assumption that we'll get udev events for *at least* all
partitions we care about (but possibly more) we can fix the race
entirely with one more fix coming in a later commit: if we make sure
that a loopback block device has zero kernel partitions when we take
possession of it, it doesn't matter anymore if we get spurious udev
events from a previous use. All we have to do is notice when the devices
we need all popped up.
If the first boot was aborted, /etc/machine-id might read as
"uninitialized" in some cases. Add a separate case for this
instead of printing a confusing error message.
Let's suppress the secondary arch data, since we never ever want to
mount it if we found the primary arch.
Previously we only suppressed in the Verity case, but there's little
reason to entertain the idea of a secondary arch in non-Verity
environments either, we are not going to use them, and should not do
decryption or anything like that.
Uppercase first char of log message, and indicate correct program name.
Reindent comment table at one place.
Use correct, specific, enum type at one more place.
Just some refactoring: let's place the various verity related parameters
in a common structure, and pass that around instead of the individual
parameters.
Also, let's load the PKCS#7 signature data when finding metadata
right-away, instead of delaying this until we need it. In all cases we
call this there's not much time difference between the metdata finding
and the loading, hence this simplifies things and makes sure root hash
data and its signature is now always acquired together.
Let's make libcryptsetup a dlopen() style dep for PID 1 (i.e. for
RootImage= and stuff), systemd-growfs and systemd-repart. (But leave to
be a regulra dep in systemd-cryptsetup, systemd-veritysetup and
systemd-homed since for them the libcryptsetup support is not auxiliary
but pretty much at the core of what they do.)
This should be useful for container images that want systemd in the
payload but don't care for the cryptsetup logic since dm-crypt and stuff
isn't available in containers anyway.
Fixes: #8249
"crypt-util.c" is such a generic name, let's avoid that, in particular
as libc's/libcrypt's crypt() function is so generically named too that
one might thing this is about that. Let's hence be more precise, and
make clear that this is about cryptsetup, and nothing else.
We already had cryptsetup-util.[ch] in src/cryptsetup/ doing keyfile
management. To avoid the needless confusion, let's rename that file to
cryptsetup-keyfile.[ch].
N_DEVICE_NODE_LIST_ATTEMPTS is unconditionally used since version 246 and
ac1f3ad05f
However, this variable is only defined if HAVE_BLKID is set resulting in
the following build failure if cryptsetup is enabled but not libblkid:
../src/shared/dissect-image.c:1336:34: error: 'N_DEVICE_NODE_LIST_ATTEMPTS' undeclared (first use in this function)
1336 | for (unsigned i = 0; i < N_DEVICE_NODE_LIST_ATTEMPTS; i++) {
|
Fixes:
- http://autobuild.buildroot.org/results/67782c225c08387c1bbcbea9eee3ca12bc6577cd
This matches how we handle things everywhere else, i.e. in .mount units,
and similar: when a mount point dir is missing, we create it, let's do
so too when dealing with disk images.
This makes things a lot simpler, more robust, and systematic.
Similarly to "setup" vs. "set up", "fallback" is a noun, and "fall back"
is the verb. (This is pretty clear when we construct a sentence in the
present continous: "we are falling back" not "we are fallbacking").
If we don't succeed on the first try it's because another process is
opening the same device. Do a microsleep for 2ms to increase the
chances it has completed the next time around the loop.
The symlink /dev/mapper/dm_name is created by udev after a mapper
device is set up. So libdevmapper/libcrypsetup might tell us that
a verity device exists, but the symlink we use as the source for
the mount operation might not be there yet.
Instead of falling back to a new unique device set up, wait for
the udev event matching on the expected devlink for at least 100ms
(after which the benefits of sharing a device in terms of setup
time start to disappear - on my production machines, opening a new
verity device seems to take between 150ms and 300ms)
This effectively makes little difference because we exit soon later
anyway, which will close the fds, too. However, it's still useful since
it means the parent will get EOF events on them in the order we process
things and isn't delayed to process the data from the pipes until the
child dies.
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:
RootImageOptions=1:ro,dev 2:nosuid nodev
In absence of a partition number, 0 is assumed.
Opening a verity device is an expensive operation. The kernelspace operations
are mostly sequential with a global lock held regardless of which device
is being opened. In userspace jumps in and out of multiple libraries are
required. When signatures are used, there's the additional cryptographic
checks.
We know when two devices are identical: they have the same root hash.
If libcrypsetup returns EEXIST, double check that the hashes are really
the same, and that either both or none have a signature, and if everything
matches simply remount the already open device. The kernel will do
reference counting for us.
In order to quickly and reliably discover if a device is already open,
change the node naming scheme from '/dev/mapper/major:minor-verity' to
'/dev/mapper/$roothash-verity'.
Unfortunately libdevmapper is not 100% reliable, so in some case it
will say that the device already exists and it is active, but in
reality it is not usable. Fallback to an individually-activated
unique device name in those cases for robustness.
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it.
dm-verity support in dissect-image at the moment is restricted to GPT
volumes.
If the image a single-filesystem type without a partition table (eg: squashfs)
and a roothash/verity file are passed, set the verity flag and mark as
read-only.