Commit graph

111 commits

Author SHA1 Message Date
Luca Boccassi ac1f3ad05f verity: re-use already open devices if the hashes match
Opening a verity device is an expensive operation. The kernelspace operations
are mostly sequential with a global lock held regardless of which device
is being opened. In userspace jumps in and out of multiple libraries are
required. When signatures are used, there's the additional cryptographic
checks.

We know when two devices are identical: they have the same root hash.
If libcrypsetup returns EEXIST, double check that the hashes are really
the same, and that either both or none have a signature, and if everything
matches simply remount the already open device. The kernel will do
reference counting for us.

In order to quickly and reliably discover if a device is already open,
change the node naming scheme from '/dev/mapper/major:minor-verity' to
'/dev/mapper/$roothash-verity'.

Unfortunately libdevmapper is not 100% reliable, so in some case it
will say that the device already exists and it is active, but in
reality it is not usable. Fallback to an individually-activated
unique device name in those cases for robustness.
2020-07-21 23:42:03 +01:00
Luca Boccassi 536879480a dm-util: use CRYPT_DEACTIVATE_DEFERRED instead of ioctl 2020-07-21 23:26:41 +01:00
Luca Boccassi c2923fdcd7 dissect/nspawn: add support for dm-verity root hash signature
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it.
2020-06-25 08:45:21 +01:00
Luca Boccassi 0389f4fa81 core: add RootHash and RootVerity service parameters
Allow to explicitly pass root hash (explicitly or as a file) and verity
device/file as unit options. Take precedence over implicit checks.
2020-06-23 10:50:09 +02:00
Luca Boccassi e7cbe5cb9e dissect: support single-filesystem verity images with external verity hash
dm-verity support in dissect-image at the moment is restricted to GPT
volumes.
If the image a single-filesystem type without a partition table (eg: squashfs)
and a roothash/verity file are passed, set the verity flag and mark as
read-only.
2020-06-09 12:19:21 +01:00
Luca Boccassi b1806441bb dissect-image: wait for udev for single filesystem images too
Single filesystem images are mounted from the /dev/block/X:Y symlink
rather than /dev/loopZ, so we need to wait for udev to create it or
mounting will be racy and occasionally fail.
2020-06-08 13:06:53 +01:00
Lennart Poettering 58dfbfbdd6 dissect: use log_debug_errno() where appropriate 2020-05-18 18:41:56 +02:00
Topi Miettinen 1887032f71 shared/dissect-image: log messages from cryptsetup
Before:
```
write(2, "Device /dev/loop1p1 is too small.\n", 34) = -1 ENOTCONN (Transport
endpoint is not connected)
```

After:
```
$ journalctl -b -e | grep 'too small'
Apr 02 16:53:30 loora systemd[343579]: Device /dev/loop1p1 is too small.
```
2020-04-03 17:44:20 +02:00
Vito Caputo 4fa744a35c *: convert amenable fdopen calls to take_fdopen
Mechanical change to eliminate some cruft by using the
new take_fdopen{_unlocked}() wrappers where trivial.
2020-03-31 06:48:03 -07:00
Topi Miettinen 0108c42f59 dissect-image: avoid scanning partitions
In case the dissected image has a filesystem, don't scan for partitions. This
avoids problems with services using a `RootImage=` in early boot when udevd is
not yet started.
2020-03-10 10:03:57 +01:00
Lennart Poettering cf32c48657 dissect: optionally, run fsck before mounting dissected images
Some file systems want us to run fsck before mounting, hence do so,
optionally.
2020-01-29 19:29:44 +01:00
Lennart Poettering 0f7c9a3d81 dissect: complain if partition flags are set that we don't know 2020-01-29 19:29:39 +01:00
Lennart Poettering d4dffb8533 dissect: introduce new recognizable partition types for /var and /var/tmp
This has been requested many times before. Let's add it finally.

GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.

To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).

Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.

To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
2019-12-23 14:43:59 +01:00
Lennart Poettering 10c1b18888 valgrind: temporarily handle that valgrind still doesn't know LOOP_GET_STATUS64
Should be removed once valgrind learns it.
2019-12-02 10:06:56 +01:00
Yu Watanabe f5947a5e92 tree-wide: drop missing.h 2019-10-31 17:57:03 +09:00
Zbigniew Jędrzejewski-Szmek a5648b8094 basic/fs-util: change CHASE_OPEN flag into a separate output parameter
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.

(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
2019-10-24 22:44:24 +09:00
Lennart Poettering 66855de739 tree-wide: make use of errno_or_else() everywhere 2019-07-11 23:20:31 +02:00
Lennart Poettering a709a3154d dissect: split out DM deferred remove into src/shared/dm-util.c
The function is generally useful, let's split it out so that we can make
use of it later on in systemd-homed.
2019-07-05 02:19:24 +09:00
Yu Watanabe 657ee2d82b tree-wide: replace strjoin() with path_join() 2019-06-21 03:26:16 +09:00
Yu Watanabe 1b47436e0e util: make device_wait_for_initialization() optionally takes timeout value 2019-06-04 01:19:43 +09:00
Ben Boeckel 5238e95759 codespell: fix spelling errors 2019-04-29 16:47:18 +02:00
Yu Watanabe 01234e1fe7 tree-wide: drop several missing_*.h and import relevant headers from kernel-5.0 2019-04-11 19:00:37 +02:00
Lennart Poettering d8b4d14df4 util: split out nulstr related stuff to nulstr-util.[ch] 2019-03-14 13:25:52 +01:00
Lennart Poettering d9223c07f5 dissect: when mounting an image mount the XBOOTLDR partition to /boot
Previously, we'd mount the ESP to /efi if that existed and was empty,
falling back to /boot if that existed and was empty.

With this change, the XBOOTLDR partition is mounted to /boot
unconditionally. And the EFI is mounted to /efi if that exists (but it
doesn't have to be empty — after all the name is very indicative of what
this is supposed to be), and to /boot as a fallback but only if it
exists and is empty (we insist on emptiness for that, since it might be
used differently than what we assume).

The net effect is that $BOOT should be reliably found under /boot, and
the ESP is either /efi or /boot.

(Note that this commit only is relevant for nspawn and suchlike, i.e.
the codepaths that mount an image without involving udev during boot.)
2019-03-01 12:41:32 +01:00
Lennart Poettering a8c47660bb dissect: automatically detect boot loader spec $BOOT partition
The boot loader spec supports two places to store boot loader
configuration: the ESP and a generic replacement for it in case the ESP
is not available or not suitable. Let's look for both.
2019-03-01 12:41:32 +01:00
Lennart Poettering 59ba6d0c17 dissect: use SYNTHETIC_ERRNO() where appropriate 2019-03-01 12:41:32 +01:00
Zbigniew Jędrzejewski-Szmek cd8c98d7a7 shared/dissect-image: make sure that we don't truncate device name
gcc-9 complains that the string may be truncated when written into the output
structure. This shouldn't happen, but if it did, in principle we could remove a
different structure (with a matching name prefix). Let's just refuse the
operation if the name doesn't fit.
2019-01-27 09:35:36 +01:00
Lennart Poettering f70e7f70c9 dissect: add some assert()s 2018-12-19 23:27:47 +01:00
Lennart Poettering 052eaf5c93 gpt-auto-generator: don't wait for udev
Generators run in a context where waiting for udev is not an option,
simply because it's not running there yet. Hence, let's not wait for it
in this case.

This is generally OK to do as we are operating on the root disk only
here, which should have been probed already by the time we come this
far.

An alternative fix might be to remove the udev dependency from image
dissection again in the long run (and thus replace reliance on
/dev/block/x:y somehow with something else).

Fixes: #11205
2018-12-19 23:27:47 +01:00
Zbigniew Jędrzejewski-Szmek a8040b6d0a dissect-image: wait for the main device and all partitions to be known by udev
Fixes #10526.

Even if we waited for the root device to appear, the mount could still fail if
we didn't wait for udev to initalize the device. In particular, the
/dev/block/n:m path used to mount the device is created by udev, and nspawn
would sometimes win the race and the mount would fail with -ENOENT.

The same wait is done for partitions, since if we try to mount them, the same
considerations apply.

Note: I first implemented a version which just does a loop (with a short wait).
In that approach, udev takes on average ~800 µs to initialize the loopback
device. The approach where we set up a monitor and avoid the loop is a bit
nicer. There doesn't seem to be a significant difference in speed.
With 1000 invocations of 'systemd-nspawn -i image.squashfs echo':

loop (previous approach):
real	4m52.625s
user	0m37.094s
sys	2m14.705s

monitor (this patch):
real	4m50.791s
user	0m36.619s
sys	2m14.039s
2018-12-17 13:50:57 +01:00
Zbigniew Jędrzejewski-Szmek b887c8b8a8 dissect-image: wait for the root to appear
dissect-image would wait for the root device and paritions to appear. But if we
had an image with no partitions, we'd not wait at all. If the kernel or udev
were slow in creating device nodes or symlinks, subsequent mount attempt might
fail if nspawn won the race.

Calling wait_for_partitions_to_appear() in case of no partitions means that we
verify that the kernel agrees that there are no partitions. We verify that the
kernel sees the same number of partitions as blkid, so let's that also in this
case.

This makes the failure in #10526 much less likely, but doesn't eliminate it
completely. Stay tuned.
2018-12-17 13:50:57 +01:00
Zbigniew Jędrzejewski-Szmek ea887be00b dissect-image: split out a chunk of dissect_image() out
No functional change, just moving code around.
2018-12-17 13:50:57 +01:00
Chris Down e92aaed30e tree-wide: Remove O_CLOEXEC from fdopen
fdopen doesn't accept "e", it's ignored. Let's not mislead people into
believing that it actually sets O_CLOEXEC.

From `man 3 fdopen`:

> e (since glibc 2.7):
> Open the file with the O_CLOEXEC flag. See open(2) for more information. This flag is ignored for fdopen()

As mentioned by @jlebon in #11131.
2018-12-12 20:47:40 +01:00
Lennart Poettering 686d13b9f2 util-lib: split out env file parsing code into env-file.c
It's quite complex, let's split this out.

No code changes, just some file rearranging.
2018-12-02 13:22:29 +01:00
Lennart Poettering e4de72876e util-lib: split out all temporary file related calls into tmpfiles-util.c
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.

No code changes, just some rearranging of source files.
2018-12-02 13:22:29 +01:00
Zbigniew Jędrzejewski-Szmek b2ac2b01c8
Merge pull request #10996 from poettering/oci-prep
Preparation for the nspawn-OCI work
2018-11-30 10:09:00 +01:00
Zbigniew Jędrzejewski-Szmek 049af8ad0c Split out part of mount-util.c into mountpoint-util.c
The idea is that anything which is related to actually manipulating mounts is
in mount-util.c, but functions for mountpoint introspection are moved to the
new file. Anything which requires libmount must be in mount-util.c.

This was supposed to be a preparation for further changes, with no functional
difference, but it results in a significant change in linkage:

$ ldd build/libnss_*.so.2
(before)
build/libnss_myhostname.so.2:
	linux-vdso.so.1 (0x00007fff77bf5000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f4bbb7b2000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007f4bbb755000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4bbb734000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f4bbb56e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4bbb8c1000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f4bbb51b000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f4bbb512000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f4bbb4e3000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f4bbb45e000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f4bbb458000)
build/libnss_mymachines.so.2:
	linux-vdso.so.1 (0x00007ffc19cc0000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fdecb74b000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fdecb744000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007fdecb6e7000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdecb6c6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fdecb500000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fdecb8a9000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fdecb4ad000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fdecb4a2000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fdecb475000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fdecb3f0000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fdecb3ea000)
build/libnss_resolve.so.2:
	linux-vdso.so.1 (0x00007ffe8ef8e000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fcf314bd000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fcf314b6000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007fcf31459000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcf31438000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fcf31272000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fcf31615000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fcf3121f000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fcf31214000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fcf311e7000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fcf31162000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fcf3115c000)
build/libnss_systemd.so.2:
	linux-vdso.so.1 (0x00007ffda6d17000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f610b83c000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f610b835000)
	libmount.so.1 => /lib64/libmount.so.1 (0x00007f610b7d8000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f610b7b7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f610b5f1000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f610b995000)
	libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f610b59e000)
	libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f610b593000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f610b566000)
	libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f610b4e1000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f610b4db000)

(after)
build/libnss_myhostname.so.2:
	linux-vdso.so.1 (0x00007fff0b5e2000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fde0c328000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde0c307000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fde0c141000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fde0c435000)
build/libnss_mymachines.so.2:
	linux-vdso.so.1 (0x00007ffdc30a7000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f06ecabb000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f06ecab4000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06eca93000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f06ec8cd000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f06ecc15000)
build/libnss_resolve.so.2:
	linux-vdso.so.1 (0x00007ffe95747000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fa56a80f000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007fa56a808000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa56a7e7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fa56a621000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa56a964000)
build/libnss_systemd.so.2:
	linux-vdso.so.1 (0x00007ffe67b51000)
	librt.so.1 => /lib64/librt.so.1 (0x00007ffb32113000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007ffb3210c000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffb320eb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007ffb31f25000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffb3226a000)

I don't quite understand what is going on here, but let's not be too picky.
2018-11-29 21:03:44 +01:00
Lennart Poettering 54b22b2643 tree-wide: port various parts of the code over to the new device_major_minor_path() calls 2018-11-29 20:21:39 +01:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Lennart Poettering aa8fbc74e3 fileio: drop "newline" parameter for env file parsers
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just "\n\r" always. This nicely
simplifies our code.
2018-11-14 17:01:54 +01:00
Zbigniew Jędrzejewski-Szmek 705727fd76 shared/dissect-image: drop parens 2018-11-13 11:58:44 +01:00
Yu Watanabe fbd0aea17e dissect: do not store unused devnum 2018-10-31 09:29:51 +09:00
David Tardon 4db1879acd dissect-image: use right comparison function
fstype can be NULL here.
2018-10-12 12:38:49 +02:00
Marko Myllynen a1c111c2d1 More polite passphrase prompt
Instead of

Please enter passphrase for disk <disk-name>!

use

Please enter passphrase for disk <disk-name>:

which is more polite and matches Plymouth convention.
2018-10-09 16:26:03 +02:00
Yu Watanabe 8437c0593f tree-wide: replace device_enumerator_scan_devices()+FOREACH_DEVICE_AND_SUBSYSTEM() by FOREACH_DEVICE() 2018-09-10 16:48:37 +09:00
Yu Watanabe 7ecc4799a9
dissect: rescan devices before creating partition list (#9930)
Fixes #9924 which is caused by 3c1f2cee0a.
2018-08-24 22:34:13 +09:00
Yu Watanabe 3c1f2cee0a dissect: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Yu Watanabe fc95c359f6 tree-wide: use returned value from log_*_errno() 2018-08-07 15:48:37 +09:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00