Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.
Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
Apparently, if IO is still in flight at the moment we invoke LOOP_CLR_FD
it is likely simply dropped (probably because yanking physical storage,
such as a USB stick would drop it too). Let's protect ourselves against
that and always sync explicitly before we invoke it.
The explicit limit is dropped, which means that we return to the kernel default
of 50% of RAM. See 362a55fc14 for a discussion why that is not as much as it
seems. It turns out various applications need more space in /dev/shm and we
would break them by imposing a low limit.
While at it, rename the define and use a single macro for various tmpfs mounts.
We don't really care what the purpose of the given tmpfs is, so it seems
reasonable to use a single macro.
This effectively reverts part of 7d85383edb. Fixes#16617.
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:
RootImageOptions=1:ro,dev 2:nosuid nodev
In absence of a partition number, 0 is assumed.
This should be enough to fix https://bugzilla.redhat.com/show_bug.cgi?id=1856514.
But the limit should be significantly higher than 10% anyway. By setting a
limit on /tmp at 10% we'll break many reasonable use cases, even though the
machine would deal fine with a much larger fraction devoted to /tmp.
(In the first version of this patch I made it 25% with the comment that
"Even 25% might be too low.". The kernel default is 50%, and we have been using
that seemingly without trouble since https://fedoraproject.org/wiki/Features/tmp-on-tmpfs.
So let's just make it 50% again.)
See 7d85383edb.
(Another consideration is that we learned from from the whole initiative with
zram in Fedora that a reasonable size for zram is 0.5-1.5 of RAM, and that pretty
much all systems benefit from having zram or zswap enabled. Thus it is reasonable
to assume that it'll become widely used. Taking the usual compression effectiveness
of 0.2 into account, machines have effective memory available of between
1.0 - 0.2*0.5 + 0.5 = 1.4 (for zram sized to 0.5 of RAM) and
1.0 - 0.2*1.5 + 1.5 = 2.2 (for zram 1.5 sized to 1.5 of RAM) times RAM size.
This means that the 10% was really like 7-4% of effective memory.)
A comment is added to mount-util.h to clarify that tmp.mount is separate.
Opening a verity device is an expensive operation. The kernelspace operations
are mostly sequential with a global lock held regardless of which device
is being opened. In userspace jumps in and out of multiple libraries are
required. When signatures are used, there's the additional cryptographic
checks.
We know when two devices are identical: they have the same root hash.
If libcrypsetup returns EEXIST, double check that the hashes are really
the same, and that either both or none have a signature, and if everything
matches simply remount the already open device. The kernel will do
reference counting for us.
In order to quickly and reliably discover if a device is already open,
change the node naming scheme from '/dev/mapper/major:minor-verity' to
'/dev/mapper/$roothash-verity'.
Unfortunately libdevmapper is not 100% reliable, so in some case it
will say that the device already exists and it is active, but in
reality it is not usable. Fallback to an individually-activated
unique device name in those cases for robustness.
This changes the code to allow looking at multiple files with different
prefixes, but uses "/etc" and "/usr/lib". rpm-ostree uses
/usr/lib/{passwd,group} with nss-altfiles. I see no harm in simply trying both
paths on all systems.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1857530.
A minor memory leak is fixed: hashmap_put() returns -EEXIST is the key is
present *and* and the value is different. It return 0 if the value is the
same. Thus, we would leak the user/group name if it was specified multiple
times with the same uid/gid. I opted to remove the warning message completely:
with multiple files it is reasonable to have the same name defined more than
once. But even with one file the warning is dubious: all tools that read those
files deal correctly with duplicate entries and we are not writing a linter.
The test binary has two modes: in the default argument-less mode, it
just checks that "root" can be resolved. When invoked manually, a root
prefix and user/group names can be specified.
let's separate things out a bit, to make it easier to discern log output
and catalog data.
catalog data is now colored green (which is a color we don't use for log
data currently), and prefixed with a block shade.
sssd people don't like enumeration and for some other cases it's not
nice to support either, in particular when synthesizing records for
container/userns UID/GID ranges.
Hence, let's make enumeration optional.
Currently systemd-user-runtime-dir does not create the files in
/run/user/$UID/systemd/inaccessible with the default SELinux label.
The user and role part of these labels should be based on the user
related to $UID and not based on the process context of
systemd-user-runtime-dir.
Since v246-rc1 (9664be199a) /run/user/$UID/systemd is also created by
systemd-user-runtime-dir and should also be created with the default
SELinux context.
The call would always fail with:
systemd-userwork[780]: Failed to dlopen(libnss_systemd.so.2), ignoring: /usr/lib64libnss_systemd.so.2: cannot open shared object file: No such file or directory
Add support for creating a MACVLAN interface in "source" mode by
specifying Mode=source in the [MACVLAN] section of a .netdev file.
A list of allowed MAC addresses for the corresponding MACVLAN can also
be specified with the SourceMACAddress= option of the [MACVLAN] section.
An example .netdev file:
[NetDev]
Name=macvlan0
Kind=macvlan
MACAddress=02:DE:AD:BE:EF:00
[MACVLAN]
Mode=source
SourceMACAddress=02:AB:AB:AB:AB:01 02:CD:CD:CD:CD:01
SourceMACAddress=02:EF:EF:EF:EF:01
The same keys can also be specified in [MACVTAP] for MACVTAP kinds of
interfaces, with the same semantics.
This partially undoes the parent commit. We follow the symlink and
if it appears to be a symlink to /dev/null, even if /dev/null is not
present, we treat it as such. The addition of creation of /dev/null
in the test is reverted.