- Parse the tags list using strv_split_newlines() which remove any
unnecessary empty string at the end of the strv.
- Use this parsed list for manager_process_barrier_fd() and every call
to manager_invoke_notify_message().
- This also allow to simplify the manager_process_barrier_fd() function.
Let's introduce an explicit line ending marker for line endings due to
pid change.
Let's also make sure we don't get confused with buffer management.
Fixes: #15654
If the previous received buffer length is almost equal to the allocated
buffer size, before this change the next read can only receive a couple
of bytes (in the worst case only 1 byte), which is not efficient.
When set, the offset specified for the vtable entry is passed to the
handler as-is, and is not added to the userdata pointer. This is useful
in case methods/properties are mixed on the same vtable, that expect to
operate relative to some object in memory and that expect pointers to
absolute memory, or that just want a number passed.
Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var
to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and
/sys/fs/cgroup since no (or very few) regular files are expected to be used.
In addition, since directories, symbolic links, device specials and xattrs are
not counted towards the size= limit, number of inodes is also limited
correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of
RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes.
Because nr_inodes option can't use ratios like size option, there's an
unfortunate side effect that with small memory systems the limit may be on the
too large side. Also, on an extremely small device with only 256MB of RAM, 10%
of RAM for /run may not be enough for re-exec of PID1 because 16MB of free
space is required.
These arguments contain UserRecord structures serialized to JSON,
however only the "secret" part of it, not a whole user record. We do
this since the secret part is conceptually part of the user record and
in some contexts we need a user record in full with both secret and
non-secret part, and in others just the secret and in other just the
non-secret part, but we want to keep this in memory in the same logic.
Hence, let's rename the arguments where we expect a user record
consisting only of the secret part to "secret".
This also makes sure the control buffer is properly aligned. This
matters, as otherwise the control buffer might not be aligned and the
cmsg buffer counting might be off. The incorrect alignment is becoming
visible by using recvmsg_safe() as we suddenly notice the MSG_CTRUNC bit
set because of this.
That said, apparently this isn't enough to make this work on all
kernels. Since I couldn't figure this out, we now add 1K to the buffer
to be sure. We do this once already, also for a pktinfo structure
(though an IPv4/IPv6) one. I am puzzled by this, but this shouldn't
matter much. it works locally just fine, except for those ubuntu CI
kernels...
While we are at it, make some other changes too, to simplify and
modernize the function.
Previously pam_systemd_home.so was relying on `PAM_PROMPT_ECHO_OFF` to
display error messages to the user and also display the next prompt.
`PAM_PROMPT_ECHO_OFF` was never meant as a way to convey information to
the user, and following the example set in pam_unix.so you can see that
it's meant to _only_ display the prompt. Details about why the
authentication failed should be done in a `PAM_ERROR_MSG` before
displaying a short prompt as per usual using `PAM_PROMPT_ECHO_OFF`.
[127/1355] Compiling C object 'src/shared/5afaae1@@systemd-shared-245@sta/ethtool-util.c.o'
../src/shared/ethtool-util.c: In function ‘ethtool_get_permanent_macaddr’:
../src/shared/ethtool-util.c:260:60: warning: array subscript 5 is outside the bounds of an interior zero-length array ‘__u8[0]’ {aka ‘unsigned char[]’} [-Wzero-length-bounds]
260 | ret->ether_addr_octet[i] = epaddr.addr.data[i];
| ~~~~~~~~~~~~~~~~^~~
In file included from ../src/shared/ethtool-util.c:5:
../src/shared/linux/ethtool.h:704:7: note: while referencing ‘data’
704 | __u8 data[0];
| ^~~~
../src/shared/ethtool-util.c: In function ‘ethtool_set_features’:
../src/shared/ethtool-util.c:488:31: warning: array subscript 0 is outside the bounds of an interior zero-length array ‘__u32[0]’ {aka ‘unsigned int[]’} [-Wzero-length-bounds]
488 | len = buffer.info.data[0];
| ~~~~~~~~~~~~~~~~^~~
In file included from ../src/shared/ethtool-util.c:5:
../src/shared/linux/ethtool.h:631:8: note: while referencing ‘data’
631 | __u32 data[0];
| ^~~~
The kernel should not define the length of the array, but it does. We can't fix
that, so let's use a cast to avoid the warning.
For https://github.com/systemd/systemd/issues/6119#issuecomment-626073743.
v2:
- use #pragma instead of a cast. It seems the cast only works in some cases, and
gcc is "smart" enough to see beyond the cast. Unfortunately clang does not support
this warning, so we need to do a config check whether to try to suppress.
Indicates that the tags list cannot be modified by notify_message function.
Since the tags list is created only once for multiple call to
notify_message functions.
kernel 5.6 added support for a new flag for getrandom(): GRND_INSECURE.
If we set it we can get some random data out of the kernel random pool,
even if it is not yet initializated. This is great for us to initialize
hash table seeds and such, where it is OK if they are crap initially. We
used RDRAND for these cases so far, but RDRAND is only available on
newer CPUs and some archs. Let's now use GRND_INSECURE for these cases
as well, which means we won't needlessly delay boot anymore even on
archs/CPUs that do not have RDRAND.
Of course we never set this flag when generating crypto keys or uuids.
Which makes it different from RDRAND for us (and is the reason I think
we should keep explicit RDRAND support in): RDRAND we don't trust enough
for crypto keys. But we do trust it enough for UUIDs.
Let's make the logic a bit smarter: if we detect that /home is
encrypted, let's avoid double encryption and prefer plain
directory/subvolumes instead of our regular luks images.
Also, allow configuration go storage/file system via an env var passed
to homework. In a later commit, let's then change homed to initialize
that env var from a config file setting, when invoking homework.
Make use of the new user_record_build_image_path() helper the previous
commit added to share some code.
Also, let's make sure we update all parsed-out fields with the new data
from the binding, so that the parsed-out fields are definitely
up-to-date.
We should return 0 only if current freezer state, as reported by the
kernel, is already the desired state. Otherwise, we would dispatch
return dbus message prematurely in bus_unit_method_freezer_generic().
Thanks to Frantisek Sumsal for reporting the issue.
As described in #15603, it is a fairly common setup to use a fqdn as the
configured hostname. But it is often convenient to use just the actual
hostname, i.e. until the first dot. This adds support in tmpfiles, sysusers,
and unit files for %l which expands to that.
Fixes#15603.
Ideally, assert_cc() would be used for this, so that it is not possible to even
compile systemd with something like '-Dfallback-hostname=.foo'. But to do a
proper check we need to call hostname_is_valid(), and we cannot depend on being
able to run code (e.g. during cross-compilation). So let's do a very superficial
check in meson, and a proper on in test-util.
This new helper checks whether the specified locale is installed. It's
distinct from locale_is_valid() which just superficially checks if a
string looks like something that could be a valid locale.
Heavily inspired by @jsynacek's #13964.
Replaces: #13964
There are two libc APIs for accessing the user database: NSS/getpwuid(),
and fgetpwent(). if we run in --root= mode (i.e. "offline" mode), let's
use the latter. Otherwise the former. This means tmpfiles can use the
database included in the root environment for chowning, which is a lot
more appropriate.
Fixes: #14806
We make this entirely independent of the regular discard field, i.e. the
one that controls discard behaviour when the home directory is online.
Not all combinations make a ridiculous amount of sense, but most do.
Specifically:
online-discard = yes, offline-discard = yes
→ Discard when activating explicitly, and during runtime using
the "discard" mount option, and discard explicitly when logging
out again.
online-discard = no, offline-discard = yes
→ The new default: when logging in allocate the full backing
store, and use no discard while active. When loging out discard
everything. This provides nice behaviour: we take minimal storage
when offline but provide allocation guarantees while online.
online-discard = no, offline-discard = no
→ Never, ever discard, always operate with fully allocated
backing store. The extra safe mode.
Let's make debugging a bit easier: when invoking homed from the build
tree it's now possible to make sure homed invokes the build tree's
homework binary by setting an env var.
We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
We need to use the CMSG_SPACE() macro to size the control buffers, not
CMSG_LEN(). The former is rounded up to next alignment boundary, the
latter is not. The former should be used for allocations, the latter for
encoding how much of it is actually initialized. See cmsg(3) man page
for details about this.
Given how confusing this is, I guess we don't have to be too ashamed
here, in most cases we actually did get this right.
Apparently unpriv clients expect to be able to auth via PAM. Kinda
sucks. But it is what it is. Hence open this up.
This shouldn't be too bad in effect since clients after all need to
provide security creds for unlocking the home dir, in order to misuse
this.
Fixes: #15072