It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().
socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
In a user namespace container:
Feb 28 12:45:53 0b2420135953 systemd[1]: Starting Home Manager...
Feb 28 12:45:53 0b2420135953 systemd[21]: systemd-homed.service: Failed to set up network namespacing: Operation not permitted
Feb 28 12:45:53 0b2420135953 systemd[21]: systemd-homed.service: Failed at step NETWORK spawning /usr/lib/systemd/systemd-homed: Operation not permitted
Feb 28 12:45:53 0b2420135953 systemd[1]: systemd-homed.service: Main process exited, code=exited, status=225/NETWORK
Feb 28 12:45:53 0b2420135953 systemd[1]: systemd-homed.service: Failed with result 'exit-code'.
Feb 28 12:45:53 0b2420135953 systemd[1]: Failed to start Home Manager.
We should treat this similarly to the case where network namespace are not
supported at all.
https://bugzilla.redhat.com/show_bug.cgi?id=1807465
The man pages state that the '+' prefix in Exec* directives should
ignore filesystem namespacing options such as PrivateTmp. Now it does.
This is very similar to #8842, just with PrivateTmp instead of
PrivateDevices.
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
Let per-user service managers have user namespaces too.
For unprivileged users, user namespaces are set up much earlier
(before the mount, network, and UTS namespaces vs after) in
order to obtain capbilities in the new user namespace and enable use of
the other listed namespaces. However for privileged users (root), the
set up for the user namspace is still done at the end to avoid any
restrictions with combining namespaces inside a user namespace (see
inline comments).
Closes#10576
The function capability_ambient_set_apply() now drops capabilities not
in the capability_ambient_set(), so it is necessary to call it when
the ambient set is empty.
Fixes#13163
Previously we'd only skip ProtectHostname= if kernel support for
namespaces was lacking. With this change we also accept if unshare()
fails because it is blocked.
In some containers unshare() is made unavailable entirely. Let's deal
with this that more gracefully and disable our sandboxing of services
then, so that we work in a container, under the assumption the container
manager is then responsible for sandboxing if we can't do it ourselves.
Previously, we'd insist on sandboxing as soon as any form of BindPath=
is used. With this change we only insist on it if we have a setting like
that where source and destination differ, i.e. there's a mapping
established that actually rearranges things, and thus would result in
systematically different behaviour if skipped (as opposed to mappings
that just make stuff read-only/writable that otherwise arent').
(Let's also update a test that intended to test for this behaviour with
a more specific configuration that still triggers the behaviour with
this change in place)
Fixes: #13955
(For testing purposes unshare() can easily be blocked with
systemd-nspawn --system-call-filter=~unshare.)
"ratelimit" is a real word, so we don't need to use the other form anywhere.
We had both forms in various places, let's standarize on the shorter and more
correct one.
This way less stuff needs to be in basic. Initially, I wanted to move all the
parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because
efivars.[ch] is in shared/. Later on, I decide to split efivars.[ch], so the
move done in this patch is not necessary anymore. Nevertheless, it is still
valid on its own. If at some point we want to expose libbasic, it is better to
to not have stuff that belong in libshared there.
let's add [static] where it was missing so far
Drop [static] on parameters that can be NULL.
Add an assert() around parameters that have [static] and can't be NULL
hence.
Add some "const" where it was forgotten.