This usually happens for device units with long
path in /sys. But users can't even create such drop-ins,
so lets just ignore the error here.
Fixes#6867
In this way, individual errors in files can be treated differently than a
failure of the whole service.
A test is added to check that the expected value is returned.
Some parts are commented out, because it is not. This will be fixed in
a subsequent commit.
Apparently, the kernel returns EINVAL on NFS4 sometimes, even if we do
everything right, let's fallback in that case and find a different
approach to determine if something's a mount point.
See discussion at:
https://github.com/systemd/systemd/issues/7082#issuecomment-348001289
Let's not use local process data for remote processes, that can only
show nonsense.
Maybe one day we should add a bus API to query the comm field of a
process remotely, but for now, let's not bother, the information is
redundant anyway, as the cgroup data shows it too (and the cgroup tree
is show as part of status as well, and is requested from remote through
dbus, without local kernel calls).
Fixes: #7516
Now the function returns an empty string when given an empty string.
Not sure if this is the best option (maybe this should be an error?),
but at least the behaviour is well defined.
We let the caller make the decision. Existing callers are OK with treating an
ambiguous result the same as no content, but makefs and growfs should refuse such
partitions.
I opted to completely generate a unit for both mount points and swaps. For
swaps, it would be possible to use fixed template unit like systemd-mkswap@.service,
because there's no information passed except the device name. For mount points,
that's not possible because both the device name and file system type need to
be passed. Nevertheless, I expect that options will need to passed to both mkfs
and mkswap, in which case it'll be necessary to create units of both types
anyway.
Also do not include libcryptsetup.h directly, but only through crypt-util.h.
This way we do not have to repeat the define in every file where it is used.
The kernel will reply with -ENOTDIR when we try to access a non-directory under
a name which ends with a slash. But our functions would strip the trailing slash
under various circumstances. Keep the trailing slash, so that
path_is_mount_point("/path/to/file/") return -ENOTDIR when /path/to/file/ is a file.
Tests are added for this change in behaviour.
Also, when called with a trailing slash, path_is_mount_point() would get
"" from basename(), and call name_to_handle_at(3, "", ...), and always
return -ENOENT. Now it'll return -ENOTDIR if the mount point is a file, and
true if it is a directory and a mount point.
v2:
- use strip_trailing_chars()
v3:
- instead of stripping trailing chars(), do the opposite — preserve them.
RequiredForOnline= denotes a link/network that does/does not require being up
for systemd-networkd-wait-online to consider the system online; this makes it
possible to ignore devices without modifying parameters to wait-online.
When using SELinux with legacy cgroups the tmpfs on /sys/fs/cgroup is by
default labelled as tmpfs_t. This label is also inherited by the "cpu"
and "cpuacct" symbolic links. Unfortunately the policy expects them to
be labelled as cgroup_t, which is used for all the actual cgroup
filesystems. Failure to do so results in a stream of denials.
This state cannot be fixed reliably when the cgroup filesystem structure
is set-up as the SELinux policy is not yet loaded at this
moment. It also cannot be fixed later as the root of the cgroup
filesystem is remounted read-only. In order to fix it the root of the
cgroup filesystem needs to be temporary remounted read-write, relabelled
and remounted back read-only.
This makes sure we migrate /var/lib/<foo> if it exists to
/var/lib/private/<foo> if DynamicUser=1 is set. This is useful to allow
turning on DynamicUser= on services that previously didn't use it, and
we can deal with this, and migrate the relevant directories as
necessary.
Note that "downgrading" from DynamicUser=1 backto DynamicUser=0 works
too. However in that case we simply continue to use
/var/lib/private/<foo>, which works because /var/lib/<foo> is a symlink
there after all.
Let's always leave logging to the call that actually added the fields to
the bus message. This way we don't get duplicate logging whenver
bus_append_unit_property_assignment() ends up being called, which does
all its logging on its own (and probably should do, as it can output
much more precise errors).
IN_SET only works for constant values, hence clarify that. Moreover, we
declared a statement "s" we never made use of. Drop it.
Also, for both scripts, let's support 10 items. More causes spatch to
die with "Stack overflow" for me.
This changes the unit search path logic to never drop the transient and
control directories from the unit search path. This is necessary as we
add new entries to both during runtime, due to the "systemctl
set-property" and transient unit logic.
Previously, the "transient" directory was created during early boot to
deal with this, but the "control" directories were not covered like
that. Creating the control directories early at boot is not possible
however, as /etc might be read-only then, and we do define a persistent
control directory. Hence, let's create these dirs on-demand when we need
them, and make sure the search path clean-up logic never drops them from
the search path even if they are initially missing.
(Also, always create these paths properly labelled)
We shouldn't rely on C's incremental assignment of values of enums for
bit fields. That'll work only between the first two flags, but for
everything following will break horrible. Hence, let's avoid any
ambiguity here, and let's clearly define the flags as shifts of 1.
This message is displayed either when the unit file itself is newer than
what is loaded, but also when any of the drop-ins is newer. Say so in
the message, in order not to confuse the user unnecessarily.
Let's make sure that when we return a D-Bus error, we return a native
one, if we generate it ourselves, and use errno-based error
synthetization only if we received an errno ourselves. Yes, this makes
things slightly longer, but is highly misleading as we propagate D-Bus
errors, and not errnos to the client.
This majorly refactors the transient unit file and drop-in writing
logic, so that we properly C-escape and specifier-escape (% → %%)
everything we write out, so that when we read it back again, specifiers
are parsed that aren't supposed to be parsed.
This renames unit_write_drop_in() and friends by unit_write_setting().
The name change is supposed to clarify that the functions are not only
used to write drop-in files, but also transient unit files.
The previous "mode" parameter to this function is replaced by a more
generic "flags", which knows additional flags for implicit C-style and
specifier escaping before writing things out. This can cover most
properties where either form of escaping is defined. For the cases where
this isn't sufficient, we add helpers unit_escape_setting() and
unit_concat_strv() for escaping individual strings or strvs properly.
While we are at it, we also prettify generation of transient unit files:
we try to reduce the number of section headers written out: previously
we'd write the right section header our for each setting. With this
change we do so only if the setting lives in a different section than
the one before.
(This should also be considered preparation for when we add proper APIs
to systemd to write normal, persistant unit files through the bus API)
Let's always escape strings we receive from the user before writing them
out to unit file settings that suppor specifier expansion, so that user
strings are transported as-is.
Using specifiers in these settings isn't particularly useful by itself,
but it unifies behaviour a bit. It's kinda surprising that What= in
mount units resolves specifies, but Where= does not. Hence let's add
that too. Also, it's surprising Where=/What= in mount units behaves
differently than in automount and swap units, hence resolve specifiers
there too. Then, Type= in mount units is nowadays an arbitrary,
sometimes non-trivial string (think fuse!), hence let's also expand
specifiers there, to match the rest of the mount settings.
This has the benefit that when writing code that generates unit files,
less care has to be taken to check whether escaping of specifiers is
necessary or not: broadly everything that takes arbitrary user strings
now does specifier expansion, while enums/numerics/booleans do not.
We process C-style escapes in Environment=, hence we should process it
in UnsetEnvironment= too, as the latter accepts assignments much like
the former, including arbitrary values specified by the user.
All other cases where we accept a reboot argument are decoded with
config_parse_unit_string_printf() rather than
config_parse_unit_path_printf(), and that's really the only thing what
makes sense here, hence adjust this here, too.
The code in install-printf.c and unit-printf.c for these is pretty much
the same and very generic. Let's move this all over to the generic
specifier.c, and share the implementations.
Specifier expansion (much like C escape handling) should be a helper for
writing unit files, but should be nothing we do on programatic APIs. For
those, the client can do the necessary replacements anyway, and we
really should be careful with doing such string processing of data we
get via lower level programmatic APIs.
We currently do specifier expansion only for the env var transient unit
APIs, no other properties do this. Let's remove it here too, to be fully
systematic.
Yes, in a way this is API breakage, but then again this API isn't
documented yet, and an outlier, hence let's clear this up now, before it
is too late.
N_IOVEC_OBJECT_FIELDS is bumped 14 → 18 (see dispatch_message_real() and
count!)
N_IOVEC_PAYLOAD_FIELDS is bumped 15 → 16 (see
server_space_usage_message() and count!)
Also, add comments, to make clear what is what.
Coverity says that's undefined. I'm pretty sure we always would get a nan, but
let's avoid (formally) undefined behaviour since that can cause compilers to do
strange things.
read_one_line_file() always returns <= 0, so the code was OK, but let's write
the check a bit differently to make it obvious that min_max is always set.
The comment above makes the intent of the code pretty clear:
"use security2_protocol == NULL as indicator".
So revert the condition in the check and fix the logic in the comment while
at it.
The question is how this could have ever worked: if BS->LocateProtocol
(which is supposedly optional) ever failed, we'd crash here. Strange.
Found by coverity.
m->encrypted is set when fstype=="crypto_LUKS", but this is not obvious when
reading decrypt_partition(). Just check if passphrase is set before using
it.
This fixes the (mostly theoretical, since we're only parsing data that we write
ourselves) memleak when iif or oif is deserialized multiple times. Unfortunately
it does not fix the memleak when rule is freed, but that'll require a bigger
effort.
Let's just say that the code wasn't fully functional ;(
Since we only had the parser for serialization, and not the writer, we are
free to change the format. So while at it, let's use shorter names in the
serialization format that match the surrounding style.
If test-resolve is running in some CI environment that doesn't have
reliably working DNS, it's a good idea not to hang forever, let's
time-out after 20s
Let's always use the same logic when parsing error numbers, i.e. use
parse_errno() here too, to unify some code, and tighten the checks a
bit.
This also allows clients to pass errors as symbolic names. Probably
nothing we want to advertise too eagerly (since new daemons generating
this on old service managers won't understand), but still pretty
useful I think, in particular in scripting languages and such, where the
numeric error numbers might not be readily available.
Currenly the only way to remove fds from the fdstore is to fully
stop the service, or to somehow trigger POLLERR/POLLHUP on the fd, in
which case systemd will remove the fd automatically.
Let's add another way: a new message that can be sent to remove fds
explicitly, given their name.
Of course, it's not really a valid sd_notify() message if multiple of
these fields are used in one, but let's handle this somewhat gracefully,
by only processing one of them, and ignoring the rest.
Let's optimize things a bit for the non-debug case. No change in
behaviour.
Main reason to do this is not so much the speed benefit though, but
merely to isolate the code from its surroundings more.
Some devices get reset itself while setting the MTU. we get in to a LOOP .
Once the MTU changed then the DHCP client talking with DHCP server never stops.
networkd gets into a loop and generates endless DHCP requests.
fixes#6593fixes#7380
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
Let's move the cgroup empty check for all unit types into the generic
unit_check_gc() call, out of the per-unit-type _check_gc() type. This
not only allows us to share some code, but also hooks up mount and
socket units with this kind of check, for free, as it was missing there
previously.
Previously, in the service unit type we ran all control processes in a
special subcgroup /control of the unit's main cgroup. Remove that, and
run the control program in the main cgroup instead.
The concept conflicts with cgroupv2's logic of "no processes in inner
nodes": if a unit has a main daemon process running in the main cgroup,
and a reload control process would be started in the /control subcgroup,
then this would necessarily fail, as the main daemon process would
become an inner node process that way.
We could in theory continue to support this in cgroupv1, but in the
interest in keeping behaviour similar in both hierarchies, let's drop
this altogether.
Philosophically maybe it wasn't the greatest idea anyway to just go
berserk and SIGKILL all those processes — loud warning logging might
have sufficed, too.
Before this patch, the bpf cgroup realization state was implicitly set
to "NO", meaning that the bpf configuration was realized but was turned
off. That means invalidation requests for the bpf stuff (which we issue
in blanket fashion when doing a daemon reload) would actually later
result in a us re-realizing the unit, under the assumption it was
already realized once, even though in reality it never was realized
before.
This had the effect that after each daemon-reload we'd end up realizing
*all* defined units, even the unloaded ones, populating cgroupfs with
lots of unneeded empty cgroups.
With this fix we properly set the realiazation state to "INVALIDATED",
i.e. indicating the bpf stuff was never set up for the unit, and hence
when we try to invalidate it later we won't do anything.
We add units to the cgroup realization queue when propagating realizing
requests to sibling units, and when invalidating cgroup settings because
some cgroup setting changed. In the time between where we add the unit
to the queue until the cgroup is actually dispatched the unit's state
might have changed however, so that the unit doesn't actually need to be
realized anymore, for example because the unit went down. To handle
that, check the unit state again, if realization makes sense.
Redundant realization is usually not a problem, except when the unit is
not actually running, hence check exactly for that.
We never use these functions seperately, hence don't bother splitting
them into to.
Also, simplify things a bit, and maintain tables for the attribute files
to chown. Let's also update those tables a bit, and include thenew
"cgroup.threads" file in it, that needs to be delegated too, according
to the documentation.
Some filesystems do not set d_type value when
readdir is called, so entry type is unknown.
Therefore check if accessing entry does not
return ELOOP error.
The DHCP code in systemd-networkd relies on the
`net.ipv4.conf.{default,all,<if>}.promote_secondaries` sysctl to be set
(the kernels default is that it is unset). If this sysctl is not set
DHCP will work most of the time, however when the IP address changes
between leases then the system will loose its IP.
Because some distributions decided to not ship these defaults (Debian
is an example and via downstream Ubuntu) networkd by default will now
enable this sysctl opton automatically.
When "-U" is used we look for a UID range we can use for our container.
We start with the UID the tree is already assigned to, and if that
didn't work we'd pick random ranges so far. With this change we'll first
try to hash a suitable range from the container name, and use that if it
works, in order to make UID assignments more likely to be stable.
This follows a similar logic PID 1 follows when using DynamicUser=1.
We should not only ignore "-t" itself, but also whatever is passed to
it.
This pretty much reverts the core of
a4420f7b8e, and adds back in the status
quo ante. What a difference a ':' can make.
This also adds a quick comment for this, so that we don't make this
mistake again.
Fixes: #7413