With this change almost all log messages that are suppressed through
--quiet are not actually suppressed anymore, but simply downgraded to
LOG_DEBUG. Previously we did it this way for some log messages and fully
suppressed them for others. With this it's pretty much systematic.
Inspired by #10122.
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.
Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.
Closes#8658
Start with route set to NULL should there be no route created. Remove
the explicit route_free as the _cleanup_ will take care of that after
the continue;.
This is an implementation that covers making errors encountered when writing
file content optionally fatal. If this is something that folks would want I'll
add handling of this for all the other directives. I'd appreciate suggestions
on how this might better be structured as well (use of a goto fail or such) as
I'm not super happy with the approach.
Let's change utf16_to_utf8() prototype to refer to utf16 chars with char16_t rather than void
Let's not cast away a "const" needlessly.
Let's add a few comments.
Let's fix the calculations of the buffer size to allocate, and how long
to run the loop in case of uneven byte numbers
Let's fix an indentation issue.
Let's avoid yoda comparisons.
Let's drop unnecessary ().
Let's make sure we convert 16bit values to 32bit before shifting them by
10bit to the left, to avoid overflows.
Let's avoid comparisons between signed literals and unsigned variables,
in particular if the literals are outside of the minimum range C
requires for "int".
Let's avoid a few casts in the function. Also, let's drop the "const"
when returning the string, for similar reasons as strchr() and friends
drop it: so that we don't add a const if the user passes in a non-const
string.
Yes, there are still a lot of users of bzip2, but it's fallen out of
favour after LZMA/xz, which can compress a lot more and often
decompresses faster than bzip2 too.
We use strtoul() which returns an "unsigned long", but then assign this
to int or unsigned in, i.e. drop 32bit silently on 64bit systems. Let's
clean this up a bit, and retain the right types.
This stuff is likely to fail in many setups (for example when quota is
not supported by the btrfs version), hence only log at debug
level. Previously we'd silently ignore things altogether which makes
things pretty hard to debug.
This changes the output a bit, as the previous multi-line output of each
inhibitor is changed to a single line, but it does unify the output look
with the one of our other tools. Moreover this adds proper sorting.
When we parse an "u" from an sd_bus_message then we need to do that into
a uint32_t, not a pid_t or uid_t, even if this is likely the same.
Also, let's count objects we keep in memory as size_t as usual.
In seccomp code, the code is changed to propagate errors which are about
anything other than unknown/unimplemented syscalls. I *think* such errors
should not happen in normal usage, but so far we would summarilly ignore all
errors, so that part is uncertain. If it turns out that other errors occur and
should be ignored, this should be added later.
In nspawn, we would count the number of added filters, but didn't use this for
anything. Drop that part.
The comments suggested that seccomp_add_syscall_filter_item() returned negative
if the syscall is unknown, but this wasn't true: it returns 0.
The error at this point can only be if the syscall was known but couldn't be
added. If the error comes from our internal whitelist in nspawn, treat this as
error, because it means that our internal table is wrong. If the error comes
from user arguments, warn and ignore. (If some syscall is not known at current
architecture, it is still silently ignored.)
Our logs are full of:
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain
...
This is pointless and makes debug logs hard to read. Let's keep the logs
in test code, but disable it in nspawn and pid1. This is done through a function
parameter because those functions operate recursively and it's not possible to
make the caller to log meaningfully.
There should be no functional change, except the skipped debug logs.
"because blacklisted" is not grammatical, it should be something like
"becuase it is blacklisted", which in turn in too verbose. Let's use
a shorter form.
This effectively reverts 2bc54be485
and relevant changes in #9920, as it is used to determine the version
of udev, e.g., dracut.
Fixesdracutdevs/dracut#468.
This reverts commit d4e9e574ea.
(systemd.conf.m4 part was already reverted in 5b5d82615011b9827466b7cd5756da35627a1608.)
Together those reverts should "fix" #10025 and #10011. ("fix" is in quotes
because this doesn't really fix the underlying issue, which is combining
DynamicUser= with strict container sandbox, but it avoids the problem by not
using that feature in our default installation.)
Dynamic users don't work well if the service requires matching configuration in
other places, for example dbus policy. This is true for those three services.
In effect, distros create the user statically [1, 2]. Dynamic users make more
sense for "add-on" services where not creating the user, or more precisely,
creating the user lazily, can save resources. For "basic" services, if we are
going to create the user on package installation anyway, setting DynamicUser=
just creates unneeded confusion. The only case where it is actually used is
when somebody forgets to do system configuration. But it's better to have the
service fail cleanly in this case too. If we want to turn on some side-effect
of DynamicUser=yes for those services, we should just do that directly through
fine-grained options. By not using DynamicUser= we also avoid the need to
restart dbus.
[1] bd9bf30727
[2] 48ac1cebde/f/systemd.spec (_473)
(Fedora does not create systemd-timesync user.)
In order to shut down networkd properly, the delegated routes added
need to be removed properly, and as error reporting is wanted, the
network link is needed in the debug output.
Solve this by calling manager_dhcp6_prefix_remove_all(), which will
remove each prefix stored in the Manager structure, and while doing
that reference each link so that it isn't freed before the route
removal callback is called. This in turn causes the network link to
be referenced once more, and an explicit hashmap_remove() must be
called to remove the network link from the m->links hashmap.
Also, since the registered callback is not called when the DHCPv6
client is stopped with sd_dhcp6_client_stop(), an explicit call
to dhcp6_lease_pd_prefix_lost() needs to be made to clean up any
unreachable routes set up for the delegated prefixes.
Instead of setting many small unreachable routes for each of the /64
subnets that were not distributed between the links requesting delegated
prefixes, set one unreachable route for the size of the delegated
prefix. Each subnet asssigned to a downstream link will add a routable
subnet for that link, and as the subnet assigned to the downstream link
has a longer prefix than the whole delegated prefix, the downstream
link subnet routes are preferred over the unroutable delegated one.
The unreachable route is not added when the delegated prefix is exactly
a /64 as the prefix size cannot be used to sort out the order of routing
into a bigger blocking subnet with the smaller /64 punching routable
"holes" into it.
When stopping the DHCPv6 client, the unroutable delegated prefix is
removed before the downstream link prefixes are unassigned.
Since we now have the possibility to request prefixes to be delegated
without corresponding IPv6 addresses, it does not make sense to store
lease T1 and T2 timeouts in the otherwise unused IA_NA structure.
Therefore lease timeouts T1 and T2 are moved to the DHCPv6 client
structure, as there will be only one set of stateful timeouts required
by RFC 7550, Section 4.3.
Select T1 and T2 timeouts based on whether addresses or prefixes were
requested and what the server offered. The address and prefix timeouts
values have been computed earlier when the relevant DHCPv6 options were
parsed.
Add function to fetch the IAID for the delegated IA_PD prefix. In
order to keep things simple in the implemntation, the same IAID
is used with IA_NA addresses and IA_PD prefixes. But the DHCPv6
server can choose to return only IA_PD prefixes, and the client
can nowadays omit requesting of IA_NA addresses. Since the function
fetching said IAID from the lease looks only for IA_NA ones, it
will return an empty IAID, which of course does not match the one
set for prefixes.
Fix this by adding a function returning the IAID for the prefix.
RFC 7084, WPD-4, requires Customer Edge end routers to behave
according to the following:
"WPD-4: By default, the IPv6 CE router MUST initiate DHCPv6 prefix
delegation when either the M or O flags are set to 1 in a
received Router Advertisement (RA) message. Behavior of the
CE router to use DHCPv6 prefix delegation when the CE router
has not received any RA or received an RA with the M and the
O bits set to zero is out of scope for this document."
Since it cannot be automatically detected whether DHCPv6 is to be
operated as an CE end router or whether to initiate an Informational
exchange to obtain other useful network information via DHCPv6 when the
Router Advertisement 'O' bit is set, a 'ForceDHCPv6PDOtherInformation'
boolean network configuration option in the '[DHCP]' section of a is
introduced. Setting this option causes DHCPv6 to be started in stateful
mode, although only the 'O' bit is seen in the Router Advertisement.
When 'ForceDHCPv6PDOtherInformation' is set and the Router Advertisement
has only the Other information 'O' bit set, disable requests for IA_NA
addresses.
Fixes#9745.
Add function to enable/disable IA_NA address requests. Internally
handle the request as a bit mask and add IA_PD prefix delegation
to the same bit mask instead of having a separate boolean. Thus
the calling code can set requests for prefix and address delegation
separately. This is handy when supporting RFC 7084.
Add a check in the code that at least something is requested from
the server in Managed mode. By default request IA_NA addresses from
the DHCPv6 server. Although a value has been defined for IA_TA,
temporay IA_TA addresses are not yet requested.
Add helper function for fetching enabled/disabled state of Prefix
Delegation for a DHCPv6 client. Update function setting prefix
delegation to use an int instead of a boolean.
Both 'systemd-hwdb update' and 'udevadm hwdb --update' creates hwdb
database. The database created by systemd-hwdb containes additional
information such that priority, line number, and source filename.
The unified function 'hwdb_update()' can take a flag 'compat' which
controls the format version of created database.
Several functions are shared by qsort and hash_ops or Prioq.
This makes these functions explicitly specify argument type,
and cast to __compar_fn_t where necessary.
In older kernels IPoIB network devices expose the port number via
the sysfs attribute 'dev_id', which is not intended to be used this way.
Let's support both options for a while.
An InfiniBand network address is 20 bytes long. Only the least
significant 8 bytes can be interpreted as a persistent hardware unit
identifier; the other 12 are transiently derived at runtime from metadata
specific to the protocol stack.
However, since the network interface name length is hard-capped by
IFNAMSIZ at 16 chars and the 2-byte type prefix with '\0' at the end
leave us only at 13, we cannot squeeze a descriptive representation of a
HW address into an interface name. Thus, it makes the most sense to drop
the scheme for IPoIB interfaces entirely.
Currently udev just gets confused and does what it has been taught
to do: fetches the first six bytes and puts them into a permanent
device attribute.
We've long neglected IP-over-InfiniBand network interfaces, let's treat
them the same way we treat anyone else.
IPoIB interfaces will retain the 'ib' prefix; otherwise the naming scheme
is the same one we use for other network interfaces. E.g. a IPoIB network
device provided by a PCI card at bus 21 slot 0 function 6 will be named
'ibp21s0f6'.
Quoting https://github.com/systemd/systemd/issues/10074:
> detect_vm_uml() reads /proc/cpuinfo with read_full_file()
> read_full_file() has a file max limit size of READ_FULL_BYTES_MAX=(4U*1024U*1024U)
> Unfortunately, the size of my /proc/cpuinfo is bigger, approximately:
> echo $(( 4* $(cat /proc/cpuinfo | wc -c)))
> 9918072
> This causes read_full_file() to fail and the Condition test fallout.
Let's just read line by line until we find an intersting line. This also
helps if not running under UML, because we avoid reading as much data.
optind may be used in each verb, e.g., udevadm. So, let's initialize
optind before calling verbs.
Without this, e.g., udevadm -d hwdb --update causes error in parsing arguments.
Since the commit torvalds/linux@fdb5c4531c
the syscall BPF_PROG_ATTACH return EBADF when CONFIG_CGROUP_BPF is
turned off and as result the bpf_firewall_supported() returns the
incorrect value.
This commmit replaces the syscall BPF_PROG_ATTACH with BPF_PROG_DETACH
which is still work as expected.
Resolvesopenbmc/linux#159
See also systemd/systemd#7054
Signed-off-by: Alexander Filippov <a.filippov@yadro.com>
Without this, log shows meaningless error message 'No anode', e.g.,
===
Failed to unshare the mount namespace: Operation not permitted
foo.service: Failed to set up mount namespacing: No anode
foo.service: Failed at step NAMESPACE spawning /usr/bin/test: No anode
===
Follow-up for 1beab8b0d0.
When loading units, sometimes we'd first encounter a unit from .wants or
.requires directory. A typical case would be when multi-user.target.wants/
contains a symlink to some unit. We would prepare to load this unit using
/etc/systemd/system/multi-user.target.wants/foo.service as the fragment
path. This is always wrong. Instead, let's use NULL as the path and let
manager_load_unit() figure out the path on its own.
Fixes#9921.
path=0x5625ed9b01a0 "/usr/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service", e=0x0,
_ret=0x7ffe64645000) at ../src/core/manager.c:1887
name=0x5625ed9b01ce "systemd-remount-fs.service",
path=0x5625ed9b01a0 "/usr/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service", e=0x0,
_ret=0x7ffe64645000) at ../src/core/manager.c:1961
name=0x5625ed9b01ce "systemd-remount-fs.service",
path=0x5625ed9b01a0 "/usr/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service",
add_reference=true, mask=UNIT_DEPENDENCY_FILE) at ../src/core/unit.c:2946
dir_suffix=0x5625ebb179ed ".wants") at ../src/core/load-dropin.c:95
path=0x0, e=0x0, _ret=0x7ffe646452c0) at ../src/core/manager.c:1965
name=0x5625ebb186f8 "local-fs.target", path=0x0, add_reference=true,
mask=UNIT_DEPENDENCY_MOUNTINFO_IMPLICIT) at ../src/core/unit.c:2946
where=0x5625ed9b3cc0 "/tmp", options=0x5625ed947110 "rw,nosuid,nodev,seclabel",
fstype=0x5625ed95be90 "tmpfs", flags=0x7ffe64645395) at ../src/core/mount.c:1439
where=0x5625ed9b3cc0 "/tmp", options=0x5625ed947110 "rw,nosuid,nodev,seclabel",
fstype=0x5625ed95be90 "tmpfs", set_flags=false) at ../src/core/mount.c:1567
at ../src/core/mount.c:1635
ret_retval=0x7ffe64645660, ret_shutdown_verb=0x7ffe646456c0, ret_fds=0x7ffe646456d8,
ret_switch_root_dir=0x7ffe646456b0, ret_switch_root_init=0x7ffe646456b8,
ret_error_message=0x7ffe646456c8) at ../src/core/main.c:1669
Both SO_SNDBUFFORCE and SO_RCVBUFFORCE requires capability 'net_admin'.
If this capability is not granted to the service the first attempt to increase
the recv/snd buffers (via sd_notify()) with SO_RCVBUFFORCE/SO_SNDBUFFORCE will
fail, even if the requested size is lower than the limit enforced by the
kernel.
If apparmor is used, the DENIED logs for net_admin will show up. These log
entries are seen as red warning light, because they could indicate that a
program has been hacked and tries to compromise the system.
It would be nicer if they can be avoided without giving services (relying on
sd_notify) net_admin capability or dropping DENIED logs for all such services
via their apparmor profile.
I'm not sure if sd_notify really needs to forcibly increase the buffer sizes,
but at least if the requested size is below the kernel limit, the capability
(hence the log entries) should be avoided.
Hence let's first ask politely for increasing the buffers and only if it fails
then ignore the kernel limit if we have sufficient privileges.
When $XDG_DATA_DIRS is unset, then, the default value
'/usr/local/share:/usr/share' is used.
When $XDG_DATA_DIRS contain the default paths but the order
is inverted: '/usr/share:/usr/local/share', then test-path-lookup fails.
Fixes#10002.
Back in 08318a2c5a, value "false" was enabled for
'-Dtests=', but various tests were not conditionalized properly. So even with
-Dtests=false -Dslow-tests=false we'd run 120 tests. Let's make this consistent.
This makes it so that tests no longer need to know the absolute paths to the
source and build dirs, instead using the systemd-runtest.env file to get these
paths when running from the build tree.
Confirmed that test-catalog works on `ninja test`, when called standalone and
also when the environment file is not present, in which case it will use the
installed location under /usr/lib/systemd/catalog.
The location can now also be overridden for this test by setting the
$SYSTEMD_CATALOG_DIR environment variable.
This simplifies get_testdata_dir() to simply checking for an environment
variable, with an additional function to locate a systemd-runtest.env file in
the same directory as the test binary and reading environment variable
assignments from that file if it exists.
This makes it possible to:
- Run `ninja test` from the build dir and have it use ${srcdir}/test for
test unit definitions.
- Run a test directly, such as `build/test-execute` and have it locate
them correctly.
- Run installed tests (from systemd-tests package) and locate the test
units in the installed location (/usr/lib/systemd/tests/testdata), in
which case the absence of the systemd-runtest.env file will have
get_testdata_dir() use the installed location hardcoded into the
binaries.
Explicit setting of $SYSTEMD_TEST_DATA still overrides the contents of
systemd-runtest.env.
Actually check the return code from logind_schedule_shutdown() and proceed to
immediate shutdown if that fails. Negative return codes can be returned if
systemctl is compiled without logind support, or if logind otherwise failed
(either too old, disabled/masked, or it is incomplete
systemd-shim/systemd-service implementation).
An assertion in dhcp_network_bind_raw_socket() is triggered when
starting an sd_dhcp_client without setting a MAC address first.
- sd_dhcp_client_start()
- client_start()
- client_start_delayed()
- dhcp_network_bind_raw_socket()
In that case, the arp-type and MAC address is still unset. Note that
dhcp_network_bind_raw_socket() already checks for a valid arp-type
and MAC address below, so we should just gracefully return -EINVAL.
Maybe sd_dhcp_client_start() should fail earlier when starting without
MAC address. But the failure here will be correctly propagated and
the start aborted.
Fixes: 76253e73f9
When a network namespace is needed, /sys is mounted as tmpfs (see commit
d8fc6a000f for details).
But in this case mode 755 was used as initial permissions for /sys whereas the
default mode for sysfs is 555.
In practice using 755 doesn't have any impact because /sys is mounted read-only
too but for consistency, let's use the correct mode.
Fixes: #10050
If more than one errno is specified for a syscall in SystemCallFilter=,
use the last one instead of reporting an error. This is especially
useful when used with system call sets:
SystemCallFilter=@privileged:EPERM @reboot
This will block any system call requiring super-user capabilities with
EPERM, except for attempts to reboot the system, which will immediately
terminate the process. (@reboot is included in @privileged.)
This also effectively fixes#9939, since specifying different errnos for
“the same syscall” (same pseudo syscall number) is no longer an error.
Dracut has a support for unlocking encrypted drives with keyfile stored
on the external drive. This support is included in the generated initrd
only if systemd module is not included.
When systemd is used in initrd then attachment of encrypted drives is
handled by systemd-cryptsetup tools. Our generator has support for
keyfile, however, it didn't support keyfile on the external block
device (keydev).
This commit introduces basic keydev support. Keydev can be specified per
luks.uuid on the kernel command line. Keydev is automatically mounted
during boot and we look for keyfile in the keydev
mountpoint (i.e. keyfile path is prefixed with the keydev mount point
path). After crypt device is attached we automatically unmount
where keyfile resides.
Example:
rd.luks.key=70bc876b-f627-4038-9049-3080d79d2165=/key:LABEL=KEYDEV
According to RFC2616[1], HTTP header names are case-insensitive. So
it's totally valid to have a header starting with either `Date:` or
`date:`.
However, when systemd-importd pulls an image from an HTTP server, it
parses HTTP headers by comparing header names as-is, without any
conversion. That causes failures when some HTTP servers return headers
with different combinations of upper-/lower-cases.
An example:
https://alpha.release.flatcar-linux.net/amd64-usr/current/flatcar_developer_container.bin.bz2 returns `Etag: "pe89so9oir60"`,
while https://alpha.release.core-os.net/amd64-usr/current/coreos_developer_container.bin.bz2
returns `ETag: "f03372edea9a1e7232e282c346099857"`.
Since systemd-importd expects to see `ETag`, the etag for the Container Linux image
is correctly interpreted as a part of the hidden file name.
However, it cannot parse etag for Flatcar Linux, so the etag the Flatcar Linux image
is not appended to the hidden file name.
```
$ sudo ls -al /var/lib/machines/
-r--r--r-- 1 root root 3303014400 Aug 21 20:07 '.raw-https:\x2f\x2falpha\x2erelease\x2ecore-os\x2enet\x2famd64-usr\x2fcurrent\x2fcoreos_developer_container\x2ebin\x2ebz2.\x22f03372edea9a1e7232e282c346099857\x22.raw'
-r--r--r-- 1 root root 3303014400 Aug 17 06:15 '.raw-https:\x2f\x2falpha\x2erelease\x2eflatcar-linux\x2enet\x2famd64-usr\x2fcurrent\x2fflatcar_developer_container\x2ebin\x2ebz2.raw'
```
As a result, when the Flatcar image is removed and downloaded again,
systemd-importd is not able to determine if the file has been already
downloaded, so it always download it again. Then it fails to rename it
to an expected name, because there's already a hidden file.
To fix this issue, let's introduce a new helper function
`memory_startswith_no_case()`, which compares memory regions in a
case-insensitive way. Use this function in `curl_header_strdup()`.
See also https://github.com/kinvolk/kube-spawn/issues/304
[1]: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
On Dell machines LoadOptions is filled with:
01 00 00 00 <name of BIOS Boot Loader Entry> ... <unknown bytes>
So, in case of meaningfull LoadOptions, better check if the first char
is a printable character.
Fix#9993. When this code was split out to user-runtime-dir, it forgot to
include the call to mac_selinux_init(). So mkdir_label() stopped working.
Fixes: a9f0f5e501 ("logind: split %t directory creation to a helper
unit")
The MountEntry's added for EMPTY_DIR work very similarly to the TMPFS ones.
In both cases, .has_prefix is false. In fact, .has_prefix is false in
*all* the MountEntry's we add except for the access mounts (READONLY etc).
But EMPTY_DIR stuck out by explicitly setting .has_prefix = false.
Let's remove that.
... when no mount options are passed.
Change the code, to avoid the following failure in the newly added tests:
exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c
'[ "$(stat -c %a /var)" == 755 ]'
++ stat -c %a /var
+ '[' 1777 == 755 ']'
Received SIGCHLD from PID 30364 (sh).
Child 30364 (sh) died (code=exited, status=1/FAILURE)
(And I spotted an opportunity to use TAKE_PTR() at the end).
We can't remount the underlying superblocks, if we are inside a user
namespace and running Linux <= 4.17. We can only change the per-mount
flags (MS_REMOUNT | MS_BIND).
This type of mount() call can only change the per-mount flags, so we
don't have to worry about passing the right string options now.
Fixes#9914 ("Since 1beab8b was merged, systemd has been failing to start
systemd-resolved inside unprivileged containers" ... "Failed to re-mount
'/run/systemd/unit-root/dev' read-only: Operation not permitted").
> It's basically my fault :-). I pointed out we could remount read-only
> without MS_BIND when reviewing the PR that added TemporaryFilesystem=,
> and poettering suggested to change PrivateDevices= at the same time.
> I think it's safe to change back, and I don't expect anyone will notice
> a difference in behaviour.
>
> It just surprised me to realize that
> `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the
> superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does.
> Maybe a comment could note the kernel version (v4.18), that lets you
> remount without MS_BIND inside a user namespace.
This makes the code longer and I guess this function is still ugly, sorry.
One obstacle to cleaning it up is the interaction between
`PrivateDevices=yes` and `ReadOnlyPaths=/dev`. I've added a test for the
existing behaviour, which I think is now the correct behaviour.
Only report OOM if that was actually the error of the operation,
explicitly report the possible error that a syscall was already blocked
with a different errno and translate that into a more sensible errno
(EEXIST only makes sense in connection to the hashmap), and pass through
all other potential errors unmodified. Part of #9939.
`systemd-resolved.service` runs as `User=systemd-resolved`, and uses certain
Capabilit{y,ies} magic. By my understanding, this means it is started with a
number of "privileges". Indeed, `capabilities(7)` explains
> Linux divides the privileges traditionally
> associated with superuser into distinct units, known as capabilities,
> which can be independently enabled and disabled."
This situation appears to contradict our current code comment which said
> If we are not running as root we assume all privileges are already dropped.
This appears to be a confusion in the comment only. The rest of the code
tells a much clearer story. (Don't ask me if the story is correct.
`capabilities(7)` scares me). Let's tweak the comment to make it consistent
and avoid worrying readers about this.
If logind is not supported, don't try to schedule a shutdown,
immediately poweroff. This is the behavior indicated by the current
message given to the user, but the command is returning an error. I
believe this was broken on this commit:
7f96539d45
sfeatures is a "struct ethtool_sfeatures". Use sizeof() on the correct
data type.
Since "struct ethtool_gstrings" is larger than "struct ethtool_sfeatures",
this had no serious consequences.
Fixes: 50725d10e3
When computing the next network prefix to assign, compute the next
prefix to allocate based on the intended /64 assignment, not the
given prefix length for the whole prefix, e.g. /48, given to
systemd-networkd.
Fixes#9626.
The option is now called simply "Encapsulation=".
Also, "ignoring" is rather misleading, because we use to to mean that some line
is being ignored. Here the whole tunnel is dropped.
Use "falling back" instead of "fallback".
Also, it's not an application-specific machine ID, but rather an
application-and-machine-specific ID. Let's call it "app-machine-speicific" for
short.
This work add support to generic netlink to sd-netlink.
See https://lwn.net/Articles/208755/
networkd: add support FooOverUDP support to IPIP tunnel netdev
https://lwn.net/Articles/614348/
Example conf:
/lib/systemd/network/1-fou-tunnel.netdev
```
[NetDev]
Name=fou-tun
Kind=fou
[FooOverUDP]
Port=5555
Protocol=4
```
/lib/systemd/network/ipip-tunnel.netdev
```
[NetDev]
Name=ipip-tun
Kind=ipip
[Tunnel]
Independent=true
Local=10.65.208.212
Remote=10.65.208.211
FooOverUDP=true
FOUDestinationPort=5555
```
$ ip -d link show ipip-tun
```
5: ipip-tun@NONE: <POINTOPOINT,NOARP> mtu 1472 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 10.65.208.212 peer 10.65.208.211 promiscuity 0
ipip remote 10.65.208.211 local 10.65.208.212 ttl inherit pmtudisc encap fou encap-sport auto encap-dport 5555 noencap-csum noencap-csum6 noencap-remcsum numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```
Pretty much all intel cpus have had RDRAND in a long time. While
CPU-internal RNG are widely not trusted, for seeding hash tables it's
perfectly OK to use: we don't high quality entropy in that case, hence
let's use it.
This is only hooked up with 'high_quality_required' is false. If we
require high quality entropy the kernel is the only source we should
use.
This makes two changes to the namespacing code:
1. We'll only gracefully skip service namespacing on access failure if
exclusively sandboxing options where selected, and not mount-related
options that result in a very different view of the world. For example,
ignoring RootDirectory=, RootImage= or Bind= is really probablematic,
but ReadOnlyPaths= is just a weaker sandbox.
2. The namespacing code will now return a clearly recognizable error
code when it cannot enforce its namespacing, so that we cannot
confuse EPERM errors from mount() with those from unshare(). Only the
errors from the first unshare() are now taken as hint to gracefully
disable namespacing.
Fixes: #9844#9835
The fstab entry may contain comment/application-specific options, like
for example x-systemd.automount or x-initrd.mount.
With the recent switch to libmount, the mount options during remount are
now gathered via mnt_fs_get_options(), which returns the merged fstab
options with the effective options in mountinfo.
Unfortunately if one of these application-specific options are set in
fstab, the remount will fail with -EINVAL.
In systemd 238:
Remounting '/test-x-initrd-mount' read-only in with options
'errors=continue,user_xattr,acl'.
In systemd 239:
Remounting '/test-x-initrd-mount' read-only in with options
'errors=continue,user_xattr,acl,x-initrd.mount'.
Failed to remount '/test-x-initrd-mount' read-only: Invalid argument
So instead of using mnt_fs_get_options(), we're now using both
mnt_fs_get_fs_options() and mnt_fs_get_vfs_options() and merging the
results together so we don't get any non-relevant options from fstab.
Signed-off-by: aszlig <aszlig@nix.build>
A follow-up for commit 9d874aec45.
This patch makes "path" parameter mandatory in fd_set_*() helpers removing the
need to use fd_get_path() when NULL was passed. The caller is supposed to pass
the fd anyway so assuming that it also knows the path should be safe.
Actually, the only case where this was useful (or used) was when we were
walking through directory trees (in item_do()). But even in those cases the
paths could be constructed trivially, which is still better than relying on
fd_get_path() (which is an ugly API).
A very succinct test case is also added for 'z/Z' operators so the code dealing
with recursive operators is tested minimally.
Let's fold get_user_creds_clean() into get_user_creds(), and introduce a
flags argument for it to select "clean" behaviour. This flags parameter
also learns to other new flags:
- USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for
root/nobody are only synthesized as fallback. Normally, the synthesized
records take precedence over what is in the user database. With this
flag set this is reversed, and the user database takes precedence, and
the synthesized records are only used if they are missing there. This
flag should be set in cases where doing NSS is deemed safe, and where
there's interest in knowing the correct shell, for example if the
admin changed root's shell to zsh or suchlike.
- USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by
numeric value, and there's no user/group record for it accept it
anyway. This allows us to fix#9767
This then also ports all users to set the most appropriate flags.
Fixes: #9767
[zj: remove one isempty() call]
On the host these symlinks are created by udev, and we consider them API
and make use of them ourselves at various places. Hence when running a
private /dev, also create these symlinks so that lookups by major/minor
work in such an environment, too.
This is some extra protection for sloppy "golden master" systems, where
images are duplicated many times but the random seed is not
deleted (or reset for each copy). That golden master systems have to
reset /etc/machine-id is better known, and easier to notice (as having
the same ID will result in address conflicts and suchlike quite often).
Hence let's write the machine ID into /dev/urandom, in case it has been
initialized and unlikely the stored random seed has been provisioned
differently on each image.
Note that we don't credit the entropy either way, hence in the case
there's a cycle of a) generating the machine-id early at boot and b)
writing it back into /dev/urandom late at boot it shouldn't matter. It's
never going to make things worse, just in a few cases better.
When stdin/stdout/stderr is initialized from an fd, let's read the tty
name of it if we can, and pass that to PAM.
This makes sure that "machinectl shell" sessions have proper TTY fields
initialized that "loginctl" then shows.
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.
I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
This fixes a memory leak:
```
d5070e2f67ededca022f81f2941900606b16f3196b2268e856295f59._openpgpkey.gmail.com: resolve call failed: 'd5070e2f67ededca022f81f2941900606b16f3196b2268e856295f59._openpgpkey.gmail.com' not found
=================================================================
==224==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x7f71b0878850 in malloc (/usr/lib64/libasan.so.4+0xde850)
#1 0x7f71afaf69b0 in malloc_multiply ../src/basic/alloc-util.h:63
#2 0x7f71afaf6c95 in hexmem ../src/basic/hexdecoct.c:62
#3 0x7f71afbb574b in string_hashsum ../src/basic/gcrypt-util.c:45
#4 0x56201333e0b9 in string_hashsum_sha256 ../src/basic/gcrypt-util.h:30
#5 0x562013347b63 in resolve_openpgp ../src/resolve/resolvectl.c:908
#6 0x562013348b9f in verb_openpgp ../src/resolve/resolvectl.c:944
#7 0x7f71afbae0b0 in dispatch_verb ../src/basic/verbs.c:119
#8 0x56201335790b in native_main ../src/resolve/resolvectl.c:2947
#9 0x56201335880d in main ../src/resolve/resolvectl.c:3087
#10 0x7f71ad8fcf29 in __libc_start_main (/lib64/libc.so.6+0x20f29)
SUMMARY: AddressSanitizer: 65 byte(s) leaked in 1 allocation(s).
```
The references to the dns_server are now setup after the tls connection is setup.
This ensures that the stream got fully stopped when the initial tls setup failed
instead of having the unref being blocked by the reference to the stream by the server.
Therefore on_stream_io would no longer be called with a half setup encrypted connection.
Fixes the issue reported in #9838.
Previously, we'd act immediately on StopWhenUnneeded= when a unit state
changes. With this rework we'll maintain a queue instead: whenever
there's the chance that StopWhenUneeded= might have an effect we enqueue
the unit, and process it later when we have nothing better to do.
This should make the implementation a bit more reliable, as the unit notify event
cannot immediately enqueue tons of side-effect jobs that might
contradict each other, but we do so only in a strictly ordered fashion,
from the main event loop.
This slightly changes the check when to consider a unit "unneeded".
Previously, we'd assume that a unit in "deactivating" state could also
be cleaned up. With this new logic we'll only consider units unneeded
that are fully up and have no job queued. This means that whenever
there's something pending for a unit we won't clean it up.
The function replaces a couple commas, a semicolon and the final newline with
zero bytes in the string passed to it. The 'const' seems to have been added
by accident during a bulk edit (more specifically 3b3154df7e).