Add helper function for fetching enabled/disabled state of Prefix
Delegation for a DHCPv6 client. Update function setting prefix
delegation to use an int instead of a boolean.
Both 'systemd-hwdb update' and 'udevadm hwdb --update' creates hwdb
database. The database created by systemd-hwdb containes additional
information such that priority, line number, and source filename.
The unified function 'hwdb_update()' can take a flag 'compat' which
controls the format version of created database.
Several functions are shared by qsort and hash_ops or Prioq.
This makes these functions explicitly specify argument type,
and cast to __compar_fn_t where necessary.
In older kernels IPoIB network devices expose the port number via
the sysfs attribute 'dev_id', which is not intended to be used this way.
Let's support both options for a while.
An InfiniBand network address is 20 bytes long. Only the least
significant 8 bytes can be interpreted as a persistent hardware unit
identifier; the other 12 are transiently derived at runtime from metadata
specific to the protocol stack.
However, since the network interface name length is hard-capped by
IFNAMSIZ at 16 chars and the 2-byte type prefix with '\0' at the end
leave us only at 13, we cannot squeeze a descriptive representation of a
HW address into an interface name. Thus, it makes the most sense to drop
the scheme for IPoIB interfaces entirely.
Currently udev just gets confused and does what it has been taught
to do: fetches the first six bytes and puts them into a permanent
device attribute.
We've long neglected IP-over-InfiniBand network interfaces, let's treat
them the same way we treat anyone else.
IPoIB interfaces will retain the 'ib' prefix; otherwise the naming scheme
is the same one we use for other network interfaces. E.g. a IPoIB network
device provided by a PCI card at bus 21 slot 0 function 6 will be named
'ibp21s0f6'.
Quoting https://github.com/systemd/systemd/issues/10074:
> detect_vm_uml() reads /proc/cpuinfo with read_full_file()
> read_full_file() has a file max limit size of READ_FULL_BYTES_MAX=(4U*1024U*1024U)
> Unfortunately, the size of my /proc/cpuinfo is bigger, approximately:
> echo $(( 4* $(cat /proc/cpuinfo | wc -c)))
> 9918072
> This causes read_full_file() to fail and the Condition test fallout.
Let's just read line by line until we find an intersting line. This also
helps if not running under UML, because we avoid reading as much data.
optind may be used in each verb, e.g., udevadm. So, let's initialize
optind before calling verbs.
Without this, e.g., udevadm -d hwdb --update causes error in parsing arguments.
Since the commit torvalds/linux@fdb5c4531c
the syscall BPF_PROG_ATTACH return EBADF when CONFIG_CGROUP_BPF is
turned off and as result the bpf_firewall_supported() returns the
incorrect value.
This commmit replaces the syscall BPF_PROG_ATTACH with BPF_PROG_DETACH
which is still work as expected.
Resolvesopenbmc/linux#159
See also systemd/systemd#7054
Signed-off-by: Alexander Filippov <a.filippov@yadro.com>
Without this, log shows meaningless error message 'No anode', e.g.,
===
Failed to unshare the mount namespace: Operation not permitted
foo.service: Failed to set up mount namespacing: No anode
foo.service: Failed at step NAMESPACE spawning /usr/bin/test: No anode
===
Follow-up for 1beab8b0d0.
Both SO_SNDBUFFORCE and SO_RCVBUFFORCE requires capability 'net_admin'.
If this capability is not granted to the service the first attempt to increase
the recv/snd buffers (via sd_notify()) with SO_RCVBUFFORCE/SO_SNDBUFFORCE will
fail, even if the requested size is lower than the limit enforced by the
kernel.
If apparmor is used, the DENIED logs for net_admin will show up. These log
entries are seen as red warning light, because they could indicate that a
program has been hacked and tries to compromise the system.
It would be nicer if they can be avoided without giving services (relying on
sd_notify) net_admin capability or dropping DENIED logs for all such services
via their apparmor profile.
I'm not sure if sd_notify really needs to forcibly increase the buffer sizes,
but at least if the requested size is below the kernel limit, the capability
(hence the log entries) should be avoided.
Hence let's first ask politely for increasing the buffers and only if it fails
then ignore the kernel limit if we have sufficient privileges.
When $XDG_DATA_DIRS is unset, then, the default value
'/usr/local/share:/usr/share' is used.
When $XDG_DATA_DIRS contain the default paths but the order
is inverted: '/usr/share:/usr/local/share', then test-path-lookup fails.
Fixes#10002.
Back in 08318a2c5a, value "false" was enabled for
'-Dtests=', but various tests were not conditionalized properly. So even with
-Dtests=false -Dslow-tests=false we'd run 120 tests. Let's make this consistent.
This makes it so that tests no longer need to know the absolute paths to the
source and build dirs, instead using the systemd-runtest.env file to get these
paths when running from the build tree.
Confirmed that test-catalog works on `ninja test`, when called standalone and
also when the environment file is not present, in which case it will use the
installed location under /usr/lib/systemd/catalog.
The location can now also be overridden for this test by setting the
$SYSTEMD_CATALOG_DIR environment variable.
This simplifies get_testdata_dir() to simply checking for an environment
variable, with an additional function to locate a systemd-runtest.env file in
the same directory as the test binary and reading environment variable
assignments from that file if it exists.
This makes it possible to:
- Run `ninja test` from the build dir and have it use ${srcdir}/test for
test unit definitions.
- Run a test directly, such as `build/test-execute` and have it locate
them correctly.
- Run installed tests (from systemd-tests package) and locate the test
units in the installed location (/usr/lib/systemd/tests/testdata), in
which case the absence of the systemd-runtest.env file will have
get_testdata_dir() use the installed location hardcoded into the
binaries.
Explicit setting of $SYSTEMD_TEST_DATA still overrides the contents of
systemd-runtest.env.
Actually check the return code from logind_schedule_shutdown() and proceed to
immediate shutdown if that fails. Negative return codes can be returned if
systemctl is compiled without logind support, or if logind otherwise failed
(either too old, disabled/masked, or it is incomplete
systemd-shim/systemd-service implementation).
An assertion in dhcp_network_bind_raw_socket() is triggered when
starting an sd_dhcp_client without setting a MAC address first.
- sd_dhcp_client_start()
- client_start()
- client_start_delayed()
- dhcp_network_bind_raw_socket()
In that case, the arp-type and MAC address is still unset. Note that
dhcp_network_bind_raw_socket() already checks for a valid arp-type
and MAC address below, so we should just gracefully return -EINVAL.
Maybe sd_dhcp_client_start() should fail earlier when starting without
MAC address. But the failure here will be correctly propagated and
the start aborted.
Fixes: 76253e73f9
When a network namespace is needed, /sys is mounted as tmpfs (see commit
d8fc6a000f for details).
But in this case mode 755 was used as initial permissions for /sys whereas the
default mode for sysfs is 555.
In practice using 755 doesn't have any impact because /sys is mounted read-only
too but for consistency, let's use the correct mode.
Fixes: #10050
If more than one errno is specified for a syscall in SystemCallFilter=,
use the last one instead of reporting an error. This is especially
useful when used with system call sets:
SystemCallFilter=@privileged:EPERM @reboot
This will block any system call requiring super-user capabilities with
EPERM, except for attempts to reboot the system, which will immediately
terminate the process. (@reboot is included in @privileged.)
This also effectively fixes#9939, since specifying different errnos for
“the same syscall” (same pseudo syscall number) is no longer an error.
Dracut has a support for unlocking encrypted drives with keyfile stored
on the external drive. This support is included in the generated initrd
only if systemd module is not included.
When systemd is used in initrd then attachment of encrypted drives is
handled by systemd-cryptsetup tools. Our generator has support for
keyfile, however, it didn't support keyfile on the external block
device (keydev).
This commit introduces basic keydev support. Keydev can be specified per
luks.uuid on the kernel command line. Keydev is automatically mounted
during boot and we look for keyfile in the keydev
mountpoint (i.e. keyfile path is prefixed with the keydev mount point
path). After crypt device is attached we automatically unmount
where keyfile resides.
Example:
rd.luks.key=70bc876b-f627-4038-9049-3080d79d2165=/key:LABEL=KEYDEV
According to RFC2616[1], HTTP header names are case-insensitive. So
it's totally valid to have a header starting with either `Date:` or
`date:`.
However, when systemd-importd pulls an image from an HTTP server, it
parses HTTP headers by comparing header names as-is, without any
conversion. That causes failures when some HTTP servers return headers
with different combinations of upper-/lower-cases.
An example:
https://alpha.release.flatcar-linux.net/amd64-usr/current/flatcar_developer_container.bin.bz2 returns `Etag: "pe89so9oir60"`,
while https://alpha.release.core-os.net/amd64-usr/current/coreos_developer_container.bin.bz2
returns `ETag: "f03372edea9a1e7232e282c346099857"`.
Since systemd-importd expects to see `ETag`, the etag for the Container Linux image
is correctly interpreted as a part of the hidden file name.
However, it cannot parse etag for Flatcar Linux, so the etag the Flatcar Linux image
is not appended to the hidden file name.
```
$ sudo ls -al /var/lib/machines/
-r--r--r-- 1 root root 3303014400 Aug 21 20:07 '.raw-https:\x2f\x2falpha\x2erelease\x2ecore-os\x2enet\x2famd64-usr\x2fcurrent\x2fcoreos_developer_container\x2ebin\x2ebz2.\x22f03372edea9a1e7232e282c346099857\x22.raw'
-r--r--r-- 1 root root 3303014400 Aug 17 06:15 '.raw-https:\x2f\x2falpha\x2erelease\x2eflatcar-linux\x2enet\x2famd64-usr\x2fcurrent\x2fflatcar_developer_container\x2ebin\x2ebz2.raw'
```
As a result, when the Flatcar image is removed and downloaded again,
systemd-importd is not able to determine if the file has been already
downloaded, so it always download it again. Then it fails to rename it
to an expected name, because there's already a hidden file.
To fix this issue, let's introduce a new helper function
`memory_startswith_no_case()`, which compares memory regions in a
case-insensitive way. Use this function in `curl_header_strdup()`.
See also https://github.com/kinvolk/kube-spawn/issues/304
[1]: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
On Dell machines LoadOptions is filled with:
01 00 00 00 <name of BIOS Boot Loader Entry> ... <unknown bytes>
So, in case of meaningfull LoadOptions, better check if the first char
is a printable character.
Fix#9993. When this code was split out to user-runtime-dir, it forgot to
include the call to mac_selinux_init(). So mkdir_label() stopped working.
Fixes: a9f0f5e501 ("logind: split %t directory creation to a helper
unit")
The MountEntry's added for EMPTY_DIR work very similarly to the TMPFS ones.
In both cases, .has_prefix is false. In fact, .has_prefix is false in
*all* the MountEntry's we add except for the access mounts (READONLY etc).
But EMPTY_DIR stuck out by explicitly setting .has_prefix = false.
Let's remove that.
... when no mount options are passed.
Change the code, to avoid the following failure in the newly added tests:
exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c
'[ "$(stat -c %a /var)" == 755 ]'
++ stat -c %a /var
+ '[' 1777 == 755 ']'
Received SIGCHLD from PID 30364 (sh).
Child 30364 (sh) died (code=exited, status=1/FAILURE)
(And I spotted an opportunity to use TAKE_PTR() at the end).
We can't remount the underlying superblocks, if we are inside a user
namespace and running Linux <= 4.17. We can only change the per-mount
flags (MS_REMOUNT | MS_BIND).
This type of mount() call can only change the per-mount flags, so we
don't have to worry about passing the right string options now.
Fixes#9914 ("Since 1beab8b was merged, systemd has been failing to start
systemd-resolved inside unprivileged containers" ... "Failed to re-mount
'/run/systemd/unit-root/dev' read-only: Operation not permitted").
> It's basically my fault :-). I pointed out we could remount read-only
> without MS_BIND when reviewing the PR that added TemporaryFilesystem=,
> and poettering suggested to change PrivateDevices= at the same time.
> I think it's safe to change back, and I don't expect anyone will notice
> a difference in behaviour.
>
> It just surprised me to realize that
> `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the
> superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does.
> Maybe a comment could note the kernel version (v4.18), that lets you
> remount without MS_BIND inside a user namespace.
This makes the code longer and I guess this function is still ugly, sorry.
One obstacle to cleaning it up is the interaction between
`PrivateDevices=yes` and `ReadOnlyPaths=/dev`. I've added a test for the
existing behaviour, which I think is now the correct behaviour.
Only report OOM if that was actually the error of the operation,
explicitly report the possible error that a syscall was already blocked
with a different errno and translate that into a more sensible errno
(EEXIST only makes sense in connection to the hashmap), and pass through
all other potential errors unmodified. Part of #9939.
`systemd-resolved.service` runs as `User=systemd-resolved`, and uses certain
Capabilit{y,ies} magic. By my understanding, this means it is started with a
number of "privileges". Indeed, `capabilities(7)` explains
> Linux divides the privileges traditionally
> associated with superuser into distinct units, known as capabilities,
> which can be independently enabled and disabled."
This situation appears to contradict our current code comment which said
> If we are not running as root we assume all privileges are already dropped.
This appears to be a confusion in the comment only. The rest of the code
tells a much clearer story. (Don't ask me if the story is correct.
`capabilities(7)` scares me). Let's tweak the comment to make it consistent
and avoid worrying readers about this.
If logind is not supported, don't try to schedule a shutdown,
immediately poweroff. This is the behavior indicated by the current
message given to the user, but the command is returning an error. I
believe this was broken on this commit:
7f96539d45
sfeatures is a "struct ethtool_sfeatures". Use sizeof() on the correct
data type.
Since "struct ethtool_gstrings" is larger than "struct ethtool_sfeatures",
this had no serious consequences.
Fixes: 50725d10e3
When computing the next network prefix to assign, compute the next
prefix to allocate based on the intended /64 assignment, not the
given prefix length for the whole prefix, e.g. /48, given to
systemd-networkd.
Fixes#9626.
The option is now called simply "Encapsulation=".
Also, "ignoring" is rather misleading, because we use to to mean that some line
is being ignored. Here the whole tunnel is dropped.
Use "falling back" instead of "fallback".
Also, it's not an application-specific machine ID, but rather an
application-and-machine-specific ID. Let's call it "app-machine-speicific" for
short.
This work add support to generic netlink to sd-netlink.
See https://lwn.net/Articles/208755/
networkd: add support FooOverUDP support to IPIP tunnel netdev
https://lwn.net/Articles/614348/
Example conf:
/lib/systemd/network/1-fou-tunnel.netdev
```
[NetDev]
Name=fou-tun
Kind=fou
[FooOverUDP]
Port=5555
Protocol=4
```
/lib/systemd/network/ipip-tunnel.netdev
```
[NetDev]
Name=ipip-tun
Kind=ipip
[Tunnel]
Independent=true
Local=10.65.208.212
Remote=10.65.208.211
FooOverUDP=true
FOUDestinationPort=5555
```
$ ip -d link show ipip-tun
```
5: ipip-tun@NONE: <POINTOPOINT,NOARP> mtu 1472 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 10.65.208.212 peer 10.65.208.211 promiscuity 0
ipip remote 10.65.208.211 local 10.65.208.212 ttl inherit pmtudisc encap fou encap-sport auto encap-dport 5555 noencap-csum noencap-csum6 noencap-remcsum numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```