This adds a new flag JSON_FORMAT_OFF that is a marker for "no JSON
output please!".
Of course, this flag sounds pointless in a JSON implementation, however
this is useful in code that can generate JSON output, but also more
human friendly output (for example our table formatters).
With this in place various tools that so far maintained one boolean
field "arg_json" that controlled whether JSON output was requested at
all and another field "arg_json_format_flags" for selecing the precise
json output flags may merge them into one, simplifying code a bit.
When building with Clang and using structured initialization, the
bpf_attr union is not zero-padded, so the kernel misdetects it as
an unsupported extension.
zero it until Clang's behaviour matches GCC. Do not skip the test
on Github Actions anymore.
In situations where a service fails to start, systemd suggests the user to
use "journalctl -xe" to get details about the failure. While running this
command does provide some additional details, most of the information is
similar to what was already printed when the service fails.
often the actual reason for the failure can be found in the logs of the
service that fails to start.
This patch updates the wording to suggest using "-u" to view the service
logs instead.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This commit adds support for disabling the read and write
workqueues with the new crypttab options no-read-workqueue
and no-write-workqueue. These correspond to the cryptsetup
options --perf-no_read_workqueue and --perf-no_write_workqueue
respectively.
This is similar to the base64 support, but fixed-size hash values are
typically preferably presented as series of hex values, hence store them
here like that too.
Optionally, embedd PKCS#11 token URI and encrypted key in LUKS2 JSON
metadata header. That way it becomes very easy to unlock properly set up
PKCS#11-enabled LUKS2 volumes, a simple /etc/crypttab line like the
following suffices:
mytest /dev/disk/by-partuuid/41c1df55-e628-4dbb-8492-bc69d81e172e - pkcs11-uri=auto
Such a line declares that unlocking via PKCS#11 shall be attempted, and
the token URI and the encrypted key shall be read from the LUKS2 header.
An external key file for the encrypted PKCS#11 key is hence no longer
necessary, nor is specifying the precise URI to use.
In hostnamed this is exposed as a dbus property, and in the logs in both
places.
This is of interest to network management software and such: if the fallback
hostname is used, it's not as useful as the real configured thing. Right now
various programs try to guess the source of hostname by looking at the string.
E.g. "localhost" is assumed to be not the real hostname, but "fedora" is. Any
such attempts are bound to fail, because we cannot distinguish "fedora" (a
fallback value set by a distro), from "fedora" (received from reverse dns),
from "fedora" read from /etc/hostname.
/run/systemd/fallback-hostname is written with the fallback hostname when
either pid1 or hostnamed sets the kernel hostname to the fallback value. Why
remember the fallback value and not the transient hostname in /run/hostname
instead?
We have three hostname types: "static", "transient", fallback".
– Distinguishing "static" is easy: the hostname that is set matches what
is in /etc/hostname.
– Distingiushing "transient" and "fallback" is not easy. And the
"transient" hostname may be set outside of pid1+hostnamed. In particular,
it may be set by container manager, some non-systemd tool in the initramfs,
or even by a direct call. All those mechanisms count as "transient". Trying
to get those cases to write /run/hostname is futile. It is much easier to
isolate the "fallback" case which is mostly under our control.
And since the file is only used as a flag to mark the hostname as fallback,
it can be hidden inside of our /run/systemd directory.
For https://bugzilla.redhat.com/show_bug.cgi?id=1892235.
gethostname(3) says it's unspecified whether the string is properly terminated
when the hostname is too long. We created a buffer with one extra byte, and it
seems the intent was to let that byte serve as terminator even if we get an
unterminated string from gethostname().
No functional change, just moving a bunch of things around. Before
we needed a rather complicated setup to test hostname_setup(), because
the code was in src/core/. When things are moved to src/shared/
we can just test it as any function.
The test is still "unsafe" because hostname_setup() may modify the
hostname.
When someone runs 'nft flush ruleset' in the same net namespace
this will also tear down the systemd nat table.
Unlike iptables -t nat -F, which will remove all rules added by
the systemd iptables backend, iptables has builtin chains that cannot
be deleted. IOW, the next add operation will 'just work'.
In the nftables case however, the entire table gets removed.
When the systemd nat table is removed by an external entity next
attempt to add a set element will yield -ENOENT.
If this happens, recreate the table, and, if successful, re-do
the add operation.
Note that this doesn't protect against external sabotage such as
a running 'while true; nft flush ruleset;done'. However, there is
nothing that could be done short of extending the kernel to allow
tables to be "frozen" or otherwise tied to a process such as
systemd-networkd.
Idea is to use a static ruleset, added when the first attempt to
add a masquerade or dnat rule is made.
The alternative would be to add the ruleset when the init function is called.
The disadvantage is that this enables connection tracking and NAT in the kernel
(as the ruleset needs this to work), which comes with some overhead that might
not be needed (no nspawn usage and no IPMasquerade option set).
There is no additional dependency on the 'nft' userspace binary or other libraries.
sd-netlinks nfnetlink backend is used to modify the nftables ruleset.
The commit message/comments still use nft syntax since that is what
users will see when they use the nft tool to list the ruleset.
The added initial skeleton (added on first fw_add_masquerade/local_dnat
call) looks like this:
table ip io.systemd.nat {
set masq_saddr {
type ipv4_addr
flags interval
elements = { 192.168.59.160/28 }
}
map map_port_ipport {
type inet_proto . inet_service : ipv4_addr . inet_service
elements = { tcp . 2222 : 192.168.59.169 . 22 }
}
chain prerouting {
type nat hook prerouting priority dstnat + 1; policy accept;
fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
}
chain output {
type nat hook output priority -99; policy accept;
ip daddr != 127.0.0.0/8 oif "lo" dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
}
chain postrouting {
type nat hook postrouting priority srcnat + 1; policy accept;
ip saddr @masq_saddr masquerade
}
}
Next calls to fw_add_masquerade/add_local_dnat will then only add/delete the
element/mapping to masq_saddr and map_port_ipport, i.e. the ruleset doesn't
change -- only the set/map content does.
Running test-firewall-util with this backend gives following output
on a parallel 'nft monitor':
$ nft monitor
add table ip io.systemd.nat
add chain ip io.systemd.nat prerouting { type nat hook prerouting priority dstnat + 1; policy accept; }
add chain ip io.systemd.nat output { type nat hook output priority -99; policy accept; }
add chain ip io.systemd.nat postrouting { type nat hook postrouting priority srcnat + 1; policy accept; }
add set ip io.systemd.nat masq_saddr { type ipv4_addr; flags interval; }
add map ip io.systemd.nat map_port_ipport { type inet_proto . inet_service : ipv4_addr . inet_service; }
add rule ip io.systemd.nat prerouting fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
add rule ip io.systemd.nat output ip daddr != 127.0.0.0/8 fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
add rule ip io.systemd.nat postrouting ip saddr @masq_saddr masquerade
add element ip io.systemd.nat masq_saddr { 10.1.2.3 }
add element ip io.systemd.nat masq_saddr { 10.0.2.0/28 }
delete element ip io.systemd.nat masq_saddr { 10.0.2.0/28 }
delete element ip io.systemd.nat masq_saddr { 10.1.2.3 }
add element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.4 . 815 }
delete element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.4 . 815 }
add element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.5 . 815 }
delete element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.5 . 815 }
CTRL-C
Things not implemented/supported:
1. Change monitoring. The kernel allows userspace to learn about changes
made by other clients (using nfnetlink notifications). It would be
possible to detect when e.g. someone removes the systemd nat table.
This would need more work. Its also not clear on how to react to
external changes -- it doesn't seem like a good idea to just auto-undo
everthing.
2. 'set masq_saddr' doesn't handle overlaps.
Example:
fw_add_masquerade(true, AF_INET, "10.0.0.0" , 16);
fw_add_masquerade(true, AF_INET, "10.0.0.0" , 8); /* fails */
With the iptables backend the second call works, as it adds an
independent iptables rule.
With the nftables backend, the range 10.0.0.0-10.255.255.255 clashes with
the existing range of 10.0.0.0-10.0.255.255 so 2nd add gets rejected by the
kernel.
This will generate an error message from networkd ("Could not enable IP
masquerading: File exists").
To resolve this it would be needed to either keep track of the added elements
and perform range merging when overlaps are detected.
However, the add erquests are done using the configured network on a
device, so no overlaps should occur in normal setups.
IPv6 support is added in a extra changeset.
Fixes: #13307
for planned nft backend we have three choices:
- open/close a new nfnetlink socket for every operation
- keep a nfnetlink socket open internally
- expose a opaque fw_ctx and stash all internal data here.
Originally I opted for the 2nd option, but during review it was
suggested to avoid static storage duration because of perceived
problems with threaded applications.
This adds fw_ctx and new/free functions, then converts the existing api
and nspawn and networkd to use it.
In a nutshell:
1. git mv firewall-util.c firewall-util-iptables.c
2. existing external functions gain _iptables_ in their names
3. firewall-util.c provides old function names
4. build system always compiles firewall-util.c,
firewall-util-iptables.c is conditional instead (libiptc).
5. On first call to any of the 'old' API functions performs
a probe that should return the preferred backend.
In a future step, can add firewall-util-FOOTYPE.c, add its
probe function to firewall-util.c and then have calls to
fw_add_masq/local_dnat handed to the detected backend.
For now, only iptables backend exists, and no special probing
takes place for it, i.e. when systemd was built with iptables,
that will be used. If not, requets to add masquerade/dnat will
fail with same error (-EOPNOTSUPP) as before this change.
For reference, the rules added by the libiptc/iptables backend look like this:
for service export (via systemd-nspawn):
[0:0] -A PREROUTING -p tcp -m tcp --dport $exportedport -m addrtype --dst-type LOCAL -j DNAT --to-destination $containerip:$port
[0:0] -A OUTPUT ! -d 127.0.0.0/8 -p tcp -m tcp --dport $exportedport -m addrtype --dst-type LOCAL -j DNAT --to-destination $containerip:$port
for ip masquerade:
[0:0] -A POSTROUTING -s network/prefix -j MASQUERADE
Make sure we don't add masquerading rules without a explicitly
specified network range we should be masquerading for.
The only caller aside from test case is
networkd-address.c which never passes a NULL source.
As it also passes the network prefix, that should always be > 0 as well.
This causes expected test failure:
Failed to modify firewall: Invalid argument
Failed to modify firewall: Invalid argument
Failed to modify firewall: Invalid argument
Failed to modify firewall: Protocol not available
Failed to modify firewall: Protocol not available
Failed to modify firewall: Protocol not available
Failed to modify firewall: Protocol not available
The failing test cases are amended to expect failure on
NULL source or prefix instead of success.