We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
We need to use the CMSG_SPACE() macro to size the control buffers, not
CMSG_LEN(). The former is rounded up to next alignment boundary, the
latter is not. The former should be used for allocations, the latter for
encoding how much of it is actually initialized. See cmsg(3) man page
for details about this.
Given how confusing this is, I guess we don't have to be too ashamed
here, in most cases we actually did get this right.
Split out of #15457, let's see if this is the culprit of the CI failure.
(also setting green label here, since @keszybz already greenlit it in that other PR)
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
When specifying `DHCPv4.SendOption=`, it is used by systemd-networkd to
set the value of that option within the DHCP request that is sent out.
This differs to setting `DHCPServer.SendOption=`, which will place all
the options together as suboptions into the vendor-specific information
(code 43) option.
This commit adds two new config options, `DHCPv4.SendVendorOption=` and
`DHCPServer.SendVendorOption=`. These both have the behaviour of the old
`DHCPServer.SendOption=` flag, and set the value of the suboption in the
vendor-specific information option.
The behaviour of `DHCPServer.SendOption=` is then changed to reflect
that of `DHCPv4.SendOption=`. It will set the value of the corresponding
option in the DHCP request.
Don't reset the conflict counter when trying a new pseudo random
address, so that after trying 10 addresses the londer timeout is used in
accordance with the RFC
Fixes#14299.
The public function and the implementation were split into two for
no particular reason.
We would assert() on the internal state of the client. This should not be done
in a function that is directly called from a public function. (I.e., we should
not crash if the public function is called at the wrong time.)
assert() is changed to assert_return().
And before anyone asks: I put the assert_returns() *above* the internal
variables on purpose. This makes it easier to see that the assert_returns()
are about the state that is passed in, and if they are not satisfied, the
function returns immediately. The compiler doesn't care either way, so
the ordering that is clearest to the reader should be chosen.
IPServiceType set to CS6 (network control) causes problems on some old
network setups that continue to interpret the field as IP TOS.
Make DHCP work on such networks by allowing this field to be set to
CS4 (Realtime) instead, as this maps to IPTOS_LOWDELAY.
Signed-off-by: Siddharth Chandrasekaran <csiddharth@vmware.com>
The RFC states that lifetime (AdvDefaultLifetime) must be at least
MaxRtrAdvInterval (which more or less corresponds to SD_RADV_DEFAULT_MAX_TIMEOUT_USEC
in systemd).
To fulfill this limit, virtually lower MaxRtrAdvInterval and MinRtrAdvInterval
accordingly.
Also check that min is not lower than 3s and max is not lower than 4s.
This reverts commit 8a07b4033e.
The tests are kept. test-networkd-conf is adjusted to pass.
This fixes#13276. I think current rules are extremely confusing, as the
case in test-networkd-conf shows. We apply some kinds of unescaping (relating
to quoting), but not others (related to escaping of special characters).
But fixing this is hard, because people have adjusted quoting to match
our rules, and if we make the rules "better", things might break in unexpected
places.
Comparisons are done in the normal order (if (need > available), not if (available < need)),
variables have reduced scope and are renamed for clarity.
The only functional change is that if we return -ENAMETOOLONG, we do that
without modifying the options[] array.
I also added an explanatory comment. The use of one offset to point into three
buffers is not obvious.
Coverity (in CID#1402354) says that sname might be accessed at bad offset, but
I cannot see this happening. We check for available space before writing anything.
It's hard to even say what exactly this combination means. Escaping is
necessary when quoting to have quotes within the string. So the escaping of
quote characters is inherently tied to quoting. When unquoting, it seems
natural to remove escaping which was done for the quoting purposes. But with
both flags we would be expected to re-add this escaping after unqouting? Or
maybe keep the escaping which is not necessary for quoting but otherwise
present? This all seems too complicated, let's just forbid such usage and
always fully unescape when unquoting.
This is for 6d36464065. It turns out that this is causing more problems than
expected. Let's retroactively introduce naming scheme v241 to conditionalize
this change.
Follow-up for #12792 and 6d36464065. See also
https://bugzilla.suse.com/show_bug.cgi?id=1136600.
$ SYSTEMD_LOG_LEVEL=debug NET_NAMING_SCHEME=v240 build/udevadm test-builtin net_setup_link /sys/class/net/br11
$ SYSTEMD_LOG_LEVEL=debug NET_NAMING_SCHEME=v241 build/udevadm test-builtin net_setup_link /sys/class/net/br11
...
@@ -20,11 +20,13 @@
link_config: could not set ethtool features for br11
Could not set offload features of br11: Operation not permitted
br11: Device has name_assign_type=3
-Using interface naming scheme 'v240'.
+Using interface naming scheme 'v241'.
br11: Policy *keep*: keeping existing userspace name
br11: Device has addr_assign_type=1
-br11: No stable identifying information found
-br11: Could not generate persistent MAC: No data available
+br11: Using "br11" as stable identifying information
+br11: Using generated persistent MAC address
+Could not set Alias=, MACAddress= or MTU= on br11: Operation not permitted
+br11: Could not apply link config, ignoring: Operation not permitted
Unload module index
Unloaded link configuration context.
ID_NET_DRIVER=bridge
This reflect its role better.
(I didn't use …_persistent_name(), because which name is actually used
depends on the policy. So it's better not to make this sound like it returns
*the* persistent name.)
This is partially a refactoring, but also makes many more places use
unlocked operations implicitly, i.e. all users of fopen_temporary().
AFAICT, the uses are always for short-lived files which are not shared
externally, and are just used within the same context. Locking is not
necessary.
When the link goes down, DHCP client_receive_message*() functions return an
error and the related I/O source is removed from the main loop. With the
current implementation of systemd-networkd this doesn't matter because the DHCP
client is always stopped on carrier down and restarted on carrier up. However
it seems wrong to have the DHCP client crippled (because no packet can be
received anymore) once the link goes temporarily down.
Change the receive functions to ignore a ENETDOWN event so that the client will
be able to receive packets again after the link comes back.
inet_ntop() is not documented to be thread-safe, so it should not
be used in the DHCP library. Arguably, glibc uses a thread local
buffer, so indeed there is no problem with a suitable libc. Anyway,
just avoid it.
The DHCP client should not pre-filter addresses beyond what RFC
requires. If a client's user (like networkd) wishes to skip/filter
certain addresses, it's their responsibility.
The point of this is that the DHCP library does not hide/abstract
information that might be relevant for certain users. For example,
NetworkManager exposes DHCP options in its API. When doing that, the
options should be close to the actual lease.
This is related to commit d9ec2e632d
(dhcp4: filter bogus DNS/NTP server addresses silently).
The Router DHCP option may contain a list of one or more
routers ([1]). Extend the API of sd_dhcp_lease to return a
list instead of only the first.
Note that networkd still only uses the first router (if present).
Aside from extending the internal API of the DHCP client, there
is almost no change in behavior. The only visible difference in
behavior is that the "ROUTER" variable in the lease file is now a
list of addresses.
Note how RFC 2132 does not define certain IP addresses as invalid for the
router option. Still, previously sd_dhcp_lease_get_router() would never
return a "0.0.0.0" address. In fact, the previous API could not
differenciate whether no router option was present, whether it
was invalid, or whether its first router was "0.0.0.0". No longer let
the DHCP client library impose additional restrictions that are not
part of RFC. Instead, the caller should handle this. The patch does
that, and networkd only consideres the first router entry if it is not
"0.0.0.0".
[1] https://tools.ietf.org/html/rfc2132#section-3.5
deserialize_in_addrs() allocates the buffer before trying to parse
the IP address. Since a parsing error is silently ignored, the returned
size might be zero. In such a case we shouldn't return any buffer.
Anyway, there was no leak, because there are only two callers like
r = deserialize_in_addrs(&lease->dns, dns);
which both keep the unused buffer and later release it.
Note that deserialize_in_addrs() doesn't free the pointer before
reassigning the new output. The caller must take care to to pass
"ret" with an allocated buffer that would be leaked when returning
the result.
The "chaddr" field is 16 bytes long, with "hlen" being the
length of the address.
https://tools.ietf.org/html/rfc2131#section-4.3.1 says:
The server MUST return to the client:
...
o Any parameters specific to this client (as identified by
the contents of 'chaddr' or 'client identifier' in the DHCPDISCOVER
or DHCPREQUEST message), e.g., as configured by the network
administrator,
It's not clear, whether only the first 'hlen' bytes of 'chaddr'
must correspond or all 16 bytes.
Note that https://tools.ietf.org/html/rfc4390#section-2.1 says for IPoIB
"chaddr" (client hardware address) field MUST be zeroed.
with having "hlen" zero. This indicates that at least in this case, the
bytes after "hlen" would matter.
As the DHCP client always sets the trailing bytes to zero, we would expect
that the server also replies as such and we could just compare all 16 bytes.
However, let's be liberal and accept any padding here.
This in practice only changes behavior for infiniband, where we
previously would enforce that the first ETH_ALEN bytes are zero.
That seems arbitrary for IPoIB. We should either check all bytes or
none of them. Let's do the latter and don't enforce RFC 4390 in this
regard.
gcc-9 warns whenever the elements of a structure defined with _packed_ are used:
../src/network/networkd-dhcp6.c: In function ‘dhcp6_pd_prefix_assign’:
../src/network/networkd-dhcp6.c:92:53: warning: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Waddress-of-packed-member]
92 | r = manager_dhcp6_prefix_add(link->manager, &p->opt.in6_addr, link);
| ^~~~~~~~~~~~~~~~
And the compiler is right, because in principle the alignment could be wrong.
In this particular case it is not, because the structure is carefully defined
not to have holes. Let's remove _packed_ and use compile-time asserts to verify
that the offsets are not changed.
Fixes#3374. The problem is that we set MACPolicy=persistent (i.e. we would
like to generate persistent MAC addresses for interfaces which don't have a
fixed MAC address), but various virtual interfaces including bridges, tun/tap,
bonds, etc., do not not have the necessary ID_NET_NAME_* attributes and udev
would not assing the address and warn:
Could not generate persistent MAC address for $name: No such file or directory
Basic requirements which I think a solution for this needs to satisfy:
1. No changes to MAC address generation for those cases which are currently
handled successfully. This means that net_get_unique_predictable_data() must
keep returning the same answer, which in turn means net_get_name() must keep
returning the same answer. We can only add more things we look at with lower
priority so that we start to cover cases which were not covered before.
2. Like 1, but for IPvLL seed and DHCP IAD. This is less important, but "nice
to have".
3. Keep MACPolicy=persistent. If people don't want it, they can always apply
local configuration, but in general stable MACs are a good thing. I have never
seen anyone complain about that.
== Various approaches that have been proposed
=== https://github.com/systemd/systemd/issues/3374#issuecomment-223753264 (tomty89)
if !ID_BUS and INTERFACE, use INTERFACE
I think this almost does the good thing, but I don't see the reason to reject ID_BUS
(i.e. physical hardware). Stable MACs are very useful for physical hardware that has
no physical MAC.
=== https://github.com/systemd/systemd/issues/3374#issuecomment-224733069 (teg)
if (should_rename(device, true))
This means looking at name_assign_type. In particular for
NET_NAME_USER should_rename(..., true) returns true. It only returns false
for NET_NAME_PREDICTABLE. So this would cover stuff like br0, bond0, etc,
but would not cover lo and other devices with predictable names. That doesn't
make much sense.
But did teg mean should_rename() or !should_rename()?
=== https://github.com/systemd/systemd/issues/3374#issuecomment-234628502 (tomty89):
+ if (!should_rename(device, true))
+ return udev_device_get_sysname(device)
This covers only devices with NET_NAME_PREDICTABLE. Since the problem applies as
much to bridges and such, this isn't neough.
=== https://github.com/systemd/systemd/issues/3374#issuecomment-281745967 (grafi-tt)
+ /* if the machine doesn't provide data about the device, use the ifname specified by userspace
+ * (this is the case when the device is virtual, e.g., bridge or bond) */
+ s = udev_device_get_sysattr_value(device, "name_assign_type");
+ if (s && safe_atou(s, &type) >= 0 && type == NET_NAME_USER)
+ return udev_device_get_sysname(device);
This does not cover bond0, vnet0, tun/tap and similar.
grafi-tt also proposes patching the kernel, but *not* setting name_assign_type
seems intentional in those cases, because the device name is a result of
enumeration, not set by the userspace.
=== https://github.com/systemd/systemd/issues/3374#issuecomment-288882355 (tomty89)
(also PR #11372)
- MACAddressPolicy=persistent
This break requirement 3. above. It would solve the immediate problem, but I
think the disruption is too big.
=== This patch
This patch means that we will set a "stable" MAC for pretty much any virtual
device by default, where "stable" means keyed off the machine-id and interface
name.
It seems like a big change, but we already did this for most physical devices.
Doing it also for virtual devices doesn't seem like a big issue. It will make
the setup and monitoring of virtualized networks slightly nicer. I don't think
anyone is depending on having the MAC address changed when those devices are
destoryed and recreated. If they do, they'd have to change MACAddressPolicy=.
== Implementation
net_get_name() is called from dhcp_ident_set_iaid() so I didn't change
net_get_name() like in grafi-tt's patch, but net_get_unique_predictable_data().
net_get_unique_predictable_data() is called from get_mac() in link-config.c
and sd_ipv4ll_set_address_seed(), so both of those code paths are affected
and will now get data in some cases where they errored out previously.
The return code is changed to -ENODATA since that gives a nicer error string.
Found by inspecting results of running this small program:
int main(int argc, const char **argv) {
for (int i = 1; i < argc; i++) {
FILE *f;
char line[1024], prev[1024], *r;
int lineno;
prev[0] = '\0';
lineno = 1;
f = fopen(argv[i], "r");
if (!f)
exit(1);
do {
r = fgets(line, sizeof(line), f);
if (!r)
break;
if (strcmp(line, prev) == 0)
printf("%s:%d: error: dup %s", argv[i], lineno, line);
lineno++;
strcpy(prev, line);
} while (!feof(f));
fclose(f);
}
}
It seems quite useful to provide this additional information in public exported
functions.
This is a c99 feature, not supported in C++. Without the check in _sd-common.h:
FAILED: test-bus-vtable-cc@exe/src_libsystemd_sd-bus_test-bus-vtable-cc.cc.o
...
In file included from ../src/libsystemd/sd-bus/test-bus-vtable-cc.cc:9:
In file included from ../src/systemd/sd-bus-vtable.h:26:
In file included from ../src/systemd/sd-bus.h:26:
../src/systemd/sd-id128.h:38:47: error: static array size is a C99 feature, not permitted in C++
char *sd_id128_to_string(sd_id128_t id, char s[static SD_ID128_STRING_MAX]);
^
In .c files, I opted to use the define for consistency, even though we don't support
compilation with a C++ compiler, so the unconditional keyword would work too.
There are various functions to set the DUID of a DHCPv6 client.
However, none of them allows to set arbitrary data. The closest is
sd_dhcp6_client_set_duid(), which would still do validation of the
DUID's content via dhcp_validate_duid_len().
Relax the validation and only log a debug message if the DUID
does not validate.
Note that dhcp_validate_duid_len() already is not very strict. For example
with DUID_TYPE_LLT it only ensures that the length is suitable to contain
hwtype and time. It does not further check that the length of hwaddr is non-zero
or suitable for hwtype. Also, non-well-known DUID types are accepted for
extensibility. Why reject certain DUIDs but allowing clearly wrong formats
otherwise?
The validation and failure should happen earlier, when accepting the
unsuitable DUID. At that point, there is more context of what is wrong,
and a better failure reason (or warning) can be reported to the user. Rejecting
the DUID when setting up the DHCPv6 client seems not optimal, in particular
because the DHCPv6 client does not care about actual content of the
DUID and treats it as opaque blob.
Also, NetworkManager (which uses this code) allows to configure the entire
binary DUID in binary. It intentionally does not validate the binary
content any further. Hence, it needs to be able to set _invalid_ DUIDs,
provided that some basic constraints are satisfied (like the maximum length).
sd_dhcp6_client_set_duid() has two callers: both set the DUID obtained
from link_get_duid(), which comes from configuration.
`man networkd.conf` says: "The configured DHCP DUID should conform to
the specification in RFC 3315, RFC 6355.". It does not not state that
it MUST conform.
Note that dhcp_validate_duid_len() has another caller: DHCPv4's
dhcp_client_set_iaid_duid_internal(). In this case, continue with
strict validation, as the callers are more controlled. Also, there is
already sd_dhcp_client_set_client_id() which can be used to bypass
this check and set arbitrary client identifiers.
sd_dhcp_client_set_client_id() is the only API for setting a raw client-id.
All other setters are more restricted and only allow to set a type 255 DUID.
Also, dhcp4_set_client_identifier() is the only caller, which already
does:
r = sd_dhcp_client_set_client_id(link->dhcp_client,
ARPHRD_ETHER,
(const uint8_t *) &link->mac,
sizeof(link->mac));
and hence ensures that the data length is indeed ETH_ALEN.
Drop additional input validation from sd_dhcp_client_set_client_id(). The client-id
is an opaque blob, and if a caller wishes to set type 1 (ethernet) or type 32
(infiniband) with unexpected address length, it should be allowed. The actual
client-id is not relevant to the DHCP client, and it's the responsibility of the
caller to generate a suitable client-id.
For example, in NetworkManager you can configure all the bytes of the
client-id, including such _invalid_ settings. I think it makes sense,
to allow the user to fully configure the identifier. Even if such configuration
would be rejected, it would be the responsibility of the higher layers (including
a sensible error message to the user) and not fail later during
sd_dhcp_client_set_client_id().
Still log a debug message if the length is unexpected.