While working on the gateway→_gateway conversion, I noticed that
libidn2 strips the leading underscore in some names.
https://gitlab.com/libidn/libidn2/issues/30 was resolved in
05d753ea69,
which disabled "STD3 ASCII rules" by default, i.e. disabled stripping
of underscores. So the situation is that with previously released libidn2
versions we would get incorrect behaviour, and once new libidn2 is released,
we should be OK.
Let's implement a simple test which checks that the name survives the
roundtrip, and if it doesn't, skip IDN resolution. Under old libidn2 this will
fail in more cases, and under new libidn2 in fewer, but should be the right
thing to do also under new libidn2.
https://tools.ietf.org/html/rfc5891#section-4.2.3.1 says that
> The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third
> and fourth character positions and MUST NOT start or end with a "-" (hyphen).
This means that libidn2 refuses to encode such names.
Let's just resolve them without trying to use IDN.
libidn2 2.0.0 supports IDNA2008, in contrast to libidn which supports IDNA2003.
https://bugzilla.redhat.com/show_bug.cgi?id=1449145
From that bug report:
Internationalized domain names exist for quite some time (IDNA2003), although
the protocols describing them have evolved in an incompatible way (IDNA2008).
These incompatibilities will prevent applications written for IDNA2003 to
access certain problematic domain names defined with IDNA2008, e.g., faß.de is
translated to domain xn--fa-hia.de with IDNA2008, while in IDNA2003 it is
translated to fass.de domain. That not only causes incompatibility problems,
but may be used as an attack vector to redirect users to different web sites.
v2:
- keep libidn support
- require libidn2 >= 2.0.0
v3:
- keep dns_name_apply_idna caller dumb, and keep the #ifdefs inside of the
function.
- use both ±IDN and ±IDN2 in the version string
Let's make sure the root domain is normalized to ".", rather than then empty string, so that there's actually something
to see on screen. Normally, we don't append a trailing dot to normalized domain names, but do so in the one exception
of the root domain, taking inspiration from UNIX file system paths.
empty non-terminals generally lack NSEC RRs, which means we can deduce their existance only from the fact that there
are other RRs that contain them in their suffix. Specifically, the NSEC proof for NODATA on ENTs works by sending the
NSEC whose next name is a suffix of the queried name to the client. Use this information properly.
Previously, we'd not allow control characters to be embedded in domain
names, even when escaped. Since cloudflare uses \000 however to
implement its synthethic minimally covering NSEC RRs, we should allow
them, as long as they are properly escaped.
All our other domain name handling functions make no destinction between
domain names that end in a dot plus a NUL, or those just ending in a
NUL. Make sure dns_name_compare_func() and dns_label_unescape_suffix()
do the same.
Be stricter when searching suitable NSEC3 RRs for proof: generalize the
check we use to find suitable NSEC3 RRs, in nsec3_is_good(), and add
additional checks, such as checking whether all NSEC3 RRs use the same
parameters, have the same suffix and so on.
Note that this is still not complete, one additional step is still
missing: when we verified that a wildcard RRset is properly signed, we
still need to do an NSEC/NSEC3 proof that no more specific RRset exists.
Make sure dns_name_normalize(), dns_name_concat(), dns_name_is_valid()
do not accept/generate invalidly long hostnames, i.e. longer than 253
characters.
Labels of zero length are not OK, refuse them early on. The concept of a
"zero-length label" doesn't exist, a zero-length full domain name
however does (representing the root domain). See RFC 2181, Section 11.
The new dns_label_escape() call now operates on a buffer passed in,
similar to dns_label_unescape(). This should make decoding a bit faster,
and nicer.
Let's change the return value to bool. If we encounter an error while
parsing, return "false" instead of the actual parsing error, after all
the specified hostname does not qualify for what the function is
supposed to test.
Dealing with the additional error codes was always cumbersome, and
easily misused, like for example in the DHCP code.
Let's also rename the functions from dns_name_root() to
dns_name_is_root(), to indicate that this function checks something and
returns a bool. Similar for dns_name_is_signal_label().
This adds dns_service_join() and dns_service_split() which may be used
to concatenate a DNS-SD service name, am SRV service type string, and a
domain name into a full resolvable DNS domain name string. If the
service name is specified as NULL, only the type and domain are
appended, to implement classic, non-DNS-SD SRV lookups.
The reverse is dns_service_split() which takes the full name, and split
it into the three components again.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
Given three DNS names this function indicates if the second argument lies
strictly between the first and the third according to the canonical DNS
name order. Note that the order is circular, so the last name is
considered to be before the first.
Intended to be called repeatedly, and returns then successive unescaped labels
from the most to the least significant (left to right).
This is slightly inefficient as it scans the string three times (two would be
sufficient): once to find the end of the string, once to find the beginning
of each label and lastly once to do the actual unescaping. The latter two
could be done in one go, but that seemed unnecessarily convoluted.