https://tools.ietf.org/html/rfc1035#section-2.3.1 says (approximately)
that only letters, numbers, and non-leading non-trailing dashes are allowed
(for entries with A/AAAA records). We set no restrictions.
hosts(5) says:
> Host names may contain only alphanumeric characters, minus signs ("-"), and
> periods ("."). They must begin with an alphabetic character and end with an
> alphanumeric character.
nss-files follows those rules, and will ignore names in /etc/hosts that do not
follow this rule.
Let's follow the documented rules for /etc/hosts. In particular, this makes us
consitent with nss-files, reducing surprises for the user.
I'm pretty sure we should apply stricter filtering to names received over DNS
and LLMNR and MDNS, but it's a bigger project, because the rules differ
depepending on which level the label appears (rules for top-level names are
stricter), and this patch takes the minimalistic approach and only changes
behaviour for /etc/hosts.
Escape syntax is also disallowed in /etc/hosts, even if the resulting character
would be allowed. Other tools that parse /etc/hosts do not support this, and
there is no need to use it because no allowed characters benefit from escaping.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
This makes two changes: first of all we will now explicitly check
whether a domain to test against an NSEC record is actually below the
signer's name. This is relevant for NSEC records that chain up the end
and the beginning of a zone: we shouldn't alow that NSEC record to match
against domains outside of the zone.
This also fixes how we handle NSEC checks for domains that are prefixes
of the NSEC RR domain itself, fixing #8164 which triggers this specific
case. The non-wildcard NSEC check is simplified for that, we can
directly make our between check, there's no need to find the "Next
Closer" first, as the between check should not be affected by additional
prefixes. For the wild card NSEC check we'll prepend the asterisk in
this case to the NSEC RR itself to make a correct check.
Fixes: #8164
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This changes dns_name_between() to deal properly with checking whether B
is between A and C if A and C are equal. Previously we simply returned
-EINVAL in this case, refusing checking. With this change we correct
behaviour: if A and C are equal, then B is "between" both if it is
different from them. That's logical, since we do < and > comparisons, not
<= and >=, and that means that anything "right of A" and "left of C"
lies in between with wrap-around at the ends. And if A and C are equal
that means everything lies between, except for A itself.
This fixes handling of domains using NSEC3 "white lies", for example the
.it TLD.
Fixes: #7421
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
While working on the gateway→_gateway conversion, I noticed that
libidn2 strips the leading underscore in some names.
https://gitlab.com/libidn/libidn2/issues/30 was resolved in
05d753ea69,
which disabled "STD3 ASCII rules" by default, i.e. disabled stripping
of underscores. So the situation is that with previously released libidn2
versions we would get incorrect behaviour, and once new libidn2 is released,
we should be OK.
Let's implement a simple test which checks that the name survives the
roundtrip, and if it doesn't, skip IDN resolution. Under old libidn2 this will
fail in more cases, and under new libidn2 in fewer, but should be the right
thing to do also under new libidn2.
https://tools.ietf.org/html/rfc5891#section-4.2.3.1 says that
> The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third
> and fourth character positions and MUST NOT start or end with a "-" (hyphen).
This means that libidn2 refuses to encode such names.
Let's just resolve them without trying to use IDN.
libidn2 2.0.0 supports IDNA2008, in contrast to libidn which supports IDNA2003.
https://bugzilla.redhat.com/show_bug.cgi?id=1449145
From that bug report:
Internationalized domain names exist for quite some time (IDNA2003), although
the protocols describing them have evolved in an incompatible way (IDNA2008).
These incompatibilities will prevent applications written for IDNA2003 to
access certain problematic domain names defined with IDNA2008, e.g., faß.de is
translated to domain xn--fa-hia.de with IDNA2008, while in IDNA2003 it is
translated to fass.de domain. That not only causes incompatibility problems,
but may be used as an attack vector to redirect users to different web sites.
v2:
- keep libidn support
- require libidn2 >= 2.0.0
v3:
- keep dns_name_apply_idna caller dumb, and keep the #ifdefs inside of the
function.
- use both ±IDN and ±IDN2 in the version string
Let's make sure the root domain is normalized to ".", rather than then empty string, so that there's actually something
to see on screen. Normally, we don't append a trailing dot to normalized domain names, but do so in the one exception
of the root domain, taking inspiration from UNIX file system paths.
empty non-terminals generally lack NSEC RRs, which means we can deduce their existance only from the fact that there
are other RRs that contain them in their suffix. Specifically, the NSEC proof for NODATA on ENTs works by sending the
NSEC whose next name is a suffix of the queried name to the client. Use this information properly.
Previously, we'd not allow control characters to be embedded in domain
names, even when escaped. Since cloudflare uses \000 however to
implement its synthethic minimally covering NSEC RRs, we should allow
them, as long as they are properly escaped.
All our other domain name handling functions make no destinction between
domain names that end in a dot plus a NUL, or those just ending in a
NUL. Make sure dns_name_compare_func() and dns_label_unescape_suffix()
do the same.
Be stricter when searching suitable NSEC3 RRs for proof: generalize the
check we use to find suitable NSEC3 RRs, in nsec3_is_good(), and add
additional checks, such as checking whether all NSEC3 RRs use the same
parameters, have the same suffix and so on.
Note that this is still not complete, one additional step is still
missing: when we verified that a wildcard RRset is properly signed, we
still need to do an NSEC/NSEC3 proof that no more specific RRset exists.
Make sure dns_name_normalize(), dns_name_concat(), dns_name_is_valid()
do not accept/generate invalidly long hostnames, i.e. longer than 253
characters.
Labels of zero length are not OK, refuse them early on. The concept of a
"zero-length label" doesn't exist, a zero-length full domain name
however does (representing the root domain). See RFC 2181, Section 11.
The new dns_label_escape() call now operates on a buffer passed in,
similar to dns_label_unescape(). This should make decoding a bit faster,
and nicer.
Let's change the return value to bool. If we encounter an error while
parsing, return "false" instead of the actual parsing error, after all
the specified hostname does not qualify for what the function is
supposed to test.
Dealing with the additional error codes was always cumbersome, and
easily misused, like for example in the DHCP code.
Let's also rename the functions from dns_name_root() to
dns_name_is_root(), to indicate that this function checks something and
returns a bool. Similar for dns_name_is_signal_label().
This adds dns_service_join() and dns_service_split() which may be used
to concatenate a DNS-SD service name, am SRV service type string, and a
domain name into a full resolvable DNS domain name string. If the
service name is specified as NULL, only the type and domain are
appended, to implement classic, non-DNS-SD SRV lookups.
The reverse is dns_service_split() which takes the full name, and split
it into the three components again.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
Given three DNS names this function indicates if the second argument lies
strictly between the first and the third according to the canonical DNS
name order. Note that the order is circular, so the last name is
considered to be before the first.
Intended to be called repeatedly, and returns then successive unescaped labels
from the most to the least significant (left to right).
This is slightly inefficient as it scans the string three times (two would be
sufficient): once to find the end of the string, once to find the beginning
of each label and lastly once to do the actual unescaping. The latter two
could be done in one go, but that seemed unnecessarily convoluted.