https://tools.ietf.org/html/rfc1035#section-2.3.1 says (approximately)
that only letters, numbers, and non-leading non-trailing dashes are allowed
(for entries with A/AAAA records). We set no restrictions.
hosts(5) says:
> Host names may contain only alphanumeric characters, minus signs ("-"), and
> periods ("."). They must begin with an alphabetic character and end with an
> alphanumeric character.
nss-files follows those rules, and will ignore names in /etc/hosts that do not
follow this rule.
Let's follow the documented rules for /etc/hosts. In particular, this makes us
consitent with nss-files, reducing surprises for the user.
I'm pretty sure we should apply stricter filtering to names received over DNS
and LLMNR and MDNS, but it's a bigger project, because the rules differ
depepending on which level the label appears (rules for top-level names are
stricter), and this patch takes the minimalistic approach and only changes
behaviour for /etc/hosts.
Escape syntax is also disallowed in /etc/hosts, even if the resulting character
would be allowed. Other tools that parse /etc/hosts do not support this, and
there is no need to use it because no allowed characters benefit from escaping.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
This changes dns_name_between() to deal properly with checking whether B
is between A and C if A and C are equal. Previously we simply returned
-EINVAL in this case, refusing checking. With this change we correct
behaviour: if A and C are equal, then B is "between" both if it is
different from them. That's logical, since we do < and > comparisons, not
<= and >=, and that means that anything "right of A" and "left of C"
lies in between with wrap-around at the ends. And if A and C are equal
that means everything lies between, except for A itself.
This fixes handling of domains using NSEC3 "white lies", for example the
.it TLD.
Fixes: #7421
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
While working on the gateway→_gateway conversion, I noticed that
libidn2 strips the leading underscore in some names.
https://gitlab.com/libidn/libidn2/issues/30 was resolved in
05d753ea69,
which disabled "STD3 ASCII rules" by default, i.e. disabled stripping
of underscores. So the situation is that with previously released libidn2
versions we would get incorrect behaviour, and once new libidn2 is released,
we should be OK.
Let's implement a simple test which checks that the name survives the
roundtrip, and if it doesn't, skip IDN resolution. Under old libidn2 this will
fail in more cases, and under new libidn2 in fewer, but should be the right
thing to do also under new libidn2.
https://tools.ietf.org/html/rfc5891#section-4.2.3.1 says that
> The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third
> and fourth character positions and MUST NOT start or end with a "-" (hyphen).
This means that libidn2 refuses to encode such names.
Let's just resolve them without trying to use IDN.
libidn2 2.0.0 supports IDNA2008, in contrast to libidn which supports IDNA2003.
https://bugzilla.redhat.com/show_bug.cgi?id=1449145
From that bug report:
Internationalized domain names exist for quite some time (IDNA2003), although
the protocols describing them have evolved in an incompatible way (IDNA2008).
These incompatibilities will prevent applications written for IDNA2003 to
access certain problematic domain names defined with IDNA2008, e.g., faß.de is
translated to domain xn--fa-hia.de with IDNA2008, while in IDNA2003 it is
translated to fass.de domain. That not only causes incompatibility problems,
but may be used as an attack vector to redirect users to different web sites.
v2:
- keep libidn support
- require libidn2 >= 2.0.0
v3:
- keep dns_name_apply_idna caller dumb, and keep the #ifdefs inside of the
function.
- use both ±IDN and ±IDN2 in the version string
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
Let's make sure the root domain is normalized to ".", rather than then empty string, so that there's actually something
to see on screen. Normally, we don't append a trailing dot to normalized domain names, but do so in the one exception
of the root domain, taking inspiration from UNIX file system paths.
Move IDNA logic out of the normal domain name processing, and into the bus frontend calls. Previously whenever
comparing two domain names we'd implicitly do IDNA conversion so that "pöttering.de" and "xn--pttering-n4a.de" would be
considered equal. This is problematic not only for DNSSEC, but actually also against he IDNA specs.
Moreover it creates problems when encoding DNS-SD services in classic DNS. There, the specification suggests using
UTF8 encoding for the actual service name, but apply IDNA encoding to the domain suffix.
With this change IDNA conversion is done only:
- When the user passes a non-ASCII hostname when resolving a host name using ResolveHostname()
- When the user passes a non-ASCII domain suffix when resolving a service using ResolveService()
No IDNA encoding is done anymore:
- When the user does raw ResolveRecord() RR resolving
- On the service part of a DNS-SD service name
Previously, IDNA encoding was done when serializing names into packets, at a point where information whether something
is a label that needs IDNA encoding or not was not available, but at a point whether it was known whether to generate a
classic DNS packet (where IDNA applies), or an mDNS/LLMNR packet (where IDNA does not apply, and UTF8 is used instead
for all host names). With this change each DnsQuery object will now maintain two copies of the DnsQuestion to ask: one
encoded in IDNA for use with classic DNS, and one encoded in UTF8 for use with LLMNR and MulticastDNS.
empty non-terminals generally lack NSEC RRs, which means we can deduce their existance only from the fact that there
are other RRs that contain them in their suffix. Specifically, the NSEC proof for NODATA on ENTs works by sending the
NSEC whose next name is a suffix of the queried name to the client. Use this information properly.
Having this information available is useful when we need to check whether various RRs are suitable for proofs. This
information is stored in the RRs as number of labels to skip from the beginning of the owner name to reach the
synthesizing source/signer. Simple accessor calls are then added to retrieve the signer/source from the RR using this
information.
This also moves validation of a a number of RRSIG parameters into a new call dnssec_rrsig_prepare() that as side-effect
initializes the two numeric values.
Previously, we'd not allow control characters to be embedded in domain
names, even when escaped. Since cloudflare uses \000 however to
implement its synthethic minimally covering NSEC RRs, we should allow
them, as long as they are properly escaped.