DNS resolution of irc.freenode.net or chat.freenode.net fails

Bug #1665394 reported by Pirouette Cacahuète
This bug affects 2 people
Affects: systemd (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

I am running Ubuntu 16.10 (from a base install upgraded several times), with proposed-updates enabled. Recently, I started having problems where I am unable to resolve irc.freenode.net.

This is not a network problem: I have confirmed that systemd-resolved itself is able to perform the name resolution, yet the getaddrinfo() call fails with EAI_AGAIN.

This can be reproduced using getent:
> $ getent ahosts irc.freenode.net
> $

Querying IPv4 or IPv6 explicitly, however, returns valid results:
> $ getent ahostsv4 irc.freenode.net
> 38.229.70.22 STREAM chat.freenode.net
> 38.229.70.22 DGRAM
> 38.229.70.22 RAW
> 130.239.18.119 STREAM
> 130.239.18.119 DGRAM
> 130.239.18.119 RAW
[...]
> $

> $ getent ahostsv6 irc.freenode.net
> 2001:5a0:3604:1:64:86:243:181 STREAM chat.freenode.net
> 2001:5a0:3604:1:64:86:243:181 DGRAM
> 2001:5a0:3604:1:64:86:243:181 RAW
> 2001:6b0:e:2a18::118 STREAM
> 2001:6b0:e:2a18::118 DGRAM
> 2001:6b0:e:2a18::118 RAW
[...]
> $
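
For anyone who wants to check the same thing without getent, here is a minimal Python sketch. It relies on socket.getaddrinfo() wrapping the C library's getaddrinfo(), so it should go through the same NSS configuration. The helper names probe and compare are made up for illustration, and the resolver parameter exists only so the logic can be exercised without a network.

```python
import socket

# Map each getent database used above to the address family it requests.
FAMILIES = {
    "ahosts": socket.AF_UNSPEC,
    "ahostsv4": socket.AF_INET,
    "ahostsv6": socket.AF_INET6,
}

# Reverse lookup from EAI_* numeric codes to their symbolic names.
EAI_NAMES = {getattr(socket, n): n for n in dir(socket) if n.startswith("EAI_")}


def probe(host, family, resolver=socket.getaddrinfo):
    """Return ("ok", sorted unique addresses) or ("error", EAI_* name)."""
    try:
        results = resolver(host, None, family)
    except socket.gaierror as exc:
        return "error", EAI_NAMES.get(exc.errno, str(exc.errno))
    # Each result is (family, socktype, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return "ok", sorted({info[4][0] for info in results})


def compare(host, resolver=socket.getaddrinfo):
    """Probe the host once per family, keyed by the getent database name."""
    return {name: probe(host, fam, resolver) for name, fam in FAMILIES.items()}
```

On an affected machine, compare("irc.freenode.net") would be expected to report an error for "ahosts" while "ahostsv4" and "ahostsv6" return addresses, matching the getent transcripts above.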

dpkg.log shows that systemd was upgraded recently, though I cannot tell for sure whether that directly relates to the problem: I use suspend-to-RAM a lot and do not reboot very often.

> $ lsb_release -a
> LSB Version: core-9.20160110ubuntu5-amd64:core-9.20160110ubuntu5-noarch:printing-9.20160110ubuntu5-amd64:printing-9.20160110ubuntu5-noarch:security-9.20160110ubuntu5-amd64:security-9.20160110ubuntu5-noarch
> Distributor ID: Ubuntu
> Description: Ubuntu 16.10
> Release: 16.10
> Codename: yakkety

Pirouette Cacahuète (lissyx) wrote :

A workaround is to remove "[NOTFOUND=return] resolve" from /etc/nsswitch.conf, though it may break other things (but nothing visible so far).

affects: glibc (Ubuntu) → systemd (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
koffeinfriedhof (koffeinfriedhof) wrote :

Perhaps this is an issue in iputils/ping when the response is too big. You can reproduce this with the plain ping command too.

"ping -c1 chat.freenode.net" results in "ping: chat.freenode.net: Temporary failure in name resolution".

Using an explicit -4 or -6 works, however.

Workaround as mentioned above: switch "resolve [!UNAVAIL=return]" to "resolve [UNAVAIL=return]" or place "resolve [!UNAVAIL=return]" after "dns".
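
A rough sketch of why this workaround changes the outcome, written as a small Python model of the nsswitch.conf action rules (by default only SUCCESS returns; "[!UNAVAIL=return]" makes every status except UNAVAIL stop the lookup). The function names are illustrative, and the per-service outcomes are assumptions: in particular, that the resolve module reports TRYAGAIN for this name has not been confirmed in this report.

```python
import re

# glibc defaults: stop on success, otherwise try the next service.
DEFAULT_ACTIONS = {
    "success": "return",
    "notfound": "continue",
    "unavail": "continue",
    "tryagain": "continue",
}


def parse_hosts_line(line):
    """Parse e.g. 'hosts: files resolve [!UNAVAIL=return] dns'.

    Returns a list of (service, {status: action}) in lookup order.
    """
    _, _, spec = line.partition(":")
    entries = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if not token.startswith("["):
            entries.append((token, dict(DEFAULT_ACTIONS)))
            continue
        if not entries:
            continue  # an action with no preceding service is ignored here
        actions = entries[-1][1]
        for item in token[1:-1].split():
            status, _, action = item.partition("=")
            negate = status.startswith("!")
            status = status.lstrip("!").lower()
            for name in actions:
                # "!STATUS" applies the action to every status but STATUS.
                if (name == status) != negate:
                    actions[name] = action.lower()
    return entries


def consulted(entries, outcomes):
    """Walk the services; return (services tried, final status)."""
    tried, status = [], None
    for service, actions in entries:
        status = outcomes.get(service, "unavail")
        tried.append(service)
        if actions[status] == "return":
            break
    return tried, status
```

With the assumed outcomes {"files": "notfound", "resolve": "tryagain", "dns": "success"}, the original line stops at "resolve" and never reaches "dns", while either variant of the workaround lets "dns" answer.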

koffeinfriedhof (koffeinfriedhof) wrote :

Blaming iputils/ping was wrong.

On machines with this bug, "getent hosts chat.freenode.net" produces no output! Perhaps we have to look at getaddrinfo()!

Thanks to the French Ubuntu channel #ubuntu-fr for this hint :)

no longer affects: iputils (Ubuntu)
Pirouette Cacahuète (lissyx) wrote :

After adding "[NOTFOUND=return] resolve" back to /etc/nsswitch.conf, setting DNSSEC=no in /etc/systemd/resolved.conf, and restarting the systemd-resolved service, the following:
> $ getent ahosts irc.freenode.net
returns a proper list of hosts.

After commenting out the DNSSEC line in resolved.conf again and restarting the service, resolution still works. I'm a bit puzzled.

Pirouette Cacahuète (lissyx) wrote :

So, comment #5 is essentially invalid. I now have the whole story on why it started to work again: Schiggn (comment #4) contacted some Freenode staff, and they changed the DNS records to expose fewer addresses. This did indeed work around the issue.

Pirouette Cacahuète (lissyx) wrote :

It is likely to be an upstream problem. Given a zone with A/AAAA records with enough addresses, I can reproduce the problem.

By hacking on the systemd source code, I found these steps to reproduce (STR):
 - disable "resolve" in nsswitch.conf
 - add a resolution against a record with a lot of addresses in src/libsystemd/sd-resolve/test-resolve.c
 - build test-resolve tool
 - run ./test-resolve

e.g.:
> r = sd_resolve_getaddrinfo(resolve, &q2, "test.neteffmon.eu", NULL, &hints, getaddrinfo_handler, NULL);

Running this results in:
> Assertion '*length <= maxlength' failed at src/libsystemd/sd-resolve/sd-resolve.c:203, function serialize_addrinfo(). Aborting.

After changing BUFSIZE in src/libsystemd/sd-resolve/sd-resolve.c to something bigger (e.g. 65536U), rebuilding, and rerunning ./test-resolve, the resolution succeeds.
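
To illustrate why the number of addresses matters, here is a back-of-the-envelope Python model of a fixed-size serialization buffer. It is not systemd's actual layout: the per-entry header size is a made-up placeholder, the helper names are illustrative, and storing the canonical name only once is an assumption based on the getent output above. The sockaddr sizes are the usual Linux values, and the factor of three reflects the STREAM, DGRAM and RAW entries returned per address.

```python
SOCKADDR_IN = 16   # sizeof(struct sockaddr_in) on Linux
SOCKADDR_IN6 = 28  # sizeof(struct sockaddr_in6) on Linux
HEADER = 32        # hypothetical per-entry header size, NOT taken from systemd
SOCKTYPES = 3      # STREAM, DGRAM and RAW, as in the getent output above


def serialized_size(n_v4, n_v6, canonname_len=0):
    """Estimate the bytes needed to serialize all addrinfo entries."""
    per_v4 = HEADER + SOCKADDR_IN
    per_v6 = HEADER + SOCKADDR_IN6
    # Assumption: the canonical name is stored once, not per entry.
    return SOCKTYPES * (n_v4 * per_v4 + n_v6 * per_v6) + canonname_len


def fits(n_v4, n_v6, bufsize, canonname_len=0):
    """True if the estimated payload fits into a buffer of bufsize bytes."""
    return serialized_size(n_v4, n_v6, canonname_len) <= bufsize
```

Under these assumed sizes, 20 A plus 20 AAAA records need roughly 6.5 KB, which would overflow a hypothetical 4096-byte buffer but fits easily in the 65536 bytes tried above. The real threshold depends on systemd's actual BUFSIZE and record layout.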
