systemd-resolved does not listen on TCP port, cannot serve large records (Cannot ping pod51041.outlook.com but can dig.)

Bug #1731522 reported by Shuhao
This bug affects 7 people
Affects           Status        Importance  Assigned to          Milestone
systemd           Fix Released  Unknown
systemd (Ubuntu)  Fix Released  High        Dimitri John Ledkov
  Artful          Triaged       High        Dimitri John Ledkov
  Bionic          Fix Released  High        Dimitri John Ledkov

Bug Description

[Impact]

 * Ubuntu hosts are unable to resolve certain domains whose responses are too large to fit in a UDP packet.
 * The solution is to have the local caching DNS server listen on both UDP and TCP by default.
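
For illustration, a client learns that a UDP answer did not fit from the TC (truncated) bit in the DNS header, and is then expected to retry over TCP. The sketch below is a minimal, hypothetical helper (the function names are not taken from systemd) that inspects that bit in a raw response:

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word (RFC 1035)


def is_truncated(response: bytes) -> bool:
    """Return True if a raw DNS response has the TC (truncated) bit set."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 bytes; response is too short")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def next_transport(response: bytes) -> str:
    """Decide what a stub client would do after receiving this UDP answer."""
    return "tcp" if is_truncated(response) else "done"


# Hand-built headers for illustration only: ID 0x1234, all record counts zero.
# 0x8180 = response + recursion desired + recursion available;
# 0x8380 is the same flags word with the TC bit added.
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 0, 0, 0, 0)
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 0, 0, 0, 0)

print(next_transport(truncated))
print(next_transport(complete))
```

If the retry over TCP is refused, as in this bug, the client has no way left to obtain the full answer from the local stub.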

[Test Case]

 * nslookup -q=aaaa pod51041.outlook.com 127.0.0.53

Should work and return a bunch of ipv6 answers.

Note: this expects that the upstream DNS server used by resolved is "a sensible" one. For example, my default ISP/router did not work, whilst forcing 8.8.8.8 via Network Manager for this connection made it work.
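
The same test can be sketched at the wire level. The example below is an assumption-laden illustration, not a replacement for the nslookup test: it builds an AAAA query by hand and frames it for TCP, where each message is preceded by a two-byte length. The helper names are hypothetical, and `query_over_tcp` is only defined here, not called.

```python
import socket
import struct

QTYPE_AAAA = 28
QCLASS_IN = 1


def build_query(name: str, qtype: int, msg_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: 12-byte header plus one question."""
    # Flags 0x0100 = recursion desired; one question, no other records.
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as 2 bytes."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full message")
        data += chunk
    return data


def query_over_tcp(server: str, message: bytes, timeout: float = 3.0) -> bytes:
    """Send one query over TCP port 53 and return the raw response."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(message))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)


query = build_query("pod51041.outlook.com", QTYPE_AAAA)
framed = frame_for_tcp(query)
print(len(query), len(framed))
```

Calling `query_over_tcp("127.0.0.53", query)` on an affected system would be expected to raise `ConnectionRefusedError`, since nothing is listening on the TCP port.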

[Regression Potential]

 * Given that resolved will now bind to TCP port 53, this may result in a conflict with deployed DNS servers that do not correctly take over port 53 or that bind to all addresses.

 * In those cases, the software should be fixed to not bind to all interfaces and/or to not bind to 127.0.0.53, or resolved should be changed to have DNSStubListener set to 'udp'.
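
To check which protocols a given configuration enables, something like the following sketch may help. It assumes DNSStubListener accepts a boolean, `udp`, or `tcp`, as described in resolved.conf(5), and it only reads a single config text rather than merging drop-in directories. The function name is hypothetical, and its default of `udp` mirrors the commented-out default quoted later in this report.

```python
TRUE_WORDS = {"1", "yes", "y", "true", "t", "on"}
FALSE_WORDS = {"0", "no", "n", "false", "f", "off"}


def stub_listener_protocols(conf_text: str, default: str = "udp") -> set:
    """Return the protocols the stub listener would serve for this text."""
    value = default
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank or comment; a commented-out key keeps the default
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if sep and section == "Resolve" and key.strip() == "DNSStubListener":
            value = val.strip()  # a later assignment overrides an earlier one
    lowered = value.lower()
    if lowered in TRUE_WORDS:
        return {"udp", "tcp"}
    if lowered in FALSE_WORDS:
        return set()
    if lowered in ("udp", "tcp"):
        return {lowered}
    raise ValueError(f"unrecognised DNSStubListener value: {value!r}")


print(sorted(stub_listener_protocols("[Resolve]\n#DNSStubListener=udp\n")))
print(sorted(stub_listener_protocols("[Resolve]\nDNSStubListener=yes\n")))
```

A deployment that runs its own DNS server on TCP port 53 would want the result to exclude `tcp`.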

[Other Info]

 * Original bug report

===

Trying to resolve pod51041.outlook.com's domain name seems to fail for applications:

$ ping pod51041.outlook.com
ping: pod51041.outlook.com: Temporary failure in name resolution

(Also can't access via thunderbird).

However, it seems to work directly via systemd-resolve:

$ systemd-resolve pod51041.outlook.com
pod51041.outlook.com: 40.97.160.2
                      40.97.126.50
                      132.245.38.194
                      40.97.147.194
                      132.245.41.34
                      40.97.176.2
                      40.97.150.242
                      40.97.85.114
                      40.97.120.50
                      40.97.85.2
                      40.97.176.34
                      40.97.138.242
                      40.97.166.18
                      40.97.120.162
                      40.97.119.82
                      40.97.176.18
                      40.97.85.98
                      40.97.134.34
                      40.97.84.18

-- Information acquired via protocol DNS in 2.5ms.
-- Data is authenticated: no

It also works with dig and nslookup.

I'm not quite sure why this is the case. I've spotted an upstream issue that looks similar: https://github.com/systemd/systemd/issues/6520. However, I'm not familiar enough with DNS to tell if it is the same issue.

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: systemd 234-2ubuntu12
ProcVersionSignature: Ubuntu 4.13.0-16.19-generic 4.13.4
Uname: Linux 4.13.0-16-generic x86_64
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
ApportVersion: 2.20.7-0ubuntu3
Architecture: amd64
CurrentDesktop: MATE
Date: Fri Nov 10 13:10:02 2017
InstallationDate: Installed on 2017-11-10 (0 days ago)
InstallationMedia: Ubuntu-MATE 17.10 "Artful Aardvark" - Release amd64 (20171018)
MachineType: LENOVO 2324BB9
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.13.0-16-generic.efi.signed root=UUID=8ab6bf88-72bd-4308-941e-3b36d4d7811b ro rootflags=subvol=@ quiet splash vt.handoff=7
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/03/2016
dmi.bios.vendor: LENOVO
dmi.bios.version: G2ETA6WW (2.66 )
dmi.board.asset.tag: Not Available
dmi.board.name: 2324BB9
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrG2ETA6WW(2.66):bd03/03/2016:svnLENOVO:pn2324BB9:pvrThinkPadX230:rvnLENOVO:rn2324BB9:rvrNotDefined:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.family: ThinkPad X230
dmi.product.name: 2324BB9
dmi.product.version: ThinkPad X230
dmi.sys.vendor: LENOVO

Shuhao (shuhao) wrote :

The bug report here is likely inaccurate. I don't exactly know where the problem is.

I did some tcpdumps of port 53 traffic. When I run ping, it requests the A records of the domain name, and the IP addresses are indeed returned. However, weirdly, ping then requests the A records of pod51041.outlook.com.lan. .lan is the search domain on my network, set automatically in /etc/resolv.conf.
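
The follow-up query for pod51041.outlook.com.lan is consistent with how a resolver search list is usually applied: when the name as typed cannot be resolved, the configured search domains are appended and tried in turn. The sketch below is a simplified model of the `search` and `ndots` options from resolv.conf(5), not the actual glibc code, and the function name is hypothetical.

```python
def candidate_names(name: str, search: list, ndots: int = 1) -> list:
    """List the names a stub resolver would try, in order (simplified)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list is used.
        return [name[:-1]]
    as_is = [name]
    with_search = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as typed first, then the search domains.
        return as_is + with_search
    # Too few dots: try the search domains first, then the name as typed.
    return with_search + as_is


print(candidate_names("pod51041.outlook.com", ["lan"]))
print(candidate_names("printer", ["lan"]))
```

In this model the `.lan` lookup only happens after the lookup of the original name has failed, which suggests the failure lies in the first lookup rather than in the search domain itself.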

Furthermore, if I disable ipv6, thunderbird and firefox can access the domain, but ping still cannot.

So I don't think this bug report should be filed against systemd, but I don't really know where the problem lies.

description: updated
summary: - systemd-resolved sends SERVFAIL to host/nslookup for
- pod51041.outlook.com
+ Cannot ping pod51041.outlook.com but can dig.
Shuhao (shuhao) wrote : Re: Cannot ping pod51041.outlook.com but can dig.

Additionally, I looked at my nsswitch.conf file. The hosts line in it is:

hosts: files mdns4_minimal [NOTFOUND=return] dns.

I tried hosts: files dns, and all the combinations, but nothing works.

Shuhao (shuhao) wrote :

This seems to be a duplicate of #1728560

Changed in systemd (Ubuntu):
status: New → Invalid
status: Invalid → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon)
summary: - Cannot ping pod51041.outlook.com but can dig.
+ systemd-resolved fails to fall back to TCP for large records (Cannot
+ ping pod51041.outlook.com but can dig.)
Steve Langasek (vorlon) wrote : Re: systemd-resolved fails to fall back to TCP for large records (Cannot ping pod51041.outlook.com but can dig.)

I can confirm this problem. 'dig' works because by default it's only asking for A records; but applications on ipv6-enabled clients will ask for both A and AAAA records, and if I query AAAA for this name, the response is too big to fit in a udp packet:

$ nslookup -q=aaaa pod51041.outlook.com 192.168.15.1
;; Truncated, retrying in TCP mode.
Server: 192.168.15.1
Address: 192.168.15.1#53

Non-authoritative answer:
pod51041.outlook.com has AAAA address 2603:1036:d02::2
pod51041.outlook.com has AAAA address 2603:1036:d02:6::2
pod51041.outlook.com has AAAA address 2603:1036:d02:7::2
pod51041.outlook.com has AAAA address 2a01:111:f400:5201::2
pod51041.outlook.com has AAAA address 2a01:111:f400:f370::2
pod51041.outlook.com has AAAA address 2603:1036:3:cc::2
pod51041.outlook.com has AAAA address 2603:1036:3:108::2
pod51041.outlook.com has AAAA address 2603:1036:4:6f::2
pod51041.outlook.com has AAAA address 2603:1036:4:71::2
pod51041.outlook.com has AAAA address 2603:1036:101:3a::2
pod51041.outlook.com has AAAA address 2603:1036:102:53::2
pod51041.outlook.com has AAAA address 2603:1036:102:cb::2
pod51041.outlook.com has AAAA address 2603:1036:405:3b::2
pod51041.outlook.com has AAAA address 2603:1036:804:1::2
pod51041.outlook.com has AAAA address 2603:1036:804:a::2
pod51041.outlook.com has AAAA address 2603:1036:902:a3::2
pod51041.outlook.com has AAAA address 2603:1036:906:4f::2
pod51041.outlook.com has AAAA address 2603:1036:d01:1::2

Authoritative answers can be found from:
outlook.com nameserver = ns2.msft.net.
outlook.com nameserver = ns3.msft.net.
outlook.com nameserver = ns1.msft.net.
outlook.com nameserver = ns2a.o365filtering.com.
outlook.com nameserver = ns4.msft.net.
outlook.com nameserver = ns1a.o365filtering.com.
outlook.com nameserver = ns4a.o365filtering.com.
ns1.msft.net internet address = 208.84.0.53
ns1.msft.net has AAAA address 2620:0:30::53
ns2.msft.net internet address = 208.84.2.53
ns2.msft.net has AAAA address 2620:0:32::53
ns3.msft.net internet address = 193.221.113.53
ns3.msft.net has AAAA address 2620:0:34::53
ns4.msft.net internet address = 208.76.45.53
ns4.msft.net has AAAA address 2620:0:37::53
ns1a.o365filtering.com internet address = 157.56.110.11
ns2a.o365filtering.com internet address = 157.56.116.52
ns4a.o365filtering.com internet address = 157.55.133.11

$
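
A rough size estimate shows why the AAAA answer is truncated while the A answer is not. The sketch below assumes the classic 512-byte UDP limit from RFC 1035 (no EDNS0), a 2-byte compression pointer for each answer's owner name, and it counts only the answer section, ignoring the authority and additional records shown above. The record counts are taken from the outputs quoted in this report (19 A records, 18 AAAA records).

```python
HEADER_SIZE = 12
CLASSIC_UDP_LIMIT = 512  # RFC 1035 limit for UDP messages without EDNS0


def question_size(name: str) -> int:
    """Size of the question: encoded name plus QTYPE and QCLASS."""
    labels = name.rstrip(".").split(".")
    encoded_name = sum(1 + len(label) for label in labels) + 1  # root byte
    return encoded_name + 4


def answer_section_size(count: int, rdata_len: int) -> int:
    """Each record: 2-byte name pointer, 10 fixed bytes, then the RDATA."""
    return count * (2 + 10 + rdata_len)


def estimated_size(name: str, count: int, rdata_len: int) -> int:
    """Header plus question plus answers, under the stated assumptions."""
    return HEADER_SIZE + question_size(name) + answer_section_size(count, rdata_len)


name = "pod51041.outlook.com"
a_size = estimated_size(name, 19, 4)      # IPv4 address: 4 bytes of RDATA
aaaa_size = estimated_size(name, 18, 16)  # IPv6 address: 16 bytes of RDATA

for label, size in (("A", a_size), ("AAAA", aaaa_size)):
    verdict = "fits" if size <= CLASSIC_UDP_LIMIT else "exceeds the limit"
    print(f"{label}: {size} bytes, {verdict}")
```

Under these assumptions the A answer comes to 342 bytes and the AAAA answer to 542 bytes, so only the latter crosses the 512-byte limit.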

If I try this against systemd-resolved, I see:

$ nslookup -q=aaaa pod51041.outlook.com
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
;; Connection to 127.0.0.53#53(127.0.0.53) for pod51041.outlook.com failed: connection refused.

$

So the problem is that systemd-resolved is not handling tcp requests at all.
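
A quick way to confirm this on a given machine is to try a plain TCP connection to the stub address. This is a minimal sketch using only the standard library; the function name is hypothetical.

```python
import socket


def tcp_port_accepts(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.53 is the systemd-resolved stub address seen in this report.
    print(tcp_port_accepts("127.0.0.53", 53))
```

On an affected system this would be expected to print False, matching the "connection refused" message from nslookup.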

Changed in systemd (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
summary: - systemd-resolved fails to fall back to TCP for large records (Cannot
- ping pod51041.outlook.com but can dig.)
+ systemd-resolved does not listen on TCP port, cannot serve large records
+ (Cannot ping pod51041.outlook.com but can dig.)
Steve Langasek (vorlon) wrote :

According to https://github.com/systemd/systemd/issues/6520 this can be worked around by setting DNSStubListener=yes in /etc/systemd/resolved.conf. TCP listening is disabled by default due to <https://github.com/systemd/systemd/pull/4061>.

It is not ideal to have systemd-resolved conflict with other nameservers listening on 0.0.0.0:53, but as a default behavior of systemd-resolved in Ubuntu, barring any other upstream fix for <https://github.com/systemd/systemd/issues/6520>, this should be our fallback position for bionic.

Changed in systemd:
status: Unknown → Fix Released
Daniel Richard G. (skunk) wrote :

Steve, Bionic still has the default (commented-out)

    #DNSStubListener=udp

in /etc/systemd/resolved.conf .

I've noticed that this breaks Kerberos KDC lookup at a large site, because the reply is quite large:

    # host -t SRV _kerberos._udp.xxx.example.com
    ;; Connection to 127.0.0.53#53(127.0.0.53) for _kerberos._udp.xxx.example.com failed: connection refused.

    # kinit <email address hidden>
    kinit: Cannot find KDC for realm "XXX.EXAMPLE.COM" while getting initial credentials

After setting DNSStubListener=yes:

    # host -t srv _kerberos._udp.xxx.example.com
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx01.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx02.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx03.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx04.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx05.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx06.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx07.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx08.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx09.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx10.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx11.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx12.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx13.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx14.xxx.example.com.
    _kerberos._udp.xxx.example.com has SRV record 0 100 88 xxxxxxx15.xxx.example.com.

    # kinit <email address hidden>
    Password for <email address hidden>:

Dimitri John Ledkov (xnox) wrote :

This has totally slipped my radar, I'm sorry.

I will ensure this lands into bionic 18.04.0.

Changed in systemd (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: none → ubuntu-18.04
Changed in systemd (Ubuntu Artful):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: none → artful-updates
status: New → Triaged
importance: Undecided → High
Daniel Richard G. (skunk) wrote :

Thanks Dimitri, greatly appreciated. I haven't found many problems in my testing of Bionic, but this is the juiciest one so far.

description: updated
description: updated
Changed in systemd (Ubuntu Bionic):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 237-3ubuntu8

---------------
systemd (237-3ubuntu8) bionic; urgency=medium

  * Workaround captive portals not responding to EDNS0 queries (DVE-2018-0001).
    (LP: #1727237)
  * resolved: Listen on both TCP and UDP by default. (LP: #1731522)
  * Recommend networkd-dispatcher (LP: #1762386)
  * Refresh patches

 -- Dimitri John Ledkov <email address hidden> Thu, 12 Apr 2018 12:12:24 +0100

Changed in systemd (Ubuntu Bionic):
status: Fix Committed → Fix Released