Connections reset when downloading snaps from CDN

Bug #1617765 reported by Loïc Minier
42
This bug affects 9 people
Affects Status Importance Assigned to Milestone
Snapcraft
Fix Released
Undecided
Unassigned
Software Center Agent
Fix Released
Undecided
Unassigned
snapd (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

Hi,

From a VM running on a hosted server at 195.154.102.74:
(amd64)ubuntu@localhost:~$ sudo snappy install docker/edge
Installing docker/edge
Starting download of docker
1.83 MB / 53.13 MB [>______________________________________] 3.44 % 6.77 MB/s 7s
Done
docker/edge failed to install: read tcp 77.242.195.170:443: connection reset by peer
(amd64)ubuntu@localhost:~$ sudo snappy install docker/edge
Installing docker/edge
Starting download of docker
3.56 MB / 53.13 MB [=>_____________________________________] 6.70 % 6.76 MB/s 7s
Done
docker/edge failed to install: read tcp 77.242.195.167:443: connection reset by peer
(amd64)ubuntu@localhost:~$ sudo snappy install docker/edge
Installing docker/edge
Starting download of docker
10.73 MB / 53.13 MB [======>______________________________] 20.20 % 7.90 MB/s 5s
Done
docker/edge failed to install: read tcp 77.242.195.174:443: connection reset by peer

Installing a smaller snap works:
(amd64)ubuntu@localhost:~$ sudo snappy install hello-world
Installing hello-world
Starting download of hello-world
21.60 KB / 21.60 KB [=====================================] 100.00 % 26.01 MB/s
Done
Starting download of icon for package
33.77 KB / 33.77 KB [======================================] 100.00 % 2.01 MB/s
Done
Name Date Version Developer
ubuntu-core 2016-08-11 17 ubuntu
hello-world 2016-08-28 1.0.18 canonical
webdm 2016-01-20 0.11 canonical
generic-amd64 2016-01-20 1.4 canonical

This is with snappy 15.04; vm was imported with:
virt-install --name 15.04-docker --ram 512 --vcpus 1 --disk path=$PWD/ubuntu-15.04-snappy-amd64+generic.img --import --noautoconsole --autostart --vnc

Oddly, if I run the same VM manually, it works:
sudo kvm -m 512 -redir :4200::4200 -redir :8090::80 -redir :8022::22 -nographic ubuntu-15.04-snappy-amd64+generic.img-docker

So I guess it's a weird double-NAT issue of some kind? It seems like it could affect other people, so in doubt I'm filing this even if I have a workaround.

Cheers,
- Loïc Minier

Revision history for this message
Loïc Minier (lool) wrote :

Trying again some minutes later, this is now working; I dont know how worrying this, please close if you think this is not worth investigating further. :-)

Just to make sure this wasn't caused by excessive network traffic / load, I've run this:
curl.curl http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img > foo
successfully in the virsh vm at ~5.4 MB/s.

Cheers,
- Loïc

Revision history for this message
Loïc Minier (lool) wrote :

Got it again today for a couple of minutes.

Revision history for this message
Loïc Minier (lool) wrote :

Sample IP from this morning:
- Download snap "docker" (33) from channel "edge" (read tcp 192.168.122.208:48044->77.242.195.167:443: read: connection reset by peer)

Revision history for this message
Loïc Minier (lool) wrote :

Still an annoying issue, particularly when trying to create images:
 5.77 MB / 74.40 MB [==>----------------------------------] 7.75% 10.29 MB/s 0
read tcp 192.168.122.208:56206->77.242.195.174:443: read: connection reset by peer

It happens most of the time for me unfortunately.

Revision history for this message
Bret Barker (noise) wrote :

Loic,

I have to figure this is something with your networking setup, as we are not getting reports elsewhere about conn resets. We do get occasional 500s, which is to be expected, but that's different.

Is there any consistency in the timing from start to the reset?

Revision history for this message
Janne Snabb (snabb) wrote :

I am hitting this issue too on Ubuntu 16.04 amd64 and using snapd package version 2.15.2ubuntu1.

So this is with the newer snap command but the result seems exactly the same.

Repeating does not help. I always get this:

snabb@chmaa:~$ sudo snap install canonical-livepatch
3.95 MB / 74.93 MB [==>_______________________________________________________________] 5.28 % 17.20 MB/s 4s

error: cannot perform the following tasks:
- Download snap "ubuntu-core" (423) from channel "stable" (read tcp 172.31.1.100:51356->95.172.71.44:443: read: connection reset by peer)

Every attempt produces the following syslog entries:

Oct 18 21:54:13 chmaa /usr/lib/snapd/snapd[3547]: api.go:770: Installing snap "canonical-livepatch" revision unset
Oct 18 21:54:13 chmaa snapd[3547]: 2016/10/18 21:54:13.334759 api.go:770: Installing snap "canonical-livepatch" revision unset
Oct 18 21:54:13 chmaa /usr/lib/snapd/snapd[3547]: daemon.go:174: DEBUG: uid=0;@ POST /v2/snaps/canonical-livepatch 253.27049ms 202
Oct 18 21:54:13 chmaa /usr/lib/snapd/snapd[3547]: taskrunner.go:293: DEBUG: Running task 183 on Do: Download snap "ubuntu-core" (423) from channel "stable"
Oct 18 21:54:14 chmaa /usr/lib/snapd/snapd[3547]: task.go:273: DEBUG: 2016-10-18T21:54:14Z ERROR read tcp 172.31.1.100:48050->95.172.71.39:443: read: connection reset by peer

I wonder if this is actually a problem on the server side? Temporary overload or some other malfunction.

It does not seem like a client side issue as we are getting the same result with two different client programs (snappy and snap). snap is implemented with Go and snappy is written in C so they probably don't share much if any of the code base.

I am familiar with my network environment - there is nothing which could be sending TCP RST.

Revision history for this message
Loïc Minier (lool) wrote :

I tried to debug this a bit on Thursday; it would happen almost all the time when running snapcraft in a lxd container. This meant:
- not specific to libvirt/kvm – same issue with lxd
- not specific to snapd client (Go) – same issue with snapcraft client (Python)

I then printed the macaroon and could reproduce the issue with httpie:
http --download --follow GET https://public.apps.ubuntu.com/downloadsnap/b8X2psL1ryVrPt5WEmpYiqfr5emixTd7_890.snap "Authorization:$auth"
where auth="Macaroon root=XYZ..., discharge=ABC..."

With httpie, I would reproduce the TCP RST issue almost all the time, but it would work 100% of the time when run from the host instead of within a lxd container.

Running the same queries from a home server, inside or outside lxd, it worked 100% of the time too. I might be hitting another CDN server and I might have different latency too (I have about the same amount of bandwidth to the CDN contents though).

Lastly, I took tcpdump traces of the issue and ran them through tcptrace which to the naked eye didn't show any obvious issue (no MSS, no high number of retransmits)...

Revision history for this message
Janne Snabb (snabb) wrote :

I have found several other mentions of this issue:

https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1612882
https://irclogs.ubuntu.com/2016/04/26/%23snappy.html
https://irclogs.ubuntu.com/2016/08/30/%23snappy.html

Clearly some other people are affected too, not just the two of us. :)

In some of the above links there is a workaround (for snapd case):

apt-get purge snapd
apt-get install snapd

After my 1st attempt the problem remained. However after doing the above for a 2nd time the problem went away!

Maybe it picked another non-broken CDN node after the 2nd reinstall? Or something else similar to that?

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in snapd (Ubuntu):
status: New → Confirmed
Revision history for this message
Sergio Schvezov (sergiusens) wrote :

This affects classic confined builds in CI as well as we download the core snap to get the linker and others.

Revision history for this message
Kyle Fazzari (kyrofa) wrote :

Just got this twice from an lxc container here. It succeeded the third time. The mtr report to one of the failed IP addresses looks like this:

$ mtr --report 64.94.115.6
Start: Mon Apr 17 20:10:51 2017
HOST: growing-urchin Loss% Snt Last Avg Best Wrst StDev
  1.|-- 10.218.103.1 0.0% 10 0.1 0.1 0.1 0.2 0.0
  2.|-- router.asus.com 0.0% 10 0.9 1.0 0.9 1.8 0.0
  3.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
  4.|-- dtr01wlwlwa-tge-0-2-0-0.w 0.0% 10 11.5 13.4 8.5 37.3 8.5
  5.|-- dtr04knwcwa-bue-1.knwc.wa 10.0% 10 20.1 12.3 10.0 20.1 3.0
  6.|-- dtr03knwcwa-bue-2.knwc.wa 0.0% 10 10.9 11.7 10.1 15.2 1.5
  7.|-- 96-34-111-218.static.unas 0.0% 10 15.6 14.4 12.9 16.1 0.9
  8.|-- 96-34-108-178.static.unas 0.0% 10 19.8 18.4 13.5 22.3 3.1
  9.|-- bbr01sttlwa-tge-0-1-0-4.s 0.0% 10 33.6 23.2 17.3 33.6 5.7
 10.|-- prr01sttlwa-bue-1.sttl.wa 0.0% 10 17.2 17.8 15.3 25.2 2.7
 11.|-- seattle-ix.pnap.net 0.0% 10 17.7 16.1 15.2 17.9 0.8
 12.|-- core1.po2.inapvox-31.sea. 0.0% 10 17.4 19.2 17.4 22.5 1.6
 13.|-- border2.ae1-bbnet1.sef.pn 0.0% 10 20.9 19.9 17.3 27.6 2.9
 14.|-- cdn-gw1.ve2.sef004.intern 0.0% 10 15.7 16.8 15.7 18.0 0.6
 15.|-- cdce.sef004.internap.com 0.0% 10 17.9 19.0 16.9 21.7 1.1

Revision history for this message
Kyle Fazzari (kyrofa) wrote :

Oh my. That looks dreadful.

Revision history for this message
Kyle Fazzari (kyrofa) wrote :
Revision history for this message
Игорь (ifree92) wrote :

Hmm... something wrong.

➜ ~ mtr --report 95.172.71.42
Start: Tue Apr 18 00:07:34 2017
HOST: ifreevmw Loss% Snt Last Avg Best Wrst StDev
  1.|-- 192.168.152.2 0.0% 10 0.3 0.5 0.2 1.4 0.0
  2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
➜ ~

Revision history for this message
Игорь (ifree92) wrote :

but..

➜ ~ ping -c 5 95.172.71.42
PING 95.172.71.42 (95.172.71.42) 56(84) bytes of data.
64 bytes from 95.172.71.42: icmp_seq=1 ttl=128 time=37.6 ms
64 bytes from 95.172.71.42: icmp_seq=2 ttl=128 time=38.3 ms
64 bytes from 95.172.71.42: icmp_seq=3 ttl=128 time=38.9 ms
64 bytes from 95.172.71.42: icmp_seq=4 ttl=128 time=39.5 ms
64 bytes from 95.172.71.42: icmp_seq=5 ttl=128 time=39.4 ms

--- 95.172.71.42 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4010ms
rtt min/avg/max/mdev = 37.688/38.803/39.543/0.715 ms

Revision history for this message
Игорь (ifree92) wrote :

The cause of my post was a problem with installation any package.

➜ snap --version
snap 2.22.6
snapd 2.22.6
series 16
ubuntu 16.04
kernel 4.8.0-46-generic
➜ sudo snap install minesweeper
error: cannot perform the following tasks:
- Download snap "core" (1577) from channel "stable" (read tcp 192.168.152.130:40748->95.172.71.42:443: read: connection reset by peer)
➜ tmp sudo snap install uappexplorer-cli
error: cannot perform the following tasks:
- Download snap "core" (1577) from channel "stable" (read tcp 192.168.152.130:50130->95.172.71.38:443: read: connection reset by peer)

Revision history for this message
Michael Vogt (mvo) wrote :

IMO we should at least retry on connection reset: https://github.com/snapcore/snapd/pull/3197

Revision history for this message
Robert Kasanický (robokaso) wrote :

Is there a workaround solution to this problem? I suppose downloading the snap manually and installing from a local file should do the trick (though I can't find any instructions how to actually do that).

Revision history for this message
Sergio Schvezov (sergiusens) wrote :

this was alleviated on the snapcraft side a while back and later on, the store CDN was fixed to not cause these errors anymore.

Changed in snapcraft:
status: New → Fix Released
Changed in software-center-agent:
status: New → Fix Released
Revision history for this message
Usman Khwaja (u-khwaja) wrote :

I am getting this on Fedore 29 for a couple of hours. Cannot install any software.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.