[1.10-30] Traffic Drop seen in a transparent service-chain case when one of the Service VMs is deleted

Bug #1364908 reported by Ganesha HV
This bug affects 1 person
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  Fix Committed  Critical    Naveen N
R1.1               Fix Committed  Critical    Naveen N

Bug Description

1]. Setup:
========
nodea26 - cfgm
10.204.216.140 & 10.204.216.141 - ctrl
nodeg16 & nodeg26 - compute

2]. Created a service-chain between left-vn (10.10.10.0/24) and right-vn (20.20.20.0/24) by applying a Service-Instance 'trans-si-1' with 3 Service VMs, all in Transparent mode.

3]. Started a ping from left-vm(10.10.10.5) to right-vm(20.20.20.2)

4]. Deleted Service VM 'trans-si-1_3' in Service Instance 'trans-si-1'.

5]. Ping between left-vm and right-vm fails.

The following flow stats are seen on nodeg26, which hosts the left-vm:

root@nodeg26:~# flow -l
Flow table

 Index Source:Port Destination:Port Proto(V)
-------------------------------------------------------------------------
 53776<=>286204 10.10.10.5:1522 20.20.20.2:0 1 (6)
        (K(nh):101, Action:F, S(nh):35, Statistics:2116/177744)

196276<=>253160 20.20.20.2:1522 10.10.10.5:0 1 (5)
        (K(nh):74, Action:F, S(nh):44, Statistics:217/18228)

253160<=>196276 10.10.10.5:1522 20.20.20.2:0 1 (1->5)
        (K(nh):10, Action:F, E:1, S(nh):10, Statistics:2116/177744)

286204<=>53776 20.20.20.2:1522 10.10.10.5:0 1 (6)
        (K(nh):101, Action:F, S(nh):35, Statistics:0/0)
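The flow table above can be checked mechanically. The following is a minimal sketch, not part of any Contrail tooling: it parses text shaped like the `flow -l` output pasted here and lists flows that carry packets while their paired reverse flow shows 0 packets (as with 53776 and 286204). The function names and the parsing regexes are hypothetical and assume the two-line entry layout shown in this report.

```python
import re

# Sample copied from the flow table in this report.
SAMPLE = """\
 53776<=>286204 10.10.10.5:1522 20.20.20.2:0 1 (6)
        (K(nh):101, Action:F, S(nh):35, Statistics:2116/177744)

196276<=>253160 20.20.20.2:1522 10.10.10.5:0 1 (5)
        (K(nh):74, Action:F, S(nh):44, Statistics:217/18228)

253160<=>196276 10.10.10.5:1522 20.20.20.2:0 1 (1->5)
        (K(nh):10, Action:F, E:1, S(nh):10, Statistics:2116/177744)

286204<=>53776 20.20.20.2:1522 10.10.10.5:0 1 (6)
        (K(nh):101, Action:F, S(nh):35, Statistics:0/0)
"""

# First line of an entry: index<=>reverse-index src dst proto (vrf)
FLOW_RE = re.compile(r"^\s*(\d+)<=>(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+\(([^)]*)\)\s*$")
# Second line of an entry carries the packet/byte counters.
STATS_RE = re.compile(r"Statistics:(\d+)/(\d+)")


def parse_flows(text):
    """Map flow index -> reverse index, endpoints and packet count."""
    flows = {}
    current = None
    for line in text.splitlines():
        m = FLOW_RE.match(line)
        if m:
            current = int(m.group(1))
            flows[current] = {
                "reverse": int(m.group(2)),
                "src": m.group(3),
                "dst": m.group(4),
                "packets": None,
            }
            continue
        s = STATS_RE.search(line)
        if s and current is not None:
            flows[current]["packets"] = int(s.group(1))
    return flows


def idle_reverse_flows(flows):
    """Return (index, reverse) pairs where index has packets but reverse has none."""
    return [
        (idx, f["reverse"])
        for idx, f in sorted(flows.items())
        if f["packets"] and flows.get(f["reverse"], {}).get("packets") == 0
    ]


if __name__ == "__main__":
    print(idle_reverse_flows(parse_flows(SAMPLE)))
```

A zero counter on a reverse flow is only a hint about where to look next; it does not by itself identify the cause.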

6]. Dropstats show an increase in Invalid Source between two consecutive runs:

Checksum errors 0
No Fmd 0
Ivalid VNID 0
Fragment errors 0
Invalid Source 3500

root@nodeg26:~# dropstats
.
.
Checksum errors 0
No Fmd 0
Ivalid VNID 0
Fragment errors 0
Invalid Source 3502
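To compare two dropstats snapshots without reading them by eye, a small helper along these lines could be used. It is a sketch under the assumption that each counter line is a name followed by an integer, as in the excerpts above; `parse_dropstats` and `increased` are hypothetical names, and counter names are kept exactly as printed.

```python
BEFORE = """\
Checksum errors 0
No Fmd 0
Ivalid VNID 0
Fragment errors 0
Invalid Source 3500
"""

AFTER = """\
.
.
Checksum errors 0
No Fmd 0
Ivalid VNID 0
Fragment errors 0
Invalid Source 3502
"""


def parse_dropstats(text):
    """Parse 'name value' lines into a dict; lines without a trailing integer are skipped."""
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the last whitespace-separated token
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0].strip()] = int(parts[1])
    return counters


def increased(before, after):
    """Return counters present in both snapshots that grew, with their deltas."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    print(increased(parse_dropstats(BEFORE), parse_dropstats(AFTER)))
```

On the two snapshots in this report, the only counter that grows is Invalid Source.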

Getting the route to right-vm(20.20.20.2/32) in vrf 5:

root@nodeg26:~# rt --dump 5 | grep 20.20.20.2
20.20.20.2/32 32 - 44

Getting the nh for 44 shows that it is a Composite next-hop:

root@nodeg26:~# nh --get 44
Id:044 Type:Composite Fmly: AF_INET Flags:Valid, Policy, Ecmp, Rid:0 Ref_cnt:3
        Sub NH(label): 35(56) 74(44)

Since the flow shows an ECMP Index of 1 (E:1), the component nh to check is the second sub NH in the list, 74:

root@nodeg26:~# nh --get 74
Id:074 Type:Encap Fmly: AF_INET Flags:Valid, Rid:0 Ref_cnt:4
        EncapFmly:0806 Oif:13 Len:18 Data:02 00 00 00 00 02 02 00 00 00 00 01 81 00 00 01 08 00

The Oif (13) points to the left-interface of trans-si-1_2:

root@nodeg26:~# vif --get 13
vif0/13 OS: tap04d31895-82
            Type:Virtual HWaddr:00:00:5e:00:01:00 IPaddr:0
            Vrf:2 Flags:SL3L2D MTU:9160 Ref:7
            RX packets:1807 bytes:184314 errors:0
            TX packets:3884 bytes:395808 errors:0

VRF table(vlan:vrf):
1:5,
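The lookup chain traced above (route in vrf 5, then composite nh 44, then the component selected by the flow's ECMP index) can be sketched as a toy model. This is not vrouter code: the tables are hand-copied from the outputs in this report, `resolve` is a hypothetical helper, and treating the ECMP index as a zero-based position in the Sub NH list is an assumption that matches the step from index 1 to nh 74 taken here.

```python
# Toy model of the tables seen on nodeg26 (values copied from this report).
ROUTES = {(5, "20.20.20.2/32"): 44}          # rt --dump 5
COMPOSITE_NH = {44: [35, 74]}                # nh --get 44: Sub NH list, in order
NH_TYPE = {35: "Tunnel", 74: "Encap", 44: "Composite"}


def resolve(vrf, prefix, ecmp_index):
    """Return the next-hop id a flow would use for (vrf, prefix).

    Assumption: for a composite nh, ecmp_index is a zero-based position
    in the Sub NH list; for any other nh the index is ignored.
    """
    nh = ROUTES[(vrf, prefix)]
    members = COMPOSITE_NH.get(nh)
    if members is None:
        return nh
    return members[ecmp_index]


if __name__ == "__main__":
    nh = resolve(5, "20.20.20.2/32", ecmp_index=1)
    print(nh, NH_TYPE[nh])
```

Under the same assumption, index 0 would select nh 35, the Tunnel next-hop.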

Packets are seen exiting the right-interface of trans-si-1_2 as well:

root@nodeg26:~# tcpdump -eni tapa3d7a4b4-e5
tcpdump: WARNING: tapa3d7a4b4-e5: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tapa3d7a4b4-e5, link-type EN10MB (Ethernet), capture size 65535 bytes
15:17:30.014330 02:00:00:00:00:01 > 02:00:00:00:00:02, ethertype 802.1Q (0x8100), length 102: vlan 1, p 0, ethertype IPv4, 10.10.10.5 > 20.20.20.2: ICMP echo request, id 1522, seq 2299, length 64
15:17:31.022456 02:00:00:00:00:01 > 02:00:00:00:00:02, ethertype 802.1Q (0x8100), length 102: vlan 1, p 0, ethertype IPv4, 10.10.10.5 > 20.20.20.2: ICMP echo request, id 1522, seq 2300, length 64

The nh of 35 seen in the flow shows it to be a Tunnel, which is incorrect:

root@nodeg26:~# nh --get 35
Id:035 Type:Tunnel Fmly: AF_INET Flags:Valid, MPLSoUDP, Rid:0 Ref_cnt:22
        Oif:0 Len:14 Flags Valid, MPLSoUDP, Data:00 25 90 c4 76 bd 00 25 90 c5 59 45 08 00
        Vrf:0 Sip:22.22.22.26 Dip:22.22.22.16

Naveen and Praveen are aware of the issue.

I have kept the gcore at:

http://mayamruga.englab.juniper.net/bugs/<bug-ID>

Ganesha HV (ganeshahv)
summary: - [1.10-30] Ping fails in a transparent service-chain case when one of the
- Service VMs is deleted
+ [1.10-30] Traffic Drop seen in a transparent service-chain case when one
+ of the Service VMs is deleted
information type: Proprietary → Public
tags: added: releasenote
Ganesha HV (ganeshahv) wrote :

Observation
==========
1]. Instead of deleting the SVMs, I shut down (suspended) them. This changed the setup from ECMP to non-ECMP.
2]. Powered the SVMs back on, and ECMP kicked in again.
3]. Saw traffic loss because the ECMP Index is not set in the reverse flow.

Praveen and Naveen are aware of the issue.

Hari Prasad Killi (haripk) wrote :
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/3453
Committed: http://github.org/Juniper/contrail-controller/commit/879c3539a26cd6fed0df55642ad4c96112c25127
Submitter: Zuul
Branch: master

commit 879c3539a26cd6fed0df55642ad4c96112c25127
Author: Naveen N <email address hidden>
Date: Sun Oct 5 23:38:47 2014 -0700

* Go thru component NH list, and pick local component NH
if there is no ecmp peer path or local ecmp mpls label.
* If a link to instance-ip is not present, retain previous
known mode inline with ip-address.
* If flow transition from non-ecmp to ecmp trap the forward flow
* Test case for same added
Closes-bug: #1364908

Change-Id: If6f78b8cc7b3a62b2bb1cd7b0d30b38aaad545c7

Naveen N (naveenn)
Changed in juniperopenstack:
status: New → Fix Committed