Octavia load balancers fail to load balance services

Bug #1707180 reported by Antoni Segura Puimedon
This bug affects 1 person
Affects: kuryr-kubernetes
Status: Fix Released
Importance: Critical
Assigned to: Antoni Segura Puimedon
Milestone: pike-3

Bug Description

How to reproduce:

This is the local.conf I used for devstack:

http://paste.openstack.org/show/616754/
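
The full configuration is in the paste above. For context, the Octavia-relevant bits of such a local.conf are typically along these lines (a sketch using the standard devstack plugin hooks, not the actual paste contents):

[[local|localrc]]
# enable the Octavia plugin and its services next to kuryr-kubernetes
enable_plugin kuryr-kubernetes https://git.openstack.org/openstack/kuryr-kubernetes
enable_plugin octavia https://git.openstack.org/openstack/octavia
enable_service octavia o-api o-cw o-hm o-hk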

After it finishes stacking:

1. kubectl run --image=celebdor/kuryr-demo kuryr-demo
2. kubectl scale deploy kuryr-demo --replicas=2
3. kubectl expose deploy/kuryr-demo --port 80 --target-port 8080
4. kubectl get svc kuryr-demo

Now that you have the IP of the svc (let's say 10.0.0.174):

5. kubectl exec into a pod and curl 10.0.0.174
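
For reference, step 5 boils down to something like the following (pod name taken from the port listing further down; this assumes the celebdor/kuryr-demo image ships curl):

# curl the service VIP from inside one of the pods
kubectl get pods -o wide
kubectl exec -it kuryr-demo-3222597813-jj3bd -- curl 10.0.0.174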

Expected behavior:

A message like "kuryr-demo-3222597813-jj3bd: HELLO, I AM ALIVE!!!"

Actual behavior:

Request times out.

Possible causes:

The pod ports' project is 'k8s', which is the same as the VIP port project:

[centos@octavia ~]$ openstack port list --project k8s
+--------------------------------------+---------------------------------------------------+-------------------+---------------------------------------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+---------------------------------------------------+-------------------+---------------------------------------------------------------------------+--------+
| 1e70d516-5cc1-4e17-a0eb-598abfca245a | loadbalancer-72f2f854-8a3a-41ac-8aa3-a0403107e1e5 | fa:16:3e:5f:9e:da | ip_address='10.0.0.174', subnet_id='96e4c70d-10fe-4b39-af8c-38e43bb37935' | DOWN |
| 49dde037-416e-4e0c-8230-05e6851fa3bc | kubelet-octavia | fa:16:3e:64:e3:f6 | ip_address='10.0.0.71', subnet_id='6659a632-04f1-4e00-bdac-f8b14b39976a' | ACTIVE |
| 90d615e3-d22c-40f2-b6fa-0b0d764151e3 | kuryr-demo-3222597813-jj3bd | fa:16:3e:bb:34:96 | ip_address='10.0.0.74', subnet_id='6659a632-04f1-4e00-bdac-f8b14b39976a' | ACTIVE |
| c3295f7c-50fb-4841-8bac-fa43092771d9 | centos | fa:16:3e:ba:8a:f7 | ip_address='10.0.0.70', subnet_id='6659a632-04f1-4e00-bdac-f8b14b39976a' | ACTIVE |
| e43f7b17-d9eb-42cc-9724-c22743de5889 | kuryr-demo-3222597813-xv91l | fa:16:3e:1b:5d:f3 | ip_address='10.0.0.75', subnet_id='6659a632-04f1-4e00-bdac-f8b14b39976a' | ACTIVE |
+--------------------------------------+---------------------------------------------------+-------------------+---------------------------------------------------------------------------+--------+

But the actual port in the k8s service subnet is in the admin project:

| bd93494b-18f8-408f-b356-4ed23f494fe0 | octavia-lb-vrrp-344a406a-144d-4324-a0cc-97b47f5c58ea | fa:16:3e:5c:44:dc | ip_address='10.0.0.133', subnet_id='96e4c70d-10fe-4b39-af8c-38e43bb37935' | ACTIVE |

This is not necessarily a problem by itself, but it means that the 'default' security group assigned to bd93494b-18f8-408f-b356-4ed23f494fe0 is not the same as the 'default' security group assigned to the pods, so traffic from haproxy to the pods will be blocked.
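
Forcing the amphora port to carry the pods' security group, as tried below, can be done roughly like this, where <k8s-default-sg-id> stands for the id of the k8s project's 'default' security group:

# workaround sketch: replace the amphora port's SG with the one the pods use
openstack security group list --project k8s
openstack port set --no-security-group bd93494b-18f8-408f-b356-4ed23f494fe0
openstack port set --security-group <k8s-default-sg-id> bd93494b-18f8-408f-b356-4ed23f494fe0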

Another issue is that even after forcing the bd93494b-18f8-408f-b356-4ed23f494fe0 port to have the same security group as the pods, requests to the VIP still do not reach the destination pods. See this tcpdump (in the amphora haproxy netns) of curling the VIP from a different pod (same sg):

root@amphora-344a406a-144d-4324-a0cc-97b47f5c58ea:~# tcpdump -i any -vvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
^C11:18:40.238776 IP (tos 0x0, ttl 63, id 17321, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.70.37368 > 10.0.0.174.http: Flags [S], cksum 0xe48b (correct), seq 2387754236, win 28200, options [mss 1410,sackOK,TS val 60718933 ecr 0,nop,wscale 7], length 0
11:18:41.241038 IP (tos 0x0, ttl 63, id 17322, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.70.37368 > 10.0.0.174.http: Flags [S], cksum 0x1522 (incorrect -> 0xe0a0), seq 2387754236, win 28200, options [mss 1410,sackOK,TS val 60719936 ecr 0,nop,wscale 7], length 0
11:18:43.245746 IP (tos 0x0, ttl 63, id 17323, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.70.37368 > 10.0.0.174.http: Flags [S], cksum 0x1522 (incorrect -> 0xd8cc), seq 2387754236, win 28200, options [mss 1410,sackOK,TS val 60721940 ecr 0,nop,wscale 7], length 0
11:18:47.257054 IP (tos 0x0, ttl 63, id 17324, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.70.37368 > 10.0.0.174.http: Flags [S], cksum 0x1522 (incorrect -> 0xc920), seq 2387754236, win 28200, options [mss 1410,sackOK,TS val 60725952 ecr 0,nop,wscale 7], length 0
11:18:52.265293 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.0.174 tell 10.0.0.190, length 28
11:18:52.265460 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.0.0.174 is-at fa:16:3e:5c:44:dc (oui Unknown), length 28
11:18:55.273469 IP (tos 0x0, ttl 63, id 17325, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.0.70.37368 > 10.0.0.174.http: Flags [S], cksum 0x1522 (incorrect -> 0xa9d0), seq 2387754236, win 28200, options [mss 1410,sackOK,TS val 60733968 ecr 0,nop,wscale 7], length 0

Changed in kuryr-kubernetes:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Antoni Segura Puimedon (celebdor)
Revision history for this message
Antoni Segura Puimedon (celebdor) wrote :

If we specify the k8s-service-subnet in the member creation, then Octavia does not create and attach a port to the pod subnet for each load balancer, so fewer addresses are consumed. This approach, of course, relies on the service subnet and the pod subnet being routable to each other.
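
Roughly the CLI equivalent of what the lbaas handler would then do per endpoint (the subnet id is the service subnet from the port listing above, the address is one of the kuryr-demo pods, and <pool-id> is a placeholder):

openstack loadbalancer member create \
    --subnet-id 96e4c70d-10fe-4b39-af8c-38e43bb37935 \
    --address 10.0.0.74 \
    --protocol-port 8080 \
    <pool-id>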

Changed in kuryr-kubernetes:
milestone: none → pike-3
Changed in kuryr-kubernetes:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kuryr-kubernetes (master)

Reviewed: https://review.openstack.org/489157
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=ed1436f4b1c30e0da86a8f16d4c4960be12247c0
Submitter: Jenkins
Branch: master

commit ed1436f4b1c30e0da86a8f16d4c4960be12247c0
Author: Antoni Segura Puimedon <email address hidden>
Date: Mon Jul 31 11:48:21 2017 +0200

    octavia: Make Octavia ready devstack

    This patch changes the main sample devstack local.conf to use Octavia.
    In order for that to work, it does some security group changes to ensure
    that the communication from the LB to the members will work in L3 modes.

    In L2 mode, which will be added at some point after this patch, Octavia
    creates a pod_subnet port per Load Balancer with the 'default' security
    group of the 'admin' project. This means that traffic from that port would
    not be allowed by the members, since they use the 'default' security group
    from the 'k8s' project.

    In L3 mode, Octavia does not create a port in the members subnet and
    relies on the service and the pod subnet to be connected to the same
    router. Some changes were necessary on the lbaas handler for that.
    Specifically, changing the member subnet to be the service subnet so
    that Octavia does not go into L2 mode.

    Implements: blueprint octavia-support
    Change-Id: I993ebb0d7b82ad1140d752982013bbadf35dfef7
    Closes-Bug: #1707180
    Signed-off-by: Antoni Segura Puimedon <email address hidden>

Changed in kuryr-kubernetes:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-kubernetes 0.2.0

This issue was fixed in the openstack/kuryr-kubernetes 0.2.0 release.
