RHOSP13- R5.0-182- generic linklocal service verification is failing

Bug #1786666 reported by alok kumar
This bug affects 1 person
Affects             Status     Importance  Assigned to  Milestone
Juniper Openstack   Won't Fix  High        alok kumar
R5.0                Won't Fix  High        alok kumar
Trunk               Won't Fix  High        alok kumar

Bug Description

test case: test_generic_link_local_service

This test adds a link-local service at 169.254.1.2:8084 and then tries "wget http://169.254.1.2:8084", which times out.

ubuntu@ctest-nova-client-vm-26611523:~$ wget http://169.254.1.2:8084
--2018-08-12 07:06:17-- http://169.254.1.2:8084/
Connecting to 169.254.1.2:8084... ^C
ubuntu@ctest-nova-client-vm-26611523:~$ curl -vI http://169.254.1.2:8084
* Rebuilt URL to: http://169.254.1.2:8084/
* Hostname was NOT found in DNS cache
* Trying 169.254.1.2...
^C
ubuntu@ctest-nova-client-vm-26611523:~$ ping 169.254.1.2
PING 169.254.1.2 (169.254.1.2) 56(84) bytes of data.

--- 169.254.1.2 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms

Traceback (most recent call last):
  File "tcutils/fabutils.py", line 111, in remote_cmd
    output = _run(cmd, timeout=timeout, pty=not as_daemon, shell=shell)
  File "/usr/lib/python2.7/site-packages/fabric/network.py", line 633, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/fabric/operations.py", line 1042, in run
    shell_escape=shell_escape)
  File "/usr/lib/python2.7/site-packages/fabric/operations.py", line 911, in _run_command
    stderr=stderr, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/fabric/operations.py", line 795, in _execute
    worker.raise_if_needed()
  File "/usr/lib/python2.7/site-packages/fabric/thread_handling.py", line 12, in wrapper
    callable(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/fabric/io.py", line 31, in output_loop
    OutputLooper(*args, **kwargs).loop()
  File "/usr/lib/python2.7/site-packages/fabric/io.py", line 86, in loop
    raise CommandTimeout
CommandTimeout
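
For anyone reproducing this, the same reachability check can be done with a bounded timeout so it returns instead of hanging until the harness raises CommandTimeout. This is a minimal Python 3 sketch, assuming plain TCP reachability is all that needs to be verified; `probe_tcp` is a hypothetical helper and is not part of the test suite.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (which covers timeouts and
        # "connection refused") when the endpoint cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run inside the VM, `probe_tcp("169.254.1.2", 8084)` would be the equivalent of the wget attempt above.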

alok kumar (kalok)
tags: added: sanityblocker
alok kumar (kalok)
tags: added: vrouter
Jeba Paulaiyan (jebap)
tags: added: contrail-cloud
Revision history for this message
Sivakumar Ganapathy (hotlava51) wrote :

This bug was filed very late for 5.0.1, and we do not have access to the setup in 5.0.1 before the release FCS. Hence moving it to 5.0.2.

alok kumar (kalok) wrote :

On further debugging (on the DPDK setup), we found that /etc/hosts in the agent container cannot be edited even as root, although root has write permission on the file.

(vrouter-agent)[root@overcloud-contraildpdk-0 /]$ ls -l /etc/hosts
-rw-r--r--. 1 root root 7191 Aug 16 08:29 /etc/hosts
(vrouter-agent)[root@overcloud-contraildpdk-0 /]$ whoami
root
(vrouter-agent)[root@overcloud-contraildpdk-0 /]$ echo "10.0.0.18 overcloud-contrailcontroller-2-test" >> /etc/hosts
bash: /etc/hosts: Read-only file system

(vrouter-agent)[root@overcloud-contraildpdk-0 /]$ mount|grep /etc/hosts
/dev/sda2 on /etc/hosts type xfs (ro,relatime,seclabel,attr2,inode64,noquota)
(vrouter-agent)[root@overcloud-contraildpdk-0 /]$ lsattr /etc/hosts
---------------- /etc/hosts

Tried editing the file manually too, but got a permission issue.
Note: we are able to edit the same file in the vrouter-dpdk Docker container.
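
The output above shows why root cannot write: /etc/hosts is itself a mount point mounted with the `ro` option, and a read-only mount rejects writes regardless of file mode or user. A minimal Python 3 sketch of that check follows, assuming the one-line-per-mount format printed by `mount`; `mount_options` and `is_read_only` are hypothetical helper names.

```python
def mount_options(mount_output, path):
    """Return the option list for the mount at `path`, or None if `path` is not a mount point.

    Expects lines of the form:
    '<device> on <mountpoint> type <fstype> (<opt1>,<opt2>,...)'
    """
    for line in mount_output.splitlines():
        parts = line.split()
        # Required fields: device, 'on', mountpoint, 'type', fstype, '(options)'
        if len(parts) < 6 or parts[1] != "on" or parts[3] != "type":
            continue
        if parts[2] == path:
            return parts[5].strip("()").split(",")
    return None


def is_read_only(mount_output, path):
    """True if `path` is a mount point whose options include 'ro'."""
    opts = mount_options(mount_output, path)
    return opts is not None and "ro" in opts
```

The sketch returns the first matching line only; it does not handle stacked mounts on the same path.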

alok kumar (kalok) wrote :

Setup Info:
undercloud hypervisor: 10.204.217.133
undercloud VM: 192.168.122.179

Jeba Paulaiyan (jebap)
tags: added: blocker
Michael Henkel (mhenkel-3) wrote :

This is a problem with the link-local service name. As long as the service is called metadata, it works:

./provision_linklocal.py --api_server_port 8082 \
--api_server_ip 10.1.0.15 \
--linklocal_service_name metadata \
--linklocal_service_ip 169.254.169.10 \
--linklocal_service_port 8082 \
--ipfabric_service_ip 10.1.0.15 \
--ipfabric_service_port 8082 \
--admin_tenant_name admin \
--admin_user admin \
--admin_password YjjZ72bzeRvTPHwwmXRQNBBTJ \
--oper add

$ curl http://169.254.169.10:8082
{"href": "http://10.1.0.15", "links": [{"link": {"href": "http://10.1.0.15/documentation/index.html", "name": "documentation", "rel": "documentation", "method": "GET"}}, {"link": {"href": "http://10.1.0.15/config-root", "name": "config-root", "rel": "root", "method": null}}, {"link": {"href": "http://10.1.0.15/global-an......

After changing the name to something else (metadata2), it fails:

./provision_linklocal.py --api_server_port 8082 \
--api_server_ip 10.1.0.15 \
--linklocal_service_name metadata2 \
--linklocal_service_ip 169.254.169.10 \
--linklocal_service_port 8082 \
--ipfabric_service_ip 10.1.0.15 \
--ipfabric_service_port 8082 \
--admin_tenant_name admin \
--admin_user admin \
--admin_password YjjZ72bzeRvTPHwwmXRQNBBTJ \
--oper add

$ curl http://169.254.169.10:8082
curl: (7) couldn't connect to host

I don't think that this is a problem with /etc/hosts.

The next hops for a working (169.254.169.10) and a non-working (169.254.169.254) link-local service look the same:

[root@overcloud-novacompute-0 utils]# rt --get 169.254.169.254/32 --vrf 2
Match 169.254.169.254/32 in vRouter inet4 table 0/2/unicast

Flags: L=Label Valid, P=Proxy ARP, T=Trap ARP, F=Flood ARP
vRouter inet4 routing table 0/2/unicast
Destination PPL Flags Label Nexthop Stitched MAC(Index)
169.254.169.254/32 0 PTF - 11 -

[root@overcloud-novacompute-0 utils]# rt --get 169.254.169.10/32 --vrf 2
Match 169.254.169.10/32 in vRouter inet4 table 0/2/unicast

Flags: L=Label Valid, P=Proxy ARP, T=Trap ARP, F=Flood ARP
vRouter inet4 routing table 0/2/unicast
Destination PPL Flags Label Nexthop Stitched MAC(Index)
169.254.169.10/32 0 PTF - 11 -

The dropstats Invalid NH counter increases for the non-working service.

The vrouter agent team needs to take a look.

Yuvaraja Mariappan (ymariappan) wrote :

It is working as expected. For more information, please see the mail below.

From: Michael Henkel <email address hidden>
Date: Thursday, October 18, 2018 at 12:22 AM
To: Yuvaraja Mariappan <email address hidden>, Jeba Paulaiyan <email address hidden>
Cc: Sachchidanand Vaidya <email address hidden>, Alok Kumar <email address hidden>
Subject: Re: generic linklocal service failing

Hi Yuvaraja,

Thanks, that makes sense.
The Fabric Address can only be a destination which is routed/reachable through the vhost0 interface and not any other interface on the compute node.
I verified it by setting the route to the external_api (public) network through vhost0 and with that I can reach the api server:

{
    linklocal_service_name: bla
    ip_fabric_service_ip: [
        10.2.0.15
    ]
    linklocal_service_ip: 169.254.169.111
    ip_fabric_service_port: 8082
    ip_fabric_DNS_service_name:
    linklocal_service_port: 8082
    lls_fab_address_ip: IP
}

10.2.0.0/24 is the external/public network.

Route on the compute node:

[root@overcloud-novacompute-0 heat-admin]# ip route get 10.2.0.15
10.2.0.15 via 10.0.0.1 dev vhost0 src 10.0.0.11

(10.0.0.1 is a GW which has to have a route to 10.2.0.15)

On the VM:
$ curl 169.254.169.111:8082|more
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0{"href": "http://169.254.169.111:8082", "links": [{"link": {"href": "http://169.254.169.111:8082/documentation/index.html", "name": "documentation", "rel": "documentation", "method": "GET"}}, {"link": {"href": "http://169.254.169.111:8082/config-root", "name": "config-root", "rel": "root", "method": null}}, {"link": {"href": "http://169.254.169.111:8082/global-analytics-configs", "name": "global-analytics-config", "rel": "collection", "method": null}}, {"link": {"href": "http://169.254.169.111:8082/physical-interfaces", "name": "physical-interface", "rel": "collection", "method": null}},….

We need to document that internal_api IPs cannot be used as Fabric Address destinations IF tenant and internal api are on different interfaces on the compute node.
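
The constraint described here (the Fabric Address must be reachable through vhost0) can be checked on the compute node before provisioning. The following is a minimal Python 3 sketch, assuming the single-line output format of `ip route get` shown above; `egress_device` and `routed_via_vhost0` are hypothetical helper names.

```python
def egress_device(route_get_output):
    """Return the interface name following 'dev' in `ip route get` output, or None."""
    tokens = route_get_output.split()
    # Stop one short of the end so tokens[i + 1] always exists.
    for i, tok in enumerate(tokens[:-1]):
        if tok == "dev":
            return tokens[i + 1]
    return None


def routed_via_vhost0(route_get_output, vhost="vhost0"):
    """True if the route's egress interface is the vhost interface."""
    return egress_device(route_get_output) == vhost
```

Fed the route line quoted above for 10.2.0.15, `egress_device` returns `vhost0`; a destination that leaves through any other interface would fail the check.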

The good thing is that I learnt that metadata is not ‘just’ a name for the service.
A feature request for the future would be to set a Link Local Service to either NAT or proxy. In proxy we can then even use internal_api addresses.

@Jeba the bug can be closed as it is not a bug.

Regards,
Michael

From: Yuvaraja Mariappan <email address hidden>
Date: Thursday, October 18, 2018 at 6:04 AM
To: Michael Henkel <email address hidden>
Cc: Sachchidanand Vaidya <email address hidden>, Alok Kumar <email address hidden>, Jeba Paulaiyan <email address hidden>
Subject: Re: generic linklocal service failing

Hi Michael,

It seems it is a known behavior.
The agent works as a web proxy if the link-local service name is metadata, and Linux routing is used to forward the packet to the link-local endpoint.
For other names, the agent sets up a flow with NAT and vrouter does the packet forwa...


Jeba Paulaiyan (jebap)
tags: added: releasenote
Changed in juniperopenstack:
status: New → Won't Fix
Changed in juniperopenstack:
assignee: Sachchidanand Vaidya (vaidyasd) → alok kumar (kalok)
alok kumar (kalok) wrote :

In the script, we were not setting ip_fabric_service_ip while creating the link-local service.
After setting it, the test passed on the RHOSP DPDK setup. Not sure how the test used to pass in other deployments without ip_fabric_service_ip set.
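
A guard for this in a test script could look like the following minimal Python 3 sketch. It assumes the service entry is held as a dict using the field names from the entry quoted earlier in this thread, with ip_fabric_service_ip as a list; `missing_fabric_ip` is a hypothetical helper, not something from the actual test code.

```python
def missing_fabric_ip(entry):
    """True if a link-local service entry has no non-empty ip_fabric_service_ip."""
    # A missing key, None, or an empty list all count as "not set".
    ips = entry.get("ip_fabric_service_ip") or []
    return not any(str(ip).strip() for ip in ips)
```

Failing fast on `missing_fabric_ip(entry)` before creating the service would surface the omission at setup time rather than as a timeout inside the VM.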
