Enable keepalived VRRP health check again

Bug #1825966 reported by Hua Zhang
This bug affects 2 people
Affects                                Status         Importance   Assigned to   Milestone
OpenStack Neutron Gateway Charm        Fix Released   Undecided    Unassigned
OpenStack Neutron Open vSwitch Charm   Fix Released   Undecided    Unassigned

Bug Description

To have VRRP watch the external network interface today, the option ha_vrrp_health_check_interval [1] can be used: when the health check fails, it re-triggers the election process so that the system recovers automatically. We should therefore enable it in the charms.
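
As a quick way to confirm whether the option is active on a gateway node, a small check along these lines may help. This is only a sketch: the function name is hypothetical, the path /etc/neutron/l3_agent.ini is an assumed default location for the L3 agent configuration, and it relies on the option being disabled when unset or set to 0.

```shell
# Hypothetical helper: print the configured ha_vrrp_health_check_interval
# from an l3_agent.ini-style file, or 0 if the option is not set.
# The default path is an assumption; pass another file as the first argument.
get_health_check_interval() {
    conf="${1:-/etc/neutron/l3_agent.ini}"
    # Skip commented-out lines; keep the last uncommented assignment.
    awk -F= '
        /^[ \t]*ha_vrrp_health_check_interval[ \t]*=/ {
            gsub(/[ \t]/, "", $2)
            v = $2
        }
        END { print (v == "" ? 0 : v) }
    ' "$conf"
}

# Example (run on a gateway node):
#   get_health_check_interval            # reads the assumed default path
#   get_health_check_interval ./l3_agent.ini
```

A value greater than 0 would indicate that the health check is enabled with that interval in seconds.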

In fact, we tried to enable it before [2], but then had to revert it [3] due to instability issues [4] in previous releases of OpenStack.

The previous instability [4] may have been caused by another keepalived issue mentioned in comment [5]. Today I tested this option again using the detailed steps below, and it works.

# first create a neutron l3ha test env, then continue to do:
git clone https://github.com/openstack/charm-neutron-gateway.git neutron-gateway
cd neutron-gateway/
git fetch https://review.opendev.org/openstack/charm-neutron-gateway refs/changes/33/601533/1 && git format-patch -1 --stdout FETCH_HEAD > lp1732154.patch
git checkout master
patch -p1 < lp1732154.patch
juju upgrade-charm neutron-gateway --path $PWD

# install the script check_router_vrrp_transitions.sh on the two neutron-gateway test nodes:
wget https://gist.githubusercontent.com/dosaboy/cf8422f16605a76affa69a8db47f0897/raw/8e045160440ecf0f9dc580c8927b2bff9e9139f6/check_router_vrrp_transitions.sh
chmod +x check_router_vrrp_transitions.sh

This is the test result. I no longer see the instability issue [4].

$ date; \
  neutron l3-agent-list-hosting-router $(neutron router-show provider-router -c id -f value); \
  juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh; \
  juju ssh neutron-gateway/1 -- bash /home/ubuntu/check_router_vrrp_transitions.sh; \
  sleep 40; \
  date; \
  neutron l3-agent-list-hosting-router $(neutron router-show provider-router -c id -f value); \
  juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh; \
  juju ssh neutron-gateway/1 -- bash /home/ubuntu/check_router_vrrp_transitions.sh;
Tue Apr 23 03:11:28 UTC 2019
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=16716, first=Apr-23-01:48:20, last=Apr-23-01:57:05) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=24269, first=Apr-23-02:22:16, last=Apr-23-02:22:28) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=29188, first=Apr-23-02:46:03, last=Apr-23-02:46:03) had 1 transition(s) (state=BACKUP)
Done.
Connection to 10.5.0.42 closed.
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=31249, first=Apr-23-01:48:26, last=Apr-23-02:21:53) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=6187, first=Apr-23-02:22:29, last=Apr-23-02:45:33) had 2 transition(s) (state=MASTER)
Done.
Connection to 10.5.0.36 closed.
Tue Apr 23 03:12:12 UTC 2019
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=16716, first=Apr-23-01:48:20, last=Apr-23-01:57:05) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=24269, first=Apr-23-02:22:16, last=Apr-23-02:22:28) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=29188, first=Apr-23-02:46:03, last=Apr-23-02:46:03) had 1 transition(s) (state=BACKUP)
Done.
Connection to 10.5.0.42 closed.
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=31249, first=Apr-23-01:48:26, last=Apr-23-02:21:53) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=6187, first=Apr-23-02:22:29, last=Apr-23-02:45:33) had 2 transition(s) (state=MASTER)
Done.
Connection to 10.5.0.36 closed.
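
The two snapshots above are identical, which is what indicates that no new transitions happened during the 40 seconds. To make that comparison less error-prone, the script output could be reduced to one short line per node. The filter below is a minimal sketch that assumes the output format shown above; the function name is hypothetical.

```shell
# Hypothetical helper: read check_router_vrrp_transitions.sh output on stdin
# and print "<transitions> <state>" for the currently running keepalived
# instance (the line marked current=true). Other lines are ignored.
current_vrrp_summary() {
    sed -n 's/.*(current=true,.*) had \([0-9][0-9]*\) transition(s) (state=\([A-Z]*\)).*/\1 \2/p'
}

# Example (same command as in the test above, piped through the filter):
#   juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh | current_vrrp_summary
```

If the summary for each node is unchanged between two runs, the router did not change state in between.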

So I would suggest that we focus on getting the VRRP health check support added back to the charms, so that the gateway address is pinged to monitor the southbound network as well.

[1] https://docs.openstack.org/ocata/networking-guide/deploy-ovs-ha-vrrp.html#keepalived-vrrp-health-check
[2] https://review.opendev.org/#/c/601533/
[3] https://review.opendev.org/#/c/603347/
[4] https://bugs.launchpad.net/neutron/+bug/1793102
[5] https://bugs.launchpad.net/neutron/+bug/1793102/comments/5

Tags: sts
Hua Zhang (zhhuabj) wrote :

submitted the patch for review - https://review.opendev.org/#/c/657719/

Dmitrii Shcherbakov (dmitriis) wrote :

Could you submit a patch for charm-neutron-openvswitch as well?

It supports the enable-dvr-snat=True option, which allows running gateway components on compute nodes (as of the 19.04 charms, see https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1808045).

Hua Zhang (zhhuabj) wrote :

@dmitriis, thank you. Here is the patch for charm-neutron-openvswitch - https://review.opendev.org/#/c/657774

Hua Zhang (zhhuabj)
Changed in charm-neutron-gateway:
status: New → In Progress
Changed in charm-neutron-openvswitch:
status: New → In Progress
Hua Zhang (zhhuabj)
tags: added: sts
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (master)

Reviewed: https://review.opendev.org/657719
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=4c150529b5725e25bfe53d7db15f2f11410c6111
Submitter: Zuul
Branch: master

commit 4c150529b5725e25bfe53d7db15f2f11410c6111
Author: Zhang Hua <email address hidden>
Date: Wed May 8 09:52:12 2019 +0800

    Enable keepalived VRRP health check

    If you want to have vrrp watch the external networking interface
    today, the option ha_vrrp_health_check_interval [1] detects a failure
    it re-triggers the transitional change - which works if the external
    physical interface fails because the ping will fail.

    In fact, we've tried to enable it before [2], but then we had to
    revert it [3] due to instability issues [4] in previous releases of
    OpenStack. Maybe the previous instability issue [4] was caused by
    another keepalived issue mentioned in the comment [5], now I have
    tested this option again, it works.

    This is how neutron allows monitoring southbound network today, so
    I would suggest we add this capability into the charm again.

    [1] https://docs.openstack.org/ocata/networking-guide/ \
            deploy-ovs-ha-vrrp.html#keepalived-vrrp-health-check
    [2] https://review.opendev.org/#/c/601533/
    [3] https://review.opendev.org/#/c/603347/
    [4] https://bugs.launchpad.net/neutron/+bug/1793102
    [5] https://bugs.launchpad.net/neutron/+bug/1793102/comments/5

    Change-Id: If2947e7640545cb9a48215afb9b2439fdc33c645
    Closes-Bug: 1825966

Changed in charm-neutron-gateway:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (stable/19.04)

Fix proposed to branch: stable/19.04
Review: https://review.opendev.org/660574

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-neutron-gateway (stable/19.04)

Change abandoned by Zhang Hua (<email address hidden>) on branch: stable/19.04
Review: https://review.opendev.org/660574
Reason: Hi Alex and icey, I'm going to abandon this change, thanks for the review

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/657774
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=786906d559d88efbf5b7e95f67ceb0441d653cbd
Submitter: Zuul
Branch: master

commit 786906d559d88efbf5b7e95f67ceb0441d653cbd
Author: Zhang Hua <email address hidden>
Date: Wed May 8 16:45:29 2019 +0800

    Enable keepalived VRRP health check

    If you want to have vrrp watch the external networking interface
    today, the option ha_vrrp_health_check_interval [1] detects a failure
    it re-triggers the transitional change - which works if the external
    physical interface fails because the ping will fail.

    In fact, we've tried to enable it before [2], but then we had to
    revert it [3] due to instability issues [4] in previous releases of
    OpenStack. Maybe the previous instability issue [4] was caused by
    another keepalived issue mentioned in the comment [5], now I have
    tested this option again, it works.

    This is how neutron allows monitoring southbound network today, so
    I would suggest we add this capability into the charm again.

    This is a patch for charm-neutron-openvswitch side to support
    enable-dvr-snat=True option which allows running gateway components
    on compute nodes (as of 19.04 charms, see [6])

    [1] https://docs.openstack.org/ocata/networking-guide/ \
            deploy-ovs-ha-vrrp.html#keepalived-vrrp-health-check
    [2] https://review.opendev.org/#/c/601533/
    [3] https://review.opendev.org/#/c/603347/
    [4] https://bugs.launchpad.net/neutron/+bug/1793102
    [5] https://bugs.launchpad.net/neutron/+bug/1793102/comments/5
    [6] https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1808045

    Change-Id: Ic7e751dd876cc67805e841e109a4f955ad80be47
    Closes-Bug: 1825966

Changed in charm-neutron-openvswitch:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-neutron-gateway:
milestone: none → 19.07
Changed in charm-neutron-openvswitch:
milestone: none → 19.07
David Ames (thedac)
Changed in charm-neutron-gateway:
status: Fix Committed → Fix Released
Changed in charm-neutron-openvswitch:
status: Fix Committed → Fix Released