OVN service agent lost for 20.04 + Ussuri deployment

Bug #2070332 reported by Zhanglei Mao
This bug affects 1 person
Affects: charm-ovn-chassis
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

In all four of my deployments, one host was missing from the "openstack network agent list" output, and launching a VM on that host fails. The most relevant error is:

  Error executing command: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot
   find Chassis_Private with name=nc7.maas

The lost host is random; removing the node and redeploying it fixes the issue.
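
For reference, the RowNotFound above is raised by ovsdbapp's row lookup: the metadata agent resolves its own Chassis_Private record by hostname, and on the lost host that row does not exist. The sketch below is a hypothetical helper (not the agent's actual code), assuming an already-connected southbound ovsdbapp API object "sb_idl":

    from ovsdbapp.backend.ovs_idl import idlutils

    def chassis_private_row(sb_idl, hostname):
        """Return the Chassis_Private row for hostname, or None if absent.

        sb_idl is assumed to expose the underlying OVN southbound Idl as
        .idl; hostname would be nc7.maas in this deployment.
        """
        try:
            return idlutils.row_by_value(sb_idl.idl, 'Chassis_Private',
                                         'name', hostname)
        except idlutils.RowNotFound:
            # This is the state the lost host is in: ovn-controller never
            # registered the chassis, so anything referencing it fails.
            return None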

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote (last edit ):

  ovn-chassis: #pure ovn nodes
    bindings:
      ? ''
      : oam-space
      certificates: internal-space
      data: oam-space
    channel: 22.03/stable
    charm: ovn-chassis
    options:
      bridge-interface-mappings: br-provider:bond1
      ovn-bridge-mappings: physnet1:br-provider physnet2:br-provider
      enable-dpdk: False
      prefer-chassis-as-gw: True
-----------
  ovn-chassis-dpdk: #for dpdk+ovn nodes
    bindings:
      ? ''
      : oam-space
      certificates: internal-space
      data: oam-space
    channel: 22.03/stable
    charm: ovn-chassis
    options:
      enable-dpdk: True
      #dpdk-bond-config: ":balance-tcp:active:fast"  # default & recommended
      ovn-bridge-mappings: physnet2:br-provider
      bridge-interface-mappings: br-provider:dpdk-bond1
      dpdk-bond-mappings: >
        dpdk-bond1:xxxx # redacted
      dpdk-driver: vfio-pci #
      dpdk-socket-cores: 2 #charm default is 1 per NUMA
      dpdk-socket-memory: 2048 #charm default is 1G per NUMA;

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote (last edit ):

The agent-lost issue also happens on pure OVN (non-DPDK) nodes.

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote :

The /var/log contents for two hosts (one with the lost agent, one normal) can be found on Google Drive at
https://drive.google.com/drive/folders/1v2UAJSv7jqlWzC8zW6F_z12KOn181rHX

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote :

In my latest deployment, nc7 was the lost host. A picture of the agent list output is attached.

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote (last edit ):

https://bugs.launchpad.net/neutron/+bug/1905700
It might be related, but the fix for that bug is already included in the installed package:
  python3-neutron 2:16.4.2-0ubuntu6.4

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote (last edit ):

Looking at the neutron-ovn-metadata-agent logs, it seems the lost host runs the privsep helper before ovsdbapp.backend.ovs_idl.vlog has connected.

agent lost
2024-06-20 23:58:31.604 721363 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.158:6642: connecting...
2024-06-20 23:58:31.604 721364 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.64:6642: connecting...
2024-06-20 23:58:31.606 721294 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/neutron_ovn_metadata_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpm2hdkfkn/privsep.sock']
2024-06-20 23:58:31.607 721364 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.64:6642: connected
2024-06-20 23:58:31.607 721363 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.158:6642: connected
2024-06-20 23:58:31.624 721363 INFO eventlet.wsgi.server [-] (721363) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2024-06-20 23:58:31.624 721364 INFO eventlet.wsgi.server [-] (721364) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2024-06-20 23:58:32.193 721294 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2024-06-20 23:58:32.115 727744 INFO oslo.privsep.daemon [-] privsep daemon starting

agent ok
2024-06-20 23:58:43.035 725507 INFO neutron.agent.ovn.metadata.ovsdb [-] Getting OvsdbSbOvnIdl for MetadataAgent with retry
2024-06-20 23:58:43.036 725426 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.85:6642: connecting...
2024-06-20 23:58:43.036 725507 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.85:6642: connecting...
2024-06-20 23:58:43.036 725506 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.158:6642: connecting...
2024-06-20 23:58:43.040 725426 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.85:6642: connected
2024-06-20 23:58:43.041 725506 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.158:6642: connected
2024-06-20 23:58:43.041 725507 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:192.168.11.85:6642: connected
2024-06-20 23:58:43.058 725506 INFO eventlet.wsgi.server [-] (725506) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2024-06-20 23:58:43.059 725507 INFO eventlet.wsgi.server [-] (725507) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2024-06-20 23:58:43.061 725426 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/neutron_ovn_metadata_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpsn2mij4d/privsep.sock']
2024-06-20 23:58:43.653 725426 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2024-06-20 23:58:43.575 725540 INFO oslo.privsep.daemon [-] privsep daemon starting
2024-06-20 23:58:43.577 725540 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
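
The healthy host also logs "Getting OvsdbSbOvnIdl for MetadataAgent with retry" before anything else touches the southbound DB. The sketch below only illustrates that kind of retry wrapper (using tenacity, with a placeholder connection string and schema name); it is not neutron's actual implementation:

    import tenacity
    from ovsdbapp.backend.ovs_idl import idlutils

    @tenacity.retry(wait=tenacity.wait_exponential(max=180), reraise=True)
    def get_sb_schema_helper(connection_string, schema='OVN_Southbound'):
        # Keep retrying with exponential backoff (capped at 180s between
        # attempts) until the southbound ovsdb-server answers, instead of
        # failing once and carrying on in a half-initialized state.
        return idlutils.get_schema_helper(connection_string, schema)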

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote :

To verify, the privsep helper command below can be run manually:
2024-06-20 23:58:31.606 721294 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/neutron_ovn_metadata_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpm2hdkfkn/privsep.sock']

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote :

The error message actually comes from
https://review.opendev.org/c/openstack/neutron/+/764318

    def register_metadata_agent(self):
        # NOTE(lucasagomes): db_add() will not overwrite the UUID if
        # it's already set.
        table = ('Chassis_Private' if self.has_chassis_private else 'Chassis')
        ext_ids = {
            ovn_const.OVN_AGENT_METADATA_ID_KEY: uuidutils.generate_uuid()}
        self.sb_idl.db_add(table, self.chassis, 'external_ids',
                           ext_ids).execute(check_error=True)
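
db_add() has to resolve the record it modifies: self.chassis here is the chassis name, and as far as I can tell ovsdbapp looks that row up when the transaction runs, so execute(check_error=True) re-raises the RowNotFound seen above whenever the Chassis/Chassis_Private row is missing. A rough, hypothetical guard (not neutron's code or a proposed fix, just to make the ordering explicit):

    from ovsdbapp.backend.ovs_idl import idlutils

    def register_when_chassis_exists(sb_idl, chassis_name, ext_ids,
                                     has_chassis_private=True):
        """Hypothetical variant: only run db_add once the row exists."""
        table = 'Chassis_Private' if has_chassis_private else 'Chassis'
        try:
            idlutils.row_by_value(sb_idl.idl, table, 'name', chassis_name)
        except idlutils.RowNotFound:
            # Same condition "sudo ovn-sbctl show" confirms below: the
            # chassis was never registered, so registration cannot succeed.
            return False
        sb_idl.db_add(table, chassis_name, 'external_ids',
                      ext_ids).execute(check_error=True)
        return True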

Revision history for this message
Zhanglei Mao (zhanglei-mao) wrote :

Running "sudo ovn-sbctl show" does not list a Chassis entry for the lost host (nc7.maas).
