[Queens -> Rocky][19.04] neutron-l3-agent is not restarted automatically which can result in impl_idl import errors during l3 agent operation

Bug #1835557 reported by Dmitrii Shcherbakov
Affects                               Status        Importance  Assigned to  Milestone
OpenStack Neutron Gateway Charm       Invalid       Undecided   Unassigned
OpenStack Neutron Open vSwitch Charm  Invalid       Undecided   Unassigned
OpenStack Nova Compute Charm          Fix Released  High        Unassigned

Bug Description

[Problem Description]

After upgrading a deployment from Queens to Rocky (via charm actions), I tried to create a router and all of its ports were down: https://paste.ubuntu.com/p/G6cw3KzX6Q/

This looks like a py2 -> py3 conversion-related error: the necessary python2 packages were removed, but the agent is still running under python2.7 (see below).

As soon as I restart the l3 agents on all nodes, the ports are shown as "up" in `openstack port list --router <routername>`:

juju run --application neutron-openvswitch 'sudo systemctl restart neutron-l3-agent'

I would expect neutron-dhcp-agent.service, neutron-openvswitch-agent.service and neutron-metadata-agent.service to potentially have similar issues, because they also keep running under python2.7 after the upgrade.
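
One way to check which agents are still running under the old interpreter is to scan the process list for neutron-* scripts started by python2.7. The following is a minimal sketch, not part of any charm: the function names find_python2_agents and list_processes are hypothetical, and it assumes the `pid interpreter script ...` layout seen in the systemctl output in this report.

```python
import subprocess


def find_python2_agents(ps_output, prefix="neutron-"):
    """Return (pid, script_name) for prefix-matching scripts run by python2.7.

    ps_output is expected to look like the output of `ps -eo pid,args`:
    one process per line, PID first, then the full command line.
    """
    stale = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and processes without a script argument.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, interpreter, script = fields[0], fields[1], fields[2]
        name = script.rsplit("/", 1)[-1]
        if interpreter.endswith("python2.7") and name.startswith(prefix):
            stale.append((int(pid), name))
    return stale


def list_processes():
    """Collect the live process list (assumption: procps `ps` is available)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid, name in find_python2_agents(list_processes()):
        print("%s (pid %d) is still running under python2.7" % (name, pid))
```

Any service reported this way was started before the upgrade and would need a restart to pick up the python3 packages.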

[Analysis]

Based on the l3-agent log I can see that this was due to import errors:

https://paste.ubuntu.com/p/CZbJHCh5rm/ (larger output)

2019-07-05 14:53:26.544 1146080 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2019-07-05 14:53:26.544 1146080 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 562, in _process_router_update
# ...
2019-07-05 14:53:26.544 1146080 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/ovsdb/api.py", line 30, in from_config
2019-07-05 14:53:26.544 1146080 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/importutils.py", line 73, in import_module
2019-07-05 14:53:26.544 1146080 ERROR neutron.agent.l3.agent ImportError: No module named impl_idl

There are no python2.7 packages left with that module:

find /usr/lib/python* -name 'impl_idl.py'
/usr/lib/python3/dist-packages/neutron/agent/ovsdb/impl_idl.py
/usr/lib/python3/dist-packages/ovsdbapp/schema/open_vswitch/impl_idl.py
/usr/lib/python3/dist-packages/ovsdbapp/schema/ovn_southbound/impl_idl.py
/usr/lib/python3/dist-packages/ovsdbapp/schema/ovn_northbound/impl_idl.py

root@adze:~# dpkg -l | grep ovsdb
ii python3-ovsdbapp 0.12.0-0ubuntu1~cloud0 all library for creating OVSDB applications - Python 3.x

python-ovsdbapp was removed during the upgrade:

/var/log/apt/term.log:Preparing to unpack .../060-python-ovsdbapp_0.12.0-0ubuntu1~cloud0_all.deb ...
/var/log/apt/term.log:Unpacking python-ovsdbapp (0.12.0-0ubuntu1~cloud0) over (0.9.1-0ubuntu1) ...
/var/log/apt/term.log:Setting up python-ovsdbapp (0.12.0-0ubuntu1~cloud0) ...
/var/log/apt/term.log:Selecting previously unselected package python3-ovsdbapp.
/var/log/apt/term.log:Preparing to unpack .../119-python3-ovsdbapp_0.12.0-0ubuntu1~cloud0_all.deb ...
/var/log/apt/term.log:Unpacking python3-ovsdbapp (0.12.0-0ubuntu1~cloud0) ...
/var/log/apt/term.log:Setting up python3-ovsdbapp (0.12.0-0ubuntu1~cloud0) ...
/var/log/apt/term.log:Removing python-ovsdbapp (0.12.0-0ubuntu1~cloud0) ...
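
The failure mode here (a long-running process that imports a module lazily, after the module's files were removed from disk) can be reproduced in isolation. A minimal sketch, assuming a throwaway package called fakeagent (hypothetical; it only stands in for the real package layout) and using the standard importlib rather than oslo_utils:

```python
import importlib
import os
import sys
import tempfile


def demo_lazy_import_failure():
    """Return the ImportError message raised by a late import, or None."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "fakeagent")  # hypothetical package
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        backend = os.path.join(pkg_dir, "impl_idl.py")
        with open(backend, "w") as handle:
            handle.write("NAME = 'idl'\n")

        sys.path.insert(0, root)
        try:
            # "Agent start": only the top-level package is imported.
            importlib.import_module("fakeagent")
            # "Package upgrade": the backend module disappears from disk
            # while the process keeps running.
            os.remove(backend)
            importlib.invalidate_caches()
            try:
                # "First router update": the backend is imported on demand.
                importlib.import_module("fakeagent.impl_idl")
            except ImportError as exc:
                return str(exc)
            return None
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(root)
            sys.modules.pop("fakeagent", None)
            sys.modules.pop("fakeagent.impl_idl", None)


if __name__ == "__main__":
    print(demo_lazy_import_failure())
```

The process itself starts fine and only fails once the deferred import is attempted, which matches an agent that stays "active (running)" but raises ImportError while processing a router update.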

State before a manual neutron-l3-agent restart (python2.7):

systemctl status neutron-l3-agent
● neutron-l3-agent.service - OpenStack Neutron L3 agent
   Loaded: loaded (/lib/systemd/system/neutron-l3-agent.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-07-03 21:48:25 UTC; 1 day 18h ago
 Main PID: 1411876 (neutron-l3-agen)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/neutron-l3-agent.service
           ├─1411876 /usr/bin/python2.7 /usr/bin/neutron-l3-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/l3_agent.ini --config-file=/etc/neutron/
           └─1413914 /usr/bin/python2.7 /usr/bin/privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/l3_agent.ini --config-file /etc/neutron/fw

After (python3.6):

systemctl status neutron-l3-agent
● neutron-l3-agent.service - OpenStack Neutron L3 agent
   Loaded: loaded (/lib/systemd/system/neutron-l3-agent.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-07-05 16:02:18 UTC; 1s ago
 Main PID: 2085206 (neutron-l3-agen)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/neutron-l3-agent.service
           └─2085206 /usr/bin/python3.6 /usr/bin/neutron-l3-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/l3_agent.ini --config-file=/etc/neutron/

Jul 05 16:02:18 adze systemd[1]: Started OpenStack Neutron L3 agent.

description: updated
James Page (james-page)
tags: added: py3 upgrade
Dmitrii Shcherbakov (dmitriis) wrote :

Subscribed ~field-high as it affects cloud upgrade operations.

Corey Bryant (corey.bryant) wrote :

It seems this should have been fixed by https://review.opendev.org/#/c/618192/ but I'm guessing that subordinate services are not being restarted.

Corey Bryant (corey.bryant) wrote :

I was able to reproduce this. DVR is a requirement, of course, since with DVR the l3 agent runs on nova-compute. Something like the following bundle fragment is needed to recreate it. I think we'll need some updates to charm-nova-compute to restart subordinate services after upgrading to Rocky.

    neutron-gateway:
      charm: cs:~openstack-charmers-next/neutron-gateway
      constraints: mem=4G
      options:
        instance-mtu: 1300
    neutron-api:
      charm: cs:~openstack-charmers-next/neutron-api
      constraints: mem=1G
      options:
        neutron-security-groups: True
        enable-ml2-port-security: True
        enable-qos: True
        enable-vlan-trunking: True
        flat-network-providers: physnet1
        enable-dvr: True
        l2-population: True
    neutron-openvswitch:
      charm: cs:~openstack-charmers-next/neutron-openvswitch
      options:
        enable-local-dhcp-and-metadata: True
        bridge-mappings: physnet1:br-ex

Corey Bryant (corey.bryant) wrote :

The problem is that remote services are not restarted after an action-managed upgrade. Note that they are correctly restarted after a traditional upgrade.
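
To illustrate the idea of a principal unit asking its subordinates to restart, here is a rough sketch. It is illustrative only and is not the code from charm-nova-compute: the relation dictionaries, the "restart-trigger" key, and the function names are all hypothetical. The principal publishes a fresh value on every container-scoped relation, and a subordinate restarts its services whenever the value it sees changes.

```python
import uuid


def restart_requests(relations, new_trigger=None):
    """Build the data a principal would publish to request remote restarts.

    relations: list of dicts with hypothetical keys 'id' and 'scope'.
    Only container-scoped relations (principal <-> subordinate) get a trigger.
    """
    trigger = new_trigger or str(uuid.uuid4())
    return {
        rel["id"]: {"restart-trigger": trigger}
        for rel in relations
        if rel.get("scope") == "container"
    }


def needs_restart(last_seen_trigger, received_trigger):
    """Subordinate side: restart only when a new trigger value arrives."""
    return received_trigger is not None and received_trigger != last_seen_trigger


if __name__ == "__main__":
    relations = [
        {"id": "neutron-plugin:12", "scope": "container"},
        {"id": "cloud-compute:3", "scope": "global"},
    ]
    print(restart_requests(relations))
```

In this picture, both the traditional upgrade path and the action-managed path would have to call something like restart_requests; the bug corresponds to only one of them doing so.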

Corey Bryant (corey.bryant) wrote :

This is a bug with charm-nova-compute.

Changed in charm-nova-compute:
status: New → Triaged
importance: Undecided → High
Changed in charm-neutron-openvswitch:
status: New → Invalid
Changed in charm-neutron-gateway:
status: New → Invalid
tags: added: openstack-upgrade
Changed in charm-nova-compute:
status: Triaged → In Progress
assignee: nobody → Alex Kavanagh (ajkavanagh)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)

Fix proposed to branch: master
Review: https://review.opendev.org/701368

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/19.10)

Fix proposed to branch: stable/19.10
Review: https://review.opendev.org/701389

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/701368
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=dc8a6d005657eb3dc4b9e41a6b98309e7dc7429b
Submitter: Zuul
Branch: master

commit dc8a6d005657eb3dc4b9e41a6b98309e7dc7429b
Author: Alex Kavanagh <email address hidden>
Date: Tue Jan 7 14:01:50 2020 +0000

    Fix for not restarting subordinate services on managed upgrade

    This fixes the referenced bug by ensuring that the action does initiate
    remote restarts for container scoped related units.

    Change-Id: I149b753355b64113adfd8fd4eea972978b7ed20b
    Closes-Bug:#1835557

Changed in charm-nova-compute:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/19.10)

Reviewed: https://review.opendev.org/701389
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=762f2d8eb41a404d1694ea7d5b03058c46c98ba0
Submitter: Zuul
Branch: stable/19.10

commit 762f2d8eb41a404d1694ea7d5b03058c46c98ba0
Author: Alex Kavanagh <email address hidden>
Date: Tue Jan 7 14:01:50 2020 +0000

    Fix for not restarting subordinate services on managed upgrade

    This fixes the referenced bug by ensuring that the action does initiate
    remote restarts for container scoped related units.

    Change-Id: Icf82fc9e29c68353dcd287b772a0914a7c75c59c
    Closes-Bug:#1835557

Changed in charm-nova-compute:
assignee: Alex Kavanagh (ajkavanagh) → nobody
James Page (james-page)
Changed in charm-nova-compute:
milestone: none → 20.02
Liam Young (gnuoy)
Changed in charm-nova-compute:
status: Fix Committed → Fix Released