CNI container unable to recover after openvswitch service restart

Bug #1825931 reported by Nayan Deshmukh
This bug affects 1 person
Affects: kuryr-kubernetes
Status: In Progress
Importance: Undecided
Assigned to: Nayan Deshmukh

Bug Description

When the openvswitch service is restarted, the "/var/run/openvswitch" directory that was mounted into the CNI container gets unmounted, so the CNI is unable to execute any of the ovs-vsctl commands required by VIFOpenVSwitchDriver.

Manual workaround: deleting the CNI container fixes the issue, since Kubernetes recreates the container and the new one has "/var/run/openvswitch" properly mounted.

The automatic solution is to check for the existence of the "/var/run/openvswitch" directory and mark the CNI as unhealthy if the directory is missing, so Kubernetes restarts the container on its own.
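The proposed automatic solution could be sketched as a simple filesystem check wired into the CNI health endpoint. Note this is an illustrative sketch, not kuryr-kubernetes' actual health-check API: the function names and the integration point are assumptions; only the "/var/run/openvswitch" path comes from the bug report.

```python
import os

# Run directory that disappears from the container after an OVS restart,
# per the bug report above.
OVS_RUN_DIR = '/var/run/openvswitch'


def ovs_mount_healthy(path=OVS_RUN_DIR):
    """Return True if the OVS run directory is still visible in the container."""
    return os.path.exists(path)


def liveness_status(path=OVS_RUN_DIR):
    """Hypothetical liveness handler: report 500 when the OVS mount is gone,
    so Kubernetes' liveness probe fails and the CNI container is recreated
    with the directory freshly mounted."""
    if not ovs_mount_healthy(path):
        return 500, 'OVS run directory missing; marking CNI unhealthy'
    return 200, 'OK'
```

In kuryr's case the check would live alongside the existing `/alive` endpoint visible in the logs below, turning the manual "delete the container" workaround into an automatic restart.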

The CNI logs for the error:

2019-04-23 02:56:31.712 203 INFO kuryr_kubernetes.cni.main [-] Using 'CNIDaemonizedRunner'
2019-04-23 02:56:31.713 203 DEBUG kuryr_kubernetes.cni.api [-] Making request to CNI Daemon. POST http://127.0.0.1:50036/addNetwork
{'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/5204/ns/net', 'CNI_PATH': '/opt/cni/bin', 'CNI_ARGS': 'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=multi-port-test-84c56cd4bd-r824b;K8S_POD_INFRA_CONTAINER_ID=9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_DAEMON': 'True', 'CNI_CONFIG_DIR_PATH': '/etc/cni/net.d', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_BIN_DIR_PATH': '/opt/cni/bin', 'config_kuryr': {u'debug': True, u'cniVersion': u'0.3.1', u'type': u'kuryr-cni', u'kuryr_conf': u'/etc/kuryr/kuryr.conf', u'name': u'kuryr'}} _make_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:179
2019-04-23 02:56:31.728 220 DEBUG kuryr_kubernetes.cni.daemon.service [-] Received ADD request. CNI Params: {'CNI_IFNAME': u'eth0', 'CNI_NETNS': u'/proc/5204/ns/net', 'CNI_PATH': u'/opt/cni/bin', 'CNI_ARGS': u'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=multi-port-test-84c56cd4bd-r824b;K8S_POD_INFRA_CONTAINER_ID=9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_DAEMON': u'True', 'CNI_CONFIG_DIR_PATH': u'/etc/cni/net.d', 'CNI_COMMAND': u'ADD', 'CNI_BIN_DIR_PATH': u'/opt/cni/bin', 'CNI_CONTAINERID': u'9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27'} _prepare_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/daemon/service.py:69
2019-04-23 02:56:32.639 29 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available for event MODIFIED Pod:multi-port-test-84c56cd4bd-r824b __call__ /usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/dispatch.py:66
2019-04-23 02:56:32.642 29 DEBUG kuryr_kubernetes.cni.handlers [-] Got VIFs from annotation: {'eth0': VIFOpenVSwitch(active=False,address=fa:16:3e:19:34:9d,bridge_name='br-int',has_traffic_filtering=True,id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef,network=Network(364eafde-63fb-4afe-9d05-8d806277e98c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5f00664c-5c')} _get_vifs /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/handlers.py:81
2019-04-23 02:56:32.645 29 DEBUG oslo_concurrency.lockutils [-] Acquired lock "default/multi-port-test-84c56cd4bd-r824b" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:265
2019-04-23 02:56:32.646 29 DEBUG oslo_concurrency.lockutils [-] Acquired external semaphore "default/multi-port-test-84c56cd4bd-r824b" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:272
2019-04-23 02:56:32.648 29 DEBUG oslo_concurrency.lockutils [-] Releasing lock "default/multi-port-test-84c56cd4bd-r824b" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:281
2019-04-23 02:56:32.859 220 DEBUG os_vif [-] Plugging vif VIFOpenVSwitch(active=False,address=fa:16:3e:19:34:9d,bridge_name='br-int',has_traffic_filtering=True,id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef,network=Network(364eafde-63fb-4afe-9d05-8d806277e98c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5f00664c-5c') plug /usr/lib/python2.7/site-packages/os_vif/__init__.py:76
2019-04-23 02:56:32.863 220 INFO oslo.privsep.daemon [-] Running privsep helper: ['privsep-helper', '--privsep_context', 'vif_plug_ovs.privsep.vif_plug', '--privsep_sock_path', '/tmp/tmpGeKC6o/privsep.sock']
2019-04-23 02:56:33.628 220 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-04-23 02:56:33.630 220 DEBUG oslo.privsep.daemon [-] Accepted privsep connection to /tmp/tmpGeKC6o/privsep.sock __init__ /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:332
2019-04-23 02:56:33.578 235 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-04-23 02:56:33.585 235 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-04-23 02:56:33.591 235 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_NET_ADMIN/CAP_NET_ADMIN/none
2019-04-23 02:56:33.592 235 INFO oslo.privsep.daemon [-] privsep daemon running as pid 235
2019-04-23 02:56:34.274 220 INFO os_vif [-] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:19:34:9d,bridge_name='br-int',has_traffic_filtering=True,id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef,network=Network(364eafde-63fb-4afe-9d05-8d806277e98c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap5f00664c-5c')
2019-04-23 02:56:34.795 220 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2019-04-23 02:56:35.267 220 DEBUG oslo_concurrency.processutils [-] CMD "ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr" returned: 1 in 0.472s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2019-04-23 02:56:35.268 220 DEBUG oslo_concurrency.processutils [-] u'ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr' failed. Not Retrying. execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:457
2019-04-23 02:56:35.269 220 ERROR kuryr_kubernetes.linux_net_utils [-] Unable to execute ['ovs-vsctl', '--', '--if-exists', 'del-port', u'tap5f00664c-5c', '--', 'add-port', u'br-int', u'tap5f00664c-5c', '--', 'set', 'Interface', u'tap5f00664c-5c', u'external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef', 'external-ids:iface-status=active', u'external-ids:attached-mac=fa:16:3e:19:34:9d', 'external-ids:vm-uuid=kuryr']. Exception: Unexpected error while running command.
Command: ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr
Exit code: 1
Stdout: u''
Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n': ProcessExecutionError: Unexpected error while running command.
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': u'eth0', 'CNI_NETNS': u'/proc/5204/ns/net', 'CNI_PATH': u'/opt/cni/bin', 'CNI_ARGS': u'IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=multi-port-test-84c56cd4bd-r824b;K8S_POD_INFRA_CONTAINER_ID=9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27', 'CNI_DAEMON': u'True', 'CNI_CONFIG_DIR_PATH': u'/etc/cni/net.d', 'CNI_COMMAND': u'ADD', 'CNI_BIN_DIR_PATH': u'/opt/cni/bin', 'CNI_CONTAINERID': u'9c5768414ffda81c555017d23686b57dec581a07093f5e0c4857b9bb4b02ce27'}: ProcessExecutionError: Unexpected error while running command.
Command: ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr
Exit code: 1
Stdout: u''
Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n'
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last):
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 81, in add
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service vif = self.plugin.add(params)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 51, in add
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service vifs = self._do_work(params, b_base.connect)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 138, in _do_work
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service container_id=params.CNI_CONTAINERID)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/base.py", line 100, in connect
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service driver.connect(vif, ifname, netns, container_id)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/bridge.py", line 105, in connect
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service vif.address, instance_id)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/linux_net_utils.py", line 53, in create_ovs_vif_port
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service _ovs_vsctl(_create_ovs_vif_cmd(bridge, dev, iface_id, mac, instance_id))
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/linux_net_utils.py", line 34, in _ovs_vsctl
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service return processutils.execute(*full_args, run_as_root=True)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service cmd=sanitized_cmd)
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service ProcessExecutionError: Unexpected error while running command.
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Command: ovs-vsctl -- --if-exists del-port tap5f00664c-5c -- add-port br-int tap5f00664c-5c -- set Interface tap5f00664c-5c external-ids:iface-id=5f00664c-5ce3-4d6b-b038-eaeffe34dbef external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:19:34:9d external-ids:vm-uuid=kuryr
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Exit code: 1
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Stdout: u''
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)\n'
2019-04-23 02:56:35.272 220 ERROR kuryr_kubernetes.cni.daemon.service
2019-04-23 02:56:35.277 220 INFO werkzeug [-] 127.0.0.1 - - [23/Apr/2019 02:56:35] "POST /addNetwork HTTP/1.1" 500 -
2019-04-23 02:56:35.280 203 DEBUG kuryr_kubernetes.cni.api [-] CNI Daemon returned "500 INTERNAL SERVER ERROR". _make_request /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:187
2019-04-23 02:56:35.280 203 ERROR kuryr_kubernetes.cni.api [-] CNI daemon returned error "500 INTERNAL SERVER ERROR".
2019-04-23 02:56:35.283 203 DEBUG kuryr_kubernetes.cni.api [-] CNI output: {'msg': 'Got invalid status code from CNI daemon.', 'cniVersion': '0.3.1', 'code': 100, 'details': 'Traceback (most recent call last):\n File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py", line 81, in run\n vif = self._add(params)\n File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py", line 154, in _add\n resp = self._make_request(\'addNetwork\', params, httplib.ACCEPTED)\n File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py", line 191, in _make_request\n raise k_exc.CNIError(\'Got invalid status code from CNI daemon.\')\nCNIError: Got invalid status code from CNI daemon.\n'} _write_dict /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/api.py:55
2019-04-23 02:56:35.285 203 DEBUG kuryr_kubernetes.cni.main [-] Exiting with status 1 run /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/main.py:78
2019-04-23 02:56:35.605 37 WARNING pyroute2.ipdb.main [-] shutdown in progress
2019-04-23 02:56:35.605 37 DEBUG kuryr_kubernetes.cni.health [-] Kuryr CNI Liveness verified. liveness_status /usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/health.py:151
2019-04-23 02:56:35.606 37 INFO werkzeug [-] 172.214.0.6 - - [23/Apr/2019 02:56:35] "GET /alive HTTP/1.1" 200 -

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kuryr-kubernetes (master)

Fix proposed to branch: master
Review: https://review.opendev.org/657909

Changed in kuryr-kubernetes:
assignee: nobody → Nayan Deshmukh (ndesh26)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kuryr-kubernetes (master)

Change abandoned by Michał Dulko (<email address hidden>) on branch: master
Review: https://review.opendev.org/657909
