Pre-cached neutron-port resource residues

Bug #1905219 reported by liujinxin
This bug affects 1 person

Affects: kuryr-kubernetes
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I deleted a node, and the ports previously cached on that node still exist.

Resources are left behind in two cases:

1. When the ovs-agent on a kuryr node is in an abnormal state, neutron-port binding fails. This results in a large number of neutron-ports being created on that node, and there is no code path that removes the ports that failed to bind.
2. When a Kubernetes node is removed, the neutron-ports previously cached on that node still exist.

Also, I have a question:

In the method _recover_precreated_ports():
            if not port.binding_vif_type or not port.binding_host_id:
                # NOTE(ltomasbo): kuryr-controller is running without the
                # rights to get the needed information to recover the ports.
                # Thus, removing the port instead
                os_net = clients.get_network_client()
                os_net.delete_port(port.id)
                continue
If port.binding_vif_type is binding_failed, should the port be deleted as well?
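For illustration, extending that check so it also covers a failed binding could look like the following. This is a minimal sketch based on the snippet above, not the actual kuryr-kubernetes code; 'binding_failed' is the standard Neutron VIF type reported when port binding fails.

    if (not port.binding_vif_type or not port.binding_host_id
            or port.binding_vif_type == 'binding_failed'):
        # Either the controller lacks the rights to read the binding
        # details, or Neutron failed to bind the port; in both cases the
        # pre-created port is unusable, so delete it instead of reusing it.
        os_net = clients.get_network_client()
        os_net.delete_port(port.id)
        continue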

Revision history for this message
Michal Dulko (michal-dulko-f) wrote :

1. I don't think we ever saw this happening, can you elaborate?
2. We have this part of code that's being run periodically: https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/vif_pool.py#L482. Do you think it doesn't work? Do you use the nested or the neutron VIF driver? (A rough sketch of that cleanup idea follows after this comment.)

3. Right, it may make sense to delete them there, though binding_failed is most of the time indication of some Neutron issues, so underlying cluster might be just broken.
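
For reference, the periodic cleanup linked in point 2 essentially compares the hosts that pooled ports are bound to against the current set of Kubernetes nodes, and deletes ports bound to hosts that no longer exist. Below is a rough standalone sketch of that idea, assuming openstacksdk and the official kubernetes Python client; the device_owner filter and function name are assumptions, not the exact kuryr-kubernetes implementation.

    import openstack
    from kubernetes import client, config

    def cleanup_ports_of_removed_nodes():
        # Hosts that currently exist in the Kubernetes cluster.
        config.load_kube_config()
        nodes = {n.metadata.name
                 for n in client.CoreV1Api().list_node().items}

        conn = openstack.connect()
        # Only consider ports kuryr pre-created for pods; the exact
        # device_owner value used here is an assumption.
        for port in conn.network.ports(device_owner='compute:kuryr'):
            host = port.binding_host_id
            if host and host not in nodes:
                # The node is gone, so the pooled port can never be used.
                conn.network.delete_port(port.id)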

Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

Can you please paste the status of a port belonging to a deleted node?

Revision history for this message
liujinxin (scilla) wrote :

Thank you for your reply.

> 1. I don't think we ever saw this happening, can you elaborate?
When the Kubernetes node is in the Ready state and schedulable, but the ovs-agent on that node is abnormal or unavailable, new pods can still be scheduled to that node. The kuryr-controller then creates a neutron-port and tries to bind it to the node, and the binding fails. If the kuryr-controller is restarted in this situation, you will see an error message in its log.

> 2. We have this part of code that's being run periodically: https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/vif_pool.py#L482. Do you think it doesn't work? Do you use nested or neutron VIF driver?

The version of kuryr deployed in my environment is older and does not yet have the _cleanup_removed_nodes method. I will try updating to the newer kuryr code and test again.

Revision history for this message
Michal Dulko (michal-dulko-f) wrote :

> When the Kubernetes node is in the Ready state and schedulable, but the ovs-agent on that node is abnormal or unavailable, new pods can still be scheduled to that node. The kuryr-controller then creates a neutron-port and tries to bind it to the node, and the binding fails. If the kuryr-controller is restarted in this situation, you will see an error message in its log.

Ah, I see, well that's kinda expected if Neutron's broken. We can't do much about the underlying Neutron; it's way easier to assume it is working fine. Are you saying that this causes superfluous ports to get created by the controller?

Revision history for this message
liujinxin (scilla) wrote :

> Are you saying that this causes superfluous ports to get created by the controller?

Yeah, I think so.
In the latest code, it looks like the _check_port_binding method would solve this problem. Is that right?

Revision history for this message
Michal Dulko (michal-dulko-f) wrote :

Uhm, well, creation of additional ports is not really expected, but yeah, _check_port_binding should help here.
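
For completeness, the idea behind a check like _check_port_binding is to spot pooled ports that Neutron failed to bind and drop them, so they are never handed out to pods. A minimal sketch of that logic (assuming an openstacksdk connection and an iterable of pooled ports; this is not the exact kuryr-kubernetes implementation):

    def check_port_binding(conn, pooled_ports):
        """Delete pooled ports whose Neutron binding failed."""
        failed_ids = []
        for port in pooled_ports:
            if port.binding_vif_type == 'binding_failed':
                conn.network.delete_port(port.id)
                failed_ids.append(port.id)
        # Callers can use the returned ids to purge the ports from the pool.
        return failed_ids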
