[OVN] Lack of AZs awareness in L3 port scheduler

Bug #2030741 reported by morice
This bug affects 1 person
Affects: neutron
Status: In Progress
Importance: Undecided
Assigned to: Rodolfo Alonso
Milestone: (none)

Bug Description

The OVN L3 port scheduler assigns router ports to gateway chassis. It retrieves the chassis list from nodes configured as gateways (external_ids:ovn-cms-options=enable-chassis-as-gw). This list can be filtered by availability zone; in that case, the scheduler filters out chassis that belong to AZs the router is not allowed to use (scheduler/l3_ovn_scheduler.py).
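The filtering step described above can be illustrated with the following minimal Python sketch. It is not Neutron's code: the helper names (`parse_cms_azs`, `is_gateway_chassis`, `filter_candidates`) and the input shape (a dict mapping chassis name to its ovn-cms-options string) are hypothetical. It assumes AZs are declared inside ovn-cms-options as `availability-zones=az-1:az-2`, and that a router without AZ hints may use any gateway chassis.

```python
from typing import Dict, List, Set


def parse_cms_azs(cms_options: str) -> Set[str]:
    """Return the AZs declared as availability-zones=az-a:az-b in ovn-cms-options."""
    for option in cms_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "availability-zones" and value:
            return set(value.split(":"))
    return set()


def is_gateway_chassis(cms_options: str) -> bool:
    """True if the chassis is configured with enable-chassis-as-gw."""
    return "enable-chassis-as-gw" in (o.strip() for o in cms_options.split(","))


def filter_candidates(chassis_cms: Dict[str, str], az_hints: Set[str]) -> List[str]:
    """Hypothetical filter: keep gateway chassis; with AZ hints, drop chassis outside them."""
    candidates = []
    for name, cms_options in chassis_cms.items():
        if not is_gateway_chassis(cms_options):
            continue
        if az_hints and not (parse_cms_azs(cms_options) & az_hints):
            continue
        candidates.append(name)
    return candidates


chassis_cms = {
    "gw-1": "enable-chassis-as-gw,availability-zones=az-1",
    "gw-2": "enable-chassis-as-gw,availability-zones=az-2",
    "compute-1": "availability-zones=az-1",
}
print(filter_candidates(chassis_cms, {"az-1"}))  # ['gw-1']
print(filter_candidates(chassis_cms, set()))     # ['gw-1', 'gw-2']
```

Note that the output is only a set of eligible chassis: nothing at this stage says how many of them come from each AZ.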

As a result, we get a list of all chassis eligible to host gateway ports, across all AZs where the router could be scheduled.

Then, both the chance and leastloaded schedulers select up to 5 chassis from this list (hardcoded in common/ovn/constants.py: MAX_GW_CHASSIS = 5), regardless of AZ membership. This seems fine at first, but when 5 or more chassis are available in one of the AZs, the gateway for a router can end up scheduled in *only* one AZ.
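The failure mode can be made concrete with a simplified least-loaded selection that, like the schedulers described above, ignores AZ membership. This is an illustrative assumption, not the actual leastloaded implementation in scheduler/l3_ovn_scheduler.py (which computes chassis load differently): here "load" is just a hypothetical count of gateway ports already hosted per chassis, and the topology (six idle chassis in az-1, three busier ones in az-2) is invented for the example.

```python
from collections import Counter
from typing import Dict, List

MAX_GW_CHASSIS = 5  # value of the constant in common/ovn/constants.py


def select_least_loaded(candidates: List[str], load: Dict[str, int]) -> List[str]:
    """Simplified least-loaded pick: fewest hosted gateway ports first, AZs ignored."""
    # sorted() is stable, so chassis with equal load keep their candidate order.
    return sorted(candidates, key=lambda name: load.get(name, 0))[:MAX_GW_CHASSIS]


# Hypothetical topology: six idle chassis in az-1, three busier chassis in az-2.
chassis_az = {f"az1-gw{i}": "az-1" for i in range(1, 7)}
chassis_az.update({f"az2-gw{i}": "az-2" for i in range(1, 4)})
load = {name: 0 if az == "az-1" else 1 for name, az in chassis_az.items()}

selected = select_least_loaded(list(chassis_az), load)
print(Counter(chassis_az[name] for name in selected))  # Counter({'az-1': 5})
```

All five gateway chassis land in az-1, so losing that AZ leaves the router's gateway with no chassis in az-2.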

This can be a problem in use cases where AZs are mapped to “failure domains”. With ML2/OVS in l3_ha mode, router instances (and therefore their ports) were placed with AZ awareness by the “neutron.scheduler.l3_agent_scheduler.AZ*Scheduler” classes; this does not currently seem feasible out of the box with OVN.

Tags: ovn
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote:

Hello Morice:

The behaviour of the OVN L3 scheduler with AZ filtering that you describe is correct, and it is the expected behaviour: if the OVN L3 scheduler returns several chassis (in an order that depends on the scheduler) and these 5 chassis [1] belong to the same AZ, the router will be scheduled to this single AZ. The OVN L3 scheduler won't distribute the ports among the available AZs. If that is what you need, you can create a new OVN L3 scheduler class; the OVN L3 scheduler is configurable.

What do you mean by "failure domains"? If you have GW chassis that should not be used, you should disable them manually or remove the AZ tag from them. I would like to know what your use case is and what you expect from the scheduler.

Regards.

[1] As you correctly commented, there is a hardcoded limit of 5 gateway chassis (MAX_GW_CHASSIS).

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: New → Incomplete
morice (yannmorice) wrote:

Hello Rodolfo Alonso:

Thanks a lot for your feedback.

By "failure domain", I mean a group of equipment (servers, network hardware, power supplies, etc.) that can be lost for any reason without affecting the other groups and, consequently, the service as a whole.

For example, our OpenStack nodes are distributed over multiple rooms in a datacenter, so each room hosts some of our network-dedicated nodes. From this point of view, each room can be considered a "failure domain". The goal is automatic failover in case of any failure affecting a single room.

Until now, we used ML2/OVS. Nodes from room 1 were placed in AZ az-1, nodes from room 2 in az-2, and so on. Using l3_ha mode and AZ-aware schedulers, we had router instances spawned in each of these AZs, so we could lose any single zone without affecting the service (after the VRRP timers expire, of course).

For various reasons, we'd like to move to OVN in the near future (and keep the same design). Out of the box, with ML2/OVN, our use case should work as long as we have no more than 5 × (number of AZs) network nodes. But that may not be sufficient for us in the future, as we already use slightly more.

Disabling or removing nodes should be feasible (even temporarily) but it won’t be automatic.

As you suggested, to deal with this and still use OVN, we wrote a small patch that adds two new schedulers, which optionally reorder the list taking AZs into account… and it works!

Here is a refreshed version of this patch for trunk (we run an older OpenStack Neutron release that needs some additional backports for it to work). The only drawback is the need to call get_chassis_and_azs from sb_idl and therefore to propagate it (again) through the scheduler functions.
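The patch itself is not reproduced in this report, so the following is only a hypothetical sketch of one way to reorder the list taking AZs into account: group the already-prioritized chassis by AZ and interleave the groups round-robin before truncating to MAX_GW_CHASSIS. The function name `interleave_by_az`, the chassis-to-AZs mapping (standing in for what get_chassis_and_azs provides), and the rule of grouping a multi-AZ chassis under the first of its AZs in sorted order are all assumptions, not the submitted code.

```python
from itertools import chain, zip_longest
from typing import Dict, List, Set

MAX_GW_CHASSIS = 5  # same limit as common/ovn/constants.py


def interleave_by_az(chassis: List[str], chassis_azs: Dict[str, Set[str]]) -> List[str]:
    """Reorder a prioritized chassis list so consecutive entries rotate through AZs.

    Hypothetical helper: the relative order inside each AZ is preserved, and a
    chassis in several AZs is grouped under the first of them in sorted order.
    """
    groups: Dict[str, List[str]] = {}
    for name in chassis:
        az = min(chassis_azs.get(name) or {""})  # "" groups chassis without an AZ
        groups.setdefault(az, []).append(name)
    # Take one chassis from each AZ per round; shorter groups are padded with None.
    rounds = zip_longest(*groups.values())
    return [name for name in chain.from_iterable(rounds) if name is not None]


# Example: the scheduler ranked all az-1 chassis ahead of the az-2 ones.
ranked = [f"az1-gw{i}" for i in range(1, 7)] + [f"az2-gw{i}" for i in range(1, 4)]
azs = {name: {"az-1" if name.startswith("az1") else "az-2"} for name in ranked}

selected = interleave_by_az(ranked, azs)[:MAX_GW_CHASSIS]
print(selected)  # ['az1-gw1', 'az2-gw1', 'az1-gw2', 'az2-gw2', 'az1-gw3']
```

Because the interleaving only changes the order, the scheduler's own ranking still decides which chassis is preferred inside each AZ; the truncated list now spans both AZs instead of five chassis from az-1.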

Please let me know if you have any questions.

Regards.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote:

Hello Morice:

Please propose this patch to https://review.opendev.org. This is the best way to review it and get it approved. I have one concern about these new schedulers. They are "AZ aware", but the two initial ones ("leastloaded" and "chance") are AZ aware too. That means: if these two current schedulers find AZ hints in the routers, they will use only those AZs. Your implementation reorders the selected chassis to distribute the LRP among the AZs (that is the goal, of course).

In the L3 agent scheduler (for L3 agent routers, non-OVN), we have a specific "AZLeastRoutersScheduler", and it is the only scheduler that is aware of AZs. As commented, this is not the case for the OVN schedulers.

In a nutshell, what we need is:
Alternative 1) Find a way to preserve the current schedulers' behaviour but remove their "AZ awareness", and introduce your schedulers.
Alternative 2) Modify the current schedulers to introduce your AZ reorder algorithm.
Alternative 3) Something else...

This is something I would like to discuss in the Neutron meeting [1] next Tuesday. I'll add an "On Demand" topic. But please propose the patch to https://review.opendev.org.

Regards and thanks for your efforts.

[1] https://meetings.opendev.org/#Neutron_Team_Meeting

Changed in neutron:
status: Incomplete → Confirmed
OpenStack Infra (hudson-openstack) wrote: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/892604

Changed in neutron:
status: Confirmed → In Progress