Avoid neutron to return error 500 when deleting port if designate is down

Bug #1846703 reported by Gregoire Mahe
This bug affects 2 people
Affects: neutron
Status: Confirmed
Importance: Medium
Assigned to: Gregoire Mahe

Bug Description

Hello,

We discovered that when designate is configured on neutron and the designate service is down, neutron fails to delete ports.

Port creation and port deletion currently do not behave the same way in this situation.

When we create a port, while designate is down, neutron will ignore it, and create the port without the record on designate.

When we delete a port, while designate is down, neutron will return a 500 error because designate is down.

We need to normalize the behavior. For now, I propose to catch the exception, log it, and continue deleting the port.
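To make the proposal concrete, here is a minimal sketch in plain Python. It is not neutron code: `DNSServiceUnavailable`, `delete_port` and the `ports` dictionary are hypothetical stand-ins, used only to show the "catch, log, continue" pattern.

```python
import logging

LOG = logging.getLogger(__name__)


class DNSServiceUnavailable(Exception):
    """Hypothetical error raised when the external DNS service is unreachable."""


def delete_port(port_id, ports, delete_dns_record):
    """Delete a port even if its DNS record cannot be removed."""
    try:
        delete_dns_record(port_id)
    except DNSServiceUnavailable:
        # Log and carry on: the record is left behind in the DNS service.
        LOG.exception("Could not delete DNS record for port %s", port_id)
    ports.pop(port_id, None)


def failing_delete(port_id):
    """Simulate a DNS service that is down."""
    raise DNSServiceUnavailable("designate is down")


ports = {"port-1": {"dns_name": "vm1"}}
delete_port("port-1", ports, failing_delete)
```

The trade-off is that the record stays in the DNS service after the port is gone, so it has to be cleaned up some other way.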

As a later step, I might suggest creating a "dead queue" on the neutron rabbitmq to store the record creations that failed, so that they can be retried manually later to keep consistency.

Tags: dns
Revision history for this message
Bernard Cafarelli (bcafarel) wrote :

Not tested locally, but that sounds like a nice enhancement/behaviour standardisation

tags: added: dns
Changed in neutron:
importance: Undecided → Wishlist
status: New → Confirmed
Sapna Jadhav (sapana45)
Changed in neutron:
assignee: nobody → Sapna Jadhav (sapana45)
Revision history for this message
Gregoire Mahe (gregoiremahe) wrote :

@sapana45 : I have already started working on it : https://review.opendev.org/#/c/685644/6

But I think we should do things differently :

By default, I suggest returning a 502 error from neutron when we want to create a port while designate is down, and changing the 500 error to a 502 error when we want to delete a port.

That behavior allows neutron and designate to stay consistent.

Then, we should add an option to the neutron configuration file to disable this consistency and return 200 even if designate is down.

In a later step, maybe we should store inconsistent data in the database so that the creation/deletion/update can be retried later.

Sapna Jadhav (sapana45)
Changed in neutron:
assignee: Sapna Jadhav (sapana45) → nobody
Changed in neutron:
assignee: nobody → Gregoire Mahe (gregoiremahe)
Changed in neutron:
importance: Wishlist → Medium
Revision history for this message
Gregoire Mahe (gregoiremahe) wrote :

Here is the bug explanation.

https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/extensions/dns_integration.py#L527

In that function, both _create_port_in_external_dns_service and _delete_port_in_external_dns_service raise the same exception upon designate failure.

Upon CREATION :
First neutron creates the port, and then neutron tries to create the record. If the record creation fails, neutron has already created the port, and the record creation failure is ignored.

Upon DELETION :
First neutron tries to delete the record, and if that fails, neutron doesn't continue and returns an error 500.
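The asymmetry described above can be modelled with a small, self-contained sketch. This is a simplified illustration, not the actual neutron code path: `DNSDown`, `create_port`, `delete_port` and the returned status codes are assumptions chosen to mirror the description.

```python
class DNSDown(Exception):
    """Hypothetical stand-in for the exception raised when designate is down."""


def dns_down(port_id):
    """Simulate an unreachable DNS service."""
    raise DNSDown(port_id)


def create_port(ports, port_id, create_record):
    ports[port_id] = {}           # 1. the port is created first
    try:
        create_record(port_id)    # 2. then the record creation is attempted
    except DNSDown:
        pass                      # the failure is ignored, the port stays
    return 201


def delete_port(ports, port_id, delete_record):
    try:
        delete_record(port_id)    # 1. the record deletion is attempted first
    except DNSDown:
        return 500                # the port deletion is never reached
    del ports[port_id]            # 2. only runs if the record call succeeded
    return 204


ports = {}
create_status = create_port(ports, "port-1", dns_down)
delete_status = delete_port(ports, "port-1", dns_down)
```

With the DNS service down, the create call succeeds without a record, while the delete call fails and leaves the port in place.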

Revision history for this message
Gregoire Mahe (gregoiremahe) wrote :

Here are the steps :

STEP #1 : Strengthen neutron + designate consistency

After discussion with Slaweq, here is what we have to do :
 - if the second action fails, we have to revert the first.
So, upon creation (port then record creation) :
 - if adding the record fails, we have to delete the port
Upon deletion (record then port deletion) :
 - if deleting the port fails, we have to recreate the record
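The revert rule for both directions could look like the following sketch. It uses plain dictionaries in place of the neutron database and of designate, and every name in it (`DNSDown`, `PortDeleteFailed`, `create_port_and_record`, `delete_record_and_port`) is hypothetical.

```python
class DNSDown(Exception):
    """Hypothetical error: the DNS service is unreachable."""


class PortDeleteFailed(Exception):
    """Hypothetical error: the port could not be deleted."""


def create_port_and_record(ports, records, port_id, create_record):
    ports[port_id] = {"id": port_id}        # first action: create the port
    try:
        create_record(records, port_id)     # second action: create the record
    except DNSDown:
        del ports[port_id]                  # revert the first action
        raise


def delete_record_and_port(ports, records, port_id, remove_port):
    saved = records.pop(port_id)            # first action: delete the record
    try:
        remove_port(ports, port_id)         # second action: delete the port
    except PortDeleteFailed:
        records[port_id] = saved            # revert: recreate the record
        raise


def record_fails(records, port_id):
    """Simulate a failing record creation."""
    raise DNSDown(port_id)


def port_delete_fails(ports, port_id):
    """Simulate a failing port deletion."""
    raise PortDeleteFailed(port_id)
```

In both functions the original exception is re-raised after the revert, so the caller still sees the failure.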

STEP #2 : Improve error on designate failure

I suggest returning a 502 or 503 error instead, to indicate that designate is down
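As a rough illustration of that mapping, the sketch below translates a DNS failure into 503 and everything else into 500. `DNSServiceUnavailable` and `status_for` are hypothetical names, and how neutron's API layer would actually perform this translation is not shown here.

```python
from http import HTTPStatus


class DNSServiceUnavailable(Exception):
    """Hypothetical error raised when designate cannot be reached."""


def status_for(exc):
    """Pick the HTTP status to return for a given exception."""
    if isinstance(exc, DNSServiceUnavailable):
        # 503 Service Unavailable; HTTPStatus.BAD_GATEWAY (502) is the
        # other candidate mentioned in this step.
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

Either code tells the client that the failure comes from a dependency rather than from a bug in neutron itself.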

STEP #3 : Allow user to disable consistency

Next, we need to add an option to neutron.conf to disable consistency (which is enabled by default).
In this case, we have to ignore designate failures and not revert anything upon failure.
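A sketch of how such a switch might be read and applied, using only the standard library. The `[dns]` section and the `strict_consistency` option are made-up names for illustration; they are not existing neutron options.

```python
import configparser


class DNSDown(Exception):
    """Hypothetical error: the DNS service is unreachable."""


def strict_consistency(conf_text):
    """Read the hypothetical option; consistency is enabled by default."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("dns", "strict_consistency", fallback=True)


def call_dns(operation, strict):
    """Run a DNS operation, ignoring DNS failures when strict is False."""
    try:
        operation()
        return True
    except DNSDown:
        if strict:
            raise          # consistency enabled: let the caller revert
        return False       # consistency disabled: ignore, revert nothing
```

For example, `strict_consistency("[dns]\nstrict_consistency = false\n")` returns `False`, and `call_dns` then swallows a `DNSDown` instead of propagating it.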

STEP #4 : Store consistency issues when consistency is disabled

The aim is to improve consistency by storing all consistency problems in the neutron database, so that records can be deleted or created later to restore consistency.
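One possible shape for this, sketched with an in-memory SQLite database standing in for the neutron database. The `pending_dns_ops` table and the function names are assumptions made for this example only.

```python
import sqlite3


def open_store():
    """Create an in-memory store for DNS operations that failed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pending_dns_ops ("
        "id INTEGER PRIMARY KEY, port_id TEXT NOT NULL, action TEXT NOT NULL)"
    )
    return conn


def record_failure(conn, port_id, action):
    """Remember that `action` ('create' or 'delete') failed for a port."""
    conn.execute(
        "INSERT INTO pending_dns_ops (port_id, action) VALUES (?, ?)",
        (port_id, action),
    )
    conn.commit()


def retry_pending(conn, apply_op):
    """Replay stored operations in order; keep the ones that fail again."""
    rows = conn.execute(
        "SELECT id, port_id, action FROM pending_dns_ops ORDER BY id"
    ).fetchall()
    done = 0
    for op_id, port_id, action in rows:
        try:
            apply_op(port_id, action)
        except Exception:
            continue  # still failing: leave the row for a later retry
        conn.execute("DELETE FROM pending_dns_ops WHERE id = ?", (op_id,))
        done += 1
    conn.commit()
    return done
```

`retry_pending` returns the number of operations that were replayed successfully, and any operation that fails again stays in the table.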

For now, I won't have time to do this step, but I plan to do it in several months.

---

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

IMO Step #1 and Step #2 are ok to go. Maybe in step #1 you should explore which is easier/faster to revert (deleting the created port when designate fails, or deleting the DNS record in designate when port creation fails).

As for steps #3 and #4 - I think they should be covered by a separate RFE with a real use case showing why such logic needs to be added to neutron.
So let's use this bug to track only steps #1 and #2.

Revision history for this message
Gregoire Mahe (gregoiremahe) wrote :

Ok, let's create an RFE for steps #3 and #4

Actually, we need to allow neutron and designate to be inconsistent in order to avoid neutron failures when designate is down.

So if this RFE is confirmed, I'll work on steps 1 to 4

Revision history for this message
Miguel Lavalle (minsel) wrote :

I have to say that there are many people using the current DNS integration (including my employer) without a problem. So whatever is done regarding this bug has to preserve the current behavior. Any change has to be an optional behavior, separate from the current driver.

Revision history for this message
Gregoire Mahe (gregoiremahe) wrote :

So what I propose is to keep this dns_integration as it is, with consistency for port deletion and inconsistency for port creation.

Then, I propose creating another dns_integration module that either strengthens consistency or removes it.

Is that okay for everybody ?
