Race condition in zone serial generation on concurrent changes to recordsets

Bug #1940976 reported by Christian Rohmann
Affects               Status  Importance  Assigned to  Milestone
Designate             New     Undecided   Unassigned
Ubuntu Cloud Archive  New     Undecided   Unassigned
designate (Ubuntu)    New     Undecided   Unassigned

Bug Description

I discovered a reproducible race condition when updating multiple recordsets of a single zone at the same time. There is an existing issue (https://bugs.launchpad.net/bugs/1871332) about multiple Designate instances and their coordination / distributed locking, but I observe the problem even with a single instance whose multiple worker threads target the same zone. This happens quite easily when using IaC tooling like Terraform, which uses multiple threads and multiple connections when talking to a cloud API.

To trigger the race condition I used this piece of Terraform to create three recordsets:

--- cut ---

resource "openstack_dns_recordset_v2" "testrecords" {
  count = 3

  zone_id     = data.openstack_dns_zone_v2.myzone.id
  name        = "record-${count.index}.${data.openstack_dns_zone_v2.myzone.name}"
  description = "test-${count.index}"
  ttl         = 60
  type        = "A"
  records     = ["127.0.0.1"]
}

--- cut ---

Those three recordsets are created independently / concurrently, and in the end the zone on the nameserver does not contain all of the records. When just one more record is created afterwards, all the records are written / updated in the zone file properly - so this is due to the serial being updated inconsistently.

Looking at the code for how the serial is created (https://opendev.org/openstack/designate/src/branch/master/designate/utils.py#L137), it clearly appears to be subject to race conditions when multiple threads update the zone concurrently: each reads the previously current zone timestamp from the database and increments it "in code".
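To illustrate the lost update, here is a simplified sketch of that read-increment-write pattern (this is an approximation for illustration, not the exact `utils.py` code), replayed deterministically the way two interleaved workers would execute it:

```python
def increment_serial(serial: int, now: int) -> int:
    # Simplified sketch of designate.utils.increment_serial: use the
    # current unix timestamp as the new serial, bumped past the old
    # serial if the clock has not advanced since the last change.
    new_serial = now
    if new_serial <= serial:
        new_serial = serial + 1
    return new_serial

# Deterministic replay of the race: the zone was just updated at `now`,
# and two worker threads both read the stored serial from the database
# before either one writes its result back.
now = 1_700_000_000
stored = now                       # serial already equals the timestamp
read_a = stored                    # worker A reads
read_b = stored                    # worker B reads before A writes
new_a = increment_serial(read_a, now)
new_b = increment_serial(read_b, now)
stored = new_a                     # A writes its new serial
stored = new_b                     # B overwrites it with the same value
print(new_a == new_b)              # True: one increment is lost
```

Because both workers start from the same stale read, they compute the identical serial, and the secondary nameservers see no change for one of the two updates.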

There is a not-yet-merged patchset by Nicolas Bock which does not reference a bug, but apparently changes the way the serial is created, using an UPDATE statement in the database to increment the serial: https://review.opendev.org/c/openstack/designate/+/776173

Revision history for this message
Michael Johnson (johnsom) wrote :

Did you have the distributed locking service enabled, even for the single process deployment?

Revision history for this message
Christian Rohmann (christian-rohmann) wrote :

@johnsom Could you elaborate on what exactly I should check? I tested this after reducing our set of three Designate instances to just one, to rule out any side effects in that regard.

Are you talking about setting up a coordinator as described here? https://docs.openstack.org/designate/latest/admin/ha.html#id8

Revision history for this message
Michael Johnson (johnsom) wrote :

Yes, that is what I am asking about.

The configuration setting:

[coordination]
backend_url = <DLM URL>

Such that the threads are using the distributed lock manager.

You can also look for this warning message:
https://github.com/openstack/designate/blob/05343d4226822da8b9776201ea18e000d366573d/designate/coordination.py#L72

Revision history for this message
Christian Rohmann (christian-rohmann) wrote :

@johnsom thanks a bunch for getting back to me.

No, I am not using a coordinator. As discussed on IRC, Designate apparently requires a DLM even when running as just a single instance, to coordinate multiple updates to a single zone?

As I wrote in https://review.opendev.org/c/openstack/designate/+/776173, using the backend storage to synchronize the increment of the zone serial does appear to work, and it would be a nice addition to allow a dev or otherwise single-instance deployment to not require a DLM.
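For illustration, here is a minimal sketch of what pushing the increment into the database buys (using SQLite and a made-up one-column schema, not Designate's actual schema or the patchset's exact SQL): the read-modify-write happens atomically inside a single UPDATE statement, so concurrent updaters cannot both derive the same new serial from the same stale read.

```python
import sqlite3

# Toy schema standing in for Designate's zones table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zones (id INTEGER PRIMARY KEY, serial INTEGER)")
conn.execute("INSERT INTO zones (id, serial) VALUES (1, 1700000000)")

def bump_serial(conn: sqlite3.Connection, zone_id: int, now: int) -> int:
    # MAX() here is SQLite's two-argument scalar max; the whole
    # read-modify-write runs inside one statement under the database's
    # own locking, so no application thread ever works from a stale read.
    conn.execute(
        "UPDATE zones SET serial = MAX(serial + 1, ?) WHERE id = ?",
        (now, zone_id),
    )
    conn.commit()
    return conn.execute(
        "SELECT serial FROM zones WHERE id = ?", (zone_id,)
    ).fetchone()[0]

# Two back-to-back updates with a stalled clock still get distinct,
# monotonically increasing serials.
s1 = bump_serial(conn, 1, 1700000000)
s2 = bump_serial(conn, 1, 1700000000)
print(s1, s2)  # 1700000001 1700000002
```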

Revision history for this message
Christian Rohmann (christian-rohmann) wrote :

This post to the mailing list seems related / to be hitting the same issue: http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025292.html
