In the delete action, get_serial() returns a nonzero serial number when 'recursion' is on

Bug #1802227 reported by ephem
This bug affects 3 people
Affects: Designate
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

backend: bind9

1. Create a zone named "zepp.com.".
2. Query the serial number from my nameservers with get_serial() (designate/worker/utils.py):
zepp.com. 3600 IN SOA ns3.zepp.com. admin.zepp.com. 1541648152 3546 600 86400 3600
3. Delete the zone.
4. Query the serial number from my nameservers again with get_serial() (designate/worker/utils.py):
zepp.com. 794 IN SOA ns-805.awsdns-36.net. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
5. Expected serial 0, but serial 1 was returned, so parse_query_results() concludes that the zone "zepp.com." still exists, and I cannot delete the zone from Designate's database.

Code from the master branch:

get_serial():

def get_serial(zone_name, host, port=53):
    """
    Possibly raises dns.exception.Timeout or dns.query.BadResponse.
    Possibly returns 0 if, e.g., the answer section is empty.
    """
    resp = dig(zone_name, host, dns.rdatatype.SOA, port=port)
    if not resp.answer:
        return 0
    rdataset = resp.answer[0].to_rdataset()
    if not rdataset:
        return 0
    return rdataset[0].serial

parse_query_results():

def parse_query_results(results, zone):
    """
    results is a [serial/None, ...]
    """
    delete = zone.action == 'DELETE'
    positives = 0
    no_zones = 0
    low_serial = 0

    for serial in results:
        if serial is None:
            # Intentionally don't handle None
            continue
        if delete:
            if serial == 0:
                no_zones += 1
                positives += 1
        else:
            if serial >= zone.serial:
                positives += 1

                # Update the lowest valid serial aka the consensus
                # serial
                if low_serial == 0 or serial < low_serial:
                    low_serial = serial
            else:
                if serial == 0:
                    no_zones += 1

    result = DNSQueryResult(positives, no_zones, low_serial, results)
    LOG.debug('Results for polling %(zone)s-%(serial)d: %(tup)s',
              {'zone': zone.name, 'serial': zone.serial, 'tup': result})
    return result
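To make the failure concrete, here is a minimal sketch that mirrors only the DELETE branch of parse_query_results() above. The helper name count_delete_positives is hypothetical and is not part of Designate; the input list stands in for the serials returned by get_serial() from two nameservers.

```python
def count_delete_positives(results):
    """Count nameservers that confirm a deletion.

    Mirrors the DELETE branch quoted above: only a serial of exactly 0
    counts as "the zone is gone". None entries are skipped.
    """
    positives = 0
    for serial in results:
        if serial is None:
            continue
        if serial == 0:
            positives += 1
    return positives


# One nameserver has really dropped the zone (serial 0). The other one
# answered through recursion with the upstream SOA, whose serial is 1.
results = [0, 1]
print(count_delete_positives(results))  # 1, so not every nameserver confirms
```

Any nonzero serial, including one that comes from an unrelated upstream SOA record, is treated as proof that the zone is still served.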

I suggest using some attribute of the response to tell whether the answer comes from my own nameservers or from a different (upstream) nameserver when 'recursion' is on.

Shi Yan (yanshi-403) wrote :

I think I am meeting with the same issue here.

I created a zone on our test cloud BIND server but failed to delete it.

The BIND server has indeed already deleted the zone from its `zones` view, but since forwarding is enabled on the BIND server, it answers DNS queries for that zone with data from other upstream nameservers.

Unfortunately, Designate does not distinguish between the two cases. It keeps treating the serial as incorrect, and the zone is stuck in the ERROR state.

ephem (tpiperatgod) wrote :

Hi, @Shi Yan. I have used a workaround that recognizes a customized email address, and it works well. By the way, there are many bugs in Designate, haha...

Shi Yan (yanshi-403) wrote :

@ephem haha yes, unfortunately, it is a similar situation for other OpenStack projects.

By the way, have you open-sourced your workaround/patch somewhere? If so, I am interested in knowing how you fixed that :)

Graham Hayes (grahamhayes) wrote :

Unfortunately, we need to be able to query the zone on the configured Bind servers to see if they have actually been deleted (rndc is not perfect, and sometimes we need to retry).

We do not recommend combining authoritative and recursive DNS servers. This has been the guidance since the DNS zone cache poisoning attacks from a few years ago.

ephem (tpiperatgod) wrote :

@Shi Yan I'm using Designate in a private cloud, so I can decide the zone's NS name. The code matches the unique NS name and returns 0 during the 'delete' action when the name does not match. I haven't published my code because it is not universal.

@Graham Hayes You are right, since Designate does not support DNSSEC yet.
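The unpublished NS-name workaround described above might look roughly like the following sketch. It is an illustration only: serial_for_delete and own_ns_names are hypothetical names, and the real patch was not shared in this thread. The idea is that the SOA primary nameserver field identifies whether the answer belongs to this deployment.

```python
def serial_for_delete(soa_mname, soa_serial, own_ns_names):
    """Return the SOA serial only if the SOA names one of our nameservers.

    soa_mname    -- primary nameserver field of the returned SOA record
    soa_serial   -- serial field of the returned SOA record
    own_ns_names -- hypothetical set of NS names used by this deployment

    An SOA that names a foreign nameserver is treated as "zone not
    present here", which is reported as serial 0.
    """
    def normalize(name):
        # Compare names case-insensitively and ignore the trailing dot.
        return name.rstrip('.').lower()

    ours = {normalize(name) for name in own_ns_names}
    return soa_serial if normalize(soa_mname) in ours else 0


own = {'ns3.zepp.com.'}
# SOA served by our own nameserver before the delete.
print(serial_for_delete('ns3.zepp.com.', 1541648152, own))
# SOA returned through recursion after the delete.
print(serial_for_delete('ns-805.awsdns-36.net.', 1, own))
```

This only works when the operator controls the NS names, which is why it is not a general fix.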

Drew Freiberger (afreiberger) wrote :

For those environments where the recursive and authoritative BIND servers are one and the same, could we solve this by expanding the get_serial() method to raise a specific exception when the returned answer is non-authoritative? If the response is non-authoritative, we would then consider the zone to be missing from the Designate backend BIND servers.
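A minimal sketch of this proposal, assuming the check is made on the AA (Authoritative Answer) bit of the DNS header flags. The names NonAuthoritativeAnswer, serial_from_response and serial_or_zero are hypothetical and are not Designate code. The flags word and the list of serials are passed in as plain values so the sketch runs without a DNS library; with dnspython, the equivalent test on a response would be `resp.flags & dns.flags.AA`.

```python
AA_FLAG = 0x0400  # Authoritative Answer bit in the DNS header (RFC 1035)


class NonAuthoritativeAnswer(Exception):
    """The server answered, but not as an authority for the zone."""


def serial_from_response(flags, answer_serials):
    """Return the SOA serial, or raise if the answer is non-authoritative.

    flags          -- 16-bit DNS header flags of the response
    answer_serials -- serials of the SOA records in the answer section
    """
    if not answer_serials:
        return 0
    if not flags & AA_FLAG:
        raise NonAuthoritativeAnswer()
    return answer_serials[0]


def serial_or_zero(flags, answer_serials):
    """Treat a non-authoritative answer as "zone missing" (serial 0)."""
    try:
        return serial_from_response(flags, answer_serials)
    except NonAuthoritativeAnswer:
        return 0


# 0x8500 = QR | AA | RD: authoritative answer from our own server.
print(serial_or_zero(0x8500, [1541648152]))
# 0x8180 = QR | RD | RA: recursive answer relayed from upstream.
print(serial_or_zero(0x8180, [1]))
```

With this behavior, the recursive answer from step 4 of the bug description would be reported as serial 0, and the delete poll could succeed.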

Graham Hayes (grahamhayes) wrote :

@Drew - yeah, that could work, and solves both problems.
