bind restarts many times during the day

Bug #2039952 reported by Marian Gasparovic
This bug affects 2 people

| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Triaged | High | Christian Grabowski | |
| 3.3 | Triaged | Medium | Unassigned | |
| 3.4 | Triaged | Medium | Unassigned | |

Bug Description

MAAS 3.3.5
We noticed in test runs that our statically defined VIP addresses for OpenStack services sometimes fail to resolve, and a minute later they are fine.
It turns out the BIND server restarts often, tens of times a day.

```
$ grep "starting BIND" named.log*|awk '{print $1}'|sed "s/.*://"|sort|uniq -c
     15 12-Oct-2023
     67 13-Oct-2023
    175 14-Oct-2023
    170 15-Oct-2023
     34 16-Oct-2023
     41 17-Oct-2023
     36 18-Oct-2023
     73 19-Oct-2023
     15 20-Oct-2023

```

Christian had a look and found that BIND is restarted frequently because MAAS checks the serial of each zone it updates. If the check still fails after 2 additional retries, BIND is restarted. Why the check fails is not yet known.
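
The check-then-restart behaviour described here can be sketched roughly as follows. This is not MAAS's actual code: `ensure_serial`, `get_serial`, and `restart_bind` are hypothetical names, the equality comparison and the delay between checks are assumptions, and only the overall shape (one check, 2 additional retries, then a restart) comes from the description.

```python
import time
from typing import Callable


def ensure_serial(
    zone: str,
    expected_serial: int,
    get_serial: Callable[[str], int],   # hypothetical: serial BIND currently serves for the zone
    restart_bind: Callable[[], None],   # hypothetical: restarts the BIND service
    extra_retries: int = 2,             # "2 additional retries" from the description
    delay: float = 1.0,                 # assumed pause between checks, not from the report
) -> bool:
    """Return True if the serial matched, False if BIND had to be restarted."""
    for attempt in range(1 + extra_retries):
        if get_serial(zone) == expected_serial:
            return True
        # Wait before the next check, but not after the final one.
        if attempt < extra_retries:
            time.sleep(delay)
    # Every check failed: fall back to a full restart, which interrupts resolution.
    restart_bind()
    return False
```

With this shape, any zone update whose serial is not visible within three checks costs a full restart, which would be consistent with the restart counts in the log if the check fails often.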

To show how this is hitting us, I ran a script that digs six statically defined records once a minute. It failed 56 times in 12 hours, and in several cases the records did not resolve for two minutes.
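
The script itself was not attached, so the following is only a sketch of a comparable probe under stated assumptions. It uses Python's `socket.getaddrinfo` (the system resolver) instead of `dig`, so local caching may hide some failures, and it counts failed rounds, which may not be how the 56 failures above were counted. The function names are illustrative.

```python
import socket
import time
from typing import Callable, Iterable, List


def resolves(name: str) -> bool:
    """Return True if the system resolver can resolve `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def probe_once(names: Iterable[str], check: Callable[[str], bool] = resolves) -> List[str]:
    """Return the names that failed to resolve in this round."""
    return [n for n in names if not check(n)]


def run_probe(names, rounds, interval=60.0, check=resolves, sleep=time.sleep):
    """Probe `names` every `interval` seconds; return the number of failed rounds."""
    failed_rounds = 0
    for i in range(rounds):
        failed = probe_once(names, check)
        if failed:
            failed_rounds += 1
            # Timestamp format mirrors the dates in named.log for easy correlation.
            print(time.strftime("%d-%b-%Y %H:%M:%S"), "failed:", ", ".join(failed))
        # Sleep between rounds, but not after the last one.
        if i < rounds - 1:
            sleep(interval)
    return failed_rounds
```

Twelve hours at one round per minute corresponds to `rounds=720`; the list of record names would be the six statically defined VIP hostnames.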

Tags: cdo-qa
Adam Collard (adam-collard) wrote :

@marian thanks for the bug report - can you please collect and attach an sos report?

Can you confirm if you're using the deb or a snap?

description: updated
Marian Gasparovic (marosg) wrote :

It is a snap.

Attaching sosreport.

Changed in maas:
importance: Undecided → Medium
milestone: none → 3.5.0
status: New → Triaged
Jeff Lane (bladernr) wrote :

This would explain a LOT of our deployment failures, especially when I run multiple deployments concurrently.

I checked our own server and:
```
$ grep "starting BIND" named.log*|awk '{print $1}'|sed "s/.*://"|sort|uniq -c
    853 19-Oct-2023
   1127 20-Oct-2023
   1386 21-Oct-2023
   1335 22-Oct-2023
    827 23-Oct-2023
    830 24-Oct-2023
    619 25-Oct-2023
     22 26-Oct-2023
      3 27-Oct-2023
      6 28-Oct-2023
```

We are currently on this version of the MAAS snap:
maas 3.4.0~rc1-14302-g.bb0dc28c1 30266 3.4/candidate canonical✓ -

But we have been seeing this same behaviour (loss of DNS resolution under load) since at least 3.3.x, perhaps a touch older.

I'm adding a sosreport from our own server as well.

Changed in maas:
assignee: nobody → Christian Grabowski (cgrabowski)
importance: Medium → High