RabbitMQ mnesia database slowly growing over time

Bug #1617446 reported by Eugene Nikanorov
Affects             Status   Importance  Assigned to           Milestone
Mirantis OpenStack  Invalid  High        Dmitry Mescheryakov
7.0.x               Invalid  High        MOS Oslo
8.0.x               Invalid  High        MOS Oslo

Bug Description

Found on 7.0
8.0 is possibly affected too

/var/lib/rabbitmq/mnesia has grown to 30-40G on controllers.
Since the default root partition size is 50 GB, this is dangerous: once free space runs out, RabbitMQ will stop silently. Ceph monitors are also terminating because of the lack of disk space.
This creates the conditions for an outage of the whole cloud.

Currently there's no way this state can be fixed automatically.
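
For context, checking how close the directory is to filling the 50 GB root partition is straightforward; the path below is the default one named in this report:

    # Size of the RabbitMQ mnesia directory (default path from this report)
    du -sh /var/lib/rabbitmq/mnesia
    # Free space left on the root partition
    df -h /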

Changed in mos:
assignee: nobody → MOS Oslo (mos-oslo)
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Eugene, please provide more information about the deployment and the conditions that caused the mnesia DB to grow to 30-40G. Right now it is not clear how to reproduce the issue.

Changed in mos:
status: New → Incomplete
assignee: MOS Oslo (mos-oslo) → Eugene Nikanorov (enikanorov)
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

I don't think it necessarily requires reproducing it the same way it happened in production.
It might have been caused by the ceilometer+oslo bug, which generated tons of messages that were never consumed.

I'd like to see the OCF script fixed so that it monitors the space consumed by mnesia and, if it grows too large, restarts RabbitMQ and clears mnesia.
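
To illustrate the idea only: a minimal sketch of such a guard, assuming a hypothetical 20 GiB threshold and the standard rabbitmqctl commands; the real OCF agent would need proper clustering and recovery handling:

    # Hypothetical guard: if mnesia exceeds an assumed 20 GiB threshold,
    # stop RabbitMQ, wipe the mnesia directory and let Pacemaker restart it.
    MNESIA_DIR=/var/lib/rabbitmq/mnesia
    MAX_KB=$((20 * 1024 * 1024))   # 20 GiB, assumed threshold

    used_kb=$(du -sk "$MNESIA_DIR" | awk '{print $1}')
    if [ "$used_kb" -gt "$MAX_KB" ]; then
        rabbitmqctl stop_app || true   # stop the RabbitMQ application
        rabbitmqctl stop || true       # stop the Erlang node
        rm -rf "${MNESIA_DIR:?}"/*     # clear the on-disk mnesia state
        # Pacemaker's start/monitor cycle is expected to bring the node back
        # and rejoin it to the cluster.
    fi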

Changed in mos:
status: Incomplete → Confirmed
Changed in mos:
assignee: Eugene Nikanorov (enikanorov) → Dmitry Mescheryakov (dmitrymex)
milestone: none → 9.2
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Retargeted to 9.3 as there is no evidence this issue affects 9.2.

tags: added: move-to-9.3
Changed in mos:
milestone: 9.2 → 9.3
Changed in mos:
milestone: 9.x-updates → 9.2-mu-1
Changed in mos:
milestone: 9.2-mu-1 → 9.x-updates
status: Confirmed → Invalid
Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

I'm closing this as Invalid for the following reasons:
 * We have no evidence that the issue is recurrent.
 * The proposed way of solving it looks like a hack.
 * We did not find the root cause of the issue, so we would be treating the symptoms, not the cause.

Feel free to re-open it if it occurs again, and please provide enough data to analyze and reproduce it.
