Keystone sends too many notifications to Ceilometer

Bug #1423121 reported by Dmitry Nikishov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Invalid | High | Bartłomiej Piotrowski |
6.0.x | Invalid | High | Bartłomiej Piotrowski |

Bug Description

Environment:
MOS 6.0, HA (reproduced with both 1 and 3 controllers), Ceilometer is enabled.

In 6.0, Keystone is configured to send notifications about its activity to Ceilometer via MQ (see the notification_driver setting in keystone.conf). Due to its nature as an identity service, Keystone generates a huge stream of notifications, almost all of them "identity.authenticate.success". This creates additional load on the DB and can lead to poor performance. We encountered this when Ceilometer-related OSTF and Tempest tests failed with timeouts.

About 20 hours after deployment, meter-list contained over 60k entries:

root@node-2:~# ceilometer meter-list | wc -l
62670

99.9% of them were Keystone's "identity.authenticate.success". The cluster had 3 controllers and 2 computes and had been mostly idle (no OSTF or Tempest runs; 4 instances booted manually).
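
One way to check which meter names dominate the list is to tally the Name column of the meter-list table. The sketch below is an illustration, not part of the original report: count_meter_names is a hypothetical helper, and it assumes the usual "|"-delimited table output with the meter name in the first column.

```shell
# Hypothetical helper: reads a "|"-delimited table on stdin and prints
# "<count> <meter name>" lines, most frequent first.
count_meter_names() {
    awk -F'|' '
        NF > 2 {
            # The first column after the leading "|" is assumed to be the meter name.
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            # Skip the header row; border lines contain no "|" and never get here.
            if (name != "" && name != "Name") counts[name]++
        }
        END { for (n in counts) printf "%d %s\n", counts[n], n }
    ' | sort -rn
}
```

It could be used as: ceilometer meter-list | count_meter_names | head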

The workaround:
1. Comment out notification_driver=messaging in keystone.conf on all controllers; restart keystone. This will stop it from sending metrics.
2. Configure a TTL for DB entries in Ceilometer: set time_to_live in ceilometer.conf to the preferred TTL in seconds.
3. Manually run ceilometer-expirer to delete outdated data.
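
Step 1 of the workaround can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: disable_keystone_notifications is a hypothetical name, the path to keystone.conf is passed in by the caller, and the setting is assumed to appear on its own line as notification_driver=messaging (optionally with spaces around "="). Restarting keystone and steps 2 and 3 are left to the operator.

```shell
# Hypothetical helper: comment out "notification_driver = messaging" in the
# given config file. A backup copy is kept next to it with a .bak suffix.
disable_keystone_notifications() {
    conf="$1"
    # Only uncommented lines match, so running this twice changes nothing more.
    sed -i.bak 's/^\([[:space:]]*notification_driver[[:space:]]*=[[:space:]]*messaging\)/#\1/' "$conf"
}
```

It would be run on each controller, for example as disable_keystone_notifications /etc/keystone/keystone.conf (the path is an assumption), followed by a keystone restart.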

Changed in fuel:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 6.1
Mike Scherbakov (mihgen) wrote :

I think it's a High priority issue, as the Ceilometer DB gets polluted. In the end, Ceilometer won't be functional due to DB overload. If we can avoid spam from Keystone, let's do it.

Changed in fuel:
importance: Medium → High
tags: added: low-hanging-fruit
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bartlomiej Piotrowski (bpiotrowski)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/159465

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/161124

Ivan Berezovskiy (iberezovskiy) wrote :

Time to live and ceilometer-expirer were implemented for Ceilometer data as the fix for bug https://bugs.launchpad.net/fuel/+bug/1399164 . Why do we need to remove Keystone notifications?

In addition, Ceilometer uses MongoDB, which can work with huge amounts of data.

Changed in fuel:
status: In Progress → Invalid
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Bartłomiej Piotrowski (<email address hidden>) on branch: master
Review: https://review.openstack.org/159465
Reason: https://bugs.launchpad.net/fuel/+bug/1423121/comments/4

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.0)

Change abandoned by Bartłomiej Piotrowski (<email address hidden>) on branch: stable/6.0
Review: https://review.openstack.org/161124
Reason: https://bugs.launchpad.net/fuel/+bug/1423121/comments/4

Ivan Berezovskiy (iberezovskiy) wrote :

Ceilometer collects a lot of data. The data set is really huge because it contains metrics related to ALL OpenStack components (not only Keystone notifications), so it's normal to have a lot of data written during one day.
For these issues we have possible workarounds. The first is time to live and ceilometer-expirer, which were implemented as the fix for https://bugs.launchpad.net/fuel/+bug/1399164 . The next workaround is described in https://bugs.launchpad.net/fuel/+bug/1434589 (please pay attention to this comment: https://bugs.launchpad.net/fuel/+bug/1434589/comments/15).
The main point is not to request all meters at once (I mean "ceilometer meter-list"); we should use timeouts, queries, etc.
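
As an illustration of that advice, the sketch below wraps two narrowed requests with the ceilometer CLI. The helper names, the 60-second timeout, and the limit of 100 samples are assumptions made for the example, not values from this bug or the linked ones.

```shell
# Hypothetical helpers; the timeout and limit values are arbitrary.

# List only the meters attached to one resource instead of every meter.
meters_for_resource() {
    ceilometer --timeout 60 meter-list -q "resource_id=$1"
}

# Fetch a bounded number of samples for one meter since a given ISO 8601 timestamp.
recent_samples() {
    ceilometer --timeout 60 sample-list -m "$1" -q "timestamp>$2" -l 100
}
```

For example, recent_samples identity.authenticate.success 2015-02-18T00:00:00 would ask for at most 100 samples of that meter newer than the given time, rather than pulling the whole list.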

So, I mark this bug as invalid.
