mongo aggregation pipeline for resource retrieval fails with excessive memory use
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceilometer | Fix Released | High | Eoghan Glynn |
Havana | Fix Released | High | Eoghan Glynn |
Bug Description
The mongodb storage driver currently uses an aggregation pipeline over the meter collection in order to construct a list of resources adorned with first & last sample timestamps etc.
The problem with this approach is that the mongodb aggregation framework performs sorting in memory, in this case over a potentially very large collection (particularly if the GET /v2/resources request was not constrained with query params, e.g. to limit it to a single tenant).
It turns out the mongodb internals are hardcoded to abort any sort in an aggregation pipeline that would consume more than 10% of physical memory. The net result is that we see failures in production such as:
ERROR wsme.api [-] Server-side error: "command SON([('aggregate', u'meter'), ('pipeline', [{'$match': {}},
{'$sort': {'timestamp': -1, 'project_id': -1, 'user_id': -1}}, {'$group': {'meters_unit': {'$push': '$counter_unit'},
'source': {'$first': '$source'}, 'project_id': {'$first': '$project_id'},
'user_id': {'$first': '$user_id'}, 'last_sample_
'meters_name': {'$push': '$counter_name'}, 'first_
'meters_type': {'$push': '$counter_type'}, '_id': '$resource_id', 'metadata': {'$first': '$resource_
failed: exception: terminating request: request heap use exceeded 10% of physical RAM"
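To make the memory problem concrete, here is a minimal pure-Python emulation of what the logged pipeline computes, assuming samples are plain dicts carrying the fields visible in the log. `emulate_resource_pipeline` is a hypothetical helper, not ceilometer code, and `first_ts`/`last_ts` are hypothetical stand-ins for the timestamp fields truncated in the log above. The thing to notice is that the `$sort` stage needs every matched sample at once, so an empty `$match` means sorting the whole meter collection.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def emulate_resource_pipeline(
    samples: List[Sample],
    match: Callable[[Sample], bool] = lambda s: True,
) -> List[Dict[str, Any]]:
    """Pure-Python stand-in for the $match -> $sort -> $group pipeline in the log."""
    # $match: an empty match ({}) keeps every sample in the collection.
    matched = [s for s in samples if match(s)]

    # $sort: timestamp, project_id, user_id all descending. MongoDB performs
    # this step in memory, over every matched sample at once.
    matched.sort(
        key=lambda s: (s["timestamp"], s["project_id"], s["user_id"]),
        reverse=True,
    )

    # $group on resource_id: $first takes values from the newest sample,
    # $push appends meter attributes in sorted (newest-first) order.
    resources: Dict[Any, Dict[str, Any]] = {}
    for s in matched:
        r = resources.get(s["resource_id"])
        if r is None:
            r = resources[s["resource_id"]] = {
                "_id": s["resource_id"],
                "source": s["source"],
                "project_id": s["project_id"],
                "user_id": s["user_id"],
                "meters_name": [],
                "meters_unit": [],
                "meters_type": [],
                "last_ts": s["timestamp"],   # hypothetical name: newest sample
                "first_ts": s["timestamp"],  # hypothetical name: oldest sample
            }
        r["meters_name"].append(s["counter_name"])
        r["meters_unit"].append(s["counter_unit"])
        r["meters_type"].append(s["counter_type"])
        # Rows arrive newest-first, so the final assignment is the oldest.
        r["first_ts"] = s["timestamp"]
    return list(resources.values())


# Hypothetical sample documents.
samples = [
    {"resource_id": "r1", "timestamp": 1, "project_id": "p1", "user_id": "u1",
     "source": "openstack", "counter_name": "cpu", "counter_unit": "ns",
     "counter_type": "cumulative"},
    {"resource_id": "r1", "timestamp": 5, "project_id": "p1", "user_id": "u1",
     "source": "openstack", "counter_name": "memory", "counter_unit": "MB",
     "counter_type": "gauge"},
    {"resource_id": "r2", "timestamp": 3, "project_id": "p2", "user_id": "u2",
     "source": "openstack", "counter_name": "cpu", "counter_unit": "ns",
     "counter_type": "cumulative"},
]
resources = emulate_resource_pipeline(samples)
```

Passing a narrower `match` (say, a single project) shrinks the list that has to be sorted, which is why constrained queries are far less likely to hit the limit.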
Discussion of the fossil record on gerrit indicates that the use of the aggregation framework in this context was primarily for convenience:
https:/
Switching over to storing the first and last timestamps directly in the resource collection (and updating them on every sample insert) is not a workable approach, as there are no universal first and last timestamps for a resource that apply regardless of the constraints on the resource query.
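A small illustration of that point, assuming the resource query can be bounded by a time window (the inclusive window and the `first_last` helper are hypothetical): the same resource yields different first/last timestamps depending on the constraints, so no single pair stored on the resource document is correct for every query.

```python
from typing import Iterable, Optional, Tuple


def first_last(
    timestamps: Iterable[int],
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Earliest and latest timestamps inside an inclusive [start, end] window."""
    window = [
        t for t in timestamps
        if (start is None or t >= start) and (end is None or t <= end)
    ]
    if not window:
        return None
    return min(window), max(window)


# Hypothetical sample timestamps for a single resource.
ts = [10, 20, 30, 40]
print(first_last(ts))                    # (10, 40): unconstrained query
print(first_last(ts, start=15, end=35))  # (20, 30): time-bounded query
```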
Hence the workable approaches to resolving this issue are:
1. avoid the need for in-memory sorting by ensuring sufficient indices exist on the meter collection (currently the sort instructions for resource retrieval default to timestamp, project_id and user_id, all descending)
2. avoid the aggregation framework altogether and instead revert to the equivalent map-reduce
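For the first option, the relevant MongoDB rule is that a `$sort` at the start of a pipeline (optionally preceded by `$match`) can use an index instead of sorting in memory, provided the sort keys are a prefix of the index key pattern in the same order, with directions either all matching or all inverted. The sketch below checks only that prefix rule and ignores the equality-match refinements MongoDB also supports. `index_supports_sort` is a hypothetical helper, and the key pattern is simply the default resource sort quoted above, not necessarily the index that was eventually added. A list of `(field, direction)` tuples is also the form pymongo's `create_index` accepts.

```python
from typing import List, Tuple

KeyPattern = List[Tuple[str, int]]  # (field, 1 for ascending / -1 for descending)


def index_supports_sort(index_keys: KeyPattern, sort_keys: KeyPattern) -> bool:
    """True if the index can return documents in sort_keys order (prefix rule only)."""
    if len(sort_keys) > len(index_keys):
        return False
    prefix = index_keys[: len(sort_keys)]
    # Field names must match the index prefix, in the same order.
    if [f for f, _ in prefix] != [f for f, _ in sort_keys]:
        return False
    # Directions must all match (forward scan) or all be inverted (backward scan).
    same = all(i == s for (_, i), (_, s) in zip(prefix, sort_keys))
    inverted = all(i == -s for (_, i), (_, s) in zip(prefix, sort_keys))
    return same or inverted


# Default sort for resource retrieval, as described in the bug.
RESOURCE_SORT: KeyPattern = [("timestamp", -1), ("project_id", -1), ("user_id", -1)]

print(index_supports_sort(RESOURCE_SORT, RESOURCE_SORT))            # True
print(index_supports_sort([("timestamp", 1), ("project_id", 1),
                           ("user_id", 1)], RESOURCE_SORT))         # True
print(index_supports_sort([("timestamp", -1)], RESOURCE_SORT))      # False
print(index_supports_sort([("timestamp", -1), ("project_id", 1),
                           ("user_id", -1)], RESOURCE_SORT))        # False
```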
Note that resource retrieval is the only case where the aggregation framework is currently used by the mongodb storage driver.
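The map-reduce option avoids the problem because it never needs a globally sorted input: each sample is mapped to a partial resource keyed by resource_id, and partials are merged pairwise by a reduce function that MongoDB requires to be associative and commutative (it may re-reduce its own earlier output). In MongoDB the map and reduce functions are JavaScript run server-side; the sketch below only emulates that shape in Python. The field names `first_ts`, `last_ts` and `meters` are hypothetical, and meter names are deduplicated as a simplification rather than mirroring `$push`.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Dict, List, Tuple

Partial = Dict[str, Any]


def map_sample(s: Dict[str, Any]) -> Tuple[Any, Partial]:
    """emit(resource_id, partial): one partial resource per sample."""
    return s["resource_id"], {
        "first_ts": s["timestamp"],  # hypothetical name
        "last_ts": s["timestamp"],   # hypothetical name
        "project_id": s["project_id"],
        "user_id": s["user_id"],
        "source": s["source"],
        "meters": [s["counter_name"]],
    }


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Merge two partials; merge order does not change the result
    (apart from which side wins an exact last_ts tie)."""
    newer = a if a["last_ts"] >= b["last_ts"] else b
    return {
        "first_ts": min(a["first_ts"], b["first_ts"]),
        "last_ts": max(a["last_ts"], b["last_ts"]),
        # Attributes come from whichever partial holds the newest sample.
        "project_id": newer["project_id"],
        "user_id": newer["user_id"],
        "source": newer["source"],
        "meters": sorted(set(a["meters"]) | set(b["meters"])),
    }


def map_reduce(samples: List[Dict[str, Any]]) -> Dict[Any, Partial]:
    grouped: Dict[Any, List[Partial]] = defaultdict(list)
    for s in samples:
        key, value = map_sample(s)
        grouped[key].append(value)
    # No global sort: each key's partials are folded independently.
    return {key: reduce(reduce_partials, values) for key, values in grouped.items()}


# Hypothetical sample documents.
samples = [
    {"resource_id": "r1", "timestamp": 1, "project_id": "p1", "user_id": "u1",
     "source": "openstack", "counter_name": "cpu"},
    {"resource_id": "r1", "timestamp": 5, "project_id": "p1", "user_id": "u1",
     "source": "openstack", "counter_name": "memory"},
    {"resource_id": "r2", "timestamp": 3, "project_id": "p2", "user_id": "u2",
     "source": "openstack", "counter_name": "cpu"},
]
result = map_reduce(samples)
```

Because the merge is order-independent, feeding the samples in any order gives the same result, which is what lets the server reduce in chunks instead of sorting everything first.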
Changed in ceilometer:
milestone: none → icehouse-2
assignee: nobody → Eoghan Glynn (eglynn)
importance: Undecided → High
status: New → In Progress
tags: added: havana-backport-potential
Changed in ceilometer:
status: Fix Committed → Fix Released
tags: removed: havana-backport-potential
Changed in ceilometer:
milestone: icehouse-2 → 2014.1
Fix proposed to branch: master
Review: https://review.openstack.org/65962