[COS] Ceph Grafana dashboard has "no data" panels

Bug #2041500 reported by Nobuto Murata
This bug affects 1 person
Affects: Ceph Dashboard Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have had known issues with the Ceph Grafana dashboard for some time, e.g.:

https://bugs.launchpad.net/charm-ceph-dashboard/+bug/1982910
https://bugs.launchpad.net/charm-ceph-dashboard/+bug/1982912
https://bugs.launchpad.net/charm-ceph-dashboard/+bug/1989648
https://bugs.launchpad.net/charm-ceph-dashboard/+bug/1982537

That hasn't changed much even after migrating from Telegraf to node-exporter as part of COS.

This bug tracks the work to make the dashboards work with Charmed Ceph. I captured screenshots with both the current charm JSONs in https://review.opendev.org/c/openstack/charm-ceph-dashboard/+/896248/5 and the upstream JSONs:

https://drive.google.com/drive/folders/1ds2gSRnOX_L4SfRv7HfptkrI9mA7mZ5Q?usp=sharing

Nobuto Murata (nobuto) wrote :

Subscribing ~field-high as the Ceph (Grafana) dashboard is not functioning.

Nobuto Murata (nobuto)
description: updated
Nobuto Murata (nobuto) wrote :

One example of a "no data" query is "sum(irate(ceph_osd_recovery_ops[1m]))", which returns nothing regardless of whether the JSON comes from the charm or from upstream.

[charm]
https://github.com/openstack/charm-ceph-dashboard/blob/4ee08c02972ba174ba379728e9ab1f045bacd1a4/src/dashboards/ceph-cluster.json#L1426

[upstream]
https://github.com/ceph/ceph/blob/21548fe806cf259deac1421530d5ce720be17997/monitoring/ceph-mixin/dashboards_out/ceph-cluster.json#L1107

That's because the scrape_interval in COS is 1m while Ceph upstream expects 15s. As a result, the 1m range in the query above never contains the two data points that irate needs.
https://prometheus.io/docs/prometheus/latest/querying/functions/#irate
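The effect described above can be reproduced with a small simulation. This is only a sketch and not Prometheus code: `scrape` and `irate` are hypothetical helpers, the counter grows at a made-up constant rate, and the range window is assumed to be left-open, i.e. (eval_time - window, eval_time].

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def scrape(interval_s: float, duration_s: float, rate_per_s: float = 2.0) -> List[Sample]:
    """Simulate a counter scraped every interval_s seconds (hypothetical data)."""
    n = int(duration_s // interval_s)
    return [(i * interval_s, i * interval_s * rate_per_s) for i in range(n + 1)]


def irate(samples: List[Sample], eval_time_s: float, window_s: float) -> Optional[float]:
    """Per-second rate from the last two samples inside the range window.

    Returns None when fewer than two samples fall into the window,
    which is what shows up as "no data" in a Grafana panel.
    """
    in_window = [s for s in samples if eval_time_s - window_s < s[0] <= eval_time_s]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[-2], in_window[-1]
    # A drop in the counter is treated as a reset, so the new value is the delta.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / (t1 - t0)


# 1m scrape interval (COS default) with a [1m] range: only one sample in the window.
print(irate(scrape(60, 600), eval_time_s=330, window_s=60))

# 15s scrape interval (what the Ceph dashboards expect): four samples in the window.
print(irate(scrape(15, 600), eval_time_s=330, window_s=60))
```

With a 1m scrape interval the window holds a single sample and the function returns None, while a 15s interval leaves enough samples to compute a rate.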

Customizing the scrape_interval is "strongly discouraged", so a workaround is to put the prometheus-scrape-config-k8s charm in the middle.
https://github.com/canonical/prometheus-k8s-operator/blob/16ba0e867b571d17ac8e87af7ab5720228d53d52/lib/charms/prometheus_k8s/v0/prometheus_scrape.py#L172-L190

# LP: #2041500
# the interval is from:
# https://docs.ceph.com/en/latest/mgr/prometheus/#confval-mgr-prometheus-scrape_interval
juju deploy -m cos prometheus-scrape-config-k8s prometheus-scrape-config --config scrape_interval=15s
juju integrate -m cos prometheus:metrics-endpoint prometheus-scrape-config:metrics-endpoint

juju offer -m cos prometheus-scrape-config:configurable-scrape-jobs
juju consume cos.prometheus-scrape-config cos-prometheus-scrape-config
juju integrate ceph-mon:metrics-endpoint cos-prometheus-scrape-config:configurable-scrape-jobs

Nobuto Murata (nobuto) wrote :

The RBD panels show "no data" because they depend on optional metrics from the mgr/prometheus module.
https://bugs.launchpad.net/charm-ceph-mon/+bug/2042405

Nobuto Murata (nobuto) wrote :

FWIW, the upstream JSONs use $__rate_interval, which will be broken due to:
https://github.com/canonical/prometheus-k8s-operator/issues/543
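For context, Grafana documents $__rate_interval as max($__interval + scrape interval, 4 * scrape interval), where the scrape interval comes from the Prometheus data source settings (15s by default) unless the query sets a min step. The sketch below only illustrates that formula; `rate_interval_s` is a hypothetical helper, and whether a mismatched data source setting is the exact cause in the linked issue is an assumption here.

```python
def rate_interval_s(interval_s: float, datasource_scrape_interval_s: float = 15.0) -> float:
    """Grafana's documented $__rate_interval formula, in seconds.

    datasource_scrape_interval_s is the value configured on the Grafana
    data source, not necessarily what Prometheus actually scrapes at.
    """
    return max(interval_s + datasource_scrape_interval_s,
               4 * datasource_scrape_interval_s)


# Data source left at 15s: the range ends up at 60s, which is too short
# when the real scrape interval is 1m.
print(rate_interval_s(interval_s=15, datasource_scrape_interval_s=15))

# Data source set to the real 1m scrape interval: the range covers several samples.
print(rate_interval_s(interval_s=60, datasource_scrape_interval_s=60))
```

If the data source setting does not match the real scrape interval, the computed range can be as short as one scrape period, which hits the same two-sample problem as the hard-coded [1m] range.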
