Comment 4 for bug 1899030

Drew Freiberger (afreiberger) wrote : Re: Exporter returns malformed or invalid data using latest/edge channel

It appears that the gauge changed from ceph_osds_in to a per-OSD ceph_osd_in, and as such we may have to alter some of the dashboards to account for this change.

# HELP ceph_osd_in OSD In Status
# TYPE ceph_osd_in gauge
ceph_osd_in{cluster="ceph",device_class="hdd",host="juju-13f892-test-1",osd="osd.2",rack="",root="default"} 1
ceph_osd_in{cluster="ceph",device_class="hdd",host="juju-13f892-test-2",osd="osd.0",rack="",root="default"} 1
ceph_osd_in{cluster="ceph",device_class="hdd",host="juju-13f892-test-3",osd="osd.1",rack="",root="default"} 1

ceph_osds_in{cluster="ceph"} 0
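
For reference, the old cluster-wide value can be reconstructed from the new per-OSD gauge by aggregating over the labels shown above; as a sketch (not a query pulled from any existing dashboard):

sum by (cluster) (ceph_osd_in)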

On the Ceph Cluster dashboard, the ceph_osds_in query needs to change to:

sum(ceph_osd_in{job="$job"})
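
The sum works because each ceph_osd_in series appears to be 1 when the OSD is in and 0 when it is out (see the sample output above). If a panel also needs the total number of OSDs regardless of state, a count over the same gauge should do; this is only a sketch I haven't wired into the dashboard:

count(ceph_osd_in{job="$job"})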

The remnant of ceph_osds_in within the metrics endpoint is odd; I'm thinking it's due to a cache in prometheus-ceph-exporter that survives a restart of the service. I'm attempting to reproduce this now by deploying a new p-c-e unit that starts directly on the edge snap v3.0.0-nautilus.

Here's the upstream code for osd_in and osds_down, but note the lack of osds_in or osds_up.
https://github.com/digitalocean/ceph_exporter/blob/3.0.0-nautilus/collectors/osd.go

The dashboard is already updated to run sum(ceph_osd_up{job="$job"}); I think the osds_in and out queries just need the same treatment in the dashboard.
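
For the out side, assuming the same 1/0 convention for ceph_osd_in, something along these lines should give the number of OSDs that are out (a sketch, not yet tested against the dashboard):

sum(ceph_osd_in{job="$job"} == bool 0)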