bluestore-block-db-size default size should be at least 4%

Bug #1827463 reported by Ashley Lai
This bug affects 5 people
Affects: Ceph OSD Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

The ceph-osd charm sets the default bluestore db size at 1024 MB, which is too small. The recommended bluestore db size is no less than 4% of the total block size.

https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing
db_partition_size = num_data_disks * 0.04 * disk_size

Default value from the charm:
    # default sizes in MB
    _default_size = {
        'db': 1024,
        'wal': 576,
        'journal': 1024,
    }
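For illustration, the 4% rule from the Ceph docs can be sketched next to the charm's default (a hypothetical helper, not charm code; sizes in MB to match the charm's units):

```python
# Sketch of the 4% block.db sizing rule from the Ceph docs (not charm
# code). Sizes are in MB to match the charm's `_default_size` units.

CHARM_DEFAULT_DB_MB = 1024  # the hardcoded charm default


def recommended_db_size_mb(disk_size_mb, ratio=0.04):
    """Recommended block.db size: 4% of the data disk size."""
    return int(disk_size_mb * ratio)


# A 1 TB (1048576 MB) data disk:
disk_mb = 1024 * 1024
print(recommended_db_size_mb(disk_mb))  # 41943 MB (~41 GB), vs the 1024 MB default
```

The gap is roughly a factor of 40 for a 1 TB disk, which is the core of the report.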

Tags: sts
Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Medium
Changed in charm-ceph-osd:
assignee: nobody → Sahid Orentino (sahid-ferdjaoui)
Revision history for this message
Dan Hill (hillpd) wrote :

The 4% recommendation is generally accurate for RBD/CephFS. For object storage, this can trend higher when there are huge numbers of small objects.

Please be aware of RocksDB limitations here. RocksDB will only place the next level of the DB on flash if there is enough room for the whole level. With Ceph's default RocksDB settings, this means that useful sizes step in lockstep at roughly 3GB, 30GB, and 300GB [0]. Anything in between results in wasted space.

[0] https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
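The lockstep Dan describes can be sketched with leveled compaction's geometric level sizes. The base level size and multiplier below are illustrative assumptions (not Ceph's exact RocksDB settings); the point is that only whole-level boundaries are usable:

```python
# Illustrative sketch of RocksDB leveled-compaction sizing. The base
# level size (0.25 GB) and multiplier (10) are ASSUMED for illustration,
# not Ceph's exact defaults. A level only lives on the fast device if
# the whole level fits, so only the cumulative boundaries are usable.

def usable_db_sizes_gb(base_gb=0.25, multiplier=10, levels=4):
    """Cumulative DB sizes at which one more whole level fits on flash."""
    sizes, total, level = [], 0.0, base_gb
    for _ in range(levels):
        level *= multiplier          # next level is `multiplier` times bigger
        total += level               # space needed to hold it plus all smaller levels
        sizes.append(round(total, 2))
    return sizes


print(usable_db_sizes_gb())  # [2.5, 27.5, 277.5, 2777.5] -- roughly the 3/30/300 GB steps
```

A 100 GB partition under these assumptions is no more useful than a ~30 GB one, since the next full level does not fit.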

Changed in charm-ceph-osd:
milestone: none → 20.08
assignee: Sahid Orentino (sahid-ferdjaoui) → nobody
tags: added: sts
Changed in charm-ceph-osd:
assignee: nobody → Ponnuvel Palaniyappan (pponnuvel)
Revision history for this message
Ponnuvel Palaniyappan (pponnuvel) wrote :

The Octopus documentation on the recommended `db` sizes notes [0]:

"The general recommendation is to have block.db size in between 1% to 4% of block size. For RGW workloads, it is recommended that the block.db size isn’t smaller than 4% of block, because RGW heavily uses it to store its metadata. For example, if the block size is 1TB, then block.db shouldn’t be less than 40GB. For RBD workloads, 1% to 2% of block size is usually enough."

Should we have different default sizes based on whether it's an RGW or RBD workload?

[0] https://docs.ceph.com/docs/octopus/rados/configuration/bluestore-config-ref/#sizing
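A per-workload default could look like the following sketch. The ratio table is hypothetical (the charm has a single hardcoded default today); the ratios themselves come from the Octopus docs quoted above:

```python
# Hypothetical per-workload block.db ratios, taken from the Octopus
# sizing guidance quoted above. Illustrative only -- not charm code.

WORKLOAD_DB_RATIO = {
    'rgw': 0.04,  # block.db shouldn't be smaller than 4% of block for RGW
    'rbd': 0.02,  # 1% to 2% of block size is usually enough for RBD
}


def db_size_gb(block_size_gb, workload):
    """block.db size for a given data (block) device size and workload."""
    return block_size_gb * WORKLOAD_DB_RATIO[workload]


print(db_size_gb(1024, 'rgw'))  # 1 TB block -> 40.96 GB, matching the docs' example
```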

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733613

Changed in charm-ceph-osd:
status: Triaged → In Progress
James Page (james-page)
Changed in charm-ceph-osd:
milestone: 20.08 → none
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-osd (master)

Change abandoned by "Chris MacNaughton <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-ceph-osd/+/733613
Reason: Closing this review as it has updates requested for over a year and no updates submitted.

Revision history for this message
Trent Lloyd (lathiat) wrote :

ceph-volume will calculate this for you. If we change the charm to set no default, it will use ceph-volume's 4% calculation instead. Users who want or need a smaller size can then set it explicitly.

Revision history for this message
Ponnuvel Palaniyappan (pponnuvel) wrote :

I am not sure whether ceph-volume calculates a 4% wal size; I can't find any reference in the docs.

If it does indeed calculate 4%, or simply divides the total space available on the DB device(s) equally across all the OSD devices, then either is probably fine and certainly better than the hardcoded values we have in the charms.
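The equal-split behavior speculated about above could be sketched as follows. This is an assumption about what ceph-volume might do, not confirmed behavior:

```python
# Sketch of the equal-split behavior speculated about above: a shared
# DB device divided evenly across all OSD data devices. This is an
# ASSUMPTION about ceph-volume's behavior, not confirmed logic.

def per_osd_db_size_mb(db_device_mb, num_osds):
    """Each OSD gets an equal share of the shared DB device."""
    return db_device_mb // num_osds


# A 512 GB DB device shared by 10 OSDs:
print(per_osd_db_size_mb(512 * 1024, 10))  # 52428 MB (~51 GB) per OSD
```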

Changed in charm-ceph-osd:
status: In Progress → New
assignee: Ponnuvel Palaniyappan (pponnuvel) → nobody
Changed in charm-ceph-osd:
status: New → Triaged
Changed in charm-ceph-osd:
importance: Medium → Wishlist