Improvements to the default configuration of Kafka
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juju Charms Collection | In Progress | Undecided | Kevin W Monroe |
Bug Description
By default, Kafka does not enable the log cleaner. This means that in a production environment the log data Kafka retains will consume all available storage space after a while.
How long "a while" is depends heavily on the workload Kafka is used for. If it is fast paced (as we expect from Kafka), this can be a few hours.
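The claim above is simple arithmetic: time to exhaustion is disk capacity divided by ingest rate. A rough sketch (the disk size and producer rate below are illustrative assumptions, not figures from this report):

```python
# Back-of-envelope: how long until an unbounded Kafka log fills a disk.
# The disk size and ingest rate used here are illustrative assumptions.

def hours_until_full(disk_bytes: int, ingest_bytes_per_sec: int) -> float:
    """Hours before the log directory consumes the whole disk."""
    return disk_bytes / ingest_bytes_per_sec / 3600

# A 500 GB data disk under a sustained 20 MB/s producer load:
print(f"{hours_until_full(500 * 10**9, 20 * 10**6):.1f} hours")  # prints "6.9 hours"
```

At that (assumed) rate the broker exhausts its disk in well under a day, which is why retention limits matter for an opinionated default.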
From a "security" / "charm acceptance" point of view, a secure, opinionated deployment may use the settings below for /opt/kafka/
# The minimum age of a log file to be eligible for deletion
# DEFAULT: log.retention.hours=168
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes
# DEFAULT: log.retention.bytes is commented out (no size-based limit)
log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# DEFAULT: log.segment.bytes=1073741824
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set, the cleaner is enabled and individual logs can then be marked for log compaction.
# DEFAULT: log.cleaner.enable=false
log.cleaner.enable=true
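A charm could sanity-check a rendered server.properties fragment for exactly these knobs before accepting a deployment. A minimal sketch (the helper names are hypothetical, and only simple `key=value` lines are handled):

```python
# Minimal sanity check for a server.properties fragment: warn when the
# retention settings discussed above are missing or the cleaner is off.
# Comments, blank lines, and anything without "=" are skipped.

def parse_properties(text: str) -> dict:
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_retention(props: dict) -> list:
    """Return warnings for missing or disabled retention settings."""
    warnings = []
    if "log.retention.hours" not in props and "log.retention.ms" not in props:
        warnings.append("no time-based retention configured")
    if "log.retention.bytes" not in props:
        warnings.append("no size-based retention configured")
    if props.get("log.cleaner.enable", "false").lower() != "true":
        warnings.append("log cleaner is disabled")
    return warnings

sample = """
log.retention.hours=168
log.cleaner.enable=true
"""
print(check_retention(parse_properties(sample)))
# A fragment without log.retention.bytes yields a size-retention warning.
```

Such a check would let the charm flag the unbounded-storage configuration this bug describes before it reaches production.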
Related branches

Changed in charms:
status: New → In Progress
tags: added: not-a-charm
One may also want to update the log flush policy to enable it by default:
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
# DEFAULT: commented out
log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
# DEFAULT: commented out
log.flush.interval.ms=1000
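With both flush triggers set, the worst-case window of unflushed data is bounded by whichever trigger fires first. A rough sketch of that trade-off (the message rate below is an assumed figure):

```python
# Worst-case unflushed window when both log.flush.interval.messages and
# log.flush.interval.ms are set: a flush is forced by whichever bound is
# reached first. The message rate used here is an illustrative assumption.

def worst_case_window_ms(interval_messages: int, interval_ms: int,
                         msgs_per_sec: float) -> float:
    """Milliseconds of data at risk between forced flushes."""
    time_to_hit_count = interval_messages / msgs_per_sec * 1000
    return min(time_to_hit_count, interval_ms)

# With the 10000-message / 1000 ms thresholds above at 5000 msgs/s, the
# count bound (2000 ms to accumulate) loses to the 1000 ms time bound.
print(worst_case_window_ms(10_000, 1_000, 5_000))
```

This is why enabling the time-based trigger caps durability exposure even when traffic is too slow to hit the message-count trigger.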