Fluentd "message does not exist" can happen when emitting logs

Bug #2064104 reported by Doug Szumski
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
kolla-ansible
Fix Released
Undecided
Doug Szumski

Bug Description

Seen with ElasticSearch output, centralised logging enabled. Example error:

```
"2024-03-14 11:14:36 +0000 [warn]: #0 dump an error event: error_class=ArgumentError error="message does not exist" location=nil tag="fluent.info" time=2024-03-14 11:14:18.336792552 +0000 record={
"Payload"=>"fluentd worker is now running worker=0",
"Hostname"=>"nova-svr2",
"programname"=>"fluent",
"log_level"=>"info",
 "@timestamp"=>"2024-03-14T11:14:18.336792552+00:00"}
```

How can this happen?

The Kolla Ansible Fluentd pipeline is not idempotent. For example, this section which is used to standardise the log message field removes the original field:

```
    <filter fluent.**>
        @type parser
        key_name message
        format /^(?<Payload>.*)$/
    </filter>
```

If a bulk request is sent to Elastic/OpenSearch and not all logs are processed, the default behaviour of Fluentd is to re-emit the unprocessed logs using their original tag. These logs are then re-processed by the Fluentd pipeine. Since the pipeline is not idempotent, this can result in processing errors. In the above example, the 'message' field is no longer present, resulting in the "message does not exist" error.

There are various ways to fix this. Making the pipeline idempotent is not necessary if we use the `retry_tag` in the Elastic/OpenSearch output plugin. Eg. https://github.com/fluent/fluent-plugin-opensearch?tab=readme-ov-file#retry_tag

Doug Szumski (dszumski)
Changed in kolla-ansible:
assignee: nobody → Doug Szumski (dszumski)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)
Changed in kolla-ansible:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/917434
Committed: https://opendev.org/openstack/kolla-ansible/commit/a2273026d740c1eef5105bcf4e250bd401054d26
Submitter: "Zuul (22348)"
Branch: master

commit a2273026d740c1eef5105bcf4e250bd401054d26
Author: Doug Szumski <email address hidden>
Date: Mon Apr 29 16:13:43 2024 +0100

    Define retry_tag for unprocessed Fluentd logs

    The Kolla Ansible Fluentd config is not idempotent. In practise this
    can be an issue if some logs are rejected by OpenSearch /
    ElasticSearch bulk API. In this case, by default, the unprocessed
    logs will be reprocessed from the start of the Fluentd pipeline,
    leading to error messages.

    The solution proposed here is to explicitly set the retry_tag as
    documented in [1,2], and add a dedicated output for retried logs.

    An alternative fix could be to make the pipeline idempotent. This
    would require a compromise of either duplicating content, or having
    non-standard fields for the log payload/message.

    [1] https://github.com/fluent/fluent-plugin-opensearch?tab=readme-ov-file#retry_tag
    [2] https://github.com/uken/fluent-plugin-elasticsearch?tab=readme-ov-file#retry_tag

    Closes-Bug: #2064104
    Change-Id: I310fc1b8e002ce9f0ba60f8bb67b7f372a589314

Changed in kolla-ansible:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.