OpenSearch indexing error causes lost messages #21282

drewmiranda-gl · 2025-01-07T15:54:59Z

Under certain conditions, OpenSearch responds with an exception when Graylog attempts to bulk index messages:

ERROR (MessagesAdapterOS2) Failed to index [<count>] messages. Please check the index error log in your web interface for the reason. Error: failure in bulk execution:
[7]: index [<index_name>], id [<id>], message [OpenSearchException[OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [indices:data/write/bulk[s]] would be 32371387766/30.1gb], which is larger than the limit of [316216967716/29.4gb], real usage: [32370871568/30.1gb], new byes reserved: [516198/504kb], usages [request=0/0b, fielddata=4805648447/4.4gb, in_flight_requests=516606/504.4kb]]]
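The figures in the log can be checked by hand. OpenSearch's parent circuit breaker adds the bytes a request wants to reserve to the current real heap usage and rejects the request if that total exceeds the limit. The sketch below reproduces that arithmetic. The class and method names are illustrative only, and the limit is approximated from the human-readable "29.4gb" figure (1gb = 2^30 bytes), which is an assumption.

```java
// Illustrative sketch: redo the parent circuit breaker arithmetic from the log.
public final class ParentBreakerCheck {
    static final long GIB = 1L << 30; // OpenSearch "gb" units are 2^30 bytes

    // The breaker's "would be" figure: current real heap usage plus this request's reservation.
    static long wouldBe(long realUsageBytes, long newBytesReserved) {
        return realUsageBytes + newBytesReserved;
    }

    // The breaker rejects the request when the total exceeds the limit.
    static boolean trips(long totalBytes, long limitBytes) {
        return totalBytes > limitBytes;
    }

    public static void main(String[] args) {
        long realUsage = 32_370_871_568L; // "real usage" in the log (30.1gb)
        long reserved = 516_198L;         // bytes this bulk request reserves (504kb in the log)
        long limit = (long) (29.4 * GIB); // approximation from the human-readable "29.4gb" (assumption)

        long total = wouldBe(realUsage, reserved);
        System.out.println("would be " + total + " bytes; trips: " + trips(total, limit));
        System.out.println("heap already over limit before the request: " + trips(realUsage, limit));
    }
}
```

The sum matches the logged "would be" value of 32371387766 bytes. Note that the real heap usage alone (about 30.1gb) is already above the 29.4gb limit, so the 504kb bulk request is not the cause of the memory pressure. It is simply the request that happened to be checked.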

The failures are logged to the Processing and Indexing Failures stream. However, Graylog does not retry the batch, so the messages are effectively lost.

Expected Behavior

Graylog retries bulk indexing requests that fail with an exception such as the one above, instead of dropping the messages.

Current Behavior

Graylog does not retry failed bulk indexing requests unless they fail with one of the following:

Possible Solution

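Add more robust exception and retry handling, for example retrying bulk requests (or the individual failed items) that are rejected with HTTP 429 or a circuit_breaking_exception instead of discarding them.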
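A minimal sketch of that kind of retry handling follows. It assumes a hypothetical ItemResult type and a hypothetical sendBulk function standing in for the real OpenSearch bulk call; neither is a Graylog or OpenSearch client API. It re-sends only the items that failed with HTTP 429 or circuit_breaking_exception, backs off exponentially between attempts, and returns every item that failed permanently or ran out of attempts so the caller can still record it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified outcome of one item in a bulk request (not a real client type).
record ItemResult(String id, int status, String errorType) {
    boolean failed() {
        return status >= 300;
    }
}

public final class BulkRetry {
    // Assumption: 429 responses and circuit breaker trips are transient and worth retrying.
    static boolean isRetryable(ItemResult r) {
        return r.status() == 429 || "circuit_breaking_exception".equals(r.errorType());
    }

    // Sends the batch, re-sends only retryable failures with exponential backoff,
    // and returns every item that still failed so the caller can record it.
    static List<ItemResult> indexWithRetry(List<String> ids,
                                           Function<List<String>, List<ItemResult>> sendBulk,
                                           int maxAttempts,
                                           long initialBackoffMs) throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<String> pending = ids;
        List<ItemResult> failures = new ArrayList<>();
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<String> retry = new ArrayList<>();
            for (ItemResult r : sendBulk.apply(pending)) {
                if (!r.failed()) {
                    continue; // indexed successfully
                }
                if (isRetryable(r) && attempt < maxAttempts) {
                    retry.add(r.id()); // transient error: try again in the next round
                } else {
                    failures.add(r); // permanent error, or out of attempts
                }
            }
            pending = retry;
            if (!pending.isEmpty()) {
                Thread.sleep(backoffMs); // give the cluster time to free heap
                backoffMs *= 2;
            }
        }
        return failures;
    }
}
```

Retrying only the failed items avoids re-indexing documents that already succeeded, and capping the attempts keeps a persistently overloaded cluster from being flooded with retries while still surfacing the leftovers instead of dropping them silently.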

Steps to Reproduce (for bugs)

I have not found a reliable way to reproduce this.

Context

Based on conversations with the user, it appears that Graylog index rotation, which triggers an index optimization (force merge), may be putting memory pressure on the OpenSearch nodes. According to the user, the issue first appeared a few months ago, after migrating from Elasticsearch to OpenSearch, even though the cluster sizing and resources were unchanged.

Please let me know if there are any questions or if you wish to discuss this further via a call.

Your Environment

Graylog Version: 6.x
OpenSearch Version: 2.x


drewmiranda-gl · 2025-01-07T19:39:40Z

I've reached out to the user to confirm the following:

Graylog Version:
Java Version:
OpenSearch Version:
MongoDB Version:
Operating System:

drewmiranda-gl · 2025-01-08T21:39:35Z

Info provided by user:

Graylog Version: 6.0.5
Java Version: Bundled with Graylog, openjdk version "17.0.12" 2024-07-16
OpenSearch Version: 2.15.0
MongoDB Version: 6.0.12
Operating System: Red Hat Enterprise Linux release 9.5 (Plow)

drewmiranda-gl added the bug label Jan 7, 2025

dennisoelkers self-assigned this Jan 8, 2025

dennisoelkers linked a pull request Jan 10, 2025 that will close this issue

Retry individual messages/requests when failing with 429/Data too large. #21313

Draft
