OpenSearch indexing error causes lost messages #21282

Open
drewmiranda-gl opened this issue Jan 7, 2025 · 2 comments · May be fixed by #21313

@drewmiranda-gl
Member

Certain conditions may cause OpenSearch to respond with an exception when Graylog attempts to bulk index messages:

ERROR (MessagesAdapterOS2) Failed to index [<count>] messages. Please check the index error log in your web interface for the reason. Error: failure in bulk execution:
[7]: index [<index_name>], id [<id>], message [OpenSearchException[OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [indices:data/write/bulk[s]] would be [32371387766/30.1gb], which is larger than the limit of [316216967716/29.4gb], real usage: [32370871568/30.1gb], new bytes reserved: [516198/504kb], usages [request=0/0b, fielddata=4805648447/4.4gb, in_flight_requests=516606/504.4kb]]]

The affected messages are logged in the Processing and Indexing Failures stream. However, Graylog does not retry the batch, so the messages are effectively lost.

Expected Behavior

Graylog will retry failed bulk indexing requests if they fail with an exception.

Current Behavior

Graylog does not retry failed bulk indexing requests unless they fail with one of the following:

Possible Solution

Add more robust exception and retry handling.
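
To make the suggestion a bit more concrete, here is a rough sketch of per-item retry handling for a bulk response. It is not Graylog's code (Graylog is Java; Python is used only to keep the example short). It assumes the standard bulk response shape, where `items` come back in request order and a failed item carries an `error` object with a `type`. The names `split_bulk_result`, `index_with_retry`, `send_bulk`, and `RETRYABLE_ERROR_TYPES` are hypothetical, and treating `circuit_breaking_exception` as transient is an assumption based on the log above.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

# Assumption: error types considered transient and therefore worth retrying.
# Only circuit_breaking_exception is taken from the log in this issue.
RETRYABLE_ERROR_TYPES: Set[str] = {"circuit_breaking_exception"}


def split_bulk_result(
    docs: List[Dict],
    response: Dict,
    retryable: Set[str] = RETRYABLE_ERROR_TYPES,
) -> Tuple[List[Dict], List[Tuple[Dict, Dict]]]:
    """Split failed items into (docs to retry, permanent failures).

    Assumes every operation in the request was an "index" action and that
    response["items"] is in the same order as the submitted docs.
    """
    retry: List[Dict] = []
    failed: List[Tuple[Dict, Dict]] = []
    for doc, item in zip(docs, response["items"]):
        error = item["index"].get("error")
        if error is None:
            continue  # indexed successfully
        if error.get("type") in retryable:
            retry.append(doc)
        else:
            failed.append((doc, error))
    return retry, failed


def index_with_retry(
    docs: List[Dict],
    send_bulk: Callable[[List[Dict]], Dict],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Dict], List[Tuple[Dict, Dict]]]:
    """Resend only the transiently failed docs, with exponential backoff.

    send_bulk is a hypothetical callable that submits a batch and returns
    the parsed bulk response. Returns (docs still failing after the last
    attempt, permanent failures).
    """
    pending = list(docs)
    permanent: List[Tuple[Dict, Dict]] = []
    for attempt in range(max_attempts):
        if not pending:
            break
        response = send_bulk(pending)
        pending, failed = split_bulk_result(pending, response)
        permanent.extend(failed)
        if pending and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return pending, permanent
```

Anything still in the first returned list after the last attempt would need to be kept somewhere durable (for example, left in the journal) rather than dropped. The sketch only covers per-item failures inside a bulk response; a bulk request that fails as a whole would need its own handling.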

Steps to Reproduce (for bugs)

I've not found a good way to reliably reproduce this.

Context

From talking with the user, it appears that a Graylog index rotation, which triggers an index optimization (Force Merge), may be putting memory pressure on the OpenSearch nodes. For context, the user reports that this issue first appeared a few months ago after migrating from Elasticsearch to OpenSearch, even though the cluster sizing and resources stayed the same.

Please let me know if there are any questions or if you wish to discuss this further via a call.

Your Environment

  • Graylog Version: 6.x
  • OpenSearch Version: 2.x
@drewmiranda-gl
Member Author

I've reached out to the user to confirm the following:

  • Graylog Version:
  • Java Version:
  • OpenSearch Version:
  • MongoDB Version:
  • Operating System:

dennisoelkers self-assigned this Jan 8, 2025

@drewmiranda-gl
Member Author

Info provided by user:

  • Graylog Version: 6.0.5
  • Java Version: Bundled with Graylog, openjdk version "17.0.12" 2024-07-16
  • OpenSearch Version: 2.15.0
  • MongoDB Version: 6.0.12
  • Operating System: Red Hat Enterprise Linux release 9.5 (Plow)
