Is your feature request related to a problem? Please describe.
Implement Dynamic Adjustment of flush_timeout and bulk_size for OpenSearch Sink.
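Currently, the OpenSearch sink in Data Prepper uses static values for flush_timeout and bulk_size, and no single pair of values suits every traffic pattern. At low traffic, a partially filled bulk request can wait for the full flush_timeout before it is sent, which adds latency. Settings tuned for low latency, on the other hand, send many small requests at high traffic and reduce throughput. We need a mechanism that adjusts these properties dynamically based on the observed traffic pattern.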
Describe the solution you'd like
Implement Adaptive Batching:
Rather than tuning flush_timeout and bulk_size by hand, implement an adaptive batching mechanism that adjusts the batch size to the current traffic rate: for example, start with small batches, grow them as traffic increases, and shrink them again as traffic decreases.
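A minimal sketch of the grow/shrink idea, assuming the sink can report how many records each flushed batch contained. It uses batch fill level as a proxy for the traffic rate; the class `AdaptiveBatchSizer`, the doubling/halving rule, and the 25% dead band are hypothetical choices, not part of Data Prepper.

```java
/**
 * Hypothetical adaptive batch sizer (not a Data Prepper class).
 * Uses how full each batch was when it was flushed as a proxy for the traffic rate.
 */
final class AdaptiveBatchSizer {
    private final int minSize;
    private final int maxSize;
    private int targetSize;

    AdaptiveBatchSizer(int minSize, int maxSize) {
        if (minSize <= 0 || maxSize < minSize) {
            throw new IllegalArgumentException("require 0 < minSize <= maxSize");
        }
        this.minSize = minSize;
        this.maxSize = maxSize;
        this.targetSize = minSize; // start with small batches
    }

    /** Number of records the sink should try to collect before flushing. */
    int targetSize() {
        return targetSize;
    }

    /** Called after every flush with the number of records that batch contained. */
    void onFlush(int recordsFlushed) {
        if (recordsFlushed >= targetSize) {
            // The batch filled up before the timeout fired: traffic is high, so double the size.
            targetSize = targetSize > maxSize / 2 ? maxSize : targetSize * 2;
        } else if (recordsFlushed < targetSize / 4) {
            // The batch was flushed less than a quarter full: traffic dropped, so halve the size.
            targetSize = Math.max(minSize, targetSize / 2);
        }
        // Otherwise keep the current size; the band between 25% and 100% limits oscillation.
    }
}
```

Multiplicative growth reaches a large batch size in a few flushes when a burst arrives, while the dead band keeps a moderately filled batch from toggling the size on every flush.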
Monitor the incoming traffic rate to the OpenSearch sink.
For high TPS (Transactions Per Second) scenarios:
Use default values for flush_timeout and bulk_size to optimize for throughput.
For low TPS or sporadic request patterns:
Set flush_timeout to -1 so that buffered documents are flushed at the end of each batch instead of waiting for the timeout.
Potentially reduce bulk_size to ensure timely processing of smaller batches.
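The monitoring and selection steps above can be sketched as follows, assuming the sink can count events per fixed interval. The rate is smoothed with an exponentially weighted moving average (EWMA) and mapped to one of two settings profiles; `TrafficRateMonitor`, `SinkSettings`, the smoothing factor, and the threshold are all hypothetical. The high-traffic profile is meant to hold whatever flush_timeout and bulk_size the pipeline is configured with, and the low-traffic profile uses flush_timeout = -1 with a smaller bulk_size.

```java
/** Hypothetical value pair; field names mirror the flush_timeout and bulk_size options. */
record SinkSettings(long flushTimeoutMillis, long bulkSizeMiB) {}

/**
 * Hypothetical traffic monitor (not a Data Prepper class). It smooths the observed
 * events-per-second rate with an EWMA and maps the smoothed rate to a settings profile.
 * Not thread-safe: intended to be driven by a single monitoring thread.
 */
final class TrafficRateMonitor {
    private final double alpha;            // smoothing factor in (0, 1]; higher reacts faster
    private final double highTpsThreshold; // rates at or above this count as high traffic
    private final SinkSettings highTraffic;
    private final SinkSettings lowTraffic;
    private double ewmaTps = 0.0;

    TrafficRateMonitor(double alpha, double highTpsThreshold,
                       SinkSettings highTraffic, SinkSettings lowTraffic) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.highTpsThreshold = highTpsThreshold;
        this.highTraffic = highTraffic;
        this.lowTraffic = lowTraffic;
    }

    /** Records how many events arrived during an interval of the given length. */
    void recordInterval(long events, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        double tps = events * 1000.0 / intervalMillis;
        ewmaTps = alpha * tps + (1.0 - alpha) * ewmaTps;
    }

    double estimatedTps() {
        return ewmaTps;
    }

    /** High traffic keeps the configured values; low traffic flushes promptly. */
    SinkSettings currentSettings() {
        return ewmaTps >= highTpsThreshold ? highTraffic : lowTraffic;
    }
}
```

For example, `new TrafficRateMonitor(0.5, 1000.0, new SinkSettings(60_000, 5), new SinkSettings(-1, 1))` would switch to the first profile once the smoothed rate reaches 1,000 events per second; the numbers are placeholders, not recommendations.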
Benefits:
Improved performance across varying traffic patterns.
Reduced latency for low-traffic scenarios.
Better resource utilization during high-traffic periods.
Enhanced user experience without manual configuration changes.
Potential Challenges:
Determining optimal thresholds and adjustment strategies.
Ensuring thread-safety in dynamic property adjustments.
Avoiding frequent oscillations in settings.
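The thread-safety and oscillation challenges can both be addressed in a small component, sketched here under the assumption that one monitoring thread feeds rate samples while many sink workers read the current mode. Two thresholds (hysteresis) plus a minimum dwell time prevent rapid flip-flopping, and an `AtomicReference` lets workers read the mode without locking. `BatchingMode`, `HysteresisModeSwitch`, and all thresholds are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical batching modes (not Data Prepper types). */
enum BatchingMode { LOW_LATENCY, HIGH_THROUGHPUT }

/**
 * Hypothetical mode switch with hysteresis and a minimum dwell time.
 * Sink worker threads read the mode lock-free; updates are serialized.
 */
final class HysteresisModeSwitch {
    private final double enterHighTps;   // switch to HIGH_THROUGHPUT at or above this rate
    private final double exitHighTps;    // switch back only below this lower rate
    private final long minDwellMillis;   // minimum time between two switches
    private final AtomicReference<BatchingMode> mode =
            new AtomicReference<>(BatchingMode.LOW_LATENCY);
    private long lastSwitchMillis;       // guarded by "this"

    HysteresisModeSwitch(double enterHighTps, double exitHighTps,
                         long minDwellMillis, long nowMillis) {
        if (exitHighTps >= enterHighTps) {
            throw new IllegalArgumentException("exitHighTps must be below enterHighTps");
        }
        this.enterHighTps = enterHighTps;
        this.exitHighTps = exitHighTps;
        this.minDwellMillis = minDwellMillis;
        this.lastSwitchMillis = nowMillis;
    }

    /** Lock-free read for sink workers; sees the most recently published mode. */
    BatchingMode mode() {
        return mode.get();
    }

    /** Feeds a new rate sample; synchronized so the check-then-set is atomic. */
    synchronized BatchingMode update(double observedTps, long nowMillis) {
        BatchingMode current = mode.get();
        if (nowMillis - lastSwitchMillis < minDwellMillis) {
            return current; // too soon since the last switch
        }
        BatchingMode next = current;
        if (current == BatchingMode.LOW_LATENCY && observedTps >= enterHighTps) {
            next = BatchingMode.HIGH_THROUGHPUT;
        } else if (current == BatchingMode.HIGH_THROUGHPUT && observedTps < exitHighTps) {
            next = BatchingMode.LOW_LATENCY;
        }
        if (next != current) {
            mode.set(next);
            lastSwitchMillis = nowMillis;
        }
        return next;
    }
}
```

Publishing a single enum (rather than updating flush_timeout and bulk_size as two separate fields) means a worker can never observe a half-applied combination of the two settings.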
Describe alternatives you've considered (Optional)
Traffic-Based Worker Scaling:
Implement a system that scales the number of worker threads processing the sink based on the incoming traffic. This could help manage both high and low traffic scenarios without changing the flush_timeout or bulk_size.
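A minimal sketch of worker scaling with the JDK's `ThreadPoolExecutor`, assuming each worker can sustain a roughly known rate. The class `SinkWorkerScaler`, the per-worker rate, and the bounds are hypothetical; only `ThreadPoolExecutor` and its `setCorePoolSize` method are standard Java.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical worker scaler (not a Data Prepper class). Sizes a pool of
 * sink workers from the observed rate, leaving flush_timeout and bulk_size alone.
 */
final class SinkWorkerScaler {
    private final int minWorkers;
    private final int maxWorkers;
    private final double tpsPerWorker; // assumed sustainable rate for one worker
    private final ThreadPoolExecutor pool;

    SinkWorkerScaler(int minWorkers, int maxWorkers, double tpsPerWorker) {
        if (minWorkers < 1 || maxWorkers < minWorkers || tpsPerWorker <= 0) {
            throw new IllegalArgumentException("invalid scaling bounds");
        }
        this.minWorkers = minWorkers;
        this.maxWorkers = maxWorkers;
        this.tpsPerWorker = tpsPerWorker;
        // The maximum stays fixed at maxWorkers; only the core size moves.
        this.pool = new ThreadPoolExecutor(minWorkers, maxWorkers, 30, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    /** Workers needed for the rate, clamped to [min, max]. */
    static int desiredWorkers(double tps, double tpsPerWorker, int min, int max) {
        int needed = (int) Math.ceil(tps / tpsPerWorker);
        return Math.max(min, Math.min(max, needed));
    }

    /** Applies a new core size; surplus threads exit when they next become idle. */
    int rescale(double observedTps) {
        int target = desiredWorkers(observedTps, tpsPerWorker, minWorkers, maxWorkers);
        pool.setCorePoolSize(target); // target <= maxWorkers, so this cannot throw
        return target;
    }

    ThreadPoolExecutor pool() {
        return pool;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because the work queue is unbounded, the executor only creates threads up to the core size, which is why the sketch scales by changing the core size rather than the maximum.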
Time-Based Flushing with Backpressure:
Instead of using a fixed flush_timeout, implement a time-based flushing mechanism with backpressure. This would flush based on time for low traffic, but could delay flushing if the system is under high load, effectively adapting to traffic patterns.
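One way to express this policy, assuming the number of in-flight bulk requests is used as the load signal: a partially filled buffer is flushed after a timeout that stretches linearly from a base value toward a maximum as load rises, while a full buffer is always sent. `BackpressureAwareFlushPolicy` and its parameters are hypothetical.

```java
/**
 * Hypothetical flush policy (not a Data Prepper class). A partially filled
 * buffer is flushed after a timeout that stretches as the number of in-flight
 * bulk requests grows, which is how this sketch models backpressure.
 */
final class BackpressureAwareFlushPolicy {
    private final long baseTimeoutMillis;  // used when nothing is in flight
    private final long maxTimeoutMillis;   // upper bound under full load
    private final int maxInFlightRequests; // in-flight count treated as full load
    private final long bulkSizeBytes;      // a buffer this large is always sent

    BackpressureAwareFlushPolicy(long baseTimeoutMillis, long maxTimeoutMillis,
                                 int maxInFlightRequests, long bulkSizeBytes) {
        if (baseTimeoutMillis < 0 || maxTimeoutMillis < baseTimeoutMillis
                || maxInFlightRequests <= 0 || bulkSizeBytes <= 0) {
            throw new IllegalArgumentException("invalid flush policy bounds");
        }
        this.baseTimeoutMillis = baseTimeoutMillis;
        this.maxTimeoutMillis = maxTimeoutMillis;
        this.maxInFlightRequests = maxInFlightRequests;
        this.bulkSizeBytes = bulkSizeBytes;
    }

    /** Linearly interpolates between the base and maximum timeout by load in [0, 1]. */
    long effectiveTimeoutMillis(int inFlightRequests) {
        double load = Math.min(1.0, Math.max(0.0, (double) inFlightRequests / maxInFlightRequests));
        return baseTimeoutMillis + Math.round(load * (maxTimeoutMillis - baseTimeoutMillis));
    }

    boolean shouldFlush(long bufferAgeMillis, long bufferedBytes, int inFlightRequests) {
        if (bufferedBytes <= 0) {
            return false; // nothing to send
        }
        if (bufferedBytes >= bulkSizeBytes) {
            return true; // the request is full; holding it longer only grows memory use
        }
        return bufferAgeMillis >= effectiveTimeoutMillis(inFlightRequests);
    }
}
```

With a 1 s base, a 10 s maximum, and 4 in-flight requests treated as full load, two in-flight requests give an effective timeout of 5.5 s.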
Machine Learning-Based Prediction:
Implement a machine learning model that predicts traffic patterns and adjusts sink parameters proactively, rather than reactively. This could be particularly effective for systems with recurring traffic patterns.
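The proposal does not name a model, so the sketch below substitutes a deliberately simple predictor to show the proactive flow: it keeps a smoothed rate per hour of day and, before an hour begins, reports whether that hour is expected to be busy. `HourlyTrafficProfile` and its parameters are hypothetical, and a real learned model would replace the per-hour average.

```java
/**
 * Hypothetical hour-of-day traffic profile (not a Data Prepper class).
 * A simple stand-in for a learned model: it remembers a smoothed rate per
 * hour and predicts that the same hour will look similar next time.
 */
final class HourlyTrafficProfile {
    private final double[] avgTps = new double[24];
    private final boolean[] seen = new boolean[24];
    private final double alpha; // weight of the newest observation, in (0, 1]

    HourlyTrafficProfile(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    void observe(int hourOfDay, double tps) {
        checkHour(hourOfDay);
        if (!seen[hourOfDay]) {
            avgTps[hourOfDay] = tps; // the first sample seeds the average
            seen[hourOfDay] = true;
        } else {
            avgTps[hourOfDay] = alpha * tps + (1.0 - alpha) * avgTps[hourOfDay];
        }
    }

    /** Learned rate for the hour, or NaN if that hour has never been observed. */
    double predict(int hourOfDay) {
        checkHour(hourOfDay);
        return seen[hourOfDay] ? avgTps[hourOfDay] : Double.NaN;
    }

    /** True if the upcoming hour is expected to be busy, so settings can switch ahead of time. */
    boolean expectHighTraffic(int upcomingHour, double highTpsThreshold) {
        double predicted = predict(upcomingHour);
        return !Double.isNaN(predicted) && predicted >= highTpsThreshold;
    }

    private static void checkHour(int hourOfDay) {
        if (hourOfDay < 0 || hourOfDay > 23) {
            throw new IllegalArgumentException("hour must be in 0..23");
        }
    }
}
```

An unseen hour yields no prediction, so the sink would fall back to reactive adjustment until the profile has data for it.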
Hybrid Approach:
Combine multiple strategies, such as using adaptive batching for high-traffic scenarios and immediate flushing for low-traffic periods, switching between modes based on observed patterns.