
Add stress testing framework, with basic metrics example to demonstrate. #3241

Draft: wants to merge 16 commits into main


@lalitb (Member) commented Jan 10, 2025

Changes

This PR adds a basic stress testing framework to validate the scalability and reliability of the code under test when it runs under high-concurrency and long-running workloads. Unlike Google Benchmark, which focuses on micro-benchmarking and latency measurements for isolated operations, this framework simulates sustained, multi-threaded execution of a given workload. The idea is to complement the existing benchmarks with stress tests that address long-duration and high-concurrency use cases.

This is already implemented for .NET and Rust, and most of the ideas are taken from there. I needed this to test some metrics optimizations I am working on, but feel free to comment if it doesn't seem helpful.

Also added a basic stress-testing example for metrics to demonstrate. Below are the results from the metrics stress test as an example:

```
$ ./stress_metrics
Starting stress test with 16 threads...
Throughput: 5009490 it/s | Avg: 4885764 | Min: 4734280 | Max: 5132395

Test completed:
Total iterations: 203373637
Duration: 42 seconds
Average throughput: 4885764 iterations/sec
$
```

It's still in the early stages and will need further enhancements, but it should be a good starting point. Future improvements could include reporting memory and CPU usage alongside the existing throughput numbers, and refining the initial warm-up period to ensure consistent data collection.

Implementation Details:

Worker Threads:
- Worker threads (defaulting to the number of CPU cores) are spawned to execute the workload.
- Each worker thread runs the workload function (func) in a loop until a global STOP flag is set (on Ctrl-C).
- Each thread maintains its own iteration count to minimize contention.
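The worker-thread scheme above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `WorkerCounter`, `RunWorkers`, `InstallStopHandler` and `DefaultThreadCount` are hypothetical names, and padding each counter to an assumed 64-byte cache line is one common way to keep per-thread counters from contending.

```cpp
#include <algorithm>
#include <atomic>
#include <csignal>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Global stop flag, as described in the PR. std::atomic<bool> is lock-free on
// mainstream platforms, which is what makes it safe to set from a signal handler.
std::atomic<bool> STOP{false};

// Hypothetical per-worker counter. alignas(64) assumes a 64-byte cache line so
// that neighbouring counters do not share a line (avoids false sharing).
// C++17 is assumed so std::vector honours the over-alignment.
struct alignas(64) WorkerCounter {
  std::atomic<std::uint64_t> iterations{0};
};

// SIGINT (Ctrl-C) handler: only flips the stop flag.
void OnSigint(int) { STOP.store(true, std::memory_order_relaxed); }

void InstallStopHandler() { std::signal(SIGINT, OnSigint); }

// hardware_concurrency() may return 0 when unknown, so fall back to 1.
unsigned DefaultThreadCount() {
  return std::max(1u, std::thread::hardware_concurrency());
}

// Runs func() in a loop on num_threads workers until STOP is set. func must be
// thread-safe, since all workers call it concurrently. Each worker increments
// only its own counter; counters.size() must be >= num_threads.
void RunWorkers(const std::function<void()> &func, unsigned num_threads,
                std::vector<WorkerCounter> &counters) {
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned i = 0; i < num_threads; ++i) {
    workers.emplace_back([&func, &counter = counters[i]] {
      while (!STOP.load(std::memory_order_relaxed)) {
        func();
        counter.iterations.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto &worker : workers) worker.join();
}
```

A driver would call `InstallStopHandler()`, size the counter vector with `DefaultThreadCount()`, and pass the workload lambda to `RunWorkers`. Relaxed ordering suffices here because the counters are only read as approximate progress totals.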

Throughput Monitoring:
- A separate controller thread monitors throughput by periodically summing up iteration counts across threads.
- Throughput is calculated over a sliding window (SLIDING_WINDOW_SIZE) and displayed dynamically.
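A minimal sketch of such a controller loop follows, assuming the workers' counters can be summed through a callback and that "displayed dynamically" means redrawing one status line in place with `\r`. `SlidingWindow` and `MonitorThroughput` are hypothetical names; the window capacity stands in for `SLIDING_WINDOW_SIZE`, and the 1-second default sampling interval is an assumption.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <functional>
#include <numeric>
#include <thread>

// Keeps the most recent `capacity` throughput samples (iterations/sec).
class SlidingWindow {
 public:
  explicit SlidingWindow(std::size_t capacity) : capacity_(capacity) {}

  void Add(double sample) {
    samples_.push_back(sample);
    if (samples_.size() > capacity_) samples_.pop_front();  // evict oldest
  }
  double Avg() const {
    if (samples_.empty()) return 0.0;
    return std::accumulate(samples_.begin(), samples_.end(), 0.0) / samples_.size();
  }
  double Min() const {
    return samples_.empty() ? 0.0 : *std::min_element(samples_.begin(), samples_.end());
  }
  double Max() const {
    return samples_.empty() ? 0.0 : *std::max_element(samples_.begin(), samples_.end());
  }

 private:
  std::size_t capacity_;
  std::deque<double> samples_;
};

// Every `interval`, reads the summed iteration count via `read_total`, turns
// the delta into iterations/sec, pushes it into the window and redraws the
// status line. Returns once `stop` becomes true.
void MonitorThroughput(const std::atomic<bool> &stop,
                       const std::function<std::uint64_t()> &read_total,
                       SlidingWindow &window,
                       std::chrono::milliseconds interval = std::chrono::milliseconds(1000)) {
  using Clock = std::chrono::steady_clock;
  std::uint64_t last_total = read_total();
  auto last_time = Clock::now();
  while (!stop.load(std::memory_order_relaxed)) {
    std::this_thread::sleep_for(interval);
    const std::uint64_t total = read_total();
    const auto now = Clock::now();
    const double seconds = std::chrono::duration<double>(now - last_time).count();
    const double throughput = seconds > 0.0 ? (total - last_total) / seconds : 0.0;
    window.Add(throughput);
    std::printf("\rThroughput: %.0f it/s | Avg: %.0f | Min: %.0f | Max: %.0f",
                throughput, window.Avg(), window.Min(), window.Max());
    std::fflush(stdout);
    last_total = total;
    last_time = now;
  }
  std::printf("\n");
}
```

Measuring the actual elapsed time per sample, rather than assuming the sleep lasted exactly `interval`, keeps the reported rate honest when the controller thread is descheduled under heavy load.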

Final Summary:
- At the end of the test, the program calculates and prints the total iterations, duration, and average throughput.
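The final summary amounts to dividing the total iteration count by the elapsed wall-clock time. The sketch below assumes start and end timestamps taken from `std::chrono::steady_clock`; `Summary`, `Summarize` and `PrintSummary` are hypothetical names, and the printed format only loosely mirrors the sample output earlier in this description.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical end-of-run summary.
struct Summary {
  std::uint64_t total_iterations;
  double duration_seconds;
  double average_throughput;  // iterations per second
};

// Average throughput = total iterations / elapsed seconds (0 if no time elapsed).
Summary Summarize(std::uint64_t total_iterations,
                  std::chrono::steady_clock::time_point start,
                  std::chrono::steady_clock::time_point end) {
  const double seconds = std::chrono::duration<double>(end - start).count();
  const double avg =
      seconds > 0.0 ? static_cast<double>(total_iterations) / seconds : 0.0;
  return {total_iterations, seconds, avg};
}

void PrintSummary(const Summary &s) {
  std::printf("Test completed:\n");
  std::printf("Total iterations: %llu\n",
              static_cast<unsigned long long>(s.total_iterations));
  std::printf("Duration: %.2f seconds\n", s.duration_seconds);
  std::printf("Average throughput: %.0f iterations/sec\n", s.average_throughput);
}
```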

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@lalitb requested a review from a team as a code owner on January 10, 2025, 19:30
@lalitb marked this pull request as draft on January 10, 2025, 19:30

netlify bot commented Jan 10, 2025

Deploy Preview for opentelemetry-cpp-api-docs canceled.

- 🔨 Latest commit: eead3a0
- 🔍 Latest deploy log: https://app.netlify.com/sites/opentelemetry-cpp-api-docs/deploys/6784eef5722f2a000895043d


codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.73%. Comparing base (d693e95) to head (eead3a0).


```diff
@@           Coverage Diff           @@
##             main    #3241   +/-   ##
=======================================
  Coverage   87.73%   87.73%           
=======================================
  Files         198      198           
  Lines        6258     6258           
=======================================
  Hits         5490     5490           
  Misses        768      768           
```
