
Proposal: Inference-perf loadgen component to be based on Grafana k6 load testing tool #2

SachinVarghese opened this issue Jan 20, 2025 · 3 comments

@SachinVarghese

The inference-perf proposal doc describes several components that are vital to its functioning. This issue recommends building some of that capability on top of k6, an existing, mature load-generation tool. Given the current requirements and constraints, a k6-based wrapper design would let us quickly build and provide the following capabilities from the initial proposal.

Load Generator
The Load Generator is the component that generates different traffic patterns based on user input. k6 can generate fixed or custom load patterns for a defined duration, as the requirement dictates.
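As a minimal sketch (not a definitive implementation), the script below uses k6's `ramping-arrival-rate` executor to ramp to a sustained request rate, hold it, and ramp down. `MODEL_SERVER_URL` and the `/health` path are placeholder assumptions about the server under test.

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    inference_load: {
      // Open-model load: the arrival rate is independent of response times.
      executor: 'ramping-arrival-rate',
      startRate: 1,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      stages: [
        { target: 20, duration: '2m' }, // ramp up to 20 req/s
        { target: 20, duration: '5m' }, // hold steady
        { target: 0,  duration: '1m' }, // ramp down
      ],
    },
  },
};

export default function () {
  // Placeholder endpoint; each iteration issues one request.
  http.get(`${__ENV.MODEL_SERVER_URL}/health`);
}
```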

Request Processor
The Request Processor provides a way to support different model servers and their corresponding request payloads with configurable parameters. k6 supports HTTP- and gRPC-based requests for both direct and distributed testing.
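A sketch of what a parameterized request could look like, assuming an OpenAI-compatible `/v1/completions` endpoint; the payload shape, `MODEL_NAME`, and `MAX_TOKENS` environment variables are illustrative assumptions, and each model server would get its own variant.

```javascript
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // Request parameters are configurable via environment variables.
  const payload = JSON.stringify({
    model: __ENV.MODEL_NAME || 'my-model',
    prompt: 'Explain load testing in one sentence.',
    max_tokens: Number(__ENV.MAX_TOKENS || 128),
  });

  const res = http.post(`${__ENV.MODEL_SERVER_URL}/v1/completions`, payload, {
    headers: { 'Content-Type': 'application/json' },
  });

  check(res, { 'status is 200': (r) => r.status === 200 });
}
```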

Response Processor / Data Collector
The Response Processor / Data Collector component allows us to process the response and measure the actual performance of the model server in terms of request latency, TPOT (time per output token), TTFT (time to first token), and throughput. k6 scripting can be leveraged for advanced data/metrics computation.
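A sketch of custom metric computation with k6's `Trend` metrics. It assumes a streaming response, so that time-to-first-byte approximates TTFT, and a response body that reports generated token counts in a `usage.completion_tokens` field; both the endpoint and that field name are assumptions about the server.

```javascript
import http from 'k6/http';
import { Trend } from 'k6/metrics';

// Custom metrics; the `true` flag marks them as time values in the summary.
const ttft = new Trend('ttft_ms', true);
const tpot = new Trend('tpot_ms', true);

export default function () {
  const res = http.post(
    `${__ENV.MODEL_SERVER_URL}/v1/completions`,
    JSON.stringify({ model: 'my-model', prompt: 'Hello', max_tokens: 64 }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  // timings.waiting is time-to-first-byte, a proxy for TTFT when streaming.
  ttft.add(res.timings.waiting);

  // Derive a mean TPOT from the token count the server reports (assumed field).
  const usage = JSON.parse(res.body).usage;
  if (usage && usage.completion_tokens > 1) {
    tpot.add(
      (res.timings.duration - res.timings.waiting) / (usage.completion_tokens - 1)
    );
  }
}
```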

Report Generator / Metrics Exporter
The Report Generator / Metrics Exporter generates a report from the data collected during benchmarking. It can also export the collected metrics to Prometheus, where they can be consumed by other monitoring or visualization solutions. k6 supports real-time metrics streaming to services like Prometheus, New Relic, and others.
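For end-of-run reports, k6 provides a built-in `handleSummary` hook; the file name and output shape below are illustrative.

```javascript
// Called once at the end of the run with all collected metrics.
export function handleSummary(data) {
  return {
    // Machine-readable report for post-processing (e.g. by a Python layer):
    'benchmark-summary.json': JSON.stringify(data, null, 2),
    // Short console note printed when the run finishes:
    stdout: `Run complete: ${data.metrics.http_reqs.values.count} requests\n`,
  };
}
```

For real-time export, recent k6 releases ship a Prometheus remote-write output that is enabled per run (e.g. `k6 run -o experimental-prometheus-rw script.js`), so no extra scripting is needed on that path.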

Key Benefits

Key advantages of building on top of k6:

  • Existing mature OSS ecosystem
  • Support for custom load-generation patterns
  • Support for HTTP and gRPC request processing
  • Built-in Kubernetes-based distributed testing via the associated k8s operator
  • Real-time metrics collection and export to a variety of data stores
  • Many built-in memory optimizations, such as the ability to discard response bodies (see the sketch after this list)
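Illustrating the last point, `discardResponseBodies` drops bodies globally to keep memory flat under high throughput, with a per-request opt-in where the body must be parsed; the endpoints and `MODEL_SERVER_URL` are placeholders.

```javascript
import http from 'k6/http';

// Discard response bodies by default.
export const options = { discardResponseBodies: true };

export default function () {
  // Body discarded: fine for requests we only time.
  http.get(`${__ENV.MODEL_SERVER_URL}/health`);

  // Opt back in per request when the body is actually needed.
  const res = http.post(
    `${__ENV.MODEL_SERVER_URL}/v1/completions`,
    JSON.stringify({ prompt: 'Hello', max_tokens: 8 }),
    { responseType: 'text', headers: { 'Content-Type': 'application/json' } }
  );
}
```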
@SachinVarghese (Author)

An example from the industry: Hugging Face TGI uses k6 for its benchmarking results.

@achandrasekar (Contributor)

Like the idea of using a well-tested loadgen. But we need to make sure that the core benchmarking library is Python-based and can be used as such if needed. I'm not sure if we can instrument a k6 loadgen via Python, but I would be interested in learning more and discussing the options we have.

@SachinVarghese (Author)

Yes, with this proposal the benchmarking library can be Python-based. There are many reasons to prefer Python for this project (data manipulation, tokenization, reporting, etc.), and k6 would merely bring an underlying set of utilities aimed squarely at load design and request processing. Such a model would let us leverage the best of both worlds.

In many load-generation cases, a single node cannot produce or sustain production-grade loads, especially long-context LLM loads, and distributed testing then becomes a necessity. The initial project proposal also identified distributed testing on Kubernetes as a key differentiating factor, and many existing LLM perf tools fall short in exactly this area. A huge benefit of using k6 here is the distributed testing we get out of the box with minimal lift (sketched below). There are also extensions for scripting in Python if needed, but the key is to leverage the right set of tools.
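For a sense of how that works, k6 partitions one logical test across instances via execution segments; the k6-operator assigns these per pod automatically, and they are shown explicitly here, as an illustration, for one of four nodes.

```javascript
export const options = {
  executionSegment: '0:1/4',                   // this instance: first quarter of the load
  executionSegmentSequence: '0,1/4,2/4,3/4,1', // full partition across four instances
};
```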
