The Inference-perf proposal doc describes many vital components for its functioning. This document recommends building some of that capability on top of k6, an existing, mature load-generation tool. Given the current requirements and constraints, a k6-based wrapper design can be hugely beneficial for quickly building and providing the following capabilities from the initial proposal.
Load Generator
Load Generator is the component that generates different traffic patterns based on user input. k6 can generate fixed or custom load patterns for a defined duration, as the requirement demands.
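As a sketch of how a Python wrapper could drive this, the helper below emits a k6 scenario definition from a user-supplied list of (target RPS, duration) stages. The field names follow k6's real ramping-arrival-rate executor options; the helper name and defaults are illustrative assumptions, not part of the proposal.

```python
import json

def ramping_scenario(stages, time_unit="1s", pre_allocated_vus=50):
    """Build a k6 'ramping-arrival-rate' scenario from (target, duration) pairs.

    Each `target` is the arrival rate per `time_unit`; k6 ramps linearly
    between stages. `pre_allocated_vus` is how many VUs k6 keeps ready
    to sustain the requested rate.
    """
    return {
        "executor": "ramping-arrival-rate",
        "timeUnit": time_unit,
        "preAllocatedVUs": pre_allocated_vus,
        "stages": [{"target": t, "duration": d} for t, d in stages],
    }

# Ramp 0 -> 100 RPS over 2 minutes, hold for 5 minutes, then ramp down.
scenario = ramping_scenario([(100, "2m"), (100, "5m"), (0, "1m")])
print(json.dumps({"scenarios": {"llm_load": scenario}}, indent=2))
```

The emitted JSON can be injected into the `options` object of a generated k6 script, keeping load-shape definitions on the Python side.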
Request Processor
Request Processor provides a way to support different model servers and their corresponding request payloads with different configurable parameters. k6 supports HTTP- and gRPC-based requests for both direct and distributed testing.
Response Processor / Data Collector
The Response Processor / Data Collector component allows us to process responses and measure the actual performance of the model server in terms of request latency, TPOT (time per output token), TTFT (time to first token), and throughput. k6 scripting can be leveraged for advanced data/metrics computation.
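To make the metric definitions concrete, here is a minimal Python sketch of how TTFT, TPOT, and per-request throughput could be derived from token arrival timestamps collected for a streaming response. The function name and timestamp representation are assumptions for illustration.

```python
def streaming_metrics(request_start, token_times):
    """Derive latency metrics from a streaming LLM response.

    request_start: wall-clock time (seconds) the request was sent.
    token_times:   arrival time (seconds) of each output token, in order.
    """
    ttft = token_times[0] - request_start          # time to first token
    total = token_times[-1] - request_start        # total request latency
    # TPOT: mean gap between successive output tokens after the first.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    throughput = len(token_times) / total          # output tokens per second
    return {"ttft": ttft, "tpot": tpot, "throughput_tok_per_s": throughput}

# 4 tokens: first arrives at 0.25 s, then one every 0.05 s.
m = streaming_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
```

In the proposed design, the raw timestamps would be captured in the k6 script and the aggregation done on the Python side.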
Report Generator / Metrics Exporter
The Report Generator / Metrics Exporter generates a report based on the data collected during benchmarking. It can also export the collected metrics to Prometheus, where they can be consumed by other monitoring or visualization solutions. k6 supports real-time metrics streaming to services such as Prometheus and New Relic.
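As a sketch, streaming k6 metrics to Prometheus uses k6's remote-write output (`-o experimental-prometheus-rw`, available in recent k6 versions) configured via the `K6_PROMETHEUS_RW_SERVER_URL` environment variable; the helper below assembles that invocation from Python. The function name and the example URL are illustrative.

```python
import os

def k6_prometheus_cmd(script, remote_write_url):
    """Assemble argv and env to stream k6 metrics to a Prometheus
    remote-write endpoint while the test runs."""
    env = dict(os.environ, K6_PROMETHEUS_RW_SERVER_URL=remote_write_url)
    argv = ["k6", "run", "-o", "experimental-prometheus-rw", script]
    return argv, env

# Hypothetical endpoint; the wrapper would pass these to subprocess.run.
argv, env = k6_prometheus_cmd("loadtest.js", "http://prometheus:9090/api/v1/write")
```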
Key Benefits
Key advantages of building on top of k6:
Existing mature OSS ecosystem
Support for custom load generation patterns
Support for HTTP and gRPC request processing
Built-in Kubernetes-based distributed testing with an associated k8s operator
Real-time metrics collection and export to a variety of data stores
Many built-in memory optimizations (like ability to discard response bodies)
Like the idea of using a well-tested loadgen. But we need to make sure that the core benchmarking library is python based and can be used as such if needed. I'm not sure if we can instrument k6 loadgen via python. But I would be interested in learning more and discussing the options we have.
Yes, with this proposal the benchmarking library can be Python-based. There are many reasons to prefer Python for this project (data manipulation, tokenization, reporting, etc.), and k6 can simply provide an underlying set of utilities for load design and request processing. Such a model would help us leverage the best of both worlds.
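To show that the Python side can instrument k6 without any special bindings, here is a minimal sketch of a wrapper that shells out to the k6 binary and reads back its end-of-test summary. The flags used (`--summary-export`, `--vus`, `--duration`) are standard k6 CLI options; the function names and file paths are illustrative, and running `run_k6` requires the k6 binary on PATH.

```python
import json
import subprocess

def k6_command(script_path, summary_path, vus=None, duration=None):
    """Build the k6 CLI invocation for a test script."""
    cmd = ["k6", "run", "--summary-export", summary_path]
    if vus is not None:
        cmd += ["--vus", str(vus)]
    if duration is not None:
        cmd += ["--duration", duration]
    cmd.append(script_path)
    return cmd

def run_k6(script_path, summary_path="summary.json", **opts):
    """Run k6 and load its JSON summary for Python-side reporting.
    Requires the k6 binary to be installed and on PATH."""
    subprocess.run(k6_command(script_path, summary_path, **opts), check=True)
    with open(summary_path) as f:
        return json.load(f)

cmd = k6_command("llm_test.js", "summary.json", vus=10, duration="5m")
```

The returned summary dict can then feed the Python-based report generator directly.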
In many load-generation cases, a single node cannot generate or sustain production-grade loads, especially long-context loads with LLMs, and in such cases distributed testing becomes a necessity. The initial project proposal also identified distributed testing on Kubernetes as a key differentiating factor, and many existing LLM perf tools fall short in this specific area. A huge benefit of using k6 here is the distributed testing we get out of the box with minimal lift. There are also extensions for scripting k6 in Python if needed. But the key is to leverage the right set of tools.