OpenPO simplifies building synthetic datasets with AI feedback and state-of-the-art evaluation methods.
| Resources | Notebooks |
|---|---|
| Building dataset with OpenPO and PairRM | Notebook |
| Using OpenPO with Prometheus 2 | Notebook |
| Evaluating with LLM Judge | Notebook |
| Building dataset using vLLM | Notebook |
- Multiple LLM Support: Collect a diverse set of outputs from 200+ LLMs
- High Performance Inference: Native vLLM support for optimized inference
- Scalable Processing: Built-in batch processing for efficient large-scale data generation
- Research-Backed Evaluation Methods: Support for state-of-the-art evaluation methods for data synthesis
- Flexible Storage: Out-of-the-box storage providers for HuggingFace and S3
OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:
```bash
pip install openpo

# to use vLLM (quotes keep shells such as zsh from expanding the brackets)
pip install "openpo[vllm]"

# to run evaluation models
pip install "openpo[eval]"
```
To install from source, clone the repository first, then run the following commands:
```bash
cd openpo
poetry install
```
Set your environment variables first:
```bash
# for completions
export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>

# for evaluations
export OPENAI_API_KEY=<your-openai-api-key>
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
```
To get started collecting LLM responses, pass in a list of model names of your choice.

Note: OpenPO requires the provider name to be prepended to the model identifier, for example `huggingface/mistralai/Mistral-7B-Instruct-v0.3`.
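```python
from openpo import OpenPO

# Assumes the API keys above are exported in your environment
client = OpenPO()

# Placeholder prompts; replace with your own
PROMPT = "You are a helpful coding assistant."
MESSAGE = "Write a Python function that checks whether a string is a palindrome."

response = client.completion.generate(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)
```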
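Each model name above starts with a provider prefix, as the note requires. As a minimal sketch of how that convention can be read, assume everything before the first `/` is the provider and the remainder is the provider's own model identifier. `split_model_id` is a hypothetical helper for illustration, not part of OpenPO.

```python
from typing import Tuple


def split_model_id(model: str) -> Tuple[str, str]:
    """Split 'provider/model-id' on the first slash (hypothetical helper, not OpenPO API)."""
    provider, sep, model_id = model.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider/model-id', got {model!r}")
    return provider, model_id


# The model identifier itself may contain slashes, so only the first one is split
for name in [
    "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
    "openrouter/qwen/qwen-2.5-coder-32b-instruct",
]:
    print(split_model_id(name))
```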
You can also call models with OpenRouter.
```python
# Make a request through OpenRouter (reuses PROMPT and MESSAGE from the previous example)
client = OpenPO()
response = client.completion.generate(
    models=[
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)
```
OpenPO takes model parameters as a dictionary through `params`. See the documentation for more details.
```python
response = client.completion.generate(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)
```
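The pairwise evaluators described below compare two responses at a time, so responses collected from several models can be grouped into pairs first. A minimal sketch, assuming you have already pulled one text response per model out of `response` (its structure is not shown here); `outputs` and `candidate_pairs` are illustrative names, not OpenPO API.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def candidate_pairs(outputs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of responses: n models give n * (n - 1) / 2 pairs."""
    return list(combinations(outputs.values(), 2))


# Hypothetical texts, one response per model for the same prompt
outputs = {
    "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct": "response from Qwen",
    "huggingface/mistralai/Mistral-7B-Instruct-v0.3": "response from Mistral",
    "huggingface/microsoft/Phi-3.5-mini-instruct": "response from Phi",
}
pairs = candidate_pairs(outputs)
print(len(pairs))  # 3
```

With three models this yields three pairs; the count grows quadratically, which matters when many models are queried.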
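OpenPO offers various ways to synthesize your dataset.

To evaluate your response data with a single judge, use `evaluate.eval`:
```python
from openpo import OpenPO

openpo = OpenPO()

# questions: your prompts; responses: the collected responses for each prompt
res = openpo.evaluate.eval(
    models=["openai/gpt-4o"],
    questions=questions,
    responses=responses,
)
```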
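Judge verdicts are eventually turned into preference records with the same keys as the `preference` dictionary in the storage example further down. The sketch below assumes a hypothetical verdict format (index 0 or 1 of the better response in each pair); check what `evaluate.eval` actually returns before adapting it.

```python
from typing import Dict, List, Sequence


def to_preference_records(
    questions: Sequence[str],
    responses: Sequence[Sequence[str]],
    preferred_idx: Sequence[int],
) -> List[Dict[str, str]]:
    """Turn pairwise verdicts into {"prompt", "preferred", "rejected"} records.

    Assumes responses[i] is a pair of answers to questions[i] and preferred_idx[i]
    is 0 or 1 (a hypothetical verdict format, not OpenPO's actual output).
    """
    if not len(questions) == len(responses) == len(preferred_idx):
        raise ValueError("questions, responses and verdicts must have equal length")
    records = []
    for question, (first, second), idx in zip(questions, responses, preferred_idx):
        if idx not in (0, 1):
            raise ValueError(f"verdict must be 0 or 1, got {idx!r}")
        preferred, rejected = (first, second) if idx == 0 else (second, first)
        records.append({"prompt": question, "preferred": preferred, "rejected": rejected})
    return records


records = to_preference_records(["What is 2 + 2?"], [["4", "5"]], [0])
print(records)  # [{'prompt': 'What is 2 + 2?', 'preferred': '4', 'rejected': '5'}]
```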
To use multiple judges, pass multiple judge models:
```python
res_a, res_b = openpo.evaluate.eval(
    models=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-latest"],
    questions=questions,
    responses=responses,
)

# Get consensus from the multi-judge results
result = openpo.evaluate.get_consensus(
    eval_A=res_a,
    eval_B=res_b,
)
```
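One common way to combine two judges is to keep only the examples on which they agree and discard the rest as ambiguous. The sketch below implements that rule on hypothetical per-example verdicts (0 or 1); it illustrates the idea and is not necessarily how `get_consensus` works internally.

```python
from typing import List, Sequence, Tuple


def consensus(verdicts_a: Sequence[int], verdicts_b: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Keep only examples where both judges chose the same response.

    Returns (indices of agreed examples, the agreed verdicts).
    """
    if len(verdicts_a) != len(verdicts_b):
        raise ValueError("both judges must rate the same examples")
    agreed = [i for i, (a, b) in enumerate(zip(verdicts_a, verdicts_b)) if a == b]
    return agreed, [verdicts_a[i] for i in agreed]


idx, labels = consensus([0, 1, 1, 0], [0, 0, 1, 1])
print(idx, labels)   # [0, 2] [0, 1]
print(len(idx) / 4)  # agreement rate: 0.5
```

The agreement rate is also a quick sanity check: a very low value suggests the judges, or the rubric, disagree too often for the kept labels to be trustworthy.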
OpenPO supports batch processing for evaluating large datasets cost-effectively.

Note: Batch processing is an asynchronous operation and can take up to 24 hours (it usually completes much faster).
```python
# Single-judge batch evaluation
info = openpo.batch.eval(
    models=["openai/gpt-4o"],
    questions=questions,
    responses=responses,
)

# Check the batch status
status = openpo.batch.check_status(batch_id=info.id)
```
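Since batch jobs run asynchronously, you will usually poll until they finish before reading results. A minimal polling sketch, assuming you supply a function that returns the batch's status as a string (for example by wrapping `openpo.batch.check_status`, whose return type is not shown here). The status names are assumptions borrowed from the OpenAI (`completed`, `failed`, `expired`, `cancelled`) and Anthropic (`ended`) batch APIs; adjust them to what your wrapper returns.

```python
import time
from typing import Callable, Iterable


def wait_for_batch(
    get_status: Callable[[str], str],
    batch_id: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 3600,
    done: Iterable[str] = ("completed", "ended"),
    failed: Iterable[str] = ("failed", "expired", "cancelled"),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(batch_id) until the batch finishes, fails, or times out."""
    done, failed = set(done), set(failed)
    waited = 0.0
    while True:
        status = get_status(batch_id)
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"batch {batch_id} ended with status {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch {batch_id} still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with a fake status source (hypothetical); in practice pass a wrapper
# around openpo.batch.check_status that extracts the status string
statuses = iter(["in_progress", "in_progress", "completed"])
result = wait_for_batch(lambda _id: next(statuses), "batch_123", sleep=lambda s: None)
print(result)  # completed
```

Injecting `sleep` keeps the loop testable without real waiting; the default 24-hour timeout mirrors the upper bound in the note above.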
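For multiple judges with batch processing:
```python
batch_a, batch_b = openpo.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-latest"],
    questions=questions,
    responses=responses,
)

# Once both batches have completed, retrieve their results
# (batch_a_result, batch_b_result) and compute the consensus
result = openpo.batch.get_consensus(
    batch_A=batch_a_result,
    batch_B=batch_b_result,
)
```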
You can also use pre-trained open-source evaluation models (install with `openpo[eval]`). OpenPO currently supports two models: PairRM and Prometheus2.

Note: Running inference with pre-trained models requires appropriate hardware, including a GPU with sufficient memory.
To use PairRM to rank responses:
```python
from openpo import PairRM

pairrm = PairRM()

# prompts: your prompts; responses: the candidate responses for each prompt
res = pairrm.eval(prompts, responses)
```
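PairRM is a pairwise reward model: it judges one response against another for the same prompt. To turn pairwise judgments into a ranking, one simple option is to count how many head-to-head comparisons each candidate wins. The sketch below assumes a hypothetical matrix `prefs[i][j]` holding the probability that candidate i beats candidate j; it is an illustration, not how OpenPO or PairRM aggregate internally.

```python
from typing import List, Sequence


def rank_by_wins(prefs: Sequence[Sequence[float]]) -> List[int]:
    """Rank candidates best-first by number of pairwise wins (Copeland-style count).

    prefs[i][j] is the probability that candidate i beats candidate j
    (hypothetical input); diagonal entries are ignored.
    """
    n = len(prefs)
    wins = [sum(1 for j in range(n) if j != i and prefs[i][j] > 0.5) for i in range(n)]
    # sorted() is stable, so tied candidates keep their original order
    return sorted(range(n), key=lambda i: -wins[i])


prefs = [
    [0.5, 0.8, 0.6],
    [0.2, 0.5, 0.3],
    [0.4, 0.7, 0.5],
]
print(rank_by_wins(prefs))  # [0, 2, 1]
```

The first and last indices of the ranking are natural `preferred` and `rejected` candidates for a preference record.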
To use Prometheus2:
```python
from openpo import Prometheus2

pm = Prometheus2(model="prometheus-eval/prometheus-7b-v2.0")

# Compare two responses per instruction using the 'reasoning' rubric
feedback = pm.eval_relative(
    instructions=instructions,
    responses_A=response_A,
    responses_B=response_B,
    rubric="reasoning",
)
```
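Prometheus 2's relative grading format asks the model to write feedback and then end with `[RESULT] A` or `[RESULT] B`. If you work with the raw generated text rather than the `feedback` OpenPO returns (whose structure is not shown here), a minimal parser, assuming that output format, could look like this. `parse_relative_verdict` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed format: "<feedback> [RESULT] A" or "<feedback> [RESULT] B", optionally "(A)"
RESULT_PATTERN = re.compile(r"\[RESULT\]\s*\(?\s*([AB])\b")


def parse_relative_verdict(output: str) -> Optional[str]:
    """Return "A" or "B" from the last [RESULT] marker, or None if there is none."""
    matches = RESULT_PATTERN.findall(output)
    return matches[-1] if matches else None


print(parse_relative_verdict("Response A cites the theorem correctly. [RESULT] A"))  # A
print(parse_relative_verdict("no verdict here"))  # None
```

Returning `None` instead of guessing lets you drop malformed generations rather than silently mislabel them.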
Use the out-of-the-box storage classes to upload and download data.
```python
from openpo.storage import HuggingFaceStorage

hf_storage = HuggingFaceStorage()

# Push data to a repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(repo_id="my-hf-repo", data=preference)

# Load data from the repo
data = hf_storage.load_from_repo(path="my-hf-repo")
```
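The `preference` dictionary above holds a single example. When you have many, it can help to validate the records and convert them into a column-oriented dict (one list per field), which is the layout Hugging Face `datasets.Dataset.from_dict` accepts. Whether `push_to_repo` takes this form directly is an assumption to check against the OpenPO documentation; `to_columns` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

REQUIRED_KEYS = ("prompt", "preferred", "rejected")


def to_columns(records: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Validate preference records and convert them to one list per field."""
    columns: Dict[str, List[str]] = {key: [] for key in REQUIRED_KEYS}
    for i, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")
        # A pair where both sides are identical carries no preference signal
        if record["preferred"] == record["rejected"]:
            raise ValueError(f"record {i} has identical preferred and rejected responses")
        for key in REQUIRED_KEYS:
            columns[key].append(record[key])
    return columns


records = [
    {"prompt": "text", "preferred": "response1", "rejected": "response2"},
    {"prompt": "text2", "preferred": "response3", "rejected": "response4"},
]
print(to_columns(records))
```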
Contributions are what make open source so special! Here's how you can help:
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/openpo.git
  cd openpo
  ```
- Install Poetry (dependency management tool):
  ```bash
  curl -sSL https://install.python-poetry.org | python3 -
  ```
- Install dependencies:
  ```bash
  poetry install
  ```
- Create a new branch for your feature:
  ```bash
  git checkout -b feature-name
  ```
- Submit a Pull Request:
  - Write a clear description of your changes
  - Reference any related issues