OpenPO 🐼

OpenPO simplifies building synthetic datasets with AI feedback and state-of-the-art evaluation methods.

Resources	Notebooks
Building dataset with OpenPO and PairRM	📔 Notebook
Using OpenPO with Prometheus 2	📔 Notebook
Evaluating with LLM Judge	📔 Notebook
Building dataset using vLLM	📔 Notebook

Key Features

🤖 Multiple LLM Support: Collect a diverse set of outputs from 200+ LLMs
⚡ High-Performance Inference: Native vLLM support for optimized inference
🚀 Scalable Processing: Built-in batch processing for efficient large-scale data generation
📊 Research-Backed Evaluation Methods: Support for state-of-the-art evaluation methods for data synthesis
💾 Flexible Storage: Out-of-the-box storage providers for HuggingFace and S3

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

pip install openpo

# to use vLLM (quotes keep shells such as zsh from expanding the brackets)
pip install "openpo[vllm]"

# for running evaluation models
pip install "openpo[eval]"
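
To confirm from Python that an optional extra is available, one option is to probe for its import name with the standard library. This is a minimal sketch and not part of OpenPO: `has_module` is a hypothetical helper, and it assumes the `vllm` extra exposes the import name `vllm`.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module with this import name can be found."""
    return find_spec(name) is not None


if __name__ == "__main__":
    # "vllm" is the import name assumed for the openpo[vllm] extra
    for module in ("openpo", "vllm"):
        state = "installed" if has_module(module) else "missing"
        print(f"{module}: {state}")
```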

Install from source

Clone the repository first, then run the following commands:

cd openpo
poetry install

Getting Started

Set your environment variables first:

# for completions
export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>

# for evaluations
export OPENAI_API_KEY=<your-openai-api-key>
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
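
A missing key tends to surface only when the first request fails, so it can help to check the environment up front. The sketch below uses only the standard library; `missing_keys` is a hypothetical helper, not an OpenPO function, and you likely need only the keys for the providers you actually call.

```python
import os

# Variable names taken from the export commands above
COMPLETION_KEYS = ("HF_API_KEY", "OPENROUTER_API_KEY")
EVALUATION_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(COMPLETION_KEYS + EVALUATION_KEYS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```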

Completion

To start collecting LLM responses, pass in a list of model names of your choice.

Note

OpenPO requires the provider name to be prepended to the model identifier.

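from openpo import OpenPO

# PROMPT and MESSAGE are placeholders for your own system prompt and user message
PROMPT = "You are a helpful assistant."
MESSAGE = "Explain what a preference dataset is."

client = OpenPO()

# each model name is prefixed with its provider, here "huggingface/"
response = client.completion.generate(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)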
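
The provider prefix convention in the note above can be illustrated with a small parser. This is a sketch of the naming scheme only, not how OpenPO parses model names internally; `split_model_id` is a hypothetical helper. It splits at the first slash so that model identifiers containing their own slash stay intact.

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split "<provider>/<model-id>" at the first slash only."""
    provider, sep, model_id = model.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected '<provider>/<model-id>', got {model!r}")
    return provider, model_id


if __name__ == "__main__":
    provider, model_id = split_model_id("huggingface/Qwen/Qwen2.5-Coder-32B-Instruct")
    print(provider)  # huggingface
    print(model_id)  # Qwen/Qwen2.5-Coder-32B-Instruct
```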

You can also call models with OpenRouter.

# make request to OpenRouter
client = OpenPO()

response = client.completion.generate(
    models=[
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

Model parameters are passed as a dictionary through params. See the documentation for more details.

response = client.completion.generate(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    # generation parameters applied to the request
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)
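
When several calls share most of their parameters, one way to keep them consistent is to hold the defaults in one place and layer overrides on top. This is a plain-Python sketch; `DEFAULT_PARAMS` and `build_params` are hypothetical names, and the two keys are simply the ones used in the example above.

```python
DEFAULT_PARAMS = {"max_tokens": 500, "temperature": 1.0}


def build_params(**overrides):
    """Return a new dict: the defaults first, then any overrides on top."""
    return {**DEFAULT_PARAMS, **overrides}


if __name__ == "__main__":
    # lower the temperature for one run without touching the defaults
    print(build_params(temperature=0.2))
```

The resulting dictionary could then be passed as the params argument.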

Evaluation

OpenPO offers various ways to synthesize your dataset.

LLM-as-a-Judge

To evaluate your response data with a single judge, use evaluate.eval:

client = OpenPO()

# questions and responses are your own lists of prompts and candidate answers
res = client.evaluate.eval(
    models=["openai/gpt-4o"],
    questions=questions,
    responses=responses,
)

To use multiple judges, pass multiple judge models:

# one result is returned per judge model
res_a, res_b = client.evaluate.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# get consensus for multi-judge responses
result = client.evaluate.get_consensus(
    eval_A=res_a,
    eval_B=res_b,
)
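
To make the idea of consensus concrete, here is a minimal sketch that keeps only the questions on which two judges pick the same response. It is not OpenPO's implementation of get_consensus, and the data shape is an assumption made for illustration: each judge's verdicts are a list in which item i is the index of the response that judge preferred for question i.

```python
def consensus(eval_a: list[int], eval_b: list[int]) -> dict[int, int]:
    """Map question index -> preferred response index where both judges agree."""
    if len(eval_a) != len(eval_b):
        raise ValueError("both judges must score the same questions")
    return {i: a for i, (a, b) in enumerate(zip(eval_a, eval_b)) if a == b}


if __name__ == "__main__":
    judge_a = [0, 1, 1, 0]  # hypothetical verdicts from the first judge
    judge_b = [0, 0, 1, 0]  # hypothetical verdicts from the second judge
    print(consensus(judge_a, judge_b))  # {0: 0, 2: 1, 3: 0}
```

Question 1 is dropped because the judges disagree, which is the usual reason for using more than one judge: disagreements are filtered out rather than settled arbitrarily.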

OpenPO supports batch processing for evaluating large datasets in a cost-effective way.

Note

Batch processing is an asynchronous operation and can take up to 24 hours, though it usually completes much faster.

# submit the batch job; the call returns before the evaluation has finished
info = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# check status
status = client.batch.check_status(batch_id=info.id)
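
Because a batch job is asynchronous, the status usually has to be checked repeatedly. The following is a generic polling sketch, not an OpenPO API: `wait_for_batch` is a hypothetical helper, and the status strings `"completed"` and `"failed"` are assumptions, since the value returned by check_status is not documented here. The caller supplies a function that returns the current status as a string.

```python
import time
from typing import Callable


def wait_for_batch(
    check: Callable[[], str],
    done_states: frozenset[str] = frozenset({"completed", "failed"}),
    interval_s: float = 60.0,
    max_checks: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check() until it returns a state in done_states or checks run out."""
    for attempt in range(max_checks):
        status = check()
        if status in done_states:
            return status
        # no need to sleep after the final check
        if attempt < max_checks - 1:
            sleep(interval_s)
    raise TimeoutError(f"batch not finished after {max_checks} checks")


if __name__ == "__main__":
    # simulated status sequence standing in for a real status call
    states = iter(["in_progress", "in_progress", "completed"])
    print(wait_for_batch(lambda: next(states), interval_s=0.0))
```

With a real client, `check` would wrap the status call and extract whatever field carries the job state.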

For multi-judge with batch processing:

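# one batch job is returned per judge model
batch_a, batch_b = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# batch_a_result and batch_b_result stand for the results of batch_a and batch_b,
# retrieved once both batches have completed
result = client.batch.get_consensus(
    batch_A=batch_a_result,
    batch_B=batch_b_result,
)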

Pre-trained Models

You can use pre-trained open source evaluation models. OpenPO currently supports two types of models: PairRM and Prometheus2.

Note

A GPU with sufficient memory is required to run inference with pre-trained models.

To use PairRM to rank responses:

from openpo import PairRM

pairrm = PairRM()
res = pairrm.eval(prompts, responses)
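
Once responses are ranked, a common next step is to turn each ranking into a preference record. The sketch below assumes, purely for illustration, that a ranking is a list of integers aligned with the responses where 1 is the best; this is not necessarily the format returned by pairrm.eval. `to_preference` is a hypothetical helper, and the record keys mirror the preference dictionary shown under Storing Data.

```python
def to_preference(prompt: str, responses: list[str], ranks: list[int]) -> dict[str, str]:
    """Pair the best-ranked response (lowest rank) with the worst-ranked one."""
    if len(responses) != len(ranks) or len(responses) < 2:
        raise ValueError("need at least two responses and one rank per response")
    best = min(range(len(ranks)), key=ranks.__getitem__)
    worst = max(range(len(ranks)), key=ranks.__getitem__)
    return {
        "prompt": prompt,
        "preferred": responses[best],
        "rejected": responses[worst],
    }


if __name__ == "__main__":
    record = to_preference("q", ["a", "b", "c"], [2, 1, 3])
    print(record)  # {'prompt': 'q', 'preferred': 'b', 'rejected': 'c'}
```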

To use Prometheus2:

from openpo import Prometheus2

pm = Prometheus2(model="prometheus-eval/prometheus-7b-v2.0")

feedback = pm.eval_relative(
    instructions=instructions,
    responses_A=response_A,
    responses_B=response_B,
    rubric='reasoning',
)

Storing Data

Use the out-of-the-box storage classes to upload and download data.

from openpo.storage import HuggingFaceStorage

hf_storage = HuggingFaceStorage()

# push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(repo_id="my-hf-repo", data=preference)

# load data from repo
data = hf_storage.load_from_repo(path="my-hf-repo")

Contributing

Contributions are what make open source amazingly special! Here's how you can help:

Development Setup

Clone the repository

git clone https://github.com/yourusername/openpo.git
cd openpo

Install Poetry (dependency management tool)

curl -sSL https://install.python-poetry.org | python3 -

Install dependencies

poetry install

Development Workflow

Create a new branch for your feature

git checkout -b feature-name

Submit a Pull Request

Write a clear description of your changes
Reference any related issues
