OpenPO 🐼


OpenPO simplifies building synthetic datasets with AI feedback and state-of-the-art evaluation methods.

Resources

  • Building a dataset with OpenPO and PairRM: 📔 Notebook
  • Using OpenPO with Prometheus 2: 📔 Notebook
  • Evaluating with an LLM Judge: 📔 Notebook
  • Building a dataset using vLLM: 📔 Notebook

Key Features

  • 🤖 Multiple LLM Support: Collect a diverse set of outputs from 200+ LLMs

  • ⚡ High-Performance Inference: Native vLLM support for optimized inference

  • 🚀 Scalable Processing: Built-in batch processing capabilities for efficient large-scale data generation

  • 📊 Research-Backed Evaluation Methods: Support for state-of-the-art evaluation methods for data synthesis

  • 💾 Flexible Storage: Out-of-the-box storage providers for Hugging Face and S3

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following commands in a terminal to install OpenPO:

pip install openpo

# to use vllm
pip install openpo[vllm]

# for running evaluation models
pip install openpo[eval]

Install from source

Clone the repository, then run the following commands:

cd openpo
poetry install

Getting Started

Set your environment variables first:

# for completions
export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>

# for evaluations
export OPENAI_API_KEY=<your-openai-api-key>
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
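
If you prefer to set keys from Python (for example, inside a notebook) rather than exporting them in the shell, you can assign them to os.environ before creating any clients. A minimal sketch using only the standard library:

import os

# set the same keys programmatically before constructing OpenPO clients
os.environ["HF_API_KEY"] = "<your-api-key>"
os.environ["OPENROUTER_API_KEY"] = "<your-api-key>"
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
os.environ["ANTHROPIC_API_KEY"] = "<your-anthropic-api-key>"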

Completion

To get started with collecting LLM responses, simply pass in a list of model names of your choice.

Note

OpenPO requires the provider name to be prepended to the model identifier.

import os
from openpo import OpenPO

client = OpenPO()

response = client.completion.generate(
    models = [
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

You can also call models with OpenRouter.

# make request to OpenRouter
client = OpenPO()

response = client.completion.generate(
    models = [
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

OpenPO accepts model parameters as a dictionary. Take a look at the documentation for more detail.

response = client.completion.generate(
    models = [
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    }
)

Evaluation

OpenPO offers various ways to synthesize your dataset.

LLM-as-a-Judge

To use a single judge to evaluate your response data, use evaluate.eval:

client = OpenPO()

res = client.evaluate.eval(
    models=["openai/gpt-4o"],
    questions=questions,
    responses=responses,
)
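
Here, questions is a list of prompts and responses holds the candidate outputs to be judged for each prompt. A hypothetical toy example of what these inputs might look like (the exact schema expected by evaluate.eval is an assumption; check the documentation):

# hypothetical example inputs (schema assumed, not confirmed by the docs)
questions = [
    "What is the capital of France?",
    "Explain what binary search does.",
]

# one list of candidate responses per question, e.g. collected via completion.generate
responses = [
    [
        "Paris is the capital of France.",
        "The capital of France is Paris, on the Seine.",
    ],
    [
        "Binary search repeatedly halves a sorted list to locate a target value.",
        "It checks every element one by one until it finds the target.",
    ],
]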

To use multiple judges, pass in multiple judge models:

res_a, res_b = client.evaluate.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# get consensus from the multi-judge responses
result = client.evaluate.get_consensus(
    eval_A=res_a,
    eval_B=res_b,
)

OpenPO supports batch processing for evaluating large datasets in a cost-effective way.

Note

Batch processing is an asynchronous operation and could take up to 24 hours (usually completes much faster).

info = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# check status
status = client.batch.check_status(batch_id=info.id)
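
Since the job runs asynchronously, one option is to poll check_status until it finishes. A minimal sketch, assuming the returned status object exposes a status string similar to the underlying provider batch APIs (the field name and values are assumptions):

import time

# poll until the batch job reaches a terminal state (field name/values assumed)
while True:
    status = client.batch.check_status(batch_id=info.id)
    if status.status in ("completed", "failed", "expired"):
        break
    time.sleep(60)  # batch jobs can take up to 24 hours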

For multi-judge with batch processing:

batch_a, batch_b = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# get consensus once the batch jobs have completed
result = client.batch.get_consensus(
    batch_A=batch_a,
    batch_B=batch_b,
)

Pre-trained Models

You can use pre-trained open-source evaluation models. OpenPO currently supports two of them: PairRM and Prometheus 2.

Note

Appropriate hardware with a GPU and sufficient memory is required to run inference with pre-trained models.
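
Before loading a model, it can help to confirm that a GPU is actually visible; a minimal sketch, assuming torch is available in your environment:

import torch

# warn early if no CUDA device is visible; CPU inference will be very slow
if not torch.cuda.is_available():
    print("No GPU detected: pre-trained evaluation models may be impractically slow on CPU.")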

To use PairRM to rank responses:

from openpo import PairRM

pairrm = PairRM()
res = pairrm.eval(prompts, responses)

To use Prometheus2:

from openpo import Prometheus2

pm = Prometheus2(model="prometheus-eval/prometheus-7b-v2.0")

feedback = pm.eval_relative(
    instructions=instructions,
    responses_A=response_A,
    responses_B=response_B,
    rubric='reasoning',
)

Storing Data

Use the out-of-the-box storage classes to easily upload and download data.

from openpo.storage import HuggingFaceStorage

hf_storage = HuggingFaceStorage()

# push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(repo_id="my-hf-repo", data=preference)

# Load data from repo
data = hf_storage.load_from_repo(path="my-hf-repo")
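
The feature list also mentions S3 as a storage provider. A hypothetical sketch of what an S3-backed workflow might look like, assuming an S3Storage class with push/load methods analogous to HuggingFaceStorage (the class name, import path, and method signatures are assumptions; check the documentation for the actual interface):

from openpo.storage import S3Storage  # assumed import path

s3_storage = S3Storage()  # hypothetical; assumes AWS credentials come from the environment

# hypothetical calls mirroring push_to_repo / load_from_repo
s3_storage.push_to_s3(data=preference, bucket="my-bucket", key="preference.json")
data = s3_storage.load_from_s3(bucket="my-bucket", key="preference.json")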

Contributing

Contributions are what make open source amazing! Here's how you can help:

Development Setup

  1. Clone the repository

git clone https://github.com/yourusername/openpo.git
cd openpo

  2. Install Poetry (dependency management tool)

curl -sSL https://install.python-poetry.org | python3 -

  3. Install dependencies

poetry install

Development Workflow

  1. Create a new branch for your feature

git checkout -b feature-name

  2. Submit a Pull Request

  • Write a clear description of your changes
  • Reference any related issues