Build high quality synthetic datasets with AI feedback from 200+ LLMs

OpenPO 🐼

OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.

Resources

  • Building dataset with OpenPO and PairRM (📔 Notebook)

What is OpenPO?

OpenPO is an open source library that simplifies the process of building synthetic datasets for LLM preference tuning. By collecting outputs from 200+ LLMs and synthesizing them using research-proven methodologies, OpenPO helps developers build better fine-tuned language models with minimal effort.

Key Features

  • 🤖 Multiple LLM Support: Collect a diverse set of outputs from 200+ LLMs

  • 📊 Research-Backed Evaluation Methods: Support for state-of-the-art evaluation methods for data synthesis

  • 💾 Flexible Storage: Out-of-the-box storage providers for HuggingFace and S3

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

pip install openpo

Install from source

Clone the repository first, then run the following commands:

cd openpo
poetry install

Getting Started

Set your environment variables first:

# for completions
export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>

# for evaluations
export OPENAI_API_KEY=<your-openai-api-key>
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
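
Before creating a client, it can help to confirm that the keys you need are actually set. The following is a minimal sketch using only the standard library. `missing_keys` is a hypothetical helper and not part of OpenPO; the variable names are the ones exported above.

```python
import os

# Environment variable names taken from the export lines above.
COMPLETION_KEYS = ["HF_API_KEY", "OPENROUTER_API_KEY"]
EVALUATION_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    groups = [("completions", COMPLETION_KEYS), ("evaluations", EVALUATION_KEYS)]
    for label, names in groups:
        missing = missing_keys(names)
        if missing:
            print(f"{label}: missing {', '.join(missing)}")
        else:
            print(f"{label}: all keys set")
```

Passing a plain dictionary as `env` makes the check easy to try without touching your real environment.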

Completion

To get started with collecting LLM responses, pass in a list of model names of your choice:

[!NOTE] OpenPO requires the provider name to be prepended to the model identifier.

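from openpo import OpenPO

# Placeholder inputs; replace with your own system prompt and user message.
PROMPT = "You are a helpful assistant."
MESSAGE = "Explain what preference tuning is."

client = OpenPO()

# Send the same messages to every model in the list.
response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)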
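
The provider prefix convention from the note above can be illustrated with plain string handling. This sketch only shows how an identifier of the form provider/model breaks down. `split_model_id` is a hypothetical helper written for this example; how OpenPO parses identifiers internally is not shown here.

```python
def split_model_id(identifier):
    """Split 'provider/model' at the first slash.

    The model part may itself contain slashes, as Hugging Face IDs do.
    """
    provider, sep, model = identifier.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {identifier!r}")
    return provider, model


print(split_model_id("huggingface/Qwen/Qwen2.5-Coder-32B-Instruct"))
# → ('huggingface', 'Qwen/Qwen2.5-Coder-32B-Instruct')
```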

You can also call models with OpenRouter.

# Make a request through OpenRouter
client = OpenPO()

response = client.completions(
    models=[
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

OpenPO accepts standard model parameters as a dictionary via the params argument. See the documentation for more detail.

response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    # Generation settings applied to every model in the list
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)

Evaluation

OpenPO offers several ways to synthesize your dataset. To run evaluation, install the extra dependencies:

pip install "openpo[eval]"

LLM-as-a-Judge

To use a single judge to evaluate your response data, use eval_single:

client = OpenPO()

res = client.eval_single(
    model='openai/gpt-4o',
    data=responses,
)

To use multiple judges, use eval_multi:

res = client.eval_multi(
    models=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-latest"],
    data=responses,
)
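
One common way to combine several judges is to keep only the items on which they all agree. The sketch below is plain Python for illustration. The verdict format (one preferred-response index per item, per judge) is an assumption and is not the documented return type of eval_multi; `consensus` is a hypothetical helper.

```python
def consensus(verdicts_by_judge):
    """Keep items where every judge picked the same response.

    verdicts_by_judge: one list per judge, each holding the index of the
    preferred response for every item (assumed format, for illustration).
    Returns a dict mapping item position to the agreed index.
    """
    if not verdicts_by_judge:
        return {}
    lengths = {len(v) for v in verdicts_by_judge}
    if len(lengths) != 1:
        raise ValueError("every judge must score the same number of items")
    agreed = {}
    for i, picks in enumerate(zip(*verdicts_by_judge)):
        if len(set(picks)) == 1:  # all judges chose the same index
            agreed[i] = picks[0]
    return agreed


judge_a = [0, 1, 1, 0]
judge_b = [0, 1, 0, 0]
print(consensus([judge_a, judge_b]))
# → {0: 0, 1: 1, 3: 0}
```

Dropping the disagreements trades dataset size for label quality, which is often a reasonable choice for preference data.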

Pre-trained Models

You can use pre-trained open source evaluation models. OpenPO currently supports two types of models: PairRM and Prometheus2.

[!NOTE] Running inference with pre-trained models requires hardware with a GPU and sufficient memory.

To use PairRM to rank responses:

from openpo import PairRM

pairrm = PairRM()
res = pairrm.eval(prompts, responses)
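
Ranked responses map naturally onto the prompt/preferred/rejected record format shown under Storing Data. Here is a minimal sketch, assuming each prompt comes with a list of candidate responses and a parallel list of ranks where 1 is best. That rank format is an assumption, so check what pairrm.eval actually returns before relying on it. `to_preference` is a hypothetical helper.

```python
def to_preference(prompt, responses, ranks):
    """Build one preference record from the best- and worst-ranked responses."""
    if len(responses) != len(ranks) or len(responses) < 2:
        raise ValueError("need at least two responses and one rank per response")
    # Lowest rank number is treated as best, highest as worst (assumed convention).
    best = min(range(len(ranks)), key=lambda i: ranks[i])
    worst = max(range(len(ranks)), key=lambda i: ranks[i])
    return {
        "prompt": prompt,
        "preferred": responses[best],
        "rejected": responses[worst],
    }


record = to_preference(
    "Explain recursion.",
    ["answer A", "answer B", "answer C"],
    [2, 1, 3],
)
print(record)
```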

To use Prometheus2:

from openpo import Prometheus2
from openpo.resources.provider.vllm import VLLM

model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")
pm = Prometheus2(model=model)

feedback = pm.eval_relative(
    instructions=instructions,
    responses_A=response_A,
    responses_B=response_B,
    rubric='reasoning',
)

Storing Data

Use the out-of-the-box storage classes to upload and download data.

from openpo.storage import HuggingFaceStorage
hf_storage = HuggingFaceStorage(repo_id="my-dataset-repo")

# Push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(data=preference)

# Load data from repo
data = hf_storage.load_from_repo()
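
Before pushing, it may be worth checking that every record carries the three fields used in the example above. This is a sketch under that assumption; the source does not say whether OpenPO enforces a schema. `invalid_records` is a hypothetical helper.

```python
# Field names taken from the example preference record above.
REQUIRED_FIELDS = ("prompt", "preferred", "rejected")


def invalid_records(records):
    """Return (index, problem) pairs for fields that are missing, empty, or not strings."""
    problems = []
    for i, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value:
                problems.append((i, f"missing or empty '{field}'"))
    return problems


records = [
    {"prompt": "text", "preferred": "response1", "rejected": "response2"},
    {"prompt": "text", "preferred": "response1"},
]
print(invalid_records(records))
```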

Contributing

Contributions are what make open source special. Here's how you can help:

Development Setup

  1. Clone the repository
git clone https://github.com/yourusername/openpo.git
cd openpo
  2. Install Poetry (dependency management tool)
curl -sSL https://install.python-poetry.org | python3 -
  3. Install dependencies
poetry install

Development Workflow

  1. Create a new branch for your feature
git checkout -b feature-name
  2. Submit a Pull Request
  • Write a clear description of your changes
  • Reference any related issues

