Build high-quality synthetic datasets with AI feedback from 200+ LLMs

OpenPO 🐼

OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.

Resources	Notebooks
Building dataset with OpenPO and PairRM	📔 Notebook
Using OpenPO with Prometheus 2	📔 Notebook
Evaluating with LLM-as-a-Judge	📔 Notebook

What is OpenPO?

OpenPO is an open source library that simplifies the process of building synthetic datasets for LLM preference tuning. By collecting outputs from 200+ LLMs and synthesizing them using research-proven methodologies, OpenPO helps developers build better fine-tuned language models with minimal effort.

Key Features

🤖 Multiple LLM Support: Collect a diverse set of outputs from 200+ LLMs.
📊 Research-Backed Evaluation Methods: Support for state-of-the-art evaluation methods for data synthesis.
💾 Flexible Storage: Out-of-the-box storage providers for HuggingFace and S3.

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

pip install openpo

Install from source

Clone the repository first, then run the following commands:

cd openpo
poetry install

Getting Started

Set your environment variables first:

# for completions
export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>

# for evaluations
export OPENAI_API_KEY=<your-openai-api-key>
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
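
Before creating a client, it can help to confirm which of these keys are actually set. The following is a minimal sketch that uses only the standard library. The helper name `missing_keys` is hypothetical and not part of OpenPO; the variable names are the ones exported above. Whether every key is needed depends on which providers you call.

```python
import os

# Environment variable names taken from the export commands above.
COMPLETION_KEYS = ["HF_API_KEY", "OPENROUTER_API_KEY"]
EVALUATION_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    groups = [("completions", COMPLETION_KEYS), ("evaluations", EVALUATION_KEYS)]
    for label, names in groups:
        missing = missing_keys(names)
        if missing:
            print(f"{label}: missing {', '.join(missing)}")
        else:
            print(f"{label}: all keys set")
```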

Completion

To get started with collecting LLM responses, pass in a list of model names of your choice:

[!NOTE] OpenPO requires the provider name to be prepended to the model identifier.

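from openpo import OpenPO

# Placeholder inputs; replace with your own system prompt and user message.
PROMPT = "You are a helpful assistant."
MESSAGE = "Explain what preference tuning is in one paragraph."

client = OpenPO()

# Send the same messages to every listed model.
response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)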

You can also call models through OpenRouter:

# Make the request through OpenRouter.
# PROMPT and MESSAGE are your own system prompt and user message strings.
client = OpenPO()

response = client.completions(
    models=[
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)
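
The provider-prefix convention used in the model identifiers above can be illustrated with a small helper. This is only a sketch of the naming scheme, assuming the first path segment is always the provider; `split_model_id` is a hypothetical name and is not part of OpenPO.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split '<provider>/<model>' into (provider, model).

    Only the first slash separates the provider. The remainder, which may
    itself contain slashes, is treated as the model name.
    """
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
    return provider, name


print(split_model_id("huggingface/Qwen/Qwen2.5-Coder-32B-Instruct"))
# ('huggingface', 'Qwen/Qwen2.5-Coder-32B-Instruct')
print(split_model_id("openrouter/qwen/qwen-2.5-coder-32b-instruct"))
# ('openrouter', 'qwen/qwen-2.5-coder-32b-instruct')
```

An identifier without a provider prefix, such as "gpt-4o", raises ValueError in this sketch.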

OpenPO accepts model parameters as a dictionary passed through params. Take a look at the documentation for more detail.

response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    # Parameters applied to every model in the list.
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)

Evaluation

OpenPO offers various ways to synthesize your dataset. To run evaluation, first install the extra dependencies (the quotes keep shells such as zsh from interpreting the brackets):

pip install "openpo[eval]"

LLM-as-a-Judge

To evaluate your response data with a single judge, use evaluate.eval:

client = OpenPO()

# questions and responses are your own lists of prompts and model outputs.
res = client.evaluate.eval(
    models=["openai/gpt-4o"],
    questions=questions,
    responses=responses,
)

To use multiple judges, pass multiple judge models:

res_a, res_b = client.evaluate.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# Get the consensus of the two judges' evaluations.
result = client.evaluate.get_consensus(
    eval_A=res_a,
    eval_B=res_b,
)
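
As a rough illustration of what a consensus step between two judges can look like, the sketch below keeps only the questions on which both judges picked the same response. This is not OpenPO's implementation: the function name `agree_on_preference` and the input format (one preferred-response index per question) are assumptions made for the example.

```python
def agree_on_preference(picks_a, picks_b):
    """Return {question_index: preferred_index} where both judges agree.

    picks_a and picks_b are equal-length lists. Element i is the index of the
    response that judge preferred for question i (hypothetical format).
    """
    if len(picks_a) != len(picks_b):
        raise ValueError("both judges must rate the same number of questions")
    return {
        i: a
        for i, (a, b) in enumerate(zip(picks_a, picks_b))
        if a == b
    }


judge_a = [0, 1, 1, 0]
judge_b = [0, 0, 1, 0]
print(agree_on_preference(judge_a, judge_b))
# {0: 0, 2: 1, 3: 0}
```

Dropping the disagreements trades dataset size for label quality, which is the usual motivation for using more than one judge.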

OpenPO supports batch processing for evaluating large datasets in a cost-effective way.

[!NOTE] Batch processing is an asynchronous operation and could take up to 24 hours (usually completes much faster).

# client is the OpenPO() instance created earlier.
info = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# Check the status of the submitted batch.
status = client.batch.check_status(batch_id=info.id)
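
Because batch jobs are asynchronous, callers usually poll until the job finishes. The sketch below shows one way to do that with a capped, doubling delay. It is deliberately independent of OpenPO: `check` is any zero-argument callable you supply (for example, a lambda wrapping the status call above), and the terminal status strings are assumptions, since the shape of the status object is not described here.

```python
import time


def wait_until_done(check, done_states=("completed", "failed"),
                    initial_delay=1.0, max_delay=60.0, timeout=24 * 3600,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns a value in done_states.

    The delay doubles after every attempt, capped at max_delay.
    Returns the final status, or raises TimeoutError when the next wait
    would run past the deadline.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = check()
        if status in done_states:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Demo with canned statuses and a no-op sleep, so it finishes immediately.
statuses = iter(["in_progress", "in_progress", "completed"])
print(wait_until_done(lambda: next(statuses), sleep=lambda seconds: None))
# completed
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are fine.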

For multi-judge with batch processing:

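batch_a, batch_b = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# batch_a_result and batch_b_result stand for the results of batch_a and
# batch_b, available once both batches have completed.
result = client.batch.get_consensus(
    batch_A=batch_a_result,
    batch_B=batch_b_result,
)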

Pre-trained Models

You can use pre-trained open source evaluation models. OpenPO currently supports two types of models: PairRM and Prometheus2.

[!NOTE] Running inference with pre-trained models requires appropriate hardware with a GPU and sufficient memory.

To use PairRM to rank responses:

from openpo import PairRM

pairrm = PairRM()
res = pairrm.eval(prompts, responses)
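
Ranked responses are typically turned into preference pairs for tuning. The sketch below assumes, for illustration only, that each prompt comes with a list of candidate responses and a parallel list of ranks where 1 is the best. The actual return format of pairrm.eval is not shown in this README and may differ, and the helper name `to_preference_pairs` is hypothetical.

```python
def to_preference_pairs(prompts, responses, ranks):
    """Build one {prompt, preferred, rejected} record per prompt.

    responses[i] is the list of candidates for prompts[i], and ranks[i] is a
    parallel list of ranks (1 = best). The best-ranked candidate becomes
    "preferred" and the worst-ranked becomes "rejected".
    """
    records = []
    for prompt, candidates, rank in zip(prompts, responses, ranks):
        if len(candidates) != len(rank) or len(candidates) < 2:
            raise ValueError("need at least two candidates with one rank each")
        best = min(range(len(rank)), key=rank.__getitem__)
        worst = max(range(len(rank)), key=rank.__getitem__)
        records.append({
            "prompt": prompt,
            "preferred": candidates[best],
            "rejected": candidates[worst],
        })
    return records


pairs = to_preference_pairs(
    prompts=["What is 2 + 2?"],
    responses=[["5", "4", "Four, I think?"]],
    ranks=[[3, 1, 2]],
)
print(pairs)
# [{'prompt': 'What is 2 + 2?', 'preferred': '4', 'rejected': '5'}]
```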

To use Prometheus2:

from openpo import Prometheus2
from openpo.resources.provider import VLLM

model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")
pm = Prometheus2(model=model)

feedback = pm.eval_relative(
    instructions=instructions,
    responses_A=response_A,
    responses_B=response_B,
    rubric='reasoning',
)

Storing Data

Use the out-of-the-box storage classes to easily upload and download data.

from openpo.storage import HuggingFaceStorage
hf_storage = HuggingFaceStorage()

# push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(repo_id="my-hf-repo", data=preference)

# Load data from repo
data = hf_storage.load_from_repo(path="my-hf-repo")
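
It can be worth checking records locally before pushing them. The following sketch validates the three keys used in the example above. The key names come from that example; whether OpenPO itself enforces any schema is not stated here, and `validate_preference` is a hypothetical helper, not part of the library.

```python
REQUIRED_KEYS = ("prompt", "preferred", "rejected")


def validate_preference(record):
    """Return record unchanged, or raise ValueError if it is malformed.

    Every required key must map to a non-empty string, and the preferred
    and rejected responses must differ.
    """
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"record is missing a non-empty {key!r} field")
    if record["preferred"] == record["rejected"]:
        raise ValueError("preferred and rejected responses are identical")
    return record


preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
validate_preference(preference)  # passes silently for a well-formed record
```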

Contributing

Contributions are what make open source amazingly special! Here's how you can help:

Development Setup

Clone the repository

git clone https://github.com/yourusername/openpo.git
cd openpo

Install Poetry (dependency management tool)

curl -sSL https://install.python-poetry.org | python3 -

Install dependencies

poetry install

Development Workflow

Create a new branch for your feature

git checkout -b feature-name

Submit a Pull Request

Write a clear description of your changes
Reference any related issues

Release history

0.7.7

Dec 26, 2024

0.7.6

Dec 25, 2024

0.7.5

Dec 25, 2024

0.7.4

Dec 25, 2024

0.7.3

Dec 25, 2024

0.7.2

Dec 25, 2024

0.7.1

Dec 25, 2024

0.7.0

Dec 25, 2024

0.6.5

Dec 21, 2024

This version

0.6.4

Dec 21, 2024

0.6.3

Dec 21, 2024

0.6.2

Dec 20, 2024

0.6.1

Dec 20, 2024

0.6.0

Dec 18, 2024

0.5.13

Dec 16, 2024

0.5.12

Dec 16, 2024

0.5.11

Dec 16, 2024

0.5.10

Dec 16, 2024

0.5.9

Dec 15, 2024

0.5.8

Dec 15, 2024

0.5.7

Dec 15, 2024

0.5.6

Dec 15, 2024

0.5.5

Dec 14, 2024

0.5.4

Dec 14, 2024

0.5.3

Dec 13, 2024

0.5.2

Dec 13, 2024

0.5.1

Dec 13, 2024

0.5.0

Dec 13, 2024

0.4.2

Dec 11, 2024

0.4.1

Dec 10, 2024

0.4.0

Dec 6, 2024

0.3.0

Dec 3, 2024

0.2.0

Nov 26, 2024

0.1.2

Nov 25, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openpo-0.6.4.tar.gz (24.9 kB)

Uploaded Dec 21, 2024 Source

Built Distribution

openpo-0.6.4-py3-none-any.whl (39.0 kB)

Uploaded Dec 21, 2024 Python 3

File details

Details for the file openpo-0.6.4.tar.gz.

File metadata

Download URL: openpo-0.6.4.tar.gz
Upload date: Dec 21, 2024
Size: 24.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.0 CPython/3.11.6 Darwin/24.1.0

File hashes

Hashes for openpo-0.6.4.tar.gz
Algorithm	Hash digest
SHA256	`8ee6ce8a57490d9bddc07d44f5664a38337bc0b3ad698b2835463074b446be0b`
MD5	`cbcf66ef3957ff34ddbd772538873c97`
BLAKE2b-256	`d323f1901c1934b11c351c6c9db555a761aef47549c6c34ff9fa479a4f270695`

File details

Details for the file openpo-0.6.4-py3-none-any.whl.

File metadata

Download URL: openpo-0.6.4-py3-none-any.whl
Upload date: Dec 21, 2024
Size: 39.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.0 CPython/3.11.6 Darwin/24.1.0

File hashes

Hashes for openpo-0.6.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c6d32f44cee5f426b560e876ae164db48d185b01fd88f9c7a0b984aaf7505df4`
MD5	`bc9048d61b28479060cef29daff6046f`
BLAKE2b-256	`03f62d7902d0f3c861650f535a5e069b7749c34520572220d9e51dd7793113d6`
