Build high quality synthetic datasets with AI feedback from 200+ LLMs

OpenPO 🐼

OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.

Resources

  • Notebook 📔: Building dataset with OpenPO and PairRM

What is OpenPO?

OpenPO is an open source library that simplifies the process of building synthetic datasets for LLM preference tuning. By collecting outputs from 200+ LLMs and evaluating them using research-proven methodologies, OpenPO helps developers build better fine-tuned language models with minimal effort.

Key Features

  • 🔌 Multiple LLM Support: Call 200+ models from HuggingFace and OpenRouter.

  • 🧪 Research-Backed Methodologies: Implements data synthesis methodologies from recent research papers.

  • 🤝 OpenAI API Compatibility: Supports the OpenAI API format.

  • 💾 Flexible Storage: Out-of-the-box storage providers for HuggingFace and S3.

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

pip install openpo
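
To confirm the install worked, a minimal sketch using only the Python standard library is shown below. The helper name `installed_version` is hypothetical and not part of OpenPO; it simply asks the packaging metadata which version, if any, is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("openpo")
    print(f"openpo version: {found}" if found else "openpo is not installed")
```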

Install from source

Clone the repository first, then run the following commands:

cd openpo
poetry install

Getting Started

Set your API keys as environment variables first:

export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>
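
Before creating a client, it can help to check that the variables are actually visible to Python. The sketch below is one way to do that; `missing_keys` and `REQUIRED_KEYS` are hypothetical names for illustration, not part of OpenPO, and the list assumes you use both providers.

```python
import os
from typing import List, Mapping, Sequence

# The two variable names exported above. Trim this if you only use one provider.
REQUIRED_KEYS = ("HF_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```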

To get started, pass in a list of model names of your choice:

[!NOTE] OpenPO requires the provider name to be prepended to the model identifier.

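from openpo.client import OpenPO

# Placeholder prompts; replace these with your own text
PROMPT = "You are a helpful assistant."
MESSAGE = "Explain what preference tuning is in one paragraph."

client = OpenPO()

# Each model identifier is prefixed with its provider name
response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)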
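
The provider prefix convention from the note above can be checked before making a request. This is a sketch only: `split_model_id` and `KNOWN_PROVIDERS` are hypothetical names, the provider list is assumed from the two providers this README mentions, and how OpenPO parses identifiers internally is not shown here.

```python
from typing import Tuple

# Assumed from the providers named in this README
KNOWN_PROVIDERS = ("huggingface", "openrouter")


def split_model_id(identifier: str) -> Tuple[str, str]:
    """Split 'provider/model-name' into (provider, model-name).

    Only the first '/' separates the provider; the model name may contain more.
    """
    provider, sep, model = identifier.partition("/")
    if not sep or not model or provider not in KNOWN_PROVIDERS:
        raise ValueError(
            f"{identifier!r} must start with one of {KNOWN_PROVIDERS} followed by '/'"
        )
    return provider, model


if __name__ == "__main__":
    print(split_model_id("huggingface/Qwen/Qwen2.5-Coder-32B-Instruct"))
```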

You can also call models through OpenRouter.

# make request to OpenRouter
client = OpenPO()

response = client.completions(
    models=[
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

OpenPO accepts model parameters as a dictionary passed via params. See the documentation for details.

response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    # Model parameters applied to the request
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)
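
Since the parameters are a plain dictionary, one convenient pattern is to keep a base set and derive variants from it. The sketch below uses only the two keys shown above; `BASE_PARAMS` and `with_overrides` are hypothetical names, and which other keys OpenPO accepts is left to its documentation.

```python
from typing import Any, Dict

# Base parameters, using only the keys that appear in the example above
BASE_PARAMS: Dict[str, Any] = {"max_tokens": 500, "temperature": 1.0}


def with_overrides(base: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a new dict with `overrides` applied; `base` is left unchanged."""
    return {**base, **overrides}


if __name__ == "__main__":
    low_temperature = with_overrides(BASE_PARAMS, temperature=0.2)
    print(low_temperature)
```

The resulting dictionary can then be passed as `params` in the call shown above.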

Storing Data

Use the out-of-the-box storage classes to upload and download data.

from openpo.storage.huggingface import HuggingFaceStorage
hf_storage = HuggingFaceStorage(repo_id="my-dataset-repo")

# push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(data=preference)

# Load data from repo
data = hf_storage.load_from_repo()
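
The preference record above is a plain dictionary with `prompt`, `preferred`, and `rejected` keys. A small helper can keep records in that shape before they are pushed. This is a sketch: `make_preference` and `is_valid_preference` are hypothetical helpers, not part of OpenPO, and the checks are assumptions about what a sensible record looks like.

```python
from typing import Any, Dict

# Keys taken from the example record above
PREFERENCE_KEYS = ("prompt", "preferred", "rejected")


def make_preference(prompt: str, preferred: str, rejected: str) -> Dict[str, str]:
    """Build one preference record in the shape used in the example above."""
    if preferred == rejected:
        raise ValueError("preferred and rejected responses must differ")
    return {"prompt": prompt, "preferred": preferred, "rejected": rejected}


def is_valid_preference(record: Dict[str, Any]) -> bool:
    """Check that a record has exactly the expected keys with non-empty string values."""
    return set(record) == set(PREFERENCE_KEYS) and all(
        isinstance(record[key], str) and record[key] for key in PREFERENCE_KEYS
    )


if __name__ == "__main__":
    record = make_preference("text", "response1", "response2")
    print(record, is_valid_preference(record))
```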

Structured Outputs (JSON Mode)

OpenPO supports structured outputs using a Pydantic model.

[!NOTE] OpenRouter does not natively support structured outputs. This leads to inconsistent behavior from some models when structured output is used with OpenRouter.

It is recommended to use HuggingFace models for structured output.

from pydantic import BaseModel
from openpo.client import OpenPO

client = OpenPO()

# Schema the model output should conform to
class ResponseModel(BaseModel):
    response: str


# PROMPT and MESSAGE are strings you define
res = client.completions(
    models=["huggingface/Qwen/Qwen2.5-Coder-32B-Instruct"],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    params={
        "response_format": ResponseModel,
    },
)
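
Because some models may not follow the schema consistently, it can be worth validating the returned text yourself. The sketch below assumes you already have a model's reply as a JSON string in `raw`; how to extract that string from the OpenPO response object is not covered here, and `parse_reply` is a hypothetical helper, not part of OpenPO.

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class ResponseModel(BaseModel):
    response: str


def parse_reply(raw: str) -> Optional[ResponseModel]:
    """Validate a JSON string against ResponseModel; return None if it does not fit."""
    try:
        return ResponseModel(**json.loads(raw))
    except (json.JSONDecodeError, TypeError, ValidationError):
        # Not JSON, not a JSON object, or missing/invalid fields
        return None


if __name__ == "__main__":
    print(parse_reply('{"response": "hello"}'))
    print(parse_reply("plain text, not JSON"))
```

Replies that fail validation can be dropped or retried rather than written into the dataset.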

Contributing

Contributions are what make open source amazingly special! Here's how you can help:

Development Setup

  1. Clone the repository
git clone https://github.com/yourusername/openpo.git
cd openpo
  2. Install Poetry (dependency management tool)
curl -sSL https://install.python-poetry.org | python3 -
  3. Install dependencies
poetry install

Development Workflow

  1. Create a new branch for your feature
git checkout -b feature-name
  2. Submit a Pull Request
  • Write a clear description of your changes
  • Reference any related issues

