
Build high quality synthetic datasets with AI feedback from 200+ LLMs


OpenPO 🐼


OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.

What is OpenPO?

OpenPO is an open-source library that simplifies building synthetic datasets for LLM preference tuning. It collects outputs from 200+ LLMs and ranks them using various techniques, helping developers fine-tune better language models with minimal effort.
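
The collect-then-rank idea can be sketched in a few lines: several models answer the same prompt, each answer gets a score from some ranking step, and the best and worst answers become one preference record. This is a minimal illustration, not OpenPO's implementation; the Candidate class, the build_preference_pair helper, and the scores are hypothetical. The record keys follow the prompt/preferred/rejected shape used in the storage example further down.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One model's response to a prompt, plus a score from some ranking step (hypothetical)."""
    model: str
    text: str
    score: float


def build_preference_pair(prompt: str, candidates: list[Candidate]) -> dict:
    """Turn scored candidates into a single preferred/rejected record.

    The highest-scoring response becomes "preferred" and the lowest-scoring
    one becomes "rejected". Python's sort is stable, so ties keep list order.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return {
        "prompt": prompt,
        "preferred": ranked[0].text,
        "rejected": ranked[-1].text,
    }


# Example with made-up model names and scores.
pair = build_preference_pair(
    "Reverse a string in Python.",
    [
        Candidate("model-a", "s[::-1]", 0.9),
        Candidate("model-b", "''.join(reversed(s))", 0.7),
        Candidate("model-c", "I don't know.", 0.1),
    ],
)
```

Picking only the extremes keeps the pair maximally contrastive; other pairing strategies (for example, every better/worse combination) are equally possible.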

Key Features

  • 🔌 Multiple LLM Support: Call 200+ models from Hugging Face and OpenRouter.

  • 🧪 Research-Backed Methodologies: Implementations of data-synthesis methodologies from recent research papers (coming soon!).

  • 🤝 OpenAI API Compatibility: Full support for the OpenAI API format.

  • 💾 Flexible Storage: Out-of-the-box storage providers for Hugging Face and S3.

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

pip install openpo

Install from source

Clone the repository, then run the following commands:

cd openpo
poetry install

Getting Started

OpenPO defaults to Hugging Face when the provider argument is not set:

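from openpo.client import OpenPO

# Placeholder prompt and message; replace them with your own content.
PROMPT = "You are a helpful coding assistant."
MESSAGE = "Write a Python function that reverses a string."

# No need to pass in the key if the environment variable is already set.
client = OpenPO(api_key="your-huggingface-api-key")

# Request completions for the same conversation from several models.
response = client.completions(
    models=[
        "Qwen/Qwen2.5-Coder-32B-Instruct",
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)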
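
Because the messages argument uses the OpenAI chat format, a list of prompts can be turned into one message list per prompt before calling client.completions in a loop. The sketch below is an assumption-laden helper, not part of OpenPO: build_messages and validate_messages are hypothetical names, and the check only covers the common system/user/assistant roles (the OpenAI format defines others, such as tool).

```python
# Hypothetical helpers; not part of the OpenPO API.
VALID_ROLES = {"system", "user", "assistant"}


def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Build an OpenAI-style message list: one system turn, then one user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError if a message has an unexpected role or non-string content."""
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i} has invalid role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i} content must be a string")


# One message list per prompt, e.g. when iterating over a prompt dataset.
user_prompts = ["Reverse a string in Python.", "Explain list comprehensions."]
batches = [build_messages("You are a helpful coding assistant.", p) for p in user_prompts]
for messages in batches:
    validate_messages(messages)
```

Each entry of batches could then be passed as the messages argument of a separate client.completions call.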

To use OpenRouter instead, set the provider argument to 'openrouter':

# Make requests through OpenRouter.
client = OpenPO(api_key="your-openrouter-api-key", provider='openrouter')

response = client.completions(
    models=[
        "qwen/qwen-2.5-coder-32b-instruct",
        "mistralai/mistral-7b-instruct-v0.3",
        "microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

Model parameters such as max_tokens and temperature are passed as a dictionary through the params argument. See the documentation for more details.

response = client.completions(
    models=[
        "Qwen/Qwen2.5-Coder-32B-Instruct",
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    # Model parameters, passed as a dictionary.
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)
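
When many calls share the same parameters but a few vary (for example, temperature), one option is to keep a shared base dictionary and build a fresh copy per call. This is a plain-Python sketch, not an OpenPO feature; BASE_PARAMS and make_params are hypothetical names, and the keys simply mirror the params example above.

```python
# Hypothetical shared defaults; keys mirror the params example above.
BASE_PARAMS = {"max_tokens": 500, "temperature": 1.0}


def make_params(**overrides) -> dict:
    """Return a new params dict: shared defaults updated with per-call overrides."""
    params = dict(BASE_PARAMS)  # copy so the shared defaults are never mutated
    params.update(overrides)
    return params


# For example, a lower temperature for one call.
low_temp = make_params(temperature=0.2)
print(low_temp)  # {'max_tokens': 500, 'temperature': 0.2}
```

The result would be passed as params=make_params(...) to client.completions. Copying before updating keeps one call's overrides from leaking into the next.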

Storing Data

Use the built-in storage classes to upload and download data.

from openpo.storage.huggingface import HuggingFaceStorage

# api_key can also be set as an environment variable.
hf_storage = HuggingFaceStorage(repo_id="my-dataset-repo", api_key="hf-token")

# push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(data=preference)

# Load data from repo
data = hf_storage.load_from_repo()
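
Keeping a local copy of preference records before pushing them can make inspection and debugging easier. The sketch below uses only the standard library to write and read records in the same prompt/preferred/rejected shape as the push_to_repo example above, stored as JSON Lines (one JSON object per line). save_preferences and load_preferences are hypothetical helpers, not part of OpenPO's storage API.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("prompt", "preferred", "rejected")


def save_preferences(records: list[dict], path: Path) -> None:
    """Validate all records, then write them as JSON Lines (one record per line)."""
    for record in records:
        missing = [k for k in REQUIRED_KEYS if k not in record]
        if missing:
            raise ValueError(f"record is missing keys: {missing}")
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_preferences(path: Path) -> list[dict]:
    """Read records back from a JSON Lines file, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip one record through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preferences.jsonl"
    save_preferences([{"prompt": "text", "preferred": "response1", "rejected": "response2"}], path)
    loaded = load_preferences(path)
```

Validating every record before opening the file means a bad record never leaves a half-written file behind.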

Structured Outputs (JSON Mode)

OpenPO supports structured outputs using Pydantic models.

Note: OpenRouter does not natively support structured outputs, which leads to inconsistent behavior from some models when structured output is used with OpenRouter.

Hugging Face models are therefore recommended for structured output.

from pydantic import BaseModel
from openpo.client import OpenPO

client = OpenPO(api_key="your-huggingface-api-key")


# Schema that the model output should follow.
class ResponseModel(BaseModel):
    response: str


res = client.completions(
    models=["Qwen/Qwen2.5-Coder-32B-Instruct"],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    params={
        # Pass the Pydantic model class as the response format.
        "response_format": ResponseModel,
    },
)
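
Because structured output can be inconsistent for some models, it may help to parse the raw output defensively instead of assuming it is valid JSON. The sketch assumes you have already extracted the model's output as a string (how to do that from OpenPO's response object is not shown here). ParsedResponse is a hypothetical stdlib stand-in for the ResponseModel above, and parse_structured is a hypothetical helper that returns None rather than raising.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedResponse:
    """Stdlib stand-in for ResponseModel: a single string field."""
    response: str


def parse_structured(raw: str) -> Optional[ParsedResponse]:
    """Parse a model's JSON output; return None instead of raising on bad output.

    Covers invalid JSON, a payload that is not a JSON object, and a missing
    or non-string "response" field.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    value = payload.get("response")
    if not isinstance(value, str):
        return None
    return ParsedResponse(response=value)


good = parse_structured('{"response": "hello"}')
bad = parse_structured("Sure! Here is the JSON: {response: hello}")
```

With Pydantic v2 installed, ResponseModel.model_validate_json(raw) performs equivalent validation and raises pydantic.ValidationError on failure; returning None here simply makes it easy to drop malformed samples when building a dataset.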

Contributing

Contributions are what make open source so special! Here's how you can help:

Development Setup

  1. Fork and clone the repository
git clone https://github.com/yourusername/openpo.git
cd openpo
  2. Install Poetry (dependency management tool)
curl -sSL https://install.python-poetry.org | python3 -
  3. Install dependencies
poetry install

Development Workflow

  1. Create a new branch for your feature
git checkout -b feature-name
  2. Submit a Pull Request
  • Write a clear description of your changes
  • Reference any related issues



Download files

Download the file for your platform.

Source Distribution

openpo-0.4.0.tar.gz (15.9 kB)

Uploaded Source

Built Distribution


openpo-0.4.0-py3-none-any.whl (25.2 kB)

Uploaded Python 3

File details

Details for the file openpo-0.4.0.tar.gz.

File metadata

  • Download URL: openpo-0.4.0.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.6 Darwin/23.2.0

File hashes

Hashes for openpo-0.4.0.tar.gz
Algorithm Hash digest
SHA256 d79f446f2236d3c9819a5c5c35ed845ad8503954e9070b0962cf68c6f221eb3a
MD5 d0c21c3789a9b939f3141c239adb624a
BLAKE2b-256 724d3a19f16d4d599a610fa6c0de955fa5de8039f87c1d8564955d1563a13ced


File details

Details for the file openpo-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: openpo-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 25.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.6 Darwin/23.2.0

File hashes

Hashes for openpo-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6367880ca7a00f3953cf2f9a08c1e710b97be231de77a1cb990a0a076d3f597b
MD5 31dbc6091c5a3a7e25f99512c9857b1a
BLAKE2b-256 1eb7d53500ca5934542ae32cd83d11bd369677d07e51d85931ee9db994b0614f

