
Build high-quality synthetic datasets with AI feedback from 200+ LLMs


OpenPO 🐼


OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.

What is OpenPO?

OpenPO is an open-source library that simplifies building synthetic datasets for LLM preference tuning. By collecting outputs from 200+ LLMs and ranking them with various techniques, OpenPO helps developers fine-tune better language models with minimal effort.
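The core idea, turning several ranked completions for the same prompt into one preference pair, can be sketched in plain Python. This is a minimal illustration, not OpenPO's implementation: `build_preference_pair` is a hypothetical helper, and the scores are made up (in practice they might come from an LLM judge or a reward model). The output uses the {"prompt", "preferred", "rejected"} layout from the Saving Data section.

```python
from typing import Dict, Tuple


def build_preference_pair(
    prompt: str, scored: Dict[str, Tuple[str, float]]
) -> Dict[str, str]:
    """Pick the highest- and lowest-scored completions as a preference pair.

    `scored` maps a model name to a (completion_text, score) tuple.
    The scores are assumed to come from some external ranking step.
    """
    if len(scored) < 2:
        raise ValueError("need at least two completions to form a pair")
    # Sort completions by score, highest first.
    ranked = sorted(scored.values(), key=lambda item: item[1], reverse=True)
    return {
        "prompt": prompt,
        "preferred": ranked[0][0],
        "rejected": ranked[-1][0],
    }


# Hypothetical model names and scores, for illustration only.
pair = build_preference_pair(
    "Write a haiku about pandas.",
    {
        "model-a": ("Bamboo in the mist...", 0.9),
        "model-b": ("Pandas are bears.", 0.2),
        "model-c": ("Black and white, at rest...", 0.6),
    },
)
print(pair["preferred"])  # Bamboo in the mist...
print(pair["rejected"])   # Pandas are bears.
```

Taking only the best and worst completions is one simple pairing strategy; other schemes (for example, pairing every higher-ranked output with every lower-ranked one) produce more pairs per prompt.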

Key Features

  • 🔌 Multiple LLM Support: Call 200+ models from Hugging Face and OpenRouter

  • 🧪 Research-Backed Methodologies: Implementations of data synthesis methodologies from recent research papers (coming soon!)

  • 🤝 OpenAI API Compatibility: Fully supports the OpenAI API format

  • 💾 Flexible Storage: Out-of-the-box storage providers for Hugging Face and S3
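The OpenAI chat format used throughout the examples below is a list of dictionaries, each with a role and a content string. The checker here is a hypothetical sketch, not part of OpenPO; it covers only the common system, user, and assistant roles (the OpenAI API defines others, such as tool, which it does not handle).

```python
from typing import Dict, List

# Common roles in the OpenAI chat message format (not an exhaustive list).
ALLOWED_ROLES = {"system", "user", "assistant"}


def find_message_problems(messages: List[Dict[str, str]]) -> List[str]:
    """Return format problems; an empty list means the messages look valid."""
    if not messages:
        return ["messages must not be empty"]
    problems = []
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unsupported role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: content must be a string")
    return problems


messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain preference tuning in one sentence."},
]
print(find_message_problems(messages))  # []
print(find_message_problems([{"role": "narrator", "content": "Once upon a time"}]))
# ["message 0: unsupported role 'narrator'"]
```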

Installation

Install from PyPI (recommended)

OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:

pip install openpo

Install from source

Clone the repository first, then run the following commands:

cd openpo
poetry install

Getting Started

OpenPO defaults to Hugging Face when the provider argument is not set.

from openpo.client import OpenPO

# Placeholder prompts; replace with your own.
PROMPT = "You are a helpful coding assistant."
MESSAGE = "Write a Python function that reverses a string."

# No need to pass in the key if the API key environment variable is already set.
client = OpenPO(api_key="your-huggingface-api-key")

response = client.completions(
    models=[
        "Qwen/Qwen2.5-Coder-32B-Instruct",
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

To use OpenRouter, set the provider to openrouter:

# Make requests to OpenRouter instead of Hugging Face
client = OpenPO(api_key="<your-openrouter-api-key>", provider="openrouter")

response = client.completions(
    models=[
        "qwen/qwen-2.5-coder-32b-instruct",
        "mistralai/mistral-7b-instruct-v0.3",
        "microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)

OpenPO accepts model parameters such as max_tokens and temperature as a dictionary passed through params. See the documentation for more detail.

response = client.completions(
    models=[
        "Qwen/Qwen2.5-Coder-32B-Instruct",
        "mistralai/Mistral-7B-Instruct-v0.3",
        "microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    # Sampling parameters applied to the request
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)
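Because requests follow the OpenAI format, one way to picture params is as extra fields merged into each model's chat-completion request body. The sketch below is an assumption about that mapping for illustration only; `build_request_bodies` is hypothetical, and OpenPO's internal request handling may differ. It copies the shared parameters into one body per model so every model is sampled with the same settings.

```python
import copy
from typing import Any, Dict, List


def build_request_bodies(
    models: List[str],
    messages: List[Dict[str, str]],
    params: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Build one OpenAI-style request body per model with shared params."""
    bodies = []
    for model in models:
        # Shared params go in first so that "model" and "messages"
        # cannot be overridden by a stray key in params.
        body = {**params, "model": model, "messages": copy.deepcopy(messages)}
        bodies.append(body)
    return bodies


bodies = build_request_bodies(
    models=["Qwen/Qwen2.5-Coder-32B-Instruct", "microsoft/Phi-3.5-mini-instruct"],
    messages=[{"role": "user", "content": "Hello"}],
    params={"max_tokens": 500, "temperature": 1.0},
)
print(bodies[1]["model"], bodies[1]["max_tokens"])  # microsoft/Phi-3.5-mini-instruct 500
```

Each body gets its own deep copy of the messages, so mutating one request later cannot silently change the others.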

Saving Data

Use the built-in storage classes to upload and download data.

from openpo.client import OpenPO
from openpo.storage.huggingface import HuggingFaceStorage

storage = HuggingFaceStorage(repo_id="my-dataset-repo", api_key="hf-token")
client = OpenPO(api_key="your-huggingface-token")

# For fine-tuning, preference data needs to be in the format
# {"prompt": ..., "preferred": ..., "rejected": ...}
preference = {}
storage.push_to_hub(data=preference, filename="my-data.json")
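Before pushing, it can help to check that every record has the prompt, preferred, and rejected fields and to keep a local copy. This sketch uses only the standard library; `save_preferences` is a hypothetical helper, and writing a JSON list to disk is an assumed convenient local format, not OpenPO's storage behavior.

```python
import json
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("prompt", "preferred", "rejected")


def save_preferences(records: List[Dict[str, str]], path: str) -> int:
    """Validate preference records and write them to `path` as a JSON list.

    Returns the number of records written.
    """
    for i, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return len(records)


records = [
    {
        "prompt": "What is 2 + 2?",
        "preferred": "2 + 2 = 4.",
        "rejected": "2 + 2 = 5.",
    }
]
count = save_preferences(records, "my-data.json")
print(count)  # 1
```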

Structured Outputs (JSON Mode)

OpenPO supports structured outputs using a Pydantic model.

Note: OpenRouter does not natively support structured outputs, so some models behave inconsistently when structured output is used through OpenRouter.

Hugging Face models are recommended for structured output.

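from pydantic import BaseModel
from openpo.client import OpenPO

client = OpenPO(api_key="your-huggingface-api-key")

# Placeholder prompts; replace with your own.
PROMPT = "You are a helpful coding assistant. Reply in JSON."
MESSAGE = "Write a Python function that adds two numbers."


# Schema the model output should follow
class ResponseModel(BaseModel):
    response: str


res = client.completions(
    models=["Qwen/Qwen2.5-Coder-32B-Instruct"],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    params={
        "response_format": ResponseModel,
    },
)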
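Given the note about inconsistent structured output through OpenRouter, it is worth checking that a reply actually parses before using it. This stdlib-only sketch assumes the reply text is a JSON object shaped like ResponseModel (a single string field, response); `parse_response` is a hypothetical helper. With Pydantic v2 installed, `ResponseModel.model_validate_json(text)` performs a stricter version of the same check.

```python
import json
from typing import Optional


def parse_response(text: str) -> Optional[str]:
    """Return the `response` field of a JSON reply, or None if it is malformed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # The reply must be a JSON object with a string `response` field.
    if not isinstance(data, dict) or not isinstance(data.get("response"), str):
        return None
    return data["response"]


print(parse_response('{"response": "def add(a, b): return a + b"}'))
print(parse_response("Sure! Here is the JSON you asked for..."))  # None
```

Returning None instead of raising makes it easy to filter out malformed replies when building a dataset from many models.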

Contributing

Contributions are what make open source special! Here's how you can help:

Development Setup

  1. Fork and clone the repository
git clone https://github.com/yourusername/openpo.git
cd openpo
  2. Install Poetry (dependency management tool)
curl -sSL https://install.python-poetry.org | python3 -
  3. Install dependencies
poetry install

Development Workflow

  1. Create a new branch for your feature
git checkout -b feature-name
  2. Submit a Pull Request
  • Write a clear description of your changes
  • Reference any related issues

Download files

Download the file for your platform.

Source Distribution

openpo-0.3.0.tar.gz (14.4 kB)

Uploaded Source

Built Distribution


openpo-0.3.0-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file openpo-0.3.0.tar.gz.

File metadata

  • Download URL: openpo-0.3.0.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.6 Darwin/23.2.0

File hashes

Hashes for openpo-0.3.0.tar.gz
Algorithm Hash digest
SHA256 acbd0976b8cf7963d5881c062f0c1f77ea19a31f58ce15e39ee11b385855c7c5
MD5 4e82e9585ea74371c53456edee407897
BLAKE2b-256 70755bcb4af109eabfa927ab0cd329d40573385d203783f8a2174f2a481b01c7
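The published digests can be used to verify a downloaded file. A minimal sketch with Python's standard hashlib module, assuming openpo-0.3.0.tar.gz sits in the current directory; `sha256_of` is a hypothetical helper name, and the expected value is the SHA256 digest listed above.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "acbd0976b8cf7963d5881c062f0c1f77ea19a31f58ce15e39ee11b385855c7c5"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("openpo-0.3.0.tar.gz")
if archive.exists():
    ok = sha256_of(str(archive)) == EXPECTED_SHA256
    print("hash OK" if ok else "hash MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size; the same function works for the wheel with its own listed digest.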


File details

Details for the file openpo-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: openpo-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 23.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.6 Darwin/23.2.0

File hashes

Hashes for openpo-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 30a8924723b97db7b39201cd5d13349d7a16805bbeed0ab5987b5aaff4229777
MD5 dff99cb20710583edacb3b62112fc7e7
BLAKE2b-256 45112d1c040cb7380a83d87257424f25edcc1c219b66cba99a1e17c739e6f284

