

⚗️ distilabel

AI Feedback (AIF) framework for building datasets and labellers with LLMs

Overview

[!TIP] To discuss, get support, or give feedback, join Argilla's Slack Community, where you can engage with our amazing community as well as the core developers of argilla and distilabel.

What's distilabel?

distilabel is a framework for AI engineers to align LLMs using RLHF-related methods (e.g. reward models, DPO).

The initial focus is LLM fine-tuning and adaptation, but we'll be extending it to predictive NLP use cases soon.

Main use cases are:

  1. As an AI engineer, I want to build domain-specific instruction datasets to fine-tune OSS LLMs with increased accuracy.
  2. As an AI engineer, I want to build domain-specific and diverse preference datasets to use RLHF-related methods and align LLMs (e.g., increase their ability to follow instructions or give truthful responses).
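Concretely, the preference datasets in use case 2 boil down to (input, chosen, rejected) triples. A minimal illustrative sketch of one record (field names here are hypothetical, not distilabel's actual column schema):

```python
# One record of a preference dataset, as consumed by DPO-style training.
# Field names are illustrative, not distilabel's actual schema.
preference_record = {
    "input": "Summarize RLHF in one sentence.",
    "chosen": "RLHF aligns a model with preference signals from humans or AI judges.",
    "rejected": "RLHF is a kind of database.",
}

# DPO optimizes the model to rank `chosen` above `rejected` for the same input.
assert set(preference_record) == {"input", "chosen", "rejected"}
```

distilabel's job is to produce many such triples automatically, with an LLM judging which response is preferred.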

[!WARNING] distilabel is currently under active development and we're iterating quickly, so expect breaking changes in upcoming releases. The README may also be outdated; the best place to get started is the documentation.

Motivation

🔥 Recent projects like Zephyr and Tulu have shown it's possible to build powerful open-source models with DPO and AI Feedback (AIF) datasets.

👩‍🔬 There's a lot of exciting research in the AIF space, such as UltraFeedback (the dataset leveraged by Zephyr and Tulu), JudgeLM, or Prometheus.

🚀 However, going beyond research efforts and applying AIF at scale is a different matter. For enterprise and production use, we need a framework that implements key AIF methods in a robust, efficient, and scalable way. Such a framework should enable AI engineers to build custom datasets at scale for their own use cases.

👩‍🎓 This, combined with humans in the loop to improve dataset quality, is the next big leap for OSS LLMs.

⚗️ distilabel aims to bridge this gap.

Key features

  • 🤖 Leverage OSS models and APIs: 🤗 transformers, OpenAI, 🤗 Inference Endpoints, vLLM, llama.cpp, and more to come.

  • 💻 Scalable and extensible: Scalable implementations of existing methods (e.g. UltraFeedback). Easily extensible to build and configure your own labellers.

  • 🧑‍🦱 Human-in-the-loop: One line of code integration with Argilla to improve and correct datasets.
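The "labellers" mentioned above are judge models that score candidate responses so the best and worst can form a preference pair. A toy, self-contained sketch of that idea, with a stubbed scoring function standing in for the LLM judge (not distilabel's actual API):

```python
from typing import Callable, Dict, List

def label_preferences(
    input_text: str,
    responses: List[str],
    score_fn: Callable[[str, str], float],
) -> Dict[str, str]:
    """Score each response to `input_text` and return a chosen/rejected pair.

    `score_fn` stands in for an LLM judge (e.g. an UltraFeedback-style
    rating prompt); here it is any (input, response) -> score function.
    """
    ranked = sorted(responses, key=lambda r: score_fn(input_text, r))
    return {"input": input_text, "chosen": ranked[-1], "rejected": ranked[0]}

# Toy judge: prefer the longer answer (a real labeller would call an LLM).
pair = label_preferences(
    "What is DPO?",
    ["A method.", "Direct Preference Optimization, a preference-tuning method for LLMs."],
    lambda inp, resp: float(len(resp)),
)
print(pair["chosen"])
```

Swapping `score_fn` for an actual LLM call is, conceptually, what the scalable labeller implementations do.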

Quickstart

Installation

Install with pip (requires Python 3.8+):

pip install distilabel[openai,argilla]

Try it out

After installing, you can immediately start experimenting with distilabel:

  • Explore Locally: Follow the example below to build a preference dataset for DPO/RLHF.

  • Interactive Notebook: Prefer an interactive experience? Try our Google Colab Notebook!

    Open In Colab

Example: Build a preference dataset for DPO/RLHF

from datasets import load_dataset
from distilabel.llm import OpenAILLM
from distilabel.pipeline import pipeline
from distilabel.tasks import TextGenerationTask

# Load a dataset with instructions from the Hub
dataset = (
    load_dataset("HuggingFaceH4/instruction-dataset", split="test[:5]")
    .remove_columns(["completion", "meta"])
    .rename_column("prompt", "input")
)

# Use `OpenAILLM` (running `gpt-3.5-turbo`) to generate responses for given inputs
generator = OpenAILLM(
    task=TextGenerationTask(),
    max_new_tokens=512,
    # openai_api_key="sk-...",
)

pipeline = pipeline("preference", "instruction-following", generator=generator)

# Build a preference dataset comparing two responses focused on the instruction-following skill of the LLM
dataset = pipeline.generate(dataset)

The resulting dataset can already be used for preference tuning (a larger version of it). But beware: these AIF datasets are imperfect. To get the most out of AIF, push the dataset to Argilla for human feedback:

import argilla as rg

rg.init(
    api_key="<YOUR_ARGILLA_API_KEY>",
    api_url="<YOUR_ARGILLA_API_URL>"
)

rg_dataset = dataset.to_argilla()
rg_dataset.push_to_argilla(name="preference-dataset", workspace="admin")
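Before (or instead of) the human-review step, rated generations can be collapsed into DPO-ready pairs. A hedged sketch, assuming each row carries a list of generations with parallel numeric ratings (distilabel's actual output columns may be named differently):

```python
def to_dpo_pair(row):
    """Collapse one rated row into a (prompt, chosen, rejected) record.

    Assumes `row` has `generations` (list of responses) and `rating`
    (parallel list of numeric scores); the real column names in
    distilabel's output may differ.
    """
    ranked = sorted(zip(row["rating"], row["generations"]))
    return {
        "prompt": row["input"],
        "chosen": ranked[-1][1],   # highest-rated response
        "rejected": ranked[0][1],  # lowest-rated response
    }

row = {
    "input": "Name a prime number.",
    "generations": ["Seven is prime.", "Eight."],
    "rating": [9.0, 2.0],
}
print(to_dpo_pair(row)["chosen"])  # → Seven is prime.
```

Applying `to_dpo_pair` over every row (e.g. with `datasets.Dataset.map`) yields the triples a DPO trainer expects.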

https://github.com/argilla-io/distilabel/assets/1107111/be34c95c-8be4-46ef-9437-cbd2a7687e30

More examples

Find more examples of different use cases of distilabel under examples/.

Roadmap

  • Add Critique Models and support for Prometheus OSS
  • Add a generator with multiple models
  • Train OSS labellers to replace OpenAI labellers
  • Add labellers to evolve instructions generated with self-instruct
  • Add labellers for predictive NLP tasks: text classification, information extraction, etc.
  • Open an issue to suggest a feature!

Contribute

To contribute directly to distilabel, check our good first issues or open a new one.




Download files

Download the file for your platform.

Source Distribution

distilabel-0.1.0.tar.gz (69.0 kB)

Uploaded Source

Built Distribution


distilabel-0.1.0-py3-none-any.whl (63.5 kB)

Uploaded Python 3

File details

Details for the file distilabel-0.1.0.tar.gz.

File metadata

  • Download URL: distilabel-0.1.0.tar.gz
  • Upload date:
  • Size: 69.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for distilabel-0.1.0.tar.gz

  • SHA256: 52a0989e6c25a3854378ff642ddb3d29ee348b7c9dd2ebd4ef04310f454f40c7
  • MD5: f133cf99b04d4eceb1bef48f2391f162
  • BLAKE2b-256: 9768a1d2f02618828aa2fe4f114fa532098b68266086e00fde6b3dae265315ff

See more details on using hashes here.
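To verify a downloaded file against the hashes above, you can compute its digest locally with Python's standard library and compare (the file path below is an example; use wherever you saved the download):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# expected = "52a0989e6c25a3854378ff642ddb3d29ee348b7c9dd2ebd4ef04310f454f40c7"
# assert sha256_of_file("distilabel-0.1.0.tar.gz") == expected
```

Streaming in chunks keeps memory use constant even for large archives.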

File details

Details for the file distilabel-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: distilabel-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 63.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for distilabel-0.1.0-py3-none-any.whl

  • SHA256: bd363daa6cac0434ab722227e4f2be62872f143ddf1ab92162b521ecc0f411fa
  • MD5: 23894e9d915bb20c89b92d4d52b2e4d3
  • BLAKE2b-256: d893afa129a097766bce87cc82686af5f3acb36013c5afb6f8e2facfd33fe6ce

