

⚗️ distilabel

AI Feedback (AIF) framework for building datasets with and for LLMs.

[!TIP] To discuss, get support, or give feedback, join Argilla's Slack Community, where you can engage with our amazing community and with the core developers of argilla and distilabel.


Features

  • Integrations with the most popular libraries and APIs for LLMs: HF Transformers, OpenAI, vLLM, etc.
  • Multiple tasks for Self-Instruct, Preference datasets and more.
  • Dataset export to Argilla for easy data exploration and further annotation.

[!WARNING] distilabel is currently under active development and we're iterating quickly, so expect breaking changes in releases over the upcoming weeks. The README may also be outdated; the best place to get started is the documentation.

Installation

pip install distilabel --upgrade

Requires Python 3.8+
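Before installing, you can confirm your interpreter meets the minimum version with a quick standard-library check (the helper name is illustrative, not part of distilabel):

```python
import sys

def python_supported(min_version=(3, 8)):
    """Return True if the running interpreter meets distilabel's minimum Python version."""
    return sys.version_info[:2] >= min_version

print(python_supported())
```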

In addition, the following extras are available:

  • hf-transformers: for using models available in the transformers package via the TransformersLLM integration.
  • hf-inference-endpoints: for using the Hugging Face Inference Endpoints via the InferenceEndpointsLLM integration.
  • openai: for using OpenAI API models via the OpenAILLM integration.
  • vllm: for using the vLLM serving engine via the vLLM integration.
  • llama-cpp: for using llama-cpp-python, the Python bindings for llama.cpp.
  • ollama: for using Ollama and its available models via their Python client.
  • together: for using Together Inference via their Python client.
  • anyscale: for using Anyscale endpoints.
  • mistralai: for using Mistral AI via their Python client.
  • vertexai: for using both Google Vertex AI offerings, their proprietary models and endpoints, via their Python client google-cloud-aiplatform.
  • argilla: for exporting the generated datasets to Argilla.

Example

To run the following example you must install distilabel with both openai and argilla extras:

pip install "distilabel[openai,argilla]" --upgrade

Then run the following example:

from datasets import load_dataset
from distilabel.llm import OpenAILLM
from distilabel.pipeline import pipeline
from distilabel.tasks import TextGenerationTask

dataset = (
    load_dataset("HuggingFaceH4/instruction-dataset", split="test[:10]")
    .remove_columns(["completion", "meta"])
    .rename_column("prompt", "input")
)

# Create a `Task` for generating text given an instruction.
task = TextGenerationTask()

# Create an `LLM` for generating text using the `Task` created in
# the first step. As the `LLM` will generate text, it will be a `generator`.
generator = OpenAILLM(task=task, max_new_tokens=512)

# Create a pre-defined `Pipeline` using the `pipeline` function and the
# `generator` created in step 2. The `pipeline` function will create a
# `labeller` LLM using `OpenAILLM` with the `UltraFeedback` task for
# instruction following assessment.
pipeline = pipeline("preference", "instruction-following", generator=generator)

dataset = pipeline.generate(dataset)
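The returned dataset contains the generations plus the labeller's feedback. As a minimal sketch of working with such a dataset, the rows and column names below (`input`, `generations`, `rating`) are illustrative assumptions, not guaranteed distilabel output fields:

```python
# Hypothetical rows mimicking a preference dataset; the real column names
# depend on the distilabel task used and may differ.
rows = [
    {
        "input": "Explain AIF.",
        "generations": ["short answer", "detailed answer"],
        "rating": [3.0, 9.0],
    },
]

def best_generation(row):
    # Pair each generation with its rating and keep the highest-rated one.
    best_idx = max(range(len(row["rating"])), key=lambda i: row["rating"][i])
    return row["generations"][best_idx]

print(best_generation(rows[0]))  # "detailed answer"
```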

Additionally, you can push the generated dataset to Argilla for further exploration and annotation:

import argilla as rg

rg.init(api_url="<YOUR_ARGILLA_API_URL>", api_key="<YOUR_ARGILLA_API_KEY>")

# Convert the dataset to Argilla format
rg_dataset = dataset.to_argilla()

# Push the dataset to Argilla
rg_dataset.push_to_argilla(name="preference-dataset", workspace="admin")

More examples

Find more examples of different use cases of distilabel under examples/.

Or check out the following Google Colab Notebook:

Open In Colab

Badges

If you build something cool with distilabel, consider adding one of these badges to your dataset or model card.

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)


[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)


Contribute

To contribute directly to distilabel, check our good first issues or open a new one.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

distilabel-0.5.0.tar.gz (2.5 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

distilabel-0.5.0-py3-none-any.whl (130.7 kB)

Uploaded Python 3

File details

Details for the file distilabel-0.5.0.tar.gz.

File metadata

  • Download URL: distilabel-0.5.0.tar.gz
  • Upload date:
  • Size: 2.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for distilabel-0.5.0.tar.gz:

  • SHA256: 9f6c0ebce54bd1a62ab4e500b4d5ce4da60dc43ae678667176ad58ec80269911
  • MD5: bdac5502dbff398da47657fbc72d3c1f
  • BLAKE2b-256: 645f849ff528967b5e67531d0152f53c2fab0a12ce66bb067b21a43e541bdf18

See more details on using hashes here.
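To check a downloaded sdist against the published SHA256 above, you can hash the file locally with Python's standard library and compare; the file path below is an assumption about where you saved the download:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "9f6c0ebce54bd1a62ab4e500b4d5ce4da60dc43ae678667176ad58ec80269911"
# print(sha256_of("distilabel-0.5.0.tar.gz") == expected)
```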

File details

Details for the file distilabel-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: distilabel-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 130.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for distilabel-0.5.0-py3-none-any.whl:

  • SHA256: cb6c73250bf0aacf20c5097e14d1a3201007e832fc52815205147a4238379740
  • MD5: 86ef8c997fd04409a424b112f496eb9f
  • BLAKE2b-256: 3d647e4cc7ab6e632cb9c868df8c3243cabbb79c8d6e8cac3955c5059a94ec89

See more details on using hashes here.
