Skip to main content

Load any mixture of text to text data in one line of code

Project description

Image Description

version license python tests Coverage Status Read the Docs downloads

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking

Why Unitxt?

  • 🌐 Comprehensive: Evaluate text, tables, vision, speech, and code in one unified framework
  • 💼 Enterprise-Ready: Battle-tested components with extensive catalog of benchmarks
  • 🧠 Model Agnostic: Works with HuggingFace, OpenAI, WatsonX, and custom models
  • 🔒 Reproducible: Shareable, modular components ensure consistent results

Quick Links

Installation

pip install unitxt

Quick Start

Command Line Evaluation

# Simple evaluation
unitxt-evaluate \
    --tasks "card=cards.mmlu_pro.engineering" \
    --model cross_provider \
    --model_args "model_name=llama-3-1-8b-instruct" \
    --limit 10

# Multi-task evaluation
unitxt-evaluate \
    --tasks "card=cards.text2sql.bird+card=cards.mmlu_pro.engineering" \
    --model cross_provider \
    --model_args "model_name=llama-3-1-8b-instruct,max_tokens=256" \
    --split test \
    --limit 10 \
    --output_path ./results/evaluate_cli \
    --log_samples \
    --apply_chat_template

# Benchmark evaluation
unitxt-evaluate \
    --tasks "benchmarks.tool_calling" \
    --model cross_provider \
    --model_args "model_name=llama-3-1-8b-instruct,max_tokens=256" \
    --split test \
    --limit 10 \
    --output_path ./results/evaluate_cli \
    --log_samples \
    --apply_chat_template

Loading as Dataset

Load thousands of datasets in chat API format, ready for any model:

from unitxt import load_dataset

dataset = load_dataset(
    card="cards.gpqa.diamond",
    split="test",
    format="formats.chat_api",
)

📊 Available on The Catalog

Tasks Datasets Prompts Benchmarks Metrics

🚀 Interactive Dashboard

Launch the graphical user interface to explore datasets and benchmarks:

pip install unitxt[ui]
unitxt-explore

Complete Python Example

Evaluate your own data with any model:

# Import required components
from unitxt import evaluate, create_dataset
from unitxt.blocks import Task, InputOutputTemplate
from unitxt.inference import HFAutoModelInferenceEngine

# Question-answer dataset
data = [
    {"question": "What is the capital of Texas?", "answer": "Austin"},
    {"question": "What is the color of the sky?", "answer": "Blue"},
]

# Define the task and evaluation metric
task = Task(
    input_fields={"question": str},
    reference_fields={"answer": str},
    prediction_type=str,
    metrics=["metrics.accuracy"],
)

# Create a template to format inputs and outputs
template = InputOutputTemplate(
    instruction="Answer the following question.",
    input_format="{question}",
    output_format="{answer}",
    postprocessors=["processors.lower_case"],
)

# Prepare the dataset
dataset = create_dataset(
    task=task,
    template=template,
    format="formats.chat_api",
    test_set=data,
    split="test",
)

# Set up the model (supports Hugging Face, WatsonX, OpenAI, etc.)
model = HFAutoModelInferenceEngine(
    model_name="Qwen/Qwen1.5-0.5B-Chat", max_new_tokens=32
)

# Generate predictions and evaluate
predictions = model(dataset)
results = evaluate(predictions=predictions, data=dataset)

# Print results
print("Global Results:\n", results.global_scores.summary)
print("Instance Results:\n", results.instance_scores.summary)

Contributing

Read the contributing guide for details on how to contribute to Unitxt.

Citation

If you use Unitxt in your research, please cite our paper:

@inproceedings{bandel-etal-2024-unitxt,
    title = "Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative {AI}",
    author = "Bandel, Elron  and
      Perlitz, Yotam  and
      Venezian, Elad  and
      Friedman, Roni  and
      Arviv, Ofir  and
      Orbach, Matan  and
      Don-Yehiya, Shachar  and
      Sheinwald, Dafna  and
      Gera, Ariel  and
      Choshen, Leshem  and
      Shmueli-Scheuer, Michal  and
      Katz, Yoav",
    editor = "Chang, Kai-Wei  and
      Lee, Annie  and
      Rajani, Nazneen",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-demo.21",
    pages = "207--215",
}

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unitxt-1.26.9.tar.gz (28.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

unitxt-1.26.9-py3-none-any.whl (32.9 MB view details)

Uploaded Python 3

File details

Details for the file unitxt-1.26.9.tar.gz.

File metadata

  • Download URL: unitxt-1.26.9.tar.gz
  • Upload date:
  • Size: 28.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for unitxt-1.26.9.tar.gz
Algorithm Hash digest
SHA256 7a18e9486da33646489e4b07917b37e05a405dbf731a881b816a7deff5424604
MD5 2b5c7babf0e92f2212bafab8ace10425
BLAKE2b-256 7a0c01cd186889d2b44f8ddc51c5992b5d9e37fcd2d93e92ba7c6739a9c3fa52

See more details on using hashes here.

Provenance

The following attestation bundles were made for unitxt-1.26.9.tar.gz:

Publisher: pipy.yml on IBM/unitxt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file unitxt-1.26.9-py3-none-any.whl.

File metadata

  • Download URL: unitxt-1.26.9-py3-none-any.whl
  • Upload date:
  • Size: 32.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for unitxt-1.26.9-py3-none-any.whl
Algorithm Hash digest
SHA256 0c95c8ad562243684f2ec1ac088d0df9f6b3fab8c93ca50ab847a7cef88be2b8
MD5 979e31c964108875606aa4c560f96aca
BLAKE2b-256 fe2d1c720fbb87b82ba7d13334c95e02ebaf8c476bde3753ada39d2acd65093e

See more details on using hashes here.

Provenance

The following attestation bundles were made for unitxt-1.26.9-py3-none-any.whl:

Publisher: pipy.yml on IBM/unitxt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page