Skip to main content

Load any mixture of text to text data in one line of code

Project description

Image Description

version license python tests Coverage Status Read the Docs downloads

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking

Why Unitxt?

  • 🌐 Comprehensive: Evaluate text, tables, vision, speech, and code in one unified framework
  • 💼 Enterprise-Ready: Battle-tested components with extensive catalog of benchmarks
  • 🧠 Model Agnostic: Works with HuggingFace, OpenAI, WatsonX, and custom models
  • 🔒 Reproducible: Shareable, modular components ensure consistent results

Quick Links

Installation

pip install unitxt

Quick Start

Command Line Evaluation

# Simple evaluation
unitxt-evaluate \
    --tasks "card=cards.mmlu_pro.engineering" \
    --model cross_provider \
    --model_args "model_name=llama-3-1-8b-instruct" \
    --limit 10

# Multi-task evaluation
unitxt-evaluate \
    --tasks "card=cards.text2sql.bird+card=cards.mmlu_pro.engineering" \
    --model cross_provider \
    --model_args "model_name=llama-3-1-8b-instruct,max_tokens=256" \
    --split test \
    --limit 10 \
    --output_path ./results/evaluate_cli \
    --log_samples \
    --apply_chat_template

# Benchmark evaluation
unitxt-evaluate \
    --tasks "benchmarks.tool_calling" \
    --model cross_provider \
    --model_args "model_name=llama-3-1-8b-instruct,max_tokens=256" \
    --split test \
    --limit 10 \
    --output_path ./results/evaluate_cli \
    --log_samples \
    --apply_chat_template

Loading as Dataset

Load thousands of datasets in chat API format, ready for any model:

from unitxt import load_dataset

dataset = load_dataset(
    card="cards.gpqa.diamond",
    split="test",
    format="formats.chat_api",
)

📊 Available on The Catalog

Tasks Datasets Prompts Benchmarks Metrics

🚀 Interactive Dashboard

Launch the graphical user interface to explore datasets and benchmarks:

pip install unitxt[ui]
unitxt-explore

Complete Python Example

Evaluate your own data with any model:

# Import required components
from unitxt import evaluate, create_dataset
from unitxt.blocks import Task, InputOutputTemplate
from unitxt.inference import HFAutoModelInferenceEngine

# Question-answer dataset
data = [
    {"question": "What is the capital of Texas?", "answer": "Austin"},
    {"question": "What is the color of the sky?", "answer": "Blue"},
]

# Define the task and evaluation metric
task = Task(
    input_fields={"question": str},
    reference_fields={"answer": str},
    prediction_type=str,
    metrics=["metrics.accuracy"],
)

# Create a template to format inputs and outputs
template = InputOutputTemplate(
    instruction="Answer the following question.",
    input_format="{question}",
    output_format="{answer}",
    postprocessors=["processors.lower_case"],
)

# Prepare the dataset
dataset = create_dataset(
    task=task,
    template=template,
    format="formats.chat_api",
    test_set=data,
    split="test",
)

# Set up the model (supports Hugging Face, WatsonX, OpenAI, etc.)
model = HFAutoModelInferenceEngine(
    model_name="Qwen/Qwen1.5-0.5B-Chat", max_new_tokens=32
)

# Generate predictions and evaluate
predictions = model(dataset)
results = evaluate(predictions=predictions, data=dataset)

# Print results
print("Global Results:\n", results.global_scores.summary)
print("Instance Results:\n", results.instance_scores.summary)

Contributing

Read the contributing guide for details on how to contribute to Unitxt.

Citation

If you use Unitxt in your research, please cite our paper:

@inproceedings{bandel-etal-2024-unitxt,
    title = "Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative {AI}",
    author = "Bandel, Elron  and
      Perlitz, Yotam  and
      Venezian, Elad  and
      Friedman, Roni  and
      Arviv, Ofir  and
      Orbach, Matan  and
      Don-Yehiya, Shachar  and
      Sheinwald, Dafna  and
      Gera, Ariel  and
      Choshen, Leshem  and
      Shmueli-Scheuer, Michal  and
      Katz, Yoav",
    editor = "Chang, Kai-Wei  and
      Lee, Annie  and
      Rajani, Nazneen",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-demo.21",
    pages = "207--215",
}

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unitxt-1.26.10.tar.gz (28.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

unitxt-1.26.10-py3-none-any.whl (32.9 MB view details)

Uploaded Python 3

File details

Details for the file unitxt-1.26.10.tar.gz.

File metadata

  • Download URL: unitxt-1.26.10.tar.gz
  • Upload date:
  • Size: 28.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for unitxt-1.26.10.tar.gz
Algorithm Hash digest
SHA256 f588045669c222a02ed703386aad8b02d0d41aa87a0122a06182e7623fad4ec2
MD5 005d96df9045930c950ffb1f946d4ba4
BLAKE2b-256 8d68cda80f4fdc32aa5f8753d5ffe70d5110b9dc25956b15a8b579c72e033c95

See more details on using hashes here.

Provenance

The following attestation bundles were made for unitxt-1.26.10.tar.gz:

Publisher: pipy.yml on IBM/unitxt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file unitxt-1.26.10-py3-none-any.whl.

File metadata

  • Download URL: unitxt-1.26.10-py3-none-any.whl
  • Upload date:
  • Size: 32.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for unitxt-1.26.10-py3-none-any.whl
Algorithm Hash digest
SHA256 2dd0937a54c4f3b26773d9312bf5bcdcbff78971030af84584703be5497b1dff
MD5 adb5b5569b7d2473dbc2bbb6ed917f5f
BLAKE2b-256 709fa22c18b83d345b1651eb245ddc03c77e94f6d2294d7cd041c6dc98978af8

See more details on using hashes here.

Provenance

The following attestation bundles were made for unitxt-1.26.10-py3-none-any.whl:

Publisher: pipy.yml on IBM/unitxt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page