Load any mixture of text to text data in one line of code
Project description
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking
Why Unitxt?
- 🌐 Comprehensive: Evaluate text, tables, vision, speech, and code in one unified framework
- 💼 Enterprise-Ready: Battle-tested components with extensive catalog of benchmarks
- 🧠 Model Agnostic: Works with HuggingFace, OpenAI, WatsonX, and custom models
- 🔒 Reproducible: Shareable, modular components ensure consistent results
Quick Links
Installation
pip install unitxt
Quick Start
Command Line Evaluation
# Simple evaluation
unitxt-evaluate \
--tasks "card=cards.mmlu_pro.engineering" \
--model cross_provider \
--model_args "model_name=llama-3-1-8b-instruct" \
--limit 10
# Multi-task evaluation
unitxt-evaluate \
--tasks "card=cards.text2sql.bird+card=cards.mmlu_pro.engineering" \
--model cross_provider \
--model_args "model_name=llama-3-1-8b-instruct,max_tokens=256" \
--split test \
--limit 10 \
--output_path ./results/evaluate_cli \
--log_samples \
--apply_chat_template
# Benchmark evaluation
unitxt-evaluate \
--tasks "benchmarks.tool_calling" \
--model cross_provider \
--model_args "model_name=llama-3-1-8b-instruct,max_tokens=256" \
--split test \
--limit 10 \
--output_path ./results/evaluate_cli \
--log_samples \
--apply_chat_template
Loading as Dataset
Load thousands of datasets in chat API format, ready for any model:
from unitxt import load_dataset
dataset = load_dataset(
card="cards.gpqa.diamond",
split="test",
format="formats.chat_api",
)
📊 Available on The Catalog
🚀 Interactive Dashboard
Launch the graphical user interface to explore datasets and benchmarks:
pip install unitxt[ui]
unitxt-explore
Complete Python Example
Evaluate your own data with any model:
# Import required components
from unitxt import evaluate, create_dataset
from unitxt.blocks import Task, InputOutputTemplate
from unitxt.inference import HFAutoModelInferenceEngine
# Question-answer dataset
data = [
{"question": "What is the capital of Texas?", "answer": "Austin"},
{"question": "What is the color of the sky?", "answer": "Blue"},
]
# Define the task and evaluation metric
task = Task(
input_fields={"question": str},
reference_fields={"answer": str},
prediction_type=str,
metrics=["metrics.accuracy"],
)
# Create a template to format inputs and outputs
template = InputOutputTemplate(
instruction="Answer the following question.",
input_format="{question}",
output_format="{answer}",
postprocessors=["processors.lower_case"],
)
# Prepare the dataset
dataset = create_dataset(
task=task,
template=template,
format="formats.chat_api",
test_set=data,
split="test",
)
# Set up the model (supports Hugging Face, WatsonX, OpenAI, etc.)
model = HFAutoModelInferenceEngine(
model_name="Qwen/Qwen1.5-0.5B-Chat", max_new_tokens=32
)
# Generate predictions and evaluate
predictions = model(dataset)
results = evaluate(predictions=predictions, data=dataset)
# Print results
print("Global Results:\n", results.global_scores.summary)
print("Instance Results:\n", results.instance_scores.summary)
Contributing
Read the contributing guide for details on how to contribute to Unitxt.
Citation
If you use Unitxt in your research, please cite our paper:
@inproceedings{bandel-etal-2024-unitxt,
title = "Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative {AI}",
author = "Bandel, Elron and
Perlitz, Yotam and
Venezian, Elad and
Friedman, Roni and
Arviv, Ofir and
Orbach, Matan and
Don-Yehiya, Shachar and
Sheinwald, Dafna and
Gera, Ariel and
Choshen, Leshem and
Shmueli-Scheuer, Michal and
Katz, Yoav",
editor = "Chang, Kai-Wei and
Lee, Annie and
Rajani, Nazneen",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-demo.21",
pages = "207--215",
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file unitxt-1.26.9.tar.gz.
File metadata
- Download URL: unitxt-1.26.9.tar.gz
- Upload date:
- Size: 28.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7a18e9486da33646489e4b07917b37e05a405dbf731a881b816a7deff5424604
|
|
| MD5 |
2b5c7babf0e92f2212bafab8ace10425
|
|
| BLAKE2b-256 |
7a0c01cd186889d2b44f8ddc51c5992b5d9e37fcd2d93e92ba7c6739a9c3fa52
|
Provenance
The following attestation bundles were made for unitxt-1.26.9.tar.gz:
Publisher:
pipy.yml on IBM/unitxt
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
unitxt-1.26.9.tar.gz -
Subject digest:
7a18e9486da33646489e4b07917b37e05a405dbf731a881b816a7deff5424604 - Sigstore transparency entry: 815804078
- Sigstore integration time:
-
Permalink:
IBM/unitxt@20b951e5bd109cd43018f33018dfc7b3dbff3c9f -
Branch / Tag:
refs/tags/1.26.9 - Owner: https://github.com/IBM
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pipy.yml@20b951e5bd109cd43018f33018dfc7b3dbff3c9f -
Trigger Event:
release
-
Statement type:
File details
Details for the file unitxt-1.26.9-py3-none-any.whl.
File metadata
- Download URL: unitxt-1.26.9-py3-none-any.whl
- Upload date:
- Size: 32.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0c95c8ad562243684f2ec1ac088d0df9f6b3fab8c93ca50ab847a7cef88be2b8
|
|
| MD5 |
979e31c964108875606aa4c560f96aca
|
|
| BLAKE2b-256 |
fe2d1c720fbb87b82ba7d13334c95e02ebaf8c476bde3753ada39d2acd65093e
|
Provenance
The following attestation bundles were made for unitxt-1.26.9-py3-none-any.whl:
Publisher:
pipy.yml on IBM/unitxt
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
unitxt-1.26.9-py3-none-any.whl -
Subject digest:
0c95c8ad562243684f2ec1ac088d0df9f6b3fab8c93ca50ab847a7cef88be2b8 - Sigstore transparency entry: 815804080
- Sigstore integration time:
-
Permalink:
IBM/unitxt@20b951e5bd109cd43018f33018dfc7b3dbff3c9f -
Branch / Tag:
refs/tags/1.26.9 - Owner: https://github.com/IBM
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pipy.yml@20b951e5bd109cd43018f33018dfc7b3dbff3c9f -
Trigger Event:
release
-
Statement type: