The Tau-Trait package
Tau-Trait
Collinear AI
Tau-Trait is a benchmark for evaluating large language models (LLMs) with realistic, persona-aware simulations. It builds on Tau-Bench but introduces two key modifications:
- TraitBasis-generated personas – more accurate and interpretable user simulations.
- Domain-specific evaluation – tasks drawn from retail, airline, telecom, and telehealth settings.
Tau-Trait is designed to test model robustness, personalization, and fairness in high-impact, customer-facing domains where user traits strongly influence interaction quality.
✨ Features
- Persona Simulation with TraitBasis: Generate diverse, coherent user personas with different traits.
- Domain Coverage: Tau-Trait includes evaluation tasks in four industries:
  - 🛒 Retail
  - ✈️ Airline
  - 📱 Telecom
  - 🩺 Telehealth
🚀 Getting Started
Installation
pip install tau-trait
Usage
from tau_trait.types import RunConfig
from tau_trait.run import run

# Placeholder value: replace with the name of the agent model you want to evaluate.
CLIENT_ASSISTANT_MODEL_NAME = "gpt-4o"

config = RunConfig(
    model_provider="openai",
    user_model_provider="steer",
    model=CLIENT_ASSISTANT_MODEL_NAME,
    user_model="",  # the steer API abstracts the user model
    num_trials=1,
    env="retail",
    agent_strategy="tool-calling",
    temperature=0.7,
    task_split="test",
    start_index=0,
    end_index=-1,
    task_ids=[4],  # run only task 4; overrides start_index/end_index
    log_dir="results",
    max_concurrency=1,
    seed=10,
    shuffle=0,
    user_strategy="llm",
    few_shot_displays_path=None,
    # Trait intensities for the simulated user persona.
    trait_dict={"impatience": 1, "confusion": 0, "skeptical": 0, "incoherence": 0},
)

# Launch the evaluation (assumes run accepts a RunConfig, as the imports suggest).
run(config)
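To compare agent behavior across personas, one option is to build one trait_dict per trait and run a separate configuration for each. The sketch below only constructs the dictionaries and does not call tau_trait. The helper name one_hot_trait_dicts is hypothetical, and reading 1 as "trait on" and 0 as "trait off" is inferred from the example above rather than from documented behavior.

```python
# Trait names taken from the example configuration above.
TRAITS = ["impatience", "confusion", "skeptical", "incoherence"]


def one_hot_trait_dicts(traits=TRAITS, intensity=1):
    """Return one trait_dict per trait, with only that trait set to `intensity`.

    Hypothetical helper for illustration; not part of tau_trait.
    """
    return [
        {name: (intensity if name == active else 0) for name in traits}
        for active in traits
    ]


trait_dicts = one_hot_trait_dicts()
for trait_dict in trait_dicts:
    print(trait_dict)
```

Each dictionary could then be passed as trait_dict to its own RunConfig.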
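The settings are defined below. Each flag corresponds to the RunConfig field of the same name in the example above, with hyphens replaced by underscores (for example, --num-trials and num_trials).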
Tau-Trait Config Settings
General
- --num-trials (int, default: 1)
  Number of independent trials to run.
- --seed (int, default: 10)
  Random seed for reproducibility.
- --shuffle (int, default: 0)
  Whether to shuffle task order (0 = no, 1 = yes).
- --log-dir (str, default: results)
  Directory where logs and results are stored.
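As a rough illustration of how the General flags could be declared, the sketch below builds a standard-library argparse parser using the names, types, and defaults listed above. The function name build_general_parser is hypothetical, and the package's actual parser may be organized differently.

```python
import argparse


def build_general_parser():
    """Parser for the General flags, using the defaults listed above.

    Illustrative only; the package's real parser may differ.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-trials", type=int, default=1)
    parser.add_argument("--seed", type=int, default=10)
    parser.add_argument("--shuffle", type=int, default=0)
    parser.add_argument("--log-dir", type=str, default="results")
    return parser


# argparse stores "--num-trials" under the attribute name "num_trials".
args = build_general_parser().parse_args(["--num-trials", "3", "--shuffle", "1"])
general_settings = vars(args)
print(general_settings)
```

Because argparse converts hyphens to underscores, the parsed names line up with the keyword arguments used in the configuration example.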
Environment & Tasks
- --env (str, choices: retail, airline, default: retail)
  Domain environment in which to run simulations.
- --task-split (str, choices: train, test, dev, default: test)
  Dataset split of tasks to run (currently applies only to the retail domain).
- --start-index (int, default: 0)
  Index of the first task to run.
- --end-index (int, default: -1)
  Index of the last task to run. Use -1 to run all remaining tasks.
- --task-ids (list of int, optional)
  Explicit list of task IDs to run (overrides index ranges).
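The interaction between the index range and the explicit task list can be sketched as a small function. This is an illustration of the descriptions above, not the package's code: the name select_task_indices is hypothetical, and it assumes --end-index is an exclusive bound (as in Python's range), which the text does not confirm.

```python
def select_task_indices(num_tasks, start_index=0, end_index=-1, task_ids=None):
    """Resolve which task indices to run, following the flag descriptions.

    Assumption: end_index is an exclusive bound, as in Python's range().
    """
    if task_ids:
        # An explicit list of task IDs overrides the index range.
        return list(task_ids)
    # -1 means "run through the end of the task list".
    stop = num_tasks if end_index == -1 else min(end_index, num_tasks)
    return list(range(start_index, stop))


print(select_task_indices(10))
print(select_task_indices(10, start_index=2, end_index=5))
print(select_task_indices(10, task_ids=[4]))
```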
Agent Configuration
- --model (str, required)
  The model to use for the agent.
- --model-provider (str, choices from litellm's provider_list)
  Provider for the agent's model.
- --agent-strategy (str, choices: tool-calling, act, react, few-shot, default: tool-calling)
  Strategy used by the agent to interact with the environment.
  - tool-calling: Invoke external tools.
  - act: Pure action selection.
  - react: Alternate between reasoning and acting.
  - few-shot: Use few-shot exemplars.
- --temperature (float, default: 0.0)
  Sampling temperature for the agent's model (higher = more randomness).
- --few-shot-displays-path (str, optional)
  Path to a JSONL file containing few-shot demonstration examples.
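A JSONL file holds one JSON value per line. The sketch below shows one generic way to read such a file with the standard library. The helper name load_jsonl and the "example" key are made up for illustration; the schema that Tau-Trait expects for few-shot demonstrations is not described here.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Read a JSONL file: one JSON value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Write a tiny placeholder file so the sketch runs on its own.
# The "example" key is invented; the real schema is not documented here.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "few_shot.jsonl")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps({"example": "first demonstration"}) + "\n")
        handle.write(json.dumps({"example": "second demonstration"}) + "\n")
    displays = load_jsonl(demo_path)

print(len(displays))
```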
User Simulator Configuration
- --user-model (str, default: gpt-4o)
  Model to use for the user simulator.
- --user-model-provider (str, optional)
  Provider for the user simulator's model.
- --user-strategy (str, choices from UserStrategy in tau_trait.envs.user, default: llm)
  Strategy for the simulated user (e.g., LLM-based).
Execution Controls
- --max-concurrency (int, default: 1)
  Number of tasks to run in parallel.
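One common way to cap the number of tasks in flight is a thread pool whose size equals the concurrency limit. The sketch below illustrates that idea with the standard library. Whether Tau-Trait uses threads internally is an assumption, and run_task and run_all are hypothetical stand-ins that return fake results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id):
    """Stand-in for running one task; returns a fake result record."""
    return {"task_id": task_id, "reward": 0.0}


def run_all(task_ids, max_concurrency=1):
    """Run tasks with at most `max_concurrency` in flight at once."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        # executor.map yields results in input order, regardless of finish order.
        return list(executor.map(run_task, task_ids))


results = run_all([0, 1, 2, 3], max_concurrency=2)
print([r["task_id"] for r in results])
```

With max_concurrency=1 the tasks run one at a time, which matches the default listed above.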
@misc{curator-evals,
author = {Mackey, Tsach and Shafique, Muhammad Ali and Kumar, Anand},
title = {Curator Evals: A Benchmark for High-quality Post-training Data Curation},
year = {2025},
month = {Sep},
howpublished = {\url{https://github.com/collinear-ai/curator-evals}}
}