
The Tau-Trait package


Tau-Trait

Collinear AI

License: MIT

Tau-Trait is a benchmark for evaluating large language models (LLMs) with realistic, persona-aware simulations. It builds on Tau-Bench but introduces two key modifications:

  1. TraitBasis-generated personas – more accurate and interpretable user simulations.
  2. Domain-specific evaluation – tasks drawn from retail, airline, telecom, and telehealth settings.

Tau-Trait is designed to test model robustness, personalization, and fairness in high-impact, customer-facing domains where user traits strongly influence interaction quality.


✨ Features

  • Persona Simulation with TraitBasis: Generate diverse, coherent user personas with different traits.

  • Domain Coverage: Tau-Trait includes evaluation tasks in four industries:

    • 🛒 Retail
    • ✈️ Airline
    • 📱 Telecom
    • 🩺 Telehealth

🚀 Getting Started

Installation

pip install tau-trait

Usage

from litellm import provider_list          # valid values for model_provider
from tau_trait.envs.user import UserStrategy  # valid values for user_strategy
from tau_trait.run import run
from tau_trait.types import RunConfig

# Placeholder: replace with the name of the agent model you want to evaluate.
CLIENT_ASSISTANT_MODEL_NAME = "<your-agent-model>"

config = RunConfig(
    model_provider="openai",
    user_model_provider="steer",
    model=CLIENT_ASSISTANT_MODEL_NAME,
    user_model="",  # the steer API abstracts the user model
    num_trials=1,
    env="retail",
    agent_strategy="tool-calling",
    temperature=0.7,
    task_split="test",
    start_index=0,
    end_index=-1,
    task_ids=[4],  # run only task 4; overrides start_index/end_index
    log_dir="results",
    max_concurrency=1,
    seed=10,
    shuffle=0,
    user_strategy="llm",
    few_shot_displays_path=None,
    # One entry per persona trait; here only impatience is switched on.
    trait_dict={"impatience": 1, "confusion": 0, "skeptical": 0, "incoherence": 0},
)

# Start the benchmark (assumes run() accepts a RunConfig, as the imports suggest).
run(config)
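
The trait_dict in the example above switches on a single trait (impatience) and leaves the others at 0. To compare traits one at a time, you could generate one such dictionary per trait. The following is only a sketch: single_trait_dicts is a hypothetical helper that is not part of tau_trait, and it assumes the four trait names and the 0/1 values shown in the example.

```python
# Hypothetical helper, not part of tau_trait.
# Trait names are copied from the usage example; other names may exist.
TRAITS = ["impatience", "confusion", "skeptical", "incoherence"]


def single_trait_dicts(traits=TRAITS, level=1):
    """Return one trait_dict per trait, with only that trait set to `level`."""
    return [
        {name: (level if name == active else 0) for name in traits}
        for active in traits
    ]


trait_dicts = single_trait_dicts()
for trait_dict in trait_dicts:
    print(trait_dict)
```

Each resulting dictionary could then be passed as trait_dict to its own RunConfig, so that every run isolates the effect of one trait.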

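The settings are defined below. They are listed as command-line flags; each flag corresponds to the RunConfig field of the same name with hyphens replaced by underscores (for example, --num-trials and num_trials).

Tau-Trait Config Settings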

General

  • --num-trials (int, default: 1)
    Number of independent trials to run.

  • --seed (int, default: 10)
    Random seed for reproducibility.

  • --shuffle (int, default: 0)
    Whether to shuffle task order (0 = no, 1 = yes).

  • --log-dir (str, default: results)
    Directory where logs and results are stored.
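
To illustrate how --seed and --shuffle can interact, here is a minimal sketch of a seeded shuffle. order_tasks is a hypothetical function, and whether tau_trait applies the seed in exactly this way is an assumption; the sketch only shows why a fixed seed makes a shuffled task order reproducible.

```python
import random


def order_tasks(task_indices, seed=10, shuffle=0):
    """Return task indices in run order (illustrative only, not tau_trait code)."""
    ordered = list(task_indices)
    if shuffle:
        # A private RNG keeps the result independent of global random state:
        # the same seed always produces the same permutation.
        random.Random(seed).shuffle(ordered)
    return ordered


print(order_tasks(range(5)))             # shuffle=0 keeps the original order
print(order_tasks(range(5), shuffle=1))  # a permutation fixed by the seed
```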

Environment & Tasks

  • --env (str, choices: retail, airline, default: retail)
    Domain environment in which to run simulations.

  • --task-split (str, choices: train, test, dev, default: test)
    Dataset split of tasks to run (currently applies only to the retail domain).

  • --start-index (int, default: 0)
    Index of the first task to run.

  • --end-index (int, default: -1)
    Index of the last task to run. Use -1 to run all remaining tasks.

  • --task-ids (list of int, optional)
    Explicit list of task IDs to run (overrides index ranges).
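
The interplay of the three task-selection settings can be sketched as follows. select_task_ids is a hypothetical function, not the package's implementation, and it makes one loud assumption: end_index is treated as an exclusive upper bound, as in a Python range. The description above does not say whether the bound is inclusive, so check the package source before relying on this.

```python
def select_task_ids(num_tasks, start_index=0, end_index=-1, task_ids=None):
    """Pick which tasks to run (illustrative sketch, not tau_trait code)."""
    if task_ids:
        # An explicit list overrides the index range entirely.
        return list(task_ids)
    # -1 means "through the last available task".
    # ASSUMPTION: end_index is exclusive, like the stop value of range().
    stop = num_tasks if end_index == -1 else min(end_index, num_tasks)
    return list(range(start_index, stop))


print(select_task_ids(10))                 # all ten tasks
print(select_task_ids(10, 2, 5))           # tasks 2, 3, 4 under the exclusive assumption
print(select_task_ids(10, task_ids=[4]))   # only task 4
```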

Agent Configuration

  • --model (str, required)
    The model to use for the agent.

  • --model-provider (str, choices from provider_list)
    Provider for the agent’s model.

  • --agent-strategy (str, choices: tool-calling, act, react, few-shot, default: tool-calling)
    Strategy used by the agent to interact with the environment.

    • tool-calling: Invoke external tools.
    • act: Pure action selection.
    • react: Reason + act alternation.
    • few-shot: Use few-shot exemplars.

  • --temperature (float, default: 0.0)
    Sampling temperature for the action model (higher = more randomness).

  • --few-shot-displays-path (str, optional)
    Path to a JSONL file containing few-shot demonstration examples.
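
A JSONL file holds one JSON value per line. The sketch below shows a generic way to read such a file. load_jsonl is a hypothetical helper, and the "example" key in the demo records is a made-up placeholder: the actual record schema that tau_trait expects for few-shot displays is not documented here.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Read a JSONL file into a list, one parsed JSON value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo with a throwaway file; the "example" key is a placeholder, not the real schema.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"example": "first demonstration"}\n')
    tmp.write('{"example": "second demonstration"}\n')
    demo_path = tmp.name

displays = load_jsonl(demo_path)
os.remove(demo_path)
print(len(displays))
```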

User Simulator Configuration

  • --user-model (str, default: gpt-4o)
    Model to use for the user simulator.

  • --user-model-provider (str, optional)
    Provider for the user simulator’s model.

  • --user-strategy (str, choices from UserStrategy, default: llm)
    Strategy for the simulated user (e.g., LLM-based).

Execution Controls

  • --max-concurrency (int, default: 1)
    Number of tasks to run in parallel.
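
One common way to cap the number of tasks running at once is a thread pool whose size equals the concurrency limit. The sketch below illustrates that idea only: run_task and run_all are hypothetical stand-ins, the returned fields are invented for the demo, and the package may schedule its work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id):
    """Stand-in for one simulated conversation (hypothetical, returns dummy data)."""
    return {"task_id": task_id, "done": True}


def run_all(task_ids, max_concurrency=1):
    """Run tasks with at most `max_concurrency` of them in flight at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order, regardless of finish order.
        return list(pool.map(run_task, task_ids))


results = run_all([0, 1, 2], max_concurrency=2)
print([r["task_id"] for r in results])
```

With max_concurrency=1 the pool degenerates to sequential execution, which matches the default listed above.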
@misc{curator-evals,
  author       = {Mackey, Tsach and Shafique, Muhammad Ali and Kumar, Anand},
  title        = {Curator Evals: A Benchmark for High-quality Post-training Data Curation},
  year         = {2025},
  month        = {Sep},
  howpublished = {\url{https://github.com/collinear-ai/curator-evals}}
}
