CAT Cafe SDK

Python SDK for CAT Cafe, a Continuous Alignment Testing platform for LLM observability.

Installation

pip install cat-cafe-sdk

Quick Start

from cat_cafe_sdk import CATTestRunClient, DatasetImport, DatasetExample

# Initialize the client
client = CATTestRunClient(base_url="http://localhost:8000")

# Create a dataset
dataset = DatasetImport(
    name="My Test Dataset",
    description="Sample dataset for testing",
    examples=[
        DatasetExample(
            input=[{"role": "user", "content": "What's the weather?"}],
            expected_output=[{"role": "assistant", "content": "Weather info"}],
            tags=["weather"]
        )
    ]
)

# Import the dataset
result = client.import_dataset(dataset)
dataset_id = result["dataset"]["id"]

# Define a test function
def my_test_function(example):
    # Your AI system logic here
    user_question = example.input[-1]["content"]
    return f"Response to: {user_question}"

# Define evaluators
def accuracy_evaluator(actual_output, expected_output):
    # Your evaluation logic here
    score = 0.8  # Example score
    reason = "Good response"
    return score, reason

# Run tests on the dataset
experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=my_test_function,
    evaluators=[accuracy_evaluator],
    name="My Experiment",
    description="Testing my AI system"
)

print(f"Experiment completed: {experiment_id}")

Core Classes

CATTestRunClient / CATExperimentClient

The main client for interacting with CAT Cafe:

client = CATTestRunClient(base_url="http://localhost:8000")

Dataset Models

DatasetImport: For creating new datasets with examples
DatasetExample: Individual examples in a dataset
Dataset: Structured dataset object returned from API
Example: Individual example object returned from API
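
When a dataset starts out as plain question/answer pairs, a small helper can produce the message-list fields used in the Quick Start. The sketch below is only an illustration: `qa_to_example_fields` is a hypothetical name, not part of the SDK, and it returns plain dictionaries so it can run without a server.

```python
def qa_to_example_fields(question: str, answer: str, tags: tuple[str, ...] = ()) -> dict:
    """Build the input / expected_output / tags fields for one example."""
    return {
        "input": [{"role": "user", "content": question}],
        "expected_output": [{"role": "assistant", "content": answer}],
        "tags": list(tags),
    }


# Hypothetical source data: (question, expected answer) pairs.
pairs = [
    ("What's the weather?", "Weather info"),
    ("What time is it?", "Time info"),
]
example_fields = [qa_to_example_fields(q, a, tags=("smoke",)) for q, a in pairs]
print(example_fields[0]["input"][-1]["content"])
```

Each dictionary could then be unpacked as `DatasetExample(**fields)`, assuming the constructor accepts the same keyword arguments shown in the Quick Start.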

Experiment Models

Experiment: Experiment configuration
ExperimentResult: Results from running experiments

Key Methods

Dataset Operations

# Import a complete dataset
result = client.import_dataset(dataset_import)

# Fetch dataset as structured object
dataset = client.fetch_dataset(dataset_id)

# Find dataset by name
dataset = client.fetch_dataset_by_name("My Dataset")

# List all datasets
datasets = client.list_datasets()

Experiment Operations

# Run tests on a dataset (all-in-one)
experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=my_test_func,
    evaluators=[evaluator1, evaluator2],
    name="My Test Run"
)

# Manual experiment workflow
experiment_id = client.start_experiment(experiment_config)
client.submit_results(experiment_id, results)
client.complete_experiment(experiment_id)

Test Functions

Test functions receive an Example object and should return a string output:

from cat_cafe_sdk import Example  # assumed to be a top-level export, like the Quick Start imports

def my_test_function(example: Example) -> str:
    # Access the input messages
    user_message = example.input[-1]["content"]
    
    # Your AI system logic here
    response = call_my_ai_system(user_message)
    
    return response
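
Indexing `example.input[-1]` assumes the last message is the user's. For conversations that may end with a different role, a more defensive lookup could look like the sketch below. `last_user_message` and `echo_test_function` are hypothetical helpers, and a `SimpleNamespace` stands in for an Example object so the snippet runs without the SDK or a server.

```python
from types import SimpleNamespace


def last_user_message(example) -> str:
    """Return the content of the most recent user message, or "" if none."""
    for message in reversed(example.input):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def echo_test_function(example) -> str:
    # Placeholder for a real model call; it only echoes the question back.
    return f"Response to: {last_user_message(example)}"


# Stand-in for an Example object; only the `input` attribute is used here.
fake_example = SimpleNamespace(
    input=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "What's the weather?"},
    ]
)
print(echo_test_function(fake_example))
```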

Evaluators

Evaluators receive the actual output (a string) and the expected output (a list of messages, as in the dataset examples) and return a score and a reason:

def my_evaluator(actual_output: str, expected_output: list) -> tuple[float, str]:
    # Your evaluation logic
    if "correct_info" in actual_output:
        return 1.0, "Contains correct information"
    else:
        return 0.0, "Missing correct information"
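
As a slightly more realistic illustration of the same signature, the sketch below scores an output by how many of the expected tokens it contains. The metric and the names `expected_text` and `token_overlap_evaluator` are illustrative choices, not something the SDK provides.

```python
def expected_text(expected_output: list) -> str:
    """Join the content of all expected messages into one string."""
    return " ".join(m.get("content", "") for m in expected_output)


def token_overlap_evaluator(actual_output: str, expected_output: list) -> tuple[float, str]:
    """Score = fraction of expected tokens that appear in the actual output."""
    expected_tokens = set(expected_text(expected_output).lower().split())
    if not expected_tokens:
        return 0.0, "No expected tokens to compare against"
    actual_tokens = set(actual_output.lower().split())
    matched = expected_tokens & actual_tokens
    score = len(matched) / len(expected_tokens)
    return score, f"Matched {len(matched)} of {len(expected_tokens)} expected tokens"


score, reason = token_overlap_evaluator(
    "Here is the weather info",
    [{"role": "assistant", "content": "Weather info"}],
)
print(score, reason)
```

Whitespace tokenization is crude (punctuation stays attached to words), so this is best treated as a starting point rather than a meaningful quality metric.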

Advanced Usage

Async Test Functions

async def async_test_function(example: Example) -> str:
    # Async AI system call
    response = await my_async_ai_system(example.input)
    return response

# Note: Async functions work but have limitations in certain contexts
experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=async_test_function,
    name="Async Test"
)
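
The note above does not say which contexts are limited. One general asyncio constraint worth knowing about is that `asyncio.run` cannot be called from a thread whose event loop is already running. The sketch below shows how a caller might accept both sync and async test functions; it is an illustration only, not a description of how the SDK handles them, and `call_test_function` is a hypothetical helper.

```python
import asyncio
import inspect


def call_test_function(test_function, example):
    """Call a sync or async test function and return its result."""
    if inspect.iscoroutinefunction(test_function):
        # asyncio.run starts a fresh event loop; it raises RuntimeError
        # if this thread already has a running loop.
        return asyncio.run(test_function(example))
    return test_function(example)


async def async_echo(example):
    await asyncio.sleep(0)  # stands in for an awaitable model call
    return f"async: {example}"


def sync_echo(example):
    return f"sync: {example}"


print(call_test_function(async_echo, "hi"))
print(call_test_function(sync_echo, "hi"))
```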

Custom Metadata

def metadata_function(example: Example, output: str) -> dict:
    return {
        "response_length": len(output),
        "example_tags": example.tags
    }

experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=my_test_function,
    metadata_function=metadata_function,
    name="Test with Metadata"
)
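
A metadata function with the same `(example, output)` signature can be exercised on its own before it is wired into a run. The sketch below adds a couple of extra fields and tolerates examples without tags; the field names are illustrative, and a `SimpleNamespace` stands in for an Example object.

```python
from types import SimpleNamespace


def metadata_function(example, output: str) -> dict:
    """Describe one output with a few JSON-friendly fields."""
    tags = list(getattr(example, "tags", None) or [])
    return {
        "response_length": len(output),
        "response_words": len(output.split()),
        "is_empty": not output.strip(),
        "example_tags": tags,
    }


# Stand-in for an Example object; only the `tags` attribute is used here.
fake_example = SimpleNamespace(tags=["weather"])
print(metadata_function(fake_example, "Sunny and warm"))
```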

Manual Experiment Control

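from cat_cafe_sdk import Experiment, ExperimentResult  # assumed top-level exports, like the Quick Start imports

# Create experiment configuration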
experiment_config = Experiment(
    name="Manual Experiment",
    description="Step-by-step experiment",
    dataset_id=dataset_id,
    tags=["manual", "testing"]
)

# Start experiment
experiment_id = client.start_experiment(experiment_config)

# Run your tests and collect results
results = []
dataset = client.fetch_dataset(dataset_id)

for example in dataset.examples:
    output = my_test_function(example)
    
    result = ExperimentResult(
        example_id=example.id,
        input_data={"input": example.input},
        expected_output=str(example.expected_output),
        actual_output=output,
        evaluation_scores={"manual_score": 0.8}
    )
    results.append(result)

# Submit results
client.submit_results(experiment_id, results)

# Complete experiment
client.complete_experiment(experiment_id, {"total_examples": len(results)})
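
The summary dictionary passed when completing an experiment can carry more than a count. As a sketch, the hypothetical helper below averages each named score across a list of score dictionaries shaped like the `evaluation_scores` value used above.

```python
from statistics import mean


def summarize_scores(score_dicts: list) -> dict:
    """Average each named score across results and count the examples."""
    by_name = {}
    for scores in score_dicts:
        for name, value in scores.items():
            by_name.setdefault(name, []).append(value)
    summary = {"total_examples": len(score_dicts)}
    for name, values in by_name.items():
        summary[f"mean_{name}"] = mean(values)
    return summary


# Illustrative scores; in practice these would come from the collected results.
collected = [{"manual_score": 0.5}, {"manual_score": 1.0}]
print(summarize_scores(collected))
```

With the loop above, the input could be built as `[r.evaluation_scores for r in results]`, assuming ExperimentResult exposes that field as an attribute. Whether the server accepts summary keys other than `total_examples` is also an assumption to verify.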

Requirements

Python 3.12+
httpx
CAT Cafe server running (default: http://localhost:8000)

License

MIT License
