Python client for CAT Cafe - Continuous Alignment Testing platform for LLM observability

CAT Cafe SDK

Python SDK for CAT Cafe - Continuous Alignment Testing platform for LLM observability.

Installation

pip install cat-cafe-client

Quick Start

from cat.cafe.client import CATCafeClient, DatasetImport, DatasetExample

# Initialize the client
client = CATCafeClient(base_url="http://localhost:8000")

# Create a dataset
dataset = DatasetImport(
    name="My Test Dataset",
    description="Sample dataset for testing",
    examples=[
        DatasetExample(
            input={"messages": [{"role": "user", "content": "What's the weather?"}]},
            output={"messages": [{"role": "assistant", "content": "Weather info"}]},
            metadata={"tags": ["weather"]},
        )
    ]
)

# Import the dataset
result = client.import_dataset(dataset)
dataset_id = result["dataset"]["id"]

# Define a test function
def my_test_function(example):
    # Your AI system logic here
    messages = example.input.get("messages", []) if isinstance(example.input, dict) else []
    user_question = messages[-1]["content"] if messages else ""
    return f"Response to: {user_question}"

# Define evaluators
def accuracy_evaluator(actual_output, reference_output):
    # Your evaluation logic here, e.g. compare actual_output to expected_messages
    expected_messages = reference_output.get("messages", []) if isinstance(reference_output, dict) else []
    score = 0.8  # Placeholder score
    reason = "Good response"
    return score, reason

# Run tests on the dataset
experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=my_test_function,
    evaluators=[accuracy_evaluator],
    name="My Experiment",
    description="Testing my AI system"
)

print(f"Experiment completed: {experiment_id}")

Core Classes

CATCafeClient

The main client for interacting with CAT Cafe:

client = CATCafeClient(base_url="http://localhost:8000")

Dataset Models

  • DatasetImport: For creating new datasets with examples
  • DatasetExample: Individual examples in a dataset
  • Dataset: Structured dataset object returned from API
  • Example: Individual example object returned from API

Experiment Models

  • Experiment: Experiment configuration
  • ExperimentResult: Results from running experiments

Key Methods

Dataset Operations

# Import a complete dataset
result = client.import_dataset(dataset_import)

# Fetch dataset as structured object
dataset = client.fetch_dataset(dataset_id)

# Find dataset by name
dataset = client.fetch_dataset_by_name("My Dataset")

# List all datasets
datasets = client.list_datasets()

Experiment Operations

# Run tests on a dataset (all-in-one)
experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=my_test_func,
    evaluators=[evaluator1, evaluator2],
    name="My Test Run"
)

# Manual experiment workflow (run stream)
# experiment_config is an Experiment instance (see Manual Experiment Control)
experiment_id = client.start_experiment(experiment_config)
# Send runs as you produce them
client.create_run(
    experiment_id,
    {
        "run_id": "run-1",
        "example_id": "example-1",
        "input_data": {"prompt": "Hello"},
        "output": {"text": "Hi"},
        "actual_output": {"text": "Hi"},
    },
)
client.append_evaluation(
    experiment_id,
    "run-1",
    {"evaluator_name": "quality", "score": 0.9, "metadata": {"comment": "good"}},
)
client.complete_experiment(experiment_id)

Test Functions

Test functions receive an Example object and should return a string output:

from cat.cafe.client import Example  # assumed to be exported alongside CATCafeClient

def my_test_function(example: Example) -> str:
    # Access the input messages
    messages = example.input.get("messages", []) if isinstance(example.input, dict) else list(example.input)
    user_message = messages[-1]["content"] if messages else ""
    
    # Your AI system logic here
    response = call_my_ai_system(user_message)
    
    return response
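
The input handling in the function above can be factored into a small helper. This is a minimal sketch, assuming the input is either a dict with a "messages" list or a bare list of message dicts (the two shapes the example above handles). The name last_message_content is hypothetical and not part of the SDK.

```python
from typing import Any


def last_message_content(example_input: Any) -> str:
    """Return the content of the last message, or "" if there is none.

    Hypothetical helper, not part of the SDK. Accepts either a dict with
    a "messages" list or a bare list of message dicts.
    """
    if isinstance(example_input, dict):
        messages = example_input.get("messages", [])
    else:
        messages = list(example_input or [])
    if not messages:
        return ""
    last = messages[-1]
    # Tolerate messages that lack a "content" key instead of raising KeyError.
    return last.get("content", "") if isinstance(last, dict) else str(last)
```

Inside a test function this would be called as last_message_content(example.input), which keeps the test function itself focused on the call to your AI system.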

Evaluators

Evaluators receive the actual output and the reference output payload, and return a (score, reason) tuple:

def my_evaluator(actual_output: str, reference_output: list) -> tuple[float, str]:
    # Your evaluation logic
    if "correct_info" in actual_output:
        return 1.0, "Contains correct information"
    else:
        return 0.0, "Missing correct information"
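
As a slightly fuller illustration of the same contract, here is a sketch of a token-overlap evaluator. It assumes the reference payload is a dict with a "messages" list or a bare list of message dicts, and that the last message holds the expected answer. Both function names are hypothetical, and the scoring rule is only an example, not something the SDK prescribes.

```python
from typing import Any


def _reference_text(reference_output: Any) -> str:
    """Pull the last message's content out of a reference payload."""
    if isinstance(reference_output, dict):
        messages = reference_output.get("messages", [])
    else:
        messages = list(reference_output or [])
    if messages and isinstance(messages[-1], dict):
        return str(messages[-1].get("content", ""))
    return ""


def token_overlap_evaluator(actual_output: str, reference_output: Any) -> tuple[float, str]:
    """Score is the fraction of reference tokens that also appear in the output."""
    expected = set(_reference_text(reference_output).lower().split())
    if not expected:
        return 0.0, "No reference text to compare against"
    actual = set(actual_output.lower().split())
    matched = len(expected & actual)
    return matched / len(expected), f"{matched} of {len(expected)} reference tokens present"
```

Because it follows the same (actual_output, reference_output) signature, it could be passed in the evaluators list next to my_evaluator.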

Advanced Usage

Experiment Runner

Experiment runner orchestration (listeners, tracing, caching, and so on) now lives in the cat-experiments package. If you need the higher-level runner utilities, install that package and import cat.experiments; this SDK focuses on the core client and evaluation primitives.

Async Test Functions

async def async_test_function(example: Example) -> str:
    # Async AI system call
    response = await my_async_ai_system(example.input)
    return response

# Note: Async functions work but have limitations in certain contexts
experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=async_test_function,
    name="Async Test"
)
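
The note above does not say what the limitations are. If they turn out to involve event-loop handling in your environment, one option is to wrap the coroutine in a synchronous function before passing it in. The sketch below uses only the standard library; as_sync and fake_async_ai_system are hypothetical names for illustration.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def as_sync(async_fn: Callable[..., Awaitable[Any]]) -> Callable[..., Any]:
    """Wrap a coroutine function so it can be called like a normal function.

    Uses asyncio.run, so it must not be called from inside an already
    running event loop; asyncio.run raises RuntimeError in that case.
    """
    @functools.wraps(async_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return asyncio.run(async_fn(*args, **kwargs))

    return wrapper


async def fake_async_ai_system(prompt: str) -> str:
    # Stand-in for a real async model call.
    await asyncio.sleep(0)
    return f"Response to: {prompt}"


sync_test_function = as_sync(fake_async_ai_system)
```

The wrapped function can then be used anywhere a plain synchronous test function is expected.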

Custom Metadata

def metadata_function(example: Example, output: str) -> dict:
    return {
        "response_length": len(output),
        "example_tags": example.tags
    }

experiment_id = client.run_test_on_dataset(
    dataset=dataset_id,
    test_function=my_test_function,
    metadata_function=metadata_function,
    name="Test with Metadata"
)
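
A metadata function can also be written defensively, so it does not depend on which attributes an example exposes. The sketch below assumes tags may live either on a tags attribute or under metadata["tags"] (the shape used when importing examples in the Quick Start). safe_metadata_function and the SimpleNamespace stand-in are for illustration only.

```python
from types import SimpleNamespace
from typing import Any


def safe_metadata_function(example: Any, output: str) -> dict:
    """Build run metadata without assuming which attributes the example has."""
    # Prefer a direct tags attribute, then metadata["tags"], then an empty list.
    tags = getattr(example, "tags", None)
    if tags is None:
        metadata = getattr(example, "metadata", None) or {}
        tags = metadata.get("tags", [])
    return {
        "response_length": len(output),
        "word_count": len(output.split()),
        "example_tags": list(tags),
    }


# Stand-in object for illustration only; real examples come from the API.
demo_example = SimpleNamespace(metadata={"tags": ["weather"]})
demo_metadata = safe_metadata_function(demo_example, "Sunny and warm")
```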

Manual Experiment Control

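# Create experiment configuration
# (assumes Experiment is importable from cat.cafe.client, like CATCafeClient)
from cat.cafe.client import Experiment

experiment_config = Experiment(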
    name="Manual Experiment",
    description="Step-by-step experiment",
    dataset_id=dataset_id,
    tags=["manual", "testing"]
)

# Start experiment
experiment_id = client.start_experiment(experiment_config)

# Run your tests and collect results
dataset = client.fetch_dataset(dataset_id)

for example in dataset.examples:
    output = my_test_function(example)
    
    client.create_run(
        experiment_id,
        {
            "run_id": f"run-{example.id}",
            "example_id": example.id,
            "input_data": {"input": example.input},
            "output": dict(example.output),
            "actual_output": output,
            "evaluation_scores": {"manual_score": 0.8},
        },
    )

# Complete experiment
client.complete_experiment(experiment_id, {"total_examples": len(dataset.examples)})
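
The summary dict passed to complete_experiment can be computed from the per-run scores rather than written by hand. This is a sketch under assumptions: summarize_runs is a hypothetical helper, and the mean_<name> keys are an invented convention. The source only shows total_examples being passed.

```python
from statistics import mean


def summarize_runs(run_scores: list[dict[str, float]]) -> dict:
    """Aggregate per-run evaluation scores into a summary dict.

    run_scores holds one {"evaluator_name": score} dict per run, the same
    shape as the "evaluation_scores" value in the loop above.
    """
    summary: dict = {"total_examples": len(run_scores)}
    names = sorted({name for scores in run_scores for name in scores})
    for name in names:
        values = [scores[name] for scores in run_scores if name in scores]
        # "mean_<name>" keys are an assumed convention, not an SDK requirement.
        summary[f"mean_{name}"] = mean(values)
    return summary
```

Collecting each run's evaluation_scores dict in a list during the loop and passing summarize_runs(that_list) as the second argument would replace the hand-built summary.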

Requirements

License

MIT License
