Turn coding agent traces into training data

Project description

Teich

Turn coding agent sessions into training data.

Generate · Convert · Train

Run Codex or Pi, capture raw traces, and convert them into structured training examples for fine-tuning.

⚡ Quick Start

pip install teich
teich init my-project && cd my-project
teich generate -c config.yaml

⭐ What Teich Does

  • Trace-first data collection: Run real coding agents, keep the raw session traces
  • Multi-agent support: Works with Codex and Pi
  • Structured output: Converts traces into chat messages with tool calls, reasoning, and tool results
  • SFT-ready formatting: Applies chat templates and creates assistant masks for supervised fine-tuning
  • Hugging Face integration: Load traces from local folders or dataset repos like badlogicgames/pi-mono (or any datasets generated with the tool)
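The assistant masks mentioned above follow the usual supervised fine-tuning convention: only tokens from assistant turns contribute to the loss. A minimal sketch of the idea (illustrative only, not Teich's actual implementation; the `-100` ignore index is the common Hugging Face convention):

```python
# Sketch of assistant-only loss masking for SFT (illustrative, not Teich's code).
# Convention: labels equal the token id inside assistant turns, -100 elsewhere,
# so cross-entropy loss ignores system/user/tool tokens.

IGNORE_INDEX = -100

def build_labels(token_ids, is_assistant_flags):
    """token_ids: list[int]; is_assistant_flags: list[bool], one per token."""
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(token_ids, is_assistant_flags)
    ]

tokens = [101, 7, 8, 9, 102]                # toy token ids
flags = [False, False, True, True, False]   # only the middle span is assistant text
print(build_labels(tokens, flags))          # [-100, -100, 8, 9, -100]
```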

📥 Install

pip install teich

Requirements for trace generation:

  • Docker
  • OpenAI/OpenRouter API key (or local OpenAI-compatible endpoint)

The Python utilities work without Docker if you already have traces.

🚀 Usage

Generate traces from prompts

# Initialize project
teich init my-project
cd my-project

# Add prompts to prompts.csv, then:
export OPENAI_API_KEY=sk-...
teich generate -c config.yaml

Outputs: raw traces in output/, sandboxes in sandbox/, and a README.md.
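The exact layout of prompts.csv is not documented here; as a working assumption, a file with a header row and one task per line would look something like:

```csv
prompt
"Add a --verbose flag to the CLI and cover it with a test"
"Fix the failing test in tests/test_loader.py"
```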

Convert traces to training data

from teich import convert_traces_to_training_data
from pathlib import Path

examples = convert_traces_to_training_data(Path("./output"))
print(examples[0]["messages"])
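The returned examples are plain dicts, so standard Python is enough to slice them. For instance, to keep only sessions whose assistant turns actually called tools (a sketch over toy data; field names follow the Data Structure section below):

```python
# Keep only examples where at least one assistant message made a tool call.
# Sketch over plain dicts; field names follow the Data Structure section.

def uses_tools(example):
    return any(
        msg.get("role") == "assistant" and msg.get("tool_calls")
        for msg in example["messages"]
    )

examples = [  # toy stand-ins for convert_traces_to_training_data() output
    {"messages": [{"role": "user", "content": "hi"},
                  {"role": "assistant", "content": "hello", "tool_calls": None}]},
    {"messages": [{"role": "user", "content": "ls"},
                  {"role": "assistant", "tool_calls": [{"name": "bash"}]}]},
]

tool_using = [ex for ex in examples if uses_tools(ex)]
print(len(tool_using))  # 1
```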

Load and format for training

from teich import load_traces, format_and_mask
from transformers import AutoTokenizer

# Load from local folder or HF dataset
dataset = load_traces("badlogicgames/pi-mono", split="train")

# Any Hugging Face tokenizer with a chat template works here, e.g.:
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Apply chat template and create masks
training_data = format_and_mask(
    dataset,
    tokenizer,
    chat_template_kwargs={"enable_thinking": True}
)

# Preview a formatted example
print(training_data.preview())

📋 Configuration

config.yaml:

agent:
  provider: codex  # or pi

model:
  model: codex-mini-latest
  approval_policy: never
  sandbox: danger-full-access

prompts_file: prompts.csv

output:
  traces_dir: ./output
  sandbox_dir: ./sandbox

Local providers (LM Studio, Ollama)

export TEICH_PROVIDER=LMstudio
export TEICH_MODEL=gemma-4
export TEICH_BASE_URL=http://localhost:1234/v1
export TEICH_API_KEY=llm

teich generate -c config.yaml

🏗️ Data Structure

Training examples include:

  • prompt: initial task description
  • messages: chat history (system, user, assistant, tool)
  • tools: tool schemas used in the session
  • metadata: session info, model, timestamps

Assistant messages capture:

  • content: text response
  • reasoning_content: chain-of-thought traces
  • tool_calls: function calls with arguments
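Putting the fields above together, a single training example might look like this (the shape follows the lists above; the concrete values are made up for illustration):

```python
# Illustrative shape of one training example (field names from the lists above;
# the concrete values here are invented, not actual Teich output).
example = {
    "prompt": "Fix the failing unit test",
    "messages": [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Fix the failing unit test"},
        {"role": "assistant",
         "content": "",
         "reasoning_content": "I should run the tests first.",
         "tool_calls": [{"name": "bash", "arguments": {"cmd": "pytest -q"}}]},
        {"role": "tool", "content": "1 failed, 12 passed"},
    ],
    "tools": [{"name": "bash", "description": "Run a shell command"}],
    "metadata": {"model": "codex-mini-latest", "session_id": "…"},
}

print([m["role"] for m in example["messages"]])  # ['system', 'user', 'assistant', 'tool']
```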

🔧 Python API

from teich import (
    load_traces,                      # Load from folder or HF dataset
    format_and_mask,                  # Apply chat template + assistant masks
    convert_traces_to_training_data,  # Convert raw traces to examples
    Config,                           # Load config.yaml
    TrainingExample,                  # Typed training example
)

📦 Trace-First Workflow

Teich preserves the raw agent session as the source of truth:

  1. Collect: Run agents on real tasks → raw .jsonl traces
  2. Inspect/Share: Traces are human-readable and uploadable
  3. Convert: Transform to structured examples when ready
  4. Format: Apply model-specific chat templates for training

This means you can:

  • Re-convert with different logic later
  • Share raw traces before releasing training data
  • Train on the same sessions with different model templates
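Because the raw traces are .jsonl, inspecting or re-converting them needs nothing beyond the standard library. A sketch (the event fields below are invented for illustration; real trace schemas depend on the agent that produced them):

```python
import json
from pathlib import Path

# Write a toy trace, then read it back line by line (real traces live in output/).
# The event fields here are illustrative; actual schemas depend on the agent.
trace = Path("demo_trace.jsonl")
trace.write_text(
    '{"type": "user_message", "text": "fix the bug"}\n'
    '{"type": "tool_call", "name": "bash"}\n'
)

events = [json.loads(line) for line in trace.read_text().splitlines() if line.strip()]
print([e["type"] for e in events])  # ['user_message', 'tool_call']
```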

🛠️ Development

uv pip install -e ".[dev]"
pytest tests/test_formatter.py tests/test_loader.py -q

📌 Status

Teich is alpha. The core workflow is stable and usable. APIs may evolve as more agent types and training workflows are added.

📄 License

Apache-2.0


Download files

Source Distribution

teich-0.1.1a7.tar.gz (234.3 kB)

Built Distribution

teich-0.1.1a7-py3-none-any.whl (43.1 kB)

File details

Details for the file teich-0.1.1a7.tar.gz.

File metadata

  • Download URL: teich-0.1.1a7.tar.gz
  • Size: 234.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for teich-0.1.1a7.tar.gz:

  • SHA256: 7b9f832f16b3cdc0711ef99c56019b110fad873d227a938e6aa19e22eb5a8c6a
  • MD5: 1a2e8c41193d7576c25320ebe8b711ca
  • BLAKE2b-256: 9c5527231f744775deea227d50884b15c18ea592e8b9be5d42b876397928a71b

File details

Details for the file teich-0.1.1a7-py3-none-any.whl.

File metadata

  • Download URL: teich-0.1.1a7-py3-none-any.whl
  • Size: 43.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for teich-0.1.1a7-py3-none-any.whl:

  • SHA256: 4fb24babd97b6e877421d21d0b6610d0d5fc9e7b33f8aacecd14d45d62be0edf
  • MD5: dac945f63a3308d659a8c8636d5c0e6d
  • BLAKE2b-256: 5e8daa2a0e956f1ea8498e372b141e01aa4551c1cc3c22d2d207e933451c1a01
