Turn coding agent traces into training data
Project description
Teich
Turn coding agent sessions into training data.
Generate → Convert → Train
Run Codex or Pi, capture raw traces, and convert them into structured training examples for fine-tuning.
⚡ Quick Start
pip install teich
teich init my-project && cd my-project
teich generate -c config.yaml
⭐ What Teich Does
- Trace-first data collection: Run real coding agents, keep the raw session traces
- Multi-agent support: Works with Codex and Pi
- Structured output: Converts traces into chat messages with tool calls, reasoning, and tool results
- SFT-ready formatting: Applies chat templates and creates assistant masks for supervised fine-tuning
- Hugging Face integration: Load traces from local folders or dataset repos like
badlogicgames/pi-mono(or any datasets generated with the tool)
📥 Install
pip install teich
Requirements for trace generation:
- Docker
- OpenAI/OpenRouter API key (or local OpenAI-compatible endpoint)
The Python utilities work without Docker if you already have traces.
🚀 Usage
Generate traces from prompts
# Initialize project
teich init my-project
cd my-project
# Add prompts to prompts.csv, then:
export OPENAI_API_KEY=sk-...
teich generate -c config.yaml
Outputs: raw traces in output/, sandboxes in sandbox/, and a README.md.
Convert traces to training data
from teich import convert_traces_to_training_data
from pathlib import Path
examples = convert_traces_to_training_data(Path("./output"))
print(examples[0]["messages"])
Load and format for training
from teich import load_traces, format_and_mask
# Load from local folder or HF dataset
dataset = load_traces("badlogicgames/pi-mono", split="train")
# Apply chat template and create masks
training_data = format_and_mask(
dataset,
tokenizer,
chat_template_kwargs={"enable_thinking": True}
)
# Preview a formatted example
print(training_data.preview())
📋 Configuration
config.yaml:
agent:
provider: codex # or pi
model:
model: codex-mini-latest
approval_policy: never
sandbox: danger-full-access
prompts_file: prompts.csv
output:
traces_dir: ./output
sandbox_dir: ./sandbox
Local providers (LM Studio, Ollama)
export TEICH_PROVIDER=LMstudio
export TEICH_MODEL=gemma-4
export TEICH_BASE_URL=http://localhost:1234/v1
export TEICH_API_KEY=llm
teich generate -c config.yaml
🏗️ Data Structure
Training examples include:
prompt: initial task descriptionmessages: chat history (system, user, assistant, tool)tools: tool schemas used in the sessionmetadata: session info, model, timestamps
Assistant messages capture:
content: text responsereasoning_content: chain-of-thought tracestool_calls: function calls with arguments
🔧 Python API
from teich import (
load_traces, # Load from folder or HF dataset
format_and_mask, # Apply chat template + assistant masks
convert_traces_to_training_data, # Convert raw traces to examples
Config, # Load config.yaml
TrainingExample # Typed training example
)
📦 Trace-First Workflow
Teich preserves the raw agent session as the source of truth:
- Collect: Run agents on real tasks → raw
.jsonltraces - Inspect/Share: Traces are human-readable and uploadable
- Convert: Transform to structured examples when ready
- Format: Apply model-specific chat templates for training
This means you can:
- Re-convert with different logic later
- Share raw traces before releasing training data
- Train on the same sessions with different model templates
🛠️ Development
uv pip install -e ".[dev]"
pytest tests/test_formatter.py tests/test_loader.py -q
📌 Status
Teich is alpha. The core workflow is stable and usable. APIs may evolve as more agent types and training workflows are added.
📄 License
Apache-2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file teich-0.1.1a9.tar.gz.
File metadata
- Download URL: teich-0.1.1a9.tar.gz
- Upload date:
- Size: 235.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ff1e6d1769349250c652c490527d426a8f1d3dc6d60208f1af8c07f9930e27aa
|
|
| MD5 |
0830f63cb8f0c61c3826f28ab9fa396f
|
|
| BLAKE2b-256 |
28a4173a2538ec3a104621973b6af151be3c913664377b4ab13e099703029927
|
File details
Details for the file teich-0.1.1a9-py3-none-any.whl.
File metadata
- Download URL: teich-0.1.1a9-py3-none-any.whl
- Upload date:
- Size: 44.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1608d9d95ce4f90726417b0223264c807b6eccd2443d2d78193cd38c099c9d23
|
|
| MD5 |
5a046783e5662bb7af0fc39712179582
|
|
| BLAKE2b-256 |
3cf6c5b79f009e5456ab2e49a4d7a81821611c1a7a2a797fccf258819bd3e4b3
|