
Teich

Turn coding agent sessions into auditable supervised fine-tuning data.


Run codex or pi to capture raw coding-agent traces, or use chat mode to generate text-only training rows directly.

Load local folders, local files, or Hugging Face dataset repos; normalize them into messages/tools; and prepare trainer-friendly text rows that mask_data converts into audited response-only labels after SFTTrainer tokenization.

⚡ Quick Start

pip install teich
teich init my-project && cd my-project
teich generate -c config.yaml

Or run it via Astral's uv:

uvx teich init my-project && cd my-project
uvx teich generate -c config.yaml

Be sure to edit your config.yaml and prompts.csv files as needed.

⭐ What Teich Does

  • Trace-first data collection: Run real coding agents and keep raw session traces as the source of truth
  • Multi-agent support: Works with Codex, Pi, and a text-only chat mode
  • Structured conversion: Converts traces into chat messages with tool calls, reasoning, tool results, metadata, and configured tool snapshots
  • SFT-ready preparation: Applies tokenizer chat templates, masks labels, builds a Teich collator, and audits the dataset before training
  • Hugging Face integration: Publishes dataset cards plus tools.json, and loads local or Hub datasets through one API

📥 Prerequisites

Requirements for agent trace generation:

  • Docker
  • OpenAI/OpenRouter API key (or local OpenAI-compatible endpoint)

agent.provider: chat does not require Docker. The Python utilities also work without Docker if you already have traces or structured JSONL datasets.

Training examples use your existing finetuning stack. For the TRL example below, install compatible versions of transformers, trl, and your model-loading stack separately.

🚀 Usage

Generate traces from prompts

# Initialize project
teich init my-project
cd my-project

# Add prompts to prompts.csv, then:
export OPENAI_API_KEY=sk-...
teich generate -c config.yaml

Outputs:

  • codex / pi: raw traces in output/, sandboxes in sandbox/, and a README.md
  • chat: text-only JSONL training rows in output/ and a dataset README.md

If publish.repo_id is configured, Teich also creates or updates the matching Hugging Face dataset repo and uploads the generated JSONL, README, and tools.json automatically.

If a long run is interrupted, use:

teich generate -c config.yaml --resume

Teich will scan existing outputs and skip prompts that have already been converted into completed training examples.

Prompt files can be CSV, text, JSONL/NDJSON, or JSON. JSONL is recommended for very long or multiline prompts.
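
For instance, a JSONL prompts file carries one task per line. The prompt field name below is an assumption for illustration; check the prompts file that teich init scaffolds for the exact schema:

{"prompt": "Fix the failing date-parsing test in tests/test_utils.py"}
{"prompt": "Add a --verbose flag to the CLI and update the help text"}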

Generate a text-only chat dataset

agent:
  provider: chat

model:
  model: gpt-4.1-mini

api:
  provider: openai
  wire_api: responses

Each generated JSONL line will look like:

{"messages":[{"role":"system","content":"You are a helpful assistant","thinking":null},{"role":"user","content":"Hello","thinking":null},{"role":"assistant","content":"Hi!","thinking":"I should greet the user."}],"system":"You are a helpful assistant","prompt":"Hello","thinking":"I should greet the user.","response":"Hi!","model":"gpt-4.1-mini"}

Train with Unsloth and TRL SFTTrainer

Use the trainer-first path: prepare_data renders trainer-friendly text rows with Teich supervision metadata, SFTTrainer tokenizes them, then mask_data applies multi-turn/tool-aware response-only labels to the trainer dataset.

import os

from unsloth import FastLanguageModel
import torch
from trl import SFTConfig, SFTTrainer

from teich import mask_data, prepare_data

MAX_SEQ_LEN = 32768
MODEL_NAME = "unsloth/Qwen3.5-0.8B"
TRAIN_ON_REASONING = True
CHAT_TEMPLATE_KWARGS = {"enable_thinking": True}
PUSH_TO_HUB_REPO_ID = "username/teich-sft-model"
HF_TOKEN = os.environ.get("HF_TOKEN") or ""

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LEN,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

train_dataset = prepare_data(
    "TeichAI/lordx64-claude-opus-4.7-max-cleaned",
    tokenizer,
    split="train",
    max_examples=500,
    chat_template_kwargs=CHAT_TEMPLATE_KWARGS,
    max_length=MAX_SEQ_LEN,
    drop_oversized_examples=True,
    tokenize=True,
    strict=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        dataset_num_proc=1,
        max_length=MAX_SEQ_LEN,
        packing=False,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=1,
        optim="muon",
        optim_target_modules="all-linear",
        weight_decay=0.001,
        lr_scheduler_type="linear",
        output_dir="outputs",
        seed=3407,
        report_to="none",
    ),
)
trainer = mask_data(
    trainer,
    tokenizer=tokenizer,
    train_on_reasoning=TRAIN_ON_REASONING,
    train_on_final_answers=True,
    train_on_tools=True,
)

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train(resume_from_checkpoint=False)

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

model.push_to_hub_merged(PUSH_TO_HUB_REPO_ID, tokenizer, save_method="merged_16bit", token=HF_TOKEN)

prepare_data loads local folders, local files, Hugging Face datasets, or a list mixing any of those with already-loaded datasets.Dataset objects. It applies the tokenizer chat template, optionally tokenizes so rows above max_length can be dropped, and returns trainer-friendly text rows with typed Teich span metadata for multi-turn, tool-aware masking. Pass tokenize=True for the Unsloth/TRL flow so trainer setup treats the dataset as already tokenized and preserves Teich's span metadata for mask_data. If you do not want Teich response-only masking, pass teich_masking=False; prepare_data() then returns plain rendered text rows, plus input_ids and attention_mask when tokenize=True, ready for a standard trainer flow. Mixed chat-only and tool-call datasets are formatted separately before concatenation, so their schemas do not need to match beyond the normalized messages/tools fields.
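
For example, to get plain text rows for a standard (non-Teich) trainer flow, a minimal sketch using the same dataset and tokenizer as above:

plain_dataset = prepare_data(
    "TeichAI/lordx64-claude-opus-4.7-max-cleaned",
    tokenizer,
    split="train",
    max_length=MAX_SEQ_LEN,
    drop_oversized_examples=True,
    teich_masking=False,  # plain rendered text rows, no Teich span metadata
    tokenize=False,       # let the trainer tokenize dataset_text_field="text"
)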

mask_data follows the same trainer-first shape as Unsloth's response-only helper, but uses Teich's typed span metadata so multi-turn tool calls and tool responses are masked correctly. By default it trains on assistant reasoning, assistant final answers, and assistant tool calls, while keeping user/system/developer/tool-response text masked. You can override that policy with train_on_reasoning, train_on_final_answers, train_on_tools, train_on_user, train_on_system, train_on_developer, and train_on_tool_responses. It returns a compact trainer dataset with only input_ids and labels; the trainer collator builds attention masks dynamically. Keep packing=False for this flow, because packed datasets merge row boundaries before masking. For long-context runs, max_supervised_tokens (which defaults to the trainer's max_length) caps the number of trainable answer tokens per row; pass a lower value if loss memory is still too high.
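
For example, to supervise only final answers and cap the trainable tokens per row, a sketch using the override flags listed above:

trainer = mask_data(
    trainer,
    tokenizer=tokenizer,
    train_on_reasoning=False,     # mask chain-of-thought spans
    train_on_final_answers=True,  # keep loss on assistant answers
    train_on_tools=False,         # mask assistant tool-call spans
    max_supervised_tokens=4096,   # cap trainable answer tokens per row
)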

To combine datasets, pass a list of dataset IDs, local paths, or loaded datasets.Dataset objects:

train_dataset = prepare_data(
    ["username/chat-traces", "username/tool-traces"],
    tokenizer,
    max_length=MAX_SEQ_LEN,
    drop_oversized_examples=True,
    tokenize=True,
    chat_template_kwargs=CHAT_TEMPLATE_KWARGS,
)

Fallback manual flow with load_traces

Use load_traces directly only when you want to own the remaining training pipeline yourself: chat-template rendering, filtering, tokenization, label masking, packing policy, and auditing. The tokenizer in the sketch below is loaded from a Qwen3 checkpoint purely as an example; substitute your target model's tokenizer.

from transformers import AutoTokenizer

from teich import load_traces

# Load raw traces from the output directory.
dataset = load_traces("./output")
example = dataset[0]

# Example only: any tokenizer whose chat template supports tools/thinking works here.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

rendered = tokenizer.apply_chat_template(
    example["messages"],
    tools=example.get("tools") or [],
    tokenize=False,
    add_generation_prompt=False,
    enable_thinking=True,
)
tokenized = tokenizer(rendered, truncation=True, max_length=32768)
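
From here, label masking is yours to implement. A minimal sketch of the naive policy (supervise every non-padding token); Teich's mask_data replaces this with span-aware, response-only labels:

# Supervise everything except padding; -100 is ignored by the loss.
tokenized["labels"] = [
    tok if mask == 1 else -100
    for tok, mask in zip(tokenized["input_ids"], tokenized["attention_mask"])
]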

📋 Configuration

config.yaml:

agent:
  provider: codex  # or pi or chat

model:
  model: codex-mini-latest
  approval_policy: never
  sandbox: danger-full-access

prompts_file: prompts.csv

output:
  traces_dir: ./output
  sandbox_dir: ./sandbox
  pretty_name: "My Agent Traces"

publish:
  repo_id: armand0e/my-dataset
  hf_token: hf_xxx
  private: false

Dataset tags are auto-generated from the provider and model:

  • codex / pi: agent-traces, <provider>, distillation, <model>, teich
  • chat: conversational, distillation, teich, <model>

If publish.hf_token is omitted, Teich also accepts HF_TOKEN, HUGGINGFACE_HUB_TOKEN, or TEICH_HF_TOKEN from the environment.

Local providers (LM Studio, Ollama)

export TEICH_PROVIDER=LMstudio
export TEICH_MODEL=gemma-4
export TEICH_BASE_URL=http://localhost:1234/v1
export TEICH_API_KEY=llm

teich generate -c config.yaml

🏗️ Data Structure

Training examples include:

  • prompt: initial task description
  • messages: chat history (system, user, assistant, tool)
  • tools: tool schemas used in the session
  • metadata: session info, model, timestamps, and usage when available

Structured chat datasets can also include convenience top-level fields like:

  • system
  • thinking
  • response
  • model

Assistant messages capture the following fields (see the example after this list):

  • content: text response
  • reasoning_content: chain-of-thought traces
  • tool_calls: function calls with arguments
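
Put together, an assistant message might look like the line below. Field values are illustrative, and the tool_calls shape assumes the common function-call layout:

{"role": "assistant", "content": "Added the missing null check.", "reasoning_content": "parse() crashes on None input, so I should guard it.", "tool_calls": [{"type": "function", "function": {"name": "apply_patch", "arguments": "{\"path\": \"parser.py\"}"}}]}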

🔧 Python API

from teich import (
    prepare_data,        # Recommended: render trainer-friendly text rows
    mask_data,           # Recommended: apply Teich labels after SFTTrainer tokenization
    load_traces,         # Fallback: load rows for fully manual processing
    preview_sft_example, # Preview supervised vs masked tokens
    Config,              # Load config.yaml
    TrainingExample,     # Typed training example
)
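
preview_sft_example is useful as a sanity check before training; the call shape below is an assumption for illustration, not a confirmed signature:

# Hypothetical usage: show which tokens of the first prepared row are
# supervised and which are masked out.
preview_sft_example(train_dataset[0], tokenizer=tokenizer)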

README.md is the package readme used for PyPI, so these examples are the canonical public package docs.

📦 Trace-First Workflow

Teich preserves the raw agent session as the source of truth:

  1. Collect: Run agents on real tasks → raw .jsonl traces
  2. Inspect/Share: Traces are human-readable and uploadable
  3. Convert: Transform to structured examples when ready
  4. Prepare: Use prepare_data() + mask_data() to apply model-specific templates and labels through the trainer-first flow

If you choose agent.provider: chat, Teich skips the trace-preservation step and writes structured text-only JSONL rows directly.

This means you can:

  • Re-convert with different logic later
  • Share raw traces before releasing training data
  • Train on the same sessions with different model templates

🛠️ Development

uv pip install -e ".[dev]"
uv run pytest --ignore=tests/test_integration.py -q

📌 Status

Teich is alpha. The core workflow is stable and usable. APIs may evolve as more agent types and training workflows are added.

📄 License

Apache-2.0
