Turn coding agent traces into auditable supervised fine-tuning data

These details have not been verified by PyPI

Project description

Teich

Agent data infrastructure for generation, normalization, formatting, response masking, and training audits.

Teich turns raw agent sessions, chat datasets, local JSONL, Hugging Face datasets, and in-memory datasets.Dataset objects into auditable SFT data.

It handles the parts that usually break training runs:

normalizing traces into OpenAI-style messages and tools
preserving tool schemas, reasoning, metadata, and provenance
rendering through your target tokenizer's chat template
recording typed supervision spans before tokenization
applying response-only labels after TRL / Unsloth trainer tokenization
reporting dropped, oversized, trimmed, malformed, and fully masked rows

Use it as a trace generator, a dataset loader, a chat-template renderer, a masking layer, or the whole pipeline.

Install

pip install teich

Or run it without installing:

uvx teich --help

Agent trace generation needs Docker and an API key for the configured provider. Preparing an existing local or Hugging Face dataset does not need Docker.

Prefer a browser workflow?

teich studio

See Teich Studio.

Quickstart: Prepare Existing Data

If your dataset already has messages, Teich can usually prepare it directly.

from teich import prepare_data

train_dataset = prepare_data(
    "TeichAI/Claude-Opus-4.6-Reasoning-887x",
    tokenizer,
    max_length=32768,
    oversized_policy="trim_followups",
    tokenize=True,
    chat_template_kwargs={"enable_thinking": True, "preserve_thinking": True},
)

Then create your trainer and call mask_data():

from teich import mask_data

trainer = mask_data(
    trainer,
    tokenizer=tokenizer,
    train_on_reasoning=True,
    train_on_final_answers=True,
    train_on_tools=True,
)

More detail: Preparing Data and Training.

Quickstart: Generate New Traces

teich init my-project
cd my-project

Add prompts to prompts.jsonl:

{"prompt":"Build a simple todo list app in React"}
{"github_repo":"armand0e/perplexica-mcp","prompt":"Add a small usability improvement and update the tests"}
{"prompt":"Draft a compact project plan","follow_up_prompts":["Revise it for a solo developer","Add a risk checklist"]}

Set your provider key and run:

export OPENAI_API_KEY=sk-...
teich generate -c config.yaml

Teich writes raw traces, converted training rows, sandbox snapshots, and a dataset card under output/. Use --resume to skip prompts that already completed.

More detail: Generation.

What Teich Supports

Use case	Start here
Configure and steer runs in a browser	Teich Studio
Generate Codex, Pi, Claude Code, Hermes, or chat data	Generation
Load local files, folders, Hugging Face datasets, or `datasets.Dataset` objects	Preparing Data
Train with TRL / Unsloth while keeping response-only labels correct	Training
Understand `messages`, `tools`, metadata, and native trace behavior	Data Format
Use `prepare_data`, `mask_data`, `load_traces`, and validation helpers	Python API
See the full generation, preparation, and masking pipeline	Pipeline Flow

Why Teich

Most SFT pipelines flatten agent data too early. That loses tool schemas, tool results, reasoning boundaries, provenance, and the exact assistant spans you meant to train on.

Teich keeps the data structured until the last practical moment:

prompts / traces / JSONL / HF datasets / Dataset objects
        -> load_traces() or prepare_data()
        -> normalized messages + tools
        -> tokenizer chat template rendering
        -> trainer-friendly text + Teich supervision spans
        -> SFTTrainer tokenization
        -> mask_data()
        -> audited input_ids + labels

This makes multi-turn, tool-call, reasoning, and mixed-source datasets trainable without relying on brittle single-span masking.

Common Commands

# Create a generation project
teich init my-project

# Generate data from config.yaml
teich generate -c config.yaml

# Resume an interrupted batch
teich generate -c config.yaml --resume

# Launch the local browser UI
teich studio

# Use a local OpenAI-compatible endpoint
TEICH_PROVIDER=LMstudio \
TEICH_MODEL=gemma-4 \
TEICH_BASE_URL=http://localhost:1234/v1 \
TEICH_API_KEY=llm \
teich generate -c config.yaml

Minimal Config

agent:
  provider: codex  # codex, pi, claude-code, hermes, or chat

model:
  model: codex-mini-latest
  approval_policy: never
  sandbox: danger-full-access

prompts_file: prompts.jsonl

output:
  traces_dir: ./output
  sandbox_dir: ./sandbox
  failures_dir: ./failures

publish:
  repo_id: username/my-dataset
  private: false

agent.provider: chat writes structured chat rows directly and does not require Docker. Agent providers preserve raw or native traces as source-of-truth artifacts.

Python Entry Points

from teich import (
    prepare_data,
    mask_data,
    load_traces,
    detect_trace_type,
    validate_tool_calls,
    row_fits_context,
    trace_is_complete,
    preview_sft_example,
)

See Python API for the full public surface.

Status

Teich is alpha. The core trace, preparation, masking, and audit workflow is usable, but APIs may evolve as more agent formats and training flows are added.

Development

uv pip install -e ".[dev]"
uv run pytest --ignore=tests/test_integration.py -q

License

Apache-2.0

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.2.1

Jun 14, 2026

0.2.0

Jun 14, 2026

0.1.9

Jun 14, 2026

0.1.8

Jun 14, 2026

This version

0.1.7

Jun 13, 2026

0.1.6

Jun 12, 2026

0.1.5

Jun 12, 2026

0.1.4

Jun 11, 2026

0.1.3

Jun 10, 2026

0.1.2

Jun 9, 2026

0.1.1a80 pre-release

Jun 6, 2026

0.1.1a79 pre-release

Jun 6, 2026

0.1.1a78 pre-release

Jun 6, 2026

0.1.1a77 pre-release

Jun 4, 2026

0.1.1a76 pre-release

May 24, 2026

0.1.1a75 pre-release

May 24, 2026

0.1.1a74 pre-release

May 23, 2026

0.1.1a73 pre-release

May 23, 2026

0.1.1a72 pre-release

May 23, 2026

0.1.1a71 pre-release

May 22, 2026

0.1.1a70 pre-release

May 22, 2026

0.1.1a69 pre-release

May 22, 2026

0.1.1a68 pre-release

May 22, 2026

0.1.1a67 pre-release

May 22, 2026

0.1.1a66 pre-release

May 22, 2026

0.1.1a65 pre-release

May 22, 2026

0.1.1a64 pre-release

May 22, 2026

0.1.1a63 pre-release

May 21, 2026

0.1.1a62 pre-release

May 14, 2026

0.1.1a61 pre-release

May 14, 2026

0.1.1a57 pre-release

May 13, 2026

0.1.1a54 pre-release

May 13, 2026

0.1.1a52 pre-release

May 13, 2026

0.1.1a51 pre-release

May 13, 2026

0.1.1a50 pre-release

May 13, 2026

0.1.1a49 pre-release

May 13, 2026

0.1.1a48 pre-release

May 13, 2026

0.1.1a47 pre-release

May 13, 2026

0.1.1a46 pre-release

May 13, 2026

0.1.1a45 pre-release

May 12, 2026

0.1.1a44 pre-release

May 12, 2026

0.1.1a43 pre-release

May 12, 2026

0.1.1a42 pre-release

May 12, 2026

0.1.1a41 pre-release

May 12, 2026

0.1.1a40 pre-release

May 11, 2026

0.1.1a39 pre-release

May 11, 2026

0.1.1a38 pre-release

May 11, 2026

0.1.1a37 pre-release

May 11, 2026

0.1.1a36 pre-release

May 11, 2026

0.1.1a35 pre-release

May 11, 2026

0.1.1a34 pre-release

May 11, 2026

0.1.1a33 pre-release

May 10, 2026

0.1.1a32 pre-release

May 9, 2026

0.1.1a31 pre-release

May 9, 2026

0.1.1a30 pre-release

May 9, 2026

0.1.1a29 pre-release

May 9, 2026

0.1.1a28 pre-release

May 9, 2026

0.1.1a27 pre-release

May 8, 2026

0.1.1a26 pre-release

May 8, 2026

0.1.1a25 pre-release

May 8, 2026

0.1.1a24 pre-release

May 7, 2026

0.1.1a23 pre-release

May 7, 2026

0.1.1a22 pre-release

May 7, 2026

0.1.1a21 pre-release

May 7, 2026

0.1.1a20 pre-release

May 7, 2026

0.1.1a19 pre-release

May 7, 2026

0.1.1a18 pre-release

May 7, 2026

0.1.1a17 pre-release

May 7, 2026

0.1.1a16 pre-release

May 6, 2026

0.1.1a15 pre-release

May 6, 2026

0.1.1a14 pre-release

May 6, 2026

0.1.1a13 pre-release

May 5, 2026

0.1.1a12 pre-release

May 5, 2026

0.1.1a11 pre-release

May 5, 2026

0.1.1a10 pre-release

May 5, 2026

0.1.1a9 pre-release

May 4, 2026

0.1.1a8 pre-release

May 4, 2026

0.1.1a7 pre-release

May 4, 2026

0.1.1a6 pre-release

May 4, 2026

0.1.1a5 pre-release

May 4, 2026

0.1.1a4 pre-release

May 4, 2026

0.1.1a3 pre-release

May 4, 2026

0.1.1a2 pre-release

May 4, 2026

0.1.1a1 pre-release

May 4, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

teich-0.1.7.tar.gz (1.1 MB view details)

Uploaded Jun 13, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

teich-0.1.7-py3-none-any.whl (853.7 kB view details)

Uploaded Jun 13, 2026 Python 3

File details

Details for the file teich-0.1.7.tar.gz.

File metadata

Download URL: teich-0.1.7.tar.gz
Upload date: Jun 13, 2026
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for teich-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`741f0e296687fdcd463ae282055b4c435b9817b2a3b0fc28eac332b856e635fe`
MD5	`93aa0a1aa55b4ef45614c8dfd4fa2188`
BLAKE2b-256	`53464b6eb0177aa5bf283484aa354be886f652c0af5c0a38c407cb3c9f5a41e6`

See more details on using hashes here.

File details

Details for the file teich-0.1.7-py3-none-any.whl.

File metadata

Download URL: teich-0.1.7-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 853.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for teich-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`161da7f423601608e442b4642fe0c9e08191fd29a9e91ebe8411667e007ac5a0`
MD5	`ab9f52384c5d9e8318810432d340814a`
BLAKE2b-256	`6025ef90b62cff108d016ab7eac45cb15bc946608013e38a22de509ff462cf84`

See more details on using hashes here.

teich 0.1.7

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

Teich

Install

Quickstart: Prepare Existing Data

Quickstart: Generate New Traces

What Teich Supports

Why Teich

Common Commands

Minimal Config

Python Entry Points

Status

Development

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes