Distill agent traces into training datasets across multiple formats
Project description
Trajectory Distiller
Convert agent traces from multiple formats into training datasets for fine-tuning.
Installation
pip install trajectory-distiller
Supported Input Formats
| Format | Description |
|---|---|
glint |
Session-based format with turns array |
armand0e |
Conversation-based format with tool_calls |
vfable |
Trajectory-based format with tool_use |
opencoven |
Source/target pair format |
victor |
Prompt/response pair format |
Supported Output Formats
| Format | Description |
|---|---|
openai_chat |
OpenAI chat completion format |
alpaca |
Alpaca instruction format |
sharegpt |
ShareGPT conversation format |
conversation |
General conversation format |
Quick Start
Distill Traces
# Convert glint traces to OpenAI chat format (auto-detected)
distill input.jsonl --format openai_chat --output train.jsonl
# Convert armand0e format explicitly
distill input.jsonl --input-format armand0e --format sharegpt -o train.jsonl
# Convert to alpaca format
distill input.jsonl --format alpaca -o alpaca_train.jsonl
Filter Traces
# Filter to records using specific tools
distill filter traces.jsonl --tool bash --tool edit
# Filter by error rate and quality
distill filter traces.jsonl --min-errors 0.1 --min-quality 0.5
# Filter by session length
distill filter traces.jsonl --min-turns 5 --max-turns 50
# Combine filters and save
distill filter traces.jsonl --tool bash --min-quality 0.3 -o filtered.jsonl
Split Dataset
# Split into 95/5 train/val
distill split traces.jsonl --train-ratio 0.95 --val-ratio 0.05
# Stratify by tool distribution
distill split traces.jsonl --stratify-by tool --output-dir splits/
# Split with test set
distill split traces.jsonl --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1
Fable5 Dataset Usage
# Glint dataset
distill glint_traces.jsonl --format openai_chat -o glint_openai.jsonl
# armand0e dataset
distill armand0e_data.jsonl --input-format armand0e --format alpaca -o armand0e_alpaca.jsonl
# vfable dataset
distill vfable_traces.jsonl --input-format vfable --format sharegpt -o vfable_sharegpt.jsonl
# opencoven dataset
distill opencoven_pairs.jsonl --input-format opencoven --format openai_chat -o opencoven_openai.jsonl
# victor dataset
distill victor_pairs.jsonl --input-format victor --format conversation -o victor_conv.jsonl
Programming API
from trajectory_distiller import Distiller, FormatConverter, TraceFilter, DataSplitter
# Distill traces
distiller = Distiller()
records = distiller.distill("traces.jsonl", output_format="openai_chat")
# Filter traces
trace_filter = TraceFilter()
filtered = trace_filter.filter_by_tool(records, tools=["bash", "edit"])
filtered = trace_filter.filter_by_quality(filtered, min_quality_score=0.5)
# Convert formats
converter = FormatConverter()
alpaca_records = converter.to_alpaca(records)
sharegpt_records = converter.to_sharegpt(records)
# Split data
splitter = DataSplitter()
splits = splitter.split(records, train_ratio=0.95, stratify_by="tool")
splits.save("output/")
print(splits.stats())
License
MIT
Ecosystem
Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:
| Project | Description |
|---|---|
| Anvil | Self-verified coding agent |
| VerifyLoop | Plan→Execute→Verify→Recover framework |
| ErrorRecovery | Self-healing middleware (3,725 error patterns) |
| FableForge-14B | The fine-tuned 14B model (4-stage training) |
| ShellWhisperer | 1.5B edge agent (phone/RPi, 50ms) |
| ReasonCritic | Verification model (130 benchmark tasks) |
| TraceCompiler | Compile traces → LoRA skills |
| AgentRuntime | Persistent agent daemon (systemd for AI) |
| AgentSwarm | Multi-agent from real trace transitions |
| AgentTelemetry | Datadog for agents (token tracking, costs) |
| BenchAgent | HumanEval for tool-use (107 tasks) |
| AgentDev | VSCode extension with verification |
| TraceViz | Trace replay visualizer (Next.js) |
| AgentSkills | npm for agent behaviors |
| AgentCurriculum | 5-stage progressive training |
| AgentFuzzer | Adversarial testing for agents |
| AgentConstitution | Safety guardrails from traces |
| CostOptimizer | Token cost reduction (50-80%) |
| AgentProfiler | Behavioral fingerprinting |
| TrajectoryDistiller | Trace→training data pipeline |
| Fable5-Dataset | HuggingFace dataset release |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fableforge_trajectory_distiller-0.1.0.tar.gz.
File metadata
- Download URL: fableforge_trajectory_distiller-0.1.0.tar.gz
- Upload date:
- Size: 12.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ca3316838baeaacde93907705c34fe17871645834e8f37207cfab88ec16700b7
|
|
| MD5 |
38564a0f24122ba815e305a8e1271624
|
|
| BLAKE2b-256 |
e3916ee71be53928bcc3aaa58e50fbbd8e30ba725e06fe19587428680e4364f8
|
Provenance
The following attestation bundles were made for fableforge_trajectory_distiller-0.1.0.tar.gz:
Publisher:
release.yml on KingLabsA/trajectory-distiller
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
fableforge_trajectory_distiller-0.1.0.tar.gz -
Subject digest:
ca3316838baeaacde93907705c34fe17871645834e8f37207cfab88ec16700b7 - Sigstore transparency entry: 1819994107
- Sigstore integration time:
-
Permalink:
KingLabsA/trajectory-distiller@9b64d03687783c3afa6f77efe1c4607b20ccf838 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/KingLabsA
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@9b64d03687783c3afa6f77efe1c4607b20ccf838 -
Trigger Event:
push
-
Statement type:
File details
Details for the file fableforge_trajectory_distiller-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fableforge_trajectory_distiller-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
91b856a30df8df0836c202ec491f5c186e8e67de18749719f2dc29f393ff0001
|
|
| MD5 |
b5b04d372a66302391533991e2276383
|
|
| BLAKE2b-256 |
78e623b1b4570a9f7f02ea03a1c471b2381ec82df738af2d7519a02421634b5d
|
Provenance
The following attestation bundles were made for fableforge_trajectory_distiller-0.1.0-py3-none-any.whl:
Publisher:
release.yml on KingLabsA/trajectory-distiller
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
fableforge_trajectory_distiller-0.1.0-py3-none-any.whl -
Subject digest:
91b856a30df8df0836c202ec491f5c186e8e67de18749719f2dc29f393ff0001 - Sigstore transparency entry: 1819994154
- Sigstore integration time:
-
Permalink:
KingLabsA/trajectory-distiller@9b64d03687783c3afa6f77efe1c4607b20ccf838 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/KingLabsA
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@9b64d03687783c3afa6f77efe1c4607b20ccf838 -
Trigger Event:
push
-
Statement type: