Skip to main content

Distill agent traces into training datasets across multiple formats

Project description

Trajectory Distiller

License: MIT Python 3.10+ Tests

Convert agent traces from multiple formats into training datasets for fine-tuning.

Installation

pip install trajectory-distiller

Supported Input Formats

Format Description
glint Session-based format with turns array
armand0e Conversation-based format with tool_calls
vfable Trajectory-based format with tool_use
opencoven Source/target pair format
victor Prompt/response pair format

Supported Output Formats

Format Description
openai_chat OpenAI chat completion format
alpaca Alpaca instruction format
sharegpt ShareGPT conversation format
conversation General conversation format

Quick Start

Distill Traces

# Convert glint traces to OpenAI chat format (auto-detected)
distill input.jsonl --format openai_chat --output train.jsonl

# Convert armand0e format explicitly
distill input.jsonl --input-format armand0e --format sharegpt -o train.jsonl

# Convert to alpaca format
distill input.jsonl --format alpaca -o alpaca_train.jsonl

Filter Traces

# Filter to records using specific tools
distill filter traces.jsonl --tool bash --tool edit

# Filter by error rate and quality
distill filter traces.jsonl --min-errors 0.1 --min-quality 0.5

# Filter by session length
distill filter traces.jsonl --min-turns 5 --max-turns 50

# Combine filters and save
distill filter traces.jsonl --tool bash --min-quality 0.3 -o filtered.jsonl

Split Dataset

# Split into 95/5 train/val
distill split traces.jsonl --train-ratio 0.95 --val-ratio 0.05

# Stratify by tool distribution
distill split traces.jsonl --stratify-by tool --output-dir splits/

# Split with test set
distill split traces.jsonl --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1

Fable5 Dataset Usage

# Glint dataset
distill glint_traces.jsonl --format openai_chat -o glint_openai.jsonl

# armand0e dataset
distill armand0e_data.jsonl --input-format armand0e --format alpaca -o armand0e_alpaca.jsonl

# vfable dataset
distill vfable_traces.jsonl --input-format vfable --format sharegpt -o vfable_sharegpt.jsonl

# opencoven dataset
distill opencoven_pairs.jsonl --input-format opencoven --format openai_chat -o opencoven_openai.jsonl

# victor dataset
distill victor_pairs.jsonl --input-format victor --format conversation -o victor_conv.jsonl

Programming API

from trajectory_distiller import Distiller, FormatConverter, TraceFilter, DataSplitter

# Distill traces
distiller = Distiller()
records = distiller.distill("traces.jsonl", output_format="openai_chat")

# Filter traces
trace_filter = TraceFilter()
filtered = trace_filter.filter_by_tool(records, tools=["bash", "edit"])
filtered = trace_filter.filter_by_quality(filtered, min_quality_score=0.5)

# Convert formats
converter = FormatConverter()
alpaca_records = converter.to_alpaca(records)
sharegpt_records = converter.to_sharegpt(records)

# Split data
splitter = DataSplitter()
splits = splitter.split(records, train_ratio=0.95, stratify_by="tool")
splits.save("output/")
print(splits.stats())

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project Description
Anvil Self-verified coding agent
VerifyLoop Plan→Execute→Verify→Recover framework
ErrorRecovery Self-healing middleware (3,725 error patterns)
FableForge-14B The fine-tuned 14B model (4-stage training)
ShellWhisperer 1.5B edge agent (phone/RPi, 50ms)
ReasonCritic Verification model (130 benchmark tasks)
TraceCompiler Compile traces → LoRA skills
AgentRuntime Persistent agent daemon (systemd for AI)
AgentSwarm Multi-agent from real trace transitions
AgentTelemetry Datadog for agents (token tracking, costs)
BenchAgent HumanEval for tool-use (107 tasks)
AgentDev VSCode extension with verification
TraceViz Trace replay visualizer (Next.js)
AgentSkills npm for agent behaviors
AgentCurriculum 5-stage progressive training
AgentFuzzer Adversarial testing for agents
AgentConstitution Safety guardrails from traces
CostOptimizer Token cost reduction (50-80%)
AgentProfiler Behavioral fingerprinting
TrajectoryDistiller Trace→training data pipeline
Fable5-Dataset HuggingFace dataset release

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fableforge_trajectory_distiller-0.1.0.tar.gz (12.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fableforge_trajectory_distiller-0.1.0-py3-none-any.whl (15.1 kB view details)

Uploaded Python 3

File details

Details for the file fableforge_trajectory_distiller-0.1.0.tar.gz.

File metadata

File hashes

Hashes for fableforge_trajectory_distiller-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ca3316838baeaacde93907705c34fe17871645834e8f37207cfab88ec16700b7
MD5 38564a0f24122ba815e305a8e1271624
BLAKE2b-256 e3916ee71be53928bcc3aaa58e50fbbd8e30ba725e06fe19587428680e4364f8

See more details on using hashes here.

Provenance

The following attestation bundles were made for fableforge_trajectory_distiller-0.1.0.tar.gz:

Publisher: release.yml on KingLabsA/trajectory-distiller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fableforge_trajectory_distiller-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for fableforge_trajectory_distiller-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 91b856a30df8df0836c202ec491f5c186e8e67de18749719f2dc29f393ff0001
MD5 b5b04d372a66302391533991e2276383
BLAKE2b-256 78e623b1b4570a9f7f02ea03a1c471b2381ec82df738af2d7519a02421634b5d

See more details on using hashes here.

Provenance

The following attestation bundles were made for fableforge_trajectory_distiller-0.1.0-py3-none-any.whl:

Publisher: release.yml on KingLabsA/trajectory-distiller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page