Skip to main content

Load, preprocess, and manage the Fable5 agent trace datasets

Project description

Fable5 Dataset

License: MIT Python 3.10+ Tests

Load, preprocess, and manage the Fable5 agent trace datasets for fine-tuning and evaluation.

Installation

pip install fable5-dataset

Dataset Sources

Source Format Description
Glint Session-based with turns Full agent sessions with tool use
armand0e Conversation with tool_calls Multi-turn conversations with function calling
vfable Trajectory with tool_use Agent trajectories with sequential tool use
Coding Excellence Session-based with quality scores High-quality coding sessions rated by experts
OpenCoven Source/target pairs Instruction-following input/output pairs
Victor Prompt/response pairs Single-turn coding instruction pairs

Quick Start

Load Datasets

# Load the Glint dataset
fable5 load glint

# Load all datasets with PII removal
fable5 load all --remove-pii

# Load with quality filter
fable5 load coding_excellence --min-quality 0.8 -o filtered.jsonl

View Statistics

# View stats for a specific dataset
fable5 stats --source glint

# View stats from a local file
fable5 stats traces.jsonl

# Compare all datasets
fable5 stats --source all

Convert Formats

# Convert to OpenAI chat format
fable5 convert traces.jsonl --format openai_chat -o train.jsonl

# Convert to Alpaca format
fable5 convert traces.jsonl --format alpaca -o alpaca.jsonl

Generate Benchmarks

# Generate 50 benchmark tasks from Glint
fable5 benchmark --source glint --num-tasks 50

# Generate category-specific benchmarks
fable5 benchmark --source coding_excellence --categories debugging implementation -o bench.jsonl

Split Data

# Split into 95/5 train/val
fable5 split traces.jsonl --train-ratio 0.95 --val-ratio 0.05

# Stratified split by tool distribution
fable5 split traces.jsonl --stratify-by tool --output-dir splits/

Programming API

from fable5_dataset import DatasetLoader, Preprocessor, BenchmarkGenerator, DatasetStats

# Load datasets
loader = DatasetLoader()
records = loader.load_dataset("glint", normalize=True, remove_pii=True)
all_data = loader.load_dataset("all")

# Preprocess
preprocessor = Preprocessor()
normalized = preprocessor.normalize_format(records, source_format="glint")
cleaned = preprocessor.remove_pii(normalized)
filtered = preprocessor.filter_quality(cleaned, min_quality=0.7)

# Statistics
stats = DatasetStats()
result = stats.compute_stats(records)
print(result.summary())
print(result.to_dict())

# Benchmark generation
gen = BenchmarkGenerator()
tasks = gen.generate_benchmark(records, num_tasks=50, categories=["debugging", "implementation"])
gen.save_benchmark(tasks, "benchmark.jsonl")

# Compare datasets
comparisons = stats.compare_datasets(all_data)
for name, ds_stats in comparisons.items():
    print(f"{name}: {ds_stats.total_rows} records, {ds_stats.avg_turns_per_session:.1f} avg turns")

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project Description
Anvil Self-verified coding agent
VerifyLoop Plan→Execute→Verify→Recover framework
ErrorRecovery Self-healing middleware (3,725 error patterns)
FableForge-14B The fine-tuned 14B model (4-stage training)
ShellWhisperer 1.5B edge agent (phone/RPi, 50ms)
ReasonCritic Verification model (130 benchmark tasks)
TraceCompiler Compile traces → LoRA skills
AgentRuntime Persistent agent daemon (systemd for AI)
AgentSwarm Multi-agent from real trace transitions
AgentTelemetry Datadog for agents (token tracking, costs)
BenchAgent HumanEval for tool-use (107 tasks)
AgentDev VSCode extension with verification
TraceViz Trace replay visualizer (Next.js)
AgentSkills npm for agent behaviors
AgentCurriculum 5-stage progressive training
AgentFuzzer Adversarial testing for agents
AgentConstitution Safety guardrails from traces
CostOptimizer Token cost reduction (50-80%)
AgentProfiler Behavioral fingerprinting
TrajectoryDistiller Trace→training data pipeline
Fable5-Dataset HuggingFace dataset release

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fable5_dataset-0.1.0.tar.gz (19.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fable5_dataset-0.1.0-py3-none-any.whl (19.7 kB view details)

Uploaded Python 3

File details

Details for the file fable5_dataset-0.1.0.tar.gz.

File metadata

  • Download URL: fable5_dataset-0.1.0.tar.gz
  • Upload date:
  • Size: 19.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fable5_dataset-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7018d535941710dda120b45644a8bb9d0f1166e64d6cf2f639ffcc7f1f4a0f63
MD5 539f2a78e57968f77bf67084732d2d76
BLAKE2b-256 bf2e995af58d705042dece80fac1f29d1b86fabbc52393eaaa0da6be8169d926

See more details on using hashes here.

Provenance

The following attestation bundles were made for fable5_dataset-0.1.0.tar.gz:

Publisher: release.yml on KingLabsA/fable5-dataset

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fable5_dataset-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fable5_dataset-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fable5_dataset-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6429803081295edbb74b17668799a8a8198e885c84e76ad9134b8d7b07ff966a
MD5 d4148f7372e3c109dc50f99a4f62d89b
BLAKE2b-256 ebab8f72b86ef2e0877ca94e065ed1167b6797093ce109775f8c2792cfe96044

See more details on using hashes here.

Provenance

The following attestation bundles were made for fable5_dataset-0.1.0-py3-none-any.whl:

Publisher: release.yml on KingLabsA/fable5-dataset

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page