Load, preprocess, and manage the Fable5 agent trace datasets

Project description

Fable5 Dataset

Load, preprocess, and manage the Fable5 agent trace datasets for fine-tuning and evaluation.

Installation

pip install fable5-dataset

Dataset Sources

Source	Format	Description
Glint	Session-based with turns	Full agent sessions with tool use
armand0e	Conversation with tool_calls	Multi-turn conversations with function calling
vfable	Trajectory with tool_use	Agent trajectories with sequential tool use
Coding Excellence	Session-based with quality scores	High-quality coding sessions rated by experts
OpenCoven	Source/target pairs	Instruction-following input/output pairs
Victor	Prompt/response pairs	Single-turn coding instruction pairs

Quick Start

Load Datasets

# Load the Glint dataset
fable5 load glint

# Load all datasets with PII removal
fable5 load all --remove-pii

# Load with quality filter
fable5 load coding_excellence --min-quality 0.8 -o filtered.jsonl

View Statistics

# View stats for a specific dataset
fable5 stats --source glint

# View stats from a local file
fable5 stats traces.jsonl

# Compare all datasets
fable5 stats --source all

Convert Formats

# Convert to OpenAI chat format
fable5 convert traces.jsonl --format openai_chat -o train.jsonl

# Convert to Alpaca format
fable5 convert traces.jsonl --format alpaca -o alpaca.jsonl

Generate Benchmarks

# Generate 50 benchmark tasks from Glint
fable5 benchmark --source glint --num-tasks 50

# Generate category-specific benchmarks
fable5 benchmark --source coding_excellence --categories debugging implementation -o bench.jsonl

Split Data

# Split into 95/5 train/val
fable5 split traces.jsonl --train-ratio 0.95 --val-ratio 0.05

# Stratified split by tool distribution
fable5 split traces.jsonl --stratify-by tool --output-dir splits/

Programming API

from fable5_dataset import DatasetLoader, Preprocessor, BenchmarkGenerator, DatasetStats

# Load datasets
loader = DatasetLoader()
records = loader.load_dataset("glint", normalize=True, remove_pii=True)
all_data = loader.load_dataset("all")

# Preprocess
preprocessor = Preprocessor()
normalized = preprocessor.normalize_format(records, source_format="glint")
cleaned = preprocessor.remove_pii(normalized)
filtered = preprocessor.filter_quality(cleaned, min_quality=0.7)

# Statistics
stats = DatasetStats()
result = stats.compute_stats(records)
print(result.summary())
print(result.to_dict())

# Benchmark generation
gen = BenchmarkGenerator()
tasks = gen.generate_benchmark(records, num_tasks=50, categories=["debugging", "implementation"])
gen.save_benchmark(tasks, "benchmark.jsonl")

# Compare datasets
comparisons = stats.compare_datasets(all_data)
for name, ds_stats in comparisons.items():
    print(f"{name}: {ds_stats.total_rows} records, {ds_stats.avg_turns_per_session:.1f} avg turns")

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project	Description
Anvil	Self-verified coding agent
VerifyLoop	Plan→Execute→Verify→Recover framework
ErrorRecovery	Self-healing middleware (3,725 error patterns)
FableForge-14B	The fine-tuned 14B model (4-stage training)
ShellWhisperer	1.5B edge agent (phone/RPi, 50ms)
ReasonCritic	Verification model (130 benchmark tasks)
TraceCompiler	Compile traces → LoRA skills
AgentRuntime	Persistent agent daemon (systemd for AI)
AgentSwarm	Multi-agent from real trace transitions
AgentTelemetry	Datadog for agents (token tracking, costs)
BenchAgent	HumanEval for tool-use (107 tasks)
AgentDev	VSCode extension with verification
TraceViz	Trace replay visualizer (Next.js)
AgentSkills	npm for agent behaviors
AgentCurriculum	5-stage progressive training
AgentFuzzer	Adversarial testing for agents
AgentConstitution	Safety guardrails from traces
CostOptimizer	Token cost reduction (50-80%)
AgentProfiler	Behavioral fingerprinting
TrajectoryDistiller	Trace→training data pipeline
Fable5-Dataset	HuggingFace dataset release

Project details

Release history Release notifications | RSS feed

This version

0.1.0

Jun 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fable5_dataset-0.1.0.tar.gz (19.0 kB view details)

Uploaded Jun 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

fable5_dataset-0.1.0-py3-none-any.whl (19.7 kB view details)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file fable5_dataset-0.1.0.tar.gz.

File metadata

Download URL: fable5_dataset-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 19.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fable5_dataset-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7018d535941710dda120b45644a8bb9d0f1166e64d6cf2f639ffcc7f1f4a0f63`
MD5	`539f2a78e57968f77bf67084732d2d76`
BLAKE2b-256	`bf2e995af58d705042dece80fac1f29d1b86fabbc52393eaaa0da6be8169d926`

See more details on using hashes here.

Provenance

The following attestation bundles were made for fable5_dataset-0.1.0.tar.gz:

Publisher: release.yml on KingLabsA/fable5-dataset

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fable5_dataset-0.1.0.tar.gz
- Subject digest: 7018d535941710dda120b45644a8bb9d0f1166e64d6cf2f639ffcc7f1f4a0f63
- Sigstore transparency entry: 1820003299
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: KingLabsA/fable5-dataset@92dafc6bdadb24ae369c2ddfb645098643b93b26
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/KingLabsA
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@92dafc6bdadb24ae369c2ddfb645098643b93b26
- Trigger Event: push

File details

Details for the file fable5_dataset-0.1.0-py3-none-any.whl.

File metadata

Download URL: fable5_dataset-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 19.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fable5_dataset-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6429803081295edbb74b17668799a8a8198e885c84e76ad9134b8d7b07ff966a`
MD5	`d4148f7372e3c109dc50f99a4f62d89b`
BLAKE2b-256	`ebab8f72b86ef2e0877ca94e065ed1167b6797093ce109775f8c2792cfe96044`

See more details on using hashes here.

Provenance

The following attestation bundles were made for fable5_dataset-0.1.0-py3-none-any.whl:

Publisher: release.yml on KingLabsA/fable5-dataset

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fable5_dataset-0.1.0-py3-none-any.whl
- Subject digest: 6429803081295edbb74b17668799a8a8198e885c84e76ad9134b8d7b07ff966a
- Sigstore transparency entry: 1820003325
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: KingLabsA/fable5-dataset@92dafc6bdadb24ae369c2ddfb645098643b93b26
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/KingLabsA
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@92dafc6bdadb24ae369c2ddfb645098643b93b26
- Trigger Event: push

fable5-dataset 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Fable5 Dataset

Installation

Dataset Sources

Quick Start

Load Datasets

View Statistics

Convert Formats

Generate Benchmarks

Split Data

Programming API

License

Ecosystem

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance