Load, preprocess, and manage the Fable5 agent trace datasets
Project description
Fable5 Dataset
Load, preprocess, and manage the Fable5 agent trace datasets for fine-tuning and evaluation.
Installation
pip install fable5-dataset
Dataset Sources
| Source | Format | Description |
|---|---|---|
| Glint | Session-based with turns | Full agent sessions with tool use |
| armand0e | Conversation with tool_calls | Multi-turn conversations with function calling |
| vfable | Trajectory with tool_use | Agent trajectories with sequential tool use |
| Coding Excellence | Session-based with quality scores | High-quality coding sessions rated by experts |
| OpenCoven | Source/target pairs | Instruction-following input/output pairs |
| Victor | Prompt/response pairs | Single-turn coding instruction pairs |
Quick Start
Load Datasets
# Load the Glint dataset
fable5 load glint
# Load all datasets with PII removal
fable5 load all --remove-pii
# Load with quality filter
fable5 load coding_excellence --min-quality 0.8 -o filtered.jsonl
View Statistics
# View stats for a specific dataset
fable5 stats --source glint
# View stats from a local file
fable5 stats traces.jsonl
# Compare all datasets
fable5 stats --source all
Convert Formats
# Convert to OpenAI chat format
fable5 convert traces.jsonl --format openai_chat -o train.jsonl
# Convert to Alpaca format
fable5 convert traces.jsonl --format alpaca -o alpaca.jsonl
Generate Benchmarks
# Generate 50 benchmark tasks from Glint
fable5 benchmark --source glint --num-tasks 50
# Generate category-specific benchmarks
fable5 benchmark --source coding_excellence --categories debugging implementation -o bench.jsonl
Split Data
# Split into 95/5 train/val
fable5 split traces.jsonl --train-ratio 0.95 --val-ratio 0.05
# Stratified split by tool distribution
fable5 split traces.jsonl --stratify-by tool --output-dir splits/
Programming API
from fable5_dataset import DatasetLoader, Preprocessor, BenchmarkGenerator, DatasetStats
# Load datasets
loader = DatasetLoader()
records = loader.load_dataset("glint", normalize=True, remove_pii=True)
all_data = loader.load_dataset("all")
# Preprocess
preprocessor = Preprocessor()
normalized = preprocessor.normalize_format(records, source_format="glint")
cleaned = preprocessor.remove_pii(normalized)
filtered = preprocessor.filter_quality(cleaned, min_quality=0.7)
# Statistics
stats = DatasetStats()
result = stats.compute_stats(records)
print(result.summary())
print(result.to_dict())
# Benchmark generation
gen = BenchmarkGenerator()
tasks = gen.generate_benchmark(records, num_tasks=50, categories=["debugging", "implementation"])
gen.save_benchmark(tasks, "benchmark.jsonl")
# Compare datasets
comparisons = stats.compare_datasets(all_data)
for name, ds_stats in comparisons.items():
print(f"{name}: {ds_stats.total_rows} records, {ds_stats.avg_turns_per_session:.1f} avg turns")
License
MIT
Ecosystem
Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:
| Project | Description |
|---|---|
| Anvil | Self-verified coding agent |
| VerifyLoop | Plan→Execute→Verify→Recover framework |
| ErrorRecovery | Self-healing middleware (3,725 error patterns) |
| FableForge-14B | The fine-tuned 14B model (4-stage training) |
| ShellWhisperer | 1.5B edge agent (phone/RPi, 50ms) |
| ReasonCritic | Verification model (130 benchmark tasks) |
| TraceCompiler | Compile traces → LoRA skills |
| AgentRuntime | Persistent agent daemon (systemd for AI) |
| AgentSwarm | Multi-agent from real trace transitions |
| AgentTelemetry | Datadog for agents (token tracking, costs) |
| BenchAgent | HumanEval for tool-use (107 tasks) |
| AgentDev | VSCode extension with verification |
| TraceViz | Trace replay visualizer (Next.js) |
| AgentSkills | npm for agent behaviors |
| AgentCurriculum | 5-stage progressive training |
| AgentFuzzer | Adversarial testing for agents |
| AgentConstitution | Safety guardrails from traces |
| CostOptimizer | Token cost reduction (50-80%) |
| AgentProfiler | Behavioral fingerprinting |
| TrajectoryDistiller | Trace→training data pipeline |
| Fable5-Dataset | HuggingFace dataset release |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fable5_dataset-0.1.0.tar.gz.
File metadata
- Download URL: fable5_dataset-0.1.0.tar.gz
- Upload date:
- Size: 19.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7018d535941710dda120b45644a8bb9d0f1166e64d6cf2f639ffcc7f1f4a0f63
|
|
| MD5 |
539f2a78e57968f77bf67084732d2d76
|
|
| BLAKE2b-256 |
bf2e995af58d705042dece80fac1f29d1b86fabbc52393eaaa0da6be8169d926
|
Provenance
The following attestation bundles were made for fable5_dataset-0.1.0.tar.gz:
Publisher:
release.yml on KingLabsA/fable5-dataset
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
fable5_dataset-0.1.0.tar.gz -
Subject digest:
7018d535941710dda120b45644a8bb9d0f1166e64d6cf2f639ffcc7f1f4a0f63 - Sigstore transparency entry: 1820003299
- Sigstore integration time:
-
Permalink:
KingLabsA/fable5-dataset@92dafc6bdadb24ae369c2ddfb645098643b93b26 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/KingLabsA
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@92dafc6bdadb24ae369c2ddfb645098643b93b26 -
Trigger Event:
push
-
Statement type:
File details
Details for the file fable5_dataset-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fable5_dataset-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6429803081295edbb74b17668799a8a8198e885c84e76ad9134b8d7b07ff966a
|
|
| MD5 |
d4148f7372e3c109dc50f99a4f62d89b
|
|
| BLAKE2b-256 |
ebab8f72b86ef2e0877ca94e065ed1167b6797093ce109775f8c2792cfe96044
|
Provenance
The following attestation bundles were made for fable5_dataset-0.1.0-py3-none-any.whl:
Publisher:
release.yml on KingLabsA/fable5-dataset
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
fable5_dataset-0.1.0-py3-none-any.whl -
Subject digest:
6429803081295edbb74b17668799a8a8198e885c84e76ad9134b8d7b07ff966a - Sigstore transparency entry: 1820003325
- Sigstore integration time:
-
Permalink:
KingLabsA/fable5-dataset@92dafc6bdadb24ae369c2ddfb645098643b93b26 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/KingLabsA
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@92dafc6bdadb24ae369c2ddfb645098643b93b26 -
Trigger Event:
push
-
Statement type: