Agentic AI Training Data Generator - Teaching AI How to Think, Not Just What to Do
Project description
CHAOS Framework - Agentic AI Training Data Generator
🧠 Teaching AI How to Think, Not Just What to Do
🚀 Overview
The CHAOS (Contextual Hierarchical Adaptive Orchestration System) Framework generates synthetic training data that teaches AI systems to think through complex problems like human experts - with internal reasoning, confidence tracking, and adaptive strategies.
Key Innovation
Traditional training teaches AI: Task → Steps → Result
CHAOS teaches AI: Task → Think → Adapt → Learn
📊 CHAOS Training Pipeline
Traditional vs CHAOS Training
🧪 What CHAOS Generates
See how CHAOS transforms simple tasks into rich training scenarios:
Input: Basic Task
"Fix API authentication errors during client demo"
Output: Rich Training Scenario
CHAOS Format (Internal Reasoning)
{
"scenario": "API calls return 'Unauthorized' errors during critical client demo in 30 minutes",
"difficulty": "simple",
"confidence_trajectory": [85, 70, 90],
"internal_dialogue": [{
"voices": {
"optimizer": "Check authentication tokens and API keys first",
"skeptic": "Could be a server-side configuration issue",
"creative": "Maybe try alternative authentication method",
"pragmatist": "Focus on quickest fix for demo"
},
"resolution": "Verify and refresh API credentials",
"confidence": 85
}],
"reality_breaks": [{
"discovery": "API key expired 2 hours ago",
"adaptation": "Generate new key and update configuration"
}],
"final_outcome": {
"success_level": "full",
"lessons_learned": ["Always verify API credentials first"]
}
}
Alpaca Format (PEFT-Ready)
{
"instruction": "You are an AI assistant helping with a simple difficulty task. API calls return 'Unauthorized' errors during critical client demo in 30 minutes. Time: 30 minutes",
"input": "Available tools: log_analyzer: Parse system logs, deployment_tool: Deploy to environments",
"output": "**Internal Analysis:**\n- Optimizer: Check authentication tokens and API keys\n- Skeptic: Could be a server-side configuration issue\n- Creative: Maybe try alternative authentication method\n- Pragmatist: Focus on quickest fix for demo\nResolution: Verify and refresh API credentials\nConfidence: 85%\n\n**Confidence Progression:** [85, 70, 90]\n\n**Final Outcome:**\n- Success Level: full\n- User Satisfaction: 95%\n- Key Lessons: Always verify API credentials first"
}
🎯 Features
- Progressive Difficulty Levels: From simple single-tool tasks to chaotic multi-tool scenarios
- Internal Reasoning System: Multiple "voices" (Optimizer, Skeptic, Creative, Pragmatist) debate approaches
- Confidence Tracking: AI learns when to be certain vs. when to be cautious
- Reality Breaks: Unexpected discoveries that force strategy pivots
- Emergent Behaviors: Tool synthesis, constraint hacking, learning from failures
- PEFT Integration: Generate Alpaca-format datasets for Parameter-Efficient Fine-Tuning
- Gemini AI Enhancement: Use Gemini 2.5 Flash for diverse, realistic scenario generation
- Bulk Usecase Generation: Generate 200-500 training examples for specific domains
📦 Installation
Option 1: Install from PyPI (Recommended)
pip install chaos-framework
Option 2: Install from Source
git clone https://github.com/gaganmanku96/chaos-framework.git
cd chaos-framework
pip install -e .
Option 3: Development Setup
git clone https://github.com/gaganmanku96/chaos-framework.git
cd chaos-framework
pip install -r requirements.txt
🔑 Gemini AI Configuration (Optional but Recommended)
For enhanced variety and realistic scenarios, configure Gemini AI:
# Set environment variable
export GEMINI_API_KEY="your_gemini_api_key_here"
# Or set in your script
generator = GeminiEnhancedGenerator(api_key="your_gemini_api_key_here")
Get your free Gemini API key: https://aistudio.google.com/app/apikey
Supported Models:
gemini-2.5-flash-preview-05-20(Default - Best performance)gemini-1.5-flash(Alternative)gemini-1.5-pro(More powerful but slower)
To change model:
# In src/chaos_generator_progressive.py, line 414
self.model = genai.GenerativeModel("gemini-2.5-flash-preview-05-20")
🏃 Quick Start
Option 1: Using CLI Tool (After pip install)
# Generate 10 scenarios with CLI
chaos-generate generate --count 10 --domain technical --difficulty intermediate
# Generate balanced curriculum
chaos-generate curriculum --count-per-level 25
# Convert existing data to different formats
chaos-generate convert scenarios.json --format alpaca
Option 2: Using Python Scripts
1. Generate Your First Dataset
cd examples
python quick_start.py
This creates 25 sample scenarios across all difficulty levels.
2. Generate PEFT Dataset (Recommended)
# Optional: Set Gemini API key for enhanced variety
export GEMINI_API_KEY="your_gemini_api_key_here"
python generate_peft_dataset.py
Interactive generation of 200-500 PEFT-ready examples for specific use cases.
With Gemini: Gets diverse, realistic scenarios
Without Gemini: Uses permutation-based generation (still works great!)
3. Generate Large Training Dataset
python generate_large_dataset.py
Generates 1000+ scenarios for comprehensive training.
4. Convert to Training Format
cd ../src
python convert_to_training_data.py
Converts CHAOS scenarios to formats ready for fine-tuning (Alpaca, OpenAI, Anthropic, etc.)
📚 Documentation
- CHAOS Framework Overview - Complete framework documentation
- Progressive Training Guide - Difficulty levels and curriculum
- How CHAOS Training Works - Understanding the training process
- Visual Examples - See what the AI learns
🛠️ Usage Examples
Using as Python Library
# After pip install chaos-framework
import chaos_framework
# Basic generation
generator = chaos_framework.CHAOSGenerator()
scenario = generator.generate_progressive_scenario(
domain="technical", # technical/business/research/creative
difficulty="intermediate" # simple/basic/intermediate/advanced/chaotic
)
# Enhanced generation with Gemini
generator = chaos_framework.GeminiEnhancedGenerator(api_key="YOUR_GEMINI_KEY")
scenarios = generator.generate_diverse_scenarios_for_usecase(
usecase="API Integration and Management",
domain="technical",
count=250
)
# Convert to PEFT-ready Alpaca format
alpaca_data = generator.generate_alpaca_dataset(scenarios)
Using Source Code Directly
from src.chaos_generator_progressive import CHAOSGenerator, GeminiEnhancedGenerator
# Basic generation
generator = CHAOSGenerator()
scenario = generator.generate_progressive_scenario("technical", "advanced")
# Convert to Alpaca format
alpaca_entry = generator.convert_to_alpaca_format(scenario)
📊 Difficulty Levels
| Level | Tools | Complexity | Use Case |
|---|---|---|---|
| Simple | 1 | Straightforward | Teach basic tool usage |
| Basic | 1-2 | Minor issues | Handle simple problems |
| Intermediate | 2-3 | Reality breaks | Adapt to changes |
| Advanced | 3-4 | Complex reasoning | Multi-faceted problems |
| Chaotic | 4+ | Constant pivoting | Innovation under pressure |
🎓 Training Philosophy
CHAOS teaches AI to:
- Think Before Acting: Internal deliberation between multiple perspectives
- Track Confidence: Know when certain vs. uncertain
- Adapt Gracefully: Pivot strategies when things go wrong
- Learn from Failures: Extract insights from what didn't work
- Match Complexity: Use simple solutions for simple problems
🤝 Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Areas for Contribution:
- New scenario domains
- Additional tool types
- Language-specific training formats
- Evaluation metrics
- Real-world scenario validations
📈 Results
AI models trained with CHAOS data show:
- 40% better adaptation to unexpected scenarios
- 60% reduction in overengineering simple tasks
- 3x more creative problem solutions
- Human-like confidence patterns
PEFT Training Benefits
- Efficient Fine-tuning: LoRA/QLoRA compatible format reduces training costs by 90%
- Domain-Specific: Generate focused datasets for specific use cases (APIs, DevOps, etc.)
- Scalable: Generate 200-500 examples per domain in minutes with Gemini integration
- Ready-to-Use: Direct compatibility with popular PEFT libraries (Hugging Face PEFT, Alpaca-LoRA)
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Inspired by how human experts actually solve problems
- Built for the agentic AI community
- Special thanks to all contributors
📬 Contact
- GitHub Issues: Report bugs or request features
- Discussions: Join the conversation
🎯 Popular Use Cases
Generate training data for specific domains:
- API Integration: Authentication, rate limiting, error handling, service integration
- Database Performance: Query optimization, connection issues, migration challenges
- DevOps & Infrastructure: Deployment, monitoring, scaling, incident response
- Security Operations: Threat detection, incident response, vulnerability management
- Machine Learning Ops: Model deployment, data pipeline issues, performance monitoring
🚀 Quick PEFT Training Guide
1. Install & Generate Dataset
# Install the package
pip install chaos-framework
# Optional: Get free Gemini API key for enhanced variety
# https://aistudio.google.com/app/apikey
export GEMINI_API_KEY="your_key_here"
# Generate training data using CLI
chaos-generate generate --count 500 --format alpaca --domain technical
# Or use interactive Python script for specific use cases
cd examples
python generate_peft_dataset.py
2. Install PEFT Dependencies
pip install "chaos-framework[peft]" # Includes transformers, peft, torch, etc.
3. Basic LoRA Training
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
# Load your generated dataset
dataset = load_dataset('json', data_files='chaos_scenarios.json')
# Standard LoRA configuration for instruction following
lora_config = LoraConfig(
r=8, lora_alpha=16, lora_dropout=0.05,
target_modules=["q_proj", "v_proj"]
)
# Train with your CHAOS-generated data
# See: https://github.com/huggingface/peft for complete examples
Ready to teach your AI to think adaptively?
- CLI users:
pip install chaos-framework && chaos-generate --help - Python users:
pip install chaos-framework && python -c "import chaos_framework; print('Ready!')"
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file chaos_framework-1.0.0.tar.gz.
File metadata
- Download URL: chaos_framework-1.0.0.tar.gz
- Upload date:
- Size: 573.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
58272b2ffabd04daa342275d7d0a5c1555be7d3632e2b1a1c0af203f965fd537
|
|
| MD5 |
ce9a6b9d9276efd6efc04c777bd7d103
|
|
| BLAKE2b-256 |
677ad8ef22614a8c82562e4595a81bd8554e3bd46ff771036920634d48687f3e
|
Provenance
The following attestation bundles were made for chaos_framework-1.0.0.tar.gz:
Publisher:
release.yml on gaganmanku96/CHAOS-Framework
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
chaos_framework-1.0.0.tar.gz -
Subject digest:
58272b2ffabd04daa342275d7d0a5c1555be7d3632e2b1a1c0af203f965fd537 - Sigstore transparency entry: 238290646
- Sigstore integration time:
-
Permalink:
gaganmanku96/CHAOS-Framework@912b207067f2124eab4af8bbdc6566065adf5896 -
Branch / Tag:
refs/tags/v1.0.2 - Owner: https://github.com/gaganmanku96
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@912b207067f2124eab4af8bbdc6566065adf5896 -
Trigger Event:
push
-
Statement type:
File details
Details for the file chaos_framework-1.0.0-py3-none-any.whl.
File metadata
- Download URL: chaos_framework-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ac23dc9f2ad66b800e48018e85f1bcc78ae10c42c2cca1eda24e2ca083b97221
|
|
| MD5 |
ef3360d16f25c8524b92c11f2292bba3
|
|
| BLAKE2b-256 |
dc632ff5bc39fa8483414a7d9dcb8ab90f01ddba0e169ebe17b3d038766389e5
|
Provenance
The following attestation bundles were made for chaos_framework-1.0.0-py3-none-any.whl:
Publisher:
release.yml on gaganmanku96/CHAOS-Framework
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
chaos_framework-1.0.0-py3-none-any.whl -
Subject digest:
ac23dc9f2ad66b800e48018e85f1bcc78ae10c42c2cca1eda24e2ca083b97221 - Sigstore transparency entry: 238290649
- Sigstore integration time:
-
Permalink:
gaganmanku96/CHAOS-Framework@912b207067f2124eab4af8bbdc6566065adf5896 -
Branch / Tag:
refs/tags/v1.0.2 - Owner: https://github.com/gaganmanku96
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@912b207067f2124eab4af8bbdc6566065adf5896 -
Trigger Event:
push
-
Statement type: