Skip to main content

Agentic AI Training Data Generator - Teaching AI How to Think, Not Just What to Do

Project description

CHAOS Framework - Agentic AI Training Data Generator

🧠 Teaching AI How to Think, Not Just What to Do

Python License: MIT

🚀 Overview

The CHAOS (Contextual Hierarchical Adaptive Orchestration System) Framework generates synthetic training data that teaches AI systems to think through complex problems like human experts - with internal reasoning, confidence tracking, and adaptive strategies.

Key Innovation

Traditional training teaches AI: Task → Steps → Result

CHAOS teaches AI: Task → Think → Adapt → Learn

📊 CHAOS Training Pipeline

CHAOS Training Pipeline

Traditional vs CHAOS Training

Traditional vs CHAOS

🧪 What CHAOS Generates

See how CHAOS transforms simple tasks into rich training scenarios:

Input: Basic Task

"Fix API authentication errors during client demo"

Output: Rich Training Scenario

CHAOS Format (Internal Reasoning)

{
  "scenario": "API calls return 'Unauthorized' errors during critical client demo in 30 minutes",
  "difficulty": "simple",
  "confidence_trajectory": [85, 70, 90],
  "internal_dialogue": [{
    "voices": {
      "optimizer": "Check authentication tokens and API keys first",
      "skeptic": "Could be a server-side configuration issue", 
      "creative": "Maybe try alternative authentication method",
      "pragmatist": "Focus on quickest fix for demo"
    },
    "resolution": "Verify and refresh API credentials",
    "confidence": 85
  }],
  "reality_breaks": [{
    "discovery": "API key expired 2 hours ago",
    "adaptation": "Generate new key and update configuration"
  }],
  "final_outcome": {
    "success_level": "full",
    "lessons_learned": ["Always verify API credentials first"]
  }
}

Alpaca Format (PEFT-Ready)

{
  "instruction": "You are an AI assistant helping with a simple difficulty task. API calls return 'Unauthorized' errors during critical client demo in 30 minutes. Time: 30 minutes",
  "input": "Available tools: log_analyzer: Parse system logs, deployment_tool: Deploy to environments", 
  "output": "**Internal Analysis:**\n- Optimizer: Check authentication tokens and API keys\n- Skeptic: Could be a server-side configuration issue\n- Creative: Maybe try alternative authentication method\n- Pragmatist: Focus on quickest fix for demo\nResolution: Verify and refresh API credentials\nConfidence: 85%\n\n**Confidence Progression:** [85, 70, 90]\n\n**Final Outcome:**\n- Success Level: full\n- User Satisfaction: 95%\n- Key Lessons: Always verify API credentials first"
}

🎯 Features

  • Progressive Difficulty Levels: From simple single-tool tasks to chaotic multi-tool scenarios
  • Internal Reasoning System: Multiple "voices" (Optimizer, Skeptic, Creative, Pragmatist) debate approaches
  • Confidence Tracking: AI learns when to be certain vs. when to be cautious
  • Reality Breaks: Unexpected discoveries that force strategy pivots
  • Emergent Behaviors: Tool synthesis, constraint hacking, learning from failures
  • PEFT Integration: Generate Alpaca-format datasets for Parameter-Efficient Fine-Tuning
  • Gemini AI Enhancement: Use Gemini 2.5 Flash for diverse, realistic scenario generation
  • Bulk Usecase Generation: Generate 200-500 training examples for specific domains

📦 Installation

Option 1: Install from PyPI (Recommended)

pip install chaos-framework

Option 2: Install from Source

git clone https://github.com/gaganmanku96/chaos-framework.git
cd chaos-framework
pip install -e .

Option 3: Development Setup

git clone https://github.com/gaganmanku96/chaos-framework.git
cd chaos-framework
pip install -r requirements.txt

🔑 Gemini AI Configuration (Optional but Recommended)

For enhanced variety and realistic scenarios, configure Gemini AI:

# Set environment variable
export GEMINI_API_KEY="your_gemini_api_key_here"

# Or set in your script
generator = GeminiEnhancedGenerator(api_key="your_gemini_api_key_here")

Get your free Gemini API key: https://aistudio.google.com/app/apikey

Supported Models:

  • gemini-2.5-flash-preview-05-20 (Default - Best performance)
  • gemini-1.5-flash (Alternative)
  • gemini-1.5-pro (More powerful but slower)

To change model:

# In src/chaos_generator_progressive.py, line 414
self.model = genai.GenerativeModel("gemini-2.5-flash-preview-05-20")

🏃 Quick Start

Option 1: Using CLI Tool (After pip install)

# Generate 10 scenarios with CLI
chaos-generate generate --count 10 --domain technical --difficulty intermediate

# Generate balanced curriculum
chaos-generate curriculum --count-per-level 25

# Convert existing data to different formats
chaos-generate convert scenarios.json --format alpaca

Option 2: Using Python Scripts

1. Generate Your First Dataset

cd examples
python quick_start.py

This creates 25 sample scenarios across all difficulty levels.

2. Generate PEFT Dataset (Recommended)

# Optional: Set Gemini API key for enhanced variety
export GEMINI_API_KEY="your_gemini_api_key_here"

python generate_peft_dataset.py

Interactive generation of 200-500 PEFT-ready examples for specific use cases.
With Gemini: Gets diverse, realistic scenarios
Without Gemini: Uses permutation-based generation (still works great!)

3. Generate Large Training Dataset

python generate_large_dataset.py

Generates 1000+ scenarios for comprehensive training.

4. Convert to Training Format

cd ../src
python convert_to_training_data.py

Converts CHAOS scenarios to formats ready for fine-tuning (Alpaca, OpenAI, Anthropic, etc.)

📚 Documentation

🛠️ Usage Examples

Using as Python Library

# After pip install chaos-framework
import chaos_framework

# Basic generation
generator = chaos_framework.CHAOSGenerator()
scenario = generator.generate_progressive_scenario(
    domain="technical",      # technical/business/research/creative
    difficulty="intermediate"  # simple/basic/intermediate/advanced/chaotic
)

# Enhanced generation with Gemini
generator = chaos_framework.GeminiEnhancedGenerator(api_key="YOUR_GEMINI_KEY")
scenarios = generator.generate_diverse_scenarios_for_usecase(
    usecase="API Integration and Management",
    domain="technical",
    count=250
)

# Convert to PEFT-ready Alpaca format
alpaca_data = generator.generate_alpaca_dataset(scenarios)

Using Source Code Directly

from src.chaos_generator_progressive import CHAOSGenerator, GeminiEnhancedGenerator

# Basic generation
generator = CHAOSGenerator()
scenario = generator.generate_progressive_scenario("technical", "advanced")

# Convert to Alpaca format
alpaca_entry = generator.convert_to_alpaca_format(scenario)

📊 Difficulty Levels

Level Tools Complexity Use Case
Simple 1 Straightforward Teach basic tool usage
Basic 1-2 Minor issues Handle simple problems
Intermediate 2-3 Reality breaks Adapt to changes
Advanced 3-4 Complex reasoning Multi-faceted problems
Chaotic 4+ Constant pivoting Innovation under pressure

🎓 Training Philosophy

CHAOS teaches AI to:

  1. Think Before Acting: Internal deliberation between multiple perspectives
  2. Track Confidence: Know when certain vs. uncertain
  3. Adapt Gracefully: Pivot strategies when things go wrong
  4. Learn from Failures: Extract insights from what didn't work
  5. Match Complexity: Use simple solutions for simple problems

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Areas for Contribution:

  • New scenario domains
  • Additional tool types
  • Language-specific training formats
  • Evaluation metrics
  • Real-world scenario validations

📈 Results

AI models trained with CHAOS data show:

  • 40% better adaptation to unexpected scenarios
  • 60% reduction in overengineering simple tasks
  • 3x more creative problem solutions
  • Human-like confidence patterns

PEFT Training Benefits

  • Efficient Fine-tuning: LoRA/QLoRA compatible format reduces training costs by 90%
  • Domain-Specific: Generate focused datasets for specific use cases (APIs, DevOps, etc.)
  • Scalable: Generate 200-500 examples per domain in minutes with Gemini integration
  • Ready-to-Use: Direct compatibility with popular PEFT libraries (Hugging Face PEFT, Alpaca-LoRA)

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • Inspired by how human experts actually solve problems
  • Built for the agentic AI community
  • Special thanks to all contributors

📬 Contact


🎯 Popular Use Cases

Generate training data for specific domains:

  • API Integration: Authentication, rate limiting, error handling, service integration
  • Database Performance: Query optimization, connection issues, migration challenges
  • DevOps & Infrastructure: Deployment, monitoring, scaling, incident response
  • Security Operations: Threat detection, incident response, vulnerability management
  • Machine Learning Ops: Model deployment, data pipeline issues, performance monitoring

🚀 Quick PEFT Training Guide

1. Install & Generate Dataset

# Install the package
pip install chaos-framework

# Optional: Get free Gemini API key for enhanced variety
# https://aistudio.google.com/app/apikey
export GEMINI_API_KEY="your_key_here" 

# Generate training data using CLI
chaos-generate generate --count 500 --format alpaca --domain technical

# Or use interactive Python script for specific use cases
cd examples
python generate_peft_dataset.py

2. Install PEFT Dependencies

pip install "chaos-framework[peft]"  # Includes transformers, peft, torch, etc.

3. Basic LoRA Training

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load your generated dataset
dataset = load_dataset('json', data_files='chaos_scenarios.json')

# Standard LoRA configuration for instruction following
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]
)

# Train with your CHAOS-generated data
# See: https://github.com/huggingface/peft for complete examples

Ready to teach your AI to think adaptively?

  • CLI users: pip install chaos-framework && chaos-generate --help
  • Python users: pip install chaos-framework && python -c "import chaos_framework; print('Ready!')"

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chaos_framework-1.0.0.tar.gz (573.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

chaos_framework-1.0.0-py3-none-any.whl (7.4 kB view details)

Uploaded Python 3

File details

Details for the file chaos_framework-1.0.0.tar.gz.

File metadata

  • Download URL: chaos_framework-1.0.0.tar.gz
  • Upload date:
  • Size: 573.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for chaos_framework-1.0.0.tar.gz
Algorithm Hash digest
SHA256 58272b2ffabd04daa342275d7d0a5c1555be7d3632e2b1a1c0af203f965fd537
MD5 ce9a6b9d9276efd6efc04c777bd7d103
BLAKE2b-256 677ad8ef22614a8c82562e4595a81bd8554e3bd46ff771036920634d48687f3e

See more details on using hashes here.

Provenance

The following attestation bundles were made for chaos_framework-1.0.0.tar.gz:

Publisher: release.yml on gaganmanku96/CHAOS-Framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file chaos_framework-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for chaos_framework-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ac23dc9f2ad66b800e48018e85f1bcc78ae10c42c2cca1eda24e2ca083b97221
MD5 ef3360d16f25c8524b92c11f2292bba3
BLAKE2b-256 dc632ff5bc39fa8483414a7d9dcb8ab90f01ddba0e169ebe17b3d038766389e5

See more details on using hashes here.

Provenance

The following attestation bundles were made for chaos_framework-1.0.0-py3-none-any.whl:

Publisher: release.yml on gaganmanku96/CHAOS-Framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page