Agentic AI Training Data Generator - Teaching AI How to Think, Not Just What to Do

These details have not been verified by PyPI

Project description

CHAOS Framework - Agentic AI Training Data Generator

🧠 Teaching AI How to Think, Not Just What to Do

🚀 Overview

The CHAOS (Contextual Hierarchical Adaptive Orchestration System) Framework generates synthetic training data that teaches AI systems to think through complex problems like human experts - with internal reasoning, confidence tracking, and adaptive strategies.

Key Innovation

Traditional training teaches AI: Task → Steps → Result

CHAOS teaches AI: Task → Think → Adapt → Learn

📊 CHAOS Training Pipeline

CHAOS Training Pipeline

Traditional vs CHAOS Training

Traditional vs CHAOS

🧪 What CHAOS Generates

See how CHAOS transforms simple tasks into rich training scenarios:

Input: Basic Task

"Fix API authentication errors during client demo"

Output: Rich Training Scenario

CHAOS Format (Internal Reasoning)

{
  "scenario": "API calls return 'Unauthorized' errors during critical client demo in 30 minutes",
  "difficulty": "simple",
  "confidence_trajectory": [85, 70, 90],
  "internal_dialogue": [{
    "voices": {
      "optimizer": "Check authentication tokens and API keys first",
      "skeptic": "Could be a server-side configuration issue", 
      "creative": "Maybe try alternative authentication method",
      "pragmatist": "Focus on quickest fix for demo"
    },
    "resolution": "Verify and refresh API credentials",
    "confidence": 85
  }],
  "reality_breaks": [{
    "discovery": "API key expired 2 hours ago",
    "adaptation": "Generate new key and update configuration"
  }],
  "final_outcome": {
    "success_level": "full",
    "lessons_learned": ["Always verify API credentials first"]
  }
}

Alpaca Format (PEFT-Ready)

{
  "instruction": "You are an AI assistant helping with a simple difficulty task. API calls return 'Unauthorized' errors during critical client demo in 30 minutes. Time: 30 minutes",
  "input": "Available tools: log_analyzer: Parse system logs, deployment_tool: Deploy to environments", 
  "output": "**Internal Analysis:**\n- Optimizer: Check authentication tokens and API keys\n- Skeptic: Could be a server-side configuration issue\n- Creative: Maybe try alternative authentication method\n- Pragmatist: Focus on quickest fix for demo\nResolution: Verify and refresh API credentials\nConfidence: 85%\n\n**Confidence Progression:** [85, 70, 90]\n\n**Final Outcome:**\n- Success Level: full\n- User Satisfaction: 95%\n- Key Lessons: Always verify API credentials first"
}

🎯 Features

Progressive Difficulty Levels: From simple single-tool tasks to chaotic multi-tool scenarios
Internal Reasoning System: Multiple "voices" (Optimizer, Skeptic, Creative, Pragmatist) debate approaches
Confidence Tracking: AI learns when to be certain vs. when to be cautious
Reality Breaks: Unexpected discoveries that force strategy pivots
Emergent Behaviors: Tool synthesis, constraint hacking, learning from failures
PEFT Integration: Generate Alpaca-format datasets for Parameter-Efficient Fine-Tuning
Gemini AI Enhancement: Use Gemini 2.5 Flash for diverse, realistic scenario generation
Bulk Usecase Generation: Generate 200-500 training examples for specific domains

📦 Installation

Option 1: Install from PyPI (Recommended)

pip install chaos-framework

Option 2: Install from Source

git clone https://github.com/gaganmanku96/chaos-framework.git
cd chaos-framework
pip install -e .

Option 3: Development Setup

git clone https://github.com/gaganmanku96/chaos-framework.git
cd chaos-framework
pip install -r requirements.txt

🔑 Gemini AI Configuration (Optional but Recommended)

For enhanced variety and realistic scenarios, configure Gemini AI:

# Set environment variable
export GEMINI_API_KEY="your_gemini_api_key_here"

# Or set in your script
generator = GeminiEnhancedGenerator(api_key="your_gemini_api_key_here")

Get your free Gemini API key: https://aistudio.google.com/app/apikey

Supported Models:

gemini-2.5-flash-preview-05-20 (Default - Best performance)
gemini-1.5-flash (Alternative)
gemini-1.5-pro (More powerful but slower)

To change model:

# In src/chaos_generator_progressive.py, line 414
self.model = genai.GenerativeModel("gemini-2.5-flash-preview-05-20")

🏃 Quick Start

Option 1: Using CLI Tool (After pip install)

# Generate 10 scenarios with CLI
chaos-generate generate --count 10 --domain technical --difficulty intermediate

# Generate balanced curriculum
chaos-generate curriculum --count-per-level 25

# Convert existing data to different formats
chaos-generate convert scenarios.json --format alpaca

Option 2: Using Python Scripts

1. Generate Your First Dataset

cd examples
python quick_start.py

This creates 25 sample scenarios across all difficulty levels.

2. Generate PEFT Dataset (Recommended)

# Optional: Set Gemini API key for enhanced variety
export GEMINI_API_KEY="your_gemini_api_key_here"

python generate_peft_dataset.py

Interactive generation of 200-500 PEFT-ready examples for specific use cases.
With Gemini: Gets diverse, realistic scenarios
Without Gemini: Uses permutation-based generation (still works great!)

3. Generate Large Training Dataset

python generate_large_dataset.py

Generates 1000+ scenarios for comprehensive training.

4. Convert to Training Format

cd ../src
python convert_to_training_data.py

Converts CHAOS scenarios to formats ready for fine-tuning (Alpaca, OpenAI, Anthropic, etc.)

📚 Documentation

CHAOS Framework Overview - Complete framework documentation
Progressive Training Guide - Difficulty levels and curriculum
How CHAOS Training Works - Understanding the training process
Visual Examples - See what the AI learns

🛠️ Usage Examples

Using as Python Library

# After pip install chaos-framework
import chaos_framework

# Basic generation
generator = chaos_framework.CHAOSGenerator()
scenario = generator.generate_progressive_scenario(
    domain="technical",      # technical/business/research/creative
    difficulty="intermediate"  # simple/basic/intermediate/advanced/chaotic
)

# Enhanced generation with Gemini
generator = chaos_framework.GeminiEnhancedGenerator(api_key="YOUR_GEMINI_KEY")
scenarios = generator.generate_diverse_scenarios_for_usecase(
    usecase="API Integration and Management",
    domain="technical",
    count=250
)

# Convert to PEFT-ready Alpaca format
alpaca_data = generator.generate_alpaca_dataset(scenarios)

Using Source Code Directly

from src.chaos_generator_progressive import CHAOSGenerator, GeminiEnhancedGenerator

# Basic generation
generator = CHAOSGenerator()
scenario = generator.generate_progressive_scenario("technical", "advanced")

# Convert to Alpaca format
alpaca_entry = generator.convert_to_alpaca_format(scenario)

📊 Difficulty Levels

Level	Tools	Complexity	Use Case
Simple	1	Straightforward	Teach basic tool usage
Basic	1-2	Minor issues	Handle simple problems
Intermediate	2-3	Reality breaks	Adapt to changes
Advanced	3-4	Complex reasoning	Multi-faceted problems
Chaotic	4+	Constant pivoting	Innovation under pressure

🎓 Training Philosophy

CHAOS teaches AI to:

Think Before Acting: Internal deliberation between multiple perspectives
Track Confidence: Know when certain vs. uncertain
Adapt Gracefully: Pivot strategies when things go wrong
Learn from Failures: Extract insights from what didn't work
Match Complexity: Use simple solutions for simple problems

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Areas for Contribution:

New scenario domains
Additional tool types
Language-specific training formats
Evaluation metrics
Real-world scenario validations

📈 Results

AI models trained with CHAOS data show:

40% better adaptation to unexpected scenarios
60% reduction in overengineering simple tasks
3x more creative problem solutions
Human-like confidence patterns

PEFT Training Benefits

Efficient Fine-tuning: LoRA/QLoRA compatible format reduces training costs by 90%
Domain-Specific: Generate focused datasets for specific use cases (APIs, DevOps, etc.)
Scalable: Generate 200-500 examples per domain in minutes with Gemini integration
Ready-to-Use: Direct compatibility with popular PEFT libraries (Hugging Face PEFT, Alpaca-LoRA)

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Inspired by how human experts actually solve problems
Built for the agentic AI community
Special thanks to all contributors

📬 Contact

GitHub Issues: Report bugs or request features
Discussions: Join the conversation

🎯 Popular Use Cases

Generate training data for specific domains:

API Integration: Authentication, rate limiting, error handling, service integration
Database Performance: Query optimization, connection issues, migration challenges
DevOps & Infrastructure: Deployment, monitoring, scaling, incident response
Security Operations: Threat detection, incident response, vulnerability management
Machine Learning Ops: Model deployment, data pipeline issues, performance monitoring

🚀 Quick PEFT Training Guide

1. Install & Generate Dataset

# Install the package
pip install chaos-framework

# Optional: Get free Gemini API key for enhanced variety
# https://aistudio.google.com/app/apikey
export GEMINI_API_KEY="your_key_here" 

# Generate training data using CLI
chaos-generate generate --count 500 --format alpaca --domain technical

# Or use interactive Python script for specific use cases
cd examples
python generate_peft_dataset.py

2. Install PEFT Dependencies

pip install "chaos-framework[peft]"  # Includes transformers, peft, torch, etc.

3. Basic LoRA Training

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load your generated dataset
dataset = load_dataset('json', data_files='chaos_scenarios.json')

# Standard LoRA configuration for instruction following
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]
)

# Train with your CHAOS-generated data
# See: https://github.com/huggingface/peft for complete examples

Ready to teach your AI to think adaptively?

CLI users: pip install chaos-framework && chaos-generate --help
Python users: pip install chaos-framework && python -c "import chaos_framework; print('Ready!')"

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

1.1.0

Jun 16, 2025

This version

1.0.0

Jun 14, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chaos_framework-1.0.0.tar.gz (573.6 kB view details)

Uploaded Jun 14, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

chaos_framework-1.0.0-py3-none-any.whl (7.4 kB view details)

Uploaded Jun 14, 2025 Python 3

File details

Details for the file chaos_framework-1.0.0.tar.gz.

File metadata

Download URL: chaos_framework-1.0.0.tar.gz
Upload date: Jun 14, 2025
Size: 573.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for chaos_framework-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`58272b2ffabd04daa342275d7d0a5c1555be7d3632e2b1a1c0af203f965fd537`
MD5	`ce9a6b9d9276efd6efc04c777bd7d103`
BLAKE2b-256	`677ad8ef22614a8c82562e4595a81bd8554e3bd46ff771036920634d48687f3e`

See more details on using hashes here.

Provenance

The following attestation bundles were made for chaos_framework-1.0.0.tar.gz:

Publisher: release.yml on gaganmanku96/CHAOS-Framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chaos_framework-1.0.0.tar.gz
- Subject digest: 58272b2ffabd04daa342275d7d0a5c1555be7d3632e2b1a1c0af203f965fd537
- Sigstore transparency entry: 238290646
- Sigstore integration time: Jun 14, 2025
Source repository:
- Permalink: gaganmanku96/CHAOS-Framework@912b207067f2124eab4af8bbdc6566065adf5896
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/gaganmanku96
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@912b207067f2124eab4af8bbdc6566065adf5896
- Trigger Event: push

File details

Details for the file chaos_framework-1.0.0-py3-none-any.whl.

File metadata

Download URL: chaos_framework-1.0.0-py3-none-any.whl
Upload date: Jun 14, 2025
Size: 7.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for chaos_framework-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ac23dc9f2ad66b800e48018e85f1bcc78ae10c42c2cca1eda24e2ca083b97221`
MD5	`ef3360d16f25c8524b92c11f2292bba3`
BLAKE2b-256	`dc632ff5bc39fa8483414a7d9dcb8ab90f01ddba0e169ebe17b3d038766389e5`

See more details on using hashes here.

Provenance

The following attestation bundles were made for chaos_framework-1.0.0-py3-none-any.whl:

Publisher: release.yml on gaganmanku96/CHAOS-Framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chaos_framework-1.0.0-py3-none-any.whl
- Subject digest: ac23dc9f2ad66b800e48018e85f1bcc78ae10c42c2cca1eda24e2ca083b97221
- Sigstore transparency entry: 238290649
- Sigstore integration time: Jun 14, 2025
Source repository:
- Permalink: gaganmanku96/CHAOS-Framework@912b207067f2124eab4af8bbdc6566065adf5896
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/gaganmanku96
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@912b207067f2124eab4af8bbdc6566065adf5896
- Trigger Event: push

chaos-framework 1.0.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

CHAOS Framework - Agentic AI Training Data Generator

🚀 Overview

Key Innovation

📊 CHAOS Training Pipeline

Traditional vs CHAOS Training

🧪 What CHAOS Generates

Input: Basic Task

Output: Rich Training Scenario

CHAOS Format (Internal Reasoning)

Alpaca Format (PEFT-Ready)

🎯 Features

📦 Installation

Option 1: Install from PyPI (Recommended)

Option 2: Install from Source

Option 3: Development Setup

🔑 Gemini AI Configuration (Optional but Recommended)

🏃 Quick Start

Option 1: Using CLI Tool (After pip install)

Option 2: Using Python Scripts

1. Generate Your First Dataset

2. Generate PEFT Dataset (Recommended)

3. Generate Large Training Dataset

4. Convert to Training Format

📚 Documentation

🛠️ Usage Examples

Using as Python Library

Using Source Code Directly

📊 Difficulty Levels

🎓 Training Philosophy

🤝 Contributing

Areas for Contribution:

📈 Results

PEFT Training Benefits

📄 License

🙏 Acknowledgments

📬 Contact

🎯 Popular Use Cases

🚀 Quick PEFT Training Guide

1. Install & Generate Dataset

2. Install PEFT Dependencies

3. Basic LoRA Training

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance