SOTA Agent - Universal Agent Workflow Template
A generic, production-ready template for integrating AI agents into any application or data pipeline.
This is a TEMPLATE - use it to build agent workflows for any domain!
Originally designed for fraud detection, this architecture template applies to any domain requiring AI agent integration:
- Fraud Detection & Risk Analysis
- Customer Support & Chatbots
- Content Moderation & Policy Enforcement
- Healthcare & Diagnosis Support
- Data Quality & Anomaly Detection
- Analytics & Report Generation
- Any Agent-Powered Workflow
Quick Start
Installation
# Basic installation
pip install sota-agent-framework
# With optional features
pip install sota-agent-framework[mcp] # MCP tool calling
pip install sota-agent-framework[ray] # Distributed execution
pip install sota-agent-framework[databricks] # Databricks integration
pip install sota-agent-framework[optimization] # DSPy + TextGrad
pip install sota-agent-framework[all] # Everything
# Or install from GitHub
pip install git+https://github.com/somasekar278/universal-agent-template.git
Generate Your First Project
# Generate a complete project for your domain
sota-generate --domain "your_domain" --output ./your-project
# Navigate and run
cd your-project
python examples/example_usage.py  # Works immediately!
For Contributors/Development
If you're cloning the repo to contribute:
git clone https://github.com/somasekar278/universal-agent-template.git
cd universal-agent-template
./setup.sh # or setup.bat on Windows
python template_generator.py --domain "test"
Integrate Into Existing Code (3 lines)
from agents import AgentRouter
router = AgentRouter.from_yaml("config/agents.yaml") # 1. Load
result = await router.route("your_agent", input_data) # 2. Execute
# That's it!
See the Getting Started Guide for a detailed 5-minute walkthrough
Benchmark Your Agents
The framework includes a production-grade evaluation suite for comprehensive agent testing:
# Install with benchmarking support
pip install sota-agent-framework[dev]
# Run benchmarks
sota-benchmark run --suite fraud --agents all --report md
# View auto-generated leaderboard
cat benchmark_results/leaderboard.md
Features:
- Multi-metric evaluation (tool calls, planning, hallucination, latency, coherence, accuracy)
- Auto-generated leaderboards ranking agents
- Multiple report formats (Markdown, JSON, HTML)
- Regression testing for CI/CD
- Parallel execution for fast evaluation
See the Benchmarking Guide for complete documentation
Agent-Governed Memory System
Intelligent memory management where agents decide what to store, retrieve, and forget:
from memory import MemoryManager, MemoryType, MemoryImportance
# Initialize memory
memory = MemoryManager()
# Agent stores (auto-detects importance and type)
await memory.store(
    content="User prefers dark mode at night",
    importance=MemoryImportance.HIGH
)
# Agent retrieves with semantic search
memories = await memory.retrieve(
    query="What are user preferences?",
    strategy="hybrid"  # semantic + recency + importance
)
# Agent reflects and consolidates
summary = await memory.reflect()
# Agent forgets old data
forgotten = await memory.forget()
Features:
- 5 Memory Types - Short-term, long-term, episodic, semantic, procedural
- Semantic Search - Vector embeddings for similarity-based retrieval
- Reflection - Agents create insights and summaries from memories
- Smart Forgetting - Time/importance/capacity-based policies
- Memory Graphs - Track relationships and patterns
- Context Budgeting - Automatic token management for LLMs
- Shared Memory - Private and shared memory spaces across agents
See the Memory System Guide for complete documentation
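The "hybrid" retrieval strategy above blends semantic similarity, recency, and importance. A minimal sketch of how such a blended score could be computed (the `Memory` shape, weights, and half-life are illustrative assumptions, not the framework's internals):

```python
import math
import time
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    importance: float   # 0.0-1.0, set when stored
    created_at: float   # unix timestamp
    similarity: float   # cosine similarity to the query, precomputed

def hybrid_score(m: Memory, now: float, half_life_s: float = 86_400.0,
                 w_sem: float = 0.5, w_rec: float = 0.3, w_imp: float = 0.2) -> float:
    """Blend semantic similarity, exponential recency decay, and importance."""
    recency = math.exp(-(now - m.created_at) / half_life_s)
    return w_sem * m.similarity + w_rec * recency + w_imp * m.importance

now = time.time()
memories = [
    Memory("prefers dark mode", importance=0.9, created_at=now - 3600, similarity=0.8),
    Memory("asked about billing", importance=0.4, created_at=now - 7 * 86_400, similarity=0.9),
]
# Recent, important memory outranks the more similar but stale one
ranked = sorted(memories, key=lambda m: hybrid_score(m, now), reverse=True)
```

The weights are a tuning knob: raising `w_rec` biases retrieval toward short-term context, raising `w_sem` toward topical relevance.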
Reasoning Optimization
Advanced reasoning optimization for continuously improving agents:
from reasoning import ReasoningOptimizer, TrajectoryOptimizer, CoTDistiller
# Initialize optimizer
optimizer = ReasoningOptimizer(agent)
# Optimize execution
result = await optimizer.optimize(input_data)
# Learn from execution
await optimizer.learn_from_execution(
    trajectory=execution_trajectory,
    reasoning_chain=agent_reasoning,
    reward=0.85  # Reward signal
)
# Get optimization report
report = optimizer.get_optimization_report()
Features:
- Trajectory Optimization - Learn optimal action sequences from past executions
- CoT Distillation - Compress reasoning chains (50%+ token savings)
- Feedback Loops - Critique → Revise → Retry for self-improvement
- Policy Constraints - Enforce safety, cost, and latency guardrails
- RL-Style Tuning - Optimize hyperparameters via reward signals
See the Reasoning Optimization Guide for complete documentation
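The Critique → Revise → Retry feedback loop can be sketched as a plain control structure; here the `critique` and `revise` callables stand in for LLM calls, and the threshold and round limit are arbitrary choices, not framework defaults:

```python
from typing import Callable

def feedback_loop(answer: str,
                  critique: Callable[[str], float],
                  revise: Callable[[str, float], str],
                  threshold: float = 0.8,
                  max_rounds: int = 3) -> str:
    """Revise and retry until the critic's score clears the threshold
    or the round budget runs out."""
    for _ in range(max_rounds):
        score = critique(answer)
        if score >= threshold:
            break
        answer = revise(answer, score)
    return answer

# Toy stand-ins: score by length, "revise" by elaborating.
result = feedback_loop(
    "short",
    critique=lambda a: min(len(a) / 20, 1.0),
    revise=lambda a, s: a + " (elaborated)",
)
```

Capping `max_rounds` is what keeps a self-improvement loop from spending unbounded tokens when the critic never becomes satisfied.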
Prompt Optimization (DSPy + TextGrad)
Advanced prompt optimization using DSPy for task prompts and TextGrad for system prompts:
from optimization import PromptOptimizer, OptimizationPipeline
# Initialize optimizer
optimizer = PromptOptimizer()
# Optimize system prompt with TextGrad
system_result = await optimizer.optimize(
    prompt="You are a fraud detection expert.",
    prompt_type="system",
    evaluation_data=eval_data,
    objective="Maximize accuracy while being concise"
)
# Optimize task prompt with DSPy
task_result = await optimizer.optimize(
    prompt="Classify the transaction",
    prompt_type="task",
    training_data=train_data,
    task="fraud_detection"
)
# Run full optimization pipeline
pipeline = OptimizationPipeline()
result = await pipeline.run(
    agent_config=agent_config,
    training_data=train_data,
    evaluation_data=eval_data,
    stages=["system", "task", "test"]
)
# A/B test variants
from optimization import ABTestFramework
framework = ABTestFramework()
test_result = await framework.run_test(
    variants=[baseline, optimized],
    test_data=test_cases
)
Features:
- DSPy Integration - Few-shot learning for task prompts
- TextGrad Optimization - Gradient-based system prompt refinement
- Multi-Stage Pipelines - System → Task → A/B Test
- Statistical Testing - Confidence intervals and significance
- Unity Catalog Integration - Auto-versioning of optimized prompts
- Performance Tracking - Optimization history and metrics
See the Optimization Guide for complete documentation
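For the statistical testing step, a standard two-proportion z-test is one way to decide whether an optimized variant significantly beats the baseline. The `ABTestFramework` internals are not shown in this document; the sketch below is just the textbook statistic, with made-up counts:

```python
import math

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z statistic for H0: the two variants have equal success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # pooled success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Baseline prompt: 70/100 correct; optimized prompt: 85/100 correct
z = two_proportion_z(70, 100, 85, 100)
significant = abs(z) > 1.96  # two-sided test at 95% confidence
```

With these counts z is roughly 2.5, so the improvement would clear the 95% bar; with smaller samples the same 15-point gap might not.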
Databricks-Native Visualization
Built-in observability and debugging for Databricks notebooks:
from visualization import DatabricksVisualizer
# Works natively in Databricks notebooks
viz = DatabricksVisualizer()
# Execution graph (Mermaid diagram)
viz.show_execution_graph(trace)
# Timeline (Plotly chart)
viz.show_timeline(trace)
# Tool call replay
viz.show_tool_calls(tool_calls)
# Decision inspection
viz.explain_decision(decision, context)
# Log to MLflow
viz.log_to_mlflow(trace)
# Create interactive widget
create_databricks_widget(trace)
Features:
- Execution Graphs - Mermaid diagrams showing agent workflow
- Timeline Visualization - Plotly charts for execution timing
- Tool Call Replay - Interactive tool call inspection
- Decision Explainer - "Why did the agent do this?"
- Prompt Comparison - Side-by-side version diffs
- MLflow Integration - Auto-log visualizations to MLflow
- Databricks Widgets - Interactive notebook controls
Designed for Databricks:
- Uses displayHTML() for native rendering
- Integrates with MLflow UI
- Works with Databricks widgets
- Also works in Jupyter/standalone
See the Visualization Guide for complete documentation
Why Use This Template?
Universal Design - Works for any domain, not just fraud detection
Plug-and-Play - 3 lines to integrate into existing pipelines
Configuration-Driven - Enable/disable agents via YAML, zero code changes
SLA-Aware - Control inline vs async execution based on your requirements
Production-Ready - Battle-tested patterns, not toy examples
Complete Stack - Includes telemetry, evaluation, optimization, deployment
Template Generator - Scaffold new projects in seconds
Built-in Benchmarking - Comprehensive eval suite with leaderboards
Architecture Overview
This project implements a domain-agnostic, plug-and-play agent framework that integrates into existing data pipelines with minimal code changes. The architecture leverages:
- Ephemeral Agents: Task-specific narrative agents that spin up on-demand
- Hot LLM Pools: Always-on GPU endpoints via Databricks Model Serving
- Prompt Optimization: DSPy for task prompts, TextGrad for system prompts
- Memory & Context: Lakebase for conversation history and embeddings
- MCP Tool Calling: Standardized tool interfaces via Model Context Protocol
- Observability: OTEL → Zerobus → Delta Lake telemetry pipeline
- Evaluation: MLflow custom scorers and continuous feedback loops
Key Features
Plug-and-Play Integration - Add to existing pipelines with 3 lines of code
Configuration-Driven - Enable/disable agents via YAML, no code changes
LangGraph Orchestration - Plan → Act → Critique → Re-plan loops for autonomous workflows
SLA-Aware Execution - Control inline vs offline based on requirements
Type-Safe - Pydantic schemas validate all data at runtime
ASGI Support - FastAPI endpoints, SSE streaming, async HTTP
Agent-to-Agent (A2A) - Event-driven agent communication via NATS/Redis (optional)
Domain-Agnostic - Works for fraud, risk, support, compliance, or any use case
Prompt Optimization - DSPy for task prompts, TextGrad for system prompts
Comprehensive Telemetry - All events streamed to Delta Lake via Zerobus
Memory Management - Lakebase for vector embeddings and conversation history
MCP Tool Integration - Standardized external tool calling (v1.25.0+)
MLflow Tracking - Experiment tracking, evaluation, and model registry
Unity Catalog - Centralized prompt and model versioning
Multi-Tenant Ready - Schema adapters handle any customer format
Agent Benchmarking - Multi-metric eval suite with auto-generated leaderboards
Agent-Governed Memory - Intelligent storage, retrieval, reflection, and forgetting
Reasoning Optimization - Trajectory tuning, CoT distillation, feedback loops, RL-style tuning
Databricks-Native Visualization - Execution graphs, timelines, tool replay, decision inspection
YAML-Configurable - All infrastructure and runtime settings via unified YAML
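The "Multi-Tenant Ready" point above relies on schema adapters that map each customer's field names onto the canonical schema. A minimal illustration of the idea (the tenant names, field names, and `FIELD_MAPS` shape are invented for this sketch; the framework's actual adapter API lives in shared/adapters/):

```python
from typing import Any

# Per-tenant mapping: canonical field name -> customer's field name (hypothetical)
FIELD_MAPS: dict[str, dict[str, str]] = {
    "acme":   {"transaction_id": "txn_ref", "amount": "amt_usd"},
    "globex": {"transaction_id": "id",      "amount": "value"},
}

def adapt(tenant: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename tenant-specific fields to the canonical schema so every
    downstream agent sees one consistent shape."""
    mapping = FIELD_MAPS[tenant]
    return {canonical: record[source] for canonical, source in mapping.items()}

canonical = adapt("acme", {"txn_ref": "t-1", "amt_usd": 42.0})
```

Keeping the mapping in data (here a dict, in the framework a YAML adapter config) means onboarding a new customer format needs no agent code changes.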
Project Structure
.
├── agents/            # Agent framework (CORE)
│   ├── base.py        # - Base agent interfaces
│   ├── config.py      # - Configuration loader
│   ├── registry.py    # - Agent registry + router
│   └── execution/     # - Pluggable execution backends
├── shared/            # Shared libraries
│   ├── schemas/       # - Pydantic data models (type-safe)
│   └── adapters/      # - Schema adaptation framework
├── config/            # Configuration (plug-and-play)
│   ├── agents/        # - Agent configurations (YAML)
│   └── adapters/      # - Customer schema adapters
├── services/          # Deployable services
├── optimization/      # Prompt optimization (DSPy/TextGrad)
├── memory/            # Lakebase integration
├── orchestration/     # Databricks Workflows + LangGraph
├── mcp-servers/       # Model Context Protocol tools
├── evaluation/        # MLflow scorers and metrics
├── telemetry/         # OTEL → Zerobus → Delta
├── uc-registry/       # Unity Catalog integration
├── data/              # Synthetic testbed
├── infrastructure/    # Deployment configs (DABS)
├── experiments/       # Notebooks + MLflow tracking
├── tests/             # Unit, integration, load tests
└── docs/              # Documentation
See Project Structure for detailed breakdown with key concepts.
Data Schemas
All data structures are defined using Pydantic models in shared/schemas/:
- transactions.py - Transaction records and payment data
- fraud_signals.py - Velocity, amount, location, device signals
- contexts.py - Merchant and customer profiles
- agent_io.py - Agent inputs, outputs, tool calls (MCP-ready)
- evaluation.py - Evaluation records and scorer metrics
- telemetry.py - OTEL traces for Zerobus ingestion
See shared/schemas/README.md for detailed documentation.
Quick Start (Plug-and-Play)
Add agents to your existing pipeline in 3 lines:
from agents import AgentRouter
from shared.schemas import AgentInput
# 1. Load agents from config (one line!)
router = AgentRouter.from_yaml("config/agents.yaml")
# 2. Convert your data to AgentInput (Pydantic validates!)
agent_input = AgentInput(
    request_id=record.id,
    data=YourDomainData(**record.dict()),  # Your domain-specific data
    # ... your contexts
)
# 3. Route to agent (inline or offline based on config!)
result = await router.route("your_agent", agent_input)
# That's it! Agent runs according to your config.
# No code changes to enable/disable or switch execution modes.
Configuration controls everything:
# config/agents.yaml
agents:
  your_agent:
    class: "your_package.YourAgent"
    execution_mode: "offline"  # or "inline" if SLA allows
    enabled: true              # Change to false to disable
    timeout: 30
Works for any domain: Fraud detection, risk analysis, customer support, compliance, content moderation, etc.
See Configuration System for details.
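How such a config drives execution can be sketched with the parsed YAML as a plain dict (as PyYAML would produce it). The dispatch logic below is illustrative only; the real `AgentRouter` may behave differently:

```python
# Parsed form of config/agents.yaml
config = {
    "agents": {
        "your_agent": {
            "class": "your_package.YourAgent",
            "execution_mode": "offline",
            "enabled": True,
            "timeout": 30,
        }
    }
}

def route(agent_name: str, payload: dict) -> dict:
    """Dispatch per config: skip disabled agents, pick the execution mode."""
    cfg = config["agents"][agent_name]
    if not cfg["enabled"]:
        return {"status": "skipped", "reason": "agent disabled"}
    mode = cfg["execution_mode"]
    # inline -> run now and block the caller; offline -> enqueue for async work
    return {"status": "queued" if mode == "offline" else "completed",
            "agent": agent_name, "timeout": cfg["timeout"]}

result = route("your_agent", {"request_id": "r-1"})
```

Because the decision is read from config at call time, flipping `enabled` or `execution_mode` in YAML changes behavior with no code edits, which is the plug-and-play claim above.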
Getting Started
Prerequisites
- Python 3.9+
- Databricks workspace with:
- Model Serving endpoint
- Unity Catalog
- Lakebase access
- Zerobus server endpoint (for telemetry)
Installation
# Clone the repository
git clone <repo-url>
cd "SOTA Agent"
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Or install in development mode
pip install -e ".[dev]"
Configuration
# Copy example config
cp .env.example .env
# Edit .env with your Databricks credentials
# - DATABRICKS_HOST
# - DATABRICKS_TOKEN
# - MODEL_SERVING_ENDPOINT
# - UNITY_CATALOG_NAME
# - ZEROBUS_ENDPOINT
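A hedged sketch of reading those settings at startup with only the standard library (the variable names follow the `.env` above; failing fast on missing values is a design choice of this sketch, not documented framework behavior):

```python
import os

REQUIRED = [
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "MODEL_SERVING_ENDPOINT",
    "UNITY_CATALOG_NAME",
    "ZEROBUS_ENDPOINT",
]

def load_settings(env=os.environ) -> dict:
    """Collect required settings, raising early if any are missing
    so misconfiguration surfaces at startup rather than mid-request."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {k: env[k] for k in REQUIRED}

# Demo with dummy values; in practice the .env is loaded into os.environ
settings = load_settings({k: f"dummy-{k.lower()}" for k in REQUIRED})
```

In a real deployment a loader like python-dotenv would populate `os.environ` from the `.env` file before this runs.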
Databricks Stack
| Component | Technology |
|---|---|
| LLM Inference | Databricks Model Serving |
| Orchestration | LangGraph + Databricks Workflows |
| Tracing & Evaluation | Databricks MLflow |
| Memory/Vector Store | Lakebase |
| Telemetry Sink | Zerobus → Delta Lake |
| Prompt Registry | Unity Catalog |
| Dashboards | Databricks SQL |
| Compute | Databricks Clusters / Serverless |
Development
Run Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=. --cov-report=html
# Run specific test suite
pytest tests/unit/
pytest tests/integration/
Code Quality
# Format code
black .
# Lint
ruff check .
# Type check
mypy .
Architecture Flows
Realtime Path (Low-latency)
Transaction → Event Collector → Ephemeral Narrative Agent → MCP Tool Calls → LLM Pool → Risk Narrative → Dashboard/Alerts
Async Path (Optimization)
MLflow Scorers → Evaluate High-Risk Txns → Log Metrics → DSPy/TextGrad Optimization → Update Prompts in UC → Deploy to Agents
MCP Integration
All tool calls use Model Context Protocol for standardization:
# Tool call schema (MCP-ready)
tool_call = ToolCall(
    tool_id="call_123",
    tool_name="merchant_context",
    tool_server="uc-query-server",
    arguments={"merchant_id": "mch_001"}
)
# Tool result
tool_result = ToolResult(
    tool_call_id="call_123",
    success=True,
    result=merchant_data,
    latency_ms=45.2
)
See mcp-servers/ for tool implementations.
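A simplified, standard-library rendering of the `ToolCall`/`ToolResult` shapes above. The real schemas in shared/schemas/agent_io.py are Pydantic models; this dataclass sketch only mirrors the fields shown in the example:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ToolCall:
    tool_id: str
    tool_name: str
    tool_server: str                           # which MCP server handles the call
    arguments: dict[str, Any] = field(default_factory=dict)

@dataclass
class ToolResult:
    tool_call_id: str                          # links the result back to its call
    success: bool
    result: Any = None
    latency_ms: Optional[float] = None

call = ToolCall("call_123", "merchant_context", "uc-query-server",
                {"merchant_id": "mch_001"})
res = ToolResult(call.tool_id, success=True,
                 result={"name": "Acme"}, latency_ms=45.2)
```

The `tool_call_id` back-reference is what lets asynchronous tool results be matched to their originating calls in telemetry and replay.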
Telemetry
All events flow through OTEL → Zerobus → Delta Lake:
- Agent start/complete/error
- MCP tool calls
- LLM requests/responses
- Stream chunks
- Evaluation results
Query telemetry in Unity Catalog:
SELECT * FROM main.telemetry.agent_traces
WHERE transaction_id = 'txn_123'
ORDER BY timestamp DESC;
Prompt Optimization
DSPy (Task Prompts)
# Optimize reasoning pipeline
from optimization.dspy import MIPROOptimizer
optimizer = MIPROOptimizer(training_data)
optimized_prompt = optimizer.optimize(baseline_prompt)
TextGrad (System Prompts)
# Optimize system prompt with guardrails
from optimization.textgrad import SystemPromptOptimizer
optimizer = SystemPromptOptimizer(feedback_data)
optimized_system = optimizer.optimize(system_prompt)
Synthetic Data
Generate idempotent test data:
# Generate synthetic transactions
python -m data.synthetic.generate --seed 42 --count 5000
# Output: data/synthetic/raw/transactions.parquet
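"Idempotent" here means the same seed always yields the same records. A minimal sketch of that property (the field names below are illustrative, not the generator CLI's actual output schema):

```python
import random

def generate_transactions(seed: int, count: int) -> list[dict]:
    """Deterministic synthetic transactions: same seed -> same output."""
    rng = random.Random(seed)  # isolated RNG, unaffected by global random state
    return [
        {
            "transaction_id": f"txn_{i:05d}",
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "is_fraud": rng.random() < 0.02,  # ~2% positive class
        }
        for i in range(count)
    ]

a = generate_transactions(seed=42, count=5)
b = generate_transactions(seed=42, count=5)
assert a == b  # idempotent across runs
```

Using an instance-level `random.Random(seed)` rather than the module-level functions keeps the generator reproducible even if other code reseeds the global RNG.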
Contributing
- Create a feature branch
- Make changes with tests
- Run linters and tests
- Submit pull request
License
MIT
Documentation
Start Here
- Getting Started - 5-minute quick start guide
- Template Guide - Comprehensive guide for any domain
- Cross-Domain Examples - 8 real-world examples
- Documentation Index - Complete documentation map
Core Documentation
- Project Structure - Code organization and key concepts
- Configuration System - YAML-based configuration
- Schema Documentation - Data schemas and adaptation
- Use Cases - Advanced usage patterns
Tools
- Template Generator - python template_generator.py --help
- Example Integrations - examples/plug_and_play_integration.py
Contact
For questions, see docs/ or contact the team.