
AgentV2 - Production-Ready Python Agent Framework

A modular, production-ready Python framework for building autonomous AI agents that can plan, validate, and execute complex tasks using LLMs and custom tools.

🎯 What is This Project?

AgentV2 is a deterministic, production-grade agent framework that separates concerns into distinct components:

  • Planner: Generates structured todo lists from natural language tasks
  • Validator: Ensures todos meet quality and domain-specific requirements
  • Executor: Deterministically executes todos one at a time using LLM-guided actions
  • Agent: High-level orchestrator that coordinates the entire workflow
  • Session Memory: Lightweight caching and context management for conversational agents

The framework enforces strict architectural boundaries, ensuring predictable execution, no silent failures, and deterministic outcomes.

๐Ÿ—๏ธ Architecture & Logic

High-Level Flow

graph TD
    A[User Task] --> B[Agent.run]
    B --> C[Planner]
    C --> D[Generate Todos]
    D --> E[Validator]
    E --> F{Valid?}
    F -->|No| G[Auto-Rewrite]
    G --> E
    F -->|Yes| H[Executor]
    H --> I[Execute Todos]
    I --> J[LLM Proposes Action]
    J --> K{Action Type}
    K -->|tool| L[Execute Tool]
    K -->|complete_todo| M[Mark Complete]
    K -->|think| N[Update State]
    L --> O[Update Memory]
    M --> O
    N --> O
    O --> P{All Done?}
    P -->|No| I
    P -->|Yes| Q[Summarize]
    Q --> R[Final Reply]
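The flow above can be read as a short orchestration function. The sketch below is illustrative only and is not AgentV2's `Agent.run`. `run_task` and the callables it receives (`plan`, `validate`, `execute_todo`, `summarize`) are hypothetical stand-ins for the Planner, the Validator, the Executor and the summarization step. The toy lambdas replace the LLM calls.

```python
from typing import Callable, List


def run_task(
    task: str,
    plan: Callable[[str], List[str]],
    validate: Callable[[List[str]], List[str]],
    execute_todo: Callable[[str], str],
    summarize: Callable[[str, List[str]], str],
) -> str:
    """Plan -> validate -> execute each todo in order -> summarize."""
    todos = validate(plan(task))  # validate() raises if the plan cannot be accepted
    results: List[str] = []
    for todo in todos:  # strictly sequential; the plan is not modified here
        results.append(execute_todo(todo))
    return summarize(task, results)


# Toy stand-ins for the LLM-backed components (hypothetical):
reply = run_task(
    "Add 5 and 10",
    plan=lambda task: ["compute 5 + 10"],
    validate=lambda todos: todos,
    execute_todo=lambda todo: str(5 + 10),
    summarize=lambda task, results: f"{task} -> {results[-1]}",
)
print(reply)  # Add 5 and 10 -> 15
```

Each component is passed in as a callable, so it can be swapped out or tested on its own. This matches the separation of concerns shown in the diagram.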

Component Responsibilities

graph LR
    subgraph "Agent (Orchestrator)"
        A1[Task Input] --> A2[Plan]
        A2 --> A3[Validate]
        A3 --> A4[Execute]
        A4 --> A5[Summarize]
    end
    
    subgraph "Planner"
        P1[Task] --> P2[LLM Generate]
        P2 --> P3[TodoList]
    end
    
    subgraph "Validator"
        V1[TodoList] --> V2[Base Rules]
        V2 --> V3[Domain Rules]
        V3 --> V4[Quality Score]
        V4 --> V5{Pass?}
        V5 -->|No| V6[Rewrite]
        V6 --> V1
        V5 -->|Yes| V7[Validated]
    end
    
    subgraph "Executor"
        E1[TodoList] --> E2[Iterate Todos]
        E2 --> E3[LLM Action]
        E3 --> E4{Action}
        E4 -->|tool| E5[Call Tool]
        E4 -->|complete| E6[Mark Done]
        E4 -->|think| E7[Update State]
        E5 --> E8[Update Memory]
        E6 --> E8
        E7 --> E8
        E8 --> E9{Next?}
        E9 -->|Yes| E2
        E9 -->|No| E10[Done]
    end
    
    A2 --> P1
    A3 --> V1
    A4 --> E1
    V7 --> A4
    E10 --> A5

Session Memory Flow

sequenceDiagram
    participant U as User
    participant A as Agent
    participant S as SessionStore
    participant C as Cache
    
    U->>A: chat("What's the weather?")
    A->>S: get(session_id)
    S->>C: check_cache(normalized_task)
    alt Cache Hit
        C-->>A: cached_reply
        A-->>U: cached_reply (no LLM call)
    else Cache Miss
        A->>A: plan + execute
        A->>C: cache_reply(task, reply)
        A-->>U: final_reply
    end
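Here is a minimal sketch of the cache-hit / cache-miss branch above, assuming an in-process dictionary. `SessionCache`, `normalize_task` and `chat` are hypothetical names and are not the API of `src/session_store.py`. The README only says that tasks are normalized and matched exactly, so the normalization used here (lowercasing and collapsing whitespace) is an assumption.

```python
import re
from typing import Callable, Dict, Optional


def normalize_task(task: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", task.strip().lower())


class SessionCache:
    """Exact-match reply cache, kept separately for each session_id (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}

    def get(self, session_id: str, task: str) -> Optional[str]:
        return self._store.get(session_id, {}).get(normalize_task(task))

    def put(self, session_id: str, task: str, reply: str) -> None:
        self._store.setdefault(session_id, {})[normalize_task(task)] = reply


def chat(cache: SessionCache, session_id: str, task: str,
         run_agent: Callable[[str], str]) -> str:
    """Serve from the cache when possible; otherwise run the agent and cache the reply."""
    cached = cache.get(session_id, task)
    if cached is not None:
        return cached  # cache hit: no planning, no LLM call
    reply = run_agent(task)  # cache miss: plan + execute
    cache.put(session_id, task, reply)
    return reply


calls = []


def fake_agent(task: str) -> str:
    calls.append(task)
    return f"answer to {task}"


cache = SessionCache()
chat(cache, "s1", "What's the weather?", fake_agent)
chat(cache, "s1", "  what's the   WEATHER? ", fake_agent)  # cache hit
chat(cache, "s2", "What's the weather?", fake_agent)         # other session: miss
print(len(calls))  # 2
```

The cache is keyed by `session_id` first. Because of this, identical questions in two sessions never share a reply.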

🚀 Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd AgentV2

# Install dependencies (using uv)
uv sync

# Or using pip
pip install -r requirements.txt

Environment Setup

Create a .env file:

GROQ_API_KEY=your_groq_api_key_here
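This README does not say whether AgentV2 loads `.env` by itself. The sketch below therefore loads the file explicitly and fails fast when the key is missing. Both `load_env_file` and `require_env` are hypothetical helpers. `load_env_file` is a minimal stdlib stand-in for a loader such as python-dotenv and handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Load simple KEY=VALUE lines into os.environ without overriding existing values."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_env(name: str) -> str:
    """Fail fast with a clear message instead of a late authentication error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value
```

Call `load_env_file()` and then `require_env("GROQ_API_KEY")` once at startup, before constructing the `Agent`.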

Basic Usage

Example 1: Simple Task Execution

from src.agent import Agent
import uuid

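# Define tools (the task below needs both add and multiply)
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b

def multiply(a: int, b: int) -> int:
    """Return the product of a and b."""
    return a * b

tools = {"add": add, "multiply": multiply}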

# Create agent
agent = Agent(
    model="groq/openai/gpt-oss-120b",
    system_prompt="You are an autonomous execution agent.",
    session_id=f"session-{uuid.uuid4().hex[:8]}",
    tools=tools,
)

# Run a task
result = agent.run("Add 5 and 10, then multiply by 2")
print(result.final_reply)

Example 2: Chat Agent with Web Search

from src.agent import Agent
from ddgs import DDGS
import uuid

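def format_results(results: list[dict]) -> str:
    """Render DDGS text results (title/href/body dicts) as plain text for the LLM."""
    if not results:
        return "No results found."
    return "\n\n".join(
        f"{r.get('title', '')}\n{r.get('href', '')}\n{r.get('body', '')}" for r in results
    )

def web_search(query: str) -> str:
    """Search the web with DuckDuckGo and return the top 5 results as text."""
    with DDGS() as ddgs:
        results = list(ddgs.text(query, max_results=5))
    return format_results(results)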

tools = {"web_search": web_search}

agent = Agent(
    model="groq/openai/gpt-oss-120b",
    system_prompt="You are a helpful assistant with web search.",
    session_id=f"chat-{uuid.uuid4().hex[:8]}",
    tools=tools,
)

# Use chat API (with session memory)
reply = agent.chat("What's the latest news about AI?")
print(reply)

Example 3: File Operations Agent

from src.agent import Agent
from pathlib import Path
import uuid

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"Wrote {len(content)} characters to {path}"

tools = {
    "read_file": read_file,
    "write_file": write_file,
}

agent = Agent(
    model="groq/openai/gpt-oss-120b",
    system_prompt="You are a file operations agent.",
    session_id=f"fileops-{uuid.uuid4().hex[:8]}",
    tools=tools,
    domain_validator=None,  # Disable domain validation for file ops
)

result = agent.run("Create a hello.py file that prints 'Hello, World!'")
print(result.final_reply)

🎨 Key Features

1. Deterministic Execution

  • No unbounded loops
  • Strict step limits per todo
  • Predictable outcomes
  • No silent failures
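The step-limit guarantee can be sketched as a `for` loop over a fixed budget instead of a `while True` loop. `MAX_STEPS_PER_TODO` is named in the execution-phase description of this README, but the value used here (5) is an assumption. `run_todo`, `StepLimitExceeded` and the `propose_action(todo, step)` callback are hypothetical.

```python
from typing import Callable

MAX_STEPS_PER_TODO = 5  # assumed value for illustration


class StepLimitExceeded(RuntimeError):
    """Raised when a todo does not finish within its step budget."""


def run_todo(todo: str, propose_action: Callable[[str, int], str]) -> int:
    """Request actions until 'complete_todo'; return the number of steps used."""
    for step in range(1, MAX_STEPS_PER_TODO + 1):
        action = propose_action(todo, step)
        if action == "complete_todo":
            return step
        if action == "fail_todo":
            raise RuntimeError(f"todo failed at step {step}: {todo!r}")
        # 'think', 'tool' and 'noop' each consume one step, then loop again.
    # The budget is exhausted: fail loudly instead of looping forever.
    raise StepLimitExceeded(f"{todo!r} exceeded {MAX_STEPS_PER_TODO} steps")


steps = run_todo("add 5 and 10", lambda todo, step: "complete_todo" if step == 3 else "think")
print(steps)  # 3
```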

2. Session Memory

  • Automatic caching of exact-match tasks
  • Context injection across multiple turns
  • Lightweight, token-efficient
  • Session-based isolation

3. Strict Validation

  • Base validation (action verbs, length, forbidden phrases)
  • Domain-specific validation (backend/frontend/data)
  • Quality scoring (0.0-1.0)
  • Auto-rewrite on failure (bounded attempts)
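Here is a rough sketch of base validation with a quality score. The README names the kinds of rules (action verbs, length, forbidden phrases) but not their contents or weighting. The verb list, phrase list, word limits and equal-weight score below are therefore all assumptions. `validate_todo` is a hypothetical function and is not the code in `utils/validators.py`.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Illustrative rule values only; AgentV2's real lists are not shown in this README.
ACTION_VERBS = {"create", "write", "read", "add", "compute", "fetch", "search", "update", "test"}
FORBIDDEN_PHRASES = ("etc", "and so on", "as needed", "somehow")
MIN_WORDS, MAX_WORDS = 3, 20


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)
    score: float = 0.0

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_todo(text: str) -> ValidationResult:
    """Check base rules; score = fraction of rules passed (0.0-1.0)."""
    lowered = text.lower()
    words = lowered.split()
    checks = {
        "starts with an action verb": bool(words) and words[0] in ACTION_VERBS,
        f"has {MIN_WORDS}-{MAX_WORDS} words": MIN_WORDS <= len(words) <= MAX_WORDS,
        # Whole-word match, so 'fetch' does not trip the 'etc' rule.
        "avoids vague phrases": not any(
            re.search(rf"\b{re.escape(p)}\b", lowered) for p in FORBIDDEN_PHRASES
        ),
    }
    errors = [rule for rule, passed in checks.items() if not passed]
    return ValidationResult(errors=errors, score=(len(checks) - len(errors)) / len(checks))


print(validate_todo("Handle stuff etc").errors)
# ['starts with an action verb', 'avoids vague phrases']
```

A real validator could weight the rules unevenly, or set a minimum score that triggers a rewrite. This README does not specify either choice.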

4. Tool Sandboxing

  • Tools provided as callables
  • Validated before execution
  • Exceptions propagate as RuntimeError
  • Results stored in authoritative memory
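This README does not show how AgentV2 validates tool calls. The sketch below uses a hypothetical `call_tool` function. It checks the tool name, then checks the arguments against the function's signature with `inspect.signature(...).bind`, and finally wraps any exception in `RuntimeError`, as the list above describes. Binding checks argument names and count, not types. The tool also runs in the same process, so this is call validation, not OS-level isolation.

```python
import inspect
from typing import Any, Callable, Dict


def call_tool(tools: Dict[str, Callable[..., Any]], name: str, args: Dict[str, Any]) -> Any:
    """Validate a proposed tool call, run it, and surface failures as RuntimeError."""
    if name not in tools:
        raise RuntimeError(f"unknown tool {name!r}; available: {sorted(tools)}")
    fn = tools[name]
    try:
        # Check argument names and arity against the signature before executing.
        inspect.signature(fn).bind(**args)
    except TypeError as exc:
        raise RuntimeError(f"bad arguments for {name!r}: {exc}") from exc
    try:
        return fn(**args)
    except Exception as exc:  # any tool error becomes an explicit RuntimeError
        raise RuntimeError(f"tool {name!r} failed: {exc}") from exc


print(call_tool({"add": lambda a, b: a + b}, "add", {"a": 5, "b": 10}))  # 15
```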

5. Rich Logging

  • Structured, box-formatted logs
  • Clear visual separation
  • Execution stats and progress tracking
  • Error reporting with context

🔧 Architecture Constraints

The framework enforces strict boundaries:

  • Planner decides todos - Executor never modifies the plan
  • LLM proposes actions - Only via AgentState schema
  • Memory is authoritative - Executor enforces all invariants
  • No retries in Agent - Failures propagate immediately
  • Tools are sandboxed - Validated and isolated
  • Deterministic execution - Same input → same output
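One reading of "memory is authoritative" and "the executor never modifies the plan" is that todo progress changes only through a method that checks invariants. The `Memory` class below is a hypothetical simplification and is not the real `AgentMemory` schema. The plan is stored as a tuple, and `complete()` rejects any todo other than the current one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Memory:
    """Hypothetical authoritative memory: the only place todo progress changes."""
    todos: Tuple[str, ...]  # the plan is a tuple, so nothing downstream can edit it
    results: List[str] = field(default_factory=list)

    @property
    def current_index(self) -> Optional[int]:
        """Index of the next unfinished todo, or None when all are done."""
        i = len(self.results)
        return i if i < len(self.todos) else None

    def complete(self, index: int, result: str) -> None:
        # Invariant: only the current todo may be completed, exactly once and in
        # order, so an LLM proposal cannot skip, repeat, or invent todos.
        if index != self.current_index:
            raise RuntimeError(
                f"cannot complete todo {index}; current todo is {self.current_index}"
            )
        self.results.append(result)


memory = Memory(todos=("add 5 and 10", "multiply by 2"))
memory.complete(0, "15")
print(memory.current_index)  # 1
```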

📁 Project Structure

AgentV2/
├── src/
│   ├── agent.py          # High-level orchestrator
│   ├── planner.py        # Todo generation
│   ├── executor.py       # Deterministic execution
│   └── session_store.py  # Session memory management
├── schemas/
│   ├── AgentMemory.py    # Authoritative memory state
│   ├── AgentState.py     # LLM action proposals
│   ├── TodoSchema.py     # Todo data models
│   └── SessionMemory.py  # Session context model
├── utils/
│   ├── llm.py            # LLM interface (LiteLLM)
│   ├── logger.py         # Rich logging utilities
│   ├── validators.py     # Todo validation & scoring
│   └── Prompts.py        # Prompt template loader
├── prompts/
│   ├── Agent.md          # Execution prompt
│   ├── Todo.md           # Planning prompt
│   ├── FinalReply.md     # Summarization prompt
│   └── TodoRewrite.md    # Rewrite prompt
├── main.py               # Example: Basic tools
├── main2.py              # Example: Chat agent
├── main3.py              # Example: File operations
└── README.md

💡 Use Cases

1. Task Automation

  • Break down complex tasks into executable steps
  • Execute multi-step workflows deterministically
  • Handle file operations, API calls, data processing

2. Conversational Agents

  • Chat interfaces with web search
  • Context-aware responses
  • Caching for repeated queries
  • Session-based memory

3. Code Generation & File Operations

  • Generate code files from descriptions
  • Read and modify existing files
  • Execute and test generated code
  • Create full-stack applications

4. Data Processing Pipelines

  • Extract, transform, and load data
  • Validate and clean datasets
  • Generate reports and summaries

5. API Integration Agents

  • Interact with external APIs
  • Process web search results
  • Aggregate information from multiple sources

6. Development Assistants

  • Generate boilerplate code
  • Refactor existing codebases
  • Write tests and documentation
  • Debug and fix issues

🛠️ Customization

Adding Custom Tools

def my_custom_tool(param1: str, param2: int) -> str:
    """Tool description for the LLM."""
    # Your logic here
    return "result"

tools = {
    "my_custom_tool": my_custom_tool,
}
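This README does not describe how AgentV2 presents tools to the model. One common approach is suggested by the `"""Tool description for the LLM."""` docstring above: build a description from each function's name, signature and docstring. `describe_tools` is a hypothetical helper that shows this with the standard `inspect` module.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tools(tools: Dict[str, Callable[..., Any]]) -> str:
    """Render one line per tool: name(signature) - first docstring line."""
    lines = []
    for name, fn in tools.items():
        doc = inspect.getdoc(fn) or "No description."
        lines.append(f"{name}{inspect.signature(fn)} - {doc.splitlines()[0]}")
    return "\n".join(lines)


def my_custom_tool(param1: str, param2: int) -> str:
    """Tool description for the LLM."""
    return "result"


print(describe_tools({"my_custom_tool": my_custom_tool}))
# my_custom_tool(param1: str, param2: int) -> str - Tool description for the LLM.
```

Type hints and a clear first docstring line are therefore worth writing for every tool. They cost nothing and give the model something concrete to work with.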

Custom Domain Validators

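from src.agent import Agent
from schemas.TodoSchema import TodoItemInput  # assumed location of the todo model
from utils.validators import DomainTodoValidator

class MyDomainValidator(DomainTodoValidator):
    FORBIDDEN = ["forbidden_term1", "forbidden_term2"]

    def validate(self, todo: TodoItemInput) -> None:
        # Your validation logic (validate() returns None, so presumably
        # a todo is rejected by raising an exception)
        pass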

agent = Agent(
    ...,
    domain_validator=MyDomainValidator(),
)
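Here is a standalone sketch of how a `FORBIDDEN` list might be enforced. `DomainValidatorBase`, `FrontendValidator` and the `Todo` dataclass are hypothetical stand-ins for `DomainTodoValidator` and `TodoItemInput`, whose real fields are not shown. Whole-word, case-insensitive matching is an assumed rule.

```python
import re
from dataclasses import dataclass
from typing import ClassVar, List


@dataclass
class Todo:
    """Stand-in for TodoItemInput; the real model's fields may differ."""
    title: str


class DomainValidatorBase:
    """Hypothetical stand-in for DomainTodoValidator: subclasses only set FORBIDDEN."""
    FORBIDDEN: ClassVar[List[str]] = []

    def validate(self, todo: Todo) -> None:
        for term in self.FORBIDDEN:
            if re.search(rf"\b{re.escape(term)}\b", todo.title, re.IGNORECASE):
                # Raise instead of returning False: no silent failures.
                raise ValueError(
                    f"{type(self).__name__}: forbidden term {term!r} in {todo.title!r}"
                )


class FrontendValidator(DomainValidatorBase):
    FORBIDDEN = ["jquery", "inline styles"]


validator = FrontendValidator()
validator.validate(Todo("Create a React login form"))  # passes silently
try:
    validator.validate(Todo("Add a jQuery date picker"))
except ValueError as exc:
    print(exc)
```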

Custom Prompts

Edit the markdown files in prompts/:

  • Agent.md - Execution instructions
  • Todo.md - Planning instructions
  • FinalReply.md - Summarization instructions
  • TodoRewrite.md - Rewrite instructions for todos that fail validation

📊 Execution Flow Details

Planning Phase

  1. User provides task description
  2. Planner generates TodoListInput using LLM
  3. Validator checks base rules, domain rules, quality score
  4. Auto-rewrite invalid todos (up to 2 attempts)
  5. Return validated TodoList with UUIDs
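Steps 4 and 5 can be sketched as a bounded rewrite loop followed by UUID assignment. `finalize_plan` and `PlannedTodo` are hypothetical. This README does not say whether the two attempts are counted per todo or per list; this sketch counts them per todo.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, List

MAX_REWRITE_ATTEMPTS = 2  # the "up to 2 attempts" described above


@dataclass(frozen=True)
class PlannedTodo:
    id: str
    text: str


def finalize_plan(
    drafts: List[str],
    is_valid: Callable[[str], bool],
    rewrite: Callable[[str], str],
) -> List[PlannedTodo]:
    """Rewrite each invalid draft at most MAX_REWRITE_ATTEMPTS times, then assign UUIDs."""
    final: List[PlannedTodo] = []
    for text in drafts:
        attempts = 0
        while not is_valid(text):
            if attempts == MAX_REWRITE_ATTEMPTS:
                raise ValueError(f"todo still invalid after {attempts} rewrites: {text!r}")
            text = rewrite(text)  # in AgentV2 this would be an LLM call
            attempts += 1
        final.append(PlannedTodo(id=str(uuid.uuid4()), text=text))
    return final


plan = finalize_plan(["Create a.py", "b.py"], lambda t: t.startswith("Create"), lambda t: "Create " + t)
print([todo.text for todo in plan])  # ['Create a.py', 'Create b.py']
```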

Execution Phase

  1. Executor iterates through todos sequentially
  2. For each todo:
    • LLM proposes AgentState (think/tool/complete_todo/fail_todo/noop)
    • Validate JSON strictly
    • Apply action deterministically
    • Update AgentMemory (authoritative state)
    • Enforce step limits (MAX_STEPS_PER_TODO)
  3. Continue until all todos complete or fail
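AgentV2 validates LLM output with Pydantic. The stdlib sketch below only illustrates what strict validation of a single proposal involves: valid JSON, a known action, and no unexpected fields. The field names `thought`, `tool`, `args` and `result`, and the names `Proposal` and `parse_proposal`, are assumptions and are not the real `AgentState` schema.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

ALLOWED_ACTIONS = {"think", "tool", "complete_todo", "fail_todo", "noop"}
ALLOWED_KEYS = {"action", "thought", "tool", "args", "result"}  # hypothetical field names


@dataclass
class Proposal:
    action: str
    thought: str = ""
    tool: Optional[str] = None
    args: Dict[str, Any] = field(default_factory=dict)
    result: Optional[str] = None


def parse_proposal(raw: str) -> Proposal:
    """Reject anything that is not a well-formed proposal instead of guessing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("proposal must be a JSON object")
    extra = set(data) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "tool" and not isinstance(data.get("tool"), str):
        raise ValueError("a 'tool' action must name the tool to call")
    if not isinstance(data.get("args", {}), dict):
        raise ValueError("'args' must be an object")
    return Proposal(**data)


proposal = parse_proposal('{"action": "tool", "tool": "add", "args": {"a": 5, "b": 10}}')
print(proposal.tool, proposal.args)  # add {'a': 5, 'b': 10}
```

In Pydantic v2, similar strictness usually comes from a model configured with `extra="forbid"` and an `action` field typed as a `Literal` of the five actions.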

Summarization Phase

  1. Collect completed todos and final results
  2. Generate natural language summary
  3. Return final reply to user

🔒 Security & Safety

  • Tool Sandboxing: Tools are provided explicitly as callables and validated before execution
  • Input Validation: All LLM outputs validated with Pydantic
  • Error Handling: No silent failures, all errors propagate
  • Step Limits: Bounded execution prevents infinite loops
  • Session Isolation: Each session_id has isolated memory

📝 Logging

The framework uses Rich for beautiful, structured logging:

  • Box-formatted panels for clear separation
  • Color-coded success/error/warning messages
  • Execution stats tables
  • Todo lists with status indicators
  • Structured logs to files in logs/ directory

🤝 Contributing

This is a production-ready framework with strict architectural constraints. When contributing:

  1. Maintain separation of concerns (Planner/Executor/Agent)
  2. Never add retry logic in Agent
  3. Always validate LLM outputs
  4. Keep execution deterministic
  5. Add tests for new features

📄 License

[Add your license here]

🙏 Acknowledgments


Note: This framework is designed for production use with strict architectural boundaries. Ensure you understand the constraints before extending functionality.

sequenceDiagram
    participant User
    participant Agent
    participant SessionMemory
    participant AgentMemory
    participant Executor
    participant LLM

    User->>Agent: run("task")
    Agent->>SessionMemory: get_cached_reply(normalized_task)
    alt Cache Hit
        SessionMemory-->>Agent: cached_reply
        Agent-->>User: AgentMemory(final_reply=cached_reply)
    else Cache Miss
        Agent->>AgentMemory: new AgentMemory(todos=...)
        Agent->>Executor: run(memory)
        loop For each todo
            Executor->>LLM: propose action
            LLM-->>Executor: AgentState
            Executor->>AgentMemory: append_state(state)
            Executor->>AgentMemory: update last_tool/result
            Executor->>AgentMemory: mark todo complete
        end
        Executor-->>Agent: updated AgentMemory
        Agent->>AgentMemory: generate final_reply
        Agent->>SessionMemory: cache_reply(task, reply)
        Agent-->>User: AgentMemory with final_reply
    end

Download files

Source Distribution

agentv2-0.1.0.tar.gz (31.3 kB)

Uploaded Source

Built Distribution

agentv2-0.1.0-py3-none-any.whl (33.8 kB)

Uploaded Python 3

File details

Details for the file agentv2-0.1.0.tar.gz.

File metadata

  • Download URL: agentv2-0.1.0.tar.gz
  • Upload date:
  • Size: 31.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for agentv2-0.1.0.tar.gz
Algorithm Hash digest
SHA256 10ebb804043c881e1b2a3273ada2fd0664d333711e2ea91f3210b2d5d9fa8021
MD5 7c821aad96c2e6b9fa3833213dafdf39
BLAKE2b-256 a7e4f320f4104cfd97fedf9e5d974dfd25fa6ab6c213f10a9e0e12ea79fa52f3

File details

Details for the file agentv2-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agentv2-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 33.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for agentv2-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5dece8b54dc89767950ccba571f8912a41db3de3790705e200b792fd66bbcea9
MD5 5a303c5290481117cefb538dceac08c8
BLAKE2b-256 dd8be540515d85ab936f8017cec20d5657a1ff610d657dbbbe885df48bdfdfd5
