Multi-agent orchestration platform over MCP servers

🔥 AgentForge-AI: Multi-Agent Orchestration over MCP

Production-grade AI agent platform that decomposes natural-language tasks into specialized agent workflows, with human-in-the-loop approval, structured LLM output, and full observability.


🎬 Demo

One command triggers the full pipeline:

agentforge run "Triage all open bugs and alert the team"

What happens:

  1. 🧠 Orchestrator decomposes the task via LLM
  2. 🔍 TriageAgent fetches open GitHub issues
  3. ⚡ LLM classifies severity (critical/high/medium/low)
  4. 🏷️ Labels applied on GitHub automatically
  5. 📝 Notion triage report generated
  6. 💬 Slack alert sent to #engineering

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                    CLI / User                       │
│              "Triage all open bugs"                 │
└──────────────────────┬──────────────────────────────┘
                       │
              ┌────────▼────────┐
              │  Orchestrator   │  LLM decomposes task
              │  (Task Planner) │  into subtasks with
              │                 │  confidence routing
              └───┬────┬────┬───┘
                  │    │    │
         ┌────────┘    │    └────────┐
         ▼             ▼             ▼
   ┌──────────┐  ┌──────────┐  ┌──────────┐
   │ DevAgent │  │ TriageAg │  │ StandupAg│
   │          │  │          │  │          │
   │ GitHub   │  │ Classify │  │ Activity │
   │ Issues   │  │ + Label  │  │ Summary  │
   └────┬─────┘  └────┬─────┘  └────┬─────┘
        │              │              │
        ▼              ▼              ▼
   ┌──────────────────────────────────────┐
   │          MCP Server Layer            │
   │  GitHub  │  Notion  │  Slack         │
   │  (REST)  │  (REST)  │  (REST)        │
   └──────────────────────────────────────┘
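
The "confidence routing" step in the diagram could look roughly like the sketch below. It is an illustration only: `Subtask`, `route`, and the `KEYWORD_ROUTES` table are hypothetical names, and the actual Orchestrator logic is not shown in this README. The default threshold of 0.8 mirrors the `confidence_threshold` setting in config.yml.

```python
from dataclasses import dataclass

# Hypothetical keyword table; the real fallback rules are not documented here.
KEYWORD_ROUTES = {
    "triage": "TriageAgent",
    "standup": "StandupAgent",
    "issue": "DevAgent",
}


@dataclass
class Subtask:
    description: str
    agent: str         # agent proposed by the LLM planner
    confidence: float  # planner's confidence in that choice, 0.0 to 1.0


def route(subtask: Subtask, threshold: float = 0.8) -> str:
    """Trust the planner above the threshold, otherwise fall back to keywords."""
    if subtask.confidence >= threshold:
        return subtask.agent
    text = subtask.description.lower()
    for keyword, agent in KEYWORD_ROUTES.items():
        if keyword in text:
            return agent
    return subtask.agent  # no keyword matched: keep the planner's guess
```

With this sketch, a low-confidence subtask such as `Subtask("Generate daily standup", "DevAgent", 0.4)` is rerouted to `StandupAgent`, while a high-confidence one keeps the planner's choice.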

Key Components

Component | Description
Orchestrator | LLM-powered task decomposer with confidence-based routing and keyword fallback
BaseAgent | Abstract base with approval gates for destructive operations
TriageAgent | Fetches GitHub issues → LLM classifies severity → labels on GitHub → Notion report → Slack alert
StandupAgent | Fetches GitHub activity → LLM generates standup → posts to Notion + Slack
DevAgent | Creates issues, lists repos via GitHub API
EvalEngine | Logs predictions to JSONL, computes precision/recall per severity label
MCP Servers | GitHub, Notion, Slack; all with circuit breaker + retry via tenacity
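
To make the "circuit breaker + retry" idea concrete, here is a minimal standard-library sketch. It is not the project's BaseMCPServer and it does not use tenacity: the retry loop is hand-rolled so the example stays self-contained, and `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical names.

```python
import threading
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected without running."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._lock = threading.Lock()  # guards the two counters below
        self._failures = 0
        self._opened_at = None

    def _check_open(self):
        with self._lock:
            if self._opened_at is None:
                return
            if time.monotonic() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: close the breaker and allow a trial call.
            self._opened_at = None
            self._failures = 0

    def _record(self, ok: bool):
        with self._lock:
            if ok:
                self._failures = 0
                return
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = time.monotonic()

    def call(self, fn, *args, retries: int = 2, delay: float = 0.0):
        """Run fn with simple retries; count one failure per exhausted call."""
        self._check_open()
        last_error = None
        for attempt in range(retries + 1):
            try:
                result = fn(*args)
            except Exception as exc:
                last_error = exc
                if attempt < retries:
                    time.sleep(delay)
            else:
                self._record(ok=True)
                return result
        self._record(ok=False)
        raise last_error
```

The retry absorbs transient errors inside a single call, while the breaker stops hammering an API that keeps failing across calls.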

⚡ Quick Start

1. Install AgentForge

Option A: Install via PyPI (Recommended)

pip install agentforge-ai

Option B: Install from Source

git clone https://github.com/OmRajput17/AgentForge-AI.git
cd AgentForge-AI
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux
pip install -e .

2. Initialize Config

agentforge init

This creates ~/.agentforge/config.yml. Open it in your editor:

notepad %USERPROFILE%\.agentforge\config.yml     # Windows
nano ~/.agentforge/config.yml                     # macOS/Linux

Full config.yml:

# ── LLM Provider ────────────────────────────────────────
llm:
  provider: groq                    # 'openai' or 'groq'
  model: llama-3.3-70b-versatile    # or 'gpt-4o' for OpenAI
  api_key: ''                       # your API key

# ── MCP Server Connections ──────────────────────────────
mcp_servers:
  github_token: ''                  # GitHub PAT (required)
  github_owner: ''                  # GitHub username
  github_repo: ''                   # target repository
  notion_token: ''                  # Notion integration secret (optional)
  notion_page_id: ''                # Notion page ID for reports (optional)
  slack_token: ''                   # Slack bot token (optional)
  slack_channel: general            # Slack channel for alerts

# ── Behavior ────────────────────────────────────────────
auto_approve: false                 # skip approval prompts for destructive ops
confidence_threshold: 0.8           # min confidence for agent routing
max_iterations: 10                  # max subtasks per run
standup_lookback_hours: 24          # how far back to fetch GitHub activity

Supports OpenAI and Groq. Switch providers by changing provider and model in config.yml; no code changes are needed.
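
One way a config-driven provider switch can work is a small registry keyed by the `provider` value. The sketch below is an assumption, not the project's get_llm() factory: `LLMSettings`, `build_llm`, `OpenAIClient`, and `GroqClient` are hypothetical stand-ins, and a real factory would return the provider's actual chat model object.

```python
from dataclasses import dataclass


@dataclass
class LLMSettings:
    provider: str
    model: str
    api_key: str


# Stand-in client classes, used only so the example runs without any SDK.
@dataclass
class OpenAIClient:
    model: str
    api_key: str


@dataclass
class GroqClient:
    model: str
    api_key: str


PROVIDERS = {"openai": OpenAIClient, "groq": GroqClient}


def build_llm(settings: LLMSettings):
    """Pick the client class from the configured provider name."""
    try:
        client_cls = PROVIDERS[settings.provider.lower()]
    except KeyError:
        raise ValueError(
            f"unsupported provider {settings.provider!r}; "
            f"expected one of {sorted(PROVIDERS)}"
        ) from None
    return client_cls(model=settings.model, api_key=settings.api_key)
```

Because agents only ever call the factory, changing `provider` and `model` in the config is enough to swap the backend in this design.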

3. Run a Task

# Triage bugs: classify, label, report, alert
agentforge run "Triage all open bugs and alert the team"

# Generate daily standup from GitHub activity
agentforge run "Generate daily standup for om"

# Create a GitHub issue via natural language
agentforge run "Create a GitHub issue for the login bug"

4. Check MCP Server Status

agentforge server
  GitHub       ✅ configured
  Notion       ✅ configured
  Slack        ✅ configured

🧪 Running Tests (No API Keys Required)

All tests are fully mocked, so they run without any API keys or network access.

# Run the full test suite
pytest agentforge/tests/ -v

# Run specific test modules
pytest agentforge/tests/test_schemas.py -v          # Pydantic schema validation
pytest agentforge/tests/test_triage_agent.py -v     # TriageAgent unit tests
pytest agentforge/tests/test_standup_agent.py -v    # StandupAgent unit tests
pytest agentforge/tests/test_dev_agent.py -v        # DevAgent unit tests
pytest agentforge/tests/test_mcp_github.py -v       # GitHub MCP server tests

# Run with coverage
pytest agentforge/tests/ -v --cov=agentforge --cov-report=term-missing
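
For readers unfamiliar with fully mocked async tests, the sketch below shows the general pattern using only `unittest.mock`. It is not taken from the project's test suite: `label_issues` is a hypothetical workflow, and the `list_issues` / `add_labels` call signatures are assumptions.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def label_issues(github, classify):
    """Hypothetical workflow under test: fetch issues, classify, apply labels."""
    issues = await asyncio.to_thread(github.list_issues)
    for issue in issues:
        severity = await classify(issue["title"])
        await asyncio.to_thread(github.add_labels, issue["number"], [severity])
    return len(issues)


def test_label_issues_without_network():
    # MagicMock replaces the blocking GitHub wrapper; no HTTP request is made.
    github = MagicMock()
    github.list_issues.return_value = [{"number": 7, "title": "Login crashes"}]
    # AsyncMock replaces the LLM call, so no API key is needed.
    classify = AsyncMock(return_value="critical")

    labeled = asyncio.run(label_issues(github, classify))

    assert labeled == 1
    github.add_labels.assert_called_once_with(7, ["critical"])
    classify.assert_awaited_once_with("Login crashes")
```

The test drives the coroutine with `asyncio.run` so it works under plain pytest; with pytest-asyncio the test function itself could be `async`.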

Test Suite Overview

Test Module | Tests | What It Covers
test_schemas.py | 17 | Pydantic validation, severity normalization, wontfix, model_dump
test_triage_agent.py | 12 | Classification, fallback, report generation, approval gate, Slack alerts
test_standup_agent.py | 7 | Event summarization, standup generation, full workflow
test_dev_agent.py | 4 | GitHub issue creation, listing, unknown action, LLM failure
test_mcp_github.py | - | GitHub API wrapper with mocked HTTP
test_mcp_notion.py | - | Notion API wrapper
test_mcp_slack.py | - | Slack API wrapper
test_eval_engine.py | - | Prediction logging, precision/recall metrics

🔒 Production Hardening

What makes this production-ready:

  • 🛡️ Human-in-the-Loop Approval: Destructive agents (TriageAgent, DevAgent) require explicit user approval before mutating GitHub. Powered by BaseAgent.run() → ApprovalGate.ask().

  • 📊 Pydantic Structured Output: No raw json.loads() anywhere. All LLM responses go through with_structured_output(Schema) with field validators that normalize and sanitize data.

  • 🔄 Async-First Architecture: All blocking MCP calls are wrapped in asyncio.to_thread(). Orchestrator runs parallel subtasks via asyncio.gather().

  • 💥 Graceful Degradation: Every LLM call has try/except with safe fallback responses. Notion/Slack failures are non-fatal. Missing fields are handled with .get() defaults.

  • 🔌 Pluggable LLM Provider: get_llm() factory returns OpenAI or Groq based on config. Switch providers without touching any agent code.

  • 📈 EvalEngine Observability: Every triage prediction is logged to ~/.agentforge/evals.jsonl with confidence scores. Compute precision/recall per severity label across runs.

  • 🔌 Circuit Breaker + Retry: All MCP servers inherit from BaseMCPServer with tenacity retry logic and a thread-safe circuit breaker for resilient API calls.
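
Several of the points above (approval before destructive actions, blocking calls moved off the event loop, failures that do not abort the run) can be combined in a few lines. The following is a sketch under assumptions rather than the project's BaseAgent or ApprovalGate: `run_with_approval` and its `ask` callback are hypothetical names, and `auto_approve` simply mirrors the config flag of the same name.

```python
import asyncio


async def run_with_approval(action, description, ask, auto_approve=False):
    """Run a blocking, destructive action only after a human says yes.

    action       -- zero-argument blocking callable (e.g. an API call)
    description  -- text shown to the user, such as "Label 5 issues?"
    ask          -- callable taking the description and returning True/False
    """
    if not auto_approve and not ask(description):
        return {"status": "skipped", "reason": "not approved"}
    try:
        # Keep the event loop free while the blocking call runs in a thread.
        result = await asyncio.to_thread(action)
    except Exception as exc:
        # Degrade gracefully: report the failure instead of crashing the run.
        return {"status": "failed", "reason": str(exc)}
    return {"status": "done", "result": result}
```

In an interactive CLI, `ask` could wrap a yes/no prompt; in tests it can be a lambda that returns a fixed answer.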


🔄 TriageAgent Workflow

  ┌─── Step 1: Fetch ────┐
  │ github.list_issues() │ ← asyncio.to_thread()
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 2: Classify     │
  │ LLM batch call       │ ← with_structured_output(TriageResponse)
  │ (all issues at once) │   try/except → fallback to "low"
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 3: EvalEngine   │
  │ Log predictions      │ ← confidence scores + run_id
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 4: Approval     │
  │ "Label 5 issues?"    │ ← ApprovalGate.ask(), human confirms
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 5: Apply Labels │
  │ github.add_labels()  │ ← asyncio.to_thread(), per-issue resilience
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 6: Notion       │
  │ Create triage report │ ← Full severity breakdown (non-fatal)
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 7: Slack        │
  │ Alert #engineering   │ ← Critical/High/Medium/Low counts (non-fatal)
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Step 8: Eval Report  │
  │ Print metrics        │ ← Accuracy, precision, recall per label
  └──────────────────────┘
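
The per-label metrics in Step 8 can be computed as sketched below. This is a generic illustration, not the EvalEngine code: `per_label_metrics` is a hypothetical name, the input is assumed to be (predicted, actual) pairs, and the README does not say where the ground-truth labels come from.

```python
from collections import Counter


def per_label_metrics(records):
    """Compute precision and recall per label from (predicted, actual) pairs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for predicted, actual in records:
        if predicted == actual:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # predicted this label, but it was wrong
            fn[actual] += 1     # missed the label that was actually correct
    metrics = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        metrics[label] = {
            "precision": tp[label] / predicted_total if predicted_total else 0.0,
            "recall": tp[label] / actual_total if actual_total else 0.0,
        }
    return metrics
```

For example, given the pairs `("critical", "critical")`, `("critical", "high")`, `("high", "high")`, `("low", "high")`, the label "critical" has precision 0.5 and recall 1.0, because one of its two predictions was right and no true critical issue was missed.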

📁 Project Structure

AgentForge/
├── agentforge/
│   ├── agents/
│   │   ├── base.py              # BaseAgent ABC: approval gates, logging
│   │   ├── dev_agent.py         # GitHub operations agent
│   │   ├── triage_agent.py      # Bug triage workflow agent
│   │   ├── standup_agent.py     # Daily standup generator
│   │   └── schemas.py           # Pydantic models for structured LLM output
│   ├── mcp/
│   │   ├── base.py              # BaseMCPServer: circuit breaker + retry
│   │   ├── github_server.py     # GitHub REST API wrapper
│   │   ├── notion_server.py     # Notion API wrapper
│   │   └── slack_server.py      # Slack API wrapper
│   ├── graph/
│   │   └── state.py             # AgentForgeState TypedDict
│   ├── tests/                   # Full test suite (all mocked, no API keys needed)
│   ├── orchestrator.py          # LLM task decomposer + parallel execution
│   ├── eval_engine.py           # Prediction logging + precision/recall
│   ├── approval.py              # Human-in-the-loop approval gate
│   ├── config.py                # YAML config + LLM factory + Pydantic settings
│   ├── logger.py                # Rich console logger with agent colors
│   └── cli.py                   # Typer CLI entrypoint
├── pyproject.toml
├── requirements.txt
└── README.md

🛠️ Tech Stack

Technology | Purpose
Python 3.11+ | Core runtime
LangChain | LLM orchestration with structured output
Groq / OpenAI | Pluggable LLM providers via get_llm() factory
Pydantic v2 | Schema validation, field normalization
asyncio | Async execution, to_thread for blocking MCP calls
httpx | HTTP client for MCP server API calls
tenacity | Retry logic + circuit breaker for API resilience
Rich | Beautiful terminal UI, colored logs, approval prompts
Typer | CLI framework
pytest + pytest-asyncio | Async-aware testing

📝 License

MIT License. See LICENSE for details.


Built with ❤️ by Om Rajput
