Multi-agent orchestration system built with Microsoft Agent Framework's Magentic Fleet pattern
⚠️ Active Development Notice: APIs, signatures, and execution semantics can change between minor versions. Pin a version tag for production usage.
AgenticFleet – DSPy‑Enhanced Multi‑Agent Orchestration
AgenticFleet is a hybrid DSPy + Microsoft agent-framework runtime that delivers a self‑optimizing fleet of specialized AI agents. DSPy handles task analysis, routing, progress & quality assessment; agent-framework provides robust orchestration primitives, event streaming, and tool execution. Together they enable delegated, sequential, parallel, and handoff‑driven workflows with iterative refinement loops.
Table of Contents
- AgenticFleet – DSPy‑Enhanced Multi‑Agent Orchestration
- Table of Contents
- Key Features
- Architecture Overview
- Directory Layout
- Installation
- Configuration & Environment
- Quick Start
- Execution Modes
- Agents
- DSPy Optimization
- Observability & History
- Azure Cosmos DB Integration
- Evaluation & Self-Improvement
- Testing & Quality
- Troubleshooting
- Contributing
- License
- Acknowledgments
- Related Documentation
Key Features
- Adaptive Routing – DSPy reasoner analyzes tasks and decides agent roster + execution mode (delegated / sequential / parallel).
- Advanced Reasoning – Pluggable strategies per agent: ReAct for autonomous tool loops (Researcher) and Program of Thought for code-based logic (Analyst).
- Quality Loops – Automatic Judge / Reviewer refinement when quality score drops below configurable threshold.
- Tool‑Aware Decisions – Signatures include tool context; Reasoner recommends tool usage (code interpreter, search, browser, etc.).
- Streaming Events – Emits OpenAI Responses‑compatible events for real‑time TUI / web UI updates.
- Self‑Improvement – GEPA + BootstrapFewShot compilation refines routing from curated examples & execution history.
- YAML‑Driven – Central workflow_config.yaml governs models, thresholds, agents, tracing, evaluation toggles.
- Rich Ergonomics – Typer CLI (cli/console.py), dspy-fleet command, optional Vite frontend, history analytics scripts.
- Safe Fallbacks – Graceful degradation when DSPy unavailable (heuristic routing & quality scoring).
- Extensible Toolkit – Add agents, tools, signatures, evaluation metrics with minimal boilerplate.
- Azure Cosmos DB Persistence (optional) – Set one flag to mirror workflow runs, agent memories, DSPy datasets, and cache metadata into Cosmos NoSQL for durable, queryable telemetry.
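The Safe Fallbacks idea can be illustrated with a minimal sketch. The keyword lists, the function names, and the returned dictionary shape below are assumptions made for illustration; they are not the project's actual heuristics.

```python
import importlib.util


def dspy_available() -> bool:
    # True when the dspy package can be imported in this environment.
    return importlib.util.find_spec("dspy") is not None


def heuristic_route(task: str) -> dict:
    # Hypothetical keyword-based routing, used only when no reasoner is usable.
    lowered = task.lower()
    agents = []
    if any(word in lowered for word in ("research", "find", "latest")):
        agents.append("Researcher")
    if any(word in lowered for word in ("analyze", "compare", "calculate")):
        agents.append("Analyst")
    if not agents:
        agents.append("Writer")
    mode = "sequential" if len(agents) > 1 else "delegated"
    return {"agents": agents, "mode": mode}


def route(task: str, reasoner=None) -> dict:
    # Prefer the supplied reasoner; degrade to heuristics otherwise.
    if reasoner is not None and dspy_available():
        return reasoner(task)
    return heuristic_route(task)


print(route("Research and compare GPUs"))
```

The point of the sketch is the shape of the fallback: the caller always gets a routing decision, whether or not the optimizer stack is installed.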
Architecture Overview
Four‑phase pipeline:
Task → [1] DSPy Analysis → [2] DSPy Routing → [3] Agent Execution → [4] Quality / Judge Assessment → (Optional Refinement)
| Phase | Responsibility | Source |
|---|---|---|
| Analysis | Extract goals, complexity, constraints | dspy_modules/reasoner.py (analyze_task) |
| Routing | Pick agents + execution mode, tools | dspy_modules/reasoner.py (route_task) |
| Execution | Orchestrate agents & tools; stream events | workflows/supervisor.py |
| Quality | Score output, recommend improvements | dspy_modules/reasoner.py (assess_quality + Judge) |
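As a rough mental model, the four phases in the table compose like the sketch below. Every function body here is a toy stand-in (no DSPy or agent-framework calls), and the Routing dataclass and scoring rule are assumptions; only the phase order and the result / quality keys mirror this README.

```python
from dataclasses import dataclass


@dataclass
class Routing:
    agents: list[str]
    mode: str  # "delegated", "sequential", or "parallel"


def analyze_task(task: str) -> dict:
    # Phase 1 stand-in: a real reasoner would call a DSPy module here.
    words = task.split()
    return {"task": task, "complexity": "high" if len(words) > 12 else "low"}


def route_task(analysis: dict) -> Routing:
    # Phase 2 stand-in: pick a roster and mode from the analysis.
    if analysis["complexity"] == "high":
        return Routing(agents=["Researcher", "Analyst", "Writer"], mode="sequential")
    return Routing(agents=["Researcher"], mode="delegated")


def execute(task: str, routing: Routing) -> str:
    # Phase 3 stand-in: each agent contributes in roster order.
    return " -> ".join(f"{agent}({task})" for agent in routing.agents)


def assess_quality(output: str) -> float:
    # Phase 4 stand-in: a toy 0-10 score based on output length.
    return min(10.0, len(output) / 10)


def run_pipeline(task: str) -> dict:
    analysis = analyze_task(task)
    routing = route_task(analysis)
    output = execute(task, routing)
    return {"result": output, "quality": assess_quality(output), "mode": routing.mode}


print(run_pipeline("Summarize X"))
```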
Workflow Diagram
graph TD
A[Task input] --> B[DSPy analysis]
B --> C[DSPy routing]
C --> D1[Agent execution delegated]
C --> D2[Agent execution sequential]
C --> D3[Agent execution parallel]
D1 --> E[Quality assessment]
D2 --> E
D3 --> E
E --> F[Final output]
E --> G[Refinement loop]
G --> F
Refinement triggers when score < threshold (default 8 or judge threshold ≥ 7). Handoffs coordinate multi‑agent chains via HandoffManager (in workflows/handoff.py).
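The refinement trigger can be sketched as a bounded loop. This is an illustration under stated assumptions: the function name, the callable-based interface, and the max_rounds default are hypothetical; only the "refine while score < threshold" rule and the default of 8 come from the text above.

```python
from typing import Callable


def refine_until_good(
    draft: str,
    score: Callable[[str], float],
    refine: Callable[[str], str],
    threshold: float = 8.0,
    max_rounds: int = 2,
) -> tuple[str, float, int]:
    """Re-run refine while the score stays below threshold, up to max_rounds."""
    current = draft
    current_score = score(current)
    rounds = 0
    while current_score < threshold and rounds < max_rounds:
        current = refine(current)
        current_score = score(current)
        rounds += 1
    return current, current_score, rounds


# Toy scorer and refiner: the score is the text length, refining appends text.
text, final_score, used = refine_until_good(
    "a", score=lambda t: float(len(t)), refine=lambda t: t + "aaa", max_rounds=5
)
print(text, final_score, used)
```

Capping the number of rounds keeps a persistently low score from turning into an unbounded loop of model calls.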
Consult: docs/developers/architecture.md & docs/guides/quick-reference.md.
Latency & Slow Phases
Common bottlenecks and how to mitigate:
- DSPy compilation on first run
  - Use cached compiled reasoner after first run; clear via uv run python -m agentic_fleet.scripts.manage_cache --clear
  - Reduce GEPA effort in config/workflow_config.yaml (e.g., gepa_max_metric_calls, max_bootstrapped_demos)
  - Set DSPY_COMPILE=false during rapid iteration
- External tool calls (OpenAI, Tavily, Hosted Interpreter)
  - Prefer lighter Reasoner model dspy.model: gpt-5-mini
  - Disable pre‑analysis tool usage for simple tasks
- Judge/refinement loops
  - Set quality.max_refinement_rounds: 1
  - Use judge_reasoning_effort: minimal
- Parallel fan‑out synthesis
  - Cap execution.max_parallel_agents to a small number
  - Enable streaming to surface progress early
- History and tracing I/O
  - Reduce verbosity in production; batch writes if needed
For timing analysis, run history analytics: uv run python src/agentic_fleet/scripts/analyze_history.py --timing.
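If you want a quick look at where time goes before reaching for the analytics script, a small timing helper is often enough. The sketch below is independent of AgenticFleet; the timed context manager and the phase names are hypothetical, and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings: dict[str, list[float]] = defaultdict(list)


@contextmanager
def timed(phase: str):
    # Record wall-clock seconds spent inside the with-block.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase].append(time.perf_counter() - start)


def slowest_phase() -> str:
    # Phase with the largest total recorded time.
    return max(timings, key=lambda name: sum(timings[name]))


with timed("routing"):
    time.sleep(0.01)  # placeholder for a routing call
with timed("execution"):
    time.sleep(0.05)  # placeholder for agent execution

print(slowest_phase())
```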
Backend API & Performance
Recent optimizations target the responsiveness and scalability of the backend API:
- Non-Blocking Architecture: Heavy background tasks, such as self-improvement analysis and DSPy compilation, are now offloaded to separate threads. This prevents blocking the main asyncio event loop, ensuring the API remains responsive to new requests even under load.
- Job Store Abstraction: Background job state is no longer tied to a global variable. A pluggable JobStore interface (currently implemented as InMemoryJobStore) allows for easy future extension to persistent stores like Redis or Azure Cosmos DB.
- Performance Benchmarking: A dedicated benchmark script (scripts/benchmark_api.py) is available to rigorously measure API latency, throughput, and error rates under concurrent load.
- Real Models Only: DSPy now requires a real model ID (e.g., gpt-5-mini); the previous test-model/DummyLM path has been removed to avoid mock outputs during production runs.
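The first two points can be pictured with a short sketch: a pluggable job store plus a blocking task pushed off the event loop. The class names JobStore and InMemoryJobStore come from this README, but the method names (create, update, get), the record shape, and slow_compile are assumptions for illustration and may not match the real interface.

```python
import asyncio
import time
import uuid
from typing import Protocol


class JobStore(Protocol):
    def create(self) -> str: ...
    def update(self, job_id: str, status: str, result: object = None) -> None: ...
    def get(self, job_id: str) -> dict: ...


class InMemoryJobStore:
    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}

    def create(self) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "pending", "result": None}
        return job_id

    def update(self, job_id: str, status: str, result: object = None) -> None:
        self._jobs[job_id] = {"status": status, "result": result}

    def get(self, job_id: str) -> dict:
        return dict(self._jobs[job_id])


def slow_compile(n: int) -> int:
    # Stand-in for blocking work such as DSPy compilation.
    time.sleep(0.05)
    return n * 2


async def run_job(store: JobStore, job_id: str, n: int) -> None:
    store.update(job_id, "running")
    # asyncio.to_thread keeps the event loop free while the blocking call runs.
    result = await asyncio.to_thread(slow_compile, n)
    store.update(job_id, "done", result)


async def main() -> dict:
    store = InMemoryJobStore()
    job_id = store.create()
    task = asyncio.create_task(run_job(store, job_id, 21))
    await asyncio.sleep(0)  # other coroutines can run while the job is in flight
    await task
    return store.get(job_id)


print(asyncio.run(main()))
```

Because run_job depends only on the protocol, swapping in a Redis or Cosmos-backed store would not change the calling code.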
Directory Layout
| Path | Purpose |
|---|---|
| config/workflow_config.yaml | Models, agents, thresholds, tracing, evaluation flags |
| src/agentic_fleet/dspy_modules/ | DSPy Signatures & Reasoner implementation |
| src/agentic_fleet/workflows/ | Flattened orchestration logic (supervisor.py, handoff.py, strategies.py) |
| src/agentic_fleet/agents/ | Specialist configurations, factory, and prompts (prompts.py) |
| src/agentic_fleet/api/ | FastAPI backend, DB models (api/db), settings (api/settings.py) |
| src/agentic_fleet/tools/ | Tool adapters: Tavily, Browser, Hosted Interpreter, MCP |
| src/agentic_fleet/utils/ | Compiler cache, GEPA optimizer, history, tracing, registry |
| src/agentic_fleet/evaluation/ | Metrics & evaluator engine |
| src/agentic_fleet/cli/console.py | Rich / Typer CLI (dspy-fleet) |
| examples/ | Minimal workflow samples |
| scripts/ | Analysis, self-improvement, benchmarking, dataset gen |
| logs/ | Execution history, compilation artifacts |
| frontend/ | Optional Vite + React streaming UI |
Installation
Python (uv recommended)
git clone https://github.com/Qredence/agentic-fleet.git
cd agentic-fleet
# Create and sync a local environment from pyproject.toml
uv sync
Standard pip
# From PyPI (library / CLI usage)
pip install agentic-fleet
# From a local clone (editable install)
pip install -e .
Optional Frontend
make frontend-install # installs Node dependencies
make dev # runs backend + frontend dev servers
Playwright (Browser Tool)
playwright install chromium
Configuration & Environment
Create .env (or copy .env.example):
# Required for all model calls (validated at startup)
OPENAI_API_KEY=sk-...
# Optional: Enables web search for Researcher agent
TAVILY_API_KEY=tvly-...
# Toggle DSPy compilation (true/false)
DSPY_COMPILE=true
# Optional custom endpoint
OPENAI_BASE_URL=https://...
LANGFUSE_PUBLIC_KEY=...
LANGFUSE_SECRET_KEY=...
# Optional Azure Cosmos DB mirroring
AGENTICFLEET_USE_COSMOS=0
AZURE_COSMOS_ENDPOINT=https://<account>.documents.azure.com:443/
AZURE_COSMOS_USE_MANAGED_IDENTITY=0
AZURE_COSMOS_KEY=<primary-or-secondary-key>
AZURE_COSMOS_DATABASE=agentic-fleet
AGENTICFLEET_DEFAULT_USER_ID=local-dev
# Container overrides (use defaults unless you renamed them)
# AZURE_COSMOS_WORKFLOW_RUNS_CONTAINER=workflowRuns
# AZURE_COSMOS_AGENT_MEMORY_CONTAINER=agentMemory
# AZURE_COSMOS_DSPY_EXAMPLES_CONTAINER=dspyExamples
# AZURE_COSMOS_DSPY_OPTIMIZATION_RUNS_CONTAINER=dspyOptimizationRuns
# AZURE_COSMOS_CACHE_CONTAINER=cache
Note: The OPENAI_API_KEY environment variable is required and will be validated at startup. If missing, the application will fail with a clear error message.
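A fail-fast check of this kind can be sketched in a few lines. The helper name require_env and the MissingEnvironmentError type are hypothetical; the project's own startup validation may look different.

```python
import os


class MissingEnvironmentError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def require_env(name: str, env=os.environ) -> str:
    # Treat unset and blank values the same way: both are unusable.
    value = env.get(name, "").strip()
    if not value:
        raise MissingEnvironmentError(f"{name} is not set; add it to .env or export it")
    return value


# Passing a plain dict makes the check easy to exercise without touching the real environment.
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-test"}))
```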
Key YAML knobs (workflow_config.yaml):
- dspy.model – Reasoner model (e.g. gpt-5-mini)
- dspy.optimization.metric_threshold – Minimum routing accuracy
- workflow.supervisor.max_rounds – Conversation turn limit
- workflow.supervisor.enable_streaming – Event streaming toggle
- agents.* – Per-agent model + temperature + tools
- evaluation.* – Batch evaluation settings
Configuration Validation: The YAML configuration is automatically validated against a schema when loaded. Invalid values (e.g., out-of-range temperatures, invalid model names) raise ConfigurationError with a message indicating which field failed validation.
Cosmos mirroring is best-effort: if environment variables are missing or the containers are unreachable, workflows continue locally and a warning appears in the logs.
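To make the validation behaviour concrete, here is a minimal sketch that checks an already-parsed config dictionary. The ConfigurationError class is a local stand-in, and the exact rules are assumptions; the 0.0 to 2.0 temperature range follows the Troubleshooting table in this README.

```python
class ConfigurationError(ValueError):
    """Local stand-in for the project's ConfigurationError."""


def validate_config(config: dict) -> dict:
    model = config.get("dspy", {}).get("model")
    if not isinstance(model, str) or not model:
        raise ConfigurationError("dspy.model must be a non-empty string")
    for name, agent in config.get("agents", {}).items():
        temperature = agent.get("temperature", 0.0)
        if not 0.0 <= temperature <= 2.0:
            # Name the failing field so the message is actionable.
            raise ConfigurationError(
                f"agents.{name}.temperature must be between 0.0 and 2.0, got {temperature}"
            )
    return config


try:
    validate_config({"dspy": {"model": "gpt-5-mini"}, "agents": {"writer": {"temperature": 2.5}}})
except ConfigurationError as exc:
    print(exc)
```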
Quick Start
TUI / CLI
agentic-fleet # Launch interactive console (packaged entry point)
# Process a single task with streaming
agentic-fleet run -m "What is Gemini 3 Pro?" --verbose
# List available agents and their tools
agentic-fleet list-agents
# Run batch evaluation (uses config dataset by default)
agentic-fleet evaluate --max-tasks 5
# Module-style invocation (alternative)
python -m agentic_fleet.cli.console --help
Python API
import asyncio

from agentic_fleet.workflows import create_supervisor_workflow


async def main():
    workflow = await create_supervisor_workflow(compile_dspy=True)
    result = await workflow.run("Summarize transformer architecture evolution")
    print(result["result"])  # final output
    print(result["quality"])  # quality assessment details


asyncio.run(main())
Backend API
Start the FastAPI backend server:
./start_backend.sh
# Server runs at http://localhost:8000
# API Docs: http://localhost:8000/api/docs
Run the automated performance benchmark:
# Requires the backend to be running
python scripts/benchmark_api.py
Streaming
# Run inside an async function, with workflow created as in the Python API example above
async for event in workflow.run_stream("Compare AWS vs Azure AI offerings"):
    # Handle MagenticAgentMessageEvent / WorkflowOutputEvent
    print(event)
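A consumer usually wants to treat intermediate agent messages and the final output differently. The sketch below shows that dispatch pattern against a fake stream; AgentMessage and WorkflowOutput are hypothetical stand-ins for MagenticAgentMessageEvent and WorkflowOutputEvent, whose real attributes may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class AgentMessage:  # stand-in for MagenticAgentMessageEvent
    agent: str
    text: str


@dataclass
class WorkflowOutput:  # stand-in for WorkflowOutputEvent
    text: str


async def fake_run_stream(task: str) -> AsyncIterator[object]:
    # Emits two intermediate messages, then the final output.
    yield AgentMessage("Researcher", f"notes on {task}")
    await asyncio.sleep(0)
    yield AgentMessage("Writer", "draft ready")
    yield WorkflowOutput(f"final answer for {task}")


async def consume(task: str) -> tuple[list[str], Optional[str]]:
    progress: list[str] = []
    final = None
    async for event in fake_run_stream(task):
        if isinstance(event, AgentMessage):
            progress.append(f"[{event.agent}] {event.text}")  # surface progress early
        elif isinstance(event, WorkflowOutput):
            final = event.text
    return progress, final


print(asyncio.run(consume("clouds")))
```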
Execution Modes
| Mode | Description | Use Case |
|---|---|---|
| Delegated | Single agent manages entire task | Focused research, simple writeups |
| Sequential | Output of one feeds next | Research → Analyze → Write report |
| Parallel | Multiple agents concurrently; synthesis afterwards | Multi‑source comparisons |
| Handoff Chains | Explicit role transitions with artifacts | Complex coding + verification flows |
The Reasoner chooses a mode based on task structure and training examples; the choice can be overridden via configuration or, in future, explicit flags.
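The difference between the first three modes comes down to how agent calls are composed. The following sketch uses plain asyncio with toy agents; make_agent and the run_* helpers are hypothetical names, and the string-joining "synthesis" is a deliberate simplification.

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]


def make_agent(name: str) -> Agent:
    async def agent(text: str) -> str:
        await asyncio.sleep(0)  # placeholder for a model or tool call
        return f"{name}[{text}]"
    return agent


async def run_delegated(agent: Agent, task: str) -> str:
    # One agent owns the whole task.
    return await agent(task)


async def run_sequential(agents: list[Agent], task: str) -> str:
    result = task
    for agent in agents:
        result = await agent(result)  # each output feeds the next agent
    return result


async def run_parallel(agents: list[Agent], task: str) -> str:
    # All agents see the same task concurrently; gather preserves input order.
    outputs = await asyncio.gather(*(agent(task) for agent in agents))
    return " | ".join(outputs)  # naive synthesis step


researcher, analyst = make_agent("Researcher"), make_agent("Analyst")
print(asyncio.run(run_sequential([researcher, analyst], "task")))
print(asyncio.run(run_parallel([researcher, analyst], "task")))
```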
Agents
Core specialists: Researcher, Analyst, Writer, Reviewer, Judge (quality). Extended handoff specialists: Planner, Executor, Coder, Verifier, Generator.
See AGENTS.md for detailed roles, tool usage, configuration examples, and selection guidelines.
DSPy Optimization
Training examples live in src/agentic_fleet/data/supervisor_examples.json:
{
  "task": "Research the latest AI advances",
  "team": "Researcher: web search\nAnalyst: code + data",
  "assigned_to": "Researcher,Analyst",
  "mode": "sequential"
}
Compilation (BootstrapFewShot + GEPA) occurs on first run (if DSPY_COMPILE=true). Cache stored under logs/compiled_supervisor.pkl. Refresh via:
uv run python -m agentic_fleet.scripts.manage_cache --clear
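Before adding new training examples, it can help to sanity-check their shape. The sketch below validates records against the four fields shown in the example above; the rule that every assigned agent must appear in the team description, and the set of allowed modes, are assumptions for illustration rather than checks the project is known to perform.

```python
REQUIRED_FIELDS = ("task", "team", "assigned_to", "mode")
KNOWN_MODES = {"delegated", "sequential", "parallel"}  # assumed from the Key Features list


def validate_example(example: dict) -> list[str]:
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in example]
    if "mode" in example and example["mode"] not in KNOWN_MODES:
        problems.append(f"unknown mode: {example['mode']}")
    if "assigned_to" in example and "team" in example:
        # Team lines look like "Name: capabilities"; keep only the names.
        team_names = {line.split(":", 1)[0].strip() for line in example["team"].splitlines()}
        for agent in example["assigned_to"].split(","):
            if agent.strip() not in team_names:
                problems.append(f"agent not in team: {agent.strip()}")
    return problems


example = {
    "task": "Research the latest AI advances",
    "team": "Researcher: web search\nAnalyst: code + data",
    "assigned_to": "Researcher,Analyst",
    "mode": "sequential",
}
print(validate_example(example))
```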
Observability & History
- History: Structured events appended to logs/execution_history.jsonl.
- Tracing: Enable OpenTelemetry in YAML; export to AI Toolkit / OTLP endpoint.
- Logging: Adjustable log level via env (AGENTIC_FLEET_LOG_LEVEL=DEBUG).
- Analysis: scripts/analyze_history.py --all surfaces aggregate metrics.
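The JSONL history format lends itself to simple append-and-scan tooling. Here is a minimal sketch of that pattern; the record fields (task, quality) are hypothetical, and the real schema of logs/execution_history.jsonl may differ.

```python
import json
import tempfile
from pathlib import Path


def append_event(path: Path, event: dict) -> None:
    # One JSON object per line keeps appends cheap and the file streamable.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")


def summarize(path: Path) -> dict:
    count, total_quality = 0, 0.0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = json.loads(line)
            count += 1
            total_quality += record.get("quality", 0.0)  # hypothetical field
    return {"runs": count, "mean_quality": total_quality / count if count else 0.0}


def demo() -> dict:
    # Write two records to a temporary file and aggregate them.
    with tempfile.TemporaryDirectory() as directory:
        path = Path(directory) / "execution_history.jsonl"
        append_event(path, {"task": "a", "quality": 8.0})
        append_event(path, {"task": "b", "quality": 6.0})
        return summarize(path)


print(demo())
```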
Evaluation & Self-Improvement
Run batch evaluations against curated tasks:
uv run python -m agentic_fleet.cli.console analyze --dataset data/evaluation_tasks.jsonl
Generate evaluation datasets from history:
uv run python scripts/create_history_evaluation.py
Self‑improve routing by folding high‑quality history examples back into DSPy training:
uv run python scripts/self_improve.py --max 50
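Conceptually, the self-improvement step filters history for strong runs and converts them into training examples. The sketch below shows one way that selection could work; the history record fields, the min_quality cutoff, and the de-duplication by task are assumptions, and only the cap of 50 mirrors the --max flag above.

```python
def select_examples(
    history: list[dict], min_quality: float = 8.0, max_examples: int = 50
) -> list[dict]:
    # Keep only runs at or above the quality cutoff, best first.
    good = [run for run in history if run.get("quality", 0.0) >= min_quality]
    good.sort(key=lambda run: run["quality"], reverse=True)
    seen, examples = set(), []
    for run in good:
        if run["task"] in seen:
            continue  # keep only the best-scoring run per task
        seen.add(run["task"])
        examples.append(
            {"task": run["task"], "assigned_to": ",".join(run["agents"]), "mode": run["mode"]}
        )
        if len(examples) == max_examples:
            break
    return examples


history = [
    {"task": "a", "quality": 9.0, "agents": ["Researcher"], "mode": "delegated"},
    {"task": "b", "quality": 5.0, "agents": ["Writer"], "mode": "delegated"},
    {"task": "a", "quality": 8.5, "agents": ["Researcher", "Analyst"], "mode": "sequential"},
    {"task": "c", "quality": 9.5, "agents": ["Researcher", "Analyst"], "mode": "parallel"},
]
print(select_examples(history, max_examples=2))
```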
Testing & Quality
make check # lint + format (Ruff), type‑check (ty)
make test # run pytest suite
PYTHONPATH=. uv run pytest tests/workflows/test_execution_strategies.py -q
Key test domains: routing accuracy, tool registry integration, judge refinement, lazy compilation, tracing hooks.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Missing web citations | TAVILY_API_KEY unset | Export key or set in .env |
| Workflow startup fails | Missing OPENAI_API_KEY | Set OPENAI_API_KEY in .env or environment |
| Configuration error | Invalid YAML values | Check workflow_config.yaml for invalid values (e.g., temperature > 2.0) |
| Slow first run | DSPy compilation | Enable cache; reduce max_bootstrapped_demos |
| No streaming output | enable_streaming=false | Toggle in YAML |
| Low quality score | Insufficient examples | Add training examples; rerun compilation |
| Tool warning | Name mismatch | Verify tool name & registry entry |
Detailed guides: docs/users/troubleshooting.md, docs/guides/dspy-optimizer.md.
Contributing
- Fork / branch (breaking-refactor for large changes)
- Add or update tests (prefer focused unit tests over broad integration when possible)
- Run make check and ensure no style / type errors
- Update docs (README, AGENTS.md, or relevant guide) for user‑visible changes
- Submit PR with clear rationale & architectural notes (link to docs/developers/architecture.md sections if modifying internals)
Please see: docs/developers/contributing.md.
License
MIT License – see LICENSE.
Acknowledgments
- Microsoft agent-framework – Orchestration, events & tool interfaces
- DSPy (Stanford NLP) – Prompt optimization & structured signatures
- Tavily – Reliable, citation‑rich web search
- OpenAI Responses – Event paradigm enabling unified CLI/TUI/frontend streaming
Related Documentation
- Getting Started: docs/users/getting-started.md
- Configuration: docs/users/configuration.md
- Architecture Deep Dive: docs/developers/architecture.md
- Quick Reference: docs/guides/quick-reference.md
- DSPy Optimization: docs/guides/dspy-optimizer.md
- Evaluation: docs/guides/evaluation.md
- Tracing: docs/guides/tracing.md
- Self Improvement: docs/users/self-improvement.md
- Troubleshooting: docs/users/troubleshooting.md
- Cosmos Requirements: cosmosdb_requirements.md
- Cosmos Data Model: cosmosdb_data_model.md
Happy hacking! 🚀