Control plane for multi-agent vetted decision-making across org knowledge and channels
Aragora
Aragora orchestrates 42 AI agent types to adversarially vet decisions through structured debate, delivering audit-ready decision receipts. Built for enterprises where AI decisions carry real consequences.
The Decision Integrity Platform
New here? Start with the Getting Started Guide -- you'll have a working demo in 30 seconds.
Individual LLMs are unreliable. Their personas shift with context, their confidence doesn't correlate with accuracy, and they say what you want to hear. For consequential decisions, you need infrastructure that treats this as a feature to be engineered around, not a problem to be ignored.
Aragora orchestrates 42 agent types in structured adversarial debates -- forcing models to challenge each other's reasoning, surface blind spots, and produce decisions with complete audit trails showing where they agreed, where they disagreed, and why.
Try It Now
pip install aragora
# Zero-config demo — runs a full adversarial debate, no API keys needed
aragora demo
# Or run the guided quickstart (opens receipt in your browser)
aragora quickstart --demo
Or run with Docker (includes dashboard UI):
docker compose -f docker-compose.quickstart.yml up
# Open http://localhost:3000
What you'll see:
================================================================
ARAGORA DEMO -- Adversarial Decision Stress-Test
================================================================
Topic: Should we adopt microservices?
Agents: Analyst, Critic, Synthesizer, Devil's Advocate
Rounds: 2
--- Round 1 --------------------------------------------------
[ANALYST] (supportive)
This is a sound strategy. The evidence points toward
significant gains in maintainability and team productivity.
[CRITIC] (critical)
The claimed benefits are overstated. Most organizations
underestimated the operational burden by 3-5x. I recommend
a modular monolith as the safer path.
[SYNTHESIZER] (balanced)
The tradeoffs here are real. On one hand, the current
architecture limits independent scaling. On the other,
the migration carries execution risk.
--- Decision Receipt -----------------------------------------
Verdict: CONDITIONAL APPROVAL
Confidence: 72%
Consensus: Partial (3 of 4 agents)
Dissent: Devil's Advocate flagged migration risk
# Review your current changes against main
git diff main | aragora review --demo
# Or review a GitHub PR
aragora review --pr https://github.com/org/repo/pull/123 --demo
# Stress-test a specification
aragora gauntlet spec.md --profile thorough --output receipt.html
# Run a multi-agent debate
aragora ask "Design a rate limiter for 1M req/sec" --agents anthropic-api,openai-api,gemini
# Start the API server
aragora serve
Add to Your CI Pipeline (1 minute)
# .github/workflows/aragora-review.yml
name: Aragora Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: an0mium/aragora@main
        with:
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
Or generate it automatically: aragora init --ci github
Five Pillars
Aragora is built on five architectural commitments designed for a world where individual AI agents cannot be trusted with consequential decisions alone.
1. SMB-Ready, Enterprise-Grade
Aragora is useful to a 5-person startup on day one and scales to regulated enterprise without rearchitecting. Enterprise features -- OIDC/SAML SSO, MFA, AES-256-GCM encryption, multi-tenant isolation, RBAC with 7 roles and 360+ permissions, SOC 2 / GDPR / HIPAA compliance frameworks -- are built in, not bolted on. Security hardening (rate limiting, SSRF protection, path traversal guards, input validation, audit trails) is the default, not a premium tier.
2. Leading-Edge Memory and Context
Single agents lose context. Aragora's 4-tier Continuum Memory (fast / medium / slow / glacial) and Knowledge Mound with 33 registered adapters give every debate access to institutional history, cross-session learning, and evidence provenance. The RLM (Recursive Language Models) system compresses and structures context to reduce prompt bloat, enabling debates that sustain coherence across long multi-round sessions and large document sets where individual models would degrade.
3. Extensible and Modular
Connectors for Slack, Teams, Discord, Telegram, WhatsApp, email, voice, Kafka, RabbitMQ, GitHub, Jira, Salesforce, healthcare HL7/FHIR, and dozens more. SDKs in Python and TypeScript (140 namespaces in the TypeScript SDK). 2,000+ API operations across 1,800+ paths and 190+ WebSocket event types. OpenClaw integration for portable agent governance. A workflow engine with DAG execution and 50+ templates. A marketplace for agent personas, debate templates, and workflow patterns. Aragora adapts to your stack -- not the other way around.
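To make "DAG execution" concrete, here is a minimal sketch of dependency-ordered execution using Python's standard-library `graphlib`. The `run_workflow` helper and the step names are hypothetical illustrations, not Aragora's workflow API; the sketch only shows that each step runs after everything it depends on.

```python
from graphlib import TopologicalSorter

def run_workflow(steps, deps):
    """Run each step after all of its dependencies and return the execution order.

    steps: dict mapping step name -> zero-argument callable (hypothetical steps)
    deps:  dict mapping step name -> set of step names it depends on
    """
    order = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic
    for name in TopologicalSorter(deps).static_order():
        steps[name]()
        order.append(name)
    return order

log = []
steps = {
    "gather_evidence": lambda: log.append("gather_evidence"),
    "debate": lambda: log.append("debate"),
    "publish_receipt": lambda: log.append("publish_receipt"),
}
deps = {
    "gather_evidence": set(),
    "debate": {"gather_evidence"},
    "publish_receipt": {"debate"},
}
order = run_workflow(steps, deps)
print(order)
```

A production engine would typically also run independent steps in parallel and persist state between steps; this sketch runs them sequentially.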
4. Multi-Agent Robustness
Individual LLMs exhibit persona instability -- their outputs shift based on framing, context, and even prompt ordering. Aragora treats this as a feature: by running Claude, GPT, Gemini, Grok, Mistral, DeepSeek, Qwen, Kimi, and local models in structured Propose / Critique / Revise debates, the system surfaces disagreements that reveal genuine uncertainty. ELO rankings track agent performance. Calibration scoring (Brier scores) measures prediction accuracy. The Trickster detects hollow consensus where models agree without genuine reasoning. The result: when models with different training data independently converge on an answer, that convergence is meaningful -- and when they disagree, the dissent trail tells you exactly where human judgment is needed.
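For readers unfamiliar with the two scoring ideas named above, the following sketch shows the standard textbook formulas for a Brier score and an Elo rating update. It is not taken from Aragora's code: the function names are hypothetical, and the K-factor of 32 is an assumed, commonly used default.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, 0.25 is what always answering 0.5 earns.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both ratings after one comparison.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    k=32 is an assumption, not a value from the Aragora docs.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# An agent that is confident and right scores better than one that always hedges
confident_and_right = brier_score([0.9, 0.8, 0.1], [1, 1, 0])
always_hedging = brier_score([0.5, 0.5, 0.5], [1, 1, 0])

# Two equally rated agents: the winner gains exactly what the loser drops
new_a, new_b = elo_update(1500, 1500, score_a=1)
print(confident_and_right, always_hedging, new_a, new_b)
```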
5. Self-Healing and Self-Extending
The Nomic Loop is Aragora's autonomous self-improvement system: agents debate improvements to the codebase, design solutions, implement code, run tests, and verify changes -- with human approval gates and automatic rollback on failure. This is how Aragora grew from a debate engine to 3,200+ modules. Red-team mode stress-tests the platform's own specs. The Gauntlet runs adversarial attacks against proposed changes. The system hardens itself.
Why Aragora?
A single LLM will confidently give you a wrong answer and you won't know it. Research shows that LLM personas are context-dependent, fragile under adversarial pressure, and prone to sycophantic agreement with whoever is asking. Stanford's taxonomy of LLM reasoning failures documents systematic breakdowns in formal logic, unfaithful chain-of-thought, and robustness failures under minor prompt variations -- exactly the failure modes that structured adversarial debate is designed to surface. When the decision matters -- hiring, architecture, compliance, strategy -- one model's opinion is insufficient.
Aragora treats each model as an unreliable witness and uses structured debate protocols to extract signal from their disagreements:
| What you get | How it works |
|---|---|
| Adversarial Validation | Models with different training data and blind spots challenge each other's reasoning |
| Decision Receipts | Cryptographic audit trails with evidence chains, dissent tracking, and confidence calibration |
| Gauntlet Mode | Red-team stress-tests for specs, policies, and architectures using adversarial personas |
| Calibrated Trust | ELO rankings and Brier scores track which models are actually reliable on which domains |
| Institutional Memory | Decisions persist across sessions with 4-tier memory and Knowledge Mound (33 adapters) |
| Channel Delivery | Results route to Slack, Teams, Discord, Telegram, WhatsApp, email, or voice |
Quick Start
1. Install and Try It (30 seconds)
pip install aragora
# Run a zero-config demo debate — opens receipt in your browser
aragora quickstart --demo
# Or review your uncommitted changes — no API keys needed in demo mode
git diff main | aragora review --demo
See docs/QUICKSTART_DEVELOPER.md for the full developer quickstart.
2. Run Debates and Start the Server
# Set at least one API key
export ANTHROPIC_API_KEY=your-key # or OPENAI_API_KEY, GEMINI_API_KEY, XAI_API_KEY
# Run a multi-agent debate
aragora ask "Should we adopt microservices?" --agents anthropic-api,openai-api --rounds 3
# Start the API server
aragora serve
See docs/guides/GETTING_STARTED.md for the complete 5-minute setup.
3. Deploy with Docker
# Clone and deploy
git clone https://github.com/an0mium/aragora && cd aragora
# Production deployment (secrets from AWS Secrets Manager)
cd deploy/liftmode && ./setup.sh
# Or run directly with Docker Compose
docker compose -f deploy/docker-compose.yml up -d
See docs/DEPLOYMENT.md for full deployment options (Docker, Kubernetes, offline mode).
4. Develop with the SDK
| Package | Install | Purpose | PyPI |
|---|---|---|---|
| aragora | pip install aragora | Full platform (server, CLI, debate engine) | v2.6.3 |
| aragora-debate | pip install aragora-debate | Standalone debate engine (no server needed) | v0.2.0 |
| aragora-sdk | pip install aragora-sdk | Python client SDK for connecting to aragora | v2.6.3 |
| @aragora/sdk | npm install @aragora/sdk | TypeScript/Node.js client SDK | - |
Core Workflows
1. Gauntlet Mode -- Adversarial Stress Testing
Stress-test specs, architectures, and policies before they ship:
aragora gauntlet spec.md --input-type spec --profile quick
aragora gauntlet policy.yaml --input-type policy --persona gdpr
aragora gauntlet architecture.md --profile thorough --output report.html
| Attack Type | What It Tests |
|---|---|
| Red Team | Security holes, injection points, auth bypasses |
| Devil's Advocate | Logic flaws, hidden assumptions, edge cases |
| Scaling Critic | Performance bottlenecks, SPOF, thundering herd |
| Compliance | GDPR, HIPAA, SOC 2, AI Act violations |
Decision receipts provide cryptographic audit trails for every finding.
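The receipt format itself is not described here, so the following is only an illustration of one common way to make an audit trail tamper-evident: a SHA-256 hash chain, where each entry's hash covers the previous entry's hash. The field names, the `GENESIS` value, and the helper functions are all assumptions for this sketch. A hash chain detects edits but does not prove authorship; that would additionally need a signature.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

def entry_digest(entry, prev_hash):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def build_chain(entries):
    """Link entries so that each hash depends on everything before it."""
    chain, prev = [], GENESIS
    for entry in entries:
        digest = entry_digest(entry, prev)
        chain.append({"entry": entry, "prev": prev, "hash": digest})
        prev = digest
    return chain

def verify_chain(chain):
    """Recompute every hash; any edited entry or broken link fails verification."""
    prev = GENESIS
    for link in chain:
        if link["prev"] != prev or entry_digest(link["entry"], prev) != link["hash"]:
            return False
        prev = link["hash"]
    return True

findings = [
    {"agent": "critic", "finding": "operational burden underestimated"},
    {"agent": "devils_advocate", "finding": "migration risk"},
]
chain = build_chain(findings)
intact = verify_chain(chain)

# Editing a recorded finding after the fact invalidates the chain
tampered = copy.deepcopy(chain)
tampered[0]["entry"]["finding"] = "no issues found"
still_valid = verify_chain(tampered)
print(intact, still_valid)
```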
2. AI Code Review
Get multi-model consensus on your pull requests:
git diff main | aragora review
aragora review https://github.com/owner/repo/pull/123
aragora review --demo # try without API keys
When 3+ independent models with different training data agree on an issue, that convergence is meaningful. Split opinions show where human judgment is needed -- the disagreement is the signal.
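The "3+ models agree" rule can be sketched as a simple grouping step. This is a hypothetical helper, not Aragora's review logic, and it assumes findings have already been normalized so that the same issue produces the same key. Matching differently worded findings would need a similarity step that is out of scope here.

```python
from collections import defaultdict

def split_by_agreement(findings, threshold=3):
    """Split issues into those raised by >= threshold distinct models and the rest.

    findings: iterable of (model, issue) pairs
    Returns (agreed, contested), each a dict of issue -> sorted list of models.
    """
    models_by_issue = defaultdict(set)
    for model, issue in findings:
        models_by_issue[issue].add(model)

    agreed, contested = {}, {}
    for issue, models in models_by_issue.items():
        target = agreed if len(models) >= threshold else contested
        target[issue] = sorted(models)
    return agreed, contested

findings = [
    ("claude", "sql-injection in search handler"),
    ("gpt", "sql-injection in search handler"),
    ("gemini", "sql-injection in search handler"),
    ("gpt", "missing retry on timeout"),
]
agreed, contested = split_by_agreement(findings)
print("converged:", agreed)
print("needs human judgment:", contested)
```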
3. Structured Debates
The debate protocol follows thesis > antithesis > synthesis:
- Propose -- Agents generate initial responses from different perspectives
- Critique -- Agents challenge each other's proposals with severity scores
- Revise -- Proposers incorporate valid critiques
- Synthesize -- Judge combines best elements into a final answer
Configurable consensus: majority, unanimous, judge-based, or none.
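As a rough illustration of how the four consensus settings could differ, here is a minimal sketch over a set of final votes. The exact semantics are assumptions made for this example (in particular, "judge" is simplified to "the designated judge's answer wins" and "none" to "no consensus is computed"), and `check_consensus` is a hypothetical function, not part of the Aragora API.

```python
from collections import Counter

def check_consensus(votes, mode="majority", judge=None):
    """Return the winning answer under the given mode, or None if there is none.

    votes: dict mapping agent name -> chosen answer
    mode:  "majority", "unanimous", "judge", or "none" (assumed semantics)
    """
    if mode == "none" or not votes:
        return None
    if mode == "judge":
        return votes.get(judge)

    answer, count = Counter(votes.values()).most_common(1)[0]
    if mode == "unanimous":
        return answer if count == len(votes) else None
    if mode == "majority":
        return answer if count > len(votes) / 2 else None
    raise ValueError(f"unknown consensus mode: {mode}")

votes = {
    "analyst": "approve",
    "critic": "reject",
    "synthesizer": "approve",
    "devils_advocate": "approve",
}
for mode in ("majority", "unanimous", "none"):
    print(mode, "->", check_consensus(votes, mode))
print("judge ->", check_consensus(votes, "judge", judge="critic"))
```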
Architecture
aragora/
├── debate/ # Core debate engine (210+ modules)
│ ├── orchestrator.py # Arena -- main debate loop
│ ├── consensus.py # Consensus detection and proofs
│ ├── convergence.py # Semantic similarity detection
│ └── phases/ # Propose, critique, revise, vote, judge
├── agents/ # 42 registered agent types (CLI, direct API, OpenRouter, local)
│ ├── api_agents/ # Anthropic, OpenAI, Gemini, Grok, Mistral, OpenRouter
│ ├── cli_agents.py # Claude Code, Codex, Gemini CLI, Grok CLI
│ └── fallback.py # OpenRouter fallback on quota errors
├── gauntlet/ # Adversarial stress testing
├── knowledge/ # Knowledge Mound with 33 registered adapters
├── memory/ # 4-tier memory (fast/medium/slow/glacial)
├── server/ # 2,000+ API operations, 190+ WebSocket event types
├── pipeline/ # Decision-to-PR generation
├── genesis/ # Fractal debates, agent evolution
├── sandbox/ # Docker-based safe execution
├── rbac/ # Role-based access control (7 roles, 360+ permissions)
├── compliance/ # SOC 2, GDPR, HIPAA frameworks
└── workflow/ # DAG-based automation engine
Scale: 3,200+ Python modules | 140,000+ tests
Performance and Costs
| Metric | Typical Value |
|---|---|
| Debate latency (3 agents, 2 rounds) | 30-90 seconds |
| Token usage per debate | 8,000-25,000 tokens |
| Estimated cost per debate | $0.05-$0.30 (depends on models) |
| Concurrent debates supported | 50+ (configurable) |
| API response time (cached) | < 200ms |
| Memory tier lookup (fast tier) | < 10ms |
Costs vary by model mix. Claude Haiku + GPT-4o-mini debates cost ~$0.05; Claude Opus + GPT-4 debates cost ~$0.30. Use aragora decide --dry-run to preview costs before execution.
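For budgeting, the per-debate range above can be turned into a monthly estimate with simple arithmetic. The sketch below uses the $0.05 and $0.30 bounds from the table; the workload of 200 debates per day and the 30-day month are hypothetical inputs chosen only for illustration.

```python
def monthly_cost_range(debates_per_day, low=0.05, high=0.30, days=30):
    """Return (low, high) monthly spend in USD from per-debate cost bounds."""
    total_debates = debates_per_day * days
    return round(total_debates * low, 2), round(total_debates * high, 2)

# Hypothetical workload: 200 debates per day
low, high = monthly_cost_range(200)
print(f"${low:,.2f} to ${high:,.2f} per month")
```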
How Aragora Compares
| Capability | Aragora | LangGraph | CrewAI | AutoGen |
|---|---|---|---|---|
| Adversarial debate protocol | Built-in (propose/critique/revise) | Manual | No | No |
| Decision receipts with audit trail | Cryptographic, SHA-256 | No | No | No |
| Agent calibration (ELO + Brier) | Built-in | No | No | No |
| Multi-model consensus | 42 agent types, 10+ providers | Single-provider | Single-provider | Multi-provider |
| Gauntlet stress testing | Built-in CLI | No | No | No |
| Enterprise security (SSO, RBAC, encryption) | Production-ready | No | No | No |
| Self-improvement (Nomic Loop) | Autonomous with safety gates | No | No | No |
| Knowledge persistence (33 adapters) | 4-tier memory + Knowledge Mound | Custom | Custom | Custom |
| Channel delivery (Slack, Teams, etc.) | 8 channels built-in | No | No | No |
Programmatic Usage
import asyncio

from aragora import Arena, Environment, DebateProtocol
from aragora.agents import create_agent

async def main():
    # One agent per provider, each assigned a debate role
    agents = [
        create_agent("anthropic-api", name="claude", role="proposer"),
        create_agent("openai-api", name="gpt", role="critic"),
        create_agent("gemini", name="gemini", role="synthesizer"),
    ]
    env = Environment(task="Design a distributed cache with LRU eviction")
    protocol = DebateProtocol(rounds=3, consensus="majority")
    arena = Arena(env, agents, protocol)
    # arena.run() is a coroutine, so it must be awaited inside an async function
    result = await arena.run()
    print(result.final_answer)
    print(f"Consensus: {result.consensus_reached} ({result.confidence:.0%})")

asyncio.run(main())
Python SDK
import asyncio
from aragora.client import AragoraClient

async def main():
    client = AragoraClient(base_url="http://localhost:8080")
    debate = client.debates.run(task="Should we adopt microservices?")
    receipt = await client.gauntlet.run_and_wait(input_content="spec.md")

asyncio.run(main())
See docs/SDK_GUIDE.md for the full API.
Channels and Integrations
Aragora delivers debate results to wherever your team works:
| Channel | Integration |
|---|---|
| Slack | Bot + OAuth |
| Microsoft Teams | Bot + OAuth |
| Discord | Interactions API |
| Telegram | Bot API |
| WhatsApp | Business API |
| Email | SMTP + Gmail + Outlook |
| Voice | TTS integration |
| Webhooks | Custom delivery |
Results automatically route to the originating channel via bidirectional chat routing.
See docs/integrations/INTEGRATIONS.md for setup.
Enterprise Features
| Category | Capabilities |
|---|---|
| Authentication | OIDC/SAML SSO, MFA (TOTP/HOTP), API key management, SCIM 2.0 |
| Multi-Tenancy | Tenant isolation, resource quotas, usage metering |
| Security | AES-256-GCM encryption, rate limiting, SSRF protection, key rotation |
| Compliance | SOC 2 controls, GDPR support, HIPAA, audit trails |
| Observability | Prometheus metrics, Grafana dashboards, OpenTelemetry tracing |
| RBAC | 7 roles, 360+ permissions, decorator-based enforcement |
| Backup | Incremental backups, retention policies, disaster recovery |
| Control Plane | Agent registry, task scheduler, health monitoring, policy governance |
See docs/enterprise/ENTERPRISE_FEATURES.md for details.
Self-Improvement (Nomic Loop)
Aragora includes an autonomous self-improvement system where agents debate and implement improvements to the codebase itself. Experimental -- always run in a sandbox with human review.
python scripts/run_nomic_with_stream.py run --cycles 3
python scripts/self_develop.py --goal "Improve test coverage" --require-approval
Safety: automatic backups, protected file checksums, rollback on failure, human approval gates.
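The combination of backups, protected-file checksums, and rollback can be illustrated with a small standard-library sketch. This is not the Nomic Loop's implementation: `apply_with_rollback` and the file name are hypothetical, and the sketch restores only the protected files, whereas a real system would likely revert the whole change set.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def apply_with_rollback(protected, change):
    """Run change(); restore protected files if any checksum differs afterwards.

    protected: list of Path objects that must not be modified
    change:    zero-argument callable that applies the proposed change
    Returns True if the change was kept, False if it was rolled back.
    """
    with tempfile.TemporaryDirectory() as backup_dir:
        backups = {}
        for i, path in enumerate(protected):
            backup = Path(backup_dir) / f"{i}.bak"
            shutil.copy2(path, backup)
            backups[path] = (backup, sha256_of(path))
        try:
            change()
            violated = any(sha256_of(p) != digest for p, (_, digest) in backups.items())
        except Exception:
            # A failing change (or a deleted protected file) also triggers rollback
            violated = True
        if violated:
            for path, (backup, _) in backups.items():
                shutil.copy2(backup, path)
        return not violated

with tempfile.TemporaryDirectory() as workdir:
    guard = Path(workdir) / "safety_policy.txt"  # hypothetical protected file
    guard.write_text("require human approval\n")
    kept = apply_with_rollback([guard], lambda: guard.write_text("approval disabled\n"))
    restored = guard.read_text()
print(kept, repr(restored))
```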
Deployment
| Goal | Command | Requirements |
|---|---|---|
| Try it | docker compose -f docker-compose.quickstart.yml up | Docker only |
| Self-hosted | cd deploy/self-hosted && docker compose up -d | Docker + API key |
| Local dev | aragora serve --api-port 8080 --ws-port 8765 | Python + API key |
See deploy/README.md for the full deployment guide.
API: REST at /api/v2/* | WebSocket at /ws | OpenAPI at /api/openapi
Security
- Ed25519 signature verification for webhooks (Discord, Slack)
- Rate limiting (IP, token, and endpoint-based)
- Input validation and content-length enforcement
- CORS allowlists, security headers, error message sanitization
- Path traversal protection, upload validation with magic byte checking
- WebSocket message limits (64KB), debate timeouts, backpressure control
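As an illustration of what "magic byte checking" means for upload validation, the sketch below compares a file's leading bytes against well-known format signatures instead of trusting the file extension. The signatures are the published ones for each format; the `validate_upload` helper, the set of allowed types, and the 10 MB size limit are assumptions for this example, not Aragora's actual rules.

```python
# Leading-byte signatures for a few common formats
MAGIC_SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "pdf": (b"%PDF-",),
    "zip": (b"PK\x03\x04",),
}

def sniff_type(data):
    """Return the format whose signature matches the start of data, else None."""
    for kind, signatures in MAGIC_SIGNATURES.items():
        if data.startswith(signatures):
            return kind
    return None

def validate_upload(filename, data, max_bytes=10 * 1024 * 1024):
    """Accept only known types whose content matches the claimed extension.

    max_bytes is an assumed limit chosen for this sketch.
    """
    if len(data) > max_bytes:
        return False
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return extension in MAGIC_SIGNATURES and sniff_type(data) == extension

real_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
disguised_pdf = b"%PDF-1.7\n"
print(validate_upload("logo.png", real_png))       # content matches extension
print(validate_upload("logo.png", disguised_pdf))  # PDF bytes behind a .png name
```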
See docs/enterprise/SECURITY.md and docs/enterprise/COMPLIANCE.md.
Documentation
| Need | Where |
|---|---|
| Developer quickstart | QUICKSTART_DEVELOPER.md |
| First-time setup | GETTING_STARTED.md |
| API reference | API_REFERENCE.md |
| SDK guide | SDK_GUIDE.md |
| Enterprise features | ENTERPRISE_FEATURES.md |
| Gauntlet guide | GAUNTLET.md |
| Agent catalog | AGENTS.md |
| Feature discovery | FEATURE_DISCOVERY.md |
| Extended README | EXTENDED_README.md |
| Full index | INDEX.md |
Inspiration and Citations
Aragora synthesizes ideas from these open-source projects:
- Stanford Generative Agents -- Memory + reflection architecture
- ChatArena -- Multi-agent interaction environments
- LLM Multi-Agent Debate -- ICML 2024 consensus mechanisms
- ai-counsel -- Semantic convergence detection (MIT)
- DebateLLM -- Agreement intensity modulation (Apache 2.0)
- claude-flow -- Adaptive topology switching (MIT)
- LLM Reasoning Failures -- Stanford taxonomy of systematic reasoning breakdowns (Song et al. 2026)
See the full attribution table in docs/reference/CREDITS.md.
Contributing
Contributions welcome. Areas of interest:
- Additional agent backends
- Debate visualization
- Benchmark datasets for agent evaluation
- Lean 4 theorem proving integration
License
MIT
The name "aragora" evokes the Greek agora -- the public assembly where citizens debated and reached collective decisions through reasoned discourse.