ARGUS: A Debate-Native Multi-Agent AI Architecture for Accelerating Scientific Discovery
ARGUS
Agentic Research & Governance Unified System
A debate-native, multi-agent AI framework for evidence-based reasoning with structured argumentation, decision-theoretic planning, and full provenance tracking.
Overview
ARGUS implements the Research Debate Chain (RDC), a novel approach to AI reasoning that structures knowledge evaluation as multi-agent debates. Instead of relying on single-pass inference, ARGUS orchestrates specialist agents that gather evidence, generate rebuttals, and render verdicts through Bayesian aggregation.
Key Innovations
- Conceptual Debate Graph (C-DAG): A directed graph structure where propositions, evidence, and rebuttals are nodes with signed edges representing support/attack relationships
- Evidence-Directed Debate Orchestration (EDDO): Algorithm for managing multi-round debates with stopping criteria
- Value of Information Planning: Decision-theoretic experiment selection using Expected Information Gain
- Full Provenance: PROV-O compatible ledger with hash-chain integrity for audit trails
Features
Multi-Agent Debate System
- Moderator: Creates debate agendas, manages rounds, evaluates stopping criteria
- Specialist Agents: Domain-specific evidence gathering with hybrid retrieval
- Refuter: Generates counter-evidence and methodological critiques
- Jury: Aggregates evidence via Bayesian updating, renders verdicts
Conceptual Debate Graph (C-DAG)
- Node Types: Propositions, Evidence, Rebuttals, Findings, Assumptions
- Edge Types: Supports, Attacks, Refines, Rebuts with signed weights
- Propagation: Log-odds Bayesian belief updating across the graph
- Visualization: Export to NetworkX for analysis
Hybrid Retrieval System
- BM25 Sparse: Traditional keyword-based retrieval
- FAISS Dense: Semantic vector search with sentence-transformers
- Fusion Methods: Weighted combination or Reciprocal Rank Fusion (RRF)
- Cross-Encoder Reranking: Neural reranking for precision
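To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not ARGUS's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the two input rankings are hypothetical, and `k=60` is the constant commonly used in the RRF literature rather than a documented ARGUS default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical sparse results
dense_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical dense results

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear near the top of both lists (here `doc_b` and `doc_a`) rise above documents that only one retriever returned, without requiring the two score scales to be comparable.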
Decision-Theoretic Planning
- Expected Information Gain (EIG): Monte Carlo estimation for experiment value
- VoI Planner: Knapsack-based optimal action selection under budget
- Calibration: Brier score, ECE, temperature scaling for confidence tuning
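The knapsack-style selection under a budget can be sketched as a standard 0/1 knapsack. This is an illustration only: `select_actions`, the action names, the integer costs, and the value estimates are all hypothetical, and the real VoI planner may use a different formulation.

```python
def select_actions(actions, budget):
    """Pick the subset of actions with maximal total value within budget.

    actions: list of (name, cost, value) with non-negative integer costs.
    Returns (total_value, chosen_names) via 0/1 knapsack dynamic programming.
    """
    # best[b] holds the best (value, names) found with total cost <= b.
    best = [(0.0, [])] * (budget + 1)
    for name, cost, value in actions:
        # Iterate budgets downward so each action is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate_value = best[b - cost][0] + value
            if candidate_value > best[b][0]:
                best[b] = (candidate_value, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical experiments: (name, cost in budget units, estimated information gain)
candidate_actions = [
    ("assay", 3, 0.40),
    ("literature_search", 1, 0.15),
    ("replication", 4, 0.55),
    ("survey", 2, 0.20),
]

value, chosen = select_actions(candidate_actions, budget=5)
print(chosen, round(value, 2))
```

With a budget of 5, the cheap literature search plus the replication study beats the assay plus the survey, even though both pairs use the full budget.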
Provenance & Governance
- PROV-O Compatible: W3C standard provenance model
- Hash-Chain Integrity: SHA-256 linked events for tamper detection
- Attestations: Cryptographic proofs for content integrity
- Query API: Filter events by entity, agent, time range
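The idea behind hash-chain integrity can be shown with the standard library alone. The sketch below is not the ARGUS ledger format: the entry layout, `GENESIS_HASH`, and the helper names are assumptions chosen for illustration. Each entry's SHA-256 digest covers the previous entry's hash, so editing any earlier event invalidates the chain.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first event


def event_hash(payload, prev_hash):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(ledger, payload):
    """Append an event that is linked to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": event_hash(payload, prev_hash),
    })


def verify_chain(ledger):
    """Recompute every hash in order; any mismatch means tampering."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != event_hash(entry["payload"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_event(ledger, {"agent": "specialist", "activity": "add_evidence"})
append_event(ledger, {"agent": "jury", "activity": "render_verdict"})
print(verify_chain(ledger))  # True

ledger[0]["payload"]["activity"] = "tampered"
print(verify_chain(ledger))  # False
```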
LLM Provider Support
| Provider | Models | Features |
|---|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-3.5 | Generate, Stream, Embed |
| Anthropic | Claude 3.5, Claude 3 | Generate, Stream |
| Google | Gemini 1.5 Pro/Flash | Generate, Stream, Embed |
| Ollama | Llama, Mistral, Phi | Local deployment |
Installation
From PyPI (Recommended)
pip install argus-ai
From Source
git clone https://github.com/your-org/argus.git
cd argus
pip install -e ".[dev]"
Optional Dependencies
# For PDF processing
pip install argus-ai[pdf]
# For all features
pip install argus-ai[all]
Quick Start
Basic Usage
from argus import RDCOrchestrator, get_llm
# Initialize with any supported LLM
llm = get_llm("openai", model="gpt-4o")
# Run a debate on a proposition
orchestrator = RDCOrchestrator(llm=llm, max_rounds=5)
result = orchestrator.debate(
    "The new treatment reduces symptoms by more than 20%",
    prior=0.5,  # Start with 50/50 uncertainty
)
print(f"Verdict: {result.verdict.label}")
print(f"Posterior: {result.verdict.posterior:.3f}")
print(f"Evidence: {result.num_evidence} items")
print(f"Reasoning: {result.verdict.reasoning}")
Building a Debate Graph Manually
from argus import CDAG, Proposition, Evidence, EdgeType
from argus.cdag.nodes import EvidenceType
from argus.cdag.propagation import compute_posterior
# Create the graph
graph = CDAG(name="drug_efficacy_debate")
# Add the proposition to evaluate
prop = Proposition(
    text="Drug X is effective for treating condition Y",
    prior=0.5,
    domain="clinical",
)
graph.add_proposition(prop)
# Add supporting evidence
trial_evidence = Evidence(
    text="Phase 3 RCT showed 35% symptom reduction (n=500, p<0.001)",
    evidence_type=EvidenceType.EMPIRICAL,
    polarity=1,  # Supports
    confidence=0.9,
    relevance=0.95,
    quality=0.85,
)
graph.add_evidence(trial_evidence, prop.id, EdgeType.SUPPORTS)
# Add challenging evidence
side_effect = Evidence(
    text="15% of patients experienced adverse events",
    evidence_type=EvidenceType.EMPIRICAL,
    polarity=-1,  # Attacks
    confidence=0.8,
    relevance=0.7,
)
graph.add_evidence(side_effect, prop.id, EdgeType.ATTACKS)
# Compute Bayesian posterior
posterior = compute_posterior(graph, prop.id)
print(f"Posterior probability: {posterior:.3f}")
Document Ingestion & Retrieval
from argus import DocumentLoader, Chunker, EmbeddingGenerator
from argus.retrieval import HybridRetriever
# Load documents
loader = DocumentLoader()
doc = loader.load("research_paper.pdf")
# Chunk with overlap
chunker = Chunker(chunk_size=512, chunk_overlap=50)
chunks = chunker.chunk(doc)
# Create hybrid retriever
retriever = HybridRetriever(
    embedding_model="all-MiniLM-L6-v2",
    lambda_param=0.7,  # Weight toward dense retrieval
    use_reranker=True,
)
retriever.index_chunks(chunks)
# Search
results = retriever.retrieve("treatment efficacy results", top_k=10)
for r in results:
    print(f"[{r.rank}] Score: {r.score:.3f} - {r.chunk.text[:100]}...")
Multi-Agent Debate
from argus import get_llm
from argus.agents import Moderator, Specialist, Refuter, Jury
from argus import CDAG, Proposition
llm = get_llm("anthropic", model="claude-3-5-sonnet-20241022")
# Initialize agents
moderator = Moderator(llm)
specialist = Specialist(llm, domain="clinical")
refuter = Refuter(llm)
jury = Jury(llm)
# Create debate
graph = CDAG()
prop = Proposition(text="The intervention is cost-effective", prior=0.5)
graph.add_proposition(prop)
# Moderator creates agenda
agenda = moderator.create_agenda(graph, prop.id)
# Specialists gather evidence
evidence = specialist.gather_evidence(graph, prop.id)
# Refuter challenges
rebuttals = refuter.generate_rebuttals(graph, prop.id)
# Jury renders verdict
verdict = jury.evaluate(graph, prop.id)
print(f"Verdict: {verdict.label} (posterior={verdict.posterior:.3f})")
Command Line Interface
ARGUS provides a full CLI for common operations:
# Run a debate
argus debate "The hypothesis is supported by evidence" --prior 0.5 --rounds 3
# Quick evaluation (single LLM call)
argus evaluate "Climate change increases wildfire frequency"
# Ingest documents into index
argus ingest ./documents --output ./index
# Show configuration
argus config
# Specify provider
argus debate "Query" --provider anthropic --model claude-3-5-sonnet-20241022
Configuration
Environment Variables
# LLM API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
# Default settings
export ARGUS_DEFAULT_PROVIDER="openai"
export ARGUS_DEFAULT_MODEL="gpt-4o"
export ARGUS_TEMPERATURE="0.7"
export ARGUS_MAX_TOKENS="2048"
# Ollama (local)
export ARGUS_OLLAMA_HOST="http://localhost:11434"
Programmatic Configuration
from argus import ArgusConfig, get_config
config = ArgusConfig(
    default_provider="anthropic",
    default_model="claude-3-5-sonnet-20241022",
    temperature=0.5,
    max_tokens=4096,
)
# Or get global config
config = get_config()
Architecture
┌────────────────────────────────────────────────────────────────────┐
│                         ARGUS Architecture                         │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│   │   Moderator   │───▶│  Specialists  │───▶│    Refuter    │   │
│   │   (Planner)   │    │  (Evidence)   │    │ (Challenges)  │   │
│   └───────┬───────┘    └───────┬───────┘    └───────┬───────┘   │
│           │                    │                    │              │
│           ▼                    ▼                    ▼              │
│   ┌────────────────────────────────────────────────────────────┐   │
│   │                    C-DAG (Debate Graph)                    │   │
│   │        ┌────────┐     ┌──────────┐     ┌──────────┐        │   │
│   │        │ Props  │────▶│ Evidence │────▶│ Rebuttals│        │   │
│   │        └────────┘     └──────────┘     └──────────┘        │   │
│   │            ▲               │                │              │   │
│   │            └───────────────┴────────────────┘              │   │
│   │                Signed Influence Propagation                │   │
│   └────────────────────────────────────────────────────────────┘   │
│                                  │                                 │
│                                  ▼                                 │
│   ┌────────────────────────────────────────────────────────────┐   │
│   │                       Jury (Verdict)                       │   │
│   │          Bayesian Aggregation → Posterior → Label          │   │
│   └────────────────────────────────────────────────────────────┘   │
│                                                                    │
│   ┌────────────────┐    ┌────────────────┐    ┌────────────────┐   │
│   │ Knowledge Layer│    │ Decision Layer │    │ Provenance     │   │
│   │ • Ingestion    │    │ • Bayesian     │    │ • PROV-O Ledger│   │
│   │ • Chunking     │    │ • EIG/VoI      │    │ • Hash Chain   │   │
│   │ • Hybrid Index │    │ • Calibration  │    │ • Attestations │   │
│   └────────────────┘    └────────────────┘    └────────────────┘   │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
Module Overview
| Module | Description |
|---|---|
| `argus.core` | Configuration, data models, LLM abstractions |
| `argus.cdag` | Conceptual Debate Graph implementation |
| `argus.decision` | Bayesian updating, EIG, VoI planning, calibration |
| `argus.knowledge` | Document ingestion, chunking, embeddings, indexing |
| `argus.retrieval` | Hybrid retrieval, reranking, cite & critique |
| `argus.agents` | Moderator, Specialist, Refuter, Jury agents |
| `argus.provenance` | PROV-O ledger, integrity, attestations |
| `argus.orchestrator` | RDC orchestration engine |
Algorithms
Signed Influence Propagation
The C-DAG uses log-odds space for numerically stable belief propagation:
posterior = σ(log-odds(prior) + Σ_i signed_weight_i × log(LR_i))
Where:
- σ is the sigmoid function
- LR_i is the likelihood ratio for evidence i
- signed_weight = polarity × confidence × relevance × quality
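As a worked example of the propagation rule above, the following sketch evaluates it for two pieces of evidence. It is a stand-alone illustration rather than the code in `argus.cdag.propagation`: the helper names and the likelihood ratios (4.0 and 2.0) are hypothetical, and a quality of 1.0 is assumed for the second item. It also assumes each likelihood ratio is at least 1 and expresses strength only, with the direction carried by the polarity.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def signed_weight(polarity, confidence, relevance, quality):
    """polarity is +1 (supports) or -1 (attacks); the other factors lie in [0, 1]."""
    return polarity * confidence * relevance * quality


def posterior(prior, evidence):
    """evidence: iterable of (signed_weight, likelihood_ratio) pairs."""
    log_odds = logit(prior)
    for weight, likelihood_ratio in evidence:
        log_odds += weight * math.log(likelihood_ratio)
    return sigmoid(log_odds)


evidence = [
    (signed_weight(+1, 0.9, 0.95, 0.85), 4.0),  # supporting trial result
    (signed_weight(-1, 0.8, 0.70, 1.00), 2.0),  # attacking safety signal
]
p = posterior(0.5, evidence)
print(f"Posterior: {p:.3f}")
```

Because the update is additive in log-odds space, the order in which evidence arrives does not change the result, and an empty evidence list returns the prior unchanged.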
Expected Information Gain
For experiment planning, ARGUS computes EIG via Monte Carlo:
EIG(a) = H(p) - E_y[H(p|y)]
Where H is entropy and the expectation is over possible outcomes.
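A minimal sketch of this estimate, assuming a binary hypothesis and an experiment with a binary outcome, is shown below. The function names, the outcome likelihoods, and the sample count are hypothetical and are not taken from `argus.decision`. An exact version is included so the Monte Carlo estimate can be checked against it.

```python
import math
import random


def entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def posterior_given(p, y, p_pos_if_true, p_pos_if_false):
    """Bayes update of P(hypothesis) after observing outcome y (True = positive)."""
    like_true = p_pos_if_true if y else 1.0 - p_pos_if_true
    like_false = p_pos_if_false if y else 1.0 - p_pos_if_false
    marginal = p * like_true + (1.0 - p) * like_false
    return p * like_true / marginal


def eig_monte_carlo(p, p_pos_if_true, p_pos_if_false, n_samples=20000, seed=0):
    """Estimate H(p) - E_y[H(p|y)] by sampling outcomes from the prior predictive."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        hypothesis_true = rng.random() < p
        p_pos = p_pos_if_true if hypothesis_true else p_pos_if_false
        y = rng.random() < p_pos
        total += entropy(posterior_given(p, y, p_pos_if_true, p_pos_if_false))
    return entropy(p) - total / n_samples


def eig_exact(p, p_pos_if_true, p_pos_if_false):
    """Closed-form EIG for the same binary model, used as a reference."""
    p_pos = p * p_pos_if_true + (1.0 - p) * p_pos_if_false
    expected = 0.0
    for y, p_y in ((True, p_pos), (False, 1.0 - p_pos)):
        if p_y > 0.0:
            expected += p_y * entropy(posterior_given(p, y, p_pos_if_true, p_pos_if_false))
    return entropy(p) - expected


mc = eig_monte_carlo(0.5, p_pos_if_true=0.8, p_pos_if_false=0.3)
exact = eig_exact(0.5, p_pos_if_true=0.8, p_pos_if_false=0.3)
print(f"Monte Carlo: {mc:.3f} bits, exact: {exact:.3f} bits")
```

An experiment whose outcome is equally likely under both hypotheses has zero EIG, which is why such actions would never be selected by a planner that ranks on this quantity.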
Calibration
Temperature scaling optimizes:
T* = argmin_T Σ_i [ -y_i log(σ(z_i/T)) - (1-y_i) log(1-σ(z_i/T)) ]
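The objective above can be minimized with a one-dimensional search. The sketch below uses golden-section search over log T, which is one reasonable choice and not necessarily what ARGUS does; the function names, search bounds, and the toy logits and labels are all hypothetical.

```python
import math


def nll(temperature, logits, labels):
    """Negative log-likelihood of binary labels under sigmoid(z / T)."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # log(1 + exp(s)) computed in a numerically stable way
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        total += softplus - y * s
    return total


def fit_temperature(logits, labels, low=0.05, high=20.0, iterations=60):
    """Golden-section search for the T in [low, high] that minimizes the NLL."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(low), math.log(high)
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if nll(math.exp(c), logits, labels) < nll(math.exp(d), logits, labels):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2.0)


# Toy validation set: confident logits, two of which are confidently wrong.
logits = [4.0, 3.5, -4.0, 3.0, -3.5, 4.5, -3.0, 2.5]
labels = [1, 1, 0, 0, 0, 1, 1, 1]

t_star = fit_temperature(logits, labels)
print(f"T* = {t_star:.2f}")
print(f"NLL at T=1: {nll(1.0, logits, labels):.3f}, at T*: {nll(t_star, logits, labels):.3f}")
```

A fitted temperature above 1 indicates overconfident scores: dividing the logits by T* pulls the predicted probabilities toward 0.5 without changing their ranking.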
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=argus --cov-report=html
# Run specific test module
pytest tests/unit/test_cdag.py -v
# Run integration tests
pytest tests/integration/ -v
# Skip slow tests
pytest -m "not slow"
Examples
Clinical Evidence Evaluation
from argus import RDCOrchestrator, get_llm
from argus.retrieval import HybridRetriever
# Load clinical literature
# clinical_chunks: chunks produced beforehand, e.g. with DocumentLoader and Chunker
# as in "Document Ingestion & Retrieval" above
retriever = HybridRetriever()
retriever.index_chunks(clinical_chunks)
# Evaluate treatment claim
orchestrator = RDCOrchestrator(
    llm=get_llm("openai", model="gpt-4o"),
    max_rounds=5,
)
result = orchestrator.debate(
    "Metformin reduces HbA1c by >1% in Type 2 diabetes",
    prior=0.6,  # Prior based on existing knowledge
    retriever=retriever,
    domain="clinical",
)
Research Claim Verification
from argus import CDAG, Proposition, Evidence
from argus.cdag.propagation import compute_all_posteriors
graph = CDAG(name="research_verification")
# Main claim
claim = Proposition(
    text="Neural scaling laws predict emergent capabilities",
    prior=0.5,
)
graph.add_proposition(claim)
# Add evidence from multiple papers
# ... (add supporting/attacking evidence)
# Compute all posteriors
posteriors = compute_all_posteriors(graph)
for prop_id, posterior in posteriors.items():
    prop = graph.get_proposition(prop_id)
    print(f"{prop.text[:50]}... : {posterior:.3f}")
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use ARGUS in your research, please cite:
@software{argus2024,
  title={ARGUS: Agentic Research \& Governance Unified System},
  author={ARGUS Team},
  year={2024},
  url={https://github.com/your-org/argus}
}
Acknowledgments
- Inspired by debate-native reasoning approaches in AI safety research
- Built on excellent open-source libraries: Pydantic, NetworkX, FAISS, Sentence-Transformers
- LLM integrations powered by OpenAI, Anthropic, and Google APIs