
ARGUS: A Debate-Native Multi-Agent AI Architecture for Accelerating Scientific Discovery


ARGUS

Agentic Research & Governance Unified System

A debate-native, multi-agent AI framework for evidence-based reasoning with structured argumentation, decision-theoretic planning, and full provenance tracking.

Python 3.11+ | License: MIT


Overview

ARGUS implements Research Debate Chain (RDC) - a novel approach to AI reasoning that structures knowledge evaluation as multi-agent debates. Instead of single-pass inference, ARGUS orchestrates specialist agents that gather evidence, generate rebuttals, and render verdicts through Bayesian aggregation.

Key Innovations

  • Conceptual Debate Graph (C-DAG): A directed graph in which propositions, evidence, and rebuttals are nodes, linked by signed edges that represent support or attack relationships
  • Evidence-Directed Debate Orchestration (EDDO): Algorithm for managing multi-round debates with stopping criteria
  • Value of Information Planning: Decision-theoretic experiment selection using Expected Information Gain
  • Full Provenance: PROV-O compatible ledger with hash-chain integrity for audit trails

Features

Multi-Agent Debate System

  • Moderator: Creates debate agendas, manages rounds, evaluates stopping criteria
  • Specialist Agents: Domain-specific evidence gathering with hybrid retrieval
  • Refuter: Generates counter-evidence and methodological critiques
  • Jury: Aggregates evidence via Bayesian updating, renders verdicts
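The stopping criteria are not spelled out here. The sketch below shows one plausible shape for the round loop, under stated assumptions. The Moderator stops when `max_rounds` is reached, or when a round shifts the posterior by less than a tolerance. The Jury then maps the final posterior to a label. The `run_round` callback, the `tol` value, and the label thresholds are all hypothetical and are not part of the documented ARGUS API.

```python
from collections.abc import Callable


def run_debate(
    prior: float,
    run_round: Callable[[float, int], float],
    max_rounds: int = 5,
    tol: float = 0.01,
) -> tuple[float, int]:
    """Hypothetical round loop; returns (final posterior, rounds used)."""
    posterior = prior
    for round_idx in range(1, max_rounds + 1):
        updated = run_round(posterior, round_idx)
        # Convergence check: stop once a round barely moves the belief.
        if abs(updated - posterior) < tol:
            return updated, round_idx
        posterior = updated
    # Round budget exhausted: return the latest belief.
    return posterior, max_rounds


def verdict_label(posterior: float, upper: float = 0.7, lower: float = 0.3) -> str:
    """Map a posterior to a label using illustrative (assumed) thresholds."""
    if posterior >= upper:
        return "supported"
    if posterior <= lower:
        return "rejected"
    return "undecided"


# Toy round function: each round moves the belief halfway toward 0.9.
final, rounds = run_debate(0.5, lambda p, i: p + 0.5 * (0.9 - p), max_rounds=10)
print(rounds, final, verdict_label(final))
```

With these toy numbers, the loop stops after 6 rounds, once the per-round change drops below 0.01.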

Conceptual Debate Graph (C-DAG)

  • Node Types: Propositions, Evidence, Rebuttals, Findings, Assumptions
  • Edge Types: Supports, Attacks, Refines, Rebuts with signed weights
  • Propagation: Log-odds Bayesian belief updating across the graph
  • Visualization: Export to NetworkX for analysis

Hybrid Retrieval System

  • BM25 Sparse: Traditional keyword-based retrieval
  • FAISS Dense: Semantic vector search with sentence-transformers
  • Fusion Methods: Weighted combination or Reciprocal Rank Fusion (RRF)
  • Cross-Encoder Reranking: Neural reranking for precision
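Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. Below is a minimal sketch, assuming 1-based ranks and the commonly used constant k = 60. The constant ARGUS actually uses is not stated here.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs: score(d) = sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["d1", "d2", "d3"]   # sparse (keyword) results
dense_ranking = ["d2", "d3", "d4"]  # dense (embedding) results
for doc_id, score in reciprocal_rank_fusion([bm25_ranking, dense_ranking]):
    print(doc_id, round(score, 4))
```

Here d2 and d3, which both retrievers return, outrank d1, even though d1 is BM25's top hit. Because RRF uses only ranks, it needs no score normalization across retrievers.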

Decision-Theoretic Planning

  • Expected Information Gain (EIG): Monte Carlo estimation for experiment value
  • VoI Planner: Knapsack-based optimal action selection under budget
  • Calibration: Brier score and Expected Calibration Error (ECE) for measuring calibration, plus temperature scaling for confidence tuning
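Knapsack-based action selection can be sketched as a 0/1 knapsack. Each candidate action has an integer cost and an estimated EIG, and the planner picks the subset with the highest total EIG that fits the budget. This sketch assumes EIG values add up across actions, which ignores redundancy between experiments. The `Action` type, its fields, and the example actions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    cost: int    # integer cost units (e.g. budget points)
    eig: float   # estimated expected information gain


def plan_actions(actions: list[Action], budget: int) -> tuple[float, list[str]]:
    """0/1 knapsack by dynamic programming: maximize total EIG within budget."""
    n = len(actions)
    # best[i][b] = max total EIG using the first i actions with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, action in enumerate(actions, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip this action
            if action.cost <= b:         # option 2: take it, if affordable
                candidate = best[i - 1][b - action.cost] + action.eig
                if candidate > best[i][b]:
                    best[i][b] = candidate
    # Walk back through the table to recover which actions were taken.
    chosen: list[str] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(actions[i - 1].name)
            b -= actions[i - 1].cost
    chosen.reverse()
    return best[n][budget], chosen


candidates = [
    Action("replication_study", cost=3, eig=0.40),
    Action("literature_search", cost=2, eig=0.30),
    Action("expert_review", cost=2, eig=0.25),
]
total, chosen = plan_actions(candidates, budget=4)
print(chosen, round(total, 2))
```

With a budget of 4, the two cheaper actions (combined EIG 0.55) beat the single most informative one (0.40).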

Provenance & Governance

  • PROV-O Compatible: W3C standard provenance model
  • Hash-Chain Integrity: SHA-256 linked events for tamper detection
  • Attestations: Cryptographic proofs for content integrity
  • Query API: Filter events by entity, agent, time range
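The hash-chain idea can be sketched with the standard library. Each event stores the SHA-256 digest of its own payload, chained to the previous event's hash. Editing any earlier event therefore breaks verification from that point on. The event fields, the all-zero genesis value, and the JSON serialization are assumptions for illustration, not the ARGUS ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def event_hash(payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal payloads hash identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(ledger: list[dict], payload: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"payload": payload, "prev": prev, "hash": event_hash(payload, prev)})


def verify(ledger: list[dict]) -> bool:
    """Recompute every link; any edited payload or broken link fails."""
    prev = GENESIS
    for event in ledger:
        if event["prev"] != prev or event["hash"] != event_hash(event["payload"], prev):
            return False
        prev = event["hash"]
    return True


ledger: list[dict] = []
append_event(ledger, {"agent": "specialist", "activity": "add_evidence", "entity": "e1"})
append_event(ledger, {"agent": "jury", "activity": "render_verdict", "entity": "p1"})
print(verify(ledger))  # True
ledger[0]["payload"]["entity"] = "e2"  # tamper with an earlier event
print(verify(ledger))  # False
```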

LLM Provider Support

Provider    Models                   Features
OpenAI      GPT-4o, GPT-4, GPT-3.5   Generate, Stream, Embed
Anthropic   Claude 3.5, Claude 3     Generate, Stream
Google      Gemini 1.5 Pro/Flash     Generate, Stream, Embed
Ollama      Llama, Mistral, Phi      Local deployment

Installation

From PyPI

pip install argus-debate-ai

From Source

git clone https://github.com/argus-ai/argus.git
cd argus
pip install -e ".[dev]"

Optional Dependencies

# For all features including dev tools (quote the extras so shells like zsh do not expand the brackets)
pip install "argus-debate-ai[all]"

# For Ollama local LLM support
pip install "argus-debate-ai[ollama]"

Quick Start

Basic Usage

from argus import RDCOrchestrator, get_llm

# Initialize with any supported LLM
llm = get_llm("openai", model="gpt-4o")

# Run a debate on a proposition
orchestrator = RDCOrchestrator(llm=llm, max_rounds=5)
result = orchestrator.debate(
    "The new treatment reduces symptoms by more than 20%",
    prior=0.5,  # Start with 50/50 uncertainty
)

print(f"Verdict: {result.verdict.label}")
print(f"Posterior: {result.verdict.posterior:.3f}")
print(f"Evidence: {result.num_evidence} items")
print(f"Reasoning: {result.verdict.reasoning}")

Building a Debate Graph Manually

from argus import CDAG, Proposition, Evidence, EdgeType
from argus.cdag.nodes import EvidenceType
from argus.cdag.propagation import compute_posterior

# Create the graph
graph = CDAG(name="drug_efficacy_debate")

# Add the proposition to evaluate
prop = Proposition(
    text="Drug X is effective for treating condition Y",
    prior=0.5,
    domain="clinical",
)
graph.add_proposition(prop)

# Add supporting evidence
trial_evidence = Evidence(
    text="Phase 3 RCT showed 35% symptom reduction (n=500, p<0.001)",
    evidence_type=EvidenceType.EMPIRICAL,
    polarity=1,  # Supports
    confidence=0.9,
    relevance=0.95,
    quality=0.85,
)
graph.add_evidence(trial_evidence, prop.id, EdgeType.SUPPORTS)

# Add challenging evidence
side_effect = Evidence(
    text="15% of patients experienced adverse events",
    evidence_type=EvidenceType.EMPIRICAL,
    polarity=-1,  # Attacks
    confidence=0.8,
    relevance=0.7,
)
graph.add_evidence(side_effect, prop.id, EdgeType.ATTACKS)

# Compute Bayesian posterior
posterior = compute_posterior(graph, prop.id)
print(f"Posterior probability: {posterior:.3f}")

Document Ingestion & Retrieval

from argus import DocumentLoader, Chunker, EmbeddingGenerator
from argus.retrieval import HybridRetriever

# Load documents
loader = DocumentLoader()
doc = loader.load("research_paper.pdf")

# Chunk with overlap
chunker = Chunker(chunk_size=512, chunk_overlap=50)
chunks = chunker.chunk(doc)

# Create hybrid retriever
retriever = HybridRetriever(
    embedding_model="all-MiniLM-L6-v2",
    lambda_param=0.7,  # Weight toward dense retrieval
    use_reranker=True,
)
retriever.index_chunks(chunks)

# Search
results = retriever.retrieve("treatment efficacy results", top_k=10)
for r in results:
    print(f"[{r.rank}] Score: {r.score:.3f} - {r.chunk.text[:100]}...")
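The comment on `lambda_param` suggests a weighted combination: λ weights the dense score and 1 - λ weights the sparse one. The sketch below assumes each retriever's scores are min-max normalized before mixing, since BM25 and cosine scores live on different scales. ARGUS's actual normalization is not documented here, and the function names are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(
    dense: dict[str, float], sparse: dict[str, float], lam: float = 0.7
) -> list[tuple[str, float]]:
    """Combine normalized scores as lam * dense + (1 - lam) * sparse."""
    d, s = min_max(dense), min_max(sparse)
    # A document missing from one retriever contributes 0 from that side.
    fused = {doc: lam * d.get(doc, 0.0) + (1 - lam) * s.get(doc, 0.0) for doc in set(d) | set(s)}
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}   # e.g. cosine similarities
sparse_scores = {"a": 2.0, "b": 8.0, "c": 5.0}  # e.g. BM25 scores
print(weighted_fusion(dense_scores, sparse_scores, lam=0.7))
```

With lam=0.7, document a (the best dense match) ranks first. Lowering lam to 0.3 lets b (the best keyword match) overtake it.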

Multi-Agent Debate

from argus import CDAG, Proposition, get_llm
from argus.agents import Jury, Moderator, Refuter, Specialist

llm = get_llm("anthropic", model="claude-3-5-sonnet-20241022")

# Initialize agents
moderator = Moderator(llm)
specialist = Specialist(llm, domain="clinical")
refuter = Refuter(llm)
jury = Jury(llm)

# Create debate
graph = CDAG()
prop = Proposition(text="The intervention is cost-effective", prior=0.5)
graph.add_proposition(prop)

# Moderator creates agenda
agenda = moderator.create_agenda(graph, prop.id)

# Specialist gathers evidence
evidence = specialist.gather_evidence(graph, prop.id)

# Refuter challenges
rebuttals = refuter.generate_rebuttals(graph, prop.id)

# Jury renders verdict
verdict = jury.evaluate(graph, prop.id)
print(f"Verdict: {verdict.label} (posterior={verdict.posterior:.3f})")

Command Line Interface

ARGUS provides a CLI for common operations:

# Run a debate
argus debate "The hypothesis is supported by evidence" --prior 0.5 --rounds 3

# Quick evaluation (single LLM call)
argus evaluate "Climate change increases wildfire frequency"

# Ingest documents into index
argus ingest ./documents --output ./index

# Show configuration
argus config

# Specify provider
argus debate "Query" --provider anthropic --model claude-3-5-sonnet-20241022

Configuration

Environment Variables

# LLM API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

# Default settings
export ARGUS_DEFAULT_PROVIDER="openai"
export ARGUS_DEFAULT_MODEL="gpt-4o"
export ARGUS_TEMPERATURE="0.7"
export ARGUS_MAX_TOKENS="2048"

# Ollama (local)
export ARGUS_OLLAMA_HOST="http://localhost:11434"

Programmatic Configuration

from argus import ArgusConfig, get_config

config = ArgusConfig(
    default_provider="anthropic",
    default_model="claude-3-5-sonnet-20241022",
    temperature=0.5,
    max_tokens=4096,
)

# Alternatively, retrieve the global configuration
config = get_config()

Architecture

+-----------------------------------------------------------------------------+
|                              ARGUS Architecture                              |
+-----------------------------------------------------------------------------+
|                                                                              |
|  +---------------+    +---------------+    +---------------+                |
|  |   Moderator   |--->|  Specialists  |--->|    Refuter    |                |
|  |   (Planner)   |    |  (Evidence)   |    | (Challenges)  |                |
|  +-------+-------+    +-------+-------+    +-------+-------+                |
|          |                    |                    |                         |
|          v                    v                    v                         |
|  +---------------------------------------------------------------------+    |
|  |                    C-DAG (Debate Graph)                              |    |
|  |  +--------+     +----------+     +----------+                        |    |
|  |  | Props  |---->| Evidence |---->| Rebuttals|                        |    |
|  |  +--------+     +----------+     +----------+                        |    |
|  |         ^              |               |                              |    |
|  |         +--------------+---------------+                              |    |
|  |                 Signed Influence Propagation                         |    |
|  +---------------------------------------------------------------------+    |
|                                    |                                         |
|                                    v                                         |
|  +---------------------------------------------------------------------+    |
|  |                         Jury (Verdict)                               |    |
|  |           Bayesian Aggregation -> Posterior -> Label                 |    |
|  +---------------------------------------------------------------------+    |
|                                                                              |
|  +-----------------+  +-----------------+  +-----------------+              |
|  | Knowledge Layer |  | Decision Layer  |  |   Provenance    |              |
|  | - Ingestion     |  | - Bayesian      |  | - PROV-O Ledger |              |
|  | - Chunking      |  | - EIG/VoI       |  | - Hash Chain    |              |
|  | - Hybrid Index  |  | - Calibration   |  | - Attestations  |              |
|  +-----------------+  +-----------------+  +-----------------+              |
|                                                                              |
+-----------------------------------------------------------------------------+

Module Overview

Module              Description
argus.core          Configuration, data models, LLM abstractions
argus.cdag          Conceptual Debate Graph implementation
argus.decision      Bayesian updating, EIG, VoI planning, calibration
argus.knowledge     Document ingestion, chunking, embeddings, indexing
argus.retrieval     Hybrid retrieval, reranking
argus.agents        Moderator, Specialist, Refuter, Jury agents
argus.provenance    PROV-O ledger, integrity, attestations
argus.orchestrator  RDC orchestration engine

Algorithms

Signed Influence Propagation

The C-DAG uses log-odds space for numerically stable belief propagation:

posterior = sigmoid(log-odds(prior) + sum(signed_weight_i * log(LR_i)))

Where:

  • sigmoid is the logistic function
  • LR_i is the likelihood ratio for evidence i
  • signed_weight = polarity * confidence * relevance * quality
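Below is a direct transcription of the formula above for a single proposition. It assumes each evidence item carries a likelihood ratio LR_i ≥ 1 that expresses its strength, with polarity supplying the direction. Under that assumption, attacking evidence (polarity -1) subtracts from the log-odds. How ARGUS derives LR_i is not specified here. The LR values below, and the `EvidenceItem` type, are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    polarity: int             # +1 supports, -1 attacks
    confidence: float
    relevance: float
    quality: float
    likelihood_ratio: float   # assumed >= 1: how strongly the evidence discriminates


def sigmoid(x: float) -> float:
    # Split on sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))  # requires 0 < p < 1


def posterior(prior: float, evidence: list[EvidenceItem]) -> float:
    total = log_odds(prior)
    for e in evidence:
        signed_weight = e.polarity * e.confidence * e.relevance * e.quality
        total += signed_weight * math.log(e.likelihood_ratio)
    return sigmoid(total)


# The two evidence items from the manual C-DAG example, with assumed LRs
# (10 and 3) and an assumed quality of 1.0 for the adverse-event item.
trial = EvidenceItem(polarity=1, confidence=0.9, relevance=0.95, quality=0.85, likelihood_ratio=10.0)
adverse = EvidenceItem(polarity=-1, confidence=0.8, relevance=0.7, quality=1.0, likelihood_ratio=3.0)
print(f"{posterior(0.5, [trial, adverse]):.3f}")
```

The trial raises the log-odds by about 1.67, and the adverse-event item lowers them by about 0.62, so this prints 0.742.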

Expected Information Gain

For experiment planning, ARGUS computes EIG via Monte Carlo:

EIG(a) = H(p) - E_y[H(p|y)]

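Where H(p) is the entropy of the current belief about the proposition, H(p|y) is the entropy after updating on outcome y of action a, and the expectation E_y is taken over the outcomes predicted under the current belief p.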
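For a binary proposition, EIG can be estimated by sampling outcomes from the current belief. This sketch assumes a hypothetical experiment with known sensitivity P(y=1 | true) and false-positive rate P(y=1 | false). Each draw samples the hidden truth, then an outcome y. The posterior is computed with Bayes' rule and the posterior entropies are averaged. A closed-form version is included to check the estimate. The experiment model is an assumption for illustration.

```python
import math
import random


def entropy(p: float) -> float:
    """Binary entropy H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bayes_update(p: float, y: int, sens: float, fpr: float) -> float:
    """Posterior P(true | y) with P(y=1 | true) = sens and P(y=1 | false) = fpr."""
    like_true = sens if y == 1 else 1 - sens
    like_false = fpr if y == 1 else 1 - fpr
    return p * like_true / (p * like_true + (1 - p) * like_false)


def eig_monte_carlo(p: float, sens: float, fpr: float, n: int = 20_000, seed: int = 0) -> float:
    """EIG = H(p) - E_y[H(p|y)], with y sampled from the current belief."""
    rng = random.Random(seed)
    expected_posterior_entropy = 0.0
    for _ in range(n):
        truth = rng.random() < p                           # sample the hidden state
        y = int(rng.random() < (sens if truth else fpr))   # sample an outcome given it
        expected_posterior_entropy += entropy(bayes_update(p, y, sens, fpr))
    return entropy(p) - expected_posterior_entropy / n


def eig_exact(p: float, sens: float, fpr: float) -> float:
    """Closed form for the binary case, used to check the estimate."""
    p_y1 = p * sens + (1 - p) * fpr
    return entropy(p) - (
        p_y1 * entropy(bayes_update(p, 1, sens, fpr))
        + (1 - p_y1) * entropy(bayes_update(p, 0, sens, fpr))
    )


print(eig_monte_carlo(0.3, sens=0.8, fpr=0.2), eig_exact(0.3, sens=0.8, fpr=0.2))
```

Here the Monte Carlo estimate should agree with the closed form (about 0.236 bits) to within sampling error. An uninformative experiment (sens == fpr) has EIG 0.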

Calibration

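Temperature scaling divides each logit z_i by a single temperature T. It picks the T that minimizes the binary cross-entropy against the observed labels y_i: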

T* = argmin_T sum(-y_i * log(sigmoid(z_i/T)) - (1-y_i) * log(1-sigmoid(z_i/T)))
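Below is a minimal sketch of temperature scaling by grid search over T, together with the Brier score and a binned ECE from the feature list. Several choices here are assumptions: the grid range, the 10 equal-width bins over P(y = 1), and the function names. A production implementation would more likely use a 1-D optimizer than a grid.

```python
import math


def sigmoid(x: float) -> float:
    # Split on sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def nll(logits: list[float], labels: list[int], t: float) -> float:
    """The objective above: binary cross-entropy of sigmoid(z_i / T) vs. y_i."""
    eps = 1e-12  # clip probabilities so log() never sees 0
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / t), eps), 1 - eps)
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total


def fit_temperature(logits: list[float], labels: list[int]) -> float:
    grid = [0.05 * k for k in range(1, 201)]  # candidate T values 0.05 .. 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))


def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def ece(probs: list[float], labels: list[int], n_bins: int = 10) -> float:
    """Expected calibration error over equal-width bins of P(y = 1)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    err = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            err += len(b) / len(probs) * abs(mean_p - freq)
    return err


# Overconfident toy model: |logit| = 4 (p ~ 0.98) but only 75% of predictions are right.
logits = [4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
t_star = fit_temperature(logits, labels)
before = [sigmoid(z) for z in logits]
after = [sigmoid(z / t_star) for z in logits]
print(f"T* = {t_star:.2f}")
print(f"Brier {brier(before, labels):.3f} -> {brier(after, labels):.3f}")
print(f"ECE   {ece(before, labels):.3f} -> {ece(after, labels):.3f}")
```

The toy model is right 75% of the time, so the best T is near 4 / ln 3 ≈ 3.64. That temperature pulls its 0.98 confidence down to about 0.75, and both Brier score and ECE improve.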

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=argus --cov-report=html

# Run specific test module
pytest tests/unit/test_cdag.py -v

# Run integration tests
pytest tests/integration/ -v

Examples

Clinical Evidence Evaluation

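from argus import Chunker, DocumentLoader, RDCOrchestrator, get_llm
from argus.retrieval import HybridRetriever

# Load and chunk clinical literature ("clinical_review.pdf" is a placeholder path)
doc = DocumentLoader().load("clinical_review.pdf")
clinical_chunks = Chunker(chunk_size=512, chunk_overlap=50).chunk(doc)

# Index the chunks for hybrid retrieval
retriever = HybridRetriever()
retriever.index_chunks(clinical_chunks)

# Evaluate treatment claim against the indexed literature
orchestrator = RDCOrchestrator(
    llm=get_llm("openai", model="gpt-4o"),
    max_rounds=5,
)

result = orchestrator.debate(
    "Metformin reduces HbA1c by >1% in Type 2 diabetes",
    prior=0.6,  # Prior based on existing knowledge
    retriever=retriever,
    domain="clinical",
)

print(f"Verdict: {result.verdict.label} (posterior={result.verdict.posterior:.3f})")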

Research Claim Verification

from argus import CDAG, Proposition, Evidence
from argus.cdag.propagation import compute_all_posteriors

graph = CDAG(name="research_verification")

# Main claim
claim = Proposition(
    text="Neural scaling laws predict emergent capabilities",
    prior=0.5,
)
graph.add_proposition(claim)

# Add evidence from multiple papers
# ... (add supporting/attacking evidence)

# Compute all posteriors
posteriors = compute_all_posteriors(graph)

for prop_id, posterior in posteriors.items():
    prop = graph.get_proposition(prop_id)
    print(f"{prop.text[:50]}... : {posterior:.3f}")

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Inspired by debate-native reasoning approaches in AI safety research
  • Built on excellent open-source libraries: Pydantic, NetworkX, FAISS, Sentence-Transformers
  • LLM integrations powered by OpenAI, Anthropic, and Google APIs



Download files


Source Distribution

argus_debate_ai-1.0.0.tar.gz (92.2 kB)

Uploaded Source

Built Distribution


argus_debate_ai-1.0.0-py3-none-any.whl (109.6 kB)

Uploaded Python 3

File details

Details for the file argus_debate_ai-1.0.0.tar.gz.

File metadata

  • Download URL: argus_debate_ai-1.0.0.tar.gz
  • Size: 92.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.3

File hashes

Hashes for argus_debate_ai-1.0.0.tar.gz
Algorithm Hash digest
SHA256 9873a6844576c1091c544d248f8792092ae7d416bcfedbe2eb13a6de5dcaac32
MD5 da0707b0ff7984545269f6efbfcb688d
BLAKE2b-256 1ef739664b9484fd3a62246c558b47af6befab6c15ab3fbfde0fa1a12ccf10b1


File details

Details for the file argus_debate_ai-1.0.0-py3-none-any.whl.

File hashes

Hashes for argus_debate_ai-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a42dadc63db64b01b6cb2397688da3c01d95db09154cd71a474566a315fffe21
MD5 12153f6d1c473e3afb283592a3d69797
BLAKE2b-256 06c490c9023e055390789f924fa5142eaee092606ab41cd461b338ab48e0ab98

