A multi-model ensemble system using triangular review for fact-checking and verification
Model Court: A Multi-Model Ensemble Framework for Verification
Project Overview (V0.0.2)
Model Court is an open-source framework designed to make cross-verification and fact-checking with multiple models easier. Model Court is inspired by concepts from the U.S. courtroom system, using the roles of Prosecutor, Jury, and Judge to verify facts, with support for internet search, RAG-based retrieval, and more.
The current version is 0.0.2, released on 2025-11-30, and provides the basic core functionality.
Model Court performs AI content verification using a courtroom-style process:
- Prosecutor: Preprocesses the case and queries historical precedents. If a valid, unexpired precedent exists, its result is returned directly without a full trial.
- Jury: Multiple independent LLM evaluators. Each juror is independent; it is recommended that each uses different retrieval tools and models from different providers.
- Judge: Aggregates votes and produces the final verdict, which is then stored as a precedent in the precedent database.
This courtroom-style process can improve reliability in scenarios where LLM outputs need to be verified, such as:
- Fact-checking: Determining the factual accuracy of news and social media content.
- Content moderation: Detecting harmful, violating, or misleading content.
- Knowledge Q&A: Verifying the correctness of AI-generated answers.
- Academic research: Improving robustness via multi-model ensemble.
- Compliance checking: Verifying whether content complies with certain rules or standards.
The basic courtroom flow is as follows:
Case Input → Prosecutor → [Juror1, Juror2, ..., JurorN] → Judge → Verdict
                  ↓                     ↓
         Precedent Database     Reference Sources
           (Past Rulings)          (Evidence)
For the full courtroom process, see the detailed introduction below.
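The control flow above can be sketched in a few lines of plain Python. This is a toy illustration rather than the Model Court implementation: the `hear` function, the juror coroutines, and `simple_judge` are hypothetical stand-ins, and the precedent database is reduced to a dictionary with exact-match lookup.

```python
import asyncio


async def hear(case_text, precedents, jurors, judge):
    """Toy version of the courtroom flow; not the model_court implementation."""
    # Prosecutor step: return a stored ruling if one exists for this case.
    if case_text in precedents:
        return precedents[case_text]
    # Jury step: every juror evaluates the case concurrently.
    votes = await asyncio.gather(*(juror(case_text) for juror in jurors))
    # Judge step: turn the votes into a verdict and store it as a precedent.
    verdict = judge(list(votes))
    precedents[case_text] = verdict
    return verdict


async def approving_juror(case_text):
    return "no_objection"


async def doubting_juror(case_text):
    return "reasonable_doubt"


def simple_judge(votes):
    # Hypothetical rule: supported only when nobody objects.
    return "supported" if all(v == "no_objection" for v in votes) else "suspicious"


precedents = {}
first = asyncio.run(
    hear("Water is H2O.", precedents, [approving_juror, approving_juror], simple_judge)
)
# The second hearing hits the stored precedent, so the doubting juror is never consulted.
second = asyncio.run(hear("Water is H2O.", precedents, [doubting_juror], simple_judge))
print(first, second)
```

The second call shows the short-circuit: once a ruling is stored, the jury is skipped entirely.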
Related documents:
- API Documentation – Full API reference and examples
- Installation Guide – Detailed installation and configuration
- Changelog – Version history
- Contribution Guide – How to contribute to this project
Installation
This project is published on PyPI and can be installed via pip.
Install
# Install core package (minimal dependencies)
pip install model-court
# Or install the full version (includes all LLM, RAG, search features)
pip install model-court[full]
# Development install (from source)
pip install -e .
pip install -e .[full] # Full version from source
Note: The package name is model-court (with a hyphen), but the import name is model_court (with an underscore).
Detailed Introduction
Full Courtroom Workflow
The full courtroom workflow is as follows:
┌───────────────────────────────────────────┐
│ 🏛️ Model Court │
│ Main Courtroom Flow │
└───────────────────────────────────────────┘
│
▼
┌──────────────────────────┐
│ Input Case Text │
└──────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 1. Prosecutor │
├───────────────────────────────────────┤
│ • Optionally split the case into │
│ multiple claims (if enabled) │
│ • Query precedent DB (SQL + Vector) │
│ to avoid redundant evaluation │
│ - Cache hit → return past ruling │
│ - Similar precedent → provide │
│ as reference to the Judge │
└───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ 2. Launch Multiple Juries in Parallel │
└─────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ 🧑⚖️ Jury Voting Process │
├──────────────────────────────────────────────┤
│ Cross-validate claims using independent LLMs │
│ to reduce hallucinations. │
│ Each jury can focus on different criteria, │
│ either purely model-based or using pluggable │
│ reference sources: │
│ │
│ ① Logical review: evaluate based on logic │
│ and common sense only. │
│ ② Web search: validate claims using real- │
│ time web search (supports iterative │
│ verification). │
│ ③ RAG: use integrated RAG pipeline; Model │
│ Court handles creation, embedding, and │
│ retrieval. │
│ ④ Text document store: a basic fact store │
│ providing textual factual references. │
│ │
│ All jury members choose exactly one of: │
│ • "no_objection" (support) │
│ • "suspicious_fact" (insufficient │
│ evidence) │
│ • "reasonable_doubt" (counter-evidence) │
│ Note: if a jury member fails or cannot │
│ provide a conclusion, it is counted as │
│ "abstains". │
└──────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 3. Judge │
├───────────────────────────────────────┤
│ • Aggregates jury votes │
│ • References similar precedents │
│ • Rule-based verdict logic; requires │
│ reaching a minimum quorum of valid │
│ votes │
│ ▶ supported (no objections) │
│ ▶ suspicious (some objections) │
│ ▶ refuted (majority oppose) │
│ • Outputs Judge reasoning │
└───────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 4. Court Generates CaseReport │
├───────────────────────────────────────┤
│ • Structured list of claims │
│ • Jury votes and rationales │
│ • Referenced precedents (if any) │
│ • Final judgment │
│ • Persisted into precedent DB │
│ (SQL + Vector) │
└───────────────────────────────────────┘
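As a rough illustration of the jury step, the sketch below tallies juror decisions, treats failed or unrecognised results as abstentions, and checks the quorum of valid votes. The function name `tally_votes` and the returned keys are hypothetical, and it assumes that both "suspicious_fact" and "reasonable_doubt" count as objections; the library may weigh them differently.

```python
from collections import Counter

VALID_DECISIONS = {"no_objection", "suspicious_fact", "reasonable_doubt"}


def tally_votes(decisions, quorum):
    """Summarise juror decisions; anything unrecognised counts as an abstention."""
    counts = Counter(d if d in VALID_DECISIONS else "abstain" for d in decisions)
    valid = sum(counts[d] for d in VALID_DECISIONS)
    # Assumption: both "suspicious_fact" and "reasonable_doubt" count as objections.
    objections = counts["suspicious_fact"] + counts["reasonable_doubt"]
    return {
        "valid_votes": valid,
        "abstentions": counts["abstain"],
        "quorum_met": valid >= quorum,
        "objection_ratio": objections / valid if valid else None,
    }


# One juror failed (None), so only three of the four votes are valid.
summary = tally_votes(
    ["no_objection", "reasonable_doubt", None, "no_objection"], quorum=3
)
print(summary)
```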
Full Example
You must configure the Prosecutor, Jury, and Judge before you can use Model Court. The setup is simple: specify which models to use and configure their APIs.
We recommend using OpenRouter, which allows you to access many LLMs with a single API key. The system also supports ChatGPT, Gemini, Claude, etc. See later sections for more details.
Note: the courtroom process must be run asynchronously.
import asyncio
import os
from pathlib import Path

from dotenv import load_dotenv  # Load .env for environment variables

from model_court import Court, Prosecutor, Jury, Judge
from model_court.code import SqliteCourtCode
from model_court.references import SimpleTextStorage, LocalRAGReference

# Load environment variables
load_dotenv()

# ----------------------------------------------------------------------
# 0. Preparation
# ----------------------------------------------------------------------
# Before running this demo, please make sure you have completed:
#
# 1. Environment configuration (.env)
#    - Create a .env file in the current directory
#    - Add API key, e.g.:
#      OPENROUTER_API_KEY=sk-or-v1-xxxx...
#
# 2. Virtual environment (recommended)
#    - python -m venv .venv
#    - source .venv/bin/activate (Windows: .venv\Scripts\activate)
#
# 3. Install dependencies
#    - pip install model-court python-dotenv
#    - pip install model-court[full]  # Recommended if using RAG
#
# 4. Prepare data files (paths used in the code; below we use RAG jury
#    and text-based jury as examples)
#    Make sure the directory structure looks like:
#
#    .
#    ├── .env
#    ├── example_court.py   (this file)
#    └── data/
#        ├── rag_init_files/   <-- initialization corpus for RAG jury
#        │   └── rumors_2024.txt   (any text files as knowledge base)
#        └── text_documents/   <-- reference files for text-based jury
#            └── basic_facts.txt   (basic factual text such as policies,
#                                   legal clauses, etc.)
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# 1. Initialize Court: configure Prosecutor, Juries, and Judge
# ----------------------------------------------------------------------
def build_court() -> Court:
    # 1. Initialize precedent store (persistent storage)
    court_code = SqliteCourtCode(
        db_path="./fact_check_history.db",
        enable_vector_search=True
    )

    # 2. Initialize Prosecutor (check precedents and split claims)
    prosecutor = Prosecutor(
        court_code=court_code,
        auto_claim_splitting=False,  # Set True to split case into multiple claims
        model={
            "provider": "openai_compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.getenv("OPENROUTER_API_KEY"),
            "model_name": "openai/gpt-3.5-turbo",
            "temperature": 0.1
        },
        prosecutor_prompt=(
            "You are a strict prosecutor. Break the input case into "
            "independent, verifiable factual claims."
        )
    )

    # 3. Initialize juries (ensure diversity to keep them independent)
    # [Logical perspective]
    jury_logic = Jury(
        name="Logic_Jury",
        model={
            "provider": "openai_compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.getenv("OPENROUTER_API_KEY"),
            "model_name": "openai/gpt-4",
            "temperature": 0.0
        },
        reference=None,
        jury_prompt=(
            "Evaluate whether the statement is reasonable based on logic "
            "and common sense only. Do not fabricate information."
        )
    )

    # [Web search perspective]
    jury_web = Jury(
        name="Web_Jury",
        model={
            "provider": "openai_compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.getenv("OPENROUTER_API_KEY"),
            "model_name": "perplexity/sonar",  # This model has built-in web access
            "temperature": 0.0
        },
        reference=None,
        jury_prompt=(
            "Use web search to verify each claim and base your judgment "
            "on the latest information."
        )
    )

    # [Local RAG perspective]
    jury_rag = Jury(
        name="RAG_Jury",
        model={
            "provider": "openai_compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.getenv("OPENROUTER_API_KEY"),
            "model_name": "meta-llama/llama-3-70b-instruct",
            "temperature": 0.2
        },
        reference=LocalRAGReference(
            collection_name="common_rumors",
            persist_directory="./rag_db_storage",
            source_folder="./data/rag_init_files",
            embedding_model="MiniLM",
            mode="append",
            top_k=2
        ),
        jury_prompt="Query the local rumor knowledge base to see if related records exist."
    )

    # [Text document perspective]
    basic_facts_path = Path("./data/text_documents/basic_facts.txt")
    # Basic file check for demo convenience
    if not basic_facts_path.exists():
        raise FileNotFoundError(
            f"Demo failed: please create file {basic_facts_path} first."
        )
    jury_facts = Jury(
        name="Facts_Jury",
        model={
            "provider": "openai_compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.getenv("OPENROUTER_API_KEY"),
            "model_name": "openai/gpt-3.5-turbo",
            "temperature": 0.1
        },
        reference=SimpleTextStorage(text=basic_facts_path.read_text(encoding="utf-8")),
        jury_prompt="Compare each claim against the basic facts text to decide if it is true."
    )

    # 4. Initialize Judge
    judge = Judge(
        model={
            "provider": "openai_compatible",
            "base_url": "https://openrouter.ai/api/v1",
            "api_key": os.getenv("OPENROUTER_API_KEY"),
            "model_name": "openai/gpt-4",
            "temperature": 0.2
        }
    )

    # 5. Assemble the Court
    return Court(
        prosecutor=prosecutor,
        juries=[jury_logic, jury_web, jury_rag, jury_facts],
        judge=judge,
        verdict_rules={
            "supported": {"operator": "eq", "value": 0},
            "suspicious": {"operator": "lt", "value": 0.5},
            "refuted": "default"
        },
        quorum=3,
        concurrency_limit=4
    )

# ----------------------------------------------------------------------
# 2. Run a demo
# ----------------------------------------------------------------------
async def demo():
    # Instantiate the court; RAG models will be loaded on first run
    court = build_court()

    # Case input
    case_text = "China and the United States have already had diplomatic relations for 300 years, and the two governments held a celebration for this."

    # Hear the case asynchronously
    report = await court.hear(case_text)

    # Display contents of the Report object
    print(f"\n{'='*20} Case Report (ID: {report.case_id}) {'='*20}")
    for i, res in enumerate(report.claims, 1):
        print(f"\n[Claim {i}] {res.claim.text}")
        # Print detailed jury votes
        for vote in res.jury_votes:
            print(f"  - {vote.jury_name}: {vote.decision}")
            if vote.reason:
                print(f"    Reason: {vote.reason[:60]}...")
        print(f"\n  => Judge Verdict: [{res.verdict}]")
        print(f"  => Judge Reasoning: {res.judge_reasoning}")
    print(f"\n{'='*60}")

if __name__ == "__main__":
    # The court process must be run asynchronously
    asyncio.run(demo())
More examples can be found under the example folder in the project:
- Full CLI example – Command-line script demonstrating all major features (similar to the example above).
- Web app example – A fact-checking web application that shows how to integrate Model Court into a web UI.
Project Configuration
LLM
The project supports the following LLM providers:
| Provider | Description | Example Models |
|---|---|---|
| openai | Native OpenAI API | gpt-4, gpt-3.5-turbo |
| google | Google Gemini | gemini-pro, gemini-1.5-pro |
| anthropic | Anthropic Claude | claude-3-5-sonnet, claude-3-opus |
| openai_compatible | OpenAI-compatible API (recommended) | Access all models via OpenRouter |
| custom | Custom provider | Local models or self-hosted service |
Recommended: openai_compatible + OpenRouter
OpenRouter provides a unified interface to many LLMs. With a single API key, you can access over 100 models, including some that are free (e.g., DeepSeek).
# Environment variable
export OPENROUTER_API_KEY="sk-or-v1-..."
# In code
import os

model_config = {
    "provider": "openai_compatible",
    "base_url": "https://openrouter.ai/api/v1",
    "api_key": os.getenv("OPENROUTER_API_KEY"),
    "model_name": "openai/gpt-4",  # Or any other supported model
}
Supported models list: https://openrouter.ai/models
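Because the same configuration dictionary is repeated for every role, a small helper can build it and fail early when the API key is missing. This is only a convenience sketch: `openrouter_config` and its `env` parameter are hypothetical names, not part of Model Court, and the dictionary keys are the ones used in the examples in this document.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_config(model_name, temperature=0.0, env=None):
    """Build a model config dict, failing early if the API key is missing."""
    # env is injectable so the helper can be exercised without touching os.environ.
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return {
        "provider": "openai_compatible",
        "base_url": OPENROUTER_BASE_URL,
        "api_key": api_key,
        "model_name": model_name,
        "temperature": temperature,
    }


demo_config = openrouter_config(
    "openai/gpt-4", env={"OPENROUTER_API_KEY": "sk-or-v1-demo"}
)
print(demo_config["model_name"])
```

Raising at construction time surfaces a missing key before any juror is launched, instead of as a failed (abstaining) vote later.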
Reference
The project supports the following built-in reference sources and modes:
| Reference Type | Description | Typical Use Case |
|---|---|---|
| SimpleTextStorage | Plain text docs | Simple fact lists, rule descriptions |
| LocalRAGReference | Local RAG KB | Semantic search over large corpora |
| GoogleSearchReference | Google Custom Search | Need real-time web verification |
| None | Blind mode | Pure logical reasoning without external sources |
1. Simple text storage
from model_court.references import SimpleTextStorage
from pathlib import Path

# Read from file
facts_file = Path("./data/rag_documents/basic_facts.txt")
with open(facts_file, "r", encoding="utf-8") as f:
    facts_text = f.read()
reference = SimpleTextStorage(text=facts_text)

# Or directly pass a small text block (for quick tests)
# reference = SimpleTextStorage(text="Fact 1: The Earth is round\nFact 2: The chemical formula of water is H2O")
2. Local RAG knowledge base
from model_court.references import LocalRAGReference
reference = LocalRAGReference(
    collection_name="my_knowledge",
    persist_directory="./vector_db",
    source_folder="./documents",  # Folder with txt/md files
    embedding_model="MiniLM",  # "MiniLM", "BGE", or "OpenAI"
    mode="append",  # "overwrite", "append", or "read_only"
    top_k=3  # Return top 3 most relevant chunks
)
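To make the role of top_k concrete, the sketch below ranks text chunks by cosine similarity to a query vector and keeps the best k. It is a generic illustration of top-k retrieval, not the retrieval code inside LocalRAGReference; the function names and the two-dimensional toy vectors are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunks, k):
    """Return the k chunk texts whose vectors are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 2-D "embeddings"; a real embedding model produces far more dimensions.
chunks = [
    ("The Earth orbits the Sun.", [1.0, 0.0]),
    ("Water boils at 100 C at sea level.", [0.0, 1.0]),
    ("The Moon orbits the Earth.", [0.9, 0.1]),
]
best = top_k_chunks([1.0, 0.05], chunks, k=2)
print(best)
```

A small top_k keeps the juror's prompt short; a larger one gives it more context at the cost of more tokens.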
3. Google Search
from model_court.references import GoogleSearchReference
reference = GoogleSearchReference(
    api_key="your-google-api-key",
    search_engine_id="your-search-engine-id",
    num_results=5
)
4. Blind mode (no reference)
jury = Jury(
    name="Logic_Checker",
    model=model_config,
    reference=None,  # No external references
    jury_prompt="Judge only based on logic and common sense."
)
Project Structure
model_court/
├── model_court/ # Core package
│ ├── core/ # Core components
│ │ ├── models.py # Data models
│ │ ├── court.py # Court main class
│ │ ├── prosecutor.py # Prosecutor class
│ │ ├── jury.py # Jury class
│ │ └── judge.py # Judge class
│ ├── llm/ # LLM provider layer
│ │ ├── base.py # Abstract base classes
│ │ ├── openai_provider.py
│ │ ├── google_provider.py
│ │ ├── anthropic_provider.py
│ │ ├── custom_provider.py
│ │ └── factory.py # Provider factory
│ ├── references/ # Reference sources
│ │ ├── base.py # Abstract base classes
│ │ ├── google_search.py
│ │ ├── web_search.py
│ │ ├── rag_reference.py
│ │ └── text_storage.py
│ ├── embeddings/ # Embedding models
│ │ ├── base.py # Abstract base classes
│ │ ├── minilm.py
│ │ ├── bge.py
│ │ └── openai_embedding.py
│ ├── code/ # Court Code (precedent store)
│ │ ├── base.py # Abstract base classes
│ │ └── sqlite_code.py
│ └── utils/ # Helper utilities
│ └── helpers.py
├── example/ # Usage examples
│ ├── example_full.py # Full CLI example
│ ├── backend/ # Web API server
│ ├── frontend/ # Web frontend
│ └── data/ # Example data
├── api_docs.md # API documentation
├── README.md # Project description
├── CHANGELOG.md # Changelog
├── CONTRIBUTING.md # Contribution guide
├── LICENSE # License
├── pyproject.toml # Project configuration
├── setup.py # Setup script
└── requirements.txt # Dependencies
Advanced Features
Custom Verdict Rules
You can customize verdict rules according to your business requirements:
# Example 1: Strict mode (single veto)
court_strict = Court(
    prosecutor=prosecutor,
    juries=[jury_logic, jury_web, jury_rag, jury_facts],
    judge=judge,
    verdict_rules={
        "supported": {"operator": "eq", "value": 0},  # Must have 0 opposing votes
        "refuted": "default"  # Any opposing vote → refuted
    }
)

# Example 2: Lenient mode (majority rule)
court_lenient = Court(
    prosecutor=prosecutor,
    juries=[jury_logic, jury_web, jury_rag, jury_facts],
    judge=judge,
    verdict_rules={
        "supported": {"operator": "lt", "value": 0.25},  # Opposition < 25%
        "suspicious": {"operator": "lt", "value": 0.75},  # Opposition < 75%
        "refuted": "default"  # Opposition >= 75%
    }
)

# Example 3: Multi-level rating
court_detailed = Court(
    prosecutor=prosecutor,
    juries=[jury_logic, jury_web, jury_rag, jury_facts],
    judge=judge,
    verdict_rules={
        "clearly_true": {"operator": "eq", "value": 0},  # 0 opposition
        "likely_true": {"operator": "lt", "value": 0.3},  # < 30% opposition
        "uncertain": {"operator": "lt", "value": 0.6},  # < 60% opposition
        "likely_false": {"operator": "lt", "value": 0.9},  # < 90% opposition
        "clearly_false": "default"  # >= 90% opposition
    }
)
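One plausible reading of these rule dictionaries is that they are checked in order against the share of opposing votes, with the "default" entry used when nothing else matches. The sketch below implements that reading. The function `apply_verdict_rules` is hypothetical, only the "eq" and "lt" operators that appear in the examples are handled, and the actual evaluation inside Model Court may differ.

```python
import operator

# Assumed operator names: the examples in this document only show "eq" and "lt".
OPERATORS = {"eq": operator.eq, "lt": operator.lt}


def apply_verdict_rules(objection_ratio, verdict_rules):
    """Return the first verdict whose rule matches, else the "default" verdict."""
    default = None
    for verdict, rule in verdict_rules.items():
        if rule == "default":
            default = verdict
            continue
        if OPERATORS[rule["operator"]](objection_ratio, rule["value"]):
            return verdict
    return default


rules = {
    "supported": {"operator": "eq", "value": 0},
    "suspicious": {"operator": "lt", "value": 0.5},
    "refuted": "default",
}
verdicts = [apply_verdict_rules(r, rules) for r in (0.0, 0.25, 0.5, 1.0)]
print(verdicts)
```

Under this reading the order of the keys matters: stricter thresholds must be listed before looser ones, as in all three examples above.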
Automatic Claim Splitting
For complex statements, you can automatically split them into multiple independent claims:
prosecutor = Prosecutor(
    court_code=court_code,
    auto_claim_splitting=True,  # Enable auto splitting
    model={
        "provider": "openai_compatible",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": os.getenv("OPENROUTER_API_KEY"),
        "model_name": "openai/gpt-3.5-turbo",
    },
    prosecutor_prompt="Split the case into independent, verifiable factual claims."
)
# Input: "The Earth is flat, and the Sun orbits the Earth."
# Automatically split into:
# Claim 1: "The Earth is flat."
# Claim 2: "The Sun orbits the Earth."
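Claim splitting is delegated to an LLM, so its raw answer has to be turned back into a list of claims at some point. The sketch below shows one way that parsing step could look, assuming the model answers with a numbered or bulleted list. Both that response format and the `parse_claims` helper are assumptions made for illustration; the document does not describe how the library parses the model's output.

```python
import re


def parse_claims(llm_output):
    """Extract claims from lines such as '1. ...', '2) ...', or '- ...'."""
    claims = []
    for line in llm_output.splitlines():
        # Match a list marker, then capture the claim text without trailing spaces.
        match = re.match(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$", line)
        if match:
            claims.append(match.group(1))
    return claims


# Hypothetical model answer for the input shown above.
raw = "Here are the claims:\n1. The Earth is flat.\n2. The Sun orbits the Earth.\n"
claims = parse_claims(raw)
print(claims)
```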
Precedent Caching System
Automatically cache past rulings to avoid repeated evaluation:
from datetime import timedelta
court_code = SqliteCourtCode(
    db_path="./court_history.db",
    enable_vector_search=True,  # Vector search for similar cases
    default_validity_period=timedelta(days=30)  # Precedent validity period
)

# The calls below must run inside an async function, using a court whose
# Prosecutor was created with this court_code.
# First check: full pipeline, typically 10–30 seconds
report1 = await court.hear("The Earth is flat.")
# Second check with same content: directly return cached result, < 1 second
report2 = await court.hear("The Earth is flat.")
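The idea of a validity period can be illustrated with a small in-memory cache. This is a simplified stand-in, not SqliteCourtCode: the `PrecedentCache` class is hypothetical, it only matches normalised exact text (no vector search), and the current time is passed in explicitly so that expiry is easy to demonstrate.

```python
import hashlib
from datetime import datetime, timedelta


class PrecedentCache:
    """In-memory stand-in for a precedent store with an expiry window."""

    def __init__(self, validity=timedelta(days=30)):
        self.validity = validity
        self._entries = {}

    @staticmethod
    def _key(case_text):
        # Normalise whitespace and case so trivial variations share a key.
        normalised = " ".join(case_text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def store(self, case_text, verdict, now):
        self._entries[self._key(case_text)] = (verdict, now)

    def lookup(self, case_text, now):
        entry = self._entries.get(self._key(case_text))
        if entry is None:
            return None
        verdict, stored_at = entry
        # Expired precedents are ignored, so the case would go to a full trial.
        return verdict if now - stored_at <= self.validity else None


cache = PrecedentCache()
t0 = datetime(2025, 1, 1)
cache.store("The Earth is flat.", "refuted", now=t0)
fresh = cache.lookup("the earth  is flat.", now=t0 + timedelta(days=1))
expired = cache.lookup("The Earth is flat.", now=t0 + timedelta(days=31))
print(fresh, expired)
```

Here the lookup one day later returns the stored verdict, while the lookup after 31 days returns None because the 30-day window has passed.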
FAQ
Q: Why are the package name and import name different?
This is intentional:
- Installation: pip install model-court (PyPI package name, with hyphen)
- Import: from model_court import ... (Python module name, with underscore)
This is a common pattern in Python because module names cannot contain hyphens.
Q: I get ModuleNotFoundError: No module named 'model_court'
Please ensure the package is installed correctly:
# From project root (where pyproject.toml is located)
pip install -e .
# Or install from PyPI
pip install model-court
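To see which half of the pair is missing, you can check the distribution name and the module name separately using only the standard library. The helper name `describe_install` is hypothetical; `importlib.util.find_spec` and `importlib.metadata.version` are standard Python functions.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name="model-court", module_name="model_court"):
    """Report whether the module is importable and which version is installed."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": spec is not None, "version": version}


print(describe_install())
```

If this reports that the module is not importable, the package is most likely installed into a different interpreter or virtual environment than the one running your script.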
Q: How do I use different LLMs?
Recommended: use OpenRouter as a unified entrypoint:
model_config = {
    "provider": "openai_compatible",
    "base_url": "https://openrouter.ai/api/v1",
    "api_key": os.getenv("OPENROUTER_API_KEY"),
    "model_name": "MODEL_NAME",  # e.g., openai/gpt-4, anthropic/claude-3-5-sonnet
}
Supported model list: https://openrouter.ai/models
You can of course also use the official APIs for ChatGPT, Gemini, Claude, or school/corporate APIs that are OpenAI-compatible.
Q: How can I reduce API costs?
Suggestions:
- Use cheaper or free APIs when possible.
- Use smaller or local models (local inference is supported).
- Use the precedent caching system to avoid repeated evaluation.
- Reduce the number of juries.
- Use cheaper models such as gpt-3.5-turbo.
- Disable automatic claim splitting (auto_claim_splitting=False).
Q: What if evaluation is slow?
Normally, evaluating multiple models in parallel takes about 10–30 seconds. To speed up:
- Enable and leverage precedent caching (second run on the same content is < 1 second).
- Reduce the number of juries.
- Choose faster models.
- Tune the concurrency_limit parameter.
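A concurrency limit of this kind is commonly implemented with a semaphore that caps how many calls are in flight at once. The sketch below shows that general technique with asyncio; it is not taken from the Model Court source, and `run_limited` and `fake_juror` are hypothetical names.

```python
import asyncio


async def run_limited(tasks, concurrency_limit):
    """Run zero-argument coroutine functions with at most N in flight at once."""
    semaphore = asyncio.Semaphore(concurrency_limit)
    in_flight = 0
    peak = 0

    async def guarded(task):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await task()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(t) for t in tasks))
    return results, peak


async def fake_juror():
    await asyncio.sleep(0.01)  # stands in for a slow LLM call
    return "no_objection"


results, peak = asyncio.run(run_limited([fake_juror] * 6, concurrency_limit=2))
print(len(results), peak)
```

A higher limit shortens wall-clock time when the provider allows it; a lower one helps stay within rate limits.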
License & Citation
This project is licensed under the MIT License and can be used freely, including for commercial purposes.
If you use Model Court in your research, please cite:
@software{model-court,
title={Model Court: A Multi-Model Ensemble Framework for Verification},
author={Jeff Liu},
year={2025},
url={https://github.com/LogicGate-AI-Lab/model-court}
}