MetaScreener
Open-source multi-LLM ensemble for systematic review workflows
MetaScreener is a local Python tool for AI-assisted systematic review (SR) workflows. It uses a Hierarchical Consensus Network (HCN) of 4 open-source LLMs with calibrated confidence aggregation, covering the full SR pipeline -- literature screening, data extraction, and risk-of-bias assessment -- in a single tool.
Note: Looking for MetaScreener v1? See the v1-legacy branch.
Features
- Multi-LLM Ensemble -- 4 open-source LLMs (Qwen3, DeepSeek-V3, Llama 4 Scout, Mistral Small 3.1) vote on every decision; no single model is a point of failure
- 3 SR Modules -- Title/abstract screening, structured data extraction from PDFs, and risk-of-bias assessment (RoB 2, ROBINS-I, QUADAS-2)
- Reproducible by Design -- All models are open-source with version-locked weights; temperature=0.0 for all inference; seeded randomness; SHA256 prompt hashing in every audit trail entry
- Framework-Agnostic Criteria -- Supports PICO, PEO, SPIDER, PCC, and custom frameworks with an interactive criteria wizard
- Multiple Input/Output Formats -- Reads RIS, BibTeX, CSV, PubMed XML, Excel; exports to RIS, CSV, JSON, Excel, and audit trail
- CLI + Web UI -- Full Typer CLI and Streamlit dashboard
- Evaluation Toolkit -- Built-in metrics (sensitivity, specificity, F1, WSS@95, AUROC, ECE, Brier score), Plotly visualizations (ROC, calibration, score distribution), and bootstrap 95% confidence intervals
Installation
pip
pip install metascreener
Docker
# Slim image -- CLI and Streamlit UI
docker pull chaokunhong/metascreener:latest
# Full image -- includes validation experiments
docker pull chaokunhong/metascreener:full
From source
git clone https://github.com/ChaokunHong/MetaScreener.git
cd MetaScreener
uv sync --extra dev
uv run metascreener --help
Configuration
MetaScreener calls LLMs via cloud APIs. Set one of the following environment variables:
export OPENROUTER_API_KEY="your-key-here" # OpenRouter (default)
# or
export TOGETHER_API_KEY="your-key-here" # Together AI
Local inference via vLLM or Ollama is also supported -- see configs/models.yaml.
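As a rough illustration of the key lookup described above, the sketch below picks a backend from whichever environment variable is set. The helper name detect_backend and the backend labels are hypothetical and are not part of MetaScreener's API; only the two environment variable names come from this README.

```python
import os
from typing import Mapping, Optional

# Environment variables named in the Configuration section, in priority order.
# The backend labels on the right are illustrative, not MetaScreener identifiers.
KEY_TO_BACKEND = (
    ("OPENROUTER_API_KEY", "openrouter"),  # documented as the default
    ("TOGETHER_API_KEY", "together"),
)


def detect_backend(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first backend whose API key is set and non-empty, else None."""
    env = os.environ if env is None else env
    for var, backend in KEY_TO_BACKEND:
        if env.get(var, "").strip():
            return backend
    return None


if __name__ == "__main__":
    backend = detect_backend()
    if backend is None:
        print("No API key found: set OPENROUTER_API_KEY or TOGETHER_API_KEY.")
    else:
        print(f"Using backend: {backend}")
```

Running a check like this before a long screening job can catch a missing key early.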
Quick Start
1. Define review criteria
# From a research topic -- AI generates and refines criteria interactively
metascreener init --topic "antimicrobial resistance in ICU patients"
# From existing criteria text
metascreener init --criteria path/to/criteria.txt
The wizard auto-detects your criteria framework (PICO, PEO, SPIDER, PCC, or custom), generates structured criteria via multi-LLM consensus, validates them, and saves a versioned criteria.yaml.
2. Screen papers
# Title/abstract screening
metascreener screen --input search_results.ris --stage ta
# Full-text screening
metascreener screen --input search_results.ris --stage ft
# Both stages sequentially
metascreener screen --input search_results.ris --stage both
Each record passes through the 4-layer HCN and is assigned a decision (INCLUDE, EXCLUDE, or HUMAN_REVIEW) with a confidence tier (Tier 0--3).
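The tier assignment can be pictured with a small routing function. This is a minimal sketch of one possible reading of the tier rules listed in the Architecture section, not MetaScreener's actual router: the names Routed and route, the two thresholds, and the treatment of "AUTO" as "accept the unanimous label" are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence

HIGH_CONF = 0.85  # hypothetical threshold for "high confidence"
MID_CONF = 0.60   # hypothetical threshold for "mid confidence"


@dataclass(frozen=True)
class Routed:
    decision: str  # "INCLUDE", "EXCLUDE", or "HUMAN_REVIEW"
    tier: int      # 0-3


def route(votes: Sequence[str], confidence: float,
          hard_rule_violation: bool = False) -> Routed:
    """Map model votes and an aggregate confidence to a decision and tier."""
    if hard_rule_violation:
        return Routed("EXCLUDE", 0)       # Tier 0: hard rule violation
    if not votes:
        return Routed("HUMAN_REVIEW", 3)  # nothing to vote on
    # Any vote other than "INCLUDE" is counted as an exclude vote.
    n_inc = sum(v == "INCLUDE" for v in votes)
    n_exc = len(votes) - n_inc
    unanimous = n_inc == len(votes) or n_exc == len(votes)
    if unanimous and confidence >= HIGH_CONF:
        # Tier 1: accept the unanimous label automatically.
        return Routed("INCLUDE" if n_inc else "EXCLUDE", 1)
    if n_inc > n_exc and confidence >= MID_CONF:
        return Routed("INCLUDE", 2)       # Tier 2: majority include
    return Routed("HUMAN_REVIEW", 3)      # Tier 3: disagreement or low confidence


if __name__ == "__main__":
    print(route(["INCLUDE", "INCLUDE", "INCLUDE", "EXCLUDE"], 0.7))
```

In this reading, anything short of a confident include or a confident unanimous exclude is sent to a human reviewer, which favors recall over automation.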
3. Extract data
# Build a YAML extraction form interactively
metascreener extract init-form
# Run extraction on included PDFs
metascreener extract --pdfs papers/ --form extraction_form.yaml
Seven field types are supported: text, integer, float, boolean, date, list, and categorical. Each field is extracted by multiple LLMs and resolved by majority-vote consensus.
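Majority-vote consensus over extracted fields could look like the sketch below. It assumes each model returns a flat dictionary of field values and that a value must be reported by a strict majority of all models to be accepted; the function name majority_vote and that tie policy are illustrative choices, not the tool's documented behavior.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def majority_vote(values: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of models, else None."""
    answered = [v for v in values if v is not None]
    if not answered:
        return None
    value, count = Counter(answered).most_common(1)[0]
    # Missing answers still count toward the total number of models.
    return value if count > len(values) / 2 else None


def consensus(outputs: List[Dict[str, Hashable]]) -> Dict[str, Optional[Hashable]]:
    """Apply majority_vote field by field across all model outputs."""
    fields = sorted({name for out in outputs for name in out})
    return {f: majority_vote([out.get(f) for out in outputs]) for f in fields}


if __name__ == "__main__":
    outputs = [
        {"sample_size": 120, "randomized": True},
        {"sample_size": 120, "randomized": True},
        {"sample_size": 118, "randomized": True},
        {"sample_size": 120},
    ]
    print(consensus(outputs))
```

A None result marks a field with no clear majority, which is a natural candidate for manual checking. List-valued fields would need converting to tuples first, since Counter requires hashable values.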
4. Assess risk of bias
metascreener assess-rob --pdfs papers/ --tool rob2 # RoB 2 (RCTs)
metascreener assess-rob --pdfs papers/ --tool robins-i # ROBINS-I (observational)
metascreener assess-rob --pdfs papers/ --tool quadas2 # QUADAS-2 (diagnostic)
Each tool follows its official domain structure with signaling questions. Assessments from multiple LLMs are combined by worst-case-per-domain merging and majority-vote consensus.
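One way to combine per-model judgements is sketched below, using the three RoB 2 judgement levels. The exact merging rule is not spelled out here, so this is an assumed interpretation: a majority vote per domain with ties resolved toward the more severe judgement, followed by an overall rating equal to the worst domain (a simplification of the official RoB 2 overall-judgement guidance). The function names and the D1..D3 domain labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# RoB 2 judgement levels, ordered from best to worst.
SEVERITY = {"low": 0, "some concerns": 1, "high": 2}


def domain_consensus(judgements: List[str]) -> str:
    """Majority vote for one domain; ties resolve to the more severe judgement."""
    counts = Counter(judgements)
    top = max(counts.values())
    tied = [j for j, n in counts.items() if n == top]
    return max(tied, key=SEVERITY.__getitem__)


def overall_judgement(per_domain: Dict[str, str]) -> str:
    """Overall rating taken as the worst judgement across domains."""
    return max(per_domain.values(), key=SEVERITY.__getitem__)


if __name__ == "__main__":
    # Four model judgements per domain.
    votes = {
        "D1": ["low", "low", "low", "some concerns"],
        "D2": ["some concerns", "some concerns", "high", "high"],
        "D3": ["low", "low", "low", "low"],
    }
    merged = {domain: domain_consensus(v) for domain, v in votes.items()}
    print(merged, "->", overall_judgement(merged))
```

Resolving ties toward the worse judgement keeps the merged assessment conservative when models are split.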
5. Evaluate and export
# Evaluate against gold-standard labels with interactive Plotly charts
metascreener evaluate --labels gold_standard.csv --predictions results.json --visualize
# Export results in multiple formats
metascreener export --results results.json --format csv,json,excel,audit
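For readers unfamiliar with the screening metrics reported by the evaluate command, the sketch below computes sensitivity, specificity, F1, and WSS@95 with the standard definitions. It is a standalone illustration with hypothetical function names, not MetaScreener's evaluation code; labels use 1 for include and 0 for exclude, and WSS is computed by ranking records by score.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and F1 from binary labels (1 = include)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def wss_at_recall(y_true: Sequence[int], scores: Sequence[float],
                  recall: float = 0.95) -> float:
    """Work saved over sampling: fraction of records left unscreened once the
    target recall is reached, minus the fraction random sampling would save."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("WSS is undefined without any relevant records")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    found = 0
    for screened, (_, label) in enumerate(ranked, start=1):
        found += label
        if found / total_pos >= recall:
            break
    return (len(ranked) - screened) / len(ranked) - (1.0 - recall)


if __name__ == "__main__":
    print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
    print(wss_at_recall([1, 1] + [0] * 8, [0.9, 0.8, 0.7] + [0.1] * 7))
```

WSS@95 rewards rankings that surface relevant records early: a value near zero means screening by score saves little over reading records in random order.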
Web UI
metascreener ui # Launches Streamlit dashboard at localhost:8501
Architecture
MetaScreener's screening module uses a 4-layer Hierarchical Consensus Network:
Records (RIS/BibTeX/CSV/XML/Excel)
│
▼
┌────────────────────────────────────────────────────┐
│ Layer 1: Parallel LLM Inference │
│ 4 models evaluate each record independently │
│ Framework-specific prompts (PICO/PEO/SPIDER/PCC) │
├────────────────────────────────────────────────────┤
│ Layer 2: Semantic Rule Engine │
│ 3 hard rules (publication type, language, │
│ study design) → auto-exclude │
│ 3 soft rules (population, outcome, intervention) │
│ → score penalty │
├────────────────────────────────────────────────────┤
│ Layer 3: Calibrated Confidence Aggregation (CCA) │
│ Platt/isotonic calibration + weighted consensus │
│ S = Σ(wᵢ·sᵢ·cᵢ·φᵢ) / Σ(wᵢ·cᵢ·φᵢ) │
│ C = 1 − H(p_inc, p_exc) / log(2) │
├────────────────────────────────────────────────────┤
│ Layer 4: Hierarchical Decision Router │
│ Tier 0: Hard rule violation → EXCLUDE │
│ Tier 1: Unanimous + high conf → AUTO │
│ Tier 2: Majority + mid conf → INCLUDE │
│ Tier 3: Disagreement / low → HUMAN_REVIEW │
└────────────────────────────────────────────────────┘
│
▼
ScreeningDecision + AuditEntry (per record)
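The two Layer 3 formulas can be written out directly. The symbols are not defined in this README, so the sketch assumes that wᵢ is a per-model weight, sᵢ a calibrated inclusion score, cᵢ a per-model confidence, φᵢ an additional per-model factor left unspecified here, and H the Shannon entropy (natural log) of the include/exclude distribution with p_exc = 1 − p_inc. Function names are illustrative.

```python
import math
from typing import Sequence


def aggregate_score(w: Sequence[float], s: Sequence[float],
                    c: Sequence[float], phi: Sequence[float]) -> float:
    """S = sum(w_i * s_i * c_i * phi_i) / sum(w_i * c_i * phi_i)."""
    num = sum(wi * si * ci * pi for wi, si, ci, pi in zip(w, s, c, phi))
    den = sum(wi * ci * pi for wi, ci, pi in zip(w, c, phi))
    if den == 0:
        raise ValueError("all model weights are zero")
    return num / den


def consensus_confidence(p_inc: float) -> float:
    """C = 1 - H(p_inc, p_exc) / log(2), assuming p_exc = 1 - p_inc."""
    p_exc = 1.0 - p_inc
    # Terms with p = 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in (p_inc, p_exc) if p > 0.0)
    return 1.0 - entropy / math.log(2)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    ones = [1.0] * 4
    print(aggregate_score(ones, scores, ones, ones))  # plain mean when all factors are 1
    print(consensus_confidence(0.5), consensus_confidence(1.0))
```

Dividing the entropy by log(2) normalizes it to the range 0 to 1, so C is 0 when the models are evenly split and 1 when the evidence points entirely one way.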
LLM Models
All models are open-source and version-locked in configs/models.yaml.
| Model | Parameters | License | Role |
|---|---|---|---|
| Qwen3-235B-A22B | 235B (22B active, MoE) | Apache 2.0 | Multilingual + structured extraction |
| DeepSeek-V3.2 | 685B (37B active, MoE) | MIT | Complex reasoning + rule adherence |
| Llama 4 Scout | 109B (17B active, MoE) | Llama 4 Community License | General understanding |
| Mistral Small 3.1 24B | 24B (dense) | Apache 2.0 | Fast screening + deterministic cases |
Inference runs via OpenRouter or Together AI APIs. Local deployment via vLLM or Ollama is also supported.
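Querying several models concurrently, and tolerating the failure of any one of them, can be sketched with asyncio. The mock models below stand in for real API adapters; make_mock_model and run_parallel are hypothetical names for illustration and do not reflect MetaScreener's internal runner.

```python
import asyncio
from typing import Awaitable, Callable, Dict

ModelFn = Callable[[str], Awaitable[str]]


def make_mock_model(label: str, delay: float = 0.01) -> ModelFn:
    """Build a stand-in coroutine that always answers with `label`."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(delay)  # simulate network latency
        return label
    return call


async def run_parallel(models: Dict[str, ModelFn], prompt: str) -> Dict[str, str]:
    """Send one prompt to every model at once and keep the successful answers."""
    names = list(models)
    results = await asyncio.gather(
        *(models[name](prompt) for name in names),
        return_exceptions=True,  # a failing model does not cancel the others
    )
    return {n: r for n, r in zip(names, results) if not isinstance(r, BaseException)}


if __name__ == "__main__":
    models = {
        "model_a": make_mock_model("INCLUDE"),
        "model_b": make_mock_model("INCLUDE"),
        "model_c": make_mock_model("EXCLUDE"),
    }
    print(asyncio.run(run_parallel(models, "Title and abstract text")))
```

Because the calls overlap, total latency is close to that of the slowest model rather than the sum of all of them.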
Project Structure
src/metascreener/
├── core/ # Shared data models, enums, exceptions
├── io/ # Readers/writers (RIS, BibTeX, CSV, XML, Excel, PDF)
├── llm/ # LLM backends + parallel runner
│ └── adapters/ # OpenRouter, Together AI, vLLM, Ollama, Mock
├── criteria/ # Criteria wizard (8 frameworks, multi-LLM generation)
├── module1_screening/ # HCN screening (4 layers)
├── module2_extraction/ # Structured data extraction from PDFs
├── module3_quality/ # Risk-of-bias assessment (RoB 2, ROBINS-I, QUADAS-2)
├── evaluation/ # Metrics, calibration, Plotly visualization
├── cli/ # Typer CLI commands
└── app/ # Streamlit Web UI
Reproducibility
Every design decision prioritizes reproducibility:
- Deterministic inference: temperature=0.0 for all LLM calls
- Version-locked models: Exact model versions pinned in configs/models.yaml
- Seeded randomness: All stochastic operations accept a seed parameter (default: 42)
- Prompt versioning: SHA256 hash of every prompt stored in audit trail
- Full audit trail: Every decision logged with model outputs, rule results, calibration parameters, and confidence scores
- Docker: Complete environment reproduction via docker/Dockerfile
- One-command reproduction: bash scripts/run_all_validations.sh reruns all experiments
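Prompt hashing and a reproducible audit record can be illustrated with the standard library alone. The sketch below is an assumption about what such an entry might contain; the field names and the audit_entry helper are hypothetical, while the SHA256 hash, temperature of 0.0, and default seed of 42 follow the list above.

```python
import hashlib
import json
from typing import Any, Dict


def prompt_sha256(prompt: str) -> str:
    """Hex SHA256 digest of the UTF-8 encoded prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def audit_entry(record_id: str, model: str, prompt: str,
                decision: str, seed: int = 42) -> Dict[str, Any]:
    """Assemble one audit record; the field names are illustrative only."""
    return {
        "record_id": record_id,
        "model": model,
        "prompt_sha256": prompt_sha256(prompt),
        "temperature": 0.0,
        "seed": seed,
        "decision": decision,
    }


if __name__ == "__main__":
    entry = audit_entry("rec-001", "example-model", "Screen this abstract...", "INCLUDE")
    # sort_keys gives a stable serialization, so identical entries diff cleanly.
    print(json.dumps(entry, sort_keys=True, indent=2))
```

Storing the hash rather than the full prompt keeps audit entries small while still letting a reviewer confirm that two decisions were produced from exactly the same prompt text.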
Development
# Install with dev dependencies
uv sync --extra dev
# Run tests (645 tests)
uv run pytest
# Run tests with coverage (minimum 80%)
uv run pytest --cov=src/metascreener --cov-report=term-missing --cov-fail-under=80
# Lint
uv run ruff check src/
# Type check
uv run mypy src/
Citation
If you use MetaScreener in your research, please cite:
@software{hong2026metascreener,
author = {Hong, Chaokun},
title = {MetaScreener: Open-Source Multi-LLM Ensemble for Systematic Review Workflows},
url = {https://github.com/ChaokunHong/MetaScreener},
version = {2.0.0},
year = {2026},
license = {Apache-2.0}
}
License
Apache 2.0 -- see LICENSE.