MetaScreener

Open-source multi-LLM ensemble for systematic review workflows


MetaScreener is a local Python tool for AI-assisted systematic review (SR) workflows. It uses a Hierarchical Consensus Network (HCN) of 4 open-source LLMs with calibrated confidence aggregation, covering the full SR pipeline -- literature screening, data extraction, and risk-of-bias assessment -- in a single tool.

Note: Looking for MetaScreener v1? See the v1-legacy branch.

Features

  • Multi-LLM Ensemble -- 4 open-source LLMs (Qwen3, DeepSeek-V3, Llama 4 Scout, Mistral Small 3.1) vote on every decision; no single model is a point of failure
  • 3 SR Modules -- Title/abstract screening, structured data extraction from PDFs, and risk-of-bias assessment (RoB 2, ROBINS-I, QUADAS-2)
  • Reproducible by Design -- All models are open-source with version-locked weights; temperature=0.0 for all inference; seeded randomness; SHA256 prompt hashing in every audit trail entry
  • Framework-Agnostic Criteria -- Supports PICO, PEO, SPIDER, PCC, and custom frameworks with an interactive criteria wizard
  • Multiple Input/Output Formats -- Reads RIS, BibTeX, CSV, PubMed XML, Excel; exports to RIS, CSV, JSON, Excel, and audit trail
  • CLI + Web UI -- Full Typer CLI and Streamlit dashboard
  • Evaluation Toolkit -- Built-in metrics (sensitivity, specificity, F1, WSS@95, AUROC, ECE, Brier score), Plotly visualizations (ROC, calibration, score distribution), and bootstrap 95% confidence intervals

Installation

pip

pip install metascreener

Docker

# Slim image -- CLI and Streamlit UI
docker pull chaokunhong/metascreener:latest

# Full image -- includes validation experiments
docker pull chaokunhong/metascreener:full

From source

git clone https://github.com/ChaokunHong/MetaScreener.git
cd MetaScreener
uv sync --extra dev
uv run metascreener --help

Configuration

MetaScreener calls LLMs via cloud APIs. Set one of the following environment variables:

export OPENROUTER_API_KEY="your-key-here"   # OpenRouter (default)
# or
export TOGETHER_API_KEY="your-key-here"     # Together AI

Local inference via vLLM or Ollama is also supported -- see configs/models.yaml.
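
As a rough illustration of the configuration above, the following sketch checks which of the two API keys is present. The function name `detect_provider` and the precedence (OpenRouter first, because it is described as the default) are assumptions made for this example; MetaScreener's actual lookup logic may differ.

```python
import os

# Environment variables named in this README. OpenRouter is listed first
# because it is described as the default; the precedence is an assumption.
PROVIDER_ENV_VARS = {
    "openrouter": "OPENROUTER_API_KEY",
    "together": "TOGETHER_API_KEY",
}


def detect_provider(env=None):
    """Return the first provider whose API key is set, or None if neither is."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS.items():
        if env.get(var):
            return provider
    return None


if __name__ == "__main__":
    provider = detect_provider()
    print(provider or "No API key found: set OPENROUTER_API_KEY or TOGETHER_API_KEY")
```

Running a check like this before a long screening job can surface a missing key early.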

Quick Start

1. Define review criteria

# From a research topic -- AI generates and refines criteria interactively
metascreener init --topic "antimicrobial resistance in ICU patients"

# From existing criteria text
metascreener init --criteria path/to/criteria.txt

The wizard auto-detects your criteria framework (PICO, PEO, SPIDER, PCC, or custom), generates structured criteria via multi-LLM consensus, validates them, and saves a versioned criteria.yaml.

2. Screen papers

# Title/abstract screening
metascreener screen --input search_results.ris --stage ta

# Full-text screening
metascreener screen --input search_results.ris --stage ft

# Both stages sequentially
metascreener screen --input search_results.ris --stage both

Each record passes through the 4-layer HCN and is assigned a decision (INCLUDE, EXCLUDE, or HUMAN_REVIEW) with a confidence tier (Tier 0--3).
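
To make the tier assignment more concrete, here is a minimal sketch of how votes and an ensemble confidence could map to a decision and tier. It is not MetaScreener's router: the function name `route`, the thresholds `high` and `mid`, and the reading of "majority" in Tier 2 as "majority for INCLUDE" are all assumptions for illustration.

```python
from collections import Counter


def route(votes, confidence, hard_rule_violated=False, high=0.9, mid=0.6):
    """Map per-model votes and an ensemble confidence to (decision, tier).

    `votes` holds "INCLUDE" or "EXCLUDE" strings, one per model.
    `high` and `mid` are illustrative thresholds, not MetaScreener's values.
    """
    if hard_rule_violated:
        return "EXCLUDE", 0  # Tier 0: a hard rule fired
    label, count = Counter(votes).most_common(1)[0]
    if count == len(votes) and confidence >= high:
        return label, 1  # Tier 1: unanimous vote, decided automatically
    if label == "INCLUDE" and count > len(votes) / 2 and confidence >= mid:
        return "INCLUDE", 2  # Tier 2: majority for inclusion (assumed reading)
    return "HUMAN_REVIEW", 3  # Tier 3: disagreement or low confidence


if __name__ == "__main__":
    print(route(["INCLUDE", "INCLUDE", "INCLUDE", "EXCLUDE"], confidence=0.7))
```

Sending every unclear case to HUMAN_REVIEW keeps a person in the loop where the models disagree.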

3. Extract data

# Build a YAML extraction form interactively
metascreener extract init-form

# Run extraction on included PDFs
metascreener extract --pdfs papers/ --form extraction_form.yaml

Extraction forms support 7 field types: text, integer, float, boolean, date, list, and categorical. Each field is extracted by multiple LLMs, and the final value is chosen by majority vote.
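
Majority-vote consensus over extracted fields can be sketched as follows. The helper names `majority_vote` and `merge_extractions`, the strict-majority rule, and returning `None` for unresolved fields are assumptions for this example, not the package's API.

```python
from collections import Counter


def majority_vote(values, n_voters):
    """Return the value backed by a strict majority of voters, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > n_voters / 2 else None


def merge_extractions(per_model):
    """Merge one {field: value} dict per model into a single consensus dict.

    Values must be hashable, so a list-typed field would need to be
    converted to a tuple before voting.
    """
    fields = sorted({field for result in per_model for field in result})
    n_models = len(per_model)
    return {
        field: majority_vote([r[field] for r in per_model if field in r], n_models)
        for field in fields
    }


if __name__ == "__main__":
    outputs = [
        {"sample_size": 120, "blinded": True},
        {"sample_size": 120, "blinded": False},
        {"sample_size": 118, "blinded": True},
        {"sample_size": 120, "blinded": False},
    ]
    print(merge_extractions(outputs))
```

Fields without a strict majority come back as `None`, which a caller could treat as a cue for manual checking.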

4. Assess risk of bias

metascreener assess-rob --pdfs papers/ --tool rob2       # RoB 2 (RCTs)
metascreener assess-rob --pdfs papers/ --tool robins-i   # ROBINS-I (observational)
metascreener assess-rob --pdfs papers/ --tool quadas2    # QUADAS-2 (diagnostic)

Each tool follows its official domain structure with signaling questions. Assessments from multiple LLMs are combined using worst-case-per-domain merging and majority-vote consensus.
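
The worst-case-per-domain idea can be illustrated with RoB 2's three judgement levels (low, some concerns, high). This sketch covers only the worst-case part; how MetaScreener combines it with majority voting is not described here, and the names `SEVERITY` and `merge_worst_case` and the domain keys `D1`, `D2` are hypothetical.

```python
# RoB 2 judgement levels ordered from least to most severe.
SEVERITY = {"low": 0, "some concerns": 1, "high": 2}


def merge_worst_case(per_model):
    """Keep the most severe judgement per domain across all models.

    `per_model` is a list of {domain: judgement} dicts, one per model.
    """
    merged = {}
    for assessment in per_model:
        for domain, judgement in assessment.items():
            current = merged.get(domain)
            if current is None or SEVERITY[judgement] > SEVERITY[current]:
                merged[domain] = judgement
    return merged


if __name__ == "__main__":
    models = [
        {"D1": "low", "D2": "some concerns"},
        {"D1": "high", "D2": "low"},
        {"D1": "low", "D2": "low"},
    ]
    print(merge_worst_case(models))
```

Taking the worst case is deliberately conservative: a single model flagging a domain as high risk is enough to surface it.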

5. Evaluate and export

# Evaluate against gold-standard labels with interactive Plotly charts
metascreener evaluate --labels gold_standard.csv --predictions results.json --visualize

# Export results in multiple formats
metascreener export --results results.json --format csv,json,excel,audit
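
For readers unfamiliar with the evaluation metrics, the sketch below computes a few of them with the standard library only. It is not the package's evaluation code: the function names are chosen for this example, labels are assumed to be 1 for include and 0 for exclude, and `wss` assumes the predictions were thresholded at the point where recall reaches the stated level.

```python
import random


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means include."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(y_true, y_pred):
    tp, _, _, fn = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else float("nan")


def specificity(y_true, y_pred):
    _, tn, fp, _ = confusion(y_true, y_pred)
    return tn / (tn + fp) if tn + fp else float("nan")


def wss(y_true, y_pred, recall=0.95):
    """Work saved over sampling: (TN + FN) / N - (1 - recall)."""
    _, tn, _, fn = confusion(y_true, y_pred)
    return (tn + fn) / len(y_true) - (1 - recall)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=42, alpha=0.05):
    """Percentile bootstrap interval for `metric` with a fixed seed."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:  # NaN != NaN, so undefined resamples are skipped
            stats.append(value)
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(sensitivity(y_true, y_pred), specificity(y_true, y_pred))
    print(bootstrap_ci(y_true, y_pred, sensitivity))
```

The fixed seed makes the bootstrap interval repeatable from run to run.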

Web UI

metascreener ui   # Launches Streamlit dashboard at localhost:8501

Architecture

MetaScreener's screening module uses a 4-layer Hierarchical Consensus Network:

Records (RIS/BibTeX/CSV/XML/Excel)
    │
    ▼
┌────────────────────────────────────────────────────┐
│  Layer 1: Parallel LLM Inference                    │
│  4 models evaluate each record independently        │
│  Framework-specific prompts (PICO/PEO/SPIDER/PCC)  │
├────────────────────────────────────────────────────┤
│  Layer 2: Semantic Rule Engine                      │
│  3 hard rules (publication type, language,           │
│    study design) → auto-exclude                     │
│  3 soft rules (population, outcome, intervention)   │
│    → score penalty                                  │
├────────────────────────────────────────────────────┤
│  Layer 3: Calibrated Confidence Aggregation (CCA)   │
│  Platt/isotonic calibration + weighted consensus    │
│  S = Σ(wᵢ·sᵢ·cᵢ·φᵢ) / Σ(wᵢ·cᵢ·φᵢ)              │
│  C = 1 − H(p_inc, p_exc) / log(2)                 │
├────────────────────────────────────────────────────┤
│  Layer 4: Hierarchical Decision Router              │
│  Tier 0: Hard rule violation  → EXCLUDE             │
│  Tier 1: Unanimous + high conf → AUTO               │
│  Tier 2: Majority + mid conf  → INCLUDE             │
│  Tier 3: Disagreement / low   → HUMAN_REVIEW        │
└────────────────────────────────────────────────────┘
    │
    ▼
ScreeningDecision + AuditEntry (per record)
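
The two Layer 3 formulas can be evaluated directly, as in the sketch below. The diagram does not define its symbols, so this example assumes that wᵢ, sᵢ, cᵢ, and φᵢ are per-model non-negative numbers, that H is the Shannon entropy in nats, and that p_exc = 1 − p_inc. The function names are hypothetical.

```python
import math


def aggregate_score(w, s, c, phi):
    """S = sum(w_i * s_i * c_i * phi_i) / sum(w_i * c_i * phi_i)."""
    numerator = sum(wi * si * ci * pi for wi, si, ci, pi in zip(w, s, c, phi))
    denominator = sum(wi * ci * pi for wi, ci, pi in zip(w, c, phi))
    if denominator == 0:
        raise ValueError("all weights are zero; the score is undefined")
    return numerator / denominator


def consensus_confidence(p_inc):
    """C = 1 - H(p_inc, p_exc) / log(2), assuming p_exc = 1 - p_inc."""
    p_exc = 1.0 - p_inc
    # Shannon entropy in nats; terms with p = 0 contribute nothing.
    entropy = -sum(p * math.log(p) for p in (p_inc, p_exc) if p > 0)
    return 1.0 - entropy / math.log(2)


if __name__ == "__main__":
    score = aggregate_score(
        w=[1.0, 1.0, 1.0, 1.0],
        s=[0.9, 0.8, 0.7, 0.2],
        c=[0.9, 0.8, 0.6, 0.5],
        phi=[1.0, 1.0, 1.0, 1.0],
    )
    print(score, consensus_confidence(score))
```

Dividing the entropy by log(2) normalizes C to the range 0 to 1: an even split between include and exclude gives C = 0, and a certain outcome gives C = 1.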

LLM Models

All models are open-source and version-locked in configs/models.yaml.

Model                  Parameters              License        Role
Qwen3-235B-A22B        235B (22B active, MoE)  Apache 2.0     Multilingual + structured extraction
DeepSeek-V3.2          685B (37B active, MoE)  MIT            Complex reasoning + rule adherence
Llama 4 Scout          ~100B+ (MoE)            Llama License  General understanding
Mistral Small 3.1 24B  24B (dense)             Apache 2.0     Fast screening + deterministic cases

Inference runs via OpenRouter or Together AI APIs. Local deployment via vLLM or Ollama is also supported.

Project Structure

src/metascreener/
├── core/                  # Shared data models, enums, exceptions
├── io/                    # Readers/writers (RIS, BibTeX, CSV, XML, Excel, PDF)
├── llm/                   # LLM backends + parallel runner
│   └── adapters/          # OpenRouter, Together AI, vLLM, Ollama, Mock
├── criteria/              # Criteria wizard (8 frameworks, multi-LLM generation)
├── module1_screening/     # HCN screening (4 layers)
├── module2_extraction/    # Structured data extraction from PDFs
├── module3_quality/       # Risk-of-bias assessment (RoB 2, ROBINS-I, QUADAS-2)
├── evaluation/            # Metrics, calibration, Plotly visualization
├── cli/                   # Typer CLI commands
└── app/                   # Streamlit Web UI

Reproducibility

MetaScreener is designed so that results can be reproduced:

  • Deterministic inference: temperature=0.0 for all LLM calls
  • Version-locked models: Exact model versions pinned in configs/models.yaml
  • Seeded randomness: All stochastic operations accept a seed parameter (default: 42)
  • Prompt versioning: SHA256 hash of every prompt stored in audit trail
  • Full audit trail: Every decision logged with model outputs, rule results, calibration parameters, and confidence scores
  • Docker: Complete environment reproduction via docker/Dockerfile
  • One-command reproduction: bash scripts/run_all_validations.sh reruns all experiments
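
Two of the measures above, prompt hashing and seeded randomness, can be sketched with the standard library. The `audit_entry` function and its field names are hypothetical and do not reflect MetaScreener's actual audit schema; only the SHA256 hashing, `temperature=0.0`, and the default seed of 42 come from this README.

```python
import hashlib
import random


def prompt_hash(prompt):
    """Return the SHA256 hex digest of a prompt string (UTF-8 encoded)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def audit_entry(record_id, prompt, decision, seed=42):
    """Build an illustrative audit record; the field names are hypothetical."""
    return {
        "record_id": record_id,
        "prompt_sha256": prompt_hash(prompt),
        "decision": decision,
        "temperature": 0.0,
        "seed": seed,
    }


def seeded_sample(items, k, seed=42):
    """Draw k items with a dedicated, seeded generator for repeatability."""
    rng = random.Random(seed)
    return rng.sample(items, k)


if __name__ == "__main__":
    print(audit_entry("rec-001", "Does this abstract meet the criteria?", "INCLUDE"))
    print(seeded_sample(list(range(100)), 5))
```

Because the hash changes whenever the prompt text changes, two audit entries with the same digest are known to have used an identical prompt.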

Development

# Install with dev dependencies
uv sync --extra dev

# Run tests (645 tests)
uv run pytest

# Run tests with coverage (minimum 80%)
uv run pytest --cov=src/metascreener --cov-report=term-missing --cov-fail-under=80

# Lint
uv run ruff check src/

# Type check
uv run mypy src/

Citation

If you use MetaScreener in your research, please cite:

@software{hong2026metascreener,
  author    = {Hong, Chaokun},
  title     = {MetaScreener: Open-Source Multi-LLM Ensemble for Systematic Review Workflows},
  url       = {https://github.com/ChaokunHong/MetaScreener},
  version   = {2.0.0},
  year      = {2026},
  license   = {Apache-2.0}
}

License

Apache 2.0 -- see LICENSE.
