Literature-informed prior distributions for Bayesian model calibration

Distribird

Automated Bayesian prior construction from scientific literature

Python 3.10+ | License: MIT | Tests: 223 passed | Linting: ruff | LangGraph | FastAPI


Distribird turns a parameter name and description into a fully cited, publication-ready prior distribution. It queries Semantic Scholar, OpenAlex, and LLM deep-research agents in parallel, extracts numerical values from the papers it finds, and selects the best-fitting scipy.stats distribution by AIC.

"I need a prior for maximum leaf area index of maize."

truncated_normal(mu=5.2, sigma=1.5, a=0, b=12) — fitted from 6 peer-reviewed sources with full citations.

Why Distribird?

Bayesian calibration benefits from informative priors, but building them from literature is tedious, so researchers often default to flat priors and lose valuable domain knowledge. Distribird closes that gap: describe your parameter, get a defensible prior in seconds.

Architecture

                        ┌──────────────────────────────┐
                        │        LangGraph DAG         │
                        └──────────────────────────────┘

START ─► Enrich ─► QueryGen ─► Search ─► RelevanceJudge ──┬► CrossEnrich ───┐
                                 ▲                        │                 │
                                 │                        └► FetchFulltext ◄┘
                          RefineSearch                            │
                                 ▲                            Extract
                                 │                                │
                                 │   RefineExtraction ◄─── QualityGate
                                 │         │                │     │
                                 │         └────────────────┘     │
                                 │                                ▼
                                 └────────────  Synthesize ─► ValidityCheck ─► END

Multi-agent search — Semantic Scholar, OpenAlex, and LLM deep-research agents run concurrently; a moderator LLM selects the best papers via deliberation.

Relevance scoring — An LLM-based relevance judge scores each paper before extraction. When multiple high-relevance papers are found, the pipeline routes through cross-enrichment (citation snowballing + follow-up queries) to discover additional sources.

Feedback loops — A quality gate inspects extraction results and can trigger search refinement (new queries) or extraction refinement (web-assisted re-extraction) before falling through to synthesis.

Validity defense — Every request is classified as VALID, SUSPICIOUS, LIKELY_INVALID, or UNKNOWN. When the enrichment LLM does not recognise the parameter, the pipeline short-circuits past search, extraction, and synthesis straight to the validity node, saving roughly 80–95% of wall-clock time and LLM tokens on out-of-scope requests. Ambiguous (SUSPICIOUS) cases trigger a single second-opinion LLM probe.

Budget-bounded — IterationBudget caps every loop to guarantee termination.

Live progress — The pipeline streams node-by-node updates to the UI, showing which step is running, paper/value counts, and per-parameter progress bars.
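
The feedback loops and the budget cap described above could be sketched roughly as follows. IterationBudget is named in the text, but its fields and methods here, the routing rules in quality_gate, and the threshold of five values are illustrative assumptions and not Distribird's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Hypothetical budget: counts loop iterations and refuses once the cap is hit."""

    max_iterations: int
    used: int = 0

    def try_spend(self) -> bool:
        """Consume one iteration if any remain; return False when exhausted."""
        if self.used >= self.max_iterations:
            return False
        self.used += 1
        return True


def quality_gate(n_values: int, search_budget: IterationBudget,
                 extraction_budget: IterationBudget, min_values: int = 5) -> str:
    """Pick the next node; refinement is only allowed while budget remains."""
    if n_values >= min_values:
        return "synthesize"
    # Assumed rule: nothing extracted at all suggests the search missed, so retry it.
    if n_values == 0 and search_budget.try_spend():
        return "refine_search"
    # Assumed rule: otherwise try to squeeze more values out of the papers found.
    if extraction_budget.try_spend():
        return "refine_extraction"
    # Every budget is spent: fall through to synthesis with whatever we have.
    return "synthesize"


def run_gate_loop(values_per_round: list[int]) -> list[str]:
    """Replay the gate over a sequence of per-round value counts."""
    search_budget = IterationBudget(max_iterations=2)
    extraction_budget = IterationBudget(max_iterations=1)
    trace = []
    for n_values in values_per_round:
        decision = quality_gate(n_values, search_budget, extraction_budget)
        trace.append(decision)
        if decision == "synthesize":
            break
    return trace
```

Because each refinement path spends from a finite budget, the loop reaches "synthesize" after a bounded number of rounds even if extraction never improves.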

Quickstart

Install

pip install distribird

Development install

git clone https://github.com/HUN-REN-AI1Science/Distribird.git
cd Distribird
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest                     # 223 tests, all passing

Configure

Distribird reads configuration from environment variables (prefix DISTRIBIRD_) or a .env file in the project root.

# .env (or export these in your shell)
DISTRIBIRD_LLM_BASE_URL="http://localhost:4000"   # any OpenAI-compatible endpoint
DISTRIBIRD_LLM_API_KEY="your-key"
DISTRIBIRD_LLM_MODEL="gpt-4o"
DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY=""            # optional, increases rate limits
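
As a rough illustration of the prefix convention, the sketch below collects DISTRIBIRD_ variables from the process environment and reports which ones are still missing. The function names and the choice of which settings count as required are assumptions made for this example; it does not parse a .env file, and it is not the package's own settings loader.

```python
import os
from typing import Mapping

PREFIX = "DISTRIBIRD_"
# Assumption for this sketch: these three are needed before a run can start.
REQUIRED = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


def load_settings(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect non-empty DISTRIBIRD_* variables, keyed by lowercased suffix."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX) and value
    }


def missing_settings(settings: Mapping[str, str]) -> list[str]:
    """Names of required settings that were not provided."""
    return [name.lower() for name in REQUIRED if name.lower() not in settings]


if __name__ == "__main__":
    current = load_settings()
    print("configured:", sorted(current))
    print("missing:", missing_settings(current))
```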

Sidebar behaviour in the Streamlit UI:

  • Settings provided in .env are used automatically — no manual input needed.
  • Settings not provided in .env appear as required fields in the sidebar; the user must fill them in before generation can start.
  • An "Override configured settings" toggle lets users temporarily replace .env values without editing the file.
  • Literature source toggles (Semantic Scholar, OpenAlex, LLM Web Search, LLM Deep Research) are always visible and control which connection fields are required.

Use

Python

import asyncio
from distribird.agent.pipeline import run_parameter
from distribird.models import ParameterInput, ConstraintSpec

result = asyncio.run(run_parameter(
    ParameterInput(
        name="max_lai",
        description="Maximum leaf area index of maize",
        unit="m2/m2",
        domain_context="Biome-BGCMuSo maize crop modeling",
        constraints=ConstraintSpec(lower_bound=0, upper_bound=12),
    )
))

print(result.prior.display_name())   # truncated_normal(mu=5.2, sigma=1.5, a=0, b=12)
print(result.prior.n_sources)        # 6
print(result.prior.confidence.value) # high

REST API

distribird-api                          # starts on :8000

curl -u demo:distribird2026 -X POST http://localhost:8000/api/v1/parameter \
  -H "Content-Type: application/json" \
  -d '{"name":"max_lai","description":"Maximum leaf area index of maize","unit":"m2/m2"}'
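
The same request can be built from Python with only the standard library. This is a sketch: the helper name build_parameter_request is made up for the example, the endpoint, credentials, and payload are copied from the curl call above, and the shape of the JSON response is not documented here, so the code stops short of interpreting it.

```python
import base64
import json
import urllib.request


def build_parameter_request(base_url: str, user: str, password: str,
                            payload: dict) -> urllib.request.Request:
    """Build a POST request with HTTP Basic auth, mirroring `curl -u user:password`."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/parameter",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_parameter_request(
    "http://localhost:8000",
    "demo",
    "distribird2026",
    {
        "name": "max_lai",
        "description": "Maximum leaf area index of maize",
        "unit": "m2/m2",
    },
)

# To send it against a running distribird-api instance:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```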

Streamlit UI

Try the hosted version at distribird.streamlit.app, or run locally:

streamlit run src/distribird/ui/app.py

Prior Fitting Strategy

Evidence      Method                                                          Confidence
----------    ------------------------------------------------------------    ----------
5+ values     AIC across Normal, Truncated Normal, Gamma, Log-Normal, Beta    High
2–4 values    Moment matching with widened σ                                  Medium
1 value       Wide Normal centered on value                                   Low
0 values      Jeffreys / wide uninformative prior                             None

All fitted distributions respect user-specified physical constraints (bounds).
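
To make the AIC step concrete, here is a minimal standard-library sketch that compares only two of the candidate families, Normal and Log-Normal, because both have closed-form maximum-likelihood fits. It is a simplification for illustration, not the package's fitting code: the function names are invented, and the remaining families (Truncated Normal, Gamma, Beta) would need numerical fitting, for example through the fit methods in scipy.stats. AIC is computed as 2k - 2 ln L, with k the number of fitted parameters.

```python
import math


def normal_loglik(xs: list[float]) -> float:
    """Maximised Normal log-likelihood (closed-form MLE; needs non-identical values)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    # At the MLE the quadratic term sums to n/2, giving this compact form.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs: list[float]) -> float:
    """Maximised Log-Normal log-likelihood: Normal fit on log(x) plus the Jacobian term."""
    logs = [math.log(x) for x in xs]  # requires strictly positive values
    return normal_loglik(logs) - sum(logs)


def select_by_aic(xs: list[float]) -> tuple[str, dict[str, float]]:
    """Return the family with the lowest AIC and the score of every candidate."""
    candidates = {
        "normal": (normal_loglik, 2),        # parameters: mu, sigma
        "lognormal": (lognormal_loglik, 2),  # parameters: mu, sigma of log(x)
    }
    scores = {name: 2 * k - 2 * loglik(xs) for name, (loglik, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Strongly right-skewed values (made up for the example) favour the Log-Normal.
best_family, aic_scores = select_by_aic([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
```

Since both candidates here have two parameters, the penalty term cancels and the comparison reduces to the log-likelihoods; the penalty starts to matter once families with different parameter counts are compared.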

Export Formats

from distribird.export.json_export import export_json
from distribird.export.r_export import export_r
from distribird.export.python_export import export_python

Format    Output
------    -----------------------------------------------------
JSON      Parameter name, family, params, citations, confidence
R         Executable R script with distribution calls
Python    scipy.stats code ready for MCMC samplers
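
For readers wondering how such a prior plugs into a sampler, the sketch below evaluates the log-density of the example prior truncated_normal(mu=5.2, sigma=1.5, a=0, b=12) using only the standard library. It is an illustration, not the output of export_python, and the function name is invented. One detail worth knowing when using scipy instead: scipy.stats.truncnorm expects its bounds in standardised units, that is (a - mu) / sigma and (b - mu) / sigma, not the raw bounds.

```python
import math


def _standard_normal_cdf(z: float) -> float:
    """CDF of the standard Normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_logpdf(x: float, mu: float, sigma: float,
                            a: float, b: float) -> float:
    """Log-density of a Normal(mu, sigma) restricted to [a, b] and renormalised."""
    if not a <= x <= b:
        return -math.inf  # zero prior mass outside the physical bounds
    z = (x - mu) / sigma
    log_normal_pdf = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    # Probability mass the untruncated Normal places inside [a, b].
    mass = _standard_normal_cdf((b - mu) / sigma) - _standard_normal_cdf((a - mu) / sigma)
    return log_normal_pdf - math.log(mass)


def log_prior_max_lai(value: float) -> float:
    """Log-prior for the example parameter, suitable as a term in an MCMC target."""
    return truncated_normal_logpdf(value, mu=5.2, sigma=1.5, a=0.0, b=12.0)
```

Returning negative infinity outside the bounds makes a Metropolis-style sampler reject any proposal that violates the physical constraints.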

Demo

A complete worked example using five Biome-BGCMuSo maize parameters:

python examples/maize_bgcmuso/demo.py

Testing

pytest                  # 223 tests
ruff check src/ tests/  # lint
mypy src/distribird/    # type checking (strict)
