
Literature-informed prior distributions for Bayesian model calibration


Distribird

Automated Bayesian prior construction from scientific literature

Python 3.10+ | License: MIT | Tests: 148 passed | Linting: ruff | Built with LangGraph and FastAPI


Distribird turns a parameter name and description into a fully cited, publication-ready prior distribution. It searches Semantic Scholar, OpenAlex, and LLM deep-research agents in parallel, extracts numerical values from papers, and fits the best-matching scipy.stats distribution via AIC selection.

"I need a prior for maximum leaf area index of maize."

truncated_normal(mu=5.2, sigma=1.5, a=0, b=12) — fitted from 6 peer-reviewed sources with full citations.


Why Distribird?

Bayesian calibration benefits from informative priors, but building them from literature is tedious. Researchers often default to flat priors and discard valuable domain knowledge. Distribird closes that gap: describe your parameter, get a defensible prior in seconds.

Architecture

                        ┌──────────────────────────────┐
                        │        LangGraph DAG         │
                        └──────────────────────────────┘

START ─► Enrich ─► QueryGen ─► Search ─► RelevanceJudge ──┬► CrossEnrich ───┐
                                 ▲                        │                 │
                                 │                        └► FetchFulltext ◄┘
                          RefineSearch                            │
                                 ▲                            Extract
                                 │                                │
                                 │   RefineExtraction ◄─── QualityGate
                                 │         │                │     │
                                 │         └────────────────┘     │
                                 │                                ▼
                                 └─────────────────────────  Synthesize ─► END

Multi-agent search — Semantic Scholar, OpenAlex, and LLM deep-research agents run concurrently; a moderator LLM selects the best papers via deliberation.

Relevance scoring — An LLM-based relevance judge scores each paper before extraction. When multiple high-relevance papers are found, the pipeline routes through cross-enrichment (citation snowballing + follow-up queries) to discover additional sources.

Feedback loops — A quality gate inspects extraction results and can trigger search refinement (new queries) or extraction refinement (web-assisted re-extraction) before falling through to synthesis.

Budget-bounded: IterationBudget caps every loop to guarantee termination.

Live progress — The pipeline streams node-by-node updates to the UI, showing which step is running, paper/value counts, and per-parameter progress bars.
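The budget-bounded feedback loops described above can be illustrated with a small, self-contained sketch. It is not Distribird's implementation: the `IterationBudget` class below is a hypothetical stand-in that only borrows the name, and the routing heuristic, the `min_values` threshold, and the loop names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IterationBudget:
    """Hypothetical stand-in: counts loop iterations against fixed caps."""
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def try_spend(self, loop: str) -> bool:
        """Consume one iteration of `loop`; return False once its cap is hit."""
        spent = self.used.get(loop, 0)
        if spent >= self.limits.get(loop, 0):
            return False
        self.used[loop] = spent + 1
        return True


def quality_gate(n_values: int, n_papers: int, budget: IterationBudget,
                 min_values: int = 5) -> str:
    """Pick the next node: refine the search, refine extraction, or synthesize."""
    if n_values >= min_values:
        return "synthesize"
    # Few papers: look for more literature first (assumed heuristic).
    if n_papers < min_values and budget.try_spend("refine_search"):
        return "refine_search"
    # Papers exist but yielded few values: retry extraction.
    if budget.try_spend("refine_extraction"):
        return "refine_extraction"
    # Budget exhausted: fall through to synthesis with whatever was found.
    return "synthesize"


budget = IterationBudget(limits={"refine_search": 2, "refine_extraction": 1})
# Simulate a run in which the evidence never improves.
route = [quality_gate(n_values=1, n_papers=2, budget=budget) for _ in range(5)]
print(route)
```

With these caps the simulated run takes two search refinements and one extraction refinement, then falls through to synthesis. Because every loop edge has to spend from a finite budget, the graph cannot cycle indefinitely.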

Quickstart

Install

pip install distribird

Development install

git clone https://github.com/HUN-REN-AI1Science/Distribird.git
cd Distribird
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest                     # 148 tests, all passing

Configure

Distribird reads configuration from environment variables (prefix DISTRIBIRD_) or a .env file in the project root.

# .env (or export these in your shell)
DISTRIBIRD_LLM_BASE_URL="http://localhost:4000"   # any OpenAI-compatible endpoint
DISTRIBIRD_LLM_API_KEY="your-key"
DISTRIBIRD_LLM_MODEL="gpt-4o"
DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY=""            # optional, increases rate limits

Sidebar behaviour in the Streamlit UI:

  • Settings provided in .env are used automatically — no manual input needed.
  • Settings not provided in .env appear as required fields in the sidebar; the user must fill them in before generation can start.
  • An "Override configured settings" toggle lets users temporarily replace .env values without editing the file.
  • Literature source toggles (Semantic Scholar, OpenAlex, LLM Web Search, LLM Deep Research) are always visible and control which connection fields are required.
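The sidebar rules above amount to three steps: read the prefixed variables, report the ones still missing, and layer temporary overrides on top. The sketch below illustrates that logic with the standard library only. It is not Distribird's settings code: treating the three LLM variables as required is an assumption, the function names are hypothetical, and parsing of the .env file itself is left out.

```python
import os

PREFIX = "DISTRIBIRD_"
# Names taken from the example .env; which ones are required is assumed here.
REQUIRED = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")
OPTIONAL = ("SEMANTIC_SCHOLAR_API_KEY",)


def load_settings(environ=None):
    """Split prefixed variables into configured values and missing fields."""
    environ = os.environ if environ is None else environ
    configured = {}
    for name in REQUIRED + OPTIONAL:
        value = environ.get(PREFIX + name, "").strip()
        if value:  # empty strings count as "not provided"
            configured[name] = value
    missing = [name for name in REQUIRED if name not in configured]
    return configured, missing


def effective_settings(configured, overrides=None):
    """Apply sidebar overrides on top of configured values without mutating them."""
    merged = dict(configured)
    merged.update({k: v for k, v in (overrides or {}).items() if v})
    return merged


# A partially configured environment: the API key has not been set.
env = {
    "DISTRIBIRD_LLM_BASE_URL": "http://localhost:4000",
    "DISTRIBIRD_LLM_MODEL": "gpt-4o",
    "DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY": "",
}
configured, missing = load_settings(env)
print("sidebar must ask for:", missing)
print(effective_settings(configured, {"LLM_MODEL": "my-local-model"}))
```

Here `missing` contains only `LLM_API_KEY`, which corresponds to the field the sidebar would ask for. The override replaces the model name for the session and leaves `configured` untouched.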

Use

Python

import asyncio
from distribird.agent.pipeline import run_parameter
from distribird.models import ParameterInput, ConstraintSpec

result = asyncio.run(run_parameter(
    ParameterInput(
        name="max_lai",
        description="Maximum leaf area index of maize",
        unit="m2/m2",
        domain_context="Biome-BGCMuSo maize crop modeling",
        constraints=ConstraintSpec(lower_bound=0, upper_bound=12),
    )
))

print(result.prior.display_name())   # truncated_normal(mu=5.2, sigma=1.5, a=0, b=12)
print(result.prior.n_sources)        # 6
print(result.prior.confidence.value) # high

REST API

distribird-api                          # starts on :8000

curl -u demo:distribird2026 -X POST http://localhost:8000/api/v1/parameter \
  -H "Content-Type: application/json" \
  -d '{"name":"max_lai","description":"Maximum leaf area index of maize","unit":"m2/m2"}'
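The same request can be built from Python with the standard library. This is a minimal sketch that mirrors the curl call above (endpoint, basic-auth credentials, JSON body). The helper name `build_parameter_request` is hypothetical, and the response schema is not documented here, so the sketch only constructs the request and leaves the send step commented out.

```python
import base64
import json
import urllib.request


def build_parameter_request(base_url, user, password, payload):
    """Build (but do not send) the POST request shown in the curl example."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/parameter",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # equivalent of curl -u
        },
        method="POST",
    )


request = build_parameter_request(
    "http://localhost:8000", "demo", "distribird2026",
    {"name": "max_lai",
     "description": "Maximum leaf area index of maize",
     "unit": "m2/m2"},
)
print(request.get_method(), request.full_url)

# Sending requires a running `distribird-api`. The timeout is an arbitrary
# choice for this sketch; the raw body is printed rather than parsed.
# with urllib.request.urlopen(request, timeout=600) as response:
#     print(response.read().decode("utf-8"))
```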

Streamlit UI

Try the hosted version at distribird.streamlit.app, or run locally:

streamlit run src/distribird/ui/app.py

Prior Fitting Strategy

Evidence      Method                                                        Confidence
5+ values     AIC across Normal, Truncated Normal, Gamma, Log-Normal, Beta  High
2 – 4 values  Moment matching with widened σ                                Medium
1 value       Wide Normal centered on value                                 Low
0 values      Jeffreys / wide uninformative prior                           None

All fitted distributions respect user-specified physical constraints (bounds).
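The tiered strategy can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not Distribird's fitting code: it compares only Normal and Log-Normal (both have closed-form maximum-likelihood fits) instead of the five families listed, the widening factor of 1.5 and the 50 % relative width for a single value are invented for the example, and the function names are hypothetical. It assumes the values are not all identical.

```python
import math
import statistics


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of xs under Normal(mu, sigma)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))


def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik


def fit_by_aic(xs):
    """Compare closed-form MLE fits of Normal and Log-Normal by AIC."""
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    fits = {"normal": ({"mu": mu, "sigma": sigma},
                       aic(normal_loglik(xs, mu, sigma), 2))}
    if min(xs) > 0:  # Log-Normal needs strictly positive data
        logs = [math.log(x) for x in xs]
        m, s = statistics.fmean(logs), statistics.pstdev(logs)
        # The change of variables adds -sum(log x) to the log-likelihood.
        loglik = normal_loglik(logs, m, s) - sum(logs)
        fits["lognormal"] = ({"mu": m, "sigma": s}, aic(loglik, 2))
    family = min(fits, key=lambda name: fits[name][1])
    return family, fits[family][0]


def fit_prior(xs, widen=1.5):
    """Dispatch on evidence count, mirroring the four tiers in the table."""
    n = len(xs)
    if n == 0:
        return "uninformative", {}, "none"
    if n == 1:
        # Wide Normal centered on the value; the relative width is assumed.
        return "normal", {"mu": xs[0], "sigma": 0.5 * abs(xs[0])}, "low"
    if n < 5:
        # Moment matching with the sample SD inflated by an assumed factor.
        params = {"mu": statistics.fmean(xs),
                  "sigma": widen * statistics.stdev(xs)}
        return "normal", params, "medium"
    family, params = fit_by_aic(xs)
    return family, params, "high"


# Illustrative inputs only, not values extracted from literature.
print(fit_prior([]))
print(fit_prior([5.0]))
print(fit_prior([4.0, 5.0, 6.0]))
print(fit_prior([1.0, 2.0, 4.0, 8.0, 16.0, 32.0]))
```

The last call uses strongly right-skewed data, for which the Log-Normal has the lower AIC. Bounds are not enforced in this sketch; a fuller version would add truncated candidates or clip the support to the user-specified constraints.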

Export Formats

from distribird.export.json_export import export_json
from distribird.export.r_export import export_r
from distribird.export.python_export import export_python

Format  Output
JSON    Parameter name, family, params, citations, confidence
R       Executable R script with distribution calls
Python  scipy.stats code ready for MCMC samplers
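One detail matters when a truncated normal is written as scipy.stats code: `scipy.stats.truncnorm` expects its bounds `a` and `b` in standard-deviation units relative to `loc`, not on the parameter's own scale. The sketch below shows the conversion for the maize example. The helper `to_scipy_truncnorm` is hypothetical, and whether `export_python` emits this exact form is not stated here. It assumes the `a` and `b` in `truncated_normal(mu=5.2, sigma=1.5, a=0, b=12)` are bounds on the parameter scale, as the matching `lower_bound=0, upper_bound=12` constraint suggests.

```python
def to_scipy_truncnorm(mu, sigma, lower, upper):
    """Return a scipy.stats expression for Normal(mu, sigma) truncated to [lower, upper]."""
    # Standardize the bounds: scipy measures them in units of `scale` from `loc`.
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    return (f"scipy.stats.truncnorm(a={a:.4g}, b={b:.4g}, "
            f"loc={mu:.4g}, scale={sigma:.4g})")


expr = to_scipy_truncnorm(mu=5.2, sigma=1.5, lower=0, upper=12)
print(expr)  # scipy.stats.truncnorm(a=-3.467, b=4.533, loc=5.2, scale=1.5)
```

Passing the raw bounds (`a=0, b=12`) straight to `truncnorm` would instead truncate at 0 and 12 standard deviations above the mean, which is a different distribution. The four-significant-digit rounding is for display only.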

Demo

A complete worked example using five Biome-BGCMuSo maize parameters:

python examples/maize_bgcmuso/demo.py

Testing

pytest                 # 148 tests
ruff check src/ tests/ # lint
mypy src/distribird/      # type checking (strict)
