Literature-informed prior distributions for Bayesian model calibration
Automated Bayesian prior construction from scientific literature
Distribird turns a parameter name and description into a fully cited, publication-ready prior distribution.
It searches Semantic Scholar, OpenAlex, and LLM deep-research agents in parallel, extracts numerical values from papers, and fits the best-matching scipy.stats distribution via AIC selection.
"I need a prior for maximum leaf area index of maize."
➜
truncated_normal(mu=5.2, sigma=1.5, a=0, b=12), fitted from 6 peer-reviewed sources with full citations.
Why Distribird?
Bayesian calibration works best with informative priors, but building them from the literature is tedious, so researchers often default to flat priors and lose valuable domain knowledge. Distribird closes that gap: describe your parameter, get a defensible prior in seconds.
Architecture
                      ┌──────────────────────────────┐
                      │        LangGraph DAG         │
                      └──────────────────────────────┘
START ─► Enrich ─► QueryGen ─► Search ─► RelevanceJudge ──┬► CrossEnrich ───┐
                                 ▲                        │                 │
                                 │                        └► FetchFulltext ◄┘
                            RefineSearch                           │
                                 ▲                              Extract
                                 │                                 │
                                 │    RefineExtraction ◄─── QualityGate
                                 │           │                │     │
                                 │           └────────────────┘     │
                                 │                                  ▼
                                 └───────────────────────── Synthesize ─► END
Multi-agent search — Semantic Scholar, OpenAlex, and LLM deep-research agents run concurrently; a moderator LLM selects the best papers via deliberation.
Relevance scoring — An LLM-based relevance judge scores each paper before extraction. When multiple high-relevance papers are found, the pipeline routes through cross-enrichment (citation snowballing + follow-up queries) to discover additional sources.
Feedback loops — A quality gate inspects extraction results and can trigger search refinement (new queries) or extraction refinement (web-assisted re-extraction) before falling through to synthesis.
Budget-bounded — IterationBudget caps every loop to guarantee termination.
Live progress — The pipeline streams node-by-node updates to the UI, showing which step is running, paper/value counts, and per-parameter progress bars.
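The budget-bounded feedback loops described above can be pictured with a small sketch. The names `LoopBudget` and `route_after_quality_gate`, the field names, and the threshold of five values are hypothetical illustrations; they are not Distribird's actual `IterationBudget` API or routing rules.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    """Hypothetical stand-in for an iteration budget; not Distribird's real class."""
    max_search_refinements: int = 2
    max_extraction_refinements: int = 2
    search_used: int = 0
    extraction_used: int = 0


def route_after_quality_gate(n_values: int, n_papers: int, budget: LoopBudget,
                             min_values: int = 5) -> str:
    """Pick the next node; every refinement consumes budget, so the loop must end."""
    if n_values >= min_values:
        return "synthesize"
    # Papers were found but yielded few values: try re-extraction first.
    if n_papers > 0 and budget.extraction_used < budget.max_extraction_refinements:
        budget.extraction_used += 1
        return "refine_extraction"
    # Otherwise look for more papers with new queries.
    if budget.search_used < budget.max_search_refinements:
        budget.search_used += 1
        return "refine_search"
    # Budget exhausted: fall through to synthesis with whatever evidence exists.
    return "synthesize"


# Simulate a stubborn case where refinement never finds more values.
budget = LoopBudget()
steps = []
while True:
    step = route_after_quality_gate(n_values=1, n_papers=3, budget=budget)
    steps.append(step)
    if step == "synthesize":
        break
print(steps)
```

Because each refinement branch increments a counter that is checked before the branch is taken, the router reaches `synthesize` after a bounded number of steps even when the evidence never improves.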
Quickstart
Install
pip install distribird
Development install
git clone https://github.com/HUN-REN-AI1Science/Distribird.git
cd Distribird
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest # 148 tests, all passing
Configure
Distribird reads configuration from environment variables (prefix DISTRIBIRD_) or a .env file in the project root.
# .env (or export these in your shell)
DISTRIBIRD_LLM_BASE_URL="http://localhost:4000" # any OpenAI-compatible endpoint
DISTRIBIRD_LLM_API_KEY="your-key"
DISTRIBIRD_LLM_MODEL="gpt-4o"
DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY="" # optional, increases rate limits
Sidebar behaviour in the Streamlit UI:
- Settings provided in .env are used automatically; no manual input is needed.
- Settings not provided in .env appear as required fields in the sidebar; the user must fill them in before generation can start.
- An "Override configured settings" toggle lets users temporarily replace .env values without editing the file.
- Literature source toggles (Semantic Scholar, OpenAlex, LLM Web Search, LLM Deep Research) are always visible and control which connection fields are required.
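The split between configured and still-required settings can be sketched as follows. This is a minimal illustration, not Distribird's own configuration loader: the function `load_settings` and the choice of which fields count as required are assumptions, and only the `DISTRIBIRD_` prefix and the variable names come from the configuration example above.

```python
import os

PREFIX = "DISTRIBIRD_"
# Fields treated as required in this sketch; the real required set is an assumption.
REQUIRED = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")


def load_settings(environ=None):
    """Collect DISTRIBIRD_-prefixed variables and report which required ones are missing."""
    environ = os.environ if environ is None else environ
    settings = {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(PREFIX) and value  # empty strings count as "not provided"
    }
    missing = [name for name in REQUIRED if name not in settings]
    return settings, missing


settings, missing = load_settings({
    "DISTRIBIRD_LLM_BASE_URL": "http://localhost:4000",
    "DISTRIBIRD_LLM_MODEL": "gpt-4o",
    "DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY": "",
})
print(missing)  # ['LLM_API_KEY']
```

In a UI like the one described, the `missing` list would correspond to the fields a user still has to fill in before generation can start.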
Use
Python
import asyncio
from distribird.agent.pipeline import run_parameter
from distribird.models import ParameterInput, ConstraintSpec
result = asyncio.run(run_parameter(
ParameterInput(
name="max_lai",
description="Maximum leaf area index of maize",
unit="m2/m2",
domain_context="Biome-BGCMuSo maize crop modeling",
constraints=ConstraintSpec(lower_bound=0, upper_bound=12),
)
))
print(result.prior.display_name()) # truncated_normal(mu=5.2, sigma=1.5, a=0, b=12)
print(result.prior.n_sources) # 6
print(result.prior.confidence.value) # high
REST API
distribird-api # starts on :8000
curl -u demo:distribird2026 -X POST http://localhost:8000/api/v1/parameter \
-H "Content-Type: application/json" \
-d '{"name":"max_lai","description":"Maximum leaf area index of maize","unit":"m2/m2"}'
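The same request can be prepared from Python with only the standard library. The sketch below mirrors the curl call (endpoint, basic-auth credentials, and JSON body are taken from it); the helper name `build_parameter_request` is hypothetical, and the response format is not documented here, so the example stops short of parsing it.

```python
import base64
import json
import urllib.request


def build_parameter_request(base_url, user, password, payload):
    """Build (but do not send) a POST request equivalent to the curl call."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/parameter",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_parameter_request(
    "http://localhost:8000", "demo", "distribird2026",
    {"name": "max_lai", "description": "Maximum leaf area index of maize", "unit": "m2/m2"},
)
# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```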
Streamlit UI
Try the hosted version at distribird.streamlit.app, or run locally:
streamlit run src/distribird/ui/app.py
Prior Fitting Strategy
| Evidence | Method | Confidence |
|---|---|---|
| 5+ values | AIC across Normal, Truncated Normal, Gamma, Log-Normal, Beta | High |
| 2 – 4 values | Moment matching with widened σ | Medium |
| 1 value | Wide Normal centered on value | Low |
| 0 values | Jeffreys / wide uninformative prior | None |
All fitted distributions respect user-specified physical constraints (bounds).
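To make the first row of the table concrete, here is a minimal sketch of AIC-based family selection with scipy.stats. It is not Distribird's fitting code: the input values are made up for illustration, only three of the five listed families are compared, and pinning `loc` to 0 for the positive-support families is a simplifying assumption.

```python
import numpy as np
from scipy import stats

# Made-up literature values for illustration only (not extracted from real papers).
values = np.array([4.1, 4.8, 5.0, 5.5, 6.0, 6.2])

# Candidate families; loc is pinned to 0 for the positive-support families so that
# each candidate has two free parameters.
candidates = {
    "normal": (stats.norm, {}),
    "gamma": (stats.gamma, {"floc": 0}),
    "lognormal": (stats.lognorm, {"floc": 0}),
}


def aic_table(data):
    """Return {family: AIC} using maximum-likelihood fits from scipy.stats."""
    table = {}
    for name, (dist, fixed) in candidates.items():
        params = dist.fit(data, **fixed)
        log_likelihood = np.sum(dist.logpdf(data, *params))
        n_free = len(params) - len(fixed)  # fixed parameters do not count
        table[name] = 2 * n_free - 2 * log_likelihood
    return table


scores = aic_table(values)
best = min(scores, key=scores.get)  # lowest AIC wins
print(best, {name: round(float(score), 2) for name, score in scores.items()})
```

With so few data points the AIC differences between families are typically small, which is one reason to fall back to simpler rules, as the table does, when fewer than five values are available.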
Export Formats
from distribird.export.json_export import export_json
from distribird.export.r_export import export_r
from distribird.export.python_export import export_python
| Format | Output |
|---|---|
| JSON | Parameter name, family, params, citations, confidence |
| R | Executable R script with distribution calls |
| Python | scipy.stats code ready for MCMC samplers |
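As a hedged illustration of what sampler-ready scipy.stats code can look like, the sketch below builds the truncated normal from the example at the top of this README by hand. It does not show the output of `export_python`. It assumes that `a` and `b` in `truncated_normal(mu=5.2, sigma=1.5, a=0, b=12)` are the physical bounds, which must be converted because scipy's `truncnorm` expects bounds in standardized units. The function name `log_prior` is hypothetical.

```python
from scipy import stats

mu, sigma = 5.2, 1.5          # location and scale from the README example
lower, upper = 0.0, 12.0      # physical bounds (assumed meaning of a and b)

# scipy's truncnorm takes bounds in standard-deviation units relative to loc.
a = (lower - mu) / sigma
b = (upper - mu) / sigma
prior = stats.truncnorm(a, b, loc=mu, scale=sigma)


def log_prior(theta):
    """Log-density of the prior; -inf outside [lower, upper]."""
    return float(prior.logpdf(theta))


samples = prior.rvs(size=1000, random_state=0)
print(prior.support())                  # bounds in physical units
print(log_prior(5.2), log_prior(-1.0))  # finite inside the bounds, -inf outside
```

A function like `log_prior` can be added to a log-likelihood to form the log-posterior that most MCMC samplers expect.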
Demo
A complete worked example using five Biome-BGCMuSo maize parameters:
python examples/maize_bgcmuso/demo.py
Testing
pytest # 148 tests
ruff check src/ tests/ # lint
mypy src/distribird/ # type checking (strict)