
Literature-informed prior distributions for Bayesian model calibration


Distribird

Automated Bayesian prior construction from scientific literature

Python 3.10+ | License: MIT | Tests: 148 passed | Linting: ruff | LangGraph | FastAPI


Distribird turns a parameter name and description into a fully cited, publication-ready prior distribution. It searches Semantic Scholar, OpenAlex, and LLM deep-research agents in parallel, extracts numerical values from papers, and fits the best-matching scipy.stats distribution via AIC selection.

"I need a prior for maximum leaf area index of maize."

truncated_normal(mu=5.2, sigma=1.5, a=0, b=12) — fitted from 6 peer-reviewed sources with full citations.


Why Distribird?

Bayesian calibration benefits from informative priors, but building them from literature is tedious. Researchers often default to flat priors, discarding valuable domain knowledge. Distribird closes that gap: describe your parameter and get a cited, defensible prior without a manual literature review.

Architecture

                        ┌──────────────────────────────┐
                        │        LangGraph DAG         │
                        └──────────────────────────────┘

START ─► Enrich ─► QueryGen ─► Search ─► RelevanceJudge ──┬► CrossEnrich ───┐
                                 ▲                        │                 │
                                 │                        └► FetchFulltext ◄┘
                          RefineSearch                            │
                                 ▲                            Extract
                                 │                                │
                                 │   RefineExtraction ◄─── QualityGate
                                 │         │                │     │
                                 │         └────────────────┘     │
                                 │                                ▼
                                 └─────────────────────────  Synthesize ─► END

Multi-agent search — Semantic Scholar, OpenAlex, and LLM deep-research agents run concurrently; a moderator LLM selects the best papers via deliberation.

Relevance scoring — An LLM-based relevance judge scores each paper before extraction. When multiple high-relevance papers are found, the pipeline routes through cross-enrichment (citation snowballing + follow-up queries) to discover additional sources.

Feedback loops — A quality gate inspects extraction results and can trigger search refinement (new queries) or extraction refinement (web-assisted re-extraction) before falling through to synthesis.

Budget-bounded: IterationBudget caps every loop to guarantee termination.

Live progress — The pipeline streams node-by-node updates to the UI, showing which step is running, paper/value counts, and per-parameter progress bars.
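The feedback loops and the budget cap described above can be sketched in plain Python. This is a minimal illustration, not Distribird's implementation: the `IterationBudget` fields, the `quality_gate` function, the node names, and every threshold below are assumptions chosen to show why a capped loop always reaches synthesis.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Hypothetical stand-in: counts loop iterations and refuses to exceed a cap."""

    max_search_refinements: int = 2
    max_extraction_refinements: int = 2
    search_used: int = 0
    extraction_used: int = 0

    def try_spend(self, loop: str) -> bool:
        """Consume one iteration of the named loop; return False once exhausted."""
        if loop == "search" and self.search_used < self.max_search_refinements:
            self.search_used += 1
            return True
        if (
            loop == "extraction"
            and self.extraction_used < self.max_extraction_refinements
        ):
            self.extraction_used += 1
            return True
        return False


def quality_gate(n_values: int, n_papers: int, budget: IterationBudget) -> str:
    """Pick the next node. The thresholds (5 values, 3 papers) are illustrative."""
    if n_values >= 5:
        return "synthesize"
    # Too few papers: try new search queries first, if the budget allows.
    if n_papers < 3 and budget.try_spend("search"):
        return "refine_search"
    # Papers exist but yielded few values: retry extraction, if the budget allows.
    if budget.try_spend("extraction"):
        return "refine_extraction"
    # Budget exhausted: fall through to synthesis with whatever was found.
    return "synthesize"


# Worst case: results never improve, so every loop runs until its cap is hit.
budget = IterationBudget()
route = []
while True:
    step = quality_gate(n_values=1, n_papers=1, budget=budget)
    route.append(step)
    if step == "synthesize":
        break
print(route)
```

Because each refinement spends from a finite counter, the loop above ends after at most four refinements even when the evidence never improves.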

Quickstart

Install

pip install distribird

Development install

git clone https://github.com/HUN-REN-AI1Science/Distribird.git
cd Distribird              # the clone directory keeps the repository's capitalisation
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest                     # 148 tests, all passing

Configure

Distribird reads configuration from environment variables (prefix DISTRIBIRD_) or a .env file in the project root.

# .env (or export these in your shell)
DISTRIBIRD_LLM_BASE_URL="http://localhost:4000"   # any OpenAI-compatible endpoint
DISTRIBIRD_LLM_API_KEY="your-key"
DISTRIBIRD_LLM_MODEL="gpt-4o"
DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY=""            # optional, increases rate limits

Sidebar behaviour in the Streamlit UI:

  • Settings provided in .env are used automatically — no manual input needed.
  • Settings not provided in .env appear as required fields in the sidebar; the user must fill them in before generation can start.
  • An "Override configured settings" toggle lets users temporarily replace .env values without editing the file.
  • Literature source toggles (Semantic Scholar, OpenAlex, LLM Web Search, LLM Deep Research) are always visible and control which connection fields are required.
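The configuration rules above can be sketched with the standard library alone. The helper names `load_settings` and `missing_required` are hypothetical, and treating the three LLM settings as always required is an assumption (the text says the required fields depend on the source toggles). The sketch reads only environment variables and does not parse a .env file.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

PREFIX = "DISTRIBIRD_"
# Assumption: the three LLM settings are required; the Semantic Scholar key is optional.
REQUIRED = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")
OPTIONAL = ("SEMANTIC_SCHOLAR_API_KEY",)


def load_settings(env: Mapping[str, str] | None = None) -> dict[str, str | None]:
    """Collect DISTRIBIRD_* values; empty strings count as 'not provided'."""
    source = os.environ if env is None else env
    settings: dict[str, str | None] = {}
    for name in REQUIRED + OPTIONAL:
        value = source.get(PREFIX + name, "").strip()
        settings[name] = value or None
    return settings


def missing_required(settings: Mapping[str, str | None]) -> list[str]:
    """Names a UI would have to ask for before generation can start."""
    return [PREFIX + name for name in REQUIRED if settings[name] is None]


example_env = {
    "DISTRIBIRD_LLM_BASE_URL": "http://localhost:4000",
    "DISTRIBIRD_LLM_MODEL": "gpt-4o",
    "DISTRIBIRD_SEMANTIC_SCHOLAR_API_KEY": "",
}
settings = load_settings(example_env)
print(missing_required(settings))  # ['DISTRIBIRD_LLM_API_KEY']
```

Passing a plain dict instead of `os.environ` keeps the lookup easy to test without touching the real environment.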

Use

Python

import asyncio
from distribird.agent.pipeline import run_parameter
from distribird.models import ParameterInput, ConstraintSpec

result = asyncio.run(run_parameter(
    ParameterInput(
        name="max_lai",
        description="Maximum leaf area index of maize",
        unit="m2/m2",
        domain_context="Biome-BGCMuSo maize crop modeling",
        constraints=ConstraintSpec(lower_bound=0, upper_bound=12),
    )
))

print(result.prior.display_name())   # truncated_normal(mu=5.2, sigma=1.5, a=0, b=12)
print(result.prior.n_sources)        # 6
print(result.prior.confidence.value) # high

REST API

distribird-api                          # starts on :8000

curl -u demo:distribird2026 -X POST http://localhost:8000/api/v1/parameter \
  -H "Content-Type: application/json" \
  -d '{"name":"max_lai","description":"Maximum leaf area index of maize","unit":"m2/m2"}'
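The same request can be built from Python with only the standard library. The endpoint, credentials, and payload are copied from the curl call above; the helper names `build_parameter_request` and `send` are hypothetical, and the response schema is not documented here, so the sketch returns the raw body rather than guessing at its fields.

```python
import base64
import json
import urllib.request


def build_parameter_request(
    base_url: str = "http://localhost:8000",
    user: str = "demo",
    password: str = "distribird2026",
) -> urllib.request.Request:
    """Build the POST request that the curl example sends (HTTP Basic auth)."""
    payload = {
        "name": "max_lai",
        "description": "Maximum leaf area index of maize",
        "unit": "m2/m2",
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/parameter",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_parameter_request()
print(request.get_method(), request.full_url)
```

With distribird-api running locally, `print(send(request))` would show the server's reply. The generous timeout is a guess, on the assumption that a literature search takes longer than a typical API call.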

Streamlit UI

Try the hosted version at distribird.streamlit.app, or run locally:

streamlit run src/distribird/ui/app.py

Prior Fitting Strategy

| Evidence | Method | Confidence |
|---|---|---|
| 5+ values | AIC across Normal, Truncated Normal, Gamma, Log-Normal, Beta | High |
| 2–4 values | Moment matching with widened σ | Medium |
| 1 value | Wide Normal centered on value | Low |
| 0 values | Jeffreys / wide uninformative prior | None |

All fitted distributions respect user-specified physical constraints (bounds).
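The AIC tier of the strategy above can be sketched with scipy.stats. This is an illustration under stated assumptions, not Distribird's code: the function names are hypothetical, the candidate set is reduced to three families, location is fixed at zero for Gamma and Log-Normal, bounds are ignored, and the sample values are made up for the example rather than taken from any paper.

```python
import numpy as np
from scipy import stats


def strategy_for(n_values: int) -> tuple[str, str]:
    """Map the number of extracted values to (method, confidence) per the table."""
    if n_values >= 5:
        return ("AIC selection", "high")
    if n_values >= 2:
        return ("moment matching with widened sigma", "medium")
    if n_values == 1:
        return ("wide normal centered on value", "low")
    return ("wide uninformative prior", "none")


def aic_select(values):
    """Fit each candidate by maximum likelihood and return the lowest-AIC family."""
    data = np.asarray(values, dtype=float)
    # family name -> (scipy distribution, fit constraints, free parameter count)
    candidates = {
        "normal": (stats.norm, {}, 2),
        "gamma": (stats.gamma, {"floc": 0}, 2),
        "log_normal": (stats.lognorm, {"floc": 0}, 2),
    }
    scores = {}
    for name, (family, fit_kwargs, n_free) in candidates.items():
        params = family.fit(data, **fit_kwargs)
        log_likelihood = float(np.sum(family.logpdf(data, *params)))
        # AIC = 2k - 2 ln(L); lower is better.
        scores[name] = 2 * n_free - 2 * log_likelihood
    best = min(scores, key=scores.get)
    return best, scores


# Made-up values for illustration only.
illustrative_values = [4.1, 4.8, 5.0, 5.3, 5.9, 6.4]
method, confidence = strategy_for(len(illustrative_values))
best_family, aic_scores = aic_select(illustrative_values)
print(method, confidence, best_family)
```

All three candidates have two free parameters here, so the penalty term is the same for each and the ranking follows the likelihood alone. The penalty starts to matter once families with different parameter counts compete.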

Export Formats

from distribird.export.json_export import export_json
from distribird.export.r_export import export_r
from distribird.export.python_export import export_python

| Format | Output |
|---|---|
| JSON | Parameter name, family, params, citations, confidence |
| R | Executable R script with distribution calls |
| Python | scipy.stats code ready for MCMC samplers |
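The exact output of the Python export is not shown here, so the following is only a sketch of how a prior such as truncated_normal(mu=5.2, sigma=1.5, a=0, b=12) could be written with scipy.stats. It assumes mu and sigma are the location and scale of the untruncated normal and that a and b are bounds in the parameter's own units. The helper name `truncated_normal` is hypothetical. The one detail worth care is that `scipy.stats.truncnorm` expects its bounds in standardized units.

```python
from scipy import stats


def truncated_normal(mu: float, sigma: float, lower: float, upper: float):
    """Frozen truncated normal with bounds given in the parameter's own units."""
    # scipy's truncnorm takes bounds as z-scores relative to loc and scale.
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    return stats.truncnorm(a, b, loc=mu, scale=sigma)


prior = truncated_normal(mu=5.2, sigma=1.5, lower=0.0, upper=12.0)


def log_prior(x: float) -> float:
    """Log-density for use in a sampler; -inf outside the physical bounds."""
    return float(prior.logpdf(x))


samples = prior.rvs(size=1000, random_state=0)
print(prior.support())
print(log_prior(5.2), log_prior(-1.0))
```

Passing the raw bounds 0 and 12 directly as `a` and `b` would truncate at mu + 0·sigma and mu + 12·sigma instead, which is a common source of silently wrong priors.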

Demo

A complete worked example using five Biome-BGCMuSo maize parameters:

python examples/maize_bgcmuso/demo.py

Testing

pytest                 # 148 tests
ruff check src/ tests/ # lint
mypy src/distribird/   # type checking (strict)


