
Permanent archiving of Bayesian/MCMC analysis artifacts on Arweave


BioAnchor 🔬⛓️

Permanent, verifiable archiving of Bayesian/MCMC analysis artifacts on Arweave.



The Problem

Bayesian and MCMC analyses in computational biology are nearly impossible to reproduce:

  • Raw MCMC chains are large and rarely shared
  • Zenodo, GitHub, and NCBI can delete or restrict data
  • There is no standard for what constitutes a "reproducible" Bayesian analysis artifact
  • Drug discovery and genomics data have provenance and audit requirements that no current tool addresses

The Solution

BioAnchor defines a minimal, standardised manifest of everything needed to verify and re-run a Bayesian analysis, then uploads it to Arweave — a decentralised storage network where data is permanent and immutable.

You do not upload raw data (privacy + cost). You upload:

  • SHA-256 hash of input data (proof of what was used)
  • MCMC configuration: sampler, chains, draws, seed, priors
  • Posterior summary statistics (mean, SD, R̂, ESS)
  • Software environment fingerprint

This is typically < 5 KB — negligible cost on Arweave.
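As a rough illustration of two of the items above, the sketch below computes a SHA-256 digest of a data file and collects a software environment fingerprint using only the standard library. The function names `sha256_of_file` and `environment_fingerprint` are hypothetical and not part of BioAnchor's API. How BioAnchor itself serialises a numpy array before hashing is not documented here, so this version assumes the data lives in a file on disk.

```python
import hashlib
import platform
from importlib import metadata


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large datasets never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def environment_fingerprint(packages):
    """Record the interpreter version and the installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "language": f"Python {platform.python_version()}",
        "packages": versions,
    }
```

Calling `environment_fingerprint(["pymc", "arviz", "numpy"])` would produce a dictionary shaped like the `software` field of the manifest, under the assumption that those distribution names match what is installed.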

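The Arweave TX ID goes into your paper alongside the DOI. Anyone can then retrieve the manifest and check the convergence diagnostics; anyone who also holds the input data can confirm its hash and re-run the analysis with the same seed and software versions.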
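A reader holding the TX ID and a copy of the input data could check the two against each other without BioAnchor installed. The sketch below assumes the manifest is stored as plain JSON at `https://arweave.net/<tx_id>` and follows the layout shown under Manifest Format. The helper names are hypothetical, and `fetch_manifest` needs network access.

```python
import json
import urllib.request


def manifest_url(tx_id, gateway="https://arweave.net"):
    """Build the gateway URL for a transaction ID."""
    return f"{gateway}/{tx_id}"


def fetch_manifest(tx_id, timeout=30):
    """Download and parse the manifest JSON (requires network access)."""
    with urllib.request.urlopen(manifest_url(tx_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def data_matches_manifest(manifest, local_sha256):
    """Return True when a locally computed digest equals the archived one."""
    archived = manifest["data"]["sha256"]
    # Hex digests are case-insensitive, so normalise before comparing.
    return archived.lower() == local_sha256.lower()
```

The local digest would come from hashing your own copy of the data in the same way the original authors did, which is why the hashing procedure needs to be stated alongside the manifest.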


Quick Start

pip install bioanchor[all]

With PyMC

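import numpy as np
import pymc as pm
from bioanchor import BioAnchor

# Placeholder input matrix; substitute your real data (hashed, never uploaded)
X = np.random.default_rng(0).normal(size=(500, 200))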

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0, 1)
    beta  = pm.HalfNormal("beta", 1)
    # ... your model ...
    idata = pm.sample(2000, random_seed=42)

ba = BioAnchor(wallet_path="wallet.json")

tx_id = ba.archive_pymc(
    idata=idata,
    model=model,
    seed=42,
    data=X,                                        # numpy array, hashed not uploaded
    data_description="TCGA-BRCA expression matrix (n=500, p=200)",
    data_source="TCGA",
    title="Sparse Bayesian regression for drug target identification",
    authors=["Your Name <you@uni.ac.kr>"],
    domain="drug_discovery",
    tags=["sparse-regression", "drug-target", "tcga"],
)

print(f"https://arweave.net/{tx_id}")
# → Add this URL to your Methods section

CLI

# Generate a manifest template
bioanchor init --output my_analysis.json

# Upload to Arweave (edit the template first)
bioanchor upload --manifest my_analysis.json --wallet wallet.json

# Verify an uploaded manifest
bioanchor verify xK9mP2abc...

# Check wallet balance
bioanchor balance --wallet wallet.json

Development / Testing (no wallet needed)

from bioanchor import BioAnchor
# idata and X come from the PyMC example above
ba = BioAnchor(mock=True)   # uses MockUploader, prints fake TX ID
tx_id = ba.archive_pymc(idata, seed=42, data=X, title="Test")

Manifest Format

The manifest is a small JSON file (~2–5 KB). This is the core scientific contribution of BioAnchor.

{
  "schema_version": "1.0",
  "bioanchor_version": "0.1.0",
  "created_at": "2026-04-16T09:00:00+00:00",
  "title": "Bayesian dose-response IC50 estimation",
  "authors": ["Author Name"],
  "analysis_type": "MCMC",
  "domain": "drug_discovery",
  "tags": ["dose-response", "ic50", "hill-equation"],
  "software": {
    "language": "Python 3.11.0",
    "packages": { "pymc": "5.9.0", "arviz": "0.18.0", "numpy": "1.26.0" }
  },
  "data": {
    "sha256": "de2fb170...",
    "description": "12-point IC50 curve (log10 μM, % activity)",
    "n_samples": 12,
    "n_features": 2,
    "source": "synthetic"
  },
  "mcmc": {
    "sampler": "NUTS",
    "n_chains": 4,
    "n_draws": 2000,
    "n_warmup": 1000,
    "seed": 42,
    "prior_spec": {
      "log_ic50": "Normal(mu=-0.3, sigma=1.0)",
      "hill_n":   "HalfNormal(sigma=2.0)"
    },
    "posterior_mean": { "log_ic50": -0.31, "ic50": 0.49, "hill_n": 1.79 },
    "posterior_std":  { "log_ic50": 0.09,  "ic50": 0.10, "hill_n": 0.21 },
    "r_hat":          { "log_ic50": 1.001, "ic50": 1.001, "hill_n": 1.003 },
    "ess_bulk":       { "log_ic50": 892.0, "ic50": 887.0, "hill_n": 755.0 },
    "divergences": 0,
    "acceptance_rate": 0.93
  }
}
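Because the diagnostics sit in the manifest itself, a reviewer can screen them without re-running the sampler. The sketch below is one possible check and not a BioAnchor feature. The function name and the default thresholds (R̂ at most 1.01, bulk ESS at least 400) are assumptions based on commonly cited rules of thumb, and should be adjusted to your field's conventions.

```python
def convergence_problems(manifest, max_r_hat=1.01, min_ess=400.0):
    """Return a list of human-readable warnings; an empty list means no flags."""
    mcmc = manifest["mcmc"]
    problems = []
    for name, value in mcmc.get("r_hat", {}).items():
        if value > max_r_hat:
            problems.append(f"r_hat[{name}] = {value} exceeds {max_r_hat}")
    for name, value in mcmc.get("ess_bulk", {}).items():
        if value < min_ess:
            problems.append(f"ess_bulk[{name}] = {value} is below {min_ess}")
    if mcmc.get("divergences", 0) > 0:
        problems.append(f"{mcmc['divergences']} divergent transitions reported")
    return problems


# Diagnostics copied from the example manifest above.
example = {
    "mcmc": {
        "r_hat": {"log_ic50": 1.001, "ic50": 1.001, "hill_n": 1.003},
        "ess_bulk": {"log_ic50": 892.0, "ic50": 887.0, "hill_n": 755.0},
        "divergences": 0,
    }
}
print(convergence_problems(example))  # → []
```

Passing these checks only shows that the reported summaries look healthy; it does not replace re-running the analysis.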

Why Arweave?

Storage    Permanent?                    Immutable?    Decentralised?
GitHub     ✗ (account deletion)
Zenodo     ✗ (operator control)
IPFS       ✗ (requires pinning)
Arweave    ✓ (mathematical guarantee)

Arweave's endowment model is designed to fund storage for at least 200 years under its stated cost assumptions. A 5 KB manifest costs roughly $0.0001 USD to archive, depending on the current network price and the AR exchange rate.
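The cost of a given upload can be estimated before funding a wallet. The sketch below assumes the gateway's `/price/<bytes>` endpoint, which returns the fee in winston (1 AR = 10^12 winston) as plain text. The function names are hypothetical, the AR/USD rate must be supplied by the caller, and an upload service other than the gateway may charge differently.

```python
import urllib.request
from decimal import Decimal

WINSTON_PER_AR = Decimal(10) ** 12


def winston_to_ar(winston):
    """Convert an integer amount of winston to AR."""
    return Decimal(winston) / WINSTON_PER_AR


def fetch_price_winston(n_bytes, gateway="https://arweave.net", timeout=30):
    """Ask the gateway for the current fee to store n_bytes (requires network access)."""
    url = f"{gateway}/price/{n_bytes}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return int(resp.read().decode("ascii").strip())


def cost_usd(winston, usd_per_ar):
    """Convert a fee in winston to USD at a caller-supplied exchange rate."""
    return winston_to_ar(winston) * Decimal(str(usd_per_ar))
```

For a 5 KB manifest, `cost_usd(fetch_price_winston(5 * 1024), usd_per_ar=...)` would give an estimate at the current network price.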


Arweave Wallet Setup

  1. Go to arweave.app and generate a wallet
  2. Download the JWK JSON file (wallet.json)
  3. Fund with a small amount of AR (~0.01 AR is enough for thousands of uploads)
  4. Use bioanchor balance --wallet wallet.json to check

For testnet experimentation, use Irys devnet.


Supported Integrations

Framework         Status
PyMC 5.x          ✅ Full integration
ArviZ             ✅ (via PyMC)
Stan/CmdStanPy    🔜 Planned
NumPyro           🔜 Planned
R (rstan)         🔜 Planned

Citing This Tool

If you use BioAnchor in your research, please cite:

[Paper citation — in preparation]

And add to your Methods section:

"MCMC analysis artifacts (manifest, posterior summary, and software environment) were permanently archived on the Arweave network using BioAnchor v0.1.0 (TX ID: YOUR_TX_ID, retrievable at https://arweave.net/YOUR_TX_ID)."


License

MIT License. See LICENSE.
