Permanent archiving of Bayesian/MCMC analysis artifacts on Arweave
BioAnchor 🔬⛓️
Permanent, verifiable archiving of Bayesian/MCMC analysis artifacts on Arweave.
The Problem
Bayesian and MCMC analyses in computational biology are nearly impossible to reproduce:
- Raw MCMC chains are large and rarely shared
- Zenodo, GitHub, and NCBI can delete or restrict data
- There is no standard for what constitutes a "reproducible" Bayesian analysis artifact
- Drug discovery and genomics data have provenance and audit requirements that no current tool addresses
The Solution
BioAnchor defines a minimal, standardised manifest of everything needed to verify and re-run a Bayesian analysis, then uploads it to Arweave — a decentralised storage network where data is permanent and immutable.
You do not upload raw data (privacy + cost). You upload:
- SHA-256 hash of input data (proof of what was used)
- MCMC configuration: sampler, chains, draws, seed, priors
- Posterior summary statistics (mean, SD, R̂, ESS)
- Software environment fingerprint
This is typically < 5 KB — negligible cost on Arweave.
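As a rough illustration of why the payload stays small, the sketch below hashes some input bytes, records a software environment fingerprint, and measures the serialised size. It is not BioAnchor's internal code: the helper names `sha256_hex` and `environment_fingerprint` are hypothetical, and how BioAnchor serialises an array before hashing is an assumption not confirmed by this README.

```python
import hashlib
import json
import platform
from importlib import metadata


def sha256_hex(raw: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(raw).hexdigest()


def environment_fingerprint(packages: list[str]) -> dict:
    """Record the interpreter version and installed versions of named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "language": f"Python {platform.python_version()}",
        "packages": versions,
    }


# Stand-in for real input data. For a NumPy array one might hash
# X.tobytes(), keeping in mind that dtype and memory order change the digest.
raw_data = b"dose,activity\n0.1,98.2\n1.0,47.5\n10.0,3.1\n"

manifest = {
    "data": {"sha256": sha256_hex(raw_data)},
    "mcmc": {"sampler": "NUTS", "n_chains": 4, "n_draws": 2000, "seed": 42},
    "software": environment_fingerprint(["pymc", "arviz", "numpy"]),
}

# Serialise deterministically and report the size that would be uploaded.
payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
print(len(payload), "bytes")
```

Only the digest leaves your machine, so the raw data stays private while anyone holding the same bytes can confirm they match.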
The Arweave TX ID goes into your paper alongside the DOI. Anyone can verify the analysis, check convergence, and reproduce results with the same seed.
Quick Start
pip install "bioanchor[all]"
With PyMC
import pymc as pm
from bioanchor import BioAnchor

# X is your input data as a NumPy array, defined elsewhere
with pm.Model() as model:
    alpha = pm.Normal("alpha", 0, 1)
    beta = pm.HalfNormal("beta", 1)
    # ... your model ...
    idata = pm.sample(2000, random_seed=42)

ba = BioAnchor(wallet_path="wallet.json")
tx_id = ba.archive_pymc(
    idata=idata,
    model=model,
    seed=42,
    data=X,  # numpy array, hashed not uploaded
    data_description="TCGA-BRCA expression matrix (n=500, p=200)",
    data_source="TCGA",
    title="Sparse Bayesian regression for drug target identification",
    authors=["Your Name <you@uni.ac.kr>"],
    domain="drug_discovery",
    tags=["sparse-regression", "drug-target", "tcga"],
)
print(f"https://arweave.net/{tx_id}")
# → Add this URL to your Methods section
CLI
# Generate a manifest template
bioanchor init --output my_analysis.json
# Upload to Arweave (edit the template first)
bioanchor upload --manifest my_analysis.json --wallet wallet.json
# Verify an uploaded manifest
bioanchor verify xK9mP2abc...
# Check wallet balance
bioanchor balance --wallet wallet.json
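This README does not describe what bioanchor verify checks internally. As a hand-rolled sketch of the same idea, the code below fetches a manifest from the https://arweave.net/<TX ID> URL used above and compares its recorded SHA-256 digest against local data. The function names are hypothetical, and it assumes the manifest is served as plain JSON and that the digest was computed over exactly the bytes you pass in.

```python
import hashlib
import json
from urllib.request import urlopen


def fetch_manifest(tx_id: str, gateway: str = "https://arweave.net") -> dict:
    """Download and parse a JSON manifest from an Arweave gateway."""
    with urlopen(f"{gateway}/{tx_id}", timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def data_matches_manifest(manifest: dict, raw_data: bytes) -> bool:
    """Check that local data has the digest recorded under data.sha256."""
    expected = manifest["data"]["sha256"]
    return hashlib.sha256(raw_data).hexdigest() == expected


# Offline demonstration with a locally built manifest (no network call).
raw = b"example input bytes"
local_manifest = {"data": {"sha256": hashlib.sha256(raw).hexdigest()}}
print(data_matches_manifest(local_manifest, raw))  # → True
```

With a real transaction you would call fetch_manifest with your TX ID and pass the result to data_matches_manifest along with the original input bytes.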
Development / Testing (no wallet needed)
ba = BioAnchor(mock=True) # uses MockUploader, prints fake TX ID
tx_id = ba.archive_pymc(idata, seed=42, data=X, title="Test")
Manifest Format
The manifest is a small JSON file (~2–5 KB). This is the core scientific contribution of BioAnchor.
{
  "schema_version": "1.0",
  "bioanchor_version": "0.1.0",
  "created_at": "2026-04-16T09:00:00+00:00",
  "title": "Bayesian dose-response IC50 estimation",
  "authors": ["Author Name"],
  "analysis_type": "MCMC",
  "domain": "drug_discovery",
  "tags": ["dose-response", "ic50", "hill-equation"],
  "software": {
    "language": "Python 3.11.0",
    "packages": { "pymc": "5.9.0", "arviz": "0.18.0", "numpy": "1.26.0" }
  },
  "data": {
    "sha256": "de2fb170...",
    "description": "12-point IC50 curve (log10 μM, % activity)",
    "n_samples": 12,
    "n_features": 2,
    "source": "synthetic"
  },
  "mcmc": {
    "sampler": "NUTS",
    "n_chains": 4,
    "n_draws": 2000,
    "n_warmup": 1000,
    "seed": 42,
    "prior_spec": {
      "log_ic50": "Normal(mu=-0.3, sigma=1.0)",
      "hill_n": "HalfNormal(sigma=2.0)"
    },
    "posterior_mean": { "log_ic50": -0.31, "ic50": 0.49, "hill_n": 1.79 },
    "posterior_std": { "log_ic50": 0.09, "ic50": 0.10, "hill_n": 0.21 },
    "r_hat": { "log_ic50": 1.001, "ic50": 1.001, "hill_n": 1.003 },
    "ess_bulk": { "log_ic50": 892.0, "ic50": 887.0, "hill_n": 755.0 },
    "divergences": 0,
    "acceptance_rate": 0.93
  }
}
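Because the diagnostics live in the manifest, a reader can screen convergence without access to the chains. The sketch below is one way to do that, using the field names from the example manifest. The function `convergence_problems` is hypothetical, and the thresholds (R̂ ≤ 1.01, bulk ESS ≥ 400, zero divergences) are common rules of thumb rather than limits defined by BioAnchor.

```python
def convergence_problems(
    mcmc: dict, rhat_max: float = 1.01, ess_min: float = 400.0
) -> list[str]:
    """Return human-readable warnings for diagnostics outside the thresholds."""
    problems = []
    for name, value in mcmc.get("r_hat", {}).items():
        if value > rhat_max:
            problems.append(f"r_hat[{name}] = {value} exceeds {rhat_max}")
    for name, value in mcmc.get("ess_bulk", {}).items():
        if value < ess_min:
            problems.append(f"ess_bulk[{name}] = {value} is below {ess_min}")
    divergences = mcmc.get("divergences", 0)
    if divergences > 0:
        problems.append(f"{divergences} divergent transitions reported")
    return problems


# Diagnostics copied from the "mcmc" block of the example manifest.
mcmc_block = {
    "r_hat": {"log_ic50": 1.001, "ic50": 1.001, "hill_n": 1.003},
    "ess_bulk": {"log_ic50": 892.0, "ic50": 887.0, "hill_n": 755.0},
    "divergences": 0,
}

print(convergence_problems(mcmc_block))  # → []
```

An empty list means no diagnostic crossed the chosen thresholds. It does not prove the model is correct, only that the sampler showed no obvious signs of trouble.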
Why Arweave?
| Storage | Permanent? | Immutable? | Decentralised? |
|---|---|---|---|
| GitHub | ✗ (account deletion) | ✗ | ✗ |
| Zenodo | ✗ (operator control) | ✗ | ✗ |
| IPFS | ✗ (requires pinning) | ✓ | ✓ |
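| Arweave | ✓ (endowment-funded) | ✓ | ✓ |
Arweave's endowment model is designed to fund storage for at least 200 years, on the assumption that storage costs keep falling. Archiving a 5 KB manifest costs on the order of $0.0001 USD, although the exact figure varies with the AR price.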
Arweave Wallet Setup
- Go to arweave.app and generate a wallet
- Download the JWK JSON file (wallet.json)
- Fund with a small amount of AR (~0.01 AR is enough for thousands of uploads)
- Use bioanchor balance --wallet wallet.json to check
For testnet experimentation, use Irys devnet.
Supported Integrations
| Framework | Status |
|---|---|
| PyMC 5.x | ✅ Full integration |
| ArviZ | ✅ (via PyMC) |
| Stan/CmdStanPy | 🔜 Planned |
| NumPyro | 🔜 Planned |
| R (rstan) | 🔜 Planned |
Citing This Tool
If you use BioAnchor in your research, please cite:
[Paper citation — in preparation]
And add to your Methods section:
"MCMC analysis artifacts (manifest, posterior summary, and software environment) were permanently archived on the Arweave network using BioAnchor v0.1.0 (TX ID: https://arweave.net/YOUR_TX_ID)."
License
MIT License. See LICENSE.
File details
Details for the file bioanchor-0.1.0.tar.gz.
File metadata
- Download URL: bioanchor-0.1.0.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f743bd8c1692766904174308c43bd8326f5dba3b9742a7afb4e6b23102af26f5 |
| MD5 | 7515f0126577699153e66a15a86c9245 |
| BLAKE2b-256 | 1669dfe199cbd15a5607040313df8f29f6a8fc55fb88238c8e2af286cb9077b6 |
File details
Details for the file bioanchor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bioanchor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c08e8e8f5680c2add4cbf29399ea20a3b9baf25de01c3823f096149afe256be8 |
| MD5 | a78893ae4139e3c0dae6de4169810ab8 |
| BLAKE2b-256 | 2b377c8816abac8301a9f572a8a0b6c91dd4e6ba2a6c9cfd024158511d9198c0 |