From microscope to manuscript, in one repo. The AI lab for biological researchers.
vaultlab
"From microscope to manuscript, in one repo."
vaultlab is a research companion for biological scientists. Most AI lab tools take a research question and try to write the paper for you. vaultlab is different: it accompanies you through whatever you're actually doing today, whether that is searching literature, analyzing your CODEX run, drafting the methods section, building tomorrow's lab-meeting deck, or triaging your inbox for the manuscript-deadline email you've been avoiding. With full context of your work (your knowledge base, your Google Docs, your Outlook calendar), Claude Code becomes a useful colleague instead of a generic chatbot.
Open-source. Local-first. Claude-Code-native. MIT licensed.
Alpha software. vaultlab is under active development toward v0.1.0 (target: late May 2026). Expect rough edges. See docs/KNOWN_LIMITATIONS.md.
What it does
- Literature search & citation verification: PubMed, Semantic Scholar, CrossRef, bioRxiv, Springer, Elsevier, paperclip MCP
- Wet-lab data analysis: CODEX, MALDI, Visium, scRNA-seq, H&E, flow cytometry
- Publication-quality figures with corpus-backed recipes (every recipe cites ≥3 published examples)
- Manuscript drafting with NotebookLM-style evidence retrieval: every [N] shows the exact passage on hover
- Slide decks built from research outputs: journal-club, thesis-committee, conference-talk modes
- Knowledge base (Obsidian-native) that links it all, queryable via semantic search
- Email + calendar context: vaultlab reads your Outlook (Windows) or Gmail to know what's pressing
- Google Docs integration: your lab work log + Sheets data + Drive files become first-class context
Companion mode, not autonomous mode
vaultlab is not an autonomous AI scientist. It does not generate experiment ideas in a vacuum, run robots, or submit papers without you. It assumes:
- You have ideas; vaultlab amplifies them
- You have context; vaultlab indexes it
- You make the calls; vaultlab does the rote 60% so you can focus on the insightful 40%
- You ship the paper; vaultlab drafts, verifies, and formats, but the byline is yours alone
The "research companion" framing is intentional. The bans many journals impose on AI-written papers? Not our use case. "Here are 23 ways vaultlab made my week easier" is.
Install
git clone https://github.com/bobbyni819/vaultlab && cd vaultlab
pip install -e ".[all]"
vaultlab setup # interactive: API keys, KB path, Obsidian, Google, Outlook
Or, if you only want a piece (citations, lit search, figures):
pip install vaultlab # core
pip install "vaultlab[research,citations]" # specific subpackages
5-minute Hello World
vaultlab demo pbmc3k
In ~2 minutes on a laptop, this:
- Downloads the 3k PBMC dataset (50 MB)
- Runs QC + normalization + Leiden clustering
- Auto-annotates clusters via LLM (with hedged voice and quoted evidence)
- Renders 3 publication-quality figures
- Builds a 5-slide journal-club deck with speaker notes
- Auto-writes a KB summary note linking everything
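The final step above, the KB summary note, can be pictured as a markdown file with front matter and Obsidian-style links to the figures and the deck. The sketch below is only an illustration and not vaultlab's actual note schema: `build_kb_note`, the front-matter fields, and every file name are hypothetical.

```python
from datetime import date


def build_kb_note(title: str, dataset: str, figures: list[str],
                  deck: str, run_date: date) -> str:
    """Render a summary note as markdown with Obsidian-style links.

    Hypothetical layout; the real vaultlab note format may differ.
    """
    lines = [
        "---",
        f"title: {title}",
        f"dataset: {dataset}",
        f"date: {run_date.isoformat()}",
        "---",
        "",
        f"# {title}",
        "",
        "## Figures",
    ]
    # ![[file]] embeds a file in Obsidian; [[file]] is a plain link.
    lines += [f"- ![[{name}]]" for name in figures]
    lines += ["", "## Deck", f"- [[{deck}]]"]
    return "\n".join(lines) + "\n"


note = build_kb_note(
    title="pbmc3k demo run",
    dataset="3k PBMCs (scRNA-seq)",
    figures=["umap_leiden.png", "qc_violin.png", "marker_dotplot.png"],  # placeholder names
    deck="pbmc3k_journal_club.pptx",  # placeholder name
    run_date=date(2026, 1, 15),       # placeholder date
)
print(note)
```

Keeping the note as plain markdown means it stays readable in Obsidian and greppable from the command line.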
Use cases (real ones, not benchmarks)
These are the workflows vaultlab handles end-to-end:
- "I have a CODEX run. Get me to a labeled figure." Ingest TIFF → segment with Cellpose → cluster → LLM-annotate → publication-tight spatial overlay → caption draft → KB note.
- "Draft me a Methods paragraph for the lung paper." Reads project KB → drafts → verifies every citation semantically → flags any HALLUCINATED citation → produces a draft you edit, not write from scratch.
- "Find papers using GPR55 in intestinal epithelium." Multi-source lit search (PubMed + bioRxiv + paperclip MCP) → smart query expansion → dedupe → re-rank → KB ingest of top 10 → citation-graph view.
- "Build me a journal-club deck on Smith et al. 2024." /paper-to-slides 10.1038/... extracts figures from PDF → composes 12-slide deck → auto-drafts speaker notes → exports .pptx.
- "What's on my calendar this week + which manuscripts are due?" Outlook reads upcoming meetings, Gmail reads journal deadlines, KB cross-references active manuscripts → integrated daily brief.
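The dedupe and re-rank steps of the literature-search workflow can be sketched as follows. This is a simplified illustration under stated assumptions, not vaultlab's implementation: the `Hit` record, `dedupe_key`, and `dedupe_and_rank` are hypothetical names, the scores are assumed to come from some upstream relevance model, and the sample hits are placeholders rather than real papers.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    title: str
    source: str                 # e.g. "pubmed", "biorxiv", "paperclip"
    doi: Optional[str] = None
    score: float = 0.0          # assumed relevance score from an upstream ranker


def dedupe_key(hit: Hit) -> str:
    """Prefer the DOI as identity; fall back to a normalized title."""
    if hit.doi:
        doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", hit.doi.strip().lower())
        return "doi:" + doi
    return "title:" + re.sub(r"[^a-z0-9]+", " ", hit.title.lower()).strip()


def dedupe_and_rank(hits: list[Hit], top_k: int = 10) -> list[Hit]:
    """Keep the best-scoring hit per key, then sort by score, highest first."""
    best: dict[str, Hit] = {}
    for hit in hits:
        key = dedupe_key(hit)
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:top_k]


# Placeholder records, not real search results.
hits = [
    Hit("Example paper A", "pubmed", "10.1000/example.1", 0.72),
    Hit("Example Paper A", "biorxiv", "https://doi.org/10.1000/EXAMPLE.1", 0.91),
    Hit("Example paper B", "paperclip", None, 0.40),
    Hit("Example paper B!", "pubmed", None, 0.35),
]
ranked = dedupe_and_rank(hits)
for h in ranked:
    print(h.source, h.score, h.title)
```

A key of this kind will not merge a preprint with its published version when the two carry different DOIs, nor a DOI-bearing record with a DOI-less copy of the same paper; a real pipeline would need fuzzier matching for those cases.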
See docs/use-cases.md for more (post-v0.1).
Architecture philosophy
vaultlab is a capability layer FOR Claude Code, not a competing harness. Markdown is the user-facing interface; Python is the engine. Slash commands, role prompts, recipes, layouts, and skill definitions are all markdown files Claude Code can read at first repo open.
See docs/architecture.md for the full spec.
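To make "markdown is the interface" concrete, the sketch below scans a directory of markdown files and maps each file name to a slash command, using the first non-empty line as its description. The directory layout, the `discover_commands` helper, and the `draft-methods` file are assumptions for illustration; see docs/architecture.md for how vaultlab actually organizes these files.

```python
import tempfile
from pathlib import Path


def discover_commands(commands_dir: Path) -> dict[str, str]:
    """Map '/<file stem>' to the first non-empty line of each markdown file."""
    commands = {}
    for path in sorted(commands_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Strip leading '#' heading markers and surrounding spaces.
        first = next(
            (ln.strip("# ").strip() for ln in text.splitlines() if ln.strip()),
            "",
        )
        commands["/" + path.stem] = first
    return commands


# Demo against a throwaway directory with two hypothetical command files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "paper-to-slides.md").write_text(
        "# Build a journal-club deck from a DOI\n\nSteps...\n", encoding="utf-8"
    )
    (root / "draft-methods.md").write_text(
        "Draft a Methods paragraph from the project KB\n", encoding="utf-8"
    )
    found = discover_commands(root)

print(found)
```

Because the commands are plain files, adding a new one is a matter of dropping in another markdown file, with no Python changes required in this sketch.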
The four core commitments
- Markdown is the interface; Python is the engine. Slash commands, role prompts, workflow descriptions are markdown.
- Anti-laziness on semantic reading. Every LLM call requires quoted evidence. No surface-skim.
- Result-oriented agentic loop. User says "draft methods" → vaultlab plans + verifies + refines internally → returns finished result.
- KB is the smartness. Every analysis writes to KB; every analysis reads from it. The LLM gets smarter project-by-project.
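The "quoted evidence" commitment can be illustrated with a deliberately simple check: a claim counts as supported only if its quote appears verbatim (ignoring case and whitespace) in the cited passage. This is a much weaker stand-in for the semantic verification described in this README, and `verify_quotes`, the claim fields, and the sample passage are all hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(claims: list[dict], passages: dict[str, str]) -> list[dict]:
    """Mark each claim SUPPORTED only if its quote occurs in the cited passage."""
    results = []
    for claim in claims:
        passage = normalize(passages.get(claim["source_id"], ""))
        quote = normalize(claim.get("quote", ""))
        ok = bool(quote) and quote in passage
        results.append({**claim, "status": "SUPPORTED" if ok else "HALLUCINATED"})
    return results


# Placeholder passage and claims, for illustration only.
passages = {
    "src1": "Cells were clustered  with the Leiden algorithm\nat resolution 1.0.",
}
claims = [
    {"source_id": "src1", "quote": "clustered with the Leiden algorithm"},
    {"source_id": "src1", "quote": "clustered with k-means"},
    {"source_id": "missing", "quote": "anything at all"},
]
checked = verify_quotes(claims, passages)
for c in checked:
    print(c["source_id"], c["status"])
```

An empty quote or an unknown source is treated as unsupported, so the check fails closed rather than open.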
What's unique vs PaperQA / scanpy / FutureHouse / scverse / Aider
| Capability | vaultlab | PaperQA2 | scanpy | FutureHouse | scverse | Aider |
|---|---|---|---|---|---|---|
| Wet-lab data analysis | โ | โ | โ | โ | โ | โ |
| Literature + citation verification | โ | โ | โ | โ | โ | โ |
| NotebookLM-style evidence retrieval | โ | partial | โ | โ | โ | โ |
| Manuscript drafting | โ | โ | โ | partial | โ | โ |
| Slide deck output | โ | โ | โ | โ | โ | โ |
| Calendar / inbox context | โ | โ | โ | โ | โ | โ |
| Knowledge base (Obsidian) | โ | โ | โ | โ | โ | โ |
| Local-first | โ | โ | โ | โ | โ | โ |
| Companion mode (not autonomous) | โ | partial | n/a | โ | n/a | โ |
| Claude-Code-native skill bundle | โ | โ | โ | โ | โ | partial |
No tool does all of these. vaultlab's value is the combination: wet-lab analysis (scverse-grade) + literature verification (PaperQA-grade) + manuscript + slides + life-context (calendar/inbox/docs) wired through Claude Code.
If you only need one piece, those tools are great. If you want a research companion, vaultlab is the only OSS option.
See docs/comparison.md for the full positioning analysis.
Demos
| Demo | Dataset | Time |
|---|---|---|
| examples/pbmc3k | 3k PBMCs (scRNA-seq) | 2 min (Hello World) |
| examples/visium_brain | 10x mouse brain Visium | 30 min (spatial transcriptomics) |
| examples/codex_hubmap_tonsil | HuBMAP tonsil CODEX | 30 min (flagship spatial imaging) |
Documentation
Setup:
- docs/setup-obsidian.md: Obsidian + plugin walkthrough
- docs/setup-api-keys.md: Anthropic + literature API keys
- docs/setup-google.md: Google ecosystem (Docs, Sheets, Drive, Gmail, Calendar)
- docs/setup-outlook-windows.md: Outlook COM (Windows-only)
Reference:
- docs/architecture.md: full architectural spec
- docs/use-cases.md: concrete examples of what vaultlab solves
- docs/comparison.md: vs other tools
Privacy & limits:
- docs/data-privacy.md: what data leaves your machine
- docs/compliance.md: explicit non-HIPAA disclosure
- docs/long-term-reproducibility.md: model-version philosophy
- docs/KNOWN_LIMITATIONS.md: honest failures
For contributors:
- CONTRIBUTING.md: how to contribute
- AGENTS.md: invariants and conventions
- CLAUDE.md: entrypoint for Claude Code sessions
- INSPIRATIONS.md: what we drew from where (auditable lineage)
Citation
See CITATION.cff. Once v0.1.0 ships, the preferred citation is:
@software{ni_vaultlab_2026,
author = {Ni, Bobby Y.X.},
title = {vaultlab: A research companion for biological scientists},
year = 2026,
url = {https://github.com/bobbyni819/vaultlab},
version= {0.1.0}
}
Privacy & compliance
vaultlab uses Anthropic's Claude API. Prompt content is sent to Anthropic. vaultlab is NOT HIPAA-compliant. Do NOT use with PHI/PII/IRB-restricted data. See docs/data-privacy.md.
When you opt into Google or Outlook integration, vaultlab also reads:
- Google Docs / Sheets / Drive content you authorize
- Gmail messages matching your search criteria
- Outlook calendar events + email subjects/bodies
This data may end up in prompts sent to Anthropic. Do not enable Google/Outlook integration if your account contains PHI or institution-restricted data. Each integration has its own scopes you can audit; see docs/data-privacy.md.
By using vaultlab, you take full responsibility for compliance with your institutional, IRB, IACUC, and regulatory obligations.
Author
Bobby Y.X. Ni, Hickey Lab, Duke University Biomedical Engineering.
License
MIT: anyone can use, modify, and distribute vaultlab, including commercially.