Pythonic API for R/Bioconductor statistical methods — calls validated R code, returns pandas DataFrames.
🪨 rosetta
Python interface to R/Bioconductor — pandas in, pandas out, .report() when you're done.
pip install rosetta-bioc
30-second demo
import rosetta as rb
# DESeq2 differential expression — one call, pandas out
results = rb.deseq2(counts_df, metadata_df, design="~ condition")
results.report()
DESeq2 Results Summary
──────────────────────────────
Total genes tested: 12,000
Significant (padj<0.05): 843 (7.0%)
↑ Upregulated: 428
↓ Downregulated: 415
LFC range: [-4.71, 3.50]
That's it. No R code. No rpy2 boilerplate. No type conversion. Just results.
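The demo assumes `counts_df` and `metadata_df` already exist. As a rough sketch of what they might look like, the block below builds toy inputs in the usual DESeq2 layout: genes as rows and samples as columns for counts, one metadata row per sample. The gene and sample names are invented, and whether Rosetta expects exactly this orientation is an assumption, not something stated above.

```python
import numpy as np
import pandas as pd

# Toy data only: names and values are made up for illustration.
rng = np.random.default_rng(0)
samples = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
genes = [f"gene_{i}" for i in range(5)]

# Assumed layout: genes x samples, non-negative integer counts.
counts_df = pd.DataFrame(
    rng.poisson(lam=50, size=(len(genes), len(samples))),
    index=genes,
    columns=samples,
)

# One row per sample, indexed by the same sample names.
metadata_df = pd.DataFrame(
    {"condition": ["control", "control", "treated", "treated"]},
    index=samples,
)

# Reorder metadata so its rows line up with the count columns.
metadata_df = metadata_df.loc[counts_df.columns]
print(counts_df.shape, list(metadata_df.index))
```

Keeping the metadata index identical to the count columns matters because DESeq2 itself expects the columns of the count matrix and the rows of the column data to be in the same order.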
What it wraps
| R Package | Python | What it does |
|---|---|---|
| DESeq2 | rb.deseq2() | Differential expression (negative binomial) |
| edgeR | rb.edger() | Quasi-likelihood differential expression |
| limma | rb.limma_voom() | Linear models + TREAT significance |
| clusterProfiler | rb.enrich_go() | GO/KEGG/Reactome pathway enrichment |
| phyloseq | rb.phyloseq() | Microbiome diversity analysis |
| Seurat | rb.seurat() | Single-cell RNA-seq |
All functions return a RosettaDataFrame (pandas DataFrame subclass) with a .report() method.
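To make the "pandas DataFrame subclass with a .report() method" idea concrete, here is a minimal sketch of how such a class could be written. `SketchFrame` is a hypothetical stand-in, not Rosetta's actual `RosettaDataFrame`, and the column names `log2FoldChange` and `padj` are assumed to follow DESeq2's result columns.

```python
import pandas as pd


class SketchFrame(pd.DataFrame):
    """Hypothetical stand-in for a results frame with a report() method."""

    @property
    def _constructor(self):
        # Makes slicing and filtering return SketchFrame, not a plain DataFrame.
        return SketchFrame

    def report(self, alpha=0.05):
        padj = self["padj"]
        lfc = self["log2FoldChange"]
        sig = padj < alpha  # NaN padj values compare as False
        lines = [
            f"Total genes tested: {len(self):,}",
            f"Significant (padj<{alpha}): {int(sig.sum()):,}",
            f"Upregulated: {int((sig & (lfc > 0)).sum()):,}",
            f"Downregulated: {int((sig & (lfc < 0)).sum()):,}",
        ]
        text = "\n".join(lines)
        print(text)
        return text


res = SketchFrame({"log2FoldChange": [2.0, -1.5, 0.1], "padj": [0.01, 0.02, 0.80]})
summary = res.report()

# Ordinary pandas operations still work and keep the subclass.
top = res[res["padj"] < 0.05]
```

Overriding `_constructor` is the standard pandas hook for subclassing; without it, filtering would hand back a plain DataFrame and the `report()` method would be lost.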
Not a toy — full design support
- Multi-factor designs: design="~ batch + condition", interaction terms, blocking factors
- LFC thresholds: proper hypothesis testing via lfcThreshold (not post-hoc filtering)
- Shrinkage: apeglm, ashr, normal, via lfc_shrink()
- Contrasts: contrast=["genotype", "mutant", "wildtype"]
- QC/normalization/outliers: DESeq2's size factors, Cook's distance, and independent filtering all run normally; Rosetta doesn't hide the fitted object
- Weights, correlations: limma-voom with duplicateCorrelation and sample weights; everything the R function accepts, Rosetta passes through
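The distinction between testing against an LFC threshold and filtering after the fact can be illustrated with a small calculation. The sketch below uses a plain normal approximation for a Wald-style test of |LFC| against a threshold; it is a simplified illustration of the idea, not DESeq2's implementation, and the function names and example numbers are invented.

```python
import math


def upper_tail(z):
    """Upper-tail probability of a standard normal, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def wald_pvalue(lfc, se, threshold=0.0):
    """Two-sided p-value for H0: |LFC| <= threshold (normal approximation)."""
    z = (abs(lfc) - threshold) / se
    return min(1.0, 2.0 * upper_tail(z))


# A gene with estimated LFC 1.2 and standard error 0.3 (made-up numbers).
lfc, se = 1.2, 0.3

p_zero = wald_pvalue(lfc, se)           # tests LFC != 0
p_thresh = wald_pvalue(lfc, se, 1.0)    # tests |LFC| > 1

print(f"p (vs 0): {p_zero:.2e}")
print(f"p (vs threshold 1): {p_thresh:.3f}")
```

Post-hoc filtering would keep this gene, since it is significant against zero and its point estimate exceeds 1. The threshold test does not, because an estimate of 1.2 with a standard error of 0.3 is weak evidence that the true effect is above 1.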
Show me the R code
Don't trust a black box? Turn on codegen to see exactly what's running:
import rosetta as rb
rb.codegen.enable()
dds = rb.wrappers.deseq2.run_deseq2(counts, meta, design="~ batch + condition")
res = rb.wrappers.deseq2.get_results(dds, lfc_threshold=1.0)
R> library(DESeq2)
R> dds <- DESeqDataSetFromMatrix(countData=counts, colData=metadata, design=~ batch + condition)
R> dds <- DESeq(dds)
R> res <- results(dds, alpha=0.1, lfcThreshold=1.0)
rb.codegen.last() returns it as a string — paste into R to reproduce independently.
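For readers curious how a codegen facility like this might work internally, here is a minimal sketch: a recorder that collects each R statement as it is issued and can return the whole script. `CodegenSketch` and its `record()` method are hypothetical; only `enable()` and `last()` mirror names mentioned above, and nothing here reflects Rosetta's real internals.

```python
class CodegenSketch:
    """Hypothetical recorder for R statements; not Rosetta's implementation."""

    def __init__(self):
        self.enabled = False
        self._lines = []

    def enable(self):
        self.enabled = True

    def record(self, r_line):
        # A wrapper would call this just before sending r_line to R.
        if self.enabled:
            self._lines.append(r_line)
            print(f"R> {r_line}")

    def last(self):
        # Return the recorded statements as one pasteable script.
        return "\n".join(self._lines)


codegen = CodegenSketch()
codegen.enable()
codegen.record("library(DESeq2)")
codegen.record("dds <- DESeq(dds)")
script = codegen.last()
```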
Modular DESeq2 API
For more control, use the step-by-step interface:
from rosetta.wrappers.deseq2 import run_deseq2, get_results, lfc_shrink
dds = run_deseq2(counts_df, metadata_df, design="~ condition")
res = get_results(dds, contrast=["condition", "treated", "control"], alpha=0.05)
shrunk = lfc_shrink(dds, coef="condition_treated_vs_control", type="apeglm")
res.report()
shrunk.report()
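Because the results are pandas DataFrames, downstream steps can use ordinary pandas operations. The sketch below works on a stand-in DataFrame rather than real Rosetta output, and it assumes the result columns keep DESeq2's names (`log2FoldChange`, `padj`), which is not confirmed above.

```python
import pandas as pd

# Stand-in for a results frame; values and gene names are made up.
res = pd.DataFrame(
    {
        "log2FoldChange": [2.1, -0.4, -1.8, 0.9],
        "padj": [0.001, 0.30, 0.04, None],  # None becomes NaN
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

# Keep significant genes; rows with NaN padj drop out of the comparison.
sig = res[res["padj"] < 0.05].sort_values("log2FoldChange", ascending=False)
top_genes = sig.index.tolist()
print(top_genes)
```

DESeq2 sets padj to NA for genes removed by independent filtering, so a missing value in that column usually means "not tested", not "not significant".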
Enrichment analysis
import rosetta as rb
# Over-representation analysis
go_results = rb.enrich_go(gene_list, org_db="org.Hs.eg.db", ont="BP")
go_results.report()
# KEGG pathways
kegg = rb.enrich_kegg(gene_list, organism="hsa")
kegg.report()
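Over-representation analysis is, at its core, a hypergeometric test: given a background of N genes of which K belong to a term, how surprising is it to see at least k of them in a list of n genes? The sketch below computes that tail probability in pure Python. It illustrates the statistic only; clusterProfiler adds annotation lookup, background handling, and multiple-testing correction, none of which are shown, and the function name is invented.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    k: genes in the list annotated to the term
    n: size of the gene list
    K: genes in the background annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: 20 background genes, 5 in the term, 4 in the list, 2 overlap.
p = ora_pvalue(k=2, n=4, K=5, N=20)
print(f"p = {p:.4f}")
```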
Setup
Python side:
pip install rosetta-bioc
R side (one-time):
Rscript install.R
Or manually:
BiocManager::install(c("DESeq2", "edgeR", "limma", "clusterProfiler"))
Posit Cloud: See docs/posit-cloud.md for zero-config setup.
Requirements
- Python 3.9+
- R 4.0+ with Bioconductor
- rpy2 ≥ 3.5
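A quick way to check these prerequisites before installing is sketched below. It uses only the standard library and checks for presence, not versions: it confirms the Python version, looks for an `R` executable on the PATH, and tests whether `rpy2` is importable. The function name is made up for this example.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report which prerequisites appear to be present (versions of R and rpy2 are not checked)."""
    return {
        "python_ok": sys.version_info >= (3, 9),
        "r_on_path": shutil.which("R") is not None,
        "rpy2_installed": importlib.util.find_spec("rpy2") is not None,
    }


status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```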
Philosophy
- Rosetta calls R — it doesn't reimplement it. All statistics run in the original, validated R packages.
- Pandas in, pandas out. No R objects leak into your Python workflow.
- Fail early, fail clearly. Input validation happens in Python before crossing the R boundary.
- .report() everything. Results should be immediately interpretable without manual inspection.
- Show your work. codegen prints the equivalent R code so you can verify, reproduce, or learn.
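The "fail early, fail clearly" principle can be sketched as a Python-side check that runs before any data reaches R. The function below is a hypothetical illustration, not Rosetta's validation code: it checks that sample names line up, that counts are non-negative, and that every variable named in the design formula exists in the metadata.

```python
import re

import pandas as pd


def validate_inputs(counts, metadata, design):
    """Hypothetical pre-flight checks; raises ValueError with a clear message."""
    if list(counts.columns) != list(metadata.index):
        raise ValueError("counts columns must match metadata index, in the same order")
    if (counts.values < 0).any():
        raise ValueError("counts must be non-negative")
    # Pull variable names out of a formula such as "~ batch + condition".
    terms = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", design)
    missing = [t for t in terms if t not in metadata.columns]
    if missing:
        raise ValueError(f"design variables not in metadata: {missing}")


counts = pd.DataFrame({"s1": [10, 0], "s2": [7, 3]}, index=["g1", "g2"])
meta = pd.DataFrame({"condition": ["control", "treated"]}, index=["s1", "s2"])

validate_inputs(counts, meta, "~ condition")  # passes silently

try:
    validate_inputs(counts, meta, "~ batch + condition")
except ValueError as err:
    message = str(err)
    print(message)
```

Catching a missing design variable here produces a one-line Python error instead of a stack trace from inside R.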
Contributing
See CONTRIBUTING.md. Good first issues are labeled — start with Issue #1: report() enhancements.
Acknowledgments
Built on rpy2 and the extraordinary R/Bioconductor ecosystem. All credit for the statistical methods goes to the original R package authors.
Supported by:
- Google Summer of Code 2026 — funding Catherine's development work
- JPMorgan Chase — startup banking and advisory through their Innovation Economy program
- AWS — quantum computing infrastructure via Amazon Braket
- Nodes Bio, Inc. — project lead, CI/hosting, and engineering
GSoC 2026 · MIT License · Nodes Bio