A thin wrapper around rpy2 with strong opinions on how data types should be converted.
Project description
rwrap
A thin wrapper around rpy2 with strong opinions on how data types should be converted. This enables easy usage of R packages from Python with no boilerplate code.
Warning: still work-in-progress, issues and PRs welcome
Installation
pip install rwrap
Usage
Genomic Annotations
Accessing Bioconductor's biomaRt package can be as simple as follows:
from rwrap import biomaRt
biomaRt
## <module 'biomaRt' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/biomaRt'>
snp_list = ["rs7329174", "rs4948523", "rs479445"]
ensembl = biomaRt.useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
df = biomaRt.getBM(
attributes=["refsnp_id", "chr_name", "chrom_start", "consequence_type_tv"],
filters="snp_filter", values=snp_list, mart=ensembl
)
print(df) # pandas.DataFrame
## refsnp_id chr_name chrom_start consequence_type_tv
## 1 rs479445 1 60875960 intron_variant
## 2 rs479445 1 60875960 NMD_transcript_variant
## 3 rs4948523 10 58579338 intron_variant
## 4 rs7329174 13 40983974 intron_variant
Differential Gene Expression analysis workflow
Differentially expressed genes between conditions can be determined using DESeq2 and annotated with biomaRt:
import pandas as pd
from rwrap import DESeq2, biomaRt, base, stats
DESeq2
## <module 'DESeq2' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/DESeq2'>
biomaRt
## <module 'biomaRt' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/biomaRt'>
# retrieve count data (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP009615)
df_counts = pd.read_csv(
"http://duffel.rail.bio/recount/v2/SRP009615/counts_gene.tsv.gz", sep="\t"
).set_index("gene_id")
df_design = pd.DataFrame(
{"condition": ["1", "2", "1", "2", "3", "4", "3", "4", "5", "6", "5", "6"]}
)
# run differential gene expression analysis
dds = DESeq2.DESeqDataSetFromMatrix(
countData=df_counts, colData=df_design, design=stats.as_formula("~ condition")
)
dds = DESeq2.DESeq(dds)
res = DESeq2.results(dds, contrast=("condition", "1", "2"))
df_res = base.as_data_frame(res)
# annotate result
ensembl = biomaRt.useEnsembl(biomart="genes", dataset="hsapiens_gene_ensembl")
df_anno = biomaRt.getBM(
attributes=["ensembl_gene_id_version", "gene_biotype"],
filters="ensembl_gene_id_version",
values=df_res.index,
mart=ensembl,
).set_index("ensembl_gene_id_version")
df_res = df_res.merge(df_anno, left_index=True, right_index=True).sort_values("padj")
print(df_res.head()) # pd.DataFrame
## baseMean log2FoldChange lfcSE stat pvalue padj gene_biotype
## ENSG00000222806.1 158.010377 22.137400 2.745822 8.062214 7.492501e-16 2.853744e-11 rRNA_pseudogene
## ENSG00000255099.1 65.879611 21.835651 2.915452 7.489627 6.906949e-14 1.315359e-09 processed_pseudogene
## ENSG00000261065.1 92.351998 22.273400 3.144991 7.082182 1.419019e-12 1.351190e-08 lncRNA
## ENSG00000249923.1 154.037908 18.364027 2.636083 6.966407 3.251381e-12 2.476772e-08 lncRNA
## ENSG00000267658.1 64.371181 -19.545702 3.041247 -6.426871 1.302573e-10 8.268736e-07 lncRNA
More examples
Check the tests/
directory for more examples showing how to rewrite R scripts in Python.
Tests
A comprehensive test suite aims at providing stability and avoiding regressions.
The examples in tests/
are validated using pytest
.
Run tests as follows:
$ pytest tests/
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file rwrap-0.2.0.tar.gz
.
File metadata
- Download URL: rwrap-0.2.0.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 423debcefe8d21fb55d87678049a09ab94ec153d5e6e41cc727f7b5432d00df1 |
|
MD5 | fd35b26eab0b3d6627c0324d24ddf4e3 |
|
BLAKE2b-256 | cb4aae887b9dfb9d87320b65186854bcf5d8b12452daee00086e93a4dee0a104 |
File details
Details for the file rwrap-0.2.0-py3-none-any.whl
.
File metadata
- Download URL: rwrap-0.2.0-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | bfa3ade7cde3a0b693297d55da02bd07495f1b47639e88d3a525aa0840fd9b03 |
|
MD5 | b1141024bd3469da1fd716831abc05db |
|
BLAKE2b-256 | 122de24c0080d54173ada3ce5590228da54b89826f1fa4c6b2ca6bbf2bd75c01 |