BioLearns: Computational Biology and Bioinformatics Toolbox in Python
Website: http://biolearns.com
Installation
- From PyPI
pip install biolearns
Documentation and Tutorials
Three examples are shown below. For the full list of tutorials, see our GitHub wiki page.
1. Read TCGA Data
Example: Read TCGA Breast invasive carcinoma (BRCA) data
Data is downloaded directly from https://gdac.broadinstitute.org/. The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical
Shortcut table of TCGA cancer barcodes:
No. | Barcode | Cancer full name | Version |
---|---|---|---|
1 | ACC | Adrenocortical carcinoma | 2016_01_28 |
2 | BLCA | Bladder urothelial carcinoma | 2016_01_28 |
3 | BRCA | Breast invasive carcinoma | 2016_01_28 |
4 | CESC | Cervical and endocervical cancers | 2016_01_28 |
5 | CHOL | Cholangiocarcinoma | 2016_01_28 |
6 | COAD | Colon adenocarcinoma | 2016_01_28 |
7 | COADREAD | Colorectal adenocarcinoma | 2016_01_28 |
8 | DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 2016_01_28 |
9 | ESCA | Esophageal carcinoma | 2016_01_28 |
... | ... | ... | ... |
2. Gene Co-expression Analysis
We first download and load the mRNAseq data.
from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
mRNAseq data is noisy. We first remove the 50% of genes with the lowest mean expression, and then remove the 50% of the remaining genes with the lowest variance.
from biolearns.preprocessing.filter import expression_filter
mRNAseq = expression_filter(mRNAseq, meanq = 0.5, varq = 0.5)
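To make the two-stage filter concrete, here is a minimal pandas sketch of the idea on synthetic data. It is not the biolearns implementation: the helper name quantile_filter and the toy matrix are hypothetical, and expression_filter may treat ties, missing values, or transformations differently.

```python
import numpy as np
import pandas as pd

def quantile_filter(expr: pd.DataFrame, meanq: float = 0.5, varq: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: drop low-mean genes, then low-variance genes.

    Rows are genes, columns are samples.
    """
    # Stage 1: keep genes whose mean expression is above the meanq quantile.
    gene_mean = expr.mean(axis=1)
    expr = expr.loc[gene_mean > gene_mean.quantile(meanq)]
    # Stage 2: among the survivors, keep genes whose variance is above the varq quantile.
    gene_var = expr.var(axis=1)
    return expr.loc[gene_var > gene_var.quantile(varq)]

# Synthetic expression matrix: 8 genes by 6 samples.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.gamma(shape=2.0, scale=50.0, size=(8, 6)),
    index=[f"gene{i}" for i in range(8)],
    columns=[f"sample{j}" for j in range(6)],
)
filtered = quantile_filter(toy, meanq=0.5, varq=0.5)
print(filtered.shape)
```

Because the second quantile is computed only on the genes that survive the first stage, meanq = 0.5 and varq = 0.5 together keep roughly a quarter of the original genes.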
We then use the lmQCM class to create an lmQCM object, lobj. The gene co-expression analysis is performed by simply calling the fit() function.
from biolearns.coexpression.lmQCM import lmQCM
lobj = lmQCM(mRNAseq)
clusters, genes, eigengene_mat = lobj.fit()
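The third return value is named eigengene_mat. In co-expression analysis, an eigengene usually means the first principal component of a module's expression profile. Assuming that is what is meant here, the NumPy sketch below shows how one such vector could be computed for a single module. The function name module_eigengene and the toy data are hypothetical, and this is not lmQCM's code.

```python
import numpy as np

def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: first principal component across samples.

    expr has shape (n_genes, n_samples) and holds the genes of one module.
    """
    # Standardize each gene so that no single gene dominates the component.
    centered = expr - expr.mean(axis=1, keepdims=True)
    scaled = centered / centered.std(axis=1, keepdims=True)
    # Rows of vt are right singular vectors; the first has one entry per sample.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    eigengene = vt[0]
    # The sign of a singular vector is arbitrary; orient it along the mean profile.
    if np.dot(eigengene, scaled.mean(axis=0)) < 0:
        eigengene = -eigengene
    return eigengene

# Toy module: four genes that follow one shared pattern over 10 samples, plus noise.
rng = np.random.default_rng(1)
signal = rng.normal(size=10)
module = np.vstack([signal * w + rng.normal(scale=0.1, size=10)
                    for w in (1.0, 2.0, 0.5, 1.5)])
eg = module_eigengene(module)
print(eg.shape, float(np.corrcoef(eg, signal)[0, 1]))
```

The resulting vector has unit length and one value per sample, so it can stand in for the whole module in downstream analyses such as the survival test in the next example.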
3. Univariate Survival Analysis
We first download and load the mRNAseq data, using breast cancer as an example.
from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
We import logranktest from the survival subpackage and choose the gene "ABLIM3" as the univariate input.
from biolearns.survival import logranktest
r = mRNAseq.loc['ABLIM3',].values
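We restrict the expression, time, and event data to the patients they have in common. The first 12 characters of a TCGA barcode identify the patient.
import numpy as np
clinical = brca.clinical
bcd_m = np.array([b[:12] for b in mRNAseq.columns])
bcd_p = np.array([b[:12] for b in clinical.index])
bcd = np.intersect1d(bcd_m, bcd_p)
# Keep the entries of each table whose patient barcode is shared.
# This assumes one sample per patient and the same patient order in both tables.
r = r[np.nonzero(np.isin(bcd_m, bcd))[0]]
t = brca.overall_survival_time[np.nonzero(np.isin(bcd_p, bcd))[0]]
e = brca.overall_survival_event[np.nonzero(np.isin(bcd_p, bcd))[0]]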
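Positional indexing only lines up r, t, and e if both tables list patients in the same order with one sample per patient. A more defensive sketch, assuming pandas objects and using made-up barcodes and values, keys everything on the 12-character patient barcode. The variable names and the choice to keep the first sample per patient are illustrative assumptions, not part of biolearns.

```python
import pandas as pd

# Toy stand-ins for one gene's expression (per sample) and clinical data (per patient).
expr = pd.Series(
    [5.1, 7.3, 2.2, 9.0],
    index=["TCGA-AA-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0002-11A", "TCGA-AA-0004-01A"],
)
clin = pd.DataFrame(
    {"time": [100.0, 250.0, 80.0], "event": [1, 0, 1]},
    index=["TCGA-AA-0002", "TCGA-AA-0001", "TCGA-AA-0003"],
)

# Reduce sample barcodes to patient barcodes (first 12 characters).
expr.index = expr.index.str[:12]
# Keep one sample per patient; here simply the first one encountered.
expr = expr[~expr.index.duplicated(keep="first")]

# An inner join keeps only patients present in both tables, aligned row by row.
merged = clin.join(expr.rename("expression"), how="inner")
r = merged["expression"].values
t = merged["time"].values
e = merged["event"].values
print(merged)
```

Joining on the barcode means the three arrays always have the same length and refer to the same patient at each position, regardless of how the source tables are ordered.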
We perform the log-rank test:
logrank_results, fig = logranktest(r[~np.isnan(t)], t[~np.isnan(t)], e[~np.isnan(t)])
test_statistic, p_value = logrank_results.test_statistic, logrank_results.p_value
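For readers who want to see what a log-rank test computes, the sketch below implements the standard two-group test in plain Python on toy data. How logranktest turns the continuous input r into groups is not stated here; splitting at the median is an assumption made only for illustration, and the function name two_group_logrank is hypothetical.

```python
import math
from statistics import median

def two_group_logrank(times, events, in_group1):
    """Hypothetical helper: two-group log-rank test.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    in_group1: True if the subject belongs to group 1.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for tj in sorted({t for t, ev in zip(times, events) if ev}):
        # Subjects still at risk just before time tj, overall and in group 1.
        n = sum(1 for t in times if t >= tj)
        n1 = sum(1 for t, g in zip(times, in_group1) if t >= tj and g)
        # Events occurring exactly at tj, overall and in group 1.
        d = sum(1 for t, ev in zip(times, events) if t == tj and ev)
        d1 = sum(1 for t, ev, g in zip(times, events, in_group1) if t == tj and ev and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Toy data: expression values, survival times, and event indicators.
r = [2.1, 8.4, 3.3, 9.7, 1.5, 7.8, 2.9, 8.9]
t = [30, 5, 42, 8, 55, 12, 38, 7]
e = [1, 1, 0, 1, 1, 1, 0, 1]
high = [x > median(r) for x in r]  # assumed median split into high/low groups
stat, p = two_group_logrank(t, e, high)
print(stat, p)
```

The statistic compares the observed number of events in one group with the number expected if both groups shared the same hazard, so swapping the group labels leaves the result unchanged.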
The output figure is returned as fig.