BioLearns: Computational Biology and Bioinformatics Toolbox in Python
biolearns
BioLearns: Computational Biology and Bioinformatics Toolbox in Python http://biolearns.com
Installation
- From PyPI
pip install biolearns
Documentation and Tutorials
Three examples are shown below. For the full list of tutorials, see our GitHub wiki page.
1. Read TCGA Data
Example: Read TCGA Breast invasive carcinoma (BRCA) data
Data is downloaded directly from https://gdac.broadinstitute.org/. The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical
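TCGA samples are identified by hyphen-separated barcodes whose first three fields (12 characters) name the patient and whose fourth field begins with a two-digit sample-type code (for example 01 for a primary solid tumor, 11 for solid tissue normal). The sketch below assumes the tables returned above are labelled with such barcodes; `parse_tcga_barcode` is a hypothetical helper, not part of biolearns, and the barcodes are made up for illustration.

```python
def parse_tcga_barcode(barcode):
    """Split a TCGA barcode into (patient ID, sample-type code)."""
    fields = barcode.split("-")
    # Project, tissue source site, and participant form the patient ID.
    patient_id = "-".join(fields[:3])
    # The fourth field, when present, starts with the sample-type code.
    sample_type = fields[3][:2] if len(fields) > 3 else None
    return patient_id, sample_type


# Made-up barcodes: a tumor sample, a normal sample, and a patient-level ID.
barcodes = [
    "TCGA-XX-0001-01A-11R-A000-07",
    "TCGA-XX-0001-11A-11R-A000-07",
    "TCGA-XX-0002",
]
parsed = [parse_tcga_barcode(b) for b in barcodes]
for barcode, (patient_id, sample_type) in zip(barcodes, parsed):
    print(barcode, patient_id, sample_type)
```

Two samples from the same patient share a patient ID, so it can be worth checking the sample-type code before joining expression data to per-patient clinical records.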
TCGA cancer table shortcut:
# | Barcode | Cancer full name | Version |
---|---|---|---|
1 | ACC | Adrenocortical carcinoma | 2016_01_28 |
2 | BLCA | Bladder urothelial carcinoma | 2016_01_28 |
3 | BRCA | Breast invasive carcinoma | 2016_01_28 |
4 | CESC | Cervical and endocervical cancers | 2016_01_28 |
5 | CHOL | Cholangiocarcinoma | 2016_01_28 |
6 | COAD | Colon adenocarcinoma | 2016_01_28 |
7 | COADREAD | Colorectal adenocarcinoma | 2016_01_28 |
8 | DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 2016_01_28 |
9 | ESCA | Esophageal carcinoma | 2016_01_28 |
... | ... | ... | ... |
2. Gene Co-expression Analysis
We first download and access the mRNAseq data.
from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
mRNAseq data is noisy. We first discard the 50% of genes with the lowest mean expression, then discard the 50% of the remaining genes with the lowest variance.
from biolearns.preprocessing.filter import expression_filter
mRNAseq = expression_filter(mRNAseq, meanq = 0.5, varq = 0.5)
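To make the two-step filter concrete, here is a minimal sketch of a quantile-based mean and variance filter on a toy table. It is not the biolearns implementation: `quantile_filter` is a hypothetical stand-in, and the genes-as-rows layout and the strict "greater than" cutoff are assumptions.

```python
import numpy as np
import pandas as pd


def quantile_filter(expr, meanq=0.5, varq=0.5):
    """Hypothetical stand-in for expression_filter (genes as rows)."""
    # Step 1: keep genes whose mean expression is above the meanq quantile.
    gene_mean = expr.mean(axis=1)
    expr = expr.loc[gene_mean > gene_mean.quantile(meanq)]
    # Step 2: among the remaining genes, keep those above the varq quantile
    # of variance.
    gene_var = expr.var(axis=1)
    return expr.loc[gene_var > gene_var.quantile(varq)]


# Toy expression table: 8 genes by 6 samples of random positive values.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.gamma(2.0, 2.0, size=(8, 6)),
    index=[f"gene{i}" for i in range(8)],
    columns=[f"sample{j}" for j in range(6)],
)
filtered = quantile_filter(toy, meanq=0.5, varq=0.5)
print(toy.shape, filtered.shape)
```

Because the second quantile is computed on the genes that survive the first step, meanq = 0.5 and varq = 0.5 together keep roughly a quarter of the original genes.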
We then use the lmQCM class to create an lmQCM object, lobj. The gene co-expression analysis is performed by simply calling its fit() function.
from biolearns.coexpression.lmQCM import lmQCM
lobj = lmQCM(mRNAseq)
clusters, genes, eigengene_mat = lobj.fit()
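The sketch below does not implement lmQCM. It only illustrates two quantities commonly used around co-expression clustering: absolute Pearson correlations between genes as edge weights, and an eigengene summarizing one cluster as the first right singular vector of its standardized expression matrix. Whether eigengene_mat is computed this way in biolearns is an assumption, and the data are synthetic.

```python
import numpy as np


def eigengene(module_expr):
    """First right singular vector of a row-standardized genes-by-samples matrix."""
    centered = module_expr - module_expr.mean(axis=1, keepdims=True)
    scaled = centered / centered.std(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return vt[0]  # one value per sample; the sign is arbitrary


# Synthetic cluster: three genes following one shared pattern plus noise.
rng = np.random.default_rng(1)
pattern = rng.normal(size=20)
module = np.vstack(
    [scale * pattern + 0.1 * rng.normal(size=20) for scale in (1.0, 2.0, 0.5)]
)

weights = np.abs(np.corrcoef(module))  # gene-by-gene co-expression weights
eg = eigengene(module)
agreement = abs(np.corrcoef(eg, pattern)[0, 1])
print(weights.round(2))
print(round(agreement, 3))
```

For a tightly co-expressed cluster the eigengene tracks the shared pattern closely, which is why it is often used as a one-row summary of the cluster in downstream analysis.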
3. Univariate survival analysis
We first download and access the mRNAseq and clinical data, using breast cancer as an example.
import numpy as np
from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical
We import logranktest from the survival subpackage and choose the gene "ABLIM3" as the univariate input.
from biolearns.survival import logranktest
r = mRNAseq.loc['ABLIM3'].values
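We find the patients present in both the expression and clinical data, and align the univariate, time, and event data to them.
bcd_m = [b[:12] for b in mRNAseq.columns]  # patient-level barcodes of the expression samples
bcd_p = [b[:12] for b in clinical.index]  # patient-level barcodes of the clinical records
# return_indices=True gives, for each shared barcode, its first position in bcd_m and in bcd_p
bcd, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)
r = r[idx_m]
t = brca.overall_survival_time[idx_p]
e = brca.overall_survival_event[idx_p]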
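The alignment step matters because the expression columns and the clinical rows are generally not in the same order. A small sketch with made-up barcodes and values shows how the index arrays returned by `np.intersect1d(..., return_indices=True)` put both sides in the same patient order.

```python
import numpy as np

# Made-up patient barcodes in two different orders; one patient is clinical-only.
bcd_m = np.array(["TCGA-XX-0003", "TCGA-XX-0001", "TCGA-XX-0002"])
bcd_p = np.array(["TCGA-XX-0001", "TCGA-XX-0002", "TCGA-XX-0004", "TCGA-XX-0003"])

r = np.array([3.0, 1.0, 2.0])         # expression values, ordered like bcd_m
t = np.array([10.0, 20.0, 40.0, 30.0])  # survival times, ordered like bcd_p

# bcd is the sorted set of shared barcodes; idx_m and idx_p locate each shared
# barcode in bcd_m and bcd_p respectively.
bcd, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)
r_aligned = r[idx_m]
t_aligned = t[idx_p]

for barcode, r_i, t_i in zip(bcd, r_aligned, t_aligned):
    print(barcode, r_i, t_i)
```

When a barcode occurs more than once in an input, NumPy returns the index of its first occurrence, so a patient with several samples contributes only the first one.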
We then perform the log-rank test, excluding samples with missing survival time:
logrank_results, fig = logranktest(r[~np.isnan(t)], t[~np.isnan(t)], e[~np.isnan(t)])
test_statistic, p_value = logrank_results.test_statistic, logrank_results.p_value
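For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-group log-rank test. It is not the biolearns implementation: `logrank_two_groups` is a hypothetical helper, splitting the samples at the median of r is an assumption about how a continuous input might be dichotomized, and the data are made up.

```python
import math
import statistics


def logrank_two_groups(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    obs_minus_exp = 0.0
    variance = 0.0
    # Walk through the distinct times at which at least one event occurred.
    for tj in sorted({t for t, e in zip(time, event) if e}):
        at_risk = [(g, t, e) for t, e, g in zip(time, event, group) if t >= tj]
        n = len(at_risk)                                  # at risk, both groups
        n1 = sum(1 for g, _, _ in at_risk if g)           # at risk, group 1
        d = sum(1 for _, t, e in at_risk if e and t == tj)        # events at tj
        d1 = sum(1 for g, t, e in at_risk if g and e and t == tj)  # in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up data: expression values, survival times, event flags (1 = event).
r = [0.2, 1.5, 0.7, 2.1, 0.4, 1.9, 0.9, 2.5]
t = [5, 30, 8, 42, 6, 35, 12, 50]
e = [1, 1, 1, 0, 1, 1, 1, 0]

high = [value > statistics.median(r) for value in r]  # median split (assumption)
stat, p_value = logrank_two_groups(t, e, high)
print(stat, p_value)
```

At each event time the test compares the number of events observed in one group with the number expected if both groups shared the same hazard; a large statistic, and hence a small p-value, indicates that the survival curves differ.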
The returned fig holds the output figure (image not reproduced here).
Hashes for biolearns-0.0.16-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | cbb7d1a567e15749b55106211edf9d118d2ed59210a382fd03a70a748efa3a0f |
MD5 | a52ddd1c398b199251edb38abe0ff225 |
BLAKE2b-256 | 4322503d45f0de7d4501da0290fb589e92526c095340363e443ba8ba75beae4e |