BioLearns: Computational Biology and Bioinformatics Toolbox in Python

http://biolearns.com

Installation

  • From PyPI
pip install biolearns

Documentation and Tutorials

  • Three examples are shown below. For the full list of tutorials, see the Wiki page on GitHub.

1. Read TCGA Data

Example: Read TCGA Breast invasive carcinoma (BRCA) data

Data is downloaded directly from https://gdac.broadinstitute.org/. The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

from biolearns.dataset.TCGA import TCGACancer
brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical

TCGA cancer table shortcut:

     Barcode   Cancer full name                                  Version
1    ACC       Adrenocortical carcinoma                          2016_01_28
2    BLCA      Bladder urothelial carcinoma                      2016_01_28
3    BRCA      Breast invasive carcinoma                         2016_01_28
4    CESC      Cervical and endocervical cancers                 2016_01_28
5    CHOL      Cholangiocarcinoma                                2016_01_28
6    COAD      Colon adenocarcinoma                              2016_01_28
7    COADREAD  Colorectal adenocarcinoma                         2016_01_28
8    DLBC      Lymphoid Neoplasm Diffuse Large B-cell Lymphoma   2016_01_28
9    ESCA      Esophageal carcinoma                              2016_01_28
...  ...       ...                                               ...

2. Gene Co-expression Analysis

We first download and access the mRNAseq data.

from biolearns.dataset.TCGA import TCGACancer

brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq

mRNAseq data is noisy. We first discard the 50% of genes with the lowest mean values, and then discard the 50% of the remaining genes with the lowest variance.

from biolearns.preprocessing.filter import expression_filter
mRNAseq = expression_filter(mRNAseq, meanq = 0.5, varq = 0.5)
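To make the two-stage filter concrete, here is a minimal pandas sketch of the idea. It is not the implementation of expression_filter: the function name filter_by_mean_and_variance, the strict ">" comparison, and the toy data are assumptions made for illustration, with genes in rows and samples in columns.

```python
import numpy as np
import pandas as pd


def filter_by_mean_and_variance(expr, meanq=0.5, varq=0.5):
    """Hypothetical helper: keep genes (rows) whose mean is above the
    meanq quantile, then keep the survivors whose variance is above
    the varq quantile of the surviving genes."""
    gene_mean = expr.mean(axis=1)
    expr = expr.loc[gene_mean > gene_mean.quantile(meanq)]
    gene_var = expr.var(axis=1)
    return expr.loc[gene_var > gene_var.quantile(varq)]


# Toy matrix (made-up values): 8 genes x 4 samples.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.gamma(shape=2.0, scale=np.arange(1, 9)[:, None], size=(8, 4)),
    index=[f"gene{i}" for i in range(8)],
    columns=[f"sample{j}" for j in range(4)],
)
filtered = filter_by_mean_and_variance(toy, meanq=0.5, varq=0.5)
print(filtered.shape)
```

Because the variance quantile is computed on the genes that survive the mean filter, meanq = 0.5 and varq = 0.5 together keep roughly a quarter of the original genes.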

We then use the lmQCM class to create an lmQCM object, lobj.

The gene co-expression analysis is performed by simply calling the fit() function.

from biolearns.coexpression.lmQCM import lmQCM

lobj = lmQCM(mRNAseq)
clusters, genes, eigengene_mat = lobj.fit()
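The name eigengene_mat suggests one summary profile ("eigengene") per cluster. The source does not say how lmQCM computes it. A common definition, used here as an assumption, is the first right singular vector of the row-standardized cluster matrix. The helper module_eigengene and the toy data below are hypothetical.

```python
import numpy as np


def module_eigengene(expr):
    """Hypothetical helper: first right singular vector of the
    row-standardized matrix (genes in rows, samples in columns)."""
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=1, keepdims=True)
    scaled = centered / expr.std(axis=1, ddof=1, keepdims=True)
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return vt[0]


# Toy cluster (made-up values): three genes that follow the same pattern
# across five samples at different scales and offsets.
pattern = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
module = np.vstack([pattern, 2 * pattern + 1, 0.5 * pattern - 3])
eigengene = module_eigengene(module)
print(abs(np.corrcoef(eigengene, pattern)[0, 1]))
```

The sign of a singular vector is arbitrary, so the example compares the absolute correlation with the shared pattern rather than the raw values.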

3. Univariate survival analysis

We first download and access the mRNAseq data, using breast cancer as an example.

from biolearns.dataset.TCGA import TCGACancer

brca = TCGACancer('BRCA')
mRNAseq = brca.mRNAseq

We import logranktest from the survival subpackage and choose the gene "ABLIM3" as the univariate input.

from biolearns.survival import logranktest

r = mRNAseq.loc['ABLIM3',].values

We then restrict the univariate, time, and event data to the patients present in both the expression and clinical tables:

import numpy as np

clinical = brca.clinical
# Truncate barcodes to the 12-character patient ID.
bcd_m = [b[:12] for b in mRNAseq.columns]
bcd_p = [b[:12] for b in clinical.index]
# Positions of each shared patient in both lists (first occurrence of each ID).
bcd, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)

r = r[idx_m]
t = brca.overall_survival_time[idx_p]
e = brca.overall_survival_event[idx_p]
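The barcode matching step is easy to check on a toy example. The sketch below uses made-up barcodes and values (all names are hypothetical) and aligns an expression vector with a survival-time vector on their shared 12-character patient IDs, using np.intersect1d with return_indices=True so that both outputs follow the same patient order.

```python
import numpy as np

# Hypothetical sample-level barcodes (expression columns) and
# patient-level barcodes (clinical rows), listed in different orders.
sample_barcodes = np.array([
    "TCGA-AA-0003-01A", "TCGA-AA-0001-01A", "TCGA-AA-0002-01A",
])
patient_barcodes = np.array(["TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0004"])
expression = np.array([3.0, 1.0, 2.0])         # one value per sample
survival_time = np.array([10.0, 20.0, 40.0])   # one value per patient

# Truncate sample barcodes to the 12-character patient identifier.
sample_patients = np.array([b[:12] for b in sample_barcodes])

# common is sorted; idx_s and idx_p give each shared ID's position
# (first occurrence) in the two input arrays.
common, idx_s, idx_p = np.intersect1d(
    sample_patients, patient_barcodes, return_indices=True
)
aligned_expression = expression[idx_s]
aligned_time = survival_time[idx_p]
print(common, aligned_expression, aligned_time)
```

Patients that appear in only one table (here 0003 and 0004) are dropped, and the remaining values are paired patient by patient.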

We perform log-rank test:

logrank_results, fig = logranktest(r[~np.isnan(t)], t[~np.isnan(t)], e[~np.isnan(t)])
test_statistic, p_value = logrank_results.test_statistic, logrank_results.p_value
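For readers who want to see what a log-rank test computes, here is a minimal NumPy sketch of the standard two-group version. It is not the code behind logranktest, and the source does not say how logranktest turns a continuous input into groups. Splitting at the median, the function name logrank_two_groups, and the toy data are all assumptions made for illustration.

```python
import math

import numpy as np


def logrank_two_groups(time, event, group):
    """Hypothetical helper: two-group log-rank test.
    Returns (chi-square statistic with 1 df, p-value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event]):           # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                      # at risk, both groups
        n1 = (at_risk & group).sum()           # at risk, group 1
        died = event & (time == t)
        d = died.sum()                         # events at t, both groups
        d1 = (died & group).sum()              # events at t, group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (made-up values): expression, follow-up time, event indicator.
expr = np.array([0.5, 2.1, 1.7, 0.2, 3.3, 0.9, 2.8, 0.4])
time = np.array([5.0, 30.0, 22.0, 3.0, 41.0, 9.0, 35.0, 6.0])
event = np.array([1, 0, 1, 1, 0, 1, 1, 1])
high = expr > np.median(expr)                  # assumed median split
chi2, p = logrank_two_groups(time, event, high)
print(chi2, p)
```

The statistic does not depend on which group is labeled "high", so swapping the labels gives the same result.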

logranktest also returns the corresponding figure as fig.

Download files

Source distribution: biolearns-0.0.15.tar.gz (16.7 kB)

Built distribution: biolearns-0.0.15-py3-none-any.whl (29.3 kB, Python 3)