BioLearns: Computational Biology and Bioinformatics Toolbox in Python

Project description

biolearns

BioLearns: Computational Biology and Bioinformatics Toolbox in Python http://biolearns.medicine.iu.edu

Installation

  • From PyPI
pip install biolearns -U

Documentation and Tutorials

  • Three examples are shown below. For the full list of tutorials, see our GitHub wiki page:

    Wiki

Disclaimer

Please note that this is a pre-release version of BioLearns, which is still undergoing final testing before its official release. The website, its software and all content found on it are provided on an "as is" and "as available" basis. BioLearns does not give any warranties, whether express or implied, as to the suitability or usability of the website, its software or any of its content. BioLearns will not be liable for any loss, whether such loss is direct, indirect, special or consequential, suffered by any party as a result of their use of the libraries or content. Any usage of the libraries is done at the user's own risk and the user will be solely responsible for any damage to any computer system or loss of data that results from such activities. Should you encounter any bugs, glitches, lack of functionality or other problems on the website, please let us know immediately so we can rectify them. Your help in this regard is greatly appreciated.

1. Read TCGA Data

Example: Read TCGA Breast invasive carcinoma (BRCA) data

Data is downloaded directly from https://gdac.broadinstitute.org/. The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

from biolearns.dataset import TCGA
brca = TCGA('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical

TCGA cancer table shortcut:

     Barcode    Cancer full name                                   Version
1    ACC        Adrenocortical carcinoma                           2016_01_28
2    BLCA       Bladder urothelial carcinoma                       2016_01_28
3    BRCA       Breast invasive carcinoma                          2016_01_28
4    CESC       Cervical and endocervical cancers                  2016_01_28
5    CHOL       Cholangiocarcinoma                                 2016_01_28
6    COAD       Colon adenocarcinoma                               2016_01_28
7    COADREAD   Colorectal adenocarcinoma                          2016_01_28
8    DLBC       Lymphoid Neoplasm Diffuse Large B-cell Lymphoma    2016_01_28
9    ESCA       Esophageal carcinoma                               2016_01_28
...  ...        ...                                                ...

2. Gene Co-expression Analysis

We first download and access the mRNAseq data.

from biolearns.dataset import TCGA

brca = TCGA('BRCA')
mRNAseq = brca.mRNAseq

mRNAseq data is noisy. We first filter out the 50% of genes with the lowest mean expression, and then filter out the 50% of the remaining genes with the lowest variance.

from biolearns.preprocessing import expression_filter
mRNAseq = expression_filter(mRNAseq, meanq = 0.5, varq = 0.5)
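To make the two filtering steps concrete, here is a minimal pandas sketch of a mean-then-variance quantile filter on toy data. It is not the implementation of expression_filter: the helper name filter_by_mean_and_variance is hypothetical, and it assumes a genes-by-samples DataFrame and strict "greater than the quantile" cutoffs.

```python
import numpy as np
import pandas as pd


def filter_by_mean_and_variance(expr, meanq=0.5, varq=0.5):
    """Hypothetical helper: keep genes above the mean and variance quantiles.

    expr is assumed to be a genes-by-samples DataFrame.
    """
    # Step 1: drop genes whose mean expression is at or below the meanq quantile.
    gene_mean = expr.mean(axis=1)
    kept = expr.loc[gene_mean > gene_mean.quantile(meanq)]
    # Step 2: among the remaining genes, drop those whose variance is at or
    # below the varq quantile.
    gene_var = kept.var(axis=1)
    return kept.loc[gene_var > gene_var.quantile(varq)]


# Toy data (made up): 8 genes x 5 samples with increasing mean and spread.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.normal(loc=np.arange(8)[:, None], scale=np.arange(1, 9)[:, None], size=(8, 5)),
    index=[f"gene{i}" for i in range(8)],
    columns=[f"sample{j}" for j in range(5)],
)
filtered = filter_by_mean_and_variance(toy)
print(filtered.shape)
```

With meanq = 0.5 and varq = 0.5, roughly a quarter of the original genes survive both steps, which is worth keeping in mind when choosing the thresholds.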

We then use the lmQCM class to create an lmQCM object, lobj.

The gene co-expression analysis is performed by simply calling the fit() function.

from biolearns.coexpression import lmQCM

lobj = lmQCM(mRNAseq)
clusters, genes, eigengene_mat = lobj.fit()
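For readers unfamiliar with the term, an eigengene is commonly defined as the first principal component of a cluster's standardized expression values, giving one summary value per sample. The NumPy sketch below illustrates that common definition on toy data; the helper name eigengene is hypothetical, and whether lmQCM computes eigengene_mat in exactly this way is an assumption.

```python
import numpy as np


def eigengene(cluster_expr):
    """Hypothetical helper: first principal component of a genes-by-samples block."""
    # Standardize each gene (row) to zero mean and unit variance across samples.
    centered = cluster_expr - cluster_expr.mean(axis=1, keepdims=True)
    scaled = centered / cluster_expr.std(axis=1, ddof=1, keepdims=True)
    # Rows of vt are right singular vectors; the first has one value per sample.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return vt[0]


# Toy cluster (made up): 5 genes that share one pattern over 20 samples, plus noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=20)
cluster = np.vstack([signal + 0.1 * rng.normal(size=20) for _ in range(5)])

eg = eigengene(cluster)
# The eigengene should track the shared pattern; its sign is arbitrary.
correlation = abs(np.corrcoef(eg, signal)[0, 1])
print(eg.shape, correlation)
```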

3. Univariate survival analysis

We first download and access the mRNAseq and clinical data, using breast cancer as an example.

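import numpy as np  # used below to align barcodes and mask missing values
from biolearns.dataset import TCGA

brca = TCGA('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical  # needed below to match patients to samples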

We import logranktest from the survival subpackage and choose the gene "ABLIM3" as the univariate input.

from biolearns.survival import logranktest

r = mRNAseq.loc['ABLIM3',].values

We find the patients shared by the univariate, time, and event data, matching them on the first 12 characters of the barcode:

bcd_m = [b[:12] for b in mRNAseq.columns]  # patient-level barcode of each sample
bcd_p = [b[:12] for b in clinical.index]   # patient-level barcode of each clinical record
# Positions of the shared barcodes in each list (first occurrence if a barcode repeats)
bcd, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)

r = r[idx_m]
t = brca.overall_survival_time[idx_p]
e = brca.overall_survival_event[idx_p]
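The alignment step can be checked on toy data. The sketch below uses made-up barcodes and values (not real TCGA records) and relies on np.intersect1d with return_indices=True, which returns the sorted shared barcodes together with their positions in each input, so that expression and survival values end up in the same patient order.

```python
import numpy as np

# Made-up sample barcodes (expression columns) and patient barcodes (clinical index).
sample_barcodes = ["TCGA-XX-0001-01A", "TCGA-XX-0002-01A", "TCGA-XX-0003-01A"]
patient_barcodes = ["TCGA-XX-0003", "TCGA-XX-0001", "TCGA-XX-0009"]

# The first 12 characters identify the patient.
bcd_m = [b[:12] for b in sample_barcodes]
bcd_p = [b[:12] for b in patient_barcodes]

# common is sorted; idx_m and idx_p give the matching positions in each list.
common, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)

expression = np.array([1.5, 2.5, 3.5])          # one value per sample
survival_time = np.array([300.0, 120.0, 45.0])  # one value per patient

r_toy = expression[idx_m]
t_toy = survival_time[idx_p]
print(common, r_toy, t_toy)
```

After this step, entry i of r_toy and entry i of t_toy refer to the same patient, common[i], even though the two inputs list patients in different orders.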

We then perform the log-rank test, excluding samples whose survival time is missing:

logrank_results, fig = logranktest(r[~np.isnan(t)], t[~np.isnan(t)], e[~np.isnan(t)])
test_statistic, p_value = logrank_results.test_statistic, logrank_results.p_value
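As background, the two-group log-rank statistic can be computed by hand. The sketch below is a standalone illustration on made-up data, not the code behind logranktest: the helper name two_group_logrank is hypothetical, and splitting the continuous variable at its median into "high" and "low" groups is an assumption about how the groups might be formed.

```python
import math
import numpy as np


def two_group_logrank(time, event, group):
    """Hypothetical helper: log-rank chi-square statistic and p-value (1 d.o.f.).

    time: follow-up times; event: 1 if the event was observed, 0 if censored;
    group: boolean array, True for members of group 1.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=bool)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                    # subjects at risk just before t
        n1 = (at_risk & group).sum()         # of those, members of group 1
        died = (time == t) & (event == 1)
        d = died.sum()                       # events at t in both groups
        d1 = (died & group).sum()            # events at t in group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Made-up data: expression values, survival times, and event indicators.
r_toy = np.array([0.2, 0.5, 1.1, 1.8, 2.4, 3.0])
t_toy = np.array([5.0, 8.0, 12.0, 20.0, 25.0, 30.0])
e_toy = np.array([1, 1, 1, 1, 0, 1])

high = r_toy > np.median(r_toy)  # assumed median split into high/low groups
stat, p = two_group_logrank(t_toy, e_toy, high)
print(stat, p)
```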

The resulting figure is returned as fig (image not shown here).
