BioLearns: Computational Biology and Bioinformatics Toolbox in Python
Website: http://biolearns.medicine.iu.edu
Installation
- From PyPI
pip install biolearns -U
Documentation and Tutorials
We have selected the three examples listed below. For the full list of tutorials, see our GitHub wiki page.
Disclaimer
Please note that this is a pre-release version of BioLearns, which is still undergoing final testing before its official release. The website, its software and all content found on it are provided on an "as is" and "as available" basis. BioLearns does not give any warranties, whether express or implied, as to the suitability or usability of the website, its software or any of its content. BioLearns will not be liable for any loss, whether direct, indirect, special or consequential, suffered by any party as a result of their use of the libraries or content. Any use of the libraries is at the user's own risk, and the user will be solely responsible for any damage to any computer system or loss of data that results from such use. Should you encounter any bugs, glitches, missing functionality or other problems on the website, please let us know immediately so that we can fix them. Your help in this regard is greatly appreciated.
1. Read TCGA Data
Example: Read TCGA Breast invasive carcinoma (BRCA) data
Data is downloaded directly from https://gdac.broadinstitute.org/. The results here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
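from biolearns.dataset import TCGA
brca = TCGA('BRCA')           # download and load the BRCA cohort
mRNAseq = brca.mRNAseq        # mRNA-seq expression table (genes x samples)
clinical = brca.clinical      # clinical annotations, indexed by barcode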
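The survival example further down matches expression samples to clinical records through TCGA barcodes, so it helps to know how a barcode is laid out. The sketch below is a hypothetical helper, `parse_tcga_barcode`, which is not part of BioLearns; it assumes the standard TCGA layout `TCGA-<tissue source site>-<participant>-<sample type + vial>-...`, in which the first 12 characters identify the patient.

```python
def parse_tcga_barcode(barcode):
    """Split a TCGA barcode into its leading fields (hypothetical helper, not BioLearns API)."""
    parts = barcode.upper().split("-")
    fields = {
        "project": parts[0],                 # always "TCGA"
        "tss": parts[1],                     # tissue source site code
        "participant": parts[2],             # participant code
        "patient_id": "-".join(parts[:3]),   # 12-character patient ID, e.g. TCGA-XX-XXXX
    }
    if len(parts) > 3:
        # Sample field, e.g. "01A": sample type "01" (primary solid tumor), vial "A"
        fields["sample_type"] = parts[3][:2]
        fields["vial"] = parts[3][2:]
    return fields
```

Sample type codes 01 to 09 denote tumor samples and 10 to 19 normal samples (11 is solid tissue normal), so several mRNAseq columns can share one patient ID.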
TCGA cancer type shortcuts:
No. | Barcode | Cancer full name | Version |
---|---|---|---|
1 | ACC | Adrenocortical carcinoma | 2016_01_28 |
2 | BLCA | Bladder urothelial carcinoma | 2016_01_28 |
3 | BRCA | Breast invasive carcinoma | 2016_01_28 |
4 | CESC | Cervical and endocervical cancers | 2016_01_28 |
5 | CHOL | Cholangiocarcinoma | 2016_01_28 |
6 | COAD | Colon adenocarcinoma | 2016_01_28 |
7 | COADREAD | Colorectal adenocarcinoma | 2016_01_28 |
8 | DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 2016_01_28 |
9 | ESCA | Esophageal carcinoma | 2016_01_28 |
... | ... | ... | ... |
2. Gene Co-expression Analysis
We first download and access the mRNAseq data.
from biolearns.dataset import TCGA
brca = TCGA('BRCA')
mRNAseq = brca.mRNAseq
mRNAseq data are noisy, so we first filter out the 50% of genes with the lowest mean values and then filter out the 50% of the remaining genes with the lowest variances.
from biolearns.preprocessing import expression_filter
# meanq / varq: fraction of genes removed by the mean and variance filters
mRNAseq = expression_filter(mRNAseq, meanq=0.5, varq=0.5)
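To make the two-step filter concrete, here is a minimal NumPy sketch of what we assume `expression_filter` does with `meanq` and `varq`. The name `quantile_filter` is hypothetical, and details such as tie handling (a strict `>` here) and missing-value handling may differ from the library.

```python
import numpy as np


def quantile_filter(expr, gene_names, meanq=0.5, varq=0.5):
    """Two-step mean/variance gene filter (hypothetical helper, not BioLearns API).

    expr       : genes x samples array
    gene_names : one name per row of expr
    1. keep genes whose mean is above the `meanq` quantile of all gene means;
    2. of those, keep genes whose variance is above the `varq` quantile
       of the remaining genes' variances.
    """
    expr = np.asarray(expr, dtype=float)
    gene_names = np.asarray(gene_names)

    # Step 1: drop low-mean genes
    means = expr.mean(axis=1)
    keep = means > np.quantile(means, meanq)
    expr, gene_names = expr[keep], gene_names[keep]

    # Step 2: drop low-variance genes among the survivors
    variances = expr.var(axis=1)
    keep = variances > np.quantile(variances, varq)
    return expr[keep], gene_names[keep]
```

With `meanq = varq = 0.5`, roughly a quarter of the genes survive: about half pass the mean filter, and about half of those pass the variance filter. For a pandas table such as `mRNAseq`, one would pass `mRNAseq.values` and `mRNAseq.index`.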
We then use the lmQCM class to create an lmQCM object, lobj. The gene co-expression analysis is performed by simply calling its fit() function.
from biolearns.coexpression import lmQCM
lobj = lmQCM(mRNAseq)                        # build the lmQCM object from the filtered data
clusters, genes, eigengene_mat = lobj.fit()  # run the co-expression analysis
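fit() also returns an eigengene matrix. A common definition of a cluster eigengene is the first principal component of the cluster's standardized expression, which summarizes the whole cluster as one value per sample. The NumPy sketch below computes it that way; we assume, without having checked the library source, that `eigengene_mat` is built in a similar manner, and `eigengene` is a hypothetical name.

```python
import numpy as np


def eigengene(cluster_expr):
    """Eigengene of one gene cluster (hypothetical helper, not BioLearns API).

    cluster_expr : genes x samples array for the genes in one cluster
    Returns a unit-length vector with one value per sample.
    """
    x = np.asarray(cluster_expr, dtype=float)
    # z-score each gene across samples so every gene contributes equally
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    # Rows of vt are right singular vectors, i.e. patterns over samples;
    # vt[0] is the direction explaining the most variance (the first PC)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eg = vt[0]
    # The SVD sign is arbitrary: orient it with the average z-scored profile
    if np.dot(eg, z.mean(axis=0)) < 0:
        eg = -eg
    return eg
```

Standardizing first keeps highly expressed genes from dominating the component; without it, the eigengene would mostly track the genes with the largest raw variance.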
3. Univariate Survival Analysis
We first download and access the mRNAseq and clinical data, using breast cancer as an example.
from biolearns.dataset import TCGA
brca = TCGA('BRCA')
mRNAseq = brca.mRNAseq
clinical = brca.clinical
We import logranktest from the survival subpackage and choose the gene ABLIM3 as the univariate input.
from biolearns.survival import logranktest
r = mRNAseq.loc['ABLIM3'].values  # ABLIM3 expression across all samples
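We then match expression samples to clinical records by patient ID (the first 12 characters of a TCGA barcode) and keep only patients present in both, so that the expression values, survival times, and event indicators line up:
import numpy as np
# Truncate barcodes to the 12-character patient ID, e.g. TCGA-XX-XXXX
bcd_m = np.array([b[:12] for b in mRNAseq.columns])
bcd_p = np.array([b[:12] for b in clinical.index])
# Shared patient IDs, plus the position of each one's first occurrence in both arrays
bcd, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)
r = r[idx_m]
t = np.asarray(brca.overall_survival_time, dtype=float)[idx_p]
e = np.asarray(brca.overall_survival_event)[idx_p]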
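Truncating to 12 characters merges all samples from one patient, such as a primary tumor (sample type 01) and a solid tissue normal (sample type 11), so the same patient ID can appear more than once among the mRNAseq columns. The hypothetical helper below, which is not part of BioLearns, keeps only tumor samples (sample type codes 01 to 09) before matching and returns aligned index arrays. It assumes the columns are full sample barcodes (at least 15 characters), and restricting to tumor samples is our choice, not something this tutorial specifies.

```python
import numpy as np


def match_patients(sample_barcodes, patient_barcodes, tumor_only=True):
    """Align expression samples with clinical records by patient ID (hypothetical helper).

    Returns (patient_ids, sample_idx, patient_idx) such that, for each k,
    sample_barcodes[sample_idx[k]] and patient_barcodes[patient_idx[k]]
    both belong to patient_ids[k].
    """
    samples = np.asarray(sample_barcodes, dtype=str)
    keep = np.arange(len(samples))
    if tumor_only:
        # Characters 14-15 of a sample barcode give the sample type; 01-09 are tumors
        sample_type = np.array([int(b[13:15]) for b in samples])
        keep = keep[sample_type < 10]
    # Compare 12-character patient IDs case-insensitively
    pid_samples = np.array([b[:12].upper() for b in samples[keep]])
    pid_patients = np.array([b[:12].upper() for b in patient_barcodes])
    # For IDs present in both arrays, return_indices gives each one's first position
    ids, i_s, i_p = np.intersect1d(pid_samples, pid_patients, return_indices=True)
    return ids, keep[i_s], i_p
```

For example, `ids, s_idx, p_idx = match_patients(mRNAseq.columns, clinical.index)` followed by `r = mRNAseq.loc['ABLIM3'].values[s_idx]` would pair each patient's tumor expression with their clinical record.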
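We then perform the log-rank test, excluding patients without a recorded survival time:
valid = ~np.isnan(t)  # drop patients with a missing survival time
logrank_results, fig = logranktest(r[valid], t[valid], e[valid])
test_statistic, p_value = logrank_results.test_statistic, logrank_results.p_value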
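For readers curious about what the log-rank statistic measures, below is a from-scratch NumPy sketch of the standard two-group log-rank test. How logranktest turns the continuous expression r into two groups is not described here, so the median split suggested afterwards is our assumption, and `logrank_two_group` is a hypothetical name.

```python
import math

import numpy as np


def logrank_two_group(time, event, group):
    """Two-sample log-rank test (hypothetical helper, not the BioLearns API).

    time  : survival or censoring times
    event : 1 if death was observed, 0 if the patient was censored
    group : boolean array, True for group 1 (e.g. high expression)
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)
    group = np.asarray(group).astype(bool)

    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0   # hypergeometric variance of that difference
    for tj in np.unique(time[event]):          # distinct observed death times
        at_risk = time >= tj
        n = at_risk.sum()                      # patients at risk in both groups
        n1 = (at_risk & group).sum()           # ... and in group 1
        died = (time == tj) & event
        d = died.sum()                         # deaths at tj in both groups
        d1 = (died & group).sum()              # ... and in group 1
        # Under H0 (equal hazards) group 1 expects d * n1 / n of the deaths
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank variance is zero; check the groups and events")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return float(chi2), p_value
```

For the ABLIM3 example, after dropping patients with a missing t, a median split would be `group = r > np.median(r)`, giving `logrank_two_group(t, e, group)`.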
The figure returned by logranktest (fig) looks like:
Source Distribution
File details
Details for the file biolearns-0.0.62.tar.gz.
File metadata
- Download URL: biolearns-0.0.62.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2e9f9da34dd1581f4d0ace6c743001ddf5aa8082e5b4f52a171fdaa94f29366c |
MD5 | 01d0d0c5a7931f859c49f60621377375 |
BLAKE2b-256 | e22255d18231c9d8cf773852fd9f52167bbdaa015f479ec33508a664a1fa9c51 |