A biclustering library with datasets, evaluation measures and a benchmarking framework
biclustlib
This package is an extension of the biclustlib Python library by Victor Alexandre Padilha.
Reading the original repository first is highly recommended.
The goal of this package is to create a unified biclustering framework for performing research on gene expression data and comparing different biclustering algorithms and measures.
Distributed under the GPLv3 license.
Installation
pip install biclustlib
You must also install R and the following R packages:
Benchmarking example
import multiprocessing

import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

from biclustlib.algorithms import *
from biclustlib.algorithms.wrappers import *
from biclustlib.benchmark import GeneExpressionBenchmark, Algorithm
from biclustlib.benchmark.util import Converter
from biclustlib.benchmark.data import load_tavazoie, load_prelic


def discretize_data(raw_data: pd.DataFrame, n_bins: int = 2) -> pd.DataFrame:
    # Bin every column into n_bins ordinal levels using k-means bin edges.
    binned = KBinsDiscretizer(n_bins=n_bins, encode='ordinal', strategy='kmeans').fit_transform(raw_data)
    # Keep the row labels of the input frame; two bins yield a boolean matrix.
    return pd.DataFrame(binned, index=raw_data.index).astype(int if n_bins > 2 else bool)


if __name__ == '__main__':
    pool = multiprocessing.Pool()
    data = load_tavazoie()

    n_biclusters = 5
    discretion_level = 30
    reduction_level = 15
    significance_cutoff = .05

    # Discretized and binarized copies for the algorithms that need them.
    data_dis = discretize_data(data, discretion_level)
    data_bin = discretize_data(data)

    # Each entry pairs a display name with an algorithm instance and its input data.
    setup = [
        Algorithm('CCA', ChengChurchAlgorithm(n_biclusters), data),
        Algorithm('xMotifs', RConservedGeneExpressionMotifs(n_biclusters), data_dis),
        Algorithm('BiBit', BitPatternBiclusteringAlgorithm(), data_bin),
        Algorithm('Bimax', RBinaryInclusionMaximalBiclusteringAlgorithm(n_biclusters), data_bin),
        Algorithm('LAS', LargeAverageSubmatrices(n_biclusters), data),
        Algorithm('Plaid', RPlaid(n_biclusters), data),
        Algorithm('Spectral', Spectral(n_clusters=data.shape[1] // 2), data + 2),
        # NOTE: same wrapper class as the 'xMotifs' entry above.
        Algorithm('QUBIC', RConservedGeneExpressionMotifs(n_biclusters), data_bin),
    ]

    tavazoie_benchmark = GeneExpressionBenchmark(algorithms=setup,
                                                 raw_data=data,
                                                 n_biclusters=n_biclusters,
                                                 reduction_level=reduction_level).run(pool)
    tavazoie_benchmark.perform_goea()
    tavazoie_benchmark.generate_report()
    pool.close()
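The example prepares two derived inputs with `discretize_data`: an ordinal matrix with `discretion_level` bins and a two-bin matrix cast to booleans. The sketch below illustrates the shape of that output on a toy matrix using only the standard library. It is an illustration under stated assumptions, not the package's code: `discretize_equal_width` is a hypothetical helper, and it uses equal-width bins per column instead of the k-means strategy that `KBinsDiscretizer` applies in the example.

```python
from typing import List, Union

Matrix = List[List[float]]


def discretize_equal_width(rows: Matrix, n_bins: int = 2) -> List[List[Union[int, bool]]]:
    """Hypothetical helper: bin each column into n_bins equal-width ordinal levels.

    Assumption: equal-width bins stand in for the k-means bin edges used by
    KBinsDiscretizer(strategy='kmeans') in the benchmarking example.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    n_cols = len(rows[0])
    binned = [[0] * n_cols for _ in rows]
    for j in range(n_cols):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins
        for i, value in enumerate(column):
            # A constant column falls entirely into bin 0; the column maximum
            # is clamped into the last bin.
            binned[i][j] = 0 if width == 0 else min(int((value - lo) / width), n_bins - 1)
    if n_bins == 2:
        # Mirror the example's dtype choice: two bins become a boolean matrix.
        return [[bool(level) for level in row] for row in binned]
    return binned


toy = [[0.0, 10.0], [1.0, 20.0], [4.0, 30.0]]
binary = discretize_equal_width(toy)        # boolean matrix, two bins per column
ordinal = discretize_equal_width(toy, 4)    # integer levels 0..3 per column
```

Columns are binned independently, so each gene-expression condition gets its own bin edges. The boolean form is what the example passes as `data_bin`, and the integer form corresponds to `data_dis`.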