
A biclustering library with datasets, evaluation measures and a benchmarking framework

Project description

biclustlib

This package extends the biclustlib Python library by Victor Alexandre Padilha.
We highly recommend reviewing the original repository first.
The package aims to provide a unified biclustering framework for research on gene expression data and for comparing biclustering algorithms and evaluation measures.

Distributed under the GPLv3 license.

Installation

pip install biclustlib
You must also install R and the following R packages:

Benchmarking example

import multiprocessing

import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

from biclustlib.algorithms import *
from biclustlib.algorithms.wrappers import *
from biclustlib.benchmark import GeneExpressionBenchmark, Algorithm
from biclustlib.benchmark.util import Converter
from biclustlib.benchmark.data import load_tavazoie, load_prelic


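def discretize_data(raw_data: pd.DataFrame, n_bins: int = 2) -> pd.DataFrame:
    # Bin each column with 1-D k-means; two bins give a boolean frame, more bins give integer labels.
    # The index comes from raw_data (the argument), not from the module-level `data` variable.
    return pd.DataFrame(KBinsDiscretizer(n_bins, encode='ordinal', strategy='kmeans').fit_transform(raw_data),
                        index=raw_data.index).astype(int if n_bins > 2 else bool)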


if __name__ == '__main__':
    pool = multiprocessing.Pool()

    data = load_tavazoie()
    n_biclusters = 5
    discretization_level = 30
    reduction_level = 15
    significance_cutoff = .05

    data_dis = discretize_data(data, discretization_level)
    data_bin = discretize_data(data)

    setup = [
        Algorithm('CCA', ChengChurchAlgorithm(n_biclusters), data),
        Algorithm('xMotifs', RConservedGeneExpressionMotifs(n_biclusters), data_dis),
        Algorithm('BiBit', BitPatternBiclusteringAlgorithm(), data_bin),
        Algorithm('Bimax', RBinaryInclusionMaximalBiclusteringAlgorithm(n_biclusters), data_bin),
        Algorithm('LAS', LargeAverageSubmatrices(n_biclusters), data),
        Algorithm('Plaid', RPlaid(n_biclusters), data),
        Algorithm('Spectral', Spectral(n_clusters=data.shape[1] // 2), data + 2),
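        # NOTE: this entry reuses the xMotifs wrapper class; swap in the library's QUBIC wrapper if that was intended.
        Algorithm('QUBIC', RConservedGeneExpressionMotifs(n_biclusters), data_bin),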
    ]

    tavazoie_benchmark = GeneExpressionBenchmark(algorithms=setup,
                                                 raw_data=data,
                                                 n_biclusters=n_biclusters,
                                                 reduction_level=reduction_level).run(pool)
    tavazoie_benchmark.perform_goea()
    tavazoie_benchmark.generate_report()

    pool.close()
    pool.join()  # wait for the worker processes to exit before the script ends
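
To see what the discretize_data helper from the example produces, the sketch below applies the same KBinsDiscretizer settings to a small hand-made frame. The gene labels, condition names and values are invented for illustration, and the helper is restated so the block runs without biclustlib or R; it needs only pandas and scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer


def discretize_data(raw_data: pd.DataFrame, n_bins: int = 2) -> pd.DataFrame:
    """Bin every column independently with 1-D k-means.

    Two bins yield a boolean frame; more bins yield integer bin labels.
    """
    binned = KBinsDiscretizer(n_bins=n_bins, encode='ordinal',
                              strategy='kmeans').fit_transform(raw_data)
    return pd.DataFrame(binned, index=raw_data.index).astype(int if n_bins > 2 else bool)


# Made-up expression values: each condition has three low and three high genes.
toy = pd.DataFrame(
    {'cond_a': [0.1, 0.2, 0.15, 5.0, 5.2, 4.9],
     'cond_b': [3.0, 3.1, 2.9, -1.0, -1.2, -0.9]},
    index=[f'gene_{i}' for i in range(6)],
)

# Binary version, the same kind of input the example passes to BiBit and Bimax.
toy_bin = discretize_data(toy)
print(toy_bin)
```

Note that the helper keeps the row index but not the column labels, so the resulting columns are numbered 0, 1, and so on.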

Download files


Source Distribution

biclustlib-0.0.11.tar.gz (18.7 MB)

Uploaded Source

Built Distribution

biclustlib-0.0.11-py3-none-any.whl (19.3 MB)

Uploaded Python 3

File details

Details for the file biclustlib-0.0.11.tar.gz.

File metadata

  • Download URL: biclustlib-0.0.11.tar.gz
  • Upload date:
  • Size: 18.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for biclustlib-0.0.11.tar.gz
Algorithm    Hash digest
SHA256       a8aefd1dc74c9c3b0e506a8d4d603729a437efa765526a5159bf3601235c4c43
MD5          b4b26498caff5b83632130b2bd44dfb2
BLAKE2b-256  b8629868da65689a5b6c5bf5753a3735d83bf48733def3e29954fdb583bb4898
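
A downloaded archive can be checked against the published SHA256 digest with the Python standard library alone. The following is a minimal sketch: the function names and the local file path are assumptions made for illustration, while the expected digest is copied from the table above.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for biclustlib-0.0.11.tar.gz, copied from the hash table.
EXPECTED_SHA256 = "a8aefd1dc74c9c3b0e506a8d4d603729a437efa765526a5159bf3601235c4c43"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex=EXPECTED_SHA256):
    """Compare the computed digest with the expected one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the archive was saved.
    archive = Path("biclustlib-0.0.11.tar.gz")
    if archive.exists():
        print("hash OK" if matches_published_hash(archive) else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same functions work for the wheel if the expected digest is replaced with the SHA256 value listed for that file.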


File details

Details for the file biclustlib-0.0.11-py3-none-any.whl.

File metadata

  • Download URL: biclustlib-0.0.11-py3-none-any.whl
  • Upload date:
  • Size: 19.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for biclustlib-0.0.11-py3-none-any.whl
Algorithm    Hash digest
SHA256       604f3bc65cbc4879e12a58cc0ac771b33d52bb4fb6890e4aad53394604c3b82c
MD5          6285750c9f7ea999a7fae65ee926c438
BLAKE2b-256  8e214dbf2bfbd86e86e82a4c9c3009ad22acfb14332407151b557c76bb7e8aaf

