
A package for automatic clustering hyperparameter optimization

Project description

Hypercluster

A package for clustering optimization with sklearn.

Requirements:

pandas
numpy
scipy
matplotlib
seaborn
scikit-learn
hdbscan

Optional: snakemake

Install

With pip:

pip install hypercluster

or with conda:

conda install hypercluster
# or
conda install -c conda-forge -c bioconda hypercluster

If you are having problems installing with conda, try changing your channel priority. A priority of conda-forge > bioconda > defaults is recommended. To check your channel priority:

conda config --get channels

The output should look like:

--add channels 'defaults'   # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge'   # highest priority

If it doesn't look like that, try:

conda config --add channels bioconda
conda config --add channels conda-forge

Docs

https://hypercluster.readthedocs.io/en/latest/index.html

Examples

https://github.com/liliblu/hypercluster/tree/dev/examples

Quickstart with Snakemake

Default config.yml and hypercluster.smk files are in the examples repo above.
Edit the config.yml file, or override individual values with command-line arguments:

snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=. 

Example editing with python:

import os
import yaml

# Load the default config
with open('config.yml', 'r') as fh:
    config = yaml.safe_load(fh)

# Point the pipeline at the input data
input_data_prefix = 'test_data'
config['input_data_folder'] = os.path.abspath('.')
config['input_data_files'] = [input_data_prefix]
config['read_csv_kwargs'] = {input_data_prefix: {'index_col': [0]}}

# Write the edited config back out
with open('config.yml', 'w') as fh:
    yaml.dump(config, stream=fh)

Then call snakemake.

snakemake -s hypercluster.smk

Or submit the Snakemake scheduler itself as an sbatch job, e.g. on the BigPurple Slurm cluster:

module add slurm
sbatch snakemake_submit.sh

Example files for snakemake_submit.sh and cluster.json are in the scRNA-seq example.
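As a rough sketch only (the actual file in the scRNA-seq example may differ), a minimal snakemake_submit.sh could look like the following. The partition-free #SBATCH header, walltime, memory, and the mem/time keys pulled from cluster.json are placeholders, not values from this package:

```shell
#!/bin/bash
#SBATCH --job-name=hypercluster
#SBATCH --time=12:00:00    # placeholder walltime for the scheduler job
#SBATCH --mem=4G           # the scheduler process itself is lightweight

# Run the Snakemake scheduler; it submits one sbatch job per rule instance,
# taking per-rule resources from cluster.json.
snakemake -s hypercluster.smk \
  --configfile config.yml \
  --cluster-config cluster.json \
  --cluster "sbatch --mem={cluster.mem} --time={cluster.time}" \
  --jobs 100
```

The point of this pattern is that only the lightweight scheduler runs inside the sbatch wrapper; each clustering job is dispatched as its own Slurm job.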

Quickstart with Python

import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster

data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')

# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
  methods = hypercluster.constants.need_ground_truth+hypercluster.constants.inherent_metrics, 
  gold_standard = labels
  )

clusterer.visualize_evaluations()

# With a range of algorithms

clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
  methods = hypercluster.constants.need_ground_truth+hypercluster.constants.inherent_metrics, 
  gold_standard = labels
  )

clusterer.visualize_evaluations()

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hypercluster-0.1.5.tar.gz (15.6 kB)

Uploaded Source

Built Distribution

hypercluster-0.1.5-py3-none-any.whl (26.2 kB)

Uploaded Python 3

File details

Details for the file hypercluster-0.1.5.tar.gz.

File metadata

  • Download URL: hypercluster-0.1.5.tar.gz
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for hypercluster-0.1.5.tar.gz
Algorithm Hash digest
SHA256 7f169d49320e6c7f60507550b8e68212fac5fb396c2bc59d4b11e75e2bc2367f
MD5 80d7b8eb89aee2e0e969f6c5347b9ca2
BLAKE2b-256 88eebe8989a06b4c747657d27541a00bdfc6961703cc42448fba1d5e33f57e47

See more details on using hashes here.
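For example, a downloaded file's SHA256 digest can be checked against the value above with Python's standard hashlib; the filename below assumes the sdist was saved into the current directory:

```python
import hashlib
import os

def sha256_of_file(path, chunk_size=8192):
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# SHA256 listed above for the source distribution
expected = '7f169d49320e6c7f60507550b8e68212fac5fb396c2bc59d4b11e75e2bc2367f'
if os.path.exists('hypercluster-0.1.5.tar.gz'):
    print(sha256_of_file('hypercluster-0.1.5.tar.gz') == expected)
```

Reading in chunks keeps memory use constant regardless of file size, which matters more for large wheels than for this small package.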

File details

Details for the file hypercluster-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: hypercluster-0.1.5-py3-none-any.whl
  • Size: 26.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for hypercluster-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 ec4a8da9fb3120f92347dcfdc11b6cb7d1ff6adf1c07ed9b501045c4ace99744
MD5 5b69d90d9ed3acd9ecd1ed8d7b8e0e84
BLAKE2b-256 19b32506a1073880ec941c3476647213504d823eaea1e2576133aa8ae770bdee

