Skip to main content

A package for automatic clustering hyperparameter optmization

Project description

Hypercluster

A package for clustering optimization with sklearn.

Requirements:

pandas
numpy
scipy
matplotlib
seaborn
scikit-learn
hdbscan

Optional: snakemake

Install

With pip:

pip install hypercluster

or with conda:

conda install hypercluster
# or
conda install -c conda-forge -c bioconda hypercluster

If you are having problems installing with conda, try changing your channel priority. Priority of conda-forge > bioconda > defaults is recommended. To check channel priority: conda config --get channels It should look like:

--add channels 'defaults'   # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge'   # highest priority

If it doesn't look like that, try:

conda config --add channels bioconda
conda config --add channels conda-forge

Docs

https://hypercluster.readthedocs.io/en/latest/index.html

It will also be useful to check out sklearn's page on clustering and evaluation metrics

Examples

https://github.com/liliblu/hypercluster/tree/dev/examples

Quickstart with SnakeMake

Default config.yml and hypercluster.smk are in the snakemake repo above.
Edit the config.yml file or arguments.

snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=. 

Example editing with python:

import yaml

with open('config.yml', 'r') as fh:
    config = yaml.load(fh)

input_data_prefix = 'test_data'
config['input_data_folder'] = os.path.abspath('.')
config['input_data_files'] = [input_data_prefix]
config['read_csv_kwargs'] = {input_data_prefix:{'index_col': [0]}}

with open('config.yml', 'w') as fh:
    yaml.dump(config, stream=fh)

Then call snakemake.

snakemake -s hypercluster.smk

Or submit the snakemake scheduler as an sbatch job e.g. with BigPurple Slurm:

module add slurm
sbatch snakemake_submit.sh

Examples for snakemake_submit.sh and cluster.json is in the scRNA-seq example.

Quickstart with python

import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster

data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')

# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
  methods = hypercluster.constants.need_ground_truth+hypercluster.constants.inherent_metrics, 
  gold_standard = labels
  )

clusterer.visualize_evaluations()

# With a range of algorithms

clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
  methods = hypercluster.constants.need_ground_truth+hypercluster.constants.inherent_metrics, 
  gold_standard = labels
  )

clusterer.visualize_evaluations()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hypercluster-0.1.13.tar.gz (18.0 kB view details)

Uploaded Source

Built Distribution

hypercluster-0.1.13-py3-none-any.whl (28.6 kB view details)

Uploaded Python 3

File details

Details for the file hypercluster-0.1.13.tar.gz.

File metadata

  • Download URL: hypercluster-0.1.13.tar.gz
  • Upload date:
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for hypercluster-0.1.13.tar.gz
Algorithm Hash digest
SHA256 f1428887c3b922932c51304687ac300b84c57a8474bf2b5cdbaa0729aec1e305
MD5 0964a7751af13619c13d8d463a7ceced
BLAKE2b-256 34622baf4903ac02ed2053e362e559dcf2730852a6298fa7cf8869afca914425

See more details on using hashes here.

File details

Details for the file hypercluster-0.1.13-py3-none-any.whl.

File metadata

  • Download URL: hypercluster-0.1.13-py3-none-any.whl
  • Upload date:
  • Size: 28.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for hypercluster-0.1.13-py3-none-any.whl
Algorithm Hash digest
SHA256 740a4e7fb1faa4af1a45824953774929b31adb575ac3c9281d36d9fdabdfdf83
MD5 5d3f3c85623ce055176ae5083fde7139
BLAKE2b-256 3257c8c886d43ffed2e15ebf54d6a2ea64293c79117107d4a3ace43182eb5475

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page