A package for automatic clustering hyperparameter optimization
Project description
Hypercluster
A package for clustering optimization with sklearn.
Requirements:
pandas
numpy
scipy
matplotlib
seaborn
scikit-learn
hdbscan
Optional: snakemake
Install
With pip:
pip install hypercluster
or with conda:
conda install hypercluster
# or
conda install -c conda-forge -c bioconda hypercluster
If you are having problems installing with conda, try changing your channel priority. Priority of conda-forge > bioconda > defaults is recommended.
To check channel priority: conda config --get channels
It should look like:
--add channels 'defaults' # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge' # highest priority
If it doesn't look like that, try:
conda config --add channels bioconda
conda config --add channels conda-forge
Docs
https://hypercluster.readthedocs.io/en/latest/index.html
Examples
https://github.com/liliblu/hypercluster/tree/dev/examples
Quickstart with Snakemake
The default config.yml and hypercluster.smk are in the snakemake folder of the repo above.
Edit the config.yml file or pass arguments on the command line.
snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=.
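For orientation, a minimal config.yml sketch using only the keys referenced in this README (input_data_folder, input_data_files, read_csv_kwargs); the values here are illustrative assumptions, not the shipped defaults:

```yaml
# Hypothetical sketch of the keys referenced above; see the shipped
# config.yml for the full set of options and their real defaults.
input_data_folder: .
input_data_files:
  - test_data
read_csv_kwargs:
  test_data:
    index_col: [0]
```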
Example editing with python:
import os
import yaml

with open('config.yml', 'r') as fh:
    config = yaml.safe_load(fh)

input_data_prefix = 'test_data'
config['input_data_folder'] = os.path.abspath('.')
config['input_data_files'] = [input_data_prefix]
config['read_csv_kwargs'] = {input_data_prefix: {'index_col': [0]}}

with open('config.yml', 'w') as fh:
    yaml.dump(config, stream=fh)
Then call snakemake.
snakemake -s hypercluster.smk
Or submit the Snakemake scheduler as an sbatch job, e.g. with BigPurple Slurm:
module add slurm
sbatch snakemake_submit.sh
Examples for snakemake_submit.sh and cluster.json are in the scRNA-seq example.
Quickstart with Python
import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster
data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')
# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)
clusterer.visualize_evaluations()

# With a range of algorithms
clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)
clusterer.visualize_evaluations()
Project details
File details
Details for the file hypercluster-0.1.5.tar.gz.
File metadata
- Download URL: hypercluster-0.1.5.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7f169d49320e6c7f60507550b8e68212fac5fb396c2bc59d4b11e75e2bc2367f
MD5 | 80d7b8eb89aee2e0e969f6c5347b9ca2
BLAKE2b-256 | 88eebe8989a06b4c747657d27541a00bdfc6961703cc42448fba1d5e33f57e47
File details
Details for the file hypercluster-0.1.5-py3-none-any.whl.
File metadata
- Download URL: hypercluster-0.1.5-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | ec4a8da9fb3120f92347dcfdc11b6cb7d1ff6adf1c07ed9b501045c4ace99744
MD5 | 5b69d90d9ed3acd9ecd1ed8d7b8e0e84
BLAKE2b-256 | 19b32506a1073880ec941c3476647213504d823eaea1e2576133aa8ae770bdee