Divisive iK-means algorithm implementation
divik
A Python implementation of the Divisive iK-means (DiviK) algorithm.
Tools within this package
- Clustering at your command line with fit-clusters
- Set of algorithm implementations for unsupervised analyses
  - Clustering
    - DiviK - hands-free clustering method with built-in feature selection
    - K-Means with Dunn method for selecting the number of clusters
    - K-Means with GAP index for selecting the number of clusters
    - Modular K-Means implementation with custom distance metrics and initializations
  - Feature extraction
    - PCA with knee-based components selection
    - Locally Adjusted RBF Spectral Embedding
  - Feature selection
    - EXIMS
    - Gaussian Mixture Model based data-driven feature selection
      - High Abundance And Variance Selector - allows you to select highly variant features above noise level, based on GMM-decomposition
    - Outlier based Selector
      - Outlier Abundance And Variance Selector - allows you to select highly variant features above noise level, based on outlier detection
    - Percentage based Selector - allows you to select highly variant features above noise level with your predefined thresholds for each
  - Sampling
    - StratifiedSampler - generates samples of fixed number of rows from given dataset
    - UniformPCASampler - generates samples of random observations within boundaries of an original dataset, and preserving the rotation of the data
    - UniformSampler - generates samples of random observations within boundaries of an original dataset
Installation
Docker
The recommended way to use this software is through Docker. It is the most convenient option if you want to use the divik application.
To install the latest stable version, use:
docker pull gmrukwa/divik
Python package
Prerequisites for installation of base package:
- Python 3.7 / 3.8 / 3.9
- a compiler capable of compiling native C code, with OpenMP support
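The Python requirement above can be checked before attempting the installation. The following is a minimal sketch; the helper name `is_supported` is hypothetical and not part of divik, and the set of versions simply mirrors the list above.

```python
import sys

# Versions listed in the prerequisites above.
SUPPORTED_VERSIONS = {(3, 7), (3, 8), (3, 9)}


def is_supported(version=None):
    """Return True if the (major, minor) pair is among the listed versions.

    `version` may be any sequence such as sys.version_info or (3, 8, 10);
    it defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "listed" if is_supported() else "not listed"
    print(f"Python {current} is {status} among the supported versions")
```

A version outside this set is not necessarily unusable; it only means the prerequisites above do not mention it.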
Installation of OpenMP for Ubuntu / Debian
It should already be installed alongside the GCC compiler. If it is not, try the following:
sudo apt-get install libgomp1
Installation of OpenMP for Mac
OpenMP is available as part of LLVM. You may need to install it with conda:
conda install -c conda-forge "compilers>=1.0.4,!=1.1.0" llvm-openmp
Installation of dependencies on Mac
You may see messages that some dependencies are invalid for the platform. This is a known bug with a workaround.
Use:
SYSTEM_VERSION_COMPAT=0 pip install divik
DiviK Installation
With the prerequisites installed, you can install the latest base version of the package:
pip install divik
If you want compatibility with gin-config, you can install the necessary extras with:
pip install divik[gin]
Note: In the zsh shell, remember to put \ before [ and ].
You can install all extras with:
pip install divik[all]
High-Volume Data Considerations
If you are using DiviK to run an analysis that might not fit into your computer's RAM, consider disabling the default parallelism and switching to dask. This is easy to achieve through configuration:
- set all parameters named n_jobs to 1;
- set all parameters named allow_dask to True.
Note: Never set n_jobs>1 and allow_dask=True at the same time. The computations will freeze because of how multiprocessing and dask handle parallelism.
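The two settings and the warning above can be applied mechanically. The following is a minimal sketch, assuming the configuration is held in a nested Python dict. That layout, the section names in the example, and the helpers `switch_to_dask`, `collect` and `has_conflict` are hypothetical and not part of divik; only the parameter names n_jobs and allow_dask come from the text above.

```python
from typing import Any, Dict, Iterator


def switch_to_dask(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with every n_jobs set to 1 and every allow_dask set to True."""
    result: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            result[key] = switch_to_dask(value)  # recurse into nested sections
        elif key == "n_jobs":
            result[key] = 1
        elif key == "allow_dask":
            result[key] = True
        else:
            result[key] = value
    return result


def collect(config: Dict[str, Any], name: str) -> Iterator[Any]:
    """Yield every value stored under `name` at any nesting level."""
    for key, value in config.items():
        if isinstance(value, dict):
            yield from collect(value, name)
        elif key == name:
            yield value


def has_conflict(config: Dict[str, Any]) -> bool:
    """True if some n_jobs is greater than 1 while some allow_dask is True."""
    parallel = any(isinstance(v, int) and v > 1 for v in collect(config, "n_jobs"))
    dask = any(v is True for v in collect(config, "allow_dask"))
    return parallel and dask


# Hypothetical configuration; the section names are made up for illustration.
example = {
    "clustering": {"n_jobs": 4, "allow_dask": False},
    "selection": {"n_jobs": 2, "inner": {"allow_dask": False}},
}
safe = switch_to_dask(example)
```

Here `safe` holds n_jobs equal to 1 and allow_dask equal to True in every section, and `has_conflict(safe)` is False. Running `has_conflict` on a configuration before starting a long analysis is a cheap guard against the freeze described in the note.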
Known Issues
Segmentation Fault
A segmentation fault can happen if the gamred_native package (part of the divik package) was compiled against a different numpy ABI than scikit-learn. This could happen if you used a different set of compilers than the developers of the scikit-learn package.
In such a case, a handler is defined to display the stack trace. If the trace comes from _matlab_legacy.py, this is most probably the issue.
To resolve the issue, consider following the installation instructions once again. The exact versions are updated to avoid the issue.
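When reporting or investigating this problem, it helps to record which versions of the involved packages are installed. The following is a minimal sketch that uses only the standard library (importlib.metadata, available from Python 3.8; Python 3.7 needs the importlib_metadata backport). The helper name `installed_versions` is hypothetical. It only lists versions and does not detect an ABI mismatch by itself.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    report = installed_versions(["divik", "numpy", "scikit-learn"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```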
Contributing
A contribution guide will be developed soon.
Format the code with:
isort -m 3 --fgw 3 --tc .
black -t py36 .
References
This software is part of the contribution made by the Data Mining Group of the Silesian University of Technology, the rest of which is published here.
File details
Details for the file divik-3.2.4.tar.gz.
File metadata
- Download URL: divik-3.2.4.tar.gz
- Upload date:
- Size: 83.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 62009c32c63c3cc15563c52c0cb6744e38a294fea8c529697cb0ff7ea5db44f3
MD5 | 1808823304afe459b24377748e886225
BLAKE2b-256 | 6e451a4e348b437fd15d46fd0e5eda2e879fd2551240e1f2931f582731b55639
File details
Details for the file divik-3.2.4-cp39-cp39-win_amd64.whl.
File metadata
- Download URL: divik-3.2.4-cp39-cp39-win_amd64.whl
- Upload date:
- Size: 113.9 kB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.9.13 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d5134940a61b2723e199ea5e8c15b71fc3cb4397b92f94bbd94c38979f850d3
MD5 | 2832b4f18a0ce3114e109b9d87d8e198
BLAKE2b-256 | 58789c4afabccb1c636604fb6ee05a022e2a706aa9c470fb6a9a3ae8de3ad5bf
File details
Details for the file divik-3.2.4-cp39-cp39-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: divik-3.2.4-cp39-cp39-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 165.6 kB
- Tags: CPython 3.9, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.9.19 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | a70483e2298f07281ae6e466e489dfa1d23c511df6bbeba4228b7a6de25ffd4d
MD5 | 99b187e79f74bd2cb6829a1fc5e2b782
BLAKE2b-256 | 0dff7145acf26536da0e1c52d27b2becf325ba84196fc95645feb87a4819be92
File details
Details for the file divik-3.2.4-cp38-cp38-win_amd64.whl.
File metadata
- Download URL: divik-3.2.4-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 113.9 kB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.8.10 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 13927ce0533479dec96170e30c338ab8fac58b8d979593065dd1be3736570b9a
MD5 | 30938916b848762e90bc3fc103b91083
BLAKE2b-256 | b8e285d08e25bebb34e6c3445c9869e474665121e6d8f393d64b2add96cc741e
File details
Details for the file divik-3.2.4-cp38-cp38-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: divik-3.2.4-cp38-cp38-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 165.9 kB
- Tags: CPython 3.8, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.8.18 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 972ab03cadcccf672c90b3fe86a9a12faf5f213d1435558df6eddc63877e3a65
MD5 | 4c5f17fb385006b9db18ed6cbd627a7e
BLAKE2b-256 | f4e7384147cba13d28318e970fe2384d50b431a2eb55516cef858a8da680da27
File details
Details for the file divik-3.2.4-cp37-cp37m-win_amd64.whl.
File metadata
- Download URL: divik-3.2.4-cp37-cp37m-win_amd64.whl
- Upload date:
- Size: 113.9 kB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.7.9 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 21c563aa5d37f685ab1282177087fd282bba4fe41a5f7b588c3619ad78dbc8f3
MD5 | d160032b7dfdd3994362e1c79642e44f
BLAKE2b-256 | 84e5b82d9df92f9c1500f280c08f523f504d94d6c02bb8c1f2e84299f5318a56
File details
Details for the file divik-3.2.4-cp37-cp37m-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: divik-3.2.4-cp37-cp37m-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 165.5 kB
- Tags: CPython 3.7m, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.7.17 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1a1f50ab2a5e03a5a99f9e5d69359b5cfa797d767d44f42ca0b82ac8b8e661aa
MD5 | 13baf1daf69fe0ef686331d6d337fda7
BLAKE2b-256 | 376422b7bc4bd3eaf3c78419951de1925b10a3b9a35821eb150d37fcaf5b6399
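A downloaded file can be checked against the SHA256 digests listed in the File hashes tables. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches` are hypothetical, and the usage comment assumes the source distribution was saved to the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected.strip().lower()


# Usage, assuming divik-3.2.4.tar.gz sits in the current directory:
# matches(
#     "divik-3.2.4.tar.gz",
#     "62009c32c63c3cc15563c52c0cb6744e38a294fea8c529697cb0ff7ea5db44f3",
# )
```

Reading in chunks keeps memory use constant regardless of file size.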