
Divisive iK-means algorithm implementation



divik

Python implementation of the Divisive iK-means (DiviK) algorithm.

Tools within this package

  • Clustering at your command line with fit-clusters
  • Set of algorithm implementations for unsupervised analyses
    • Clustering
      • DiviK - hands-free clustering method with built-in feature selection
      • K-Means with Dunn method for selecting the number of clusters
      • K-Means with GAP index for selecting the number of clusters
      • Modular K-Means implementation with custom distance metrics and initializations
    • Feature extraction
      • PCA with knee-based components selection
      • Locally Adjusted RBF Spectral Embedding
    • Feature selection
      • EXIMS
      • Gaussian Mixture Model based data-driven feature selection
        • High Abundance And Variance Selector - allows you to select highly variant features above noise level, based on GMM-decomposition
      • Outlier based Selector
        • Outlier Abundance And Variance Selector - allows you to select highly variant features above noise level, based on outlier detection
      • Percentage based Selector - allows you to select highly variant features above noise level with your predefined thresholds for each
    • Sampling
      • StratifiedSampler - generates samples with a fixed number of rows from a given dataset
      • UniformPCASampler - generates samples of random observations within the boundaries of an original dataset, while preserving the rotation of the data
      • UniformSampler - generates samples of random observations within the boundaries of an original dataset
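
To make the sampler descriptions above more concrete, here is a minimal standard-library sketch of the idea behind drawing random observations within the boundaries of a dataset. It is not divik's implementation and does not use the divik API. The function name `uniform_sample_within_bounds` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def uniform_sample_within_bounds(rows, n_samples, seed=None):
    """Draw random observations inside the per-feature range of `rows`.

    Hypothetical helper illustrating the idea only; it does not use or
    reproduce divik's UniformSampler.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)
    # Per-feature lower and upper boundaries of the original dataset.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # Each feature is drawn independently and uniformly within its range.
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        for _ in range(n_samples)
    ]


data = [[0.0, 10.0], [1.0, 12.0], [0.5, 11.0]]
samples = uniform_sample_within_bounds(data, n_samples=5, seed=0)
```

Every generated row has the same number of features as the input, and each value stays between the minimum and maximum observed for that feature.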

Installation

Docker

The recommended way to use this software is through Docker, which is the most convenient option if you want to use the divik application.

To install the latest stable version, use:

docker pull gmrukwa/divik

Python package

Prerequisites for installing the base package:

  • Python 3.7 / 3.8 / 3.9
  • a compiler capable of compiling the native C code, with OpenMP support

Installation of OpenMP for Ubuntu / Debian

It should already be installed together with the GCC compiler. If it is not, try the following:

sudo apt-get install libgomp1

Installation of OpenMP for Mac

OpenMP is available as part of LLVM. You may need to install it with conda:

conda install -c conda-forge "compilers>=1.0.4,!=1.1.0" llvm-openmp

Installation of dependencies on Mac

You may see messages that some dependencies are invalid for the platform. This is a known bug with a workaround.

Use:

SYSTEM_VERSION_COMPAT=0 pip install divik

DiviK Installation

With the prerequisites installed, you can install the latest base version of the package:

pip install divik

If you want compatibility with gin-config, you can install the necessary extras with:

pip install divik[gin]

Note: In the zsh shell, remember to put \ before [ and ], because zsh otherwise treats the brackets as a glob pattern.

You can install all the extras with:

pip install divik[all]
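
If you install from a Python script rather than an interactive shell, the bracket-escaping problem from the note above can be avoided entirely. The sketch below builds the pip command as an argument list, which bypasses shell globbing, and also shows a quoted form that is safe to paste into a POSIX shell. The helper names `pip_install_argv` and `shell_safe` are hypothetical and not part of divik or pip.

```python
import shlex
import sys


def pip_install_argv(package, extras=()):
    """Build an argument list for installing `package` with optional extras."""
    requirement = package
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    # Running pip through the current interpreter avoids PATH mix-ups.
    return [sys.executable, "-m", "pip", "install", requirement]


def shell_safe(argv):
    """Quote each argument so a POSIX shell passes it through literally."""
    return " ".join(shlex.quote(arg) for arg in argv)


argv = pip_install_argv("divik", extras=["gin"])
# Print a copy-paste friendly command with the brackets quoted.
print(shell_safe(["pip", "install", argv[-1]]))
# To actually install, pass the list form, which never touches the shell:
#   import subprocess; subprocess.run(argv, check=True)
```

The actual installation call is left commented out so that running the sketch has no side effects.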

High-Volume Data Considerations

If you are using DiviK to run an analysis that may not fit into the RAM of your computer, consider disabling the default parallelism and switching to dask. This is easy to achieve through configuration:

  • set all parameters named n_jobs to 1;
  • set all parameters named allow_dask to True.

Note: Never set n_jobs>1 and allow_dask=True at the same time. The computations will freeze because of how multiprocessing and dask handle parallelism.
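
The two configuration rules and the warning above can be sketched as a small helper that works on a plain mapping of parameter names to values. This is an illustration under assumptions, not part of divik: the helper names are hypothetical, nested parameters are assumed to follow the scikit-learn style `step__param` naming, and any `n_jobs` value other than `None` or `1` (including `-1`) is treated as requesting multiprocessing.

```python
def high_volume_overrides(params):
    """Return overrides that set every n_jobs to 1 and every allow_dask to True.

    `params` maps parameter names to values; nested names are assumed to
    use the `step__param` convention (an assumption, not a divik guarantee).
    """
    overrides = {}
    for name in params:
        leaf = name.rsplit("__", 1)[-1]
        if leaf == "n_jobs":
            overrides[name] = 1
        elif leaf == "allow_dask":
            overrides[name] = True
    return overrides


def has_parallelism_conflict(params):
    """True if any n_jobs requests multiprocessing while any allow_dask is on."""
    leaves = [(name.rsplit("__", 1)[-1], value) for name, value in params.items()]
    multiprocess = any(
        leaf == "n_jobs" and value not in (None, 1) for leaf, value in leaves
    )
    dask = any(leaf == "allow_dask" and value is True for leaf, value in leaves)
    return multiprocess and dask


# Hypothetical parameter mapping, for illustration only.
params = {
    "n_jobs": -1,
    "allow_dask": False,
    "kmeans__n_jobs": 4,
    "kmeans__allow_dask": False,
}
safe = {**params, **high_volume_overrides(params)}
```

If an estimator follows the scikit-learn `get_params` / `set_params` convention, the same overrides could be applied with `estimator.set_params(**high_volume_overrides(estimator.get_params()))`. Whether a given divik object supports this is an assumption to verify against its documentation.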

Known Issues

Segmentation Fault

It can happen if the gamred_native package (part of the divik package) was compiled against a different numpy ABI than scikit-learn. This could happen if you used a different set of compilers than the developers of the scikit-learn package.

In such a case, a handler is defined to display the stack trace. If the trace comes from _matlab_legacy.py, this is most probably the issue.

To resolve the issue, consider following the installation instructions once again. The exact versions get updated to avoid the issue.
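
When diagnosing this issue, it can help to record which versions of the relevant packages are installed without importing them, since importing a mismatched native module may itself crash. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the default list of distributions is an assumption based on the packages named above.

```python
try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:  # Python 3.7 requires the importlib-metadata backport
    import importlib_metadata


def installed_versions(distributions=("divik", "numpy", "scikit-learn")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            # Reads package metadata only; the package itself is not imported.
            versions[name] = importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def trace_mentions_matlab_legacy(trace_text):
    """Check whether a captured stack trace passes through _matlab_legacy.py."""
    return "_matlab_legacy.py" in trace_text


report = installed_versions()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this report before and after reinstalling shows whether the dependency versions actually changed.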

Contributing

A contribution guide will be developed soon.

Format the code with:

isort -m 3 --fgw 3 --tc .
black -t py36 .

References

This software is part of a contribution made by the Data Mining Group of the Silesian University of Technology, the rest of which is published here.


This version

3.2.4
