Divisive iK-means algorithm implementation

divik

Python implementation of the Divisive iK-means (DiviK) algorithm.

Tools within this package

  • Clustering at your command line with fit-clusters
  • Set of algorithm implementations for unsupervised analyses
    • Clustering
      • DiviK - hands-free clustering method with built-in feature selection
      • K-Means with Dunn method for selecting the number of clusters
      • K-Means with GAP index for selecting the number of clusters
      • Modular K-Means implementation with custom distance metrics and initializations
    • Feature extraction
      • PCA with knee-based components selection
      • Locally Adjusted RBF Spectral Embedding
    • Feature selection
      • EXIMS
      • Gaussian Mixture Model based data-driven feature selection
        • High Abundance And Variance Selector - allows you to select highly variant features above noise level, based on GMM-decomposition
      • Outlier based Selector
        • Outlier Abundance And Variance Selector - allows you to select highly variant features above noise level, based on outlier detection
      • Percentage based Selector - allows you to select highly variant features above noise level with your predefined thresholds for each
    • Sampling
      • StratifiedSampler - generates samples with a fixed number of rows from a given dataset
      • UniformPCASampler - generates samples of random observations within the boundaries of an original dataset, while preserving the rotation of the data
      • UniformSampler - generates samples of random observations within the boundaries of an original dataset
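
To make the UniformSampler entry above more concrete, here is a minimal pure-Python sketch of drawing random observations within the boundaries of a dataset. It is an illustration of the idea only, not divik's implementation: the function name `uniform_sample_within_bounds` and its parameters are hypothetical, and it assumes that "boundaries" means the per-feature minimum and maximum.

```python
import random
from typing import List, Sequence


def uniform_sample_within_bounds(
    data: Sequence[Sequence[float]], n_rows: int, seed: int = 0
) -> List[List[float]]:
    """Draw n_rows random observations inside the per-feature bounds of data.

    Hypothetical helper for illustration; not part of the divik API.
    """
    if not data:
        raise ValueError("data must contain at least one row")
    rng = random.Random(seed)
    # The per-feature minimum and maximum define the bounding box of the dataset.
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # Each synthetic row draws every feature independently and uniformly.
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        for _ in range(n_rows)
    ]


data = [[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]]
sample = uniform_sample_within_bounds(data, n_rows=5)
```

Every row of `sample` has the same number of features as `data`, and each value stays between the minimum and maximum of its column.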

Installation

Docker

The recommended way to use this software is through Docker. It is the most convenient option if you want to use the divik application.

To install the latest stable version, use:

docker pull gmrukwa/divik

Python package

Prerequisites for installation of base package:

  • Python 3.7 / 3.8 / 3.9
  • a compiler capable of compiling native C code, with OpenMP support
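
A quick way to confirm the Python prerequisite before installing is to compare the running interpreter against the versions listed above. This is only a sketch: the `SUPPORTED` set and the `is_supported` helper are hypothetical names introduced here, and the check does not cover the compiler or OpenMP.

```python
import sys
from typing import Sequence

# Versions listed in the prerequisites above, as (major, minor) pairs.
SUPPORTED = {(3, 7), (3, 8), (3, 9)}


def is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version is one of the listed ones."""
    return (version_info[0], version_info[1]) in SUPPORTED


if not is_supported():
    print(
        "Python %d.%d is not among the listed versions" % sys.version_info[:2],
        file=sys.stderr,
    )
```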

Installation of OpenMP for Ubuntu / Debian

It should already be installed along with the GCC compiler; if it is not, try the following:

sudo apt-get install libgomp1

Installation of OpenMP for Mac

OpenMP is available as part of LLVM. You may need to install it with conda:

conda install -c conda-forge "compilers>=1.0.4,!=1.1.0" llvm-openmp

DiviK Installation

With the prerequisites installed, you can install the latest base version of the package:

pip install divik

If you want compatibility with gin-config, you can install the necessary extras with:

pip install divik[gin]

Note: In the zsh shell, remember to put \ before [ and ].

You can install all extras with:

pip install divik[all]

High-Volume Data Considerations

If you are using DiviK to run an analysis that may not fit into your computer's RAM, consider disabling the default parallelism and switching to dask. This is easy to achieve through configuration:

  • set all parameters named n_jobs to 1;
  • set all parameters named allow_dask to True.

Note: Never set n_jobs>1 and allow_dask=True at the same time; the computations will freeze because of how multiprocessing and dask handle parallelism.
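
The two rules above can be applied mechanically to a configuration before it is used. The sketch below assumes the configuration is available as a nested Python dict; that layout, the helper names `use_dask` and `find_conflicts`, and the keys `step_a`, `step_b` and `other_option` are all hypothetical and not part of divik. The conflict check is deliberately stricter than the note: it flags any n_jobs other than 1, on the assumption that negative values follow the joblib convention of "use all cores".

```python
from typing import Any, Dict, List


def use_dask(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with every n_jobs set to 1 and every allow_dask set to True."""
    result: Dict[str, Any] = {}
    for key, value in params.items():
        if isinstance(value, dict):
            result[key] = use_dask(value)  # recurse into nested sections
        elif key == "n_jobs":
            result[key] = 1
        elif key == "allow_dask":
            result[key] = True
        else:
            result[key] = value
    return result


def find_conflicts(params: Dict[str, Any], prefix: str = "") -> List[str]:
    """List the sections that combine allow_dask=True with n_jobs other than 1."""
    conflicts: List[str] = []
    if params.get("allow_dask") is True and params.get("n_jobs", 1) != 1:
        conflicts.append(prefix or "<root>")
    for key, value in params.items():
        if isinstance(value, dict):
            child = f"{prefix}.{key}" if prefix else key
            conflicts.extend(find_conflicts(value, child))
    return conflicts


# Hypothetical configuration; the section and option names are placeholders.
config = {
    "step_a": {"n_jobs": -1, "allow_dask": False, "other_option": 2},
    "step_b": {"n_jobs": 4, "allow_dask": False},
}
safe = use_dask(config)
```

Running `find_conflicts` on a configuration before starting a long computation gives an early warning instead of a frozen process.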

Known Issues

Segmentation Fault

This can happen if the gamred_native package (part of the divik package) was compiled against a different numpy ABI than scikit-learn, for example when you used a different set of compilers than the developers of the scikit-learn package.

In such a case, a handler is defined to display the stack trace. If the trace comes from _matlab_legacy.py, this is most probably the issue.

To resolve the issue, consider following the installation instructions once again. The exact versions get updated to avoid the issue.
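
When reporting or investigating such a crash, it can help to record which versions of the involved packages are installed and to make sure a Python-level traceback is printed on a segmentation fault. The sketch below uses only the standard library. The helper names are hypothetical, and using `faulthandler` here is an assumption for illustration; it is not a statement about how divik's own handler is implemented.

```python
import faulthandler
import sys

try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 has no importlib.metadata in the stdlib
    PackageNotFoundError = version = None


def installed_versions(packages=("numpy", "scikit-learn", "divik")):
    """Map each package name to its installed version, or None if unknown."""
    found = {}
    for name in packages:
        if version is None:
            found[name] = None
            continue
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def enable_crash_traces() -> bool:
    """Ask Python to dump a traceback on SIGSEGV; return False if unavailable."""
    try:
        faulthandler.enable()
    except (RuntimeError, ValueError, AttributeError, OSError):
        # stderr may be missing or lack a file descriptor (e.g. when captured).
        return False
    return True


enable_crash_traces()
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'unknown or not installed'}")
```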

Contributing

Contribution guide will be developed soon.

Format the code with:

isort -m 3 --fgw 3 --tc .
black -t py36 .

References

This software is part of the contribution made by the Data Mining Group of the Silesian University of Technology, the rest of which is published here.

Project details


This version

3.2.1
