An implementation of hierarchical minibatch kmeans

Hierarchical Minibatch Kmeans

An implementation of hierarchical kmeans that uses mini-batches to improve efficiency on large datasets.

Install

pip3 install hkmeans-minibatch

Usage

$ python3 -m hkmeans_minibatch -h
usage: __main__.py [-h] -r ROOT_FEATURE_PATH -p FEATURES_PREFIX [-b BATCH_SIZE] -s SAVE_DIR -c CENTROID_DIR -hr HIERARCHIES -k CLUSTERS [-e EPOCHS]
optional arguments:
  -h, --help            show this help message and exit
  -r ROOT_FEATURE_PATH, --root-feature_path ROOT_FEATURE_PATH
                        path to folder containing all the feature files
  -p FEATURES_PREFIX, --features-prefix FEATURES_PREFIX
                        prefix that contains the desired files to read
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        batch_size to use for the minibatch kmeans
  -s SAVE_DIR, --save-dir SAVE_DIR
                        save directory for sorted hierarchical kmeans vectors
  -c CENTROID_DIR, --centroid-dir CENTROID_DIR
                        directory to save the centroids in
  -hr HIERARCHIES, --hierarchies HIERARCHIES
                        number of hierarchies to run the kmeans on
  -k CLUSTERS, --clusters CLUSTERS
                        number of clusters for each part of the hierarchy
  -e EPOCHS, --epochs EPOCHS
                        number of epochs to run the kmeans for each hierarchy
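
As a sketch of how the options above fit together, the following Python snippet assembles an invocation and prints it as a shell command. Only the flag names are taken from the help output; the helper name build_command, the paths, the prefix "feats", and every numeric value are illustrative assumptions, not defaults of the package.

```python
import shlex
import sys


def build_command(root, prefix, save_dir, centroid_dir,
                  hierarchies, clusters, batch_size=None, epochs=None):
    """Assemble an argv list for `python -m hkmeans_minibatch`.

    Hypothetical helper: flag names follow the help output, values are
    whatever the caller supplies.
    """
    cmd = [
        sys.executable, "-m", "hkmeans_minibatch",
        "-r", str(root),            # root feature directory
        "-p", str(prefix),          # common prefix of the .npy files
        "-s", str(save_dir),        # scratch directory for sorted vectors
        "-c", str(centroid_dir),    # output directory for centroids
        "-hr", str(hierarchies),    # number of hierarchies
        "-k", str(clusters),        # clusters per part of the hierarchy
    ]
    # -b and -e are shown in brackets in the usage line, so add them only if given.
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if epochs is not None:
        cmd += ["-e", str(epochs)]
    return cmd


# Example values are assumptions chosen for illustration only.
cmd = build_command(
    root="features", prefix="feats",
    save_dir="sorted", centroid_dir="centroids",
    hierarchies=2, clusters=8, batch_size=4096, epochs=5,
)
print(shlex.join(cmd))
```

To actually run the clustering, the returned list could be passed to subprocess.run(cmd, check=True) once the package is installed.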

Place all the .npy files to run kmeans over under one root feature directory; they may sit in subdirectories. The features prefix is the common prefix shared by the .npy files that should be read. For best results, set the batch size larger than the number of vectors in each .npy file. The save directory should be empty: the program fills it with sorted vectors during the run and deletes it when it has finished. The centroid directory should also be empty; all the centroids are stored there, in separate files.
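
The layout requirements above can be checked before starting a run. The snippet below is a minimal pre-flight sketch using only the standard library. It assumes that "prefix" means the file name starts with the given string, which the description suggests but does not state outright; the helper names find_feature_files and is_empty_dir and all the paths are hypothetical.

```python
import tempfile
from pathlib import Path


def find_feature_files(root, prefix):
    """Return the .npy files under root (searched recursively) whose
    file name starts with prefix. The matching rule is an assumption."""
    return sorted(p for p in Path(root).rglob("*.npy")
                  if p.name.startswith(prefix))


def is_empty_dir(path):
    """True if path is an existing directory with no entries."""
    path = Path(path)
    return path.is_dir() and not any(path.iterdir())


# Demonstrate on a throwaway tree so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "features"
    (root / "part_a").mkdir(parents=True)
    (root / "feats_0.npy").touch()              # matches the prefix
    (root / "part_a" / "feats_1.npy").touch()   # matches, in a subdirectory
    (root / "part_a" / "other_0.npy").touch()   # different prefix, skipped

    save_dir = Path(tmp) / "sorted"
    centroid_dir = Path(tmp) / "centroids"
    save_dir.mkdir()
    centroid_dir.mkdir()

    files = find_feature_files(root, "feats")
    print("feature files found:", len(files))
    print("save dir empty:", is_empty_dir(save_dir))
    print("centroid dir empty:", is_empty_dir(centroid_dir))
```

The placeholder files here are empty, so this only exercises the directory checks; real inputs would be arrays saved in .npy format.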
