An implementation of hierarchical minibatch kmeans
Hierarchical Minibatch Kmeans
An implementation of hierarchical kmeans that uses mini-batches for increased efficiency for large datasets.
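To make the idea concrete, here is a minimal pure-Python sketch of one common way to combine mini-batch k-means with a hierarchy. It is not taken from this package's source: the function names (`nearest`, `minibatch_kmeans`, `hierarchical_kmeans`), the 1/count learning rate, and the nested-dict output are all assumptions chosen for illustration.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def minibatch_kmeans(points, k, batch_size, epochs, rng):
    """Mini-batch k-means: each centroid moves with a learning rate of 1/count."""
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Assign the whole batch first, then update the centroids.
            labels = [nearest(p, centroids) for p in batch]
            for p, j in zip(batch, labels):
                counts[j] += 1
                eta = 1.0 / counts[j]
                centroids[j] = [(1 - eta) * c + eta * x
                                for c, x in zip(centroids[j], p)]
    return centroids


def hierarchical_kmeans(points, k, levels, batch_size, epochs, rng):
    """Cluster points into k groups, then recurse into each group."""
    if levels == 0 or len(points) < k:
        return []
    centroids = minibatch_kmeans(points, k, batch_size, epochs, rng)
    groups = [[] for _ in range(k)]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    return [
        {
            "centroid": c,
            "size": len(g),
            "children": hierarchical_kmeans(g, k, levels - 1,
                                            batch_size, epochs, rng),
        }
        for c, g in zip(centroids, groups)
    ]


# Toy data: two well-separated 2-D blobs of 50 points each.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
blob_b = [(rng.gauss(20, 1), rng.gauss(20, 1)) for _ in range(50)]
points = blob_a + blob_b
tree = hierarchical_kmeans(points, k=2, levels=2, batch_size=16, epochs=5, rng=rng)
for node in tree:
    print(node["size"], [child["size"] for child in node["children"]])
```

In this sketch, `k` plays the role of the clusters option and `levels` the role of the hierarchies option described under Usage; how the package itself stores and sorts vectors between levels may differ.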
Install
pip3 install hkmeans-minibatch
Usage
$ python3 -m hkmeans_minibatch -h
usage: __main__.py [-h] -r ROOT_FEATURE_PATH -p FEATURES_PREFIX [-b BATCH_SIZE] -s SAVE_DIR -c CENTROID_DIR -hr HIERARCHIES -k CLUSTERS [-e EPOCHS]

optional arguments:
  -h, --help            show this help message and exit
  -r ROOT_FEATURE_PATH, --root-feature_path ROOT_FEATURE_PATH
                        path to folder containing all the feature files
  -p FEATURES_PREFIX, --features-prefix FEATURES_PREFIX
                        prefix that contains the desired files to read
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        batch_size to use for the minibatch kmeans
  -s SAVE_DIR, --save-dir SAVE_DIR
                        save directory for sorted hierarchical kmeans vectors
  -c CENTROID_DIR, --centroid-dir CENTROID_DIR
                        directory to save the centroids in
  -hr HIERARCHIES, --hierarchies HIERARCHIES
                        number of hierarchies to run the kmeans on
  -k CLUSTERS, --clusters CLUSTERS
                        number of clusters for each part of the hierarchy
  -e EPOCHS, --epochs EPOCHS
                        number of epochs to run the kmeans for each hierarchy
Put all the .npy files to cluster under one root feature directory (they can be in subdirectories). For best results, make the batch size larger than the number of vectors in each .npy file. The features prefix is the common prefix of the .npy files to run kmeans over. The save directory should be an empty directory, which the program will fill with sorted vectors and delete once it has finished. The centroid directory should also be an empty directory; the centroids are written there as separate files.
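The requirements above can be checked before launching a run. The following sketch uses only the standard library and the flags from the help text. The helpers `find_feature_files` and `build_command` are hypothetical and not part of the package. Two points are assumptions: that the prefix is matched against the file name rather than the full path, and that creating a missing save or centroid directory is acceptable.

```python
import tempfile
from pathlib import Path


def find_feature_files(root, prefix):
    """Recursively list .npy files under root whose file name starts with prefix.

    Matching on the file name (not the full path) is an assumption.
    """
    return sorted(p for p in Path(root).rglob("*.npy") if p.name.startswith(prefix))


def build_command(root, prefix, save_dir, centroid_dir, hierarchies, clusters,
                  batch_size=None, epochs=None):
    """Validate the directory layout and return the argument list for the CLI."""
    if not find_feature_files(root, prefix):
        raise FileNotFoundError(f"no {prefix}*.npy files under {root}")
    for directory in (save_dir, centroid_dir):
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)  # assumption: create if missing
        if any(path.iterdir()):
            raise ValueError(f"{path} must be empty")
    cmd = ["python3", "-m", "hkmeans_minibatch",
           "-r", str(root), "-p", prefix,
           "-s", str(save_dir), "-c", str(centroid_dir),
           "-hr", str(hierarchies), "-k", str(clusters)]
    # -b and -e are optional in the help text, so only pass them when given.
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if epochs is not None:
        cmd += ["-e", str(epochs)]
    return cmd


# Demo with a throwaway layout; the .npy file here is an empty placeholder,
# whereas real feature files would hold the vectors to cluster.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "features")
    (root / "run1").mkdir(parents=True)
    (root / "run1" / "feat_000.npy").touch()
    cmd = build_command(root, "feat_", Path(tmp, "sorted"), Path(tmp, "centroids"),
                        hierarchies=2, clusters=10, batch_size=4096)
    print(" ".join(cmd))
```

With real feature files in place, the returned list can be passed to `subprocess.run(cmd, check=True)` to start the clustering.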
File details
Details for the file hkmeans_minibatch-1.0.2.tar.gz.
File metadata
- Download URL: hkmeans_minibatch-1.0.2.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/28.8.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa8bad2361fd2932e733af1562667064d181b3e98c6335fb6d4e5119d11b7750 |
| MD5 | 7fd4609d75aa7e1eeea98952f50ce9fa |
| BLAKE2b-256 | ab025f0ade2bb71ab49881c14b5716aa065bf3bbf0752939eea12b93fb7d8b76 |
File details
Details for the file hkmeans_minibatch-1.0.2-py3-none-any.whl.
File metadata
- Download URL: hkmeans_minibatch-1.0.2-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/28.8.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2b51f1b7529f8b5b3f65e1d976777b386800180b01482266b7590c416d0ceec |
| MD5 | 617d79b49ebb3dca65bc492b913c1a19 |
| BLAKE2b-256 | 2f416c0b849f6b37405d06c359f67f0884e36a3887ff2a55b05d6d92fe5f0f70 |