Flexi Hash Embeddings
This PyTorch module hashes and sums variably-sized dictionaries of features into a single fixed-size embedding. Feature keys are hashed, which is ideal for streaming and online-learning settings: there is no need to maintain a mapping between feature keys and embedding indices.
So for example:
>>> X = [{'dog': 1, 'cat': 2, 'elephant': 4},
...      {'dog': 2, 'run': 5}]
>>> from flexi_hash_embedding import FlexiHashEmbedding
>>> embed = FlexiHashEmbedding(dim=5)
>>> embed(X)
tensor([[ 2.5842e+00, 1.9553e+01, 1.0246e+00, 2.2797e+01, 1.7812e+01],
[-6.2967e+00, 1.4947e+01, -2.6539e+01, -1.4348e+01, -6.7396e-01]])
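Under the hood, the idea is to hash each string key into a row of a fixed-size embedding table and sum the rows for each example; one plausible reading of the dictionary values is that each value scales its key's embedding. Below is a minimal sketch of that idea. The class name, table size, value weighting, and use of Python's built-in hash are assumptions for illustration, not the library's exact implementation:

import torch
import torch.nn as nn

class HashEmbeddingSketch(nn.Module):
    # Sketch: hash string keys into rows of a fixed embedding table,
    # weight each row by the feature's value, and sum per example.
    def __init__(self, dim=5, n_buckets=2**20):
        super().__init__()
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)

    def forward(self, batch):
        rows = []
        for feats in batch:
            # Python's hash() is salted per process for strings
            # (PYTHONHASHSEED), so a stable hash (e.g. from hashlib)
            # would be needed for persistence in practice.
            idx = torch.tensor([hash(k) % self.n_buckets for k in feats])
            vals = torch.tensor([float(v) for v in feats.values()]).unsqueeze(1)
            rows.append((self.table(idx) * vals).sum(dim=0))
        return torch.stack(rows)

Because keys are hashed rather than looked up in a vocabulary, feature names never seen during training still map to embedding rows, at the cost of occasional hash collisions.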
Speed
A large batch of 4096 rows, averaging 5 features per row, contains about 20,000 features in total. This module hashes and embeds that many features in roughly 20 ms on a modern MacBook Pro.
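A rough way to reproduce this kind of timing is sketched below; the random feature generation is made up for the benchmark, and absolute numbers will vary by machine:

import random
import string
import time
from flexi_hash_embedding import FlexiHashEmbedding

# 4096 rows with ~5 random 8-character keys each: about 20,000 features.
def random_key():
    return ''.join(random.choices(string.ascii_lowercase, k=8))

X = [{random_key(): random.random() for _ in range(5)} for _ in range(4096)]

embed = FlexiHashEmbedding(dim=5)
start = time.perf_counter()
embed(X)
print(f"embedded ~{4096 * 5} features in {(time.perf_counter() - start) * 1e3:.1f} ms")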
Installation
Install from PyPI with:

pip install flexi_hash_embedding

Or install locally from source:

git clone git@github.com:cemoody/flexi_hash_embedding.git
Testing
$ pip install -e .
$ py.test