

Flexi Hash Embeddings

This PyTorch Module hashes and sums variable-sized dictionaries of features into a single fixed-size embedding. Feature keys are hashed, which makes it well suited to streaming contexts and online learning: there is no need to maintain a mapping between feature keys and indices.

So for example:

>>> from flexi_hash_embedding import FlexiHashEmbedding
>>> X = [{'dog': 1, 'cat': 2, 'elephant': 4},
...      {'dog': 2, 'run': 5}]
>>> embed = FlexiHashEmbedding(dim=5)
>>> embed(X)
tensor([[ 2.5842e+00,  1.9553e+01,  1.0246e+00,  2.2797e+01,  1.7812e+01],
        [-6.2967e+00,  1.4947e+01, -2.6539e+01, -1.4348e+01, -6.7396e-01]])
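As a rough mental model, feature hashing here amounts to hashing each key into a fixed-size embedding table, scaling the looked-up vector by the feature's value, and summing the results per row. The sketch below illustrates that idea with a plain nn.Embedding and an MD5-based bucket function; it is an assumption for illustration only, and the actual FlexiHashEmbedding internals (hash function, table size, pooling) may differ.

import hashlib
import torch
import torch.nn as nn

class ToyHashEmbedding(nn.Module):
    """Illustrative sketch only: hash feature keys into a fixed-size table,
    weight each looked-up vector by the feature's value, and sum per example.
    Not the library's actual implementation."""
    def __init__(self, dim=5, n_buckets=2**20):
        super().__init__()
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)

    def _bucket(self, key):
        # Stable hash (Python's built-in hash() is salted per process).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.n_buckets

    def forward(self, X):
        rows = []
        for feats in X:
            idx = torch.tensor([self._bucket(k) for k in feats], dtype=torch.long)
            vals = torch.tensor(list(feats.values()), dtype=torch.float32).unsqueeze(1)
            rows.append((self.table(idx) * vals).sum(dim=0))
        return torch.stack(rows)

X = [{'dog': 1, 'cat': 2, 'elephant': 4},
     {'dog': 2, 'run': 5}]
print(ToyHashEmbedding(dim=5)(X).shape)  # torch.Size([2, 5])

Because keys are hashed on the fly, previously unseen features (like 'run' above) get a bucket without any vocabulary update, which is what makes the approach suitable for streaming data.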


Speed

A large batch size of 4096 rows with an average of 5 features per row comes to about 20,000 features in total. This module hashes and embeds that many features in roughly 20 ms on a modern MacBook Pro.
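The snippet below is a hypothetical way to reproduce a measurement like this on your own machine; the random-feature generator and timing loop are assumptions, and only FlexiHashEmbedding(dim=...) comes from the usage example above. Timings will vary with hardware.

import random
import string
import time

from flexi_hash_embedding import FlexiHashEmbedding

def random_key(length=8):
    # Hypothetical helper: random lowercase feature names.
    return ''.join(random.choices(string.ascii_lowercase, k=length))

# 4096 rows with ~5 features each, roughly the 20,000-feature workload above.
X = [{random_key(): random.randint(1, 5) for _ in range(5)} for _ in range(4096)]

embed = FlexiHashEmbedding(dim=5)
start = time.perf_counter()
embed(X)
print(f"forward pass: {(time.perf_counter() - start) * 1000:.1f} ms")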

Installation

To install from PyPI, run pip install flexi_hash_embedding.

To install locally, clone the repository with git clone git@github.com:cemoody/flexi_hash_embedding.git.

Testing

>>> pip install -e .
>>> py.test
