A biologically inspired method to create sparse, binary word vectors

Project description

FlyVec

Flybrain-inspired Sparse Binary Word Embeddings

Code based on the ICLR 2021 paper "Can a Fruit Fly Learn Word Embeddings?". This project is a work in progress.

Install

pip install flyvec

How to use

import numpy as np
from flyvec import FlyVec

model = FlyVec.from_config("../data/model_config.yaml")  # TODO: load default model on first instantiation
embed_info = model.get_sparse_embedding("market")
Loading Tokenizer...
No phraser specified. Proceeding without phrases
Loading synapses...

FlyVec uses a simple, word-based tokenizer to isolate concepts. The provided model uses a vocabulary of about 40,000 lower-cased words, with special tokens for numbers (<NUM>) and unknown words (<UNK>). See Tokenizer for details.
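To make the tokenizer's behavior concrete, here is a minimal, illustrative sketch of a word-based tokenizer that maps numbers to <NUM> and out-of-vocabulary words to <UNK>. This is not the actual FlyVec tokenizer (whose vocabulary and rules are packaged with the model); it only mimics the behavior described above.

```python
# Illustrative sketch only: NOT the real FlyVec tokenizer.
def simple_tokenize(text, vocab):
    """Lower-case, strip punctuation, and map tokens into a fixed vocabulary."""
    tokens = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        if word.isdigit():
            tokens.append("<NUM>")   # all numbers collapse to one special token
        elif word in vocab:
            tokens.append(word)
        else:
            tokens.append("<UNK>")   # out-of-vocabulary words
    return tokens

vocab = {"the", "market", "rose", "points"}
print(simple_tokenize("The market rose 500 points yesterday", vocab))
# ['the', 'market', 'rose', '<NUM>', 'points', '<UNK>']
```

The real tokenizer additionally supports an optional phraser (note the "No phraser specified" log line above), which this sketch omits.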

# Batch generate word embeddings
sentence = "Supreme Court dismissed the criminal charges."
tokens = model.tokenize(sentence)
embedding_info = [model.get_sparse_embedding(t) for t in tokens]
embeddings = np.array([e['embedding'] for e in embedding_info])
print("TOKENS: ", [e['token'] for e in embedding_info])
print("EMBEDDINGS: ", embeddings)
TOKENS:  ['supreme', 'court', 'dismissed', 'the', 'criminal', 'charges']
EMBEDDINGS:  [[0 1 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 1 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 1 0]
 [0 0 0 ... 0 1 0]]
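Because the embeddings are sparse binary vectors, word similarity can be measured by the overlap of active bits rather than cosine distance. The sketch below uses Jaccard similarity on toy vectors; it is an assumption about usage, not part of the FlyVec API. In practice you would pass the arrays returned by model.get_sparse_embedding(...)["embedding"].

```python
import numpy as np

def binary_similarity(a, b):
    """Jaccard similarity between two binary vectors: |a AND b| / |a OR b|."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Toy binary vectors standing in for FlyVec embeddings
a = np.array([0, 1, 1, 0, 1, 0])
b = np.array([0, 1, 0, 0, 1, 1])
print(binary_similarity(a, b))  # 0.5 (2 shared active bits / 4 active overall)
```

Jaccard is a natural choice here because it ignores the (many) shared zero positions, which dominate any sparse representation.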


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flyvec-0.0.3.tar.gz (15.0 kB)

Uploaded Source

Built Distribution

flyvec-0.0.3-py3-none-any.whl (13.6 kB)

Uploaded Python 3
