
Loaders and savers for different implementations of word embeddings.

Project description

Loaders and savers for different implementations of word embeddings. The motivation for this project is that writing a separate loader for each pretrained word embedding file format is cumbersome. This project provides a simple interface for loading pretrained word embedding files in different formats.

from word_embedding_loader import WordEmbedding

# it will automatically determine format from content
wv = WordEmbedding.load('path/to/embedding.bin')

# This project provides a minimal interface for word embeddings:
# wv.vocab maps a word to its index, and wv.vectors holds the embeddings
print(wv.vectors[wv.vocab[u'is']])

# Modify and save word embedding file with arbitrary format
wv.save('path/to/save.txt', 'word2vec', binary=False)
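
The indexing pattern above suggests that `vocab` maps each word to a row index and `vectors` holds one embedding per row. As a rough illustration of working with that layout, here is a minimal sketch using plain Python lists. The toy data and the helper names `cosine_similarity` and `similarity` are made up for this example and are not part of the package.

```python
import math

# Toy data, made up for illustration. `vocab` maps each word to a row
# index, and `vectors` holds one embedding per row (assumed layout).
vocab = {u'is': 0, u'was': 1, u'cat': 2}
vectors = [
    [1.0, 0.0, 1.0],
    [2.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word_a, word_b):
    """Look up both words through `vocab`, then compare their rows."""
    return cosine_similarity(vectors[vocab[word_a]], vectors[vocab[word_b]])
```

With a loaded embedding, the same lookup would presumably go through `wv.vocab` and `wv.vectors` instead of the toy containers.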

This project currently supports the following formats:

  • GloVe, Global Vectors for Word Representation, by Jeffrey Pennington, Richard Socher, Christopher D. Manning from Stanford NLP group.

  • word2vec, by Mikolov.
    • text (create with -binary 0 option (the default))

    • binary (create with -binary 1 option)

  • gensim's models.word2vec module (coming)

  • original HDFS format: a performance centric option for loading and saving word embedding (coming)
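
The two text formats listed above differ mainly in one detail: word2vec text files start with a header line holding the vocabulary size and the vector dimension, while GloVe text files start directly with the first entry. The sketch below shows one way a loader could tell them apart from the content. It is a heuristic written for illustration, not this project's detection logic, and the function names are hypothetical.

```python
import io


def looks_like_word2vec_header(line):
    """Heuristic: a word2vec text header is exactly two integers."""
    parts = line.split()
    return len(parts) == 2 and all(p.isdigit() for p in parts)


def load_text_embedding(stream):
    """Read a GloVe or word2vec text stream.

    Returns (format_name, vocab, vectors), where vocab maps each word
    to its row index in vectors.
    """
    vocab, vectors = {}, []
    first = stream.readline()
    is_word2vec = looks_like_word2vec_header(first)
    # For GloVe the first line is already an entry, so keep it.
    pending = [] if is_word2vec else [first]
    for line in pending + list(stream):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        vocab[parts[0]] = len(vectors)
        vectors.append([float(x) for x in parts[1:]])
    return ('word2vec' if is_word2vec else 'glove'), vocab, vectors


# Usage with in-memory data instead of real files.
sample = io.StringIO(u"2 3\nis 0.1 0.2 0.3\nwas 0.4 0.5 0.6\n")
fmt, vocab, vectors = load_text_embedding(sample)
print(fmt, vectors[vocab[u'was']])
```

The heuristic would misclassify a GloVe file whose first entry is a purely numeric word with a one-dimensional integer vector, so a real loader may need additional checks.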

Sometimes you want to combine an external program with a word embedding file of your own choice. This project also provides a simple executable for converting a word embedding file from one format to another.

# it will automatically determine the format from the content
word-embedding-loader convert -t glove test/word_embedding_loader/word2vec.bin test.bin

# Get help for command/subcommand
word-embedding-loader --help
word-embedding-loader convert --help
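
To give a feel for what a conversion like the one above involves, here is a minimal pure-Python sketch that reads a word2vec binary stream and writes GloVe-style text. It is not the code behind the `convert` command. The function names are hypothetical, and the sketch assumes little-endian 32-bit floats, which is what the original word2vec tool writes on common x86 machines.

```python
import io
import struct


def read_word2vec_binary(stream):
    """Read (word, vector) pairs from a word2vec binary stream.

    Assumed layout: an ASCII header "<vocab_size> <dim>\n", then for
    each entry the word, a space, and dim little-endian float32 values,
    usually followed by a newline.
    """
    vocab_size, dim = (int(x) for x in stream.readline().split())
    entries = []
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b'':
                raise ValueError('unexpected end of file')
            if ch == b' ':
                break
            if ch != b'\n':  # skip the newline left by the previous entry
                chars.append(ch)
        word = b''.join(chars).decode('utf-8')
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise ValueError('truncated vector for %r' % word)
        entries.append((word, struct.unpack('<%df' % dim, raw)))
    return entries


def write_glove_text(entries, stream):
    """Write one "word v1 v2 ..." line per entry, with no header line."""
    for word, vector in entries:
        values = u' '.join(repr(float(v)) for v in vector)
        stream.write(word + u' ' + values + u'\n')
```

A text stream opened for writing can be passed as `stream` to `write_glove_text`, while `read_word2vec_binary` expects a stream opened in binary mode.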

Development

This project uses Cython to build some modules, so you need Cython for development.

pip install -r requirements.txt

If the environment variable DEVELOP_WE is set, the build will try to rebuild the .pyx modules.

DEVELOP_WE=1 python setup.py test


Source Distribution

WordEmbeddingLoader-0.1.0.tar.gz (117.0 kB)


Hashes for WordEmbeddingLoader-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5248744a8e862117d2973857dbf74d934b3700ad61f7eeb452b898c0dc6b0dc5
MD5 71992d4d087a03ace07f112c265922c9
BLAKE2b-256 945ba480e365839bc20181c32e278159b3dffb948e46dcda7a56722a45121acc
