Zeugma
Unified framework for word embeddings (Word2Vec, GloVe, FastText, …) for use in machine learning pipelines, compatible with scikit-learn's Pipeline.
Installation
Install the package with pip install zeugma.
Examples
Embedding transformers can either be used with downloaded embeddings (they all come with a default embedding URL) or trained from scratch.
Pretrained embeddings
As an illustrative example, the cosine similarity between the phrases zeugma and figure of speech is computed using the EmbeddingTransformer with downloaded GloVe embeddings (the default URL is used here):
>>> from zeugma.embeddings import EmbeddingTransformer
>>> glove = EmbeddingTransformer('glove')
>>> embeddings = glove.transform(['zeugma', 'figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.32840478
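Since the transformers follow the scikit-learn fit/transform interface, they can be dropped into a Pipeline as a feature-extraction step. The sketch below shows the wiring; to keep it self-contained (the real GloVe download is large), a hypothetical toy hashing-based transformer stands in where zeugma's EmbeddingTransformer('glove') would normally go, and the classifier and data are illustrative assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ToyEmbeddingTransformer(BaseEstimator, TransformerMixin):
    """Toy stand-in for an embedding transformer: maps each text to a
    fixed-size vector by hashing its tokens into `dim` buckets."""

    def __init__(self, dim=8):
        self.dim = dim

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        out = np.zeros((len(X), self.dim))
        for i, text in enumerate(X):
            for token in text.split():
                out[i, hash(token) % self.dim] += 1.0
        return out


# In real usage, replace ToyEmbeddingTransformer() with
# EmbeddingTransformer('glove') from zeugma.embeddings.
pipeline = Pipeline([
    ("embedding", ToyEmbeddingTransformer()),
    ("classifier", LogisticRegression()),
])

texts = ["a figure of speech", "plain words here",
         "another figure of speech", "just words"]
labels = [1, 0, 1, 0]
pipeline.fit(texts, labels)
preds = pipeline.predict(texts)
```

The embedding step turns raw strings into a fixed-size numeric matrix, which is exactly what downstream scikit-learn estimators expect.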
Training embeddings
To train your own Word2Vec embeddings, use the Gensim scikit-learn API.
Fine-tuning embeddings
Embedding fine-tuning (training embeddings initialized with preloaded values) will be implemented in a future release.