Unified framework for word embeddings (Word2Vec, GloVe, FastText, ...) compatible with scikit-learn Pipeline

Project description

A unified framework for using word embeddings (Word2Vec, GloVe, FastText, …) in machine learning pipelines, compatible with scikit-learn Pipelines.

Installation

Install the package with pip install Cython && pip install zeugma (Cython is required by the fastText package, which zeugma depends on).

Examples

Embedding transformers can be used either with downloaded pretrained embeddings (each transformer comes with a default embedding URL) or with embeddings trained on your own corpus.

Pretrained downloaded embeddings

As an illustrative example, the cosine similarity of the sentences zeugma and figure of speech is computed using the GloVeTransformer with downloaded embeddings (the default URL is used here):

>>> from zeugma.embeddings import GloVeTransformer
>>> GloVeTransformer.download_embeddings()
>>> # model_path: location of the downloaded embeddings file (not defined in this snippet)
>>> glove = GloVeTransformer(model_path)
>>> embeddings = glove.transform(['zeugma', 'figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.32840478
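
To make the last step concrete, here is a minimal pure-Python sketch of what a pairwise cosine similarity computes. The helper names cosine and pairwise_cosine and the toy vectors are illustrative assumptions, not part of zeugma or scikit-learn; in scikit-learn, cosine_similarity(embeddings) returns a square matrix whose [0, 1] entry compares the first and second rows.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosine(rows):
    """Square matrix of cosine similarities between every pair of rows."""
    return [[cosine(u, v) for v in rows] for u in rows]


# Toy 2-dimensional "sentence embeddings" (made-up values for illustration).
toy_embeddings = [[1.0, 0.0], [1.0, 1.0]]
similarities = pairwise_cosine(toy_embeddings)

# Entry [0][1] compares the first and second rows, like [0, 1] above.
print(similarities[0][1])
```

The value lies between -1 and 1: values near 1 mean the two vectors point in almost the same direction, values near 0 mean they are close to orthogonal.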

Training embeddings

Zeugma can also be used to compute the embeddings on your own corpus (composed of only two sentences here):

>>> from zeugma.embeddings import Word2VecTransformer
>>> w2v = Word2VecTransformer(trainable=True)
>>> embeddings = w2v.fit_transform(['zeugma', 'figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
-0.028218582
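
Since the transformers are described as compatible with scikit-learn Pipelines, a fit/transform embedding step can be chained with a downstream estimator. The sketch below illustrates that pattern under stated assumptions: ToyEmbeddingTransformer is a hypothetical stand-in (not a zeugma class) that assigns random vectors to words and averages them per sentence, and the texts and labels are made up. With zeugma installed, one of its transformers would presumably take the place of the stand-in.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ToyEmbeddingTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for an embedding transformer (not part of zeugma)."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.seed = seed

    def fit(self, X, y=None):
        # "Train" by assigning a random vector to every word seen in the corpus.
        rng = np.random.default_rng(self.seed)
        vocab = sorted({word for text in X for word in text.lower().split()})
        self.vectors_ = {word: rng.normal(size=self.dim) for word in vocab}
        return self

    def transform(self, X):
        # One row per sentence: the average of its known word vectors.
        out = np.zeros((len(X), self.dim))
        for i, text in enumerate(X):
            known = [self.vectors_[w] for w in text.lower().split() if w in self.vectors_]
            if known:
                out[i] = np.mean(known, axis=0)
        return out


# Made-up toy data: 1 = about rhetoric, 0 = about something else.
texts = ["zeugma", "figure of speech", "linear regression", "gradient descent"]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("embed", ToyEmbeddingTransformer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(texts, labels)
predictions = pipeline.predict(texts)
```

Keeping the embedding step inside the Pipeline means it is refitted together with the classifier, for example on each fold during cross-validation.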

Fine-tuning embeddings

Fine-tuning embeddings (training embeddings initialized with preloaded values) will be implemented in the future.

Download files

Source Distribution

zeugma-0.41.tar.gz (8.3 kB)
