
Zeugma

📝 Natural language processing (NLP) utils: word embeddings (Word2Vec, GloVe, FastText, …) and preprocessing transformers, compatible with scikit-learn Pipelines. 🛠

Installation

Install the package with pip install zeugma.

Examples

Embedding transformers can either be used with downloaded embeddings (they all come with a default embedding URL) or trained.

Pretrained embeddings

As an illustrative example, the cosine similarity of the sentences "what is zeugma" and "a figure of speech" is computed using the pretrained GloVe embeddings:

>>> from zeugma.embeddings import EmbeddingTransformer
>>> glove = EmbeddingTransformer('glove')
>>> embeddings = glove.transform(['what is zeugma', 'a figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.8721696
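
Because EmbeddingTransformer follows the scikit-learn transformer interface, it can be chained with any estimator in a Pipeline. A minimal sketch reusing the glove transformer from above (the toy corpus, labels, and LogisticRegression classifier are illustrative only, not part of zeugma):

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.pipeline import Pipeline
>>> corpus = ['what is zeugma', 'a figure of speech', 'a trope']
>>> labels = [1, 0, 0]
>>> pipeline = Pipeline([('embed', glove), ('clf', LogisticRegression())])
>>> _ = pipeline.fit(corpus, labels)  # raw text in: embedded, then classified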

Training embeddings

To train your own Word2Vec embeddings, use the Gensim sklearn API.
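
For example, a minimal sketch using gensim's W2VTransformer (this wrapper lives in gensim.sklearn_api in gensim 3.x and was removed in gensim 4.0, so the import is version-dependent; the tiny corpus and hyperparameters are illustrative only):

>>> from gensim.sklearn_api import W2VTransformer
>>> tokenized = [['what', 'is', 'zeugma'], ['a', 'figure', 'of', 'speech']]
>>> w2v = W2VTransformer(size=10, min_count=1, seed=42)  # train tiny 10-d vectors
>>> vectors = w2v.fit(tokenized).transform(['zeugma', 'speech'])
>>> vectors.shape
(2, 10)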

Fine-tuning embeddings

Embedding fine-tuning (training embeddings initialized with preloaded values) will be implemented in the future.
