Zeugma
📝 Natural language processing (NLP) utils: word embeddings (Word2Vec, GloVe, FastText, …) and preprocessing transformers, compatible with scikit-learn Pipelines. 🛠
Installation
Install package with pip install zeugma.
Examples
Embedding transformers can either be used with downloaded pretrained embeddings (each comes with a default embedding URL) or trained on your own corpus.
Pretrained embeddings
As an illustrative example, the cosine similarity of the sentences "what is zeugma" and "a figure of speech" is computed using the pretrained GloVe embeddings:
>>> from zeugma.embeddings import EmbeddingTransformer
>>> glove = EmbeddingTransformer('glove')
>>> embeddings = glove.transform(['what is zeugma', 'a figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.8721696
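Because the transformers follow the scikit-learn fit/transform API, they can be dropped straight into a Pipeline. The sketch below illustrates the pattern using only scikit-learn: HashingVectorizer is a stand-in for an EmbeddingTransformer (assumed not installed here), and the toy corpus and labels are made up for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

# Any object implementing fit/transform can occupy the first Pipeline step;
# a zeugma EmbeddingTransformer would slot in exactly where the
# HashingVectorizer (a stand-in, so the sketch runs offline) is now.
pipeline = Pipeline([
    ("embed", HashingVectorizer(n_features=256)),
    ("clf", LogisticRegression()),
])

# Hypothetical toy data: four short documents with binary labels.
corpus = ["what is zeugma", "a figure of speech",
          "a text processing library", "word embeddings"]
labels = [1, 1, 0, 0]

pipeline.fit(corpus, labels)
predictions = pipeline.predict(corpus)
```

The whole pipeline can then be cross-validated or grid-searched as a single estimator, which is the main benefit of the scikit-learn compatibility.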
Training embeddings
To train your own Word2Vec embeddings, use the Gensim sklearn API.
Fine-tuning embeddings
Fine-tuning embeddings (training embeddings initialized with preloaded values) will be implemented in the future.