Natural language processing (NLP) utils: word embeddings (Word2Vec, GloVe, FastText, ...) and preprocessing transformers, compatible with scikit-learn Pipelines.
Zeugma
Installation
Install the package with pip: pip install zeugma
Examples
Embedding transformers can either be used with downloaded pretrained embeddings (each comes with a default download URL) or be trained.
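To illustrate what "compatible with scikit-learn Pipelines" means for an embedding transformer, here is a minimal sketch using only the standard library. It assumes that a sentence vector is obtained by averaging the vectors of its known words, which is a common aggregation but is not confirmed here as Zeugma's behaviour. The class name ToyAverageEmbedder and the two-dimensional vectors are hypothetical and made up for illustration.

```python
class ToyAverageEmbedder:
    """Hypothetical stand-in for an embedding transformer (not Zeugma's class)."""

    def __init__(self, word_vectors, dim):
        self.word_vectors = word_vectors  # mapping: word -> list of floats
        self.dim = dim                    # length of every word vector

    def fit(self, X, y=None):
        # Nothing is learned here: the vectors are given up front.
        # Returning self follows the scikit-learn fit/transform convention.
        return self

    def transform(self, X):
        rows = []
        for sentence in X:
            # Keep only the words that have a vector (assumption: unknown words are skipped).
            vectors = [
                self.word_vectors[word]
                for word in sentence.lower().split()
                if word in self.word_vectors
            ]
            if not vectors:
                # No known word: fall back to a zero vector.
                rows.append([0.0] * self.dim)
                continue
            # Component-wise mean of the word vectors.
            rows.append([sum(column) / len(vectors) for column in zip(*vectors)])
        return rows


# Toy vectors, invented for this example (not real GloVe values).
TOY_VECTORS = {"a": [1.0, 0.0], "figure": [0.0, 1.0], "speech": [1.0, 1.0]}

embedder = ToyAverageEmbedder(TOY_VECTORS, dim=2)
sentence_vectors = embedder.fit([]).transform(["a figure of speech", "unknown words"])
```

Because the object exposes fit and transform, it follows the same duck-typed interface that scikit-learn Pipelines expect from a transformer step.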
Pretrained embeddings
As an illustrative example, the cosine similarity of the sentences "what is zeugma" and "a figure of speech" is computed using the GloVe pretrained embeddings:
>>> from zeugma.embeddings import EmbeddingTransformer
>>> glove = EmbeddingTransformer('glove')
>>> embeddings = glove.transform(['what is zeugma', 'a figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.8721696
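For readers unfamiliar with the metric, the following is a minimal standard-library sketch of what a cosine similarity between two vectors computes: the dot product divided by the product of the Euclidean norms. The helper name cosine_sim and the example vectors are assumptions made for illustration; they are not part of Zeugma or scikit-learn, and the vectors are not GloVe values.

```python
import math


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined for a zero vector; this sketch returns 0.0.
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative vectors only.
same_direction = cosine_sim([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is why the metric is often used to compare sentence embeddings regardless of their magnitude.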
Training embeddings
To train your own Word2Vec embeddings, use the Gensim sklearn API (available in Gensim versions before 4.0).
Fine-tuning embeddings
Fine-tuning embeddings (training embeddings initialized with preloaded values) will be implemented in the future.
Other examples
Usage examples are present in the examples folder.
Additional examples using Zeugma can be found in some posts of my blog.
Contribute
Feel free to fork this repo and submit a Pull Request.
Development
The development workflow for this repo is the following:
create a virtual environment: python -m venv venv && source venv/bin/activate
install required packages: pip install -r requirements.txt
install the pre-commit hooks: pre-commit install
run the test suite from the root folder: pytest
Distribution via PyPI
To upload a new version to PyPI, simply:
tag your new version on git: git tag -a x.x -m "my tag message"
update the download_url field in the setup.py file
commit, push the code and the tag (git push origin x.x), and make a PR
once the updated code is present in master, run python setup.py sdist bdist_wheel from the root of the package to build the distribution files (uploading them to PyPI is a separate step, for example with twine).
Building documentation
To build the documentation locally, run make html from the docs folder.