
Natural language processing (NLP) utils: word embeddings (Word2Vec, GloVe, FastText, ...) and preprocessing transformers, compatible with scikit-learn Pipelines.

Zeugma

📝 Natural language processing (NLP) utils: word embeddings (Word2Vec, GloVe, FastText, …) and preprocessing transformers, compatible with scikit-learn Pipelines. 🛠 Check the documentation for more information.

Installation

Install the package with pip install zeugma.

Examples

Embedding transformers can be used either with downloaded pretrained embeddings (each comes with a default embedding URL) or with embeddings you train yourself.

Pretrained embeddings

As an illustrative example, the cosine similarity of the sentences "what is zeugma" and "a figure of speech" is computed using the GloVe pretrained embeddings:

>>> from zeugma.embeddings import EmbeddingTransformer
>>> glove = EmbeddingTransformer('glove')
>>> embeddings = glove.transform(['what is zeugma', 'a figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.8721696
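
To make the similarity score above easier to interpret, here is a minimal, dependency-free sketch of the underlying computation. It assumes that a sentence embedding is the average of its word vectors, which is an assumption about the aggregation and may not match the transformer's actual configuration. The toy_vectors values are made up for illustration and are not real GloVe vectors; sentence_vector and cosine are hypothetical helper names, not part of Zeugma.

```python
import math

# Made-up 3-dimensional word vectors (NOT real GloVe values).
toy_vectors = {
    "what": [0.2, 0.1, 0.0],
    "is": [0.1, 0.3, 0.1],
    "zeugma": [0.0, 0.4, 0.6],
    "a": [0.1, 0.1, 0.0],
    "figure": [0.0, 0.5, 0.5],
    "of": [0.2, 0.2, 0.1],
    "speech": [0.1, 0.3, 0.7],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in a sentence (assumed aggregation)."""
    known = [vectors[word] for word in sentence.lower().split() if word in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


similarity = cosine(
    sentence_vector("what is zeugma", toy_vectors),
    sentence_vector("a figure of speech", toy_vectors),
)
print(similarity)
```

A value close to 1 means the two sentence vectors point in nearly the same direction; a value close to 0 means they are nearly orthogonal.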

Training embeddings

To train your own Word2Vec embeddings, use the Gensim sklearn API.

Fine-tuning embeddings

Fine-tuning embeddings (training embeddings initialized with preloaded values) will be implemented in the future.

Other examples

Usage examples are available in the examples folder.

Additional examples using Zeugma can be found in some posts on my blog.

Contribute

Feel free to fork this repo and submit a Pull Request.

Development

The development workflow for this repo is the following:

  1. create a virtual environment: python -m venv venv && source venv/bin/activate

  2. install required packages: pip install -r requirements.txt

  3. install the pre-commit hooks: pre-commit install

  4. install the package itself in editable mode: pip install -e .

  5. run the test suite from the root folder: pytest

Distribution via PyPI

To upload a new version to PyPI, simply:

  1. tag your new version on git: git tag -a x.x -m "my tag message"

  2. update the download_url field in the setup.py file

  3. commit, push the code and the tag (git push origin x.x), and make a PR

  4. make sure you have a .pypirc file structured like this in your home folder (you can use https://upload.pypi.org/legacy/ for the URL field)

  5. once the updated code is present in master, run python setup.py sdist && twine upload dist/* from the root of the package to distribute it
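
The .pypirc example referenced in step 4 is not reproduced in this text. As a rough sketch only, the snippet below generates the commonly used layout with Python's configparser. The section and key names follow the usual .pypirc conventions and are an assumption about what the original example showed; the username and password values are placeholders to replace with your own credentials.

```python
import configparser
import io

# Assumed layout of a typical .pypirc file (not taken from the original example).
config = configparser.ConfigParser()
config["distutils"] = {"index-servers": "pypi"}
config["pypi"] = {
    # URL suggested in step 4 for the URL field.
    "repository": "https://upload.pypi.org/legacy/",
    # Placeholder credentials: replace with your own.
    "username": "<your-username>",
    "password": "<your-password>",
}

# Render to a string so the result can be inspected before saving it
# as ~/.pypirc by hand.
buffer = io.StringIO()
config.write(buffer)
pypirc_text = buffer.getvalue()
print(pypirc_text)
```

The snippet deliberately prints the content instead of writing to the home folder, so that an existing .pypirc is never overwritten.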

Building documentation

To build the documentation locally, run make html from the docs folder.

Bonus: what’s a zeugma?

It’s a figure of speech: “The act of using a word, particularly an adjective or verb, to apply to more than one noun when its sense is appropriate to only one.” (from Wiktionary).

For example, “He lost his wallet and his mind.” is a zeugma.
