Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models, in Python. The short name is `SeaQuBe` or `seaqube`. Simply pronounce it '| ˈsi: kjuːb |'.

SeaQuBe

Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models, in Python. The acronym is SeaQuBe or seaqube.

This Python framework provides several text augmentation implementations and word embedding quality evaluation methods. It is designed to fit into your machine learning pipeline. The BaseAugmentation class provides the same API as the Python package nlpaug, so that the two packages can be used together smoothly. BaseAugmentation also provides additional methods. See the detailed examples below.

SeaQuBe also provides a toolkit to wrap a trained NLP model in an interactive tool.

Features

  • Text Data Augmentation
  • Chaining and Reducing of Text Data Augmentations
  • Word Embedding Quality Methods
  • Interactive NLM Model Wrapper

Demo

Augmentation

| Level | Augmenter | Description |
|---|---|---|
| Character | QwertyAugmentation | Simulate keyboard-distance typing errors |
| Corpus | UnigramAugmentation | Replace ubiquitous words with other ubiquitous words |
| Word | Active2PassiveAugmentation | Change the surface form of a document using a simple active-to-passive transformer |
| Word | EDAAugmentation | Augment a document using the EDA algorithm |
| Word | EmbeddingAugmentation | Replace words with similar words using WordNet |
| Word | TranslationAugmentation | Change the surface form of a document using translation and back-translation (with GoogleTranslate) |
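
To make the character-level idea concrete, here is a minimal, standalone sketch of keyboard-distance noise. It does not use SeaQuBe's QwertyAugmentation; the adjacency map, the function names `qwerty_typo` and `augment_doc`, and the `error_rate` parameter are all assumptions made for illustration.

```python
import random

# Partial QWERTY adjacency map (illustrative only, not SeaQuBe's actual table).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "s": "awedxz", "t": "rfgy", "n": "bhjm", "r": "edft",
}


def qwerty_typo(token, rng, error_rate=0.1):
    """Swap each mapped character for a neighbouring key with probability error_rate."""
    chars = []
    for ch in token:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < error_rate:
            chars.append(rng.choice(neighbors))
        else:
            chars.append(ch)
    return "".join(chars)


def augment_doc(doc, seed=0, error_rate=0.1):
    """Apply keyboard noise to every token of a tokenized document."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [qwerty_typo(tok, rng, error_rate) for tok in doc]


doc = ["This", "is", "a", "tokenized", "corpus"]
print(augment_doc(doc, error_rate=0.3))
```

Token count and token lengths are preserved, so labels aligned to tokens stay valid after augmentation.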

Augmentation Chainer

The streaming feature of augmentation is implemented in the AugmentationStreamer class. One reducing class exists; more can be implemented by extending the BaseReduction class.

| Action | Class | Description |
|---|---|---|
| Streaming | AugmentationStreamer | Run each document through all chained augmentations. |
| Reducing | UniqueCorpusReduction | Given a list of documents, return only the unique ones. |
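
The chaining and reducing steps can be pictured with the following standalone sketch. It is not the AugmentationStreamer or UniqueCorpusReduction implementation; `stream`, `unique_reduction`, and the two toy augmenters are hypothetical names, and the fan-out behaviour (each augmenter returns several variants per document) is an assumption.

```python
def stream(corpus, augmenters):
    """Feed every document through each augmenter in turn, collecting all variants."""
    docs = [list(doc) for doc in corpus]
    for augment in augmenters:
        docs = [variant for doc in docs for variant in augment(doc)]
    return docs


def unique_reduction(docs):
    """Keep the first occurrence of each document, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = tuple(doc)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Two toy augmenters (hypothetical): each returns the original plus one variant.
def lowercase(doc):
    return [doc, [tok.lower() for tok in doc]]


def drop_last(doc):
    return [doc, doc[:-1]] if len(doc) > 1 else [doc]


corpus = [["This", "is", "a", "corpus"], ["this", "is", "a", "corpus"]]
augmented = stream(corpus, [lowercase, drop_last])
reduced = unique_reduction(augmented)
print(len(augmented), len(reduced))  # prints: 8 4
```

Chained augmenters tend to produce many identical documents, which is why a reduction step after streaming is useful.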

Word Embedding Evaluation

| Method | Description |
|---|---|
| WordAnalogyBenchmark | Benchmarks how well relations of the type "a is to b as c is to d" can be solved correctly. |
| WordSimilarityBenchmark | Compares the similarity of a word pair as calculated by a model with a human-estimated similarity score. |
| WordOutliersBenchmark | Benchmarks how well an outlier in a group of words can be detected. |
| SemanticWordnetBenchmark | Benchmarks the quality of the semantic similarity of an NLP model, based on the WordNet graph. |
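
As an illustration of what the first two benchmarks measure, the sketch below scores a toy embedding on analogy questions and on agreement with human similarity ratings. It is independent of SeaQuBe's benchmark classes: the vectors, the human scores, and all function names are invented for this example, and the vector-offset analogy method and Spearman rank correlation are common choices assumed here, not confirmed details of the library.

```python
import math

# Toy 2-d vectors, invented for illustration only.
VECTORS = {
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "man": (0.1, 0.8),
    "woman": (0.1, 0.2),
    "apple": (0.5, -0.9),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def solve_analogy(a, b, c, vectors):
    """Return the word closest to b - a + c, excluding the three query words."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(questions, vectors):
    """Fraction of (a, b, c, d) questions where the predicted word equals d."""
    hits = sum(solve_analogy(a, b, c, vectors) == d for a, b, c, d in questions)
    return hits / len(questions)


def ranks(values):
    """Rank values from 1 to n (assumes no ties, which holds for the toy data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


questions = [("man", "king", "woman", "queen"), ("man", "woman", "king", "queen")]
accuracy = analogy_accuracy(questions, VECTORS)

# (word1, word2, human score): scores are made up for this sketch.
pairs = [("king", "queen", 8.0), ("man", "woman", 9.0),
         ("king", "apple", 1.0), ("queen", "apple", 2.0)]
model_scores = [cosine(VECTORS[w1], VECTORS[w2]) for w1, w2, _ in pairs]
human_scores = [score for _, _, score in pairs]
rho = spearman(model_scores, human_scores)
print(accuracy, rho)
```

A higher analogy accuracy and a rank correlation closer to 1 both indicate that the embedding geometry matches human judgements more closely.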

Installation

SeaQuBe can be installed from PyPI using: pip install seaqube, or by running python setup.py install in the main directory.
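
After installing, a quick way to confirm that the package is visible to the current interpreter is a check like the one below. This is a generic sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of SeaQuBe.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("seaqube available:", is_installed("seaqube"))
```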

External Dependencies

Some external dependencies are not installed automatically. In that case seaqube or nltk may raise an error with instructions on what to do. For example, seaqube might ask you to run:

python -c "from seaqube import download;download('vec4ir')"

Quick Demo

from seaqube.augmentation.word import Active2PassiveAugmentation, EDAAugmentation, TranslationAugmentation, EmbeddingAugmentation
translate = TranslationAugmentation(max_length=2)
translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])

Setup Dev Environment

TODO
