Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. The short name is `SeaQuBe` or `seaqube`. Simply call it '| ˈsi: kjuːb |'.
SeaQuBe
Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. The acronym is SeaQuBe or seaqube.
This Python framework provides several text augmentation implementations and word embedding quality evaluation methods. It is designed to fit into your machine learning pipeline. The BaseAugmentation class provides the same API as the Python package nlpaug, so the two packages can be used together smoothly. BaseAugmentation also provides additional methods. See the detailed examples below.
SeaQuBe also provides a toolkit that wraps a trained NLP model in an interactive tool.
Features
- Text Data Augmentation
- Chaining and Reducing of Text Data Augmentations
- Word Embedding Quality Methods
- Interactive NLM Model Wrapper
Demo
- Augmentation in three lines
- Example of Basic Text Augmentation
- Example of Text Augmentation Chaining
- Example of Word Embedding Evaluation
- Example of Interactive NLP
Augmentation
| Level | Augmenter | Description |
|---|---|---|
| Character | QwertyAugmentation | Simulate keyboard distance error |
| Corpus | UnigramAugmentation | Replace ubiquitous words with other ubiquitous words |
| Word | Active2PassiveAugmentation | Change surface of document using a simple active-to-passive transformer |
| Word | EDAAugmentation | Augment document using the EDA algorithm |
| Word | EmbeddingAugmentation | Replace words with similar words using WordNet |
| Word | TranslationAugmentation | Change surface of document using translation and back-translation (with GoogleTranslate) |
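To make the character-level idea concrete, here is a minimal sketch of a keyboard-distance error in plain Python. It does not use seaqube and is not the QwertyAugmentation implementation. The neighbour map, the `qwerty_typo` function, and the `replace_prob` parameter are all hypothetical names chosen for illustration.

```python
import random

# Hypothetical, partial neighbour map for a QWERTY layout (not seaqube's data).
QWERTY_NEIGHBOURS = {
    "a": "qwsz",
    "e": "wsdr",
    "i": "ujko",
    "o": "iklp",
    "s": "awedxz",
    "t": "rfgy",
}


def qwerty_typo(token, rng, replace_prob=0.3):
    """Return a copy of `token` with some characters swapped for adjacent keys."""
    chars = []
    for ch in token:
        neighbours = QWERTY_NEIGHBOURS.get(ch.lower())
        # Only characters present in the map can be replaced.
        if neighbours and rng.random() < replace_prob:
            chars.append(rng.choice(neighbours))
        else:
            chars.append(ch)
    return "".join(chars)


rng = random.Random(42)  # seeded so repeated runs produce the same noise
doc = ["this", "is", "a", "tokenized", "corpus"]
noisy_doc = [qwerty_typo(token, rng) for token in doc]
print(noisy_doc)
```

The token count and token lengths stay unchanged; only individual characters are perturbed, which is the kind of surface noise a character-level augmenter is meant to simulate.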
Augmentation Chainer
The streaming feature of augmentation is implemented in the AugmentationStreamer class. One reducing class exists; more can be implemented by
extending the BaseReduction class.
| Action | Class | Description |
|---|---|---|
| Streaming | AugmentationStreamer | Run augmentation for each document through all chained augmentations. |
| Reducing | UniqueCorpusReduction | Given a list of documents, return only the unique documents. |
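The streaming and reducing steps can be sketched in plain Python as follows. This is an illustration of the concept only and does not call the AugmentationStreamer or UniqueCorpusReduction classes. The helper names (`stream_augmentations`, `unique_corpus`, and the two toy augmentations) are hypothetical, and keeping each input document alongside its variants is an assumption of this sketch.

```python
def lowercase(doc):
    """Toy augmentation: return one variant with all tokens lowercased."""
    return [[token.lower() for token in doc]]


def swap_first_two(doc):
    """Toy augmentation: return one variant with the first two tokens swapped."""
    if len(doc) < 2:
        return [list(doc)]
    return [[doc[1], doc[0]] + list(doc[2:])]


def stream_augmentations(corpus, augmentations):
    """Pass every document through each augmentation in turn.

    Assumption of this sketch: the input document is kept next to its variants.
    """
    current = [list(doc) for doc in corpus]
    for augment in augmentations:
        next_docs = []
        for doc in current:
            next_docs.append(doc)
            next_docs.extend(augment(doc))
        current = next_docs
    return current


def unique_corpus(corpus):
    """Drop duplicate documents while preserving first-seen order."""
    seen = set()
    unique = []
    for doc in corpus:
        key = tuple(doc)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


corpus = [["This", "is", "a", "corpus"], ["this", "is", "a", "corpus"]]
streamed = stream_augmentations(corpus, [lowercase, swap_first_two])
reduced = unique_corpus(streamed)
print(len(streamed), len(reduced))
```

Chaining multiplies the number of documents quickly and tends to produce duplicates, which is why a reducing step after streaming is useful.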
Word Embedding Evaluation
| Method | Description |
|---|---|
| WordAnalogyBenchmark | Benchmarks how well relations of the type "a is to b as c is to d" can be solved. |
| WordSimilarityBenchmark | Compares the similarity of a word pair, as calculated by a model, with a human-estimated similarity score. |
| WordOutliersBenchmark | Benchmarks how well an outlier in a group of words can be detected. |
| SemanticWordnetBenchmark | Benchmarks the quality of an NLP model's semantic similarity against the WordNet graph. |
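As a rough illustration of what the analogy and outlier benchmarks measure, the following sketch scores a tiny, invented set of 2-d vectors using cosine similarity. It does not use the seaqube benchmark classes. The vector-offset rule for analogies and the lowest-mean-similarity rule for outliers are common approaches assumed here, not necessarily the ones seaqube implements, and all names are hypothetical.

```python
import math

# Toy 2-d vectors, invented for illustration only.
VECTORS = {
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "man": (0.5, 0.9),
    "woman": (0.5, 0.3),
    "apple": (-0.8, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def solve_analogy(a, b, c):
    """Answer "a is to b as c is to ?" with the vector offset b - a + c."""
    target = tuple(vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


def analogy_accuracy(quadruples):
    """Fraction of (a, b, c, d) quadruples where the predicted word equals d."""
    hits = sum(solve_analogy(a, b, c) == d for a, b, c, d in quadruples)
    return hits / len(quadruples)


def find_outlier(words):
    """Return the word with the lowest mean similarity to the rest of the group."""
    def mean_similarity(word):
        others = [w for w in words if w != word]
        return sum(cosine(VECTORS[word], VECTORS[o]) for o in others) / len(others)
    return min(words, key=mean_similarity)


tests = [("man", "king", "woman", "queen"), ("king", "queen", "man", "woman")]
accuracy = analogy_accuracy(tests)
outlier = find_outlier(["king", "queen", "man", "woman", "apple"])
print(accuracy, outlier)
```

A real benchmark would run the same kind of scoring over a trained model's vocabulary and a published test set instead of hand-made vectors.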
Installation
SeaQuBe can be installed from PyPI using `pip install seaqube`, or by running `python setup.py install` in the main directory.
External Dependencies
Some external dependencies are not installed automatically; seaqube or nltk may raise an error with instructions on what to do.
For example, seaqube might ask you to run:
python -c "from seaqube import download;download('vec4ir')"
Quick Demo
```python
from seaqube.augmentation.word import (
    Active2PassiveAugmentation,
    EDAAugmentation,
    EmbeddingAugmentation,
    TranslationAugmentation,
)

# Set up the back-translation augmenter
translate = TranslationAugmentation(max_length=2)

# Augment a single tokenized document
translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])
```
Setup Dev Environment
TODO