Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. The short name is `SeaQuBe` or `seaqube`. Simply call it '| ˈsi: kjuːb |'
SeaQuBe
Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. Acronym: `SeaQuBe` or `seaqube`.
This Python framework provides several text augmentation implementations and word embedding quality evaluation methods. It is designed to fit into your machine learning pipeline. The `BaseAugmentation` class provides the same API as the Python package nlpaug, so that the two packages can be used together smoothly. However, `BaseAugmentation` also provides additional methods. See the detailed examples below.
`SeaQuBe` also provides a toolkit for wrapping a trained NLP model in an interactive tool.
Features
- Text Data Augmentation
- Chaining and Reducing of Text Data Augmentations
- Word Embedding Quality Methods
- Interactive NLM Model Wrapper
Demo
- Augmentation in three lines
- Example of Basic Text Augmentation
- Example of Text Augmentation Chaining
- Example of Word Embedding Evaluation
- Example of Interactive NLP
Augmentation
Level | Augmenter | Description |
---|---|---|
Character | QwertyAugmentation | Simulate keyboard distance error |
Corpus | UnigramAugmentation | Replace ubiquitous words with other ubiquitous words |
Word | Active2PassiveAugmentation | Change surface of document using a simple active-to-passive transformer |
Word | EDAAugmentation | Augment document using the EDA algorithm |
Word | EmbeddingAugmentation | Replace words with similar words using WordNet |
Word | TranslationAugmentation | Change surface of document using translation and back-translation (with GoogleTranslate) |
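To make the character-level row more concrete, here is a minimal sketch of what a keyboard-distance error can look like. It is not the code behind `QwertyAugmentation`: the neighbour map, the function name `qwerty_typo`, and the `replace_rate` and `seed` parameters are all assumptions made for this illustration.

```python
import random

# Partial QWERTY neighbour map, invented for this illustration
# (hypothetical; not the table used by seaqube).
QWERTY_NEIGHBOURS = {
    "a": "qwsz",
    "e": "wsdr",
    "i": "ujko",
    "o": "iklp",
    "s": "awedxz",
    "t": "rfgy",
}


def qwerty_typo(tokens, replace_rate=0.3, seed=None):
    """Return a copy of `tokens` with some characters swapped for keyboard neighbours."""
    rng = random.Random(seed)
    augmented = []
    for token in tokens:
        chars = []
        for ch in token:
            neighbours = QWERTY_NEIGHBOURS.get(ch.lower())
            # Only characters present in the map can be replaced.
            if neighbours and rng.random() < replace_rate:
                chars.append(rng.choice(neighbours))
            else:
                chars.append(ch)
        augmented.append("".join(chars))
    return augmented


doc = ["This", "is", "a", "tokenized", "corpus"]
noisy = qwerty_typo(doc, seed=0)
print(noisy)
```

The sketch keeps the token count and token lengths unchanged, so the noisy document stays aligned with the original one.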
Augmentation Chainer
The streaming feature of augmentation is implemented in the `AugmentationStreamer` class. One reducing class exists; more can be implemented by extending the `BaseReduction` class.
Action | Class | Description |
---|---|---|
Streaming | AugmentationStreamer | Run augmentation for each document through all chained augmentations. |
Reducing | UniqueCorpusReduction | Given a list of documents, return only the unique documents. |
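The two actions in the table can be sketched in plain Python. This is only an illustration of the idea, assuming that each augmenter is a callable that takes one tokenized document and returns a list of variants; the helper names `stream_augmentations` and `unique_corpus` and the two toy augmenters are hypothetical and are not part of the seaqube API.

```python
def stream_augmentations(corpus, augmenters):
    """Pass every document through each augmenter in turn, collecting all variants."""
    documents = [list(doc) for doc in corpus]
    for augment in augmenters:
        next_documents = []
        for doc in documents:
            next_documents.extend(augment(doc))
        documents = next_documents
    return documents


def unique_corpus(corpus):
    """Keep the first occurrence of every document, preserving order."""
    seen = set()
    unique = []
    for doc in corpus:
        key = tuple(doc)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(list(doc))
    return unique


# Two toy augmenters; each returns the original document plus one variant.
def keep_and_lowercase(doc):
    return [doc, [token.lower() for token in doc]]


def keep_and_drop_last(doc):
    return [doc, doc[:-1]] if len(doc) > 1 else [doc]


corpus = [["This", "is", "a", "corpus"], ["this", "is", "a", "corpus"]]
streamed = stream_augmentations(corpus, [keep_and_lowercase, keep_and_drop_last])
reduced = unique_corpus(streamed)
print(len(streamed), len(reduced))
```

Chaining multiplies the number of documents at every step, which is why a reducing step at the end is useful: here the eight streamed documents collapse to four unique ones.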
Word Embedding Evaluation
Method | Description |
---|---|
WordAnalogyBenchmark | Benchmarks how well relations of the type "a is to b as c is to d" can be solved. |
WordSimilarityBenchmark | Compares the similarity of a word pair, as calculated by the model, with a human-estimated similarity score. |
WordOutliersBenchmark | Benchmarks how well an outlier in a group of words can be detected. |
SemanticWordnetBenchmark | Benchmarks the quality of an NLP model's semantic similarity against the WordNet graph. |
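As a rough illustration of what the analogy and outlier tasks measure, the sketch below solves one analogy and finds one outlier on a handful of made-up 2-d vectors. It does not use the seaqube benchmark classes; the vectors and the helpers `cosine`, `solve_analogy` and `find_outlier` are assumptions for this example, and the vector-offset rule shown is a common approach rather than necessarily the one seaqube implements.

```python
import math

# Toy 2-d vectors, invented for illustration; a trained model would supply these.
VECTORS = {
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "man": (0.5, 0.8),
    "woman": (0.5, 0.2),
    "apple": (-0.7, 0.1),
}


def cosine(u, v):
    """Cosine similarity of two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def solve_analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" with the word nearest to b - a + c."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


def find_outlier(words, vectors):
    """Return the word with the lowest mean similarity to the rest of the group."""
    def mean_similarity(word):
        others = [w for w in words if w != word]
        return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)
    return min(words, key=mean_similarity)


answer = solve_analogy("man", "king", "woman", VECTORS)
outlier = find_outlier(["king", "queen", "man", "woman", "apple"], VECTORS)
print(answer, outlier)
```

A benchmark repeats such checks over a full test set and reports the share of correct answers, so a higher score indicates embeddings that better preserve these relations.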
Installation
`SeaQuBe` can be installed from PyPI using `pip install seaqube`, or by running `python setup.py install` in the main directory.
External Dependencies
Some external dependencies are not installed automatically; `seaqube` or `nltk` may raise an error with instructions on what to do.
For example, `seaqube` might ask you to run:
python -c "from seaqube import download;download('vec4ir')"
Quick Demo
from seaqube.augmentation.word import Active2PassiveAugmentation, EDAAugmentation, TranslationAugmentation, EmbeddingAugmentation
translate = TranslationAugmentation(max_length=2)
translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])
Setup Dev Environment
TODO