
Text aspects for NLP models

Project description



Corrupt input text to test the robustness of NLP models.
For details refer to https://nlp-demo.readthedocs.io

Installation

pip install wild-nlp
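
Note that the distribution name on PyPI (wild-nlp) differs from the package you import (wildnlp, as in the usage example below). A quick post-install check:

python -c "import wildnlp"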

Supported aspects

Altogether, we defined and implemented 11 aspects of text corruption.

  1. Articles

    Randomly removes articles or swaps them for incorrect ones.

  2. Digits2Words

    Converts numbers into words. Handles floating-point numbers as well.

  3. Misspellings

    Misspells words appearing in the Wikipedia lists of:

    • commonly misspelled English words
    • homophones
  4. Punctuation

    Randomly adds or removes specified punctuation marks.

  5. QWERTY

    Simulates errors made while typing on a QWERTY keyboard.

  6. RemoveChar

    Randomly removes:

    • characters from words or
    • white spaces from sentences
  7. SentimentMasking

    Replaces a random, single character with, for example, an asterisk.

  8. Swap

    Randomly swaps two characters within a word, excluding punctuation.

  9. Change char

    Randomly changes characters according to a chosen dictionary; the default, 'ocr', simulates simple OCR errors.

  10. White spaces

    Randomly adds or removes white spaces (specified as a parameter).

  11. Sub string

    Randomly adds a substring to simulate more complex signs.

All aspects can be chained together with the wildnlp.aspects.utils.compose function, as shown in the sketch below and in the Usage section.
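
The result of compose appears to be a plain callable on text, so it can be applied to a single string as well as to a whole dataset. A minimal sketch; the import path wildnlp.aspects for QWERTY and Swap is an assumption, so adjust it to the documentation if it differs:

from wildnlp.aspects import QWERTY, Swap  # assumed import path
from wildnlp.aspects.utils import compose

# Aspects are applied in the order they are passed to compose:
# first QWERTY keyboard typos, then character swaps.
corrupt = compose(QWERTY(), Swap())
print(corrupt('The quick brown fox jumps over the lazy dog'))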

Supported datasets

Aspects can be applied to any text. Below is the list of datasets for which we have already implemented processing pipelines.

  1. CoNLL

    The CoNLL-2003 shared task data for language-independent named entity recognition.

  2. IMDB

    The IMDB dataset containing movie reviews for sentiment analysis. It consists of 50,000 reviews of two classes, negative and positive.

  3. SNLI

    The SNLI dataset supporting the task of natural language inference.

  4. SQuAD

    The SQuAD dataset for the machine comprehension task.
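
A hedged sketch of one such pipeline, following the SampleDataset pattern from the Usage section below. The CoNLL class name matches the list above, but the load/apply signatures and the file path are assumptions, not verified API:

from wildnlp.aspects.dummy import Reverser
from wildnlp.datasets import CoNLL  # assumed dataset class

dataset = CoNLL()
dataset.load('conll_train.txt')       # hypothetical input path
modified = dataset.apply(Reverser())  # corrupt every text in the dataset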

Usage

from wildnlp.aspects.dummy import Reverser, PigLatin
from wildnlp.aspects.utils import compose
from wildnlp.datasets import SampleDataset

# Create a dataset object and load the dataset
dataset = SampleDataset()
dataset.load()

# Create a composed corruptor function.
# Functions will be applied in the same order they appear.
composed = compose(Reverser(), PigLatin())

# Apply the function to the dataset
modified = dataset.apply(composed)
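
If compose accepts arbitrary callables mapping text to text (a guess based on the example above, not confirmed API), a hand-written corruptor can be chained with the built-in aspects:

from wildnlp.aspects.dummy import Reverser
from wildnlp.aspects.utils import compose

def shout(text):
    # Hypothetical custom aspect: upper-cases the whole text.
    return text.upper()

# Reverser runs first, then the custom callable.
composed = compose(Reverser(), shout)
modified = dataset.apply(composed)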

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wild-nlp-1.0.2.tar.gz (44.9 kB)

Uploaded Source

Built Distribution

wild_nlp-1.0.2-py3-none-any.whl (53.3 kB)

Uploaded Python 3

File details

Details for the file wild-nlp-1.0.2.tar.gz.

File metadata

  • Download URL: wild-nlp-1.0.2.tar.gz
  • Upload date:
  • Size: 44.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.8

File hashes

Hashes for wild-nlp-1.0.2.tar.gz
Algorithm Hash digest
SHA256 def51dce4d5be1644b1109798631e75e780741a7effb99ea9ecb1a1b4a860031
MD5 58a3b292d0824ed743fba421013c6b5c
BLAKE2b-256 8200a656ff3a918c6b83bff6966f99a88e523a07685f1a0001dddd93f3c7bcbb

See the PyPI documentation for more details on using hashes.

File details

Details for the file wild_nlp-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: wild_nlp-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 53.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.8

File hashes

Hashes for wild_nlp-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a7105880ba002f3bb0a02340a945230fe117863316bf0f80ad1218b07a98099d
MD5 546c18a0bc18bbff626901772ab23a91
BLAKE2b-256 735516cac5d14cb71bfc31297e3d12662ab7b11bf2dd8ec4e79c648255cb1bdc

See the PyPI documentation for more details on using hashes.
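
Published digests can be checked locally with Python's standard hashlib. Only the file name and SHA256 value above come from this page; the rest is a generic sketch:

import hashlib

expected = 'a7105880ba002f3bb0a02340a945230fe117863316bf0f80ad1218b07a98099d'

# Hash the downloaded wheel and compare against the published digest.
with open('wild_nlp-1.0.2-py3-none-any.whl', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == expected, 'SHA256 mismatch: the file may be corrupted'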
