Probabilistic Noising of Natural Language

Artext: Artificial Text Generation

Artext injects noise into text without changing its core meaning for a human reader. Such data can be useful for many NLP tasks, particularly for making models robust to noisy or erroneous input.

Note: Noising generally increases the vocabulary size of a data set, because it introduces word inflections and orthographic variations that may not have existed before. Use it with caution, especially with closed-vocabulary neural network models such as those used in machine translation. In such scenarios, consider a subword-based vocabulary (BPE, for instance).
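
The vocabulary growth described in the note can be measured before training. The following is a minimal sketch and not part of artext: `vocab` and `vocab_growth` are hypothetical helpers that assume whitespace-tokenized text, and the noised sentences are copied from the sentence-level examples in this README.

```python
def vocab(sentences):
    """Return the set of whitespace-separated token types."""
    return {tok for sent in sentences for tok in sent.split()}


def vocab_growth(clean, noised):
    """Return the token types that appear only in the noised data."""
    return vocab(noised) - vocab(clean)


clean = ["So , I think if we have to go somewhere on foot , we must put on our hat ."]
noised = [
    "So , I think if we have to go going somewhere on foot feet , we must put on our hat . ?",
    "So , I think thinking if we have to go somewhere on foot , we must put on ! our hat hats .",
]

new_types = vocab_growth(clean, noised)
print(len(vocab(clean)), "clean types,", len(new_types), "new types:", sorted(new_types))
```

If the number of new types is large relative to the clean vocabulary, that is a sign a closed vocabulary will see many unknown tokens.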

This is a work in progress, and the results of our experiments will be published soon. Meanwhile, if you use artext in your research, please cite this repository.

Setup

artext is developed and tested with Python 3.6 and can be installed in two ways:

  1. Using pip:
     pip install artext
  2. From source code:
     git clone https://github.com/fgaim/artext
     cd artext
     pip install -r requirements.txt
     python setup.py install

Get required resources:

python -m spacy download 'en_core_web_sm'
python -m nltk.downloader 'punkt'
python -m nltk.downloader 'wordnet'
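
To confirm that the three resources above are in place, a check along the following lines may help. This is a sketch, not something artext ships: `check_resources` is a hypothetical helper, and it assumes the spaCy model is installed as an importable package named `en_core_web_sm` and that the NLTK data sits under the standard `tokenizers/punkt` and `corpora/wordnet` paths.

```python
import importlib.util


def check_resources():
    """Report which of the required resources can be located."""
    status = {
        # spaCy models installed via `spacy download` are regular Python packages.
        "en_core_web_sm": importlib.util.find_spec("en_core_web_sm") is not None,
    }
    try:
        import nltk.data
    except ImportError:
        # Without NLTK itself, neither data package can be looked up.
        status["punkt"] = False
        status["wordnet"] = False
        return status
    for name, path in (("punkt", "tokenizers/punkt"), ("wordnet", "corpora/wordnet")):
        try:
            nltk.data.find(path)  # raises LookupError when the data is missing
            status[name] = True
        except LookupError:
            status[name] = False
    return status


for name, found in check_resources().items():
    print(name, "ok" if found else "MISSING")
```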

Usage

Use from command-line

Generate sentence-level (sent) or document-level (doc) noise samples for a text file as follows:

python -m artext -src source.txt -out output.txt -l sent -er 0.5 -n 10

Alternatively, run inject.py directly from the source tree:

python inject.py -src source.txt -out output.txt -l sent -er 0.5 -n 10

Use -h to see all options.
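
When several runs are needed, for example one per error rate, the command line can be assembled from Python. The sketch below only builds and prints the commands. `build_command` is a hypothetical helper, the output file names are illustrative, and reading `-er` as the error rate and `-n` as the number of samples is an assumption based on the library attributes `error_rate` and `samples`; `-h` remains the authoritative list of options.

```python
import shlex
import sys


def build_command(src, out, level="sent", error_rate=0.5, samples=10):
    """Build the argv list for one `python -m artext` run."""
    if level not in ("sent", "doc"):
        raise ValueError("level must be 'sent' or 'doc'")
    return [
        sys.executable, "-m", "artext",
        "-src", src,
        "-out", out,
        "-l", level,
        "-er", str(error_rate),
        "-n", str(samples),
    ]


# One command per error rate; nothing is executed here.
commands = [
    build_command("source.txt", f"output_er{rate}.txt", error_rate=rate)
    for rate in (0.1, 0.25, 0.5)
]
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

With artext installed, each list can be passed to `subprocess.run(cmd, check=True)` to perform the run.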

Use as a library

from artext import Artext

artxt = Artext()
artxt.samples = 5
artxt.error_rate = 0.25
sent = 'This is a sample sentence to be noised.'
noises = artxt.noise_sentence(sent)
print(noises)
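
A common use of the noised output is to pair each variant with its clean source, for instance as training data for an error-correction or robustness experiment. The sketch below is one way to do that. `make_pairs` and `stub_noiser` are hypothetical names, and the noiser is passed in as a callable so the block runs without artext installed. It assumes the noiser returns an iterable of strings, which this README does not state explicitly for `noise_sentence`.

```python
from typing import Callable, Iterable, List, Tuple


def make_pairs(
    sentences: Iterable[str],
    noise_fn: Callable[[str], Iterable[str]],
) -> List[Tuple[str, str]]:
    """Pair every noised variant with the clean sentence it came from."""
    pairs = []
    for clean in sentences:
        for noisy in noise_fn(clean):
            if noisy != clean:  # skip variants the noiser left untouched
                pairs.append((noisy, clean))
    return pairs


def stub_noiser(sent: str) -> List[str]:
    """Stand-in for a real noiser: one changed variant, one unchanged."""
    return [sent.upper(), sent]


pairs = make_pairs(["a small test ."], stub_noiser)
print(pairs)
```

With artext installed, `artxt.noise_sentence` could be passed as `noise_fn` in place of the stub, provided its return value matches the assumption above.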

Examples

python example.py -er 0.5 -n 10

Sentence Level Examples

Input (clean sentence from Lang-8):

So , I think if we have to go somewhere on foot , we must put on our hat .

Human (error example from Lang-8):

So , I think if we have to go somewhere on foot , we must put on our hat .

Output (artext):

  • So , I think if we have to go going somewhere on foot feet , we must put on our hat . ?
  • So , I think thinking if we have to go somewhere on foot , we must put on ! our hat hats .
  • So , I think if we have we to go somewhere on foot feet , we must put on our hat . ;
  • So , I think if we have to go somewhere on foot , we must put must on our hat hats .
  • So , I think if we have to go somewhere on foot feet , we must put on put our hat .
  • So , ; I think if we have take to go somewhere on foot , we must put on our hat hats .
  • So , I think if we have to go somewhere someplace on foot , we must put putting on our hat hats .
  • So , I think if we have to go somewhere on foot , we must put on our hat . chapeau ;
  • So , I think if we have we to go somewhere go on foot , we must put on our hat .
  • So , I think retrieve if we have having to go going somewhere on foot , substructure we must put putting on our hat .

Document Level Examples

Input (clean document from Lang-8):

This morning I found out that one of my favourite bands released a new album .
I already forgot about Rise Against and it is a great surprise for me, because I haven't listened to them for 2 years .
I hope this band did n't become worse, like many others big ones did , and I 'll enjoy listening to it .
Well , I just have to get it and check it out .

Human (error example from Lang-8):

This morning I found out that one of my favourite bands band released a his new album . I already forgot about Rise Against and an it is a great surprise for me , because I have did n't listened return to them for 2 years . I hope this band did n't become worse , yet like many others big ones did , and I 'll enjoy listening to it . Well , I just have there remains to get it and check it out .

Output (artext):

  • This morning I found out that one of my favourite favored bands released a new album . I already forgot about Rise Against grow Agianst and it is are a great surprise for me , because I have n't listened listen to them for 2 years . I hope hoping this band did bands serve n't become worse , like many others big ones did , and I 'll enjoy listening to listening it . Well , I just have deliver to get it and check it out .
  • This morning I found out that one of my favourite bands released band a released new album . I already forgot forget about Rise Against Aigniast and it is a great surprise for me , because I beceause have n't listened to them for 2 years geezerhood . I hope hoping this band did bands n't become worse , did becoming wore like many others other big ones did , didding ; and I 'll enjoy listening to it . Well eWll , I just have to get it and check it out .
  • This morning I found out that one that of my favourite bands released a new album albums . I already forgot forgotting about Rise Against Aainst and it is be a great surprise surprisal for me , because I have having n't listened listneed to them tem for 2 years . I hope this band did do n't become worse , like many others big ones did didding , and I 'll enjoy listening to it . Well , I just have to get it and check checking it out .
  • This morning I found out that one of my favourite bands released a new album . I already forgot about abuot Rise Against Agaiinst and it is a great surprise srrpuise for me , because I have n't listened listening to them for 2 years year . I hope this band did n't become worse , like many others big other ones did , and I 'll enjoy listening enjoying litening to it . Well , I just scarce have to get getting it and check it checking out it .
  • This morning mornings I found ground out that hTat one of my favourite bands favorite band released a new album . I already forgot forget about Rise Against arise Agsinat and it is a great surprise surprisal for me , because I have because n't listened have listen to them for 2 years year . I hope this band did n't become worse tough , like many others other big ones did , and I 'll enjoy listening enjoy to it . ? Well , I just hardly have to get it and check it out .
  • This morning I found fnuod out that htat one of my favourite bands released releasing a newalbum . I already forgot about abut Rise Against Aigainst and it is a great surprise surprises for me , because becuasae I have n't listened to them for 2 years year . I hope this band did n't become becoming worse , like many others other big ones one did , and I 'll enjoy listening enjoying to it . Well , I just have to having get to it and check it out . !
  • This morning I found out that one of my favourite my bands released release a new album . I already forgot alraedyy forgotting about Rise Against Aagaianst and it is are a great surprise surprises for me , . because I have n't listened listen to them for 2 years . I hope this band did band n't become worse , like many others big ones did , and I 'll enjoy listening to it . Well , I just have to get it and check it out .
  • This morning I found incur out that one of my favourite favored bands released releaseed a new album albums . I already forgot about Rise Against igAanst and it is a great grat surprisefor me , because I have having n't listened listen to them for 2 years . I hope this band did n't becomeworse , like many others big ones did one do , and I 'll enjoy enjoying listening to it . Well , : I just have having to get getting it and check it out .
  • This morning I found founding out that hTat one of my favourite bands released releasing a new newfangled album . I already forgot block about Rise Against Aganst and it is a great surprise for me , because becuasee I have n't listened to them for 2 years . I hope this tthis band did n't become becoming worse , : like many others big ones did , and I 'll enjoy listening to it . Well , I just have having to get it and check it out . .
  • This morning I found I out that one of my favourite bands released band releasing a new album . I already forgot about Rise Rising Against and it is a great is Greeat surprise for me , because I have n't listened to them for 2 years . I hope desire this band did n't become worse , like many others big ones did didding , and I 'll enjoy enjoying listening to it . Well , ? I just have to get it and check it out .
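
The outputs above show a few recurring perturbation types, such as transposed characters ("abuot", "htat"), duplicated tokens, and stray punctuation. The toy sketch below illustrates how an error rate can drive such perturbations token by token. It is not artext's implementation: `toy_noise`, `swap_adjacent`, and the three operations are hypothetical, and artext's inflection and synonym substitutions are not reproduced here.

```python
import random

PUNCTUATION = [".", ",", ";", ":", "!", "?"]


def swap_adjacent(token, rng):
    """Swap two neighbouring characters to imitate a typo."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    return token[:i] + token[i + 1] + token[i] + token[i + 2:]


def toy_noise(sentence, error_rate, rng):
    """Perturb each token independently with probability error_rate."""
    noised = []
    for token in sentence.split():
        if rng.random() >= error_rate:
            noised.append(token)  # keep the token unchanged
            continue
        op = rng.choice(("typo", "repeat", "punct"))
        if op == "typo":
            noised.append(swap_adjacent(token, rng))
        elif op == "repeat":
            noised.extend([token, token])
        else:
            noised.extend([token, rng.choice(PUNCTUATION)])
    return " ".join(noised)


rng = random.Random(13)  # fixed seed so runs are repeatable
sentence = "Well , I just have to get it and check it out ."
for _ in range(3):
    print(toy_noise(sentence, 0.25, rng))
```

Passing the random generator in explicitly keeps the sketch reproducible: the same seed and error rate always yield the same samples.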
