
Artext: Artificial Text Generation

Probabilistic Noising of Natural Language

Artext injects noise into text without changing its core meaning for a human reader. Such data is useful for many NLP tasks, particularly for making models robust to noisy or erroneous input.

Note: Noising generally increases the vocabulary size of a data set, because it introduces word inflections and orthographic variants that may not have existed before. Use it with caution, especially with closed-vocabulary neural models such as those used for machine translation. In such scenarios, consider a subword-based vocabulary (BPE, for instance).
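A quick way to gauge how much noising inflates the vocabulary is to compare the distinct tokens before and after. The sketch below is a hypothetical helper (`vocabulary` and `vocab_growth` are not part of artext) and assumes simple whitespace tokenization, like the tokenized Lang-8 examples further down.

```python
from typing import Iterable, Set, Tuple


def vocabulary(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct whitespace-separated tokens across all lines."""
    vocab: Set[str] = set()
    for line in lines:
        vocab.update(line.split())
    return vocab


def vocab_growth(clean_lines: Iterable[str],
                 noisy_lines: Iterable[str]) -> Tuple[int, int, Set[str]]:
    """Return (clean vocab size, noisy vocab size, token types only seen after noising)."""
    clean = vocabulary(clean_lines)
    noisy = vocabulary(noisy_lines)
    return len(clean), len(noisy), noisy - clean


if __name__ == "__main__":
    clean = ["we must put on our hat ."]
    noisy = ["we must put on our hats .", "we must put on oru hat ?"]
    n_clean, n_noisy, new_types = vocab_growth(clean, noisy)
    print(n_clean, n_noisy, sorted(new_types))  # prints: 7 10 ['?', 'hats', 'oru']
```

New types such as `hats` and `oru` are exactly the out-of-vocabulary tokens a closed-vocabulary model would map to an unknown symbol, which is why a subword vocabulary is the safer choice here.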

This is a work in progress, and the results of our experiments will be published soon. Meanwhile, if you use artext in your research, please cite this repository.

Setup

artext is developed and tested with Python 3.6 and can be installed in two ways:

  1. Using pip:
 pip install artext
  2. From source code:
git clone https://github.com/fgaim/artext
cd artext
pip install -r requirements.txt
python setup.py install

Get required resources:

python -m spacy download 'en_core_web_sm'
python -m nltk.downloader 'punkt'
python -m nltk.downloader 'wordnet'
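Before running artext, you may want to confirm that the packages and data above are actually available. The helpers below are a hypothetical convenience, not part of artext; they rely only on `importlib.util.find_spec` from the standard library and, when NLTK is installed, on `nltk.data.find`, which raises `LookupError` for missing data. They assume the spaCy model is importable as the package `en_core_web_sm`, which is how `spacy download` installs it.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_nltk_data(resources: Iterable[str] = ("tokenizers/punkt", "corpora/wordnet")) -> List[str]:
    """Return the NLTK resources that cannot be located (all of them if NLTK is absent)."""
    resources = list(resources)
    if importlib.util.find_spec("nltk") is None:
        return resources
    import nltk  # imported lazily so this check also works without NLTK

    missing = []
    for res in resources:
        try:
            nltk.data.find(res)
        except LookupError:
            missing.append(res)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_modules(["spacy", "nltk", "en_core_web_sm"]))
    print("Missing NLTK data:", missing_nltk_data())
```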

Usage

Use from command-line

Generate sentence-level (sent) or document-level (doc) noise samples for a text file as follows:

python -m artext -src source.txt -out output.txt -l sent -er 0.5 -n 10

Or, when working from the source code, use inject.py as follows:

python inject.py -src source.txt -out output.txt -l sent -er 0.5 -n 10

Use -h to see all options.
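For batch experiments, such as sweeping several error rates, it can help to build the command line programmatically. The helpers below are hypothetical and use only the flags shown above (`-src`, `-out`, `-l`, `-er`, `-n`); the output file naming scheme (`output.er0.25.txt`, ...) is an assumption for illustration.

```python
import sys
from typing import List, Sequence


def artext_cli_args(src: str, out: str, level: str = "sent",
                    error_rate: float = 0.5, samples: int = 10) -> List[str]:
    """Build an argv list for `python -m artext` using the documented flags."""
    if level not in ("sent", "doc"):
        raise ValueError("level must be 'sent' or 'doc'")
    return [sys.executable, "-m", "artext",
            "-src", src, "-out", out,
            "-l", level, "-er", str(error_rate), "-n", str(samples)]


def sweep_commands(src: str, error_rates: Sequence[float], level: str = "sent",
                   samples: int = 10) -> List[List[str]]:
    """One command per error rate, each writing to its own (hypothetical) output file."""
    return [artext_cli_args(src, f"output.er{er}.txt", level, er, samples)
            for er in error_rates]
```

Each returned list can be passed directly to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file paths.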

Use as a library

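from artext import Artext

artxt = Artext()
artxt.samples = 5          # number of noised variants to generate per input
artxt.error_rate = 0.25    # amount of noise to inject
sent = 'This is a sample sentence to be noised.'
noises = artxt.noise_sentence(sent)  # generate the noised variants of `sent`
print(noises)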
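To build intuition for how an error rate and a sample count interact, here is a self-contained toy noiser. It is not artext's algorithm: it assumes a simple per-token model in which each token is, with probability `error_rate`, either given a transposition of two interior characters, duplicated, or followed by a random punctuation mark, loosely mirroring some noise types visible in the examples below. Artext's inflection and synonym substitutions (which use spaCy and WordNet) are not modeled. The names `toy_noise_sentence` and `swap_interior_chars` are hypothetical.

```python
import random
from typing import List, Optional

# Punctuation marks the toy noiser may insert (an assumption for illustration).
PUNCT = [",", ".", "?", "!", ";", ":"]


def swap_interior_chars(word: str, rng: random.Random) -> str:
    """Transpose two adjacent interior characters, e.g. 'about' -> 'abuot'.

    Words shorter than 4 characters are returned unchanged.
    """
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def toy_noise_sentence(sentence: str, error_rate: float = 0.25, samples: int = 5,
                       seed: Optional[int] = None) -> List[str]:
    """Return `samples` noised copies of a whitespace-tokenized sentence.

    Each token is noised independently with probability `error_rate`.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    rng = random.Random(seed)  # private, seeded generator for reproducible samples
    tokens = sentence.split()
    variants = []
    for _ in range(samples):
        out: List[str] = []
        for tok in tokens:
            if rng.random() >= error_rate:
                out.append(tok)  # token left untouched
                continue
            op = rng.choice(("swap", "duplicate", "punct"))
            if op == "swap":
                out.append(swap_interior_chars(tok, rng))
            elif op == "duplicate":
                out.extend([tok, tok])
            else:
                out.extend([tok, rng.choice(PUNCT)])
        variants.append(" ".join(out))
    return variants


if __name__ == "__main__":
    sent = "This is a sample sentence to be noised."
    for variant in toy_noise_sentence(sent, error_rate=0.25, samples=5, seed=1):
        print(variant)
```

Using a private `random.Random` instance keeps samples reproducible for a given seed without touching Python's global random state.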

Examples

python example.py -er 0.5 -n 10

Sentence Level Examples

Input (clean sentence from Lang-8):

So , I think if we have to go somewhere on foot , we must put on our hat .

Human (error example from Lang-8):

So , I think if we have to go somewhere on foot , we must put on our hat .

Output (artext):

  • So , I think if we have to go going somewhere on foot feet , we must put on our hat . ?
  • So , I think thinking if we have to go somewhere on foot , we must put on ! our hat hats .
  • So , I think if we have we to go somewhere on foot feet , we must put on our hat . ;
  • So , I think if we have to go somewhere on foot , we must put must on our hat hats .
  • So , I think if we have to go somewhere on foot feet , we must put on put our hat .
  • So , ; I think if we have take to go somewhere on foot , we must put on our hat hats .
  • So , I think if we have to go somewhere someplace on foot , we must put putting on our hat hats .
  • So , I think if we have to go somewhere on foot , we must put on our hat . chapeau ;
  • So , I think if we have we to go somewhere go on foot , we must put on our hat .
  • So , I think retrieve if we have having to go going somewhere on foot , substructure we must put putting on our hat .
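Because each noised variant differs from its input in only a few tokens, a token-level diff makes the changes easier to inspect. This sketch uses Python's standard `difflib`; `token_edits` is a hypothetical helper, not part of artext, and assumes the whitespace-tokenized format used in these examples.

```python
import difflib
from typing import List, Tuple


def token_edits(clean: str, noisy: str) -> List[Tuple[str, str, str]]:
    """List (operation, clean span, noisy span) for every non-matching region."""
    a, b = clean.split(), noisy.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # skip unchanged stretches
    ]


if __name__ == "__main__":
    print(token_edits("we must put on our hat .", "we must put must on our hat hats ."))
    # prints: [('insert', '', 'must'), ('insert', '', 'hats')]
```

The operation tags (`replace`, `delete`, `insert`) come straight from `SequenceMatcher.get_opcodes()`, so the output can also be used to count edits per sample.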

Document Level Examples

Input (clean text from Lang-8):

This morning I found out that one of my favourite bands released a new album .
I already forgot about Rise Against and it is a great surprise for me, because I haven't listened to them for 2 years .
I hope this band did n't become worse, like many others big ones did , and I 'll enjoy listening to it .
Well , I just have to get it and check it out .

Human (error example from Lang-8):

This morning I found out that one of my favourite bands band released a his new album . I already forgot about Rise Against and an it is a great surprise for me , because I have did n't listened return to them for 2 years . I hope this band did n't become worse , yet like many others big ones did , and I 'll enjoy listening to it . Well , I just have there remains to get it and check it out .

Output (artext):

  • This morning I found out that one of my favourite favored bands released a new album . I already forgot about Rise Against grow Agianst and it is are a great surprise for me , because I have n't listened listen to them for 2 years . I hope hoping this band did bands serve n't become worse , like many others big ones did , and I 'll enjoy listening to listening it . Well , I just have deliver to get it and check it out .
  • This morning I found out that one of my favourite bands released band a released new album . I already forgot forget about Rise Against Aigniast and it is a great surprise for me , because I beceause have n't listened to them for 2 years geezerhood . I hope hoping this band did bands n't become worse , did becoming wore like many others other big ones did , didding ; and I 'll enjoy listening to it . Well eWll , I just have to get it and check it out .
  • This morning I found out that one that of my favourite bands released a new album albums . I already forgot forgotting about Rise Against Aainst and it is be a great surprise surprisal for me , because I have having n't listened listneed to them tem for 2 years . I hope this band did do n't become worse , like many others big ones did didding , and I 'll enjoy listening to it . Well , I just have to get it and check checking it out .
  • This morning I found out that one of my favourite bands released a new album . I already forgot about abuot Rise Against Agaiinst and it is a great surprise srrpuise for me , because I have n't listened listening to them for 2 years year . I hope this band did n't become worse , like many others big other ones did , and I 'll enjoy listening enjoying litening to it . Well , I just scarce have to get getting it and check it checking out it .
  • This morning mornings I found ground out that hTat one of my favourite bands favorite band released a new album . I already forgot forget about Rise Against arise Agsinat and it is a great surprise surprisal for me , because I have because n't listened have listen to them for 2 years year . I hope this band did n't become worse tough , like many others other big ones did , and I 'll enjoy listening enjoy to it . ? Well , I just hardly have to get it and check it out .
  • This morning I found fnuod out that htat one of my favourite bands released releasing a newalbum . I already forgot about abut Rise Against Aigainst and it is a great surprise surprises for me , because becuasae I have n't listened to them for 2 years year . I hope this band did n't become becoming worse , like many others other big ones one did , and I 'll enjoy listening enjoying to it . Well , I just have to having get to it and check it out . !
  • This morning I found out that one of my favourite my bands released release a new album . I already forgot alraedyy forgotting about Rise Against Aagaianst and it is are a great surprise surprises for me , . because I have n't listened listen to them for 2 years . I hope this band did band n't become worse , like many others big ones did , and I 'll enjoy listening to it . Well , I just have to get it and check it out .
  • This morning I found incur out that one of my favourite favored bands released releaseed a new album albums . I already forgot about Rise Against igAanst and it is a great grat surprisefor me , because I have having n't listened listen to them for 2 years . I hope this band did n't becomeworse , like many others big ones did one do , and I 'll enjoy enjoying listening to it . Well , : I just have having to get getting it and check it out .
  • This morning I found founding out that hTat one of my favourite bands released releasing a new newfangled album . I already forgot block about Rise Against Aganst and it is a great surprise for me , because becuasee I have n't listened to them for 2 years . I hope this tthis band did n't become becoming worse , : like many others big ones did , and I 'll enjoy listening to it . Well , I just have having to get it and check it out . .
  • This morning I found I out that one of my favourite bands released band releasing a new album . I already forgot about Rise Rising Against and it is a great is Greeat surprise for me , because I have n't listened to them for 2 years . I hope desire this band did n't become worse , like many others big ones did didding , and I 'll enjoy enjoying listening to it . Well , ? I just have to get it and check it out .


Download files


Source Distribution

artext-0.2.9.tar.gz (35.3 kB)

Built Distribution

artext-0.2.9-py3-none-any.whl (34.5 kB)
