Probabilistic Noising of Natural Language

Artext: Artificial Text Generation

Artext is a project for injecting noise into text without affecting its core meaning for a human reader. Such data can be useful for many NLP tasks, particularly for making models robust to noisy or erroneous input.

This is a work in progress, and the results of our experiments will be published soon. Meanwhile, if you use artext in your research, please cite this repository.

Note: Noising will generally increase the vocabulary size of a data set, because it introduces word inflections and orthographic variations that may not have existed before. Use it with caution, especially with closed-vocabulary neural network models such as those used for machine translation. Consider a subword-based vocabulary (BPE, for instance) in such scenarios.
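To get a feel for the vocabulary growth described in the note, the sketch below compares the token set of a clean sentence with that of two noised variants. It does not call artext: the noised sentences are hand-written for illustration (loosely based on the examples further down), and the helper names `vocabulary` and `vocabulary_growth` are hypothetical.

```python
def vocabulary(sentences):
    """Return the set of whitespace-separated tokens across all sentences."""
    return {token for sentence in sentences for token in sentence.split()}


def vocabulary_growth(clean, noised):
    """Return the tokens that occur only in the noised data."""
    return vocabulary(noised) - vocabulary(clean)


# Illustrative data only: the noised variants are written by hand,
# not produced by artext.
clean = ["So , I think if we have to go somewhere on foot , we must put on our hat ."]
noised = [
    "So , I thinking if we have to go somewhere on foot , we must put on our hats .",
    "So , I think if we have to go someplace on foot , we must putting on our hats .",
]

new_tokens = vocabulary_growth(clean, noised)
print(sorted(new_tokens))  # ['hats', 'putting', 'someplace', 'thinking']
```

Running a check like this on a real corpus before and after noising gives a rough estimate of how many out-of-vocabulary tokens a closed-vocabulary model would see.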

Setup

artext is developed and tested with Python 3.6 and can be installed in two ways:

  1. Using pip:
pip install artext
  2. From source code:
git clone https://github.com/fgaim/artext
cd artext
pip install -r requirements.txt
python setup.py install

Get required resources:

python -m spacy download 'en_core_web_sm'
python -m nltk.downloader 'punkt'
python -m nltk.downloader 'wordnet'

Usage

Use from command-line

Generate sentence-level (sent) or document-level (doc) noise samples for a text file as follows:

python -m artext -src source.txt -out output.txt -l sent -er 0.5 -n 10

Or, from a source checkout, run inject.py directly:

python inject.py -src source.txt -out output.txt -l sent -er 0.5 -n 10

Use -h to see all options.
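When noising many files, it can help to build the command line programmatically. The following is a minimal sketch that only assembles the argument list, using the flags shown above. The function name `build_artext_command` and its parameter names are hypothetical, and reading `-er` as the error rate and `-n` as the number of samples is an assumption based on the library attributes `error_rate` and `samples`.

```python
def build_artext_command(src, out, level="sent", error_rate=0.5,
                         samples=10, python="python"):
    """Assemble the argument list for `python -m artext` (hypothetical helper)."""
    if level not in ("sent", "doc"):
        raise ValueError("level must be 'sent' or 'doc'")
    return [
        python, "-m", "artext",
        "-src", src,
        "-out", out,
        "-l", level,
        "-er", str(error_rate),  # assumed to be the error rate
        "-n", str(samples),      # assumed to be the number of samples
    ]


cmd = build_artext_command("source.txt", "output.txt")
print(" ".join(cmd))
# python -m artext -src source.txt -out output.txt -l sent -er 0.5 -n 10
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once artext and its resources are installed.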

Use as a library

from artext import Artext

artxt = Artext()
artxt.samples = 5
artxt.error_rate = 0.25
sent = 'This is a sample sentence to be noised.'
noises = artxt.noise_sentence(sent)
print(noises)
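A common use of the library is building (noisy, clean) sentence pairs for training a model that is robust to errors. The sketch below shows one way to do that with any noising callable. `build_noisy_pairs` and `demo_noise` are hypothetical names, and `demo_noise` is a trivial stand-in so the example runs without artext installed.

```python
def build_noisy_pairs(sentences, noise_fn):
    """Pair every noised variant with the clean sentence it came from.

    noise_fn is any callable that maps one sentence to an iterable of
    noised strings.
    """
    pairs = []
    for clean in sentences:
        for noisy in noise_fn(clean):
            pairs.append((noisy, clean))
    return pairs


def demo_noise(sentence):
    """Hypothetical stand-in for a real noiser; NOT artext's behavior."""
    return [sentence.lower(), sentence.replace(" is ", " are ")]


pairs = build_noisy_pairs(["This is a sample sentence to be noised ."], demo_noise)
for noisy, clean in pairs:
    print(noisy, "->", clean)
```

With artext installed, `artxt.noise_sentence` could be passed as `noise_fn`, assuming it returns an iterable of strings; the README does not state the return type, so check it before relying on this.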

Examples

python example.py -er 0.5 -n 10

Sentence Level Examples

Input (clean sentence from Lang-8):

So , I think if we have to go somewhere on foot , we must put on our hat .

Human (error example from Lang-8):

So , I think if we have to go somewhere on foot , we must put on our hat .

Output (artext):

  • So , I think if we have to go <ins>going</ins> somewhere on foot <ins>feet</ins> , we must put on our hat . <ins>?</ins>
  • So , I think <ins>thinking</ins> if we have to go somewhere on foot , we must put on <ins>!</ins> our hat <ins>hats</ins> .
  • So , I think if we have <ins>we</ins> to go somewhere on foot <ins>feet</ins> , we must put on our hat . <ins>;</ins>
  • So , I think if we have to go somewhere on foot , we must put <ins>must</ins> on our hat <ins>hats</ins> .
  • So , I think if we have to go somewhere on foot <ins>feet</ins> , we must put on <ins>put</ins> our hat .
  • So , <ins>;</ins> I think if we have <ins>take</ins> to go somewhere on foot , we must put on our hat <ins>hats</ins> .
  • So , I think if we have to go somewhere <ins>someplace</ins> on foot , we must put <ins>putting</ins> on our hat <ins>hats</ins> .
  • So , I think if we have to go somewhere on foot , we must put on our hat . <ins>chapeau ;</ins>
  • So , I think if we have <ins>we</ins> to go somewhere <ins>go</ins> on foot , we must put on our hat .
  • So , I think <ins>retrieve</ins> if we have <ins>having</ins> to go <ins>going</ins> somewhere on foot , <ins>substructure</ins> we must put <ins>putting</ins> on our hat .

Document Level Examples

Input (clean document from Lang-8):

This morning I found out that one of my favourite bands released a new album .
I already forgot about Rise Against and it is a great surprise for me, because I haven't listened to them for 2 years .
I hope this band did n't become worse, like many others big ones did , and I 'll enjoy listening to it .
Well , I just have to get it and check it out .

Human (error example from Lang-8):

This morning I found out that one of my favourite bands <ins>band</ins> released a <ins>his</ins> new album . I already forgot about Rise Against and <ins>an</ins> it is a great surprise for me , because I have <ins>did</ins> n't listened <ins>return</ins> to them for 2 years . I hope this band did n't become worse , <ins>yet</ins> like many others big ones did , and I 'll enjoy listening to it . Well , I just have <ins>there remains</ins> to get it and check it out .

Output (artext):

  • This morning I found out that one of my favourite <ins>favored</ins> bands released a new album . I already forgot about Rise Against <ins>grow Agianst</ins> and it is <ins>are</ins> a great surprise for me , because I have n't listened <ins>listen</ins> to them for 2 years . I hope <ins>hoping</ins> this band did <ins>bands serve</ins> n't become worse , like many others big ones did , and I 'll enjoy listening to <ins>listening</ins> it . Well , I just have <ins>deliver</ins> to get it and check it out .
  • This morning I found out that one of my favourite bands released <ins>band</ins> a <ins>released</ins> new album . I already forgot <ins>forget</ins> about Rise Against <ins>Aigniast</ins> and it is a great surprise for me , because I <ins>beceause</ins> have n't listened to them for 2 years <ins>geezerhood</ins> . I hope <ins>hoping</ins> this band did <ins>bands</ins> n't become worse , <ins>did becoming wore</ins> like many others <ins>other</ins> big ones did , <ins>didding ;</ins> and I 'll enjoy listening to it . Well <ins>eWll</ins> , I just have to get it and check it out .
  • This morning I found out that one <ins>that</ins> of my favourite bands released a new album <ins>albums</ins> . I already forgot <ins>forgotting</ins> about Rise Against <ins>Aainst</ins> and it is <ins>be</ins> a great surprise <ins>surprisal</ins> for me , because I have <ins>having</ins> n't listened <ins>listneed</ins> to them <ins>tem</ins> for 2 years . I hope this band did <ins>do</ins> n't become worse , like many others big ones did <ins>didding</ins> , and I 'll enjoy listening to it . Well , I just have to get it and check <ins>checking</ins> it out .
  • This morning I found out that one of my favourite bands released a new album . I already forgot about <ins>abuot</ins> Rise Against <ins>Agaiinst</ins> and it is a great surprise <ins>srrpuise</ins> for me , because I have n't listened <ins>listening</ins> to them for 2 years <ins>year</ins> . I hope this band did n't become worse , like many others big <ins>other</ins> ones did , and I 'll enjoy listening <ins>enjoying litening</ins> to it . Well , I just <ins>scarce</ins> have to get <ins>getting</ins> it and check it <ins>checking</ins> out <ins>it</ins> .
  • This morning <ins>mornings</ins> I found <ins>ground</ins> out that <ins>hTat</ins> one of my favourite bands <ins>favorite band</ins> released a new album . I already forgot <ins>forget</ins> about Rise Against <ins>arise Agsinat</ins> and it is a great surprise <ins>surprisal</ins> for me , because I have <ins>because</ins> n't listened <ins>have listen</ins> to them for 2 years <ins>year</ins> . I hope this band did n't become worse <ins>tough</ins> , like many others <ins>other</ins> big ones did , and I 'll enjoy listening <ins>enjoy</ins> to it . <ins>?</ins> Well , I just <ins>hardly</ins> have to get it and check it out .
  • This morning I found <ins>fnuod</ins> out that <ins>htat</ins> one of my favourite bands released <ins>releasing</ins> a newalbum . I already forgot about <ins>abut</ins> Rise Against <ins>Aigainst</ins> and it is a great surprise <ins>surprises</ins> for me , because <ins>becuasae</ins> I have n't listened to them for 2 years <ins>year</ins> . I hope this band did n't become <ins>becoming</ins> worse , like many others <ins>other</ins> big ones <ins>one</ins> did , and I 'll enjoy listening <ins>enjoying</ins> to it . Well , I just have to <ins>having</ins> get <ins>to</ins> it and check it out . <ins>!</ins>
  • This morning I found out that one of my favourite <ins>my</ins> bands released <ins>release</ins> a new album . I already forgot <ins>alraedyy forgotting</ins> about Rise Against <ins>Aagaianst</ins> and it is <ins>are</ins> a great surprise <ins>surprises</ins> for me , <ins>.</ins> because I have n't listened <ins>listen</ins> to them for 2 years . I hope this band did <ins>band</ins> n't become worse , like many others big ones did , and I 'll enjoy listening to it . Well , I just have to get it and check it out .
  • This morning I found <ins>incur</ins> out that one of my favourite <ins>favored</ins> bands released <ins>releaseed</ins> a new album <ins>albums</ins> . I already forgot about Rise Against <ins>igAanst</ins> and it is a great <ins>grat</ins> surprisefor me , because I have <ins>having</ins> n't listened <ins>listen</ins> to them for 2 years . I hope this band did n't becomeworse , like many others big ones did <ins>one do</ins> , and I 'll enjoy <ins>enjoying</ins> listening to it . Well , <ins>:</ins> I just have <ins>having</ins> to get <ins>getting</ins> it and check it out .
  • This morning I found <ins>founding</ins> out that <ins>hTat</ins> one of my favourite bands released <ins>releasing</ins> a new <ins>newfangled</ins> album . I already forgot <ins>block</ins> about Rise Against <ins>Aganst</ins> and it is a great surprise for me , because <ins>becuasee</ins> I have n't listened to them for 2 years . I hope this <ins>tthis</ins> band did n't become <ins>becoming</ins> worse , <ins>:</ins> like many others big ones did , and I 'll enjoy listening to it . Well , I just have <ins>having</ins> to get it and check it out . <ins>.</ins>
  • This morning I found <ins>I</ins> out that one of my favourite bands released <ins>band releasing</ins> a new album . I already forgot about Rise <ins>Rising</ins> Against and it is a great <ins>is Greeat</ins> surprise for me , because I have n't listened to them for 2 years . I hope <ins>desire</ins> this band did n't become worse , like many others big ones did <ins>didding</ins> , and I 'll enjoy <ins>enjoying</ins> listening to it . Well , <ins>?</ins> I just have to get it and check it out .

Files for artext, version 0.2.8
Filename, size & hash File type Python version Upload date
artext-0.2.8-py3-none-any.whl (34.4 kB) View hashes Wheel py3
artext-0.2.8.tar.gz (35.2 kB) View hashes Source None
