Probabilistic Noising of Natural Language
Artext: Artificial Text Generation
Artext is a project for injecting noise into text without changing its core meaning for a human reader. Such data can be useful for many NLP tasks, particularly for making models robust to noisy or erroneous input.
Note: Noising generally increases the vocabulary size of a data set, because it introduces word inflections and orthographic variations that may not have existed before. It should therefore be used with caution, especially with closed-vocabulary neural network models such as those used in machine translation. In such scenarios, consider using a subword-based vocabulary (BPE, for instance).
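The vocabulary growth described in the note can be checked on your own data before training. The sketch below is only an illustration: the helper name `vocabulary` and the toy sentences are made up for this example, and tokens are assumed to be whitespace-separated as in the examples in this README.

```python
from collections import Counter


def vocabulary(sentences):
    """Count whitespace-separated tokens over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


# Toy data (hypothetical): one clean sentence and two noised variants of it.
clean = ["we must put on our hat ."]
noised = [
    "we must putting on our hats .",
    "we must put on our chapeau ;",
]

clean_vocab = vocabulary(clean)
combined_vocab = vocabulary(clean + noised)

# Token types that exist only because of the noised variants.
new_types = sorted(set(combined_vocab) - set(clean_vocab))
print(len(clean_vocab), len(combined_vocab), new_types)
```

If the number of new types is large relative to the clean vocabulary, that is a sign a subword vocabulary may be the safer choice.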
This is a work in progress, and the results of our experiments will be published soon.
Meanwhile, if you use artext in your research, please cite this repository.
Setup
artext is developed and tested with Python 3.6 and can be installed in two ways:
- Using pip:
pip install artext
- From source code:
git clone https://github.com/fgaim/artext
cd artext
pip install -r requirements.txt
python setup.py install
Get required resources:
python -m spacy download 'en_core_web_sm'
python -m nltk.downloader 'punkt'
python -m nltk.downloader 'wordnet'
Usage
Use from command-line
Generate sentence-level (sent) or document-level (doc) noise samples for a text file as follows:
python -m artext -src source.txt -out output.txt -l sent -er 0.5 -n 10
Or, from the source code, using inject.py as follows:
python inject.py -src source.txt -out output.txt -l sent -er 0.5 -n 10
Use -h to see all options.
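To build some intuition for the -er and -n options, here is a toy noiser. It is not artext's algorithm: it assumes the error rate can be read as a per-token probability of corruption (which this README does not spell out), and the only noise it applies is swapping two adjacent characters. All function names are hypothetical.

```python
import random


def swap_adjacent_chars(token, rng):
    """Swap two neighbouring characters; tokens shorter than 2 characters are unchanged."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def toy_noise(sentence, error_rate, rng):
    """Corrupt each whitespace-separated token independently with probability error_rate."""
    return " ".join(
        swap_adjacent_chars(tok, rng) if rng.random() < error_rate else tok
        for tok in sentence.split()
    )


def toy_samples(sentence, error_rate=0.5, n=10, seed=0):
    """Return n noised variants of one sentence, reproducible for a given seed."""
    rng = random.Random(seed)
    return [toy_noise(sentence, error_rate, rng) for _ in range(n)]


for sample in toy_samples("I just have to get it and check it out .", n=3):
    print(sample)
```

artext itself produces much richer noise (inflections, synonyms, punctuation and word-order changes), as the examples further down show.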
Use as a library
from artext import Artext

artxt = Artext()
artxt.samples = 5        # number of noised samples to generate
artxt.error_rate = 0.25  # error rate used when noising

sent = 'This is a sample sentence to be noised.'
noises = artxt.noise_sentence(sent)
print(noises)
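One way to use the library call above is to build (noisy, clean) pairs, for instance as training data for a model that should be robust to errors. The sketch below is an assumption-laden illustration: `build_pairs` and `fake_noise` are hypothetical names, and the stand-in noiser exists only so the example runs without artext installed.

```python
def build_pairs(sentences, noise_fn):
    """Pair every noised variant with the clean sentence it was generated from."""
    pairs = []
    for clean in sentences:
        for noisy in noise_fn(clean):
            pairs.append((noisy, clean))
    return pairs


def fake_noise(sentence):
    """Hypothetical stand-in for a real noiser; returns two trivial variants."""
    return [sentence.lower(), sentence.replace(" .", " !")]


pairs = build_pairs(["This is a test ."], fake_noise)
for noisy, clean in pairs:
    print(noisy, "->", clean)
```

With artext installed, passing `artxt.noise_sentence` as `noise_fn` should work the same way, provided it returns an iterable of strings; that return type is assumed here rather than confirmed.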
Examples
python example.py -er 0.5 -n 10
Sentence Level Examples
Input (clean sentence from Lang-8):
So , I think if we have to go somewhere on foot , we must put on our hat .
Human (error example from Lang-8):
So , I think if we have to go somewhere on foot , we must put on our hat .
Output (artext), with removed text marked as <del>…</del> and inserted text as <ins>…</ins>:
- So , I think if we have to <del>go</del><ins>going</ins> somewhere on <del>foot</del><ins>feet</ins> , we must put on our hat <del>.</del><ins>?</ins>
- So <del>,</del> I <del>think</del><ins>thinking</ins> if we have to go somewhere on foot , we must put on <ins>!</ins> our <del>hat</del><ins>hats</ins> .
- So , I think if <del>we</del> have <ins>we</ins> to go somewhere on <del>foot</del><ins>feet</ins> , we must put on our hat <del>.</del><ins>;</ins>
- So <del>,</del> I think if we have to go somewhere on foot , we <del>must</del> put <ins>must</ins> on our <del>hat</del><ins>hats</ins> .
- So , I think if we have to go somewhere on <del>foot</del><ins>feet</ins> , we must <del>put</del> on <ins>put</ins> our hat .
- So <del>,</del><ins>;</ins> I think if we <del>have</del><ins>take</ins> to go somewhere on foot <del>,</del> we must put on our <del>hat</del><ins>hats</ins> .
- So , I think if we have to go <del>somewhere</del><ins>someplace</ins> on foot , we must <del>put</del><ins>putting</ins> on our <del>hat</del><ins>hats</ins> .
- So , I think if we have to go somewhere on foot , we must put on our <del>hat .</del><ins>chapeau ;</ins>
- So <del>,</del> I think if <del>we</del> have <ins>we</ins> to <del>go</del> somewhere <ins>go</ins> on foot , we must put on our hat .
- So , I <del>think</del><ins>retrieve</ins> if we <del>have</del><ins>having</ins> to <del>go</del><ins>going</ins> somewhere on <del>foot ,</del><ins>substructure</ins> we must <del>put</del><ins>putting</ins> on our hat .
Document Level Examples
Input (clean sentence from Lang-8):
This morning I found out that one of my favourite bands released a new album .
I already forgot about Rise Against and it is a great surprise for me, because I haven't listened to them for 2 years .
I hope this band did n't become worse, like many others big ones did , and I 'll enjoy listening to it .
Well , I just have to get it and check it out .
Human (error example from Lang-8):
This morning I found out that one of my favourite bands <ins>band</ins> released a <ins>his</ins> new album . I already forgot about Rise Against and <ins>an</ins> it is a great surprise for me , because I have <ins>did</ins> n't listened <ins>return</ins> to them for 2 years . I hope this band did n't become worse , <ins>yet</ins> like many others big ones did , and I 'll enjoy listening to it . Well , I just have <ins>there remains</ins> to get it and check it out .
Output (artext), with removed text marked as <del>…</del> and inserted text as <ins>…</ins>:
- This morning I found out that one of my <del>favourite</del><ins>favored</ins> bands released a new album . I already forgot about <del>Rise Against</del><ins>grow Agianst</ins> and it <del>is</del><ins>are</ins> a great surprise for me , because I have n't <del>listened</del><ins>listen</ins> to them for 2 years <del>.</del> I <del>hope</del><ins>hoping</ins> this <del>band did</del><ins>bands serve</ins> n't become worse , like many others big ones did , and I 'll enjoy <del>listening</del> to <ins>listening</ins> it . Well , I just <del>have</del><ins>deliver</ins> to get it and check it out .
- This morning I found out that one of my favourite <del>bands released</del><ins>band</ins> a <ins>released</ins> new album . I already <del>forgot</del><ins>forget</ins> about Rise <del>Against</del><ins>Aigniast</ins> and it is a great surprise for me , <del>because</del> I <ins>beceause</ins> have n't listened to them for 2 <del>years</del><ins>geezerhood</ins> . I <del>hope</del><ins>hoping</ins> this <del>band did</del><ins>bands</ins> n't <del>become worse ,</del><ins>did becoming wore</ins> like many <del>others</del><ins>other</ins> big ones <del>did ,</del><ins>didding ;</ins> and I 'll enjoy listening to it <del>. Well</del><ins>eWll</ins> , I just have to get it and check it out .
- This morning I found out <del>that</del> one <ins>that</ins> of my favourite bands released a new <del>album</del><ins>albums</ins> . I already <del>forgot</del><ins>forgotting</ins> about Rise <del>Against</del><ins>Aainst</ins> and it <del>is</del><ins>be</ins> a great <del>surprise</del><ins>surprisal</ins> for me , because I <del>have</del><ins>having</ins> n't <del>listened</del><ins>listneed</ins> to <del>them</del><ins>tem</ins> for 2 years . I hope this band <del>did</del><ins>do</ins> n't become <del>worse ,</del> like many others big ones <del>did</del><ins>didding</ins> , and I 'll enjoy listening to it . Well , I just have to get it and <del>check</del><ins>checking</ins> it out .
- This morning I found out that one of my favourite bands released a new album . I already forgot <del>about</del><ins>abuot</ins> Rise <del>Against</del><ins>Agaiinst</ins> and it is a great <del>surprise</del><ins>srrpuise</ins> for me , because I have n't <del>listened</del><ins>listening</ins> to them for 2 <del>years</del><ins>year</ins> . I hope this band did n't become worse , like many <del>others</del> big <ins>other</ins> ones did , and I 'll <del>enjoy listening</del><ins>enjoying litening</ins> to it . Well , I <del>just</del><ins>scarce</ins> have to <del>get</del><ins>getting</ins> it and <del>check it</del><ins>checking</ins> out <ins>it</ins> .
- This <del>morning</del><ins>mornings</ins> I <del>found</del><ins>ground</ins> out <del>that</del><ins>hTat</ins> one of my <del>favourite bands</del><ins>favorite band</ins> released a new album . I already <del>forgot</del><ins>forget</ins> about <del>Rise Against</del><ins>arise Agsinat</ins> and it is a great <del>surprise</del><ins>surprisal</ins> for me , <del>because</del> I <del>have</del><ins>because</ins> n't <del>listened</del><ins>have listen</ins> to them for 2 <del>years</del><ins>year</ins> . I hope this band did n't become <del>worse</del><ins>tough</ins> , like many <del>others</del><ins>other</ins> big ones did , and I 'll <del>enjoy</del> listening <ins>enjoy</ins> to it <del>.</del><ins>?</ins> Well , I <del>just</del><ins>hardly</ins> have to get it and check it out .
- This morning I <del>found</del><ins>fnuod</ins> out <del>that</del><ins>htat</ins> one of my favourite bands <del>released</del><ins>releasing</ins> a new <del>album</del> . I already forgot <del>about</del><ins>abut</ins> Rise <del>Against</del><ins>Aigainst</ins> and it is a great <del>surprise</del><ins>surprises</ins> for me , <del>because</del><ins>becuasae</ins> I have n't listened to them for 2 <del>years</del><ins>year</ins> . I hope this band did n't <del>become</del><ins>becoming</ins> worse , like many <del>others</del><ins>other</ins> big <del>ones</del><ins>one</ins> did <del>,</del> and I 'll <del>enjoy</del> listening <ins>enjoying</ins> to it . Well , I just <del>have to</del><ins>having</ins> get <ins>to</ins> it and check it out <del>.</del><ins>!</ins>
- This morning I found out that one of <del>my</del> favourite <ins>my</ins> bands <del>released</del><ins>release</ins> a new album . I <del>already forgot</del><ins>alraedyy forgotting</ins> about Rise <del>Against</del><ins>Aagaianst</ins> and it <del>is</del><ins>are</ins> a great <del>surprise</del><ins>surprises</ins> for me <del>,</del><ins>.</ins> because I have n't <del>listened</del><ins>listen</ins> to them for 2 years . I hope this <del>band</del> did <ins>band</ins> n't become worse , like <del>many</del> others big ones did , and I 'll enjoy listening to it . Well , I just have to get it and check it out .
- This morning I <del>found</del><ins>incur</ins> out that one of my <del>favourite</del><ins>favored</ins> bands <del>released</del><ins>releaseed</ins> a new <del>album</del><ins>albums</ins> . I already forgot about Rise <del>Against</del><ins>igAanst</ins> and it is a <del>great</del><ins>grat</ins> surprise <del>for</del> me , because I <del>have</del><ins>having</ins> n't <del>listened</del><ins>listen</ins> to them for 2 years <del>.</del> I hope this band did n't become <del>worse</del> , like many others big <del>ones did</del><ins>one do</ins> , and I 'll <del>enjoy</del><ins>enjoying</ins> listening to it . Well <del>,</del><ins>:</ins> I just <del>have</del><ins>having</ins> to <del>get</del><ins>getting</ins> it and check it out .
- This morning I <del>found</del><ins>founding</ins> out <del>that</del><ins>hTat</ins> one of my favourite bands <del>released</del><ins>releasing</ins> a <del>new</del><ins>newfangled</ins> album . I already <del>forgot</del><ins>block</ins> about Rise <del>Against</del><ins>Aganst</ins> and it is a great surprise for me , <del>because</del><ins>becuasee</ins> I have n't listened to them for 2 years . I hope <del>this</del><ins>tthis</ins> band did n't <del>become</del><ins>becoming</ins> worse <del>,</del><ins>:</ins> like many others big ones did , and I 'll enjoy listening to it . Well , I just <del>have</del><ins>having</ins> to get it and check it out . <ins>.</ins>
- This morning <del>I</del> found <ins>I</ins> out that one of my favourite <del>bands released</del><ins>band releasing</ins> a new album . I already forgot about <del>Rise</del><ins>Rising</ins> Against and it <del>is</del> a <del>great</del><ins>is Greeat</ins> surprise for me , because I have n't listened to them for 2 years . I <del>hope</del><ins>desire</ins> this band did n't become worse , like many others big ones <del>did</del><ins>didding</ins> , and I 'll <del>enjoy</del><ins>enjoying</ins> listening to it . Well <del>,</del><ins>?</ins> I just have to get it and check it out .
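Markup like the <ins> tags in the examples above can be approximated for any clean/noised pair with Python's standard difflib module. This is a sketch and not how artext renders its examples: the function name `mark_edits` is hypothetical, tokens are assumed to be whitespace-separated, and wrapping removed tokens in <del> is a choice made for this illustration.

```python
import difflib


def mark_edits(clean, noised):
    """Return the token-level diff of two sentences with <del>/<ins> markup."""
    a, b = clean.split(), noised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
            continue
        piece = ""
        if i2 > i1:  # tokens present only in the clean sentence
            piece += "<del>" + " ".join(a[i1:i2]) + "</del>"
        if j2 > j1:  # tokens present only in the noised sentence
            piece += "<ins>" + " ".join(b[j1:j2]) + "</ins>"
        out.append(piece)
    return " ".join(out)


print(mark_edits("we must put on our hat .", "we must putting on our hats ."))
# we must <del>put</del><ins>putting</ins> on our <del>hat</del><ins>hats</ins> .
```

Because difflib aligns tokens greedily, swapped words show up as a deletion in one place and an insertion in another, which matches how word-order noise appears in the examples.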