Probabilistic Noising of Natural Language
Artext: Artificial Text Generation
Artext is a project for injecting noise into text without affecting its core meaning for a human reader. Such data can be useful for many NLP tasks, particularly for making models robust to noisy or erroneous input.
Note: Noising will generally increase the vocabulary size of a data set, as it introduces word inflections and orthographic variations that may not have existed before. It should therefore be used with caution, especially with closed-vocabulary neural network models such as those used in machine translation. In such scenarios, consider using a subword-based vocabulary (BPE, for instance).
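The vocabulary growth described in the note can be illustrated with a small self-contained sketch. It does not use artext: `toy_noise` is a hypothetical stand-in that swaps two adjacent characters in a token with a given probability, and `vocab_size` simply counts distinct tokens. The point is only that adding noised copies to a corpus can never shrink the vocabulary and usually enlarges it.

```python
import random


def toy_noise(tokens, error_rate, rng):
    """Hypothetical stand-in noiser (not artext): with probability
    error_rate, swap two adjacent characters of a token longer than 3."""
    noised = []
    for tok in tokens:
        if len(tok) > 3 and rng.random() < error_rate:
            i = rng.randrange(len(tok) - 1)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        noised.append(tok)
    return noised


def vocab_size(corpus):
    """Number of distinct tokens in a list of tokenized sentences."""
    return len({tok for sent in corpus for tok in sent})


rng = random.Random(0)  # fixed seed so repeated runs behave the same
clean = [
    "we must put on our hat".split(),
    "we have to go somewhere on foot".split(),
]
# Ten noised copies of every clean sentence.
noisy = [toy_noise(sent, error_rate=0.5, rng=rng)
         for sent in clean for _ in range(10)]

print("clean vocabulary:", vocab_size(clean))
print("clean + noisy vocabulary:", vocab_size(clean + noisy))
```

With a subword vocabulary, most of the new surface forms decompose into units that are already known, which is why the note recommends it for closed-vocabulary models.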
This is a work in progress, and the results of our experiments will be published soon.
Meanwhile, if you use artext in your research, please cite this repository.
Setup
artext is developed and tested with Python 3.6 and can be installed in two ways:
- Using pip:
pip install artext
- From source code:
git clone https://github.com/fgaim/artext
cd artext
pip install -r requirements.txt
python setup.py install
Get required resources:
python -m spacy download 'en_core_web_sm'
python -m nltk.downloader 'punkt'
python -m nltk.downloader 'wordnet'
Usage
Use from command-line
Generate sentence-level (sent) or document-level (doc) noise samples for a text file as follows:
python -m artext -src source.txt -out output.txt -l sent -er 0.5 -n 10
Or, from the source code, using inject.py as follows:
python inject.py -src source.txt -out output.txt -l sent -er 0.5 -n 10
Use -h to see all options.
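To generate samples at several error rates, the command line above can be assembled programmatically. The following is a minimal sketch: `build_command` is a hypothetical helper, the output file names are made up for illustration, and only the flags shown above (-src, -out, -l, -er, -n) are used. It builds and prints the commands without running them.

```python
import sys


def build_command(src, out, level="sent", error_rate=0.5, samples=10):
    """Hypothetical helper: assemble the artext command line shown above
    as an argument list, using the current Python interpreter."""
    if level not in ("sent", "doc"):
        raise ValueError("level must be 'sent' or 'doc'")
    return [
        sys.executable, "-m", "artext",
        "-src", src,
        "-out", out,
        "-l", level,
        "-er", str(error_rate),
        "-n", str(samples),
    ]


# One command per error rate; the output names are illustrative only.
commands = [
    build_command("source.txt", f"output_er{rate}.txt", error_rate=rate)
    for rate in (0.1, 0.25, 0.5)
]
for cmd in commands:
    print(" ".join(cmd))
```

With artext installed, each list can be passed to `subprocess.run(cmd, check=True)` to execute it.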
Use as a library
from artext import Artext
artxt = Artext()
artxt.samples = 5
artxt.error_rate = 0.25
sent = 'This is a sample sentence to be noised.'
noises = artxt.noise_sentence(sent)
print(noises)
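One way to use the generated samples for robustness training is to pair each noised variant with its clean source. The sketch below assumes that `noise_sentence` returns an iterable of noised strings, which the example above suggests but does not confirm. `make_pairs` and `stand_in_noiser` are hypothetical names; the stand-in keeps the sketch runnable without artext installed.

```python
def make_pairs(clean_sentences, noiser):
    """Pair every noised variant with its clean source sentence.

    `noiser` is any callable that maps a sentence to an iterable of
    noised strings (assumed to match the shape of `noise_sentence`).
    """
    pairs = []
    for clean in clean_sentences:
        for noisy in noiser(clean):
            pairs.append((noisy, clean))
    return pairs


def stand_in_noiser(sentence):
    """Hypothetical stand-in for a real noiser: returns two trivially
    perturbed variants so the sketch runs without artext."""
    return [sentence.lower(), sentence.rstrip(".") + " ."]


sentences = [
    "This is a sample sentence to be noised.",
    "Well, I just have to get it and check it out.",
]
pairs = make_pairs(sentences, stand_in_noiser)
for noisy, clean in pairs:
    print(noisy, "=>", clean)
```

If the assumption about the return type holds, passing `artxt.noise_sentence` in place of `stand_in_noiser` would produce `artxt.samples` pairs per sentence.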
Examples
python example.py -er 0.5 -n 10
Sentence Level Examples
Input (clean sentence from Lang-8):
So , I think if we have to go somewhere on foot , we must put on our hat .
Human (error example from Lang-8):
So , I think if we have to go somewhere on foot , we must put on our hat .
Output (artext):
- So , I think if we have to gogoing somewhere onfootfeet , we must put on our hat.?
- So ,Ithinkthinking if we have to go somewhere on foot , we must put on ! ourhathats .
- So , I think if wehave we to go somewhere onfootfeet , we must put on our hat.;
- So ,I think if we have to go somewhere on foot , wemustput must on ourhathats .
- So , I think if we have to go somewhere on footfeet , we mustputon put our hat .
- So ,; I think if wehavetake to go somewhere on foot,we must put on ourhathats .
- So , I think if we have to go somewheresomeplace on foot , we mustputputting on ourhathats .
- So , I think if we have to go somewhere on foot , we must put on our hat .chapeau ;
- So ,I think ifwehave we togosomewhere go on foot , we must put on our hat .
- So , I thinkretrieve if wehavehaving togogoing somewhere onfoot ,substructure we mustputputting on our hat .
Document Level Examples
Input (clean sentence from Lang-8):
This morning I found out that one of my favourite bands released a new album .
I already forgot about Rise Against and it is a great surprise for me, because I haven't listened to them for 2 years .
I hope this band did n't become worse, like many others big ones did , and I 'll enjoy listening to it .
Well , I just have to get it and check it out .
Human (error example from Lang-8):
This morning I found out that one of my favourite bands band released a his new album . I already forgot about Rise Against and an it is a great surprise for me , because I have did n't listened return to them for 2 years . I hope this band did n't become worse , yet like many others big ones did , and I 'll enjoy listening to it . Well , I just have there remains to get it and check it out .
Output (artext):
- This morning I found out that one of my favouritefavored bands released a new album . I already forgot aboutRise Againstgrow Agianst and itisare a great surprise for me , because I have n'tlistenedlisten to them for 2 years.Ihopehoping thisband didbands serve n't become worse , like many others big ones did , and I 'll enjoylisteningto listening it . Well , I justhavedeliver to get it and check it out .
- This morning I found out that one of my favourite bands releasedband a released new album . I alreadyforgotforget about RiseAgainstAigniast and it is a great surprise for me ,becauseI beceause have n't listened to them for 2yearsgeezerhood . Ihopehoping thisband didbands n'tbecome worse ,did becoming wore like manyothersother big onesdid ,didding ; and I 'll enjoy listening to it. WelleWll , I just have to get it and check it out .
- This morning I found out thatone that of my favourite bands released a newalbumalbums . I alreadyforgotforgotting about RiseAgainstAainst and itisbe a greatsurprisesurprisal for me , because Ihavehaving n'tlistenedlistneed tothemtem for 2 years . I hope this banddiddo n't becomeworse ,like many others big onesdiddidding , and I 'll enjoy listening to it . Well , I just have to get it andcheckchecking it out .
- This morning I found out that one of my favourite bands released a new album . I already forgot aboutabuot RiseAgainstAgaiinst and it is a greatsurprisesrrpuise for me , because I have n'tlistenedlistening to them for 2yearsyear . I hope this band did n't become worse , like manyothersbig other ones did , and I 'llenjoy listeningenjoying litening to it . Well , Ijustscarce have togetgetting it andcheck itchecking out it .
- This morningmornings Ifoundground outthathTat one of myfavourite bandsfavorite band released a new album . I alreadyforgotforget aboutRise Againstarise Agsinat and it is a greatsurprisesurprisal for me ,becauseIhavebecause n'tlistenedhave listen to them for 2yearsyear . I hope this band did n't becomeworsetough , like manyothersother big ones did , and I 'llenjoylistening enjoy to it.? Well , Ijusthardly have to get it and check it out .
- This morning I foundfnuod outthathtat one of my favourite bandsreleasedreleasing a newalbum . I already forgotaboutabut RiseAgainstAigainst and it is a greatsurprisesurprises for me ,becausebecuasae I have n't listened to them for 2yearsyear . I hope this band did n'tbecomebecoming worse , like manyothersother bigonesone did,and I 'llenjoylistening enjoying to it . Well , I justhave tohaving get to it and check it out.!
- This morning I found out that one of myfavourite my bandsreleasedrelease a new album . Ialready forgotalraedyy forgotting about RiseAgainstAagaianst and itisare a greatsurprisesurprises for me,. because I have n'tlistenedlisten to them for 2 years . I hope thisbanddid band n't become worse , likemanyothers big ones did , and I 'll enjoy listening to it . Well , I just have to get it and check it out .
- This morning I foundincur out that one of myfavouritefavored bandsreleasedreleaseed a newalbumalbums . I already forgot about RiseAgainstigAanst and it is agreatgrat surprisefor me , because Ihavehaving n'tlistenedlisten to them for 2 years.I hope this band did n't becomeworse , like many others bigones didone do , and I 'llenjoyenjoying listening to it . Well,: I justhavehaving togetgetting it and check it out .
- This morning I foundfounding outthathTat one of my favourite bandsreleasedreleasing anewnewfangled album . I alreadyforgotblock about RiseAgainstAganst and it is a great surprise for me ,becausebecuasee I have n't listened to them for 2 years . I hopethistthis band did n'tbecomebecoming worse,: like many others big ones did , and I 'll enjoy listening to it . Well , I justhavehaving to get it and check it out . .
- This morning Ifound I out that one of my favouritebands releasedband releasing a new album . I already forgot aboutRiseRising Against and itisagreatis Greeat surprise for me , because I have n't listened to them for 2 years . Ihopedesire this band did n't become worse , like many others big onesdiddidding , and I 'llenjoyenjoying listening to it . Well,? I just have to get it and check it out .