A Python package for perturbing text in NLP tasks using a range of perturbation methods.

Text Perturbation

The package implements 11 perturbation methods:

  1. delete_random_word: deletes random words
  2. replace_synonyms: replaces a word with a synonym
  3. backtranslation: translates English to German and back to English
  4. paraphrase_using_bart: paraphrases the sentence using BART
  5. replace_with_hypernyms: replaces a word with a hypernym
  6. random_german_word: replaces a random word with a German word
  7. predict_masked_word: masks a word in the sentence, then predicts it with a masked language model (MLM)
  8. misspelling: introduces simple typos
  9. random_char_insertion: inserts a random character into a word
  10. random_char_swaps: swaps characters within a word
  11. ocr_augmentation: alters a word the way OCR errors would
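
To make the word-level and character-level methods above concrete, here is a minimal pure-Python sketch of three of them. It is not the package's implementation: the function names mirror the list, but the signatures, the `rng` parameter, and the whitespace tokenization are assumptions made for illustration.

```python
import random
import string


def delete_random_word(text, rng):
    """Drop one randomly chosen whitespace-separated word."""
    words = text.split()
    if len(words) < 2:
        return text  # nothing sensible to delete
    del words[rng.randrange(len(words))]
    return " ".join(words)


def random_char_insertion(word, rng):
    """Insert one random lowercase letter at a random position."""
    pos = rng.randrange(len(word) + 1)
    return word[:pos] + rng.choice(string.ascii_lowercase) + word[pos:]


def random_char_swaps(word, rng):
    """Swap one randomly chosen pair of adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


rng = random.Random(0)  # fixed seed so runs are repeatable
print(delete_random_word("hello, how are you?", rng))
print(random_char_insertion("hello", rng))
print(random_char_swaps("hello", rng))
```

Passing an explicit `random.Random` instance keeps each perturbation reproducible, which helps when comparing model behaviour on the same perturbed inputs.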

Example of text perturbation:

# Example: replace a random word with a German word
from text_perturbation import perturbation

text = "hello, how are you?"

# Wrap the input text, then call one of the perturbation methods on it
perturbate = perturbation.Perturbate(text)
perturbated_text = perturbate.random_german_word()

print(f"original::: {text}")
print(f"perturbated_text::: {perturbated_text}")
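
For readers without the package installed, the following standalone sketch shows roughly what a perturbation like `random_german_word` could do. It does not use or reproduce the package's code: the `GERMAN_WORDS` list, the `rng` and `vocabulary` parameters, and the whitespace tokenization are all hypothetical.

```python
import random

# Hypothetical vocabulary for illustration only; the package's actual
# source of German words is not documented here.
GERMAN_WORDS = ["Haus", "Baum", "Wasser", "Freund", "Zeit"]


def random_german_word(text, rng, vocabulary=GERMAN_WORDS):
    """Replace one randomly chosen word with a word from `vocabulary`."""
    words = text.split()
    if not words:
        return text  # empty input: nothing to replace
    idx = rng.randrange(len(words))
    words[idx] = rng.choice(vocabulary)
    return " ".join(words)


text = "hello, how are you?"
rng = random.Random(42)  # fixed seed so runs are repeatable
print(f"original::: {text}")
print(f"perturbed::: {random_german_word(text, rng)}")
```

Because this sketch splits on whitespace, punctuation attached to the replaced word (such as the comma in "hello,") is lost. A real implementation might tokenize more carefully.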

Note: Make sure the required NLTK data is downloaded before using the package. If it is not, download it with:

# Download NLTK data (with no arguments, this opens NLTK's interactive downloader)
>>> import nltk
>>> nltk.download()
