Data augmentation techniques for text data using back translation and synonym replacement

Project description

Introduction

This package provides data augmentation techniques for text data using back translation and synonym replacement. Data augmentation is a technique to increase the size of the training data by generating new samples from the existing ones. This can be helpful in improving the performance of machine learning models by making them more robust and generalizable.

Installation

You can install the package using pip:

pip install TextBooster

Usage

The package provides two functions for data augmentation: back_Translation() and synonym_Replacement().

Back Translation:

This function translates the given text into a randomly chosen target language and then back into the source language. The nbre_samples parameter sets how many augmented samples are returned. The round trip tends to produce new samples with different sentence structures and word choices.

from TextBooster import back_Translation

text = "The quick brown fox jumps over the lazy dog."
augmented_texts = back_Translation(text, nbre_samples=3)

print(augmented_texts)

Output:

['The quick brown fox jumps over the indolent dog.',
 'The quick brown fox jumped over the lazy dog.',
 'The quick brown fox jumps over the torpid dog.']
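
For readers curious how such a round trip can be built, here is a minimal sketch. It is not TextBooster's actual implementation. The names back_translate_samples and google_translate, the pivot-language list, and the injectable translate callable are all assumptions made for illustration; only GoogleTranslator(source=..., target=...).translate(...) comes from deep_translator.

```python
import random
from typing import Callable, List, Optional, Sequence

# A translation function takes (text, source_lang, target_lang) and returns text.
Translate = Callable[[str, str, str], str]


def google_translate(text: str, source: str, target: str) -> str:
    """Translate with deep_translator's GoogleTranslator (needs network access)."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=source, target=target).translate(text)


def back_translate_samples(
    text: str,
    translate: Translate,
    n_samples: int = 3,
    source: str = "en",
    pivots: Sequence[str] = ("fr", "de", "es", "it"),  # hypothetical pivot languages
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return n_samples round-trip translations through random pivot languages."""
    if rng is None:
        rng = random.Random()
    samples = []
    for _ in range(n_samples):
        pivot = rng.choice(pivots)                          # random target language
        forward = translate(text, source, pivot)            # source -> pivot
        samples.append(translate(forward, pivot, source))   # pivot -> source
    return samples
```

Calling back_translate_samples(text, google_translate, n_samples=3) would perform real translation requests. Passing the translator in as an argument keeps the sampling logic testable offline with a stub.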

Synonym Replacement:

This function replaces each word in the given text with one of its synonyms, chosen randomly. This can be useful in generating new samples with different word choices.

from TextBooster import synonym_Replacement

text = "The quick brown fox jumps over the lazy dog."
augmented_texts = synonym_Replacement(text)

print(augmented_texts)

Output:

'The speedy brown slyboots jump over the slothful frank.'
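
The idea behind synonym replacement can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: replace_with_synonyms and wordnet_synonyms are hypothetical helpers, words are found with a simple regular expression rather than an nltk tokenizer, and the synonym source is passed in as a callable. The WordNet calls used (wordnet.synsets, synset.lemmas, lemma.name) are part of nltk.

```python
import random
import re
from typing import Callable, List, Optional


def wordnet_synonyms(word: str) -> List[str]:
    """Look up single-word WordNet synonyms (needs nltk and its 'wordnet' data)."""
    # Imported lazily so this sketch can be loaded without nltk installed.
    from nltk.corpus import wordnet

    names = {
        lemma.name()
        for synset in wordnet.synsets(word)
        for lemma in synset.lemmas()
    }
    # Skip multi-word lemmas (WordNet joins them with "_") and the word itself.
    return sorted(n for n in names if "_" not in n and n.lower() != word.lower())


def replace_with_synonyms(
    text: str,
    synonyms: Callable[[str], List[str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Swap every word that has at least one synonym for a randomly chosen one."""
    if rng is None:
        rng = random.Random()

    def swap(match):
        word = match.group(0)
        candidates = synonyms(word.lower())
        if not candidates:
            return word  # no synonym known: keep the original word
        choice = rng.choice(candidates)
        # Upper-case the first letter when the original word started with a capital.
        return choice[:1].upper() + choice[1:] if word[0].isupper() else choice

    # Only alphabetic runs count as words; punctuation and spacing pass through.
    return re.sub(r"[A-Za-z]+", swap, text)
```

With nltk and its data installed, replace_with_synonyms(text, wordnet_synonyms) draws candidates from WordNet. Because WordNet groups words by sense without looking at context, some replacements will not fit the sentence.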

Note that the function returns a single augmented sample. To generate multiple samples, you can call the function multiple times or specify the number of samples to generate using the nbre_samples parameter:

from TextBooster import synonym_Replacement

text = "The quick brown fox jumps over the lazy dog."
augmented_texts = synonym_Replacement(text, nbre_samples=3)

print(augmented_texts)

Output:

['The speedy brown fox jumps over the lazy dog.',
 'The quick brown fox jumps over the indolent dog.',
 'The quick brown fox jumps over the slothful dog.']
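
Because replacements are chosen at random, calling the function several times can return the same sentence more than once. A small wrapper along the following lines collects only distinct samples. Here collect_unique and its max_tries cap are hypothetical helpers, and augment stands for any function that takes a text and returns one augmented string, which is how synonym_Replacement is described when called without nbre_samples.

```python
from typing import Callable, List


def collect_unique(
    augment: Callable[[str], str],
    text: str,
    n_samples: int,
    max_tries: int = 50,
) -> List[str]:
    """Call augment(text) repeatedly until n_samples distinct variants are found.

    Stops after max_tries calls, so the result may hold fewer than n_samples
    items when the augmenter cannot produce enough different outputs.
    """
    seen = {text}  # never return the unmodified input as a "new" sample
    samples: List[str] = []
    for _ in range(max_tries):
        if len(samples) == n_samples:
            break
        candidate = augment(text)
        if candidate not in seen:
            seen.add(candidate)
            samples.append(candidate)
    return samples
```

A call such as collect_unique(synonym_Replacement, text, 3) would then return up to three different sentences, in the order they were first produced.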

Dependencies:

The package has the following dependencies:

  • deep_translator

  • nltk

  • spacy

  • textblob

You can install them using pip:

pip install deep_translator nltk spacy textblob

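Additionally, you need to download the following data files for nltk and spacy. Run the nltk downloads from Python:

import nltk

nltk.download("wordnet")
nltk.download("punkt")
nltk.download("omw-1.4")

Then download the spacy English model from the command line:

python -m spacy download en

spaCy 3.x no longer accepts the en shortcut; on those versions the full model name (for example en_core_web_sm) has to be given instead.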

License:

This package is licensed under the MIT License.

Download files

Source Distribution

TextBooster-0.2.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

TextBooster-0.2-py3-none-any.whl (4.2 kB)

Uploaded Python 3

File details

Details for the file TextBooster-0.2.tar.gz.

File metadata

  • Download URL: TextBooster-0.2.tar.gz
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for TextBooster-0.2.tar.gz
Algorithm    Hash digest
SHA256       8ea8afecd2ebc8667ac62be259c9f3bf1eee8028f0d17b5ee9f41ecfb171ad55
MD5          e0a859470baa7685d97b9c7543c2e437
BLAKE2b-256  f18aadb429090b9e0290e319fcc9a920c7f10a1e133af554362e3e284772a975
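
A downloaded archive can be checked against the SHA256 digest listed above using only the Python standard library. The sketch below is illustrative: the function name sha256_matches and the local file path in the usage comment are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file's SHA256 digest equals expected_hex (case-insensitive)."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Example usage, assuming the sdist was saved in the current directory:
# sha256_matches(
#     "TextBooster-0.2.tar.gz",
#     "8ea8afecd2ebc8667ac62be259c9f3bf1eee8028f0d17b5ee9f41ecfb171ad55",
# )
```

The same function works for the wheel when given the wheel's file name and its own SHA256 digest.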

File details

Details for the file TextBooster-0.2-py3-none-any.whl.

File metadata

  • Download URL: TextBooster-0.2-py3-none-any.whl
  • Size: 4.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for TextBooster-0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       a37a329d1fbd2b14fb703e86602a1db0b2e575f8ed34eca28da8c7ebbc634032
MD5          7640d983166ae5e2d3b80ff6a7d9f58b
BLAKE2b-256  76edc2f9e5c1b2bc13b31e0cdae106423e17a380dc9a3a32ea16047b32045aff
