bnaug (Bangla Text Augmentation)

bnaug is a text augmentation tool for Bangla text.

Installation

    pip install bnaug

  • Dependencies
    • pytorch >=1.7.0
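
The version floor listed above can be checked before importing bnaug. The following is a minimal sketch, assuming PyTorch is installed under its PyPI distribution name `torch`. The helper names `parse_release` and `torch_meets_minimum` are hypothetical and are not part of bnaug.

```python
from importlib import metadata  # standard library, Python 3.8+


def parse_release(version):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1).

    Local build tags ('+cu117') and pre-release suffixes ('rc1') are ignored,
    and short versions such as '2.0' are padded to three components.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def torch_meets_minimum(minimum=(1, 7, 0)):
    """Return True or False if torch is installed, or None if it is missing."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    return parse_release(installed) >= minimum
```

Calling `torch_meets_minimum()` returns `None` when no `torch` distribution is found, which lets a setup script tell "not installed" apart from "installed but too old".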

Necessary Model Links

Sentence Augmentation

Token Replacement

  • Mask generation based augmentation

    from bnaug.sentence import TokenReplacement
    
    tokr = TokenReplacement()
    text = "আমি ঢাকায় বাস করি।"
    output = tokr.masking_based(text, sen_n=5)
    print(output)
    
  • Word2Vec based augmentation

    from bnaug.sentence import TokenReplacement
    
    tokr = TokenReplacement()
    text = "আমি ঢাকায় বাস করি।"
    model = "msc/bangla_word2vec/bnwiki_word2vec.model"
    output = tokr.word2vec_based(text, model=model, sen_n=5, word_n=5)
    print(output)
    
  • Glove based augmentation

    from bnaug.sentence import TokenReplacement
    
    tokr = TokenReplacement()
    text = "আমি ঢাকায় বাস করি।"
    vector = "msc/bn_glove.300d.txt"
    output = tokr.glove_based(text, vector_path=vector, sen_n=5, word_n=5)
    print(output)
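
The Word2Vec and GloVe variants above both rely on word vectors to pick replacement tokens. The sketch below illustrates that general idea with a nearest-neighbour lookup over made-up 2-d vectors. It is not bnaug's implementation. It assumes, without confirmation from the source, that `sen_n` caps the number of returned sentences and `word_n` caps the neighbours considered per word. `TOY_VECTORS`, `nearest_words`, `replace_tokens`, and the `min_sim` threshold are hypothetical names introduced for illustration.

```python
import math
import unicodedata


def nfc(text):
    # Bangla letters such as য় have more than one encoding; normalise first.
    return unicodedata.normalize("NFC", text)


# Made-up 2-d vectors standing in for a real Word2Vec or GloVe model.
RAW_VECTORS = {
    "ঢাকায়": (0.90, 0.10),
    "চট্টগ্রামে": (0.85, 0.15),
    "খুলনায়": (0.80, 0.25),
    "বাস": (0.10, 0.90),
    "বসবাস": (0.12, 0.88),
}
TOY_VECTORS = {nfc(word): vec for word, vec in RAW_VECTORS.items()}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_words(word, vectors, word_n, min_sim=0.9):
    """Return up to word_n vocabulary words most similar to `word`."""
    if word not in vectors:
        return []
    scored = sorted(
        ((cosine(vectors[word], vec), other)
         for other, vec in vectors.items() if other != word),
        reverse=True,
    )
    return [other for sim, other in scored[:word_n] if sim >= min_sim]


def replace_tokens(text, vectors, sen_n=5, word_n=5):
    """Swap one token at a time for a near neighbour; stop at sen_n sentences."""
    tokens = nfc(text).split()  # naive whitespace tokenisation
    augmented = []
    for i, token in enumerate(tokens):
        for candidate in nearest_words(token, vectors, word_n):
            sentence = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if sentence not in augmented:
                augmented.append(sentence)
            if len(augmented) >= sen_n:
                return augmented
    return augmented


text = "আমি ঢাকায় বাস করি।"
for sentence in replace_tokens(text, TOY_VECTORS, sen_n=5, word_n=2):
    print(sentence)
```

Tokens missing from the vocabulary are left untouched, so the number of sentences returned can be smaller than `sen_n`.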
    

Back Translation

Back translation based augmentation first translates the Bangla sentence to English and then translates the English back to Bangla.

    from bnaug.sentence import BackTranslation

    bt = BackTranslation()
    text = "বাংলা ভাষা আন্দোলন তদানীন্তন পূর্ব পাকিস্তানে সংঘটিত একটি সাংস্কৃতিক ও রাজনৈতিক আন্দোলন। "
    output = bt.get_augmented_sentences(text)
    print(output)
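
The round trip described above can be sketched independently of any particular translation model. In the example below, two toy lookup tables stand in for the Bangla-to-English and English-to-Bangla translators. The tables, the function names, and the `changed` flag are all hypothetical and say nothing about how `BackTranslation` works internally.

```python
SOURCE = "আমি ঢাকায় বাস করি।"

# Toy lookup tables standing in for real translation models (illustrative only).
BN_TO_EN = {SOURCE: "I live in Dhaka."}
EN_TO_BN = {"I live in Dhaka.": "আমি ঢাকায় থাকি।"}


def to_english(sentence):
    # Unknown sentences pass through unchanged in this toy translator.
    return BN_TO_EN.get(sentence, sentence)


def to_bangla(sentence):
    return EN_TO_BN.get(sentence, sentence)


def back_translate(sentence, forward, backward):
    """Translate to the pivot language and back, and report whether it changed."""
    pivot = forward(sentence)
    restored = backward(pivot)
    changed = restored.strip() != sentence.strip()
    return {"pivot": pivot, "augmented": restored, "changed": changed}


result = back_translate(SOURCE, to_english, to_bangla)
print(result["pivot"])
print(result["augmented"])
```

Checking `changed` is useful in practice because a round trip that reproduces the input adds nothing to an augmented dataset.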

Text Generation

  • Paraphrase generation

    from bnaug.sentence import TextGeneration

    tg = TextGeneration()
    text = "বিমানটি যখন মাটিতে নামার জন্য এয়ারপোর্টের কাছাকাছি আসছে, তখন ল্যান্ডিং গিয়ারের খোপের ঢাকনাটি খুলে যায়।"
    output = tg.parapharse_generation(text)
    print(output)

Inspired by
