bnaug is a text augmentation tool for Bangla text.
bnaug (Bangla Text Augmentation)
Installation
pip install bnaug
Dependencies:
- PyTorch >= 1.7.0
Demo Notebook
Necessary Model Links
Sentence Augmentation
Token Replacement
- Mask-generation-based augmentation
from bnaug.sentence import TokenReplacement
tokr = TokenReplacement()
text = "আমি ঢাকায় বাস করি।"  # "I live in Dhaka."
output = tokr.masking_based(text, sen_n=5)
print(output)
- Word2Vec-based augmentation
from bnaug.sentence import TokenReplacement
tokr = TokenReplacement()
text = "আমি ঢাকায় বাস করি।"  # "I live in Dhaka."
model = "msc/bangla_word2vec/bnwiki_word2vec.model"  # local path to a pretrained Bangla word2vec model
output = tokr.word2vec_based(text, model=model, sen_n=5, word_n=5)
print(output)
- GloVe-based augmentation
from bnaug.sentence import TokenReplacement
tokr = TokenReplacement()
text = "আমি ঢাকায় বাস করি।"  # "I live in Dhaka."
vector = "msc/bn_glove.300d.txt"  # local path to pretrained Bangla GloVe vectors
output = tokr.glove_based(text, vector_path=vector, sen_n=5, word_n=5)
print(output)
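The three token-replacement methods above share one pattern: pick a token, propose substitutes for it, and rebuild the sentence. The following is a minimal plain-Python sketch of that pattern, not bnaug's implementation. It assumes that sen_n is the number of sentences to return and word_n the number of substitutes considered per token; the proposer functions, the toy vectors, and the fill_mask callable (a stand-in for a masked language model) are all hypothetical. The loader follows the GloVe text format, where each line holds a token followed by its vector components separated by spaces.

```python
import io
import math
from typing import Callable, Dict, Iterable, List

Vectors = Dict[str, List[float]]
# A proposer receives the token list and a position, and returns substitutes, best first.
Proposer = Callable[[List[str], int], List[str]]


def replace_tokens(text: str, propose: Proposer, sen_n: int = 5) -> List[str]:
    """Swap one token at a time for proposed substitutes; return up to sen_n new sentences."""
    tokens = text.split()
    out: List[str] = []
    for i, original in enumerate(tokens):
        for candidate in propose(tokens, i):
            if len(out) >= sen_n:
                return out
            sentence = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if candidate != original and sentence not in out:
                out.append(sentence)
    return out


def masked_lm_proposer(fill_mask: Callable[[str], List[str]], mask: str = "[MASK]") -> Proposer:
    """Mask-based: hide token i and let a (hypothetical) fill_mask model predict it."""
    def propose(tokens: List[str], i: int) -> List[str]:
        return fill_mask(" ".join(tokens[:i] + [mask] + tokens[i + 1:]))
    return propose


def load_glove(lines: Iterable[str], dim: int) -> Vectors:
    """Parse GloVe text format: a token followed by `dim` space-separated floats per line."""
    vectors: Vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) == dim + 1:  # skip headers, blank or malformed lines
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_proposer(vectors: Vectors, word_n: int = 5) -> Proposer:
    """Embedding-based (word2vec/GloVe): propose the word_n nearest neighbours by cosine."""
    def propose(tokens: List[str], i: int) -> List[str]:
        word = tokens[i]
        if word not in vectors:
            return []  # out-of-vocabulary tokens are left unchanged
        scored = sorted(
            ((cosine(vectors[word], vec), other) for other, vec in vectors.items() if other != word),
            reverse=True,
        )
        return [other for _, other in scored[:word_n]]
    return propose


# Toy 2-dimensional "GloVe" file; real vectors would come from vector_path.
glove = load_glove(io.StringIO("ঢাকায় 1.0 0.0\nচট্টগ্রামে 0.9 0.1\nখুলনায় 0.8 0.3\n"), dim=2)
text = "আমি ঢাকায় বাস করি।"
print(replace_tokens(text, embedding_proposer(glove, word_n=2), sen_n=5))
print(replace_tokens(text, masked_lm_proposer(lambda s: ["আমরা", "তুমি"]), sen_n=3))
```

With real resources, the toy fill_mask would be replaced by a masked language model and the toy vectors by the file passed as vector_path (presumably 300-dimensional, judging by its name, and best opened with encoding="utf-8"). For a saved word2vec model, gensim's KeyedVectors.most_similar offers the same nearest-neighbour lookup, although this README does not say which library bnaug uses internally.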
Back Translation
Back-translation-based augmentation first translates a Bangla sentence into English and then translates the English output back into Bangla.
from bnaug.sentence import BackTranslation
bt = BackTranslation()
text = "বাংলা ভাষা আন্দোলন তদানীন্তন পূর্ব পাকিস্তানে সংঘটিত একটি সাংস্কৃতিক ও রাজনৈতিক আন্দোলন। "
output = bt.get_augmented_sentences(text)
print(output)
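The round trip can be sketched as follows. This is an illustration under stated assumptions, not bnaug's code: the README does not describe which translation backend BackTranslation uses, so the two translator callables and their dictionary-backed toy versions are hypothetical. The sketch keeps only round-trip results that are non-empty, unique, and different from the (whitespace-stripped) input, since an unchanged sentence adds nothing as augmented data.

```python
from typing import Callable, List

# A translator maps one sentence to one or more candidate translations.
Translator = Callable[[str], List[str]]


def back_translate(text: str, bn_to_en: Translator, en_to_bn: Translator) -> List[str]:
    """Bangla -> English -> Bangla, keeping unique results that differ from the input."""
    source = text.strip()
    results: List[str] = []
    for english in bn_to_en(source):
        for bangla in en_to_bn(english):
            candidate = bangla.strip()
            if candidate and candidate != source and candidate not in results:
                results.append(candidate)
    return results


# Toy dictionary-backed translators standing in for real MT models.
SOURCE = "আমি ঢাকায় বাস করি।"
EN = {SOURCE: ["I live in Dhaka.", "I reside in Dhaka."]}
BN = {
    "I live in Dhaka.": [SOURCE],  # this round trip reproduces the input, so it is dropped
    "I reside in Dhaka.": ["আমি ঢাকায় থাকি।"],
}

print(back_translate(SOURCE, lambda s: EN.get(s, []), lambda s: BN.get(s, [])))
```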
Text Generation
- Paraphrase generation
from bnaug.sentence import TextGeneration
tg = TextGeneration()
text = "বিমানটি যখন মাটিতে নামার জন্য এয়ারপোর্টের কাছাকাছি আসছে, তখন ল্যান্ডিং গিয়ারের খোপের ঢাকনাটি খুলে যায়।"
output = tg.parapharse_generation(text)
print(output)
Random Augmentation
- Random removal: remove parts of a sentence to generate a new one
At present, it removes words, stopwords, punctuation, and digits to generate new sentences.
from bnaug.sentence import RandomAugmentation
raug = RandomAugmentation()
sentence = "আমি ১০০ বাকি দিলাম"
output = raug.random_remove(sentence)
print(output)
Or apply each removal individually:
from bnaug import randaug
text = "১০০ বাকি দিলাম"
output = randaug.remove_digits(text)
print(output)
text = "১০০! বাকি দিলাম?"
output = randaug.remove_punctuations(text)
print(output)
text = "আমি ১০০ বাকি দিলাম"
output = randaug.remove_stopwords(text)  # assign the result before printing it
print(output)
output = randaug.remove_random_word(text)
print(output)
output = randaug.remove_random_char(text)
print(output)
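The removal operations can be approximated in plain Python as follows. This is a hypothetical re-implementation for illustration, not bnaug's code: the stopword set is a tiny made-up sample, and the function names (strip_digits, strip_punctuation, strip_stopwords, drop_random_word, random_removals) are invented to avoid confusion with the library's own. It highlights two Bangla-specific details: Bangla digits occupy U+09E6 to U+09EF, and the danda (U+0964) and double danda (U+0965) are sentence punctuation that Python's string.punctuation does not include.

```python
import random
import re
import string
from typing import Callable, List

# ASCII digits plus Bangla digits ০-৯ (U+09E6 to U+09EF)
DIGITS_RE = re.compile(r"[0-9\u09E6-\u09EF]+")
# ASCII punctuation plus the Bangla danda (U+0964) and double danda (U+0965)
PUNCT_TABLE = str.maketrans("", "", string.punctuation + "\u0964\u0965")
# Tiny hypothetical stopword sample; a real Bangla list is much longer
STOPWORDS = {"আমি", "এবং", "ও"}


def squash(text: str) -> str:
    """Collapse the runs of whitespace that deletions leave behind."""
    return " ".join(text.split())


def strip_digits(text: str) -> str:
    return squash(DIGITS_RE.sub(" ", text))


def strip_punctuation(text: str) -> str:
    return squash(text.translate(PUNCT_TABLE))


def strip_stopwords(text: str) -> str:
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def drop_random_word(text: str, rng: random.Random) -> str:
    words = text.split()
    if len(words) > 1:  # never delete the only word
        del words[rng.randrange(len(words))]
    return " ".join(words)


def random_removals(text: str, n: int = 3, seed: int = 0) -> List[str]:
    """Apply randomly chosen removals until n distinct new sentences exist or attempts run out."""
    rng = random.Random(seed)
    ops: List[Callable[[str], str]] = [
        strip_digits,
        strip_punctuation,
        strip_stopwords,
        lambda t: drop_random_word(t, rng),
    ]
    original = squash(text)
    outputs: List[str] = []
    for _ in range(n * 10):  # bounded number of attempts
        if len(outputs) >= n:
            break
        candidate = rng.choice(ops)(text)
        # keep only non-empty results that actually changed the sentence
        if candidate and candidate != original and candidate not in outputs:
            outputs.append(candidate)
    return outputs


print(strip_digits("১০০ বাকি দিলাম"))  # বাকি দিলাম
print(strip_punctuation("১০০! বাকি দিলাম?"))  # ১০০ বাকি দিলাম
print(random_removals("আমি ১০০ বাকি দিলাম", n=3, seed=42))
```

random_removals returns at most n sentences and may return fewer when the input allows only a few distinct removals; fixing the seed makes its output reproducible.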
Inspired by
Download files
Source Distribution
- bnaug-1.1.2.tar.gz (5.1 kB)
Built Distribution
- bnaug-1.1.2-py3-none-any.whl (4.8 kB)
File details
Details for the file bnaug-1.1.2.tar.gz
File metadata
- Download URL: bnaug-1.1.2.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | c524078fceb1b2edbef5b0a7e7a4cccc912333e6d5248412ef1a59a6ee18d2f9
MD5 | a7ca049f0e36bde4f944b882849bd3d9
BLAKE2b-256 | 15efba3f00852c102db73029c8cec8ae96c10a2cada28a8b2db4610b114db91c
Details for the file bnaug-1.1.2-py3-none-any.whl
File metadata
- Download URL: bnaug-1.1.2-py3-none-any.whl
- Upload date:
- Size: 4.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb069f70fa0f7af3fcf2f58d5a9d6f1cf8f0c18fe3b1030630bb75795e0e9fe5
MD5 | 6ce6e580f6e55c0a43501a73bcfd5766
BLAKE2b-256 | 198a7a0e8389bec7c8694270d13979d93e946261f2497465af5c9fc6f53782c9