Korean Translation and Augmentation with fine-tuned NLLB
KoTAN: Korean Translation and Augmentation with fine-tuned NLLB
The KoTAN package performs Korean data augmentation and translation in both directions (en->ko and ko->en).
The translation model is a fine-tuned Facebook NLLB model. Data augmentation is done through back-translation.
Speech-style conversion options are also provided.
Package install
- torch=2.0.0 (cuda 12.0) and python>=3.8 are supported.
- You can install the package with the command below.
pip3 install kotan
Usage
- You can use KoTAN with the commands below.
- Import the package.
>>> from kotan import KoTAN
- Available tasks
>>> KoTAN.available_tasks()
- Available languages
>>> KoTAN.available_lang()
- Data augmentation options
>>> KoTAN.available_level()
- origin: the NLLB model before fine-tuning.
- fine: the NLLB model after fine-tuning.
- Speech-style conversion options
>>> KoTAN.available_style()
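- formal: written style (문어체)
- informal: colloquial style (구어체)
- android: android (안드로이드)
- azae: middle-aged man (아재)
- chat: chat (채팅)
- choding: elementary school student (초등학생)
- emoticon: emoticon (이모티콘)
- enfp: enfp
- gentle: gentleman (신사)
- halbae: grandfather (할아버지)
- halmae: grandmother (할머니)
- joongding: middle school student (중학생)
- king: king (왕)
- naruto: Naruto (나루토)
- seonbi: classical scholar (선비)
- sosim: timid (소심한)
- translator: machine translator (번역기)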
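Because option names are passed as plain strings, a typo in a style or level name is easy to make. The sketch below checks a name against the values listed above before a model is constructed. `check_option`, `KNOWN_STYLES`, and `KNOWN_LEVELS` are hypothetical helpers written for illustration; they are not part of the kotan package, and the authoritative lists are whatever `KoTAN.available_style()` and `KoTAN.available_level()` return.

```python
# Option names copied from the lists in this README. The helper is a
# hypothetical convenience and is NOT part of the kotan package.
KNOWN_STYLES = {
    "formal", "informal", "android", "azae", "chat", "choding",
    "emoticon", "enfp", "gentle", "halbae", "halmae", "joongding",
    "king", "naruto", "seonbi", "sosim", "translator",
}
KNOWN_LEVELS = {"origin", "fine"}


def check_option(value, allowed, name):
    """Return value unchanged if it is in allowed, otherwise raise ValueError."""
    if value not in allowed:
        raise ValueError(
            f"Unknown {name} {value!r}; expected one of {sorted(allowed)}"
        )
    return value


if __name__ == "__main__":
    # Both names appear in the lists above, so both calls succeed.
    print(check_option("king", KNOWN_STYLES, "style"))
    print(check_option("fine", KNOWN_LEVELS, "level"))
```

The checked value could then be passed on, for example as `style=check_option("king", KNOWN_STYLES, "style")`.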
Translation
>>> from kotan import KoTAN
>>> mt = KoTAN(task="translation", tgt="en")
>>> inputs = ['나는 온 세상 사람들이 행복해지길 바라', '나는 선한 영향력을 펼치는 사람이 되고 싶어']
>>> mt.predict(inputs)
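For longer input lists it may help to call `predict` on smaller batches. The following is a minimal sketch, assuming only what the example above shows, namely that `predict` accepts a list of strings; that it returns one output per input is an assumption. `chunks`, `predict_in_batches`, and the stand-in `EchoModel` are hypothetical names used for illustration, so the sketch runs without the package or model weights.

```python
from typing import Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def predict_in_batches(model, inputs: Sequence[str], batch_size: int = 8) -> list:
    """Call model.predict on each batch and concatenate the results.

    Assumption: predict returns an iterable with one output per input string.
    """
    outputs = []
    for batch in chunks(inputs, batch_size):
        outputs.extend(model.predict(batch))
    return outputs


class EchoModel:
    """Stand-in for a KoTAN instance; it only wraps each input in brackets."""

    def predict(self, batch: List[str]) -> List[str]:
        return [f"<{text}>" for text in batch]


if __name__ == "__main__":
    print(predict_in_batches(EchoModel(), ["a", "b", "c"], batch_size=2))
```

With the real package, the `mt` object from the example above would take the place of `EchoModel()`.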
Data Augmentation
Original NLLB model (before fine-tuning)
>>> from kotan import KoTAN
>>> aug = KoTAN(task="augmentation", level="origin")
>>> inputs = ['나는 온 세상 사람들이 행복해지길 바라', '나는 선한 영향력을 펼치는 사람이 되고 싶어']
>>> aug.predict(inputs)
NLLB model fine-tuned on AI Hub datasets
>>> from kotan import KoTAN
>>> aug = KoTAN(task="augmentation", level="fine")
>>> inputs = ['나는 온 세상 사람들이 행복해지길 바라', '나는 선한 영향력을 펼치는 사람이 되고 싶어']
>>> aug.predict(inputs)
Apply the style conversion option
>>> from kotan import KoTAN
>>> aug = KoTAN(task="augmentation", style="chat")
>>> inputs = ['나는 온 세상 사람들이 행복해지길 바라', '나는 선한 영향력을 펼치는 사람이 되고 싶어']
>>> aug.predict(inputs)
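The back-translation idea behind the augmentation task can be illustrated independently of the model: a sentence is translated into a pivot language and then back, and the round-trip result serves as a paraphrase. The sketch below is a generic illustration under that assumption, not KoTAN's internal implementation. `back_translate` and the dictionary-based toy translators are hypothetical stand-ins so the example runs without any model.

```python
from typing import Callable, List, Tuple

# A translator maps a list of sentences to a list of translated sentences.
Translator = Callable[[List[str]], List[str]]


def back_translate(
    sentences: List[str],
    to_pivot: Translator,
    to_source: Translator,
) -> List[Tuple[str, str]]:
    """Round-trip each sentence through a pivot language.

    Returns (original, paraphrase) pairs, dropping round trips that
    reproduce the original sentence exactly.
    """
    pivot = to_pivot(sentences)
    restored = to_source(pivot)
    return [(src, aug) for src, aug in zip(sentences, restored) if aug != src]


# Toy lookup tables standing in for real ko->en and en->ko models.
KO_TO_EN = {"안녕": "hello", "고마워": "thanks"}
EN_TO_KO = {"hello": "안녕하세요", "thanks": "고마워"}


def toy_ko_to_en(batch: List[str]) -> List[str]:
    return [KO_TO_EN[text] for text in batch]


def toy_en_to_ko(batch: List[str]) -> List[str]:
    return [EN_TO_KO[text] for text in batch]


if __name__ == "__main__":
    # "고마워" survives the round trip unchanged, so only one pair is kept.
    print(back_translate(["안녕", "고마워"], toy_ko_to_en, toy_en_to_ko))
```

In a real setting, the two toy functions would be replaced by the `predict` methods of two translation models, one per direction.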
Speech-style conversion
>>> from kotan import KoTAN
>>> style = KoTAN(task="augmentation", style="king")
>>> inputs = ['나는 온 세상 사람들이 행복해지길 바라', '나는 선한 영향력을 펼치는 사람이 되고 싶어']
>>> style.predict(inputs)
Demo
Citation
@misc{KoTAN,
author = {Juhwan Lee and Jisu Kim},
title = {KoTAN: Korean Translation and Augmentation with fine-tuned NLLB},
howpublished = {\url{https://github.com/KoJLabs/KoTAN}},
year = {2023},
}
Contributors
License
The KoTAN project is licensed under the Apache License 2.0.