TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).
TURJUMAN is a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA). It is described in our OSACT5 2022 paper, “TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation”.
TURJUMAN exploits our recently introduced text-to-text Transformer model, AraT5, endowing it with a powerful ability to decode into Arabic. The toolkit supports a number of diverse decoding methods, which also makes it suited for acquiring paraphrases of the MSA translations as an added value. To train TURJUMAN, we sample from publicly available parallel data, employing a simple semantic similarity method to ensure data quality.
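To make the idea of "diverse decoding methods" concrete, here is a minimal, self-contained sketch of greedy decoding, beam search, and sampling over a toy next-token table. It is an illustration only and does not use TURJUMAN or AraT5: `TOY_MODEL`, the function names, and all probabilities are hypothetical. It shows why keeping several beams, or sampling, can surface alternative outputs that might serve as paraphrases.

```python
import math
import random

# Hypothetical toy "model": previous token -> next-token probabilities.
# This stands in for a real decoder; it is not part of TURJUMAN.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "bird": 0.25},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "bird": {"</s>": 1.0},
}


def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Pick the single most probable next token at every step."""
    tokens = [start]
    while tokens[-1] != end and len(tokens) < max_len:
        dist = model[tokens[-1]]
        tokens.append(max(dist, key=dist.get))
    return tokens


def beam_search(model, beam_width=2, start="<s>", end="</s>", max_len=10):
    """Keep the beam_width best partial sequences, ranked by log-probability."""
    beams = [(0.0, [start])]  # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == end:
                candidates.append((score, tokens))  # finished; carry over
                continue
            for tok, p in model[tokens[-1]].items():
                candidates.append((score + math.log(p), tokens + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(tokens[-1] == end for _, tokens in beams):
            break
    return [tokens for _, tokens in beams]


def sample_decode(model, rng, start="<s>", end="</s>", max_len=10):
    """Draw each next token at random, weighted by its probability."""
    tokens = [start]
    while tokens[-1] != end and len(tokens) < max_len:
        dist = model[tokens[-1]]
        tokens.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return tokens


greedy = greedy_decode(TOY_MODEL)
beams = beam_search(TOY_MODEL, beam_width=2)
sampled = sample_decode(TOY_MODEL, random.Random(0))
```

In this toy table, greedy decoding commits to "the" at the first step and ends with a sequence of probability 0.24, while beam search with two beams also explores "a" and ranks "a cat" (probability 0.36) first. The remaining beams and the sampled sequences are the kind of alternative outputs that a paraphrasing use case would draw on.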
GitHub link: https://github.com/UBC-NLP/turjuman
Online demo link: https://demos.dlnlp.ai/turjuman
Getting Started
The full documentation contains instructions for getting started, translating using different methods, and integrating Turjuman with your code, and provides more examples.
License
turjuman(-py) is Apache-2.0 licensed. The license applies to the pre-trained models as well.
Citation
If you use the TURJUMAN toolkit or the pre-trained models in a scientific publication, or if you find the resources in this repository useful, please cite our paper as follows:
@inproceedings{nagoudi-osact5-2022-turjuman,
  title = {TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation},
  author = {Nagoudi, El Moatez Billah and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad},
  booktitle = "Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5)",
  month = "June",
  year = "2022",
  address = "Marseille, France",
  publisher = "European Language Resource Association",
}
Acknowledgments
We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2018-04267), the Social Sciences and Humanities Research Council of Canada (SSHRC; 435-2018-0576; 895-2020-1004; 895-2021-1008), ComputeCanada (CC), UBC ARC-Sockeye, and Advanced Micro Devices, Inc. (AMD). Any opinions, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSERC, SSHRC, CFI, CC, AMD, or UBC ARC-Sockeye.
Source Distribution
File details
Details for the file turjuman-1.0.5.tar.gz.
File metadata
- Download URL: turjuman-1.0.5.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 95b2e5b7e500b800ceb4c6d64291011547e67b92ef3392be8672ef35189e0e99
MD5 | 63d2f8d7c7f51f7734be543f7731ebee
BLAKE2b-256 | 30cf664bcefade9841cd7258d49f5e3a15b499e23a7f7eb4f9243d21440c095a
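The digests above can be used to check a downloaded archive before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest copied from the file hashes table for turjuman-1.0.5.tar.gz.
EXPECTED_SHA256 = "95b2e5b7e500b800ceb4c6d64291011547e67b92ef3392be8672ef35189e0e99"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, assuming the archive was saved in the current directory, `matches_expected("turjuman-1.0.5.tar.gz")` should return `True` for an intact download and `False` otherwise.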