TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).

Project description

TURJUMAN is a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA). It is described in our OSACT5 2022 paper, “TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation”.

TURJUMAN exploits our recently introduced text-to-text Transformer model, AraT5, endowing it with a powerful ability to decode into Arabic. The toolkit supports a number of diverse decoding methods, which also makes it suited to acquiring paraphrases of the MSA translations as an added value. To train TURJUMAN, we sample from publicly available parallel data, employing a simple semantic similarity method to ensure data quality.
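The semantic similarity filtering mentioned above can be illustrated with a minimal sketch. This is not the TURJUMAN training code: the function names, the toy embedding vectors, and the 0.7 threshold are all assumptions made for illustration. In practice the `embed` function would be a multilingual sentence encoder that maps a sentence and its translation to nearby vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_parallel_pairs(pairs, embed, min_score=0.7):
    """Keep (source, target) pairs whose embeddings are similar enough.

    `embed` is a hypothetical callable mapping a sentence to a vector;
    `min_score` is an illustrative threshold, not a value from the paper.
    """
    kept = []
    for source, target in pairs:
        score = cosine_similarity(embed(source), embed(target))
        if score >= min_score:
            kept.append((source, target, score))
    return kept


# Toy vectors standing in for real multilingual sentence embeddings.
TOY_EMBEDDINGS = {
    "Good morning": [0.9, 0.1, 0.0],
    "صباح الخير": [0.8, 0.2, 0.1],
    "Click here to subscribe": [0.0, 0.1, 0.9],
}

pairs = [
    ("Good morning", "صباح الخير"),             # a plausible translation pair
    ("Click here to subscribe", "صباح الخير"),  # a misaligned pair
]

kept = filter_parallel_pairs(pairs, TOY_EMBEDDINGS.__getitem__, min_score=0.7)
for source, target, score in kept:
    print(f"{score:.2f}\t{source}\t{target}")
```

With these toy vectors only the first pair passes the threshold. The misaligned pair scores far lower and is dropped.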

GitHub link: https://github.com/UBC-NLP/turjuman

Online demo link: https://demos.dlnlp.ai/turjuman

Getting Started

The full documentation contains instructions for getting started, translating with different methods, and integrating Turjuman with your code, and provides more examples.

License

turjuman(-py) is Apache-2.0 licensed. The license applies to the pre-trained models as well.

Citation

If you use the TURJUMAN toolkit or the pre-trained models in a scientific publication, or if you find the resources in this repository useful, please cite our paper as follows:

@inproceedings{nagoudi-osact5-2022-turjuman,
  title={TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation},
  author={Nagoudi, El Moatez Billah and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad},
  booktitle = "Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5)",
  month = "June",
  year = "2022",
  address = "Marseille, France",
  publisher = "European Language Resources Association",
}

Acknowledgments

We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2018-04267), the Social Sciences and Humanities Research Council of Canada (SSHRC; 435-2018-0576; 895-2020-1004; 895-2021-1008), ComputeCanada (CC), UBC ARC-Sockeye, and Advanced Micro Devices, Inc. (AMD). Any opinions, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSERC, SSHRC, CFI, CC, AMD, or UBC ARC-Sockeye.
