Neural Machine Translation for African Languages

Project description


"Ukuxhumana" means "communicate" in Zulu. This project explores Neural Machine Translation for low-resource languages, specifically the official languages of South Africa.


Parallel Corpora

Our parallel corpora come from the Autshumato project. The datasets contain text translated by professional translators, translated file pairs sourced from translators, and data obtained from government websites and documents.


Two main architectures are used throughout this project: Convolutional Sequence to Sequence by Gehring et al. and Transformer by Vaswani et al. They were modelled with Fairseq(-py) and Tensor2Tensor, respectively.


Results are given in BLEU.
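
As a rough illustration of what a BLEU figure measures, and of how the "cased" and "uncased" rows in the tables can differ, here is a minimal sketch of corpus-level BLEU in plain Python. It is not the scoring code used for this project: the function names, the whitespace tokenisation, the single reference per sentence and the absence of smoothing are all assumptions made for brevity.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4, lowercase=False):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Hypothetical helper: tokenisation is plain whitespace splitting,
    which may differ from the project's actual scoring pipeline.
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        if lowercase:  # "uncased" scoring ignores capitalisation
            hyp, ref = hyp.lower(), ref.lower()
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each n-gram to its reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: an n-gram order with no matches gives 0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalise output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Made-up example sentences, not taken from the Autshumato data.
hyps = ["The cat sat on the mat"]
refs = ["the cat sat on the mat"]
cased = corpus_bleu(hyps, refs)                    # capital "The" counts as a mismatch
uncased = corpus_bleu(hyps, refs, lowercase=True)  # identical after lowercasing
print(f"cased: {cased:.2f}, uncased: {uncased:.2f}")
```

Scores from this sketch are not directly comparable with the figures in the tables, because BLEU is sensitive to tokenisation and smoothing choices.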


English -> Language

Model | Setswana | isiZulu | Northern Sotho | Xitsonga | Afrikaans
Convolutional Seq2Seq | 27.77 (24.18) | 0.62 (0.28) | 15.35 (7.41) | 36.96 | 16.17
Convolutional Seq2Seq (40K BPE) | 23.83 | 1.44 | 4.89 | 34.28 | 21.06
Convolutional Seq2Seq (8K BPE) 2.19 15.45 26.78
Transformer (uncased) | 33.53 | 4.55 | 29.23 | 47.37 | 35.26
Transformer (cased) | 33.12 | 4.45 | 28.71 | 46.95 | 34.81
Transformer (40k BPE) (uncased) 4.29
Transformer (40k BPE) (cased) 4.14
Transformer (8k BPE) (uncased)
Transformer (8k BPE) (cased)

Language -> English

Model | Setswana | isiZulu | Northern Sotho | Xitsonga | Afrikaans
Convolutional Seq2Seq
Transformer (uncased)
Transformer (cased)

Download files

Files for ukuxhumana, version 0.0.1
Filename, size | File type | Python version
ukuxhumana-0.0.1-py3-none-any.whl (36.4 kB) | Wheel | py3
ukuxhumana-0.0.1.tar.gz (10.4 kB) | Source | None
