Skip to main content

Neural Machine Translation for African Languages

Project description

Ukuxhumana

"Ukuxhumana" means "Communicate" in Zulu. This project is aimed at exploring ideas for using Neural Machine Translation for low-resource languages - specifically for the official languages of South Africa.

Data

Parallel Corpuses

Our parallel corpuses are from the Autshumato project. The datasets contain data that was translated by professional translators, data that was sourced as translated file pairs from translators and data obtained from Government websites and documents

Models

Two main architectures are used throughout this project, namely Convolutional Sequence to Sequence by Gehring et. al. and Transformer by Vaswani et. al. Fairseq(-py) and Tensor2Tensor were used in modeling these techniques respectively.

Results

Results are given in BLEU.

Baseline

English -> Language

Model Setswana isiZulu Northern Sotho Xitsonga Afrikaans
Convolutional Seq2Seq 27.77 (24.18) 0.62 (0.28) 15.35 (7.41) 36.96 16.17
Convolutional Seq2Seq (40K BPE) 23.83 1.44 4.89 34.28 21.06
Convolutional Seq2Seq (8K BPE) 2.19 15.45 26.78
Transformer (uncased) 33.53 4.55 29.23 47.37 35.26
Transformer (cased) 33.12 4.45 28.71 46.95 34.81
Transformer (40k BPE) (uncased) 4.29
Transformer (40k BPE) (cased) 4.14
Transformer (8k BPE) (uncased)
Transformer (8k BPE) (cased)

Language -> English

Model Setswana isiZulu Northern Sotho Xitsonga Afrikaans
Convolutional Seq2Seq
Transformer (uncased)
Transformer (cased)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ukuxhumana-0.0.1.tar.gz (10.4 kB view hashes)

Uploaded Source

Built Distribution

ukuxhumana-0.0.1-py3-none-any.whl (36.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page