
Neural Machine Translation for African Languages

Project description

Ukuxhumana

"Ukuxhumana" means "communicate" in Zulu. This project explores the use of Neural Machine Translation (NMT) for low-resource languages, specifically the official languages of South Africa.

Data

Parallel Corpora

Our parallel corpora come from the Autshumato project. They contain text translated by professional translators, translated file pairs sourced from translators, and data obtained from government websites and documents.
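Fairseq and Tensor2Tensor both train from sentence-aligned text, where line i of the source file is the translation of line i of the target file. The sketch below assumes the Autshumato data has been exported in that form (file names such as `train.en` and `train.tn` are hypothetical) and shows a typical cleaning pass before training: drop empty pairs, very long sentences, and pairs whose length ratio suggests misalignment. The thresholds are illustrative assumptions, not values used by this project.

```python
from itertools import zip_longest


def clean_pairs(src_lines, tgt_lines, max_len=100, max_ratio=3.0):
    """Pair line-aligned source/target sentences and drop suspicious pairs.

    max_len (tokens) and max_ratio are illustrative thresholds (assumptions).
    """
    pairs = []
    for line_no, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            raise ValueError(f"files are not line-aligned (mismatch at line {line_no})")
        src, tgt = src.strip(), tgt.strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # one side is empty: nothing to learn from
        if n_src > max_len or n_tgt > max_len:
            continue  # overly long sentences are slow to train on and often misaligned
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too different to plausibly be translations
        pairs.append((src, tgt))
    return pairs


def load_parallel(src_path, tgt_path, **kwargs):
    """Hypothetical helper: read two UTF-8 files line by line and clean them."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        return clean_pairs(src, tgt, **kwargs)
```

For example, `clean_pairs(["Hello\n", "\n"], ["Dumela\n", "Ke a leboga\n"])` keeps only `("Hello", "Dumela")`, because the second source line is empty. Using `zip_longest` rather than `zip` makes a line-count mismatch fail loudly instead of silently truncating the corpus.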

Models

Two main architectures are used throughout this project: the Convolutional Sequence to Sequence model (ConvS2S) of Gehring et al. and the Transformer of Vaswani et al. These were implemented with Fairseq (fairseq-py) and Tensor2Tensor, respectively.
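The actual models come from Fairseq and Tensor2Tensor; the pure-Python sketch below only illustrates the core operation each cited paper builds on. ConvS2S gates its convolution outputs with a gated linear unit, GLU(A, B) = A ⊗ σ(B), and the Transformer replaces recurrence and convolution with scaled dot-product attention, softmax(QKᵀ / √d_k)V. This is not code from this project or from either library, just a small illustration on plain lists.

```python
import math


def glu(a, b):
    """Gated linear unit (ConvS2S): elementwise a * sigmoid(b)."""
    return [x * (1.0 / (1.0 + math.exp(-y))) for x, y in zip(a, b)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Transformer attention: softmax(Q K^T / sqrt(d_k)) V, one query at a time."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

When all keys are identical the attention weights are uniform, so `scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]])` returns the mean of the values, `[[2.0, 4.0]]`.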

Results

All results are reported as BLEU scores.
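BLEU combines clipped ("modified") n-gram precisions for n = 1 to 4 with a brevity penalty for hypotheses shorter than the references. The sketch below is a minimal corpus-level BLEU assuming one reference per sentence and plain whitespace tokenization; its `lowercase` flag presumably corresponds to the difference between the "uncased" and "cased" Transformer rows. Real scorers (for example sacreBLEU or the scoring tools shipped with Fairseq and Tensor2Tensor) tokenize differently, so this sketch will not reproduce the numbers in the tables.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4, lowercase=False):
    """Minimal corpus-level BLEU on a 0-100 scale.

    Assumptions: one reference per hypothesis, whitespace tokenization,
    no smoothing. lowercase=True approximates an "uncased" score.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            ref_ngrams = ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in ngram_counts(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches, so log(0) is undefined
    # Geometric mean of the modified precisions.
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

With reference `"the cat sat on the mat"`, the hypothesis `"The cat sat on the mat"` scores about 75.98 cased (the capitalized "The" breaks one n-gram at every order) but exactly 100.0 with `lowercase=True`, which is why uncased scores are never lower than cased ones.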

Baseline

English -> Language

Model Setswana isiZulu Northern Sotho Xitsonga Afrikaans
Convolutional Seq2Seq 27.77 (24.18) 0.62 (0.28) 15.35 (7.41) 36.96 16.17
Convolutional Seq2Seq (40K BPE) 23.83 1.44 4.89 34.28 21.06
Convolutional Seq2Seq (8K BPE) 2.19 15.45 26.78
Transformer (uncased) 33.53 4.55 29.23 47.37 35.26
Transformer (cased) 33.12 4.45 28.71 46.95 34.81
Transformer (40k BPE) (uncased) 4.29
Transformer (40k BPE) (cased) 4.14
Transformer (8k BPE) (uncased)
Transformer (8k BPE) (cased)
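The "40K BPE" and "8K BPE" rows use byte-pair encoding (Sennrich et al., 2016) to split words into subword units; the number presumably refers to the number of merge operations or the subword vocabulary size, which this page does not specify. The sketch below shows the core learning loop under that reading: start from characters plus an end-of-word marker and repeatedly merge the most frequent adjacent symbol pair. It is an illustration, not the tooling used for these experiments.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the adjacent pair with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols, out, i = word.split(), [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge operations from a word -> count mapping."""
    # Start from single characters, with "</w>" marking the end of each word.
    vocab = {" ".join(word) + " </w>": count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    vocab = {" ".join(word) + " </w>": 1}
    for pair in merges:
        vocab = merge_pair(pair, vocab)
    return next(iter(vocab)).split()
```

On the toy counts `{"low": 5, "lower": 2, "newest": 6, "widest": 3}`, three merges yield `("e", "s")`, `("es", "t")`, `("est", "</w>")`, and the unseen word "lowest" is segmented as `["l", "o", "w", "est</w>"]`. A larger merge budget produces longer, more word-like units; a smaller one keeps more words split into pieces.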

Language -> English

Model Setswana isiZulu Northern Sotho Xitsonga Afrikaans
Convolutional Seq2Seq
Transformer (uncased)
Transformer (cased)


Download files


Source Distribution

ukuxhumana-0.0.1.tar.gz (10.4 kB)

Uploaded Source

Built Distribution

ukuxhumana-0.0.1-py3-none-any.whl (36.4 kB)

Uploaded Python 3

File details

Details for the file ukuxhumana-0.0.1.tar.gz.

File metadata

  • Download URL: ukuxhumana-0.0.1.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.5.3

File hashes

Hashes for ukuxhumana-0.0.1.tar.gz
Algorithm Hash digest
SHA256 1382e77684453dd0d6f0dfb2d11a47cc81ab2dee7b037ce5219393e5e6ee4c7d
MD5 6f0ebd22c542dacb01da8b80abfe83a4
BLAKE2b-256 10257b1110c63e4cbb0b6a73e5fee079d121aa555265c0160dfb39414a9672f1


File details

Details for the file ukuxhumana-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: ukuxhumana-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 36.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.5.3

File hashes

Hashes for ukuxhumana-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9c3fa48ee692245aecbecfdd0990cc6694f08345f3823d3d048321a77917e83f
MD5 d0b055d4f1e0cb0f34fc592242b37286
BLAKE2b-256 ad6fcb42168db2fcb14899b77f2091a515530b12b2b50a2c5b5a9f43f3575591

