Automatic Speech Recognition Library for African Languages

Okwugbe

Automatic Speech Recognition Library for (low-resource) African Languages

Motivation

Our aim is to foster ASR for African languages by making the whole process, from dataset gathering and preprocessing to training, as easy as possible. This library follows our work OkwuGbé on ASR for Fon and Igbo. It builds on the network architecture described in our paper and aims to ease ASR training for other languages. The primary targets are African languages, but it supports other languages as well.

Usage

pip install okwugbe

# Import the trainer class
from train_eval import Train_Okwugbe

train_path = '/path/to/training_file.csv'
test_path = '/path/to/testing_file.csv'
characters_set = '/path/to/character_set.txt'

# /path/to/training_file.csv and /path/to/testing_file.csv must be CSV files with two columns:
#     the first one contains the full paths to the audio WAV files
#     the second one contains the text transcription of each audio file

# Initialize the trainer instance
train = Train_Okwugbe(train_path, test_path, characters_set)

# Start the training
train.run()
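
The two-column CSV files and the character set file described above can be prepared with the standard library alone. The following is a minimal sketch, not part of Okwugbe: the helper names `write_manifest` and `write_character_set`, the sample paths and transcriptions, the absence of a header row, and the one-character-per-line layout of the character file are all assumptions, so check them against what the library expects.

```python
import csv
from pathlib import Path


def write_manifest(rows, csv_path):
    """Write (audio_path, transcription) pairs as a two-column CSV file.

    Assumption: no header row is written.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for audio_path, transcription in rows:
            writer.writerow([str(audio_path), transcription])


def write_character_set(transcriptions, txt_path):
    """Collect the unique characters of all transcriptions into a text file.

    Assumption: one character per line, sorted by code point.
    """
    characters = sorted(set("".join(transcriptions)))
    Path(txt_path).write_text("\n".join(characters), encoding="utf-8")
    return characters


# Hypothetical sample data: replace with your own recordings and transcriptions.
samples = [
    ("/path/to/audio_0001.wav", "a wa nu"),
    ("/path/to/audio_0002.wav", "e na ce"),
]

if __name__ == "__main__":
    write_manifest(samples, "training_file.csv")
    write_character_set([text for _, text in samples], "character_set.txt")
```

The resulting file paths can then be passed as train_path, test_path and characters_set.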

Parameters

Here are the parameters for the package, as well as their default values.

The default values have been chosen so that you only have to make minimal changes to get a good ASR model going.

| Parameter | Description | Default |
| --- | --- | --- |
| use_common_voice | Whether to use Common Voice | False |
| lang | Language to use from Common Voice. Must be specified if use_common_voice is set to True. | None |
| rnn_dim | RNN dimension and hidden size | 512 |
| num_layers | Number of layers | 1 |
| n_cnn | Number of CNN components | 5 |
| n_rnn | Number of RNN components | 3 |
| n_feats | Number of features for the ResCNN | 128 |
| in_channels | Number of input channels of the ResCNN | 1 |
| out_channels | Number of output channels of the ResCNN | 32 |
| kernel | Kernel size for the ResCNN | 3 |
| stride | Stride size for the ResCNN | 2 |
| padding | Padding size for the ResCNN | 1 |
| dropout | Dropout (same value for all components) | 0.1 |
| with_attention | Whether to use the attention mechanism | False |
| batch_multiplier | Batch multiplier for gradient accumulation | 1 (no gradient accumulation) |
| grad_acc | Gradient accumulation option | False |
| model_path | Path for the saved model | './okwugbe_model' |
| characters_set | Path to the .txt file containing unique characters | required |
| validation_set | Validation set size | 0.2 |
| train_path | Path to the training set | required |
| test_path | Path to the testing set | required |
| learning_rate | Learning rate | 3e-5 |
| batch_size | Batch size | 20 |
| patience | Early stopping patience | 20 |
| epochs | Training epochs | 500 |
| optimizer | Optimizer | 'adamw' |
| freq_mask | Frequency masking (for speech augmentation) | 30 |
| time_mask | Time masking (for speech augmentation) | 100 |
| display_plot | Whether to plot metrics during training | True |
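
When changing several defaults at once, it can help to check the parameter names before a long training run starts. The sketch below is illustrative only: `build_overrides` and `effective_batch_size` are hypothetical helpers, not part of Okwugbe. It also assumes that the constructor accepts the documented parameter names as keyword arguments (as the use_common_voice and lang example in this README suggests) and that gradient accumulation follows the usual scheme, where gradients from batch_multiplier batches are accumulated before each optimizer step.

```python
# Parameter names copied from the parameter table in this README.
DOCUMENTED_PARAMETERS = frozenset({
    "use_common_voice", "lang", "rnn_dim", "num_layers", "n_cnn", "n_rnn",
    "n_feats", "in_channels", "out_channels", "kernel", "stride", "padding",
    "dropout", "with_attention", "batch_multiplier", "grad_acc", "model_path",
    "characters_set", "validation_set", "train_path", "test_path",
    "learning_rate", "batch_size", "patience", "epochs", "optimizer",
    "freq_mask", "time_mask", "display_plot",
})


def build_overrides(**overrides):
    """Return the overrides as a dict, rejecting names that are not documented."""
    unknown = sorted(set(overrides) - DOCUMENTED_PARAMETERS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {', '.join(unknown)}")
    return dict(overrides)


def effective_batch_size(batch_size=20, batch_multiplier=1):
    """Samples contributing to one optimizer step under standard accumulation.

    Assumption: the library accumulates gradients over `batch_multiplier` batches.
    """
    return batch_size * batch_multiplier


overrides = build_overrides(
    batch_size=8, grad_acc=True, batch_multiplier=4, learning_rate=1e-4
)

# Assumed usage (keyword arguments are not confirmed for every parameter):
# train = Train_Okwugbe(train_path, test_path, characters_set, **overrides)
# train.run()
```

With these example values, a small per-step batch of 8 that fits in memory would behave like a batch of 32 under the stated assumption.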

Integration with Common Voice

You can easily train on a Common Voice dataset with Okwugbe by specifying use_common_voice=True and setting lang to the language code of your choice. This language must be hosted on Common Voice.

# Initialize the trainer instance
train = Train_Okwugbe(use_common_voice=True, lang='mn')  # 'mn' is Mongolian

#Start the training
train.run()

Here is the list of Common Voice languages we currently support.

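| Code | Language |
| --- | --- |
| tt | Tatar |
| en | English |
| de | German |
| fr | French |
| cy | Welsh |
| br | Breton |
| cv | Chuvash |
| tr | Turkish |
| ky | Kyrgyz |
| ga-IE | Irish |
| kab | Kabyle |
| ca | Catalan |
| zh-TW | Taiwanese |
| sl | Slovenian |
| it | Italian |
| nl | Dutch |
| cnh | Hakha Chin |
| eo | Esperanto |
| et | Estonian |
| fa | Persian |
| pt | Portuguese |
| eu | Basque |
| es | Spanish |
| zh-CN | Chinese |
| mn | Mongolian |
| sah | Sakha |
| dv | Dhivehi |
| rw | Kinyarwanda |
| sv-SE | Swedish |
| ru | Russian |
| id | Indonesian |
| ar | Arabic |
| ta | Tamil |
| ia | Interlingua |
| lv | Latvian |
| ja | Japanese |
| vot | Votic |
| ab | Abkhaz |
| zh-HK | Cantonese |
| rm-sursilv | Romansh Sursilvan |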
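
Because lang must be one of the codes listed above, a small check before starting a run can catch typos early. This is a sketch under stated assumptions: `check_common_voice_lang` is a hypothetical helper, not part of Okwugbe, and the code set is simply copied from the list in this README, so it may lag behind what the library or Common Voice actually offers.

```python
# Language codes copied from the supported-language list in this README.
SUPPORTED_COMMON_VOICE_LANGS = frozenset(
    "tt en de fr cy br cv tr ky ga-IE kab ca zh-TW sl it nl cnh eo et fa "
    "pt eu es zh-CN mn sah dv rw sv-SE ru id ar ta ia lv ja vot ab zh-HK "
    "rm-sursilv".split()
)


def check_common_voice_lang(lang):
    """Return `lang` unchanged if it is listed, otherwise raise ValueError.

    The comparison is case-sensitive, matching the codes as written in the list.
    """
    if lang not in SUPPORTED_COMMON_VOICE_LANGS:
        raise ValueError(
            f"{lang!r} is not in the documented Common Voice language list"
        )
    return lang


# Assumed usage, following the example in this section:
# train = Train_Okwugbe(use_common_voice=True, lang=check_common_voice_lang("mn"))
```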

Tutorials

  • Colab notebook on using OkwuGbe
  • Colab notebook on using OkwuGbe with Common Voice

ASR Data for African languages

Wondering where to find a dataset for your African language? Here are some resources to check:

Debugging

This Colab notebook is strictly for debugging!

Citation

Please cite our paper using the citation below if you use our work in any way:

@inproceedings{dossou-emezue-2021-okwugbe,
    title = "{O}kwu{G}b{\'e}: End-to-End Speech Recognition for {F}on and {I}gbo",
    author = "Dossou, Bonaventure F. P.  and
      Emezue, Chris Chinenye",
    booktitle = "Proceedings of the Fifth Workshop on Widening Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.winlp-1.1",
    pages = "1--4",
    abstract = "Language is a fundamental component of human communication. African low-resourced languages have recently been a major subject of research in machine translation, and other text-based areas of NLP. However, there is still very little comparable research in speech recognition for African languages. OkwuGb{\'e} is a step towards building speech recognition systems for African low-resourced languages. Using Fon and Igbo as our case study, we build two end-to-end deep neural network-based speech recognition models. We present a state-of-the-art automatic speech recognition (ASR) model for Fon, and a benchmark ASR model result for Igbo. Our findings serve both as a guide for future NLP research for Fon and Igbo in particular, and the creation of speech recognition models for other African low-resourced languages in general. The Fon and Igbo models source code have been made publicly available. Moreover, Okwugbe, a python library has been created to make easier the process of ASR model building and training.",
}

