
Pronunciation and transliteration module trained on the CMU Pronouncing Dictionary and on the IIT Bombay and IIT Kharagpur text corpora


Transly

Transly is a sequence-to-sequence bidirectional LSTM encoder-decoder model with Bahdanau attention, trained on the CMU Pronouncing Dictionary, the IIT Bombay English-Hindi Parallel Corpus and the IIT Kharagpur transliteration corpus.
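To make the attention part of that description concrete, here is a minimal pure-Python sketch of one step of additive (Bahdanau-style) attention. It is not taken from Transly's source: the function name, the weight matrices w_q and w_k, the vector v and all numbers are hypothetical, chosen only to show how a decoder state is scored against each encoder state.

```python
import math


def additive_attention(query, keys, w_q, w_k, v):
    """Return (weights, context) for one decoder step of additive attention."""

    def matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    projected_query = matvec(w_q, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_k, key)
        # score = v . tanh(W_q q + W_k k)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over encoder positions (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder states.
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Toy values, for illustration only.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
weights, context = additive_attention(decoder_state, encoder_states, w_q, w_k, v)
```

The weights sum to 1 across the encoder positions, and the context vector is what a decoder would combine with its own state before predicting the next output symbol.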

The pronunciation module in Transly can predict the pronunciation of any given word (with an American accent, of course!).

Take a word from any language, transliterate it into English letters (all capitals), and you are good to go. New or old, seen or unseen, sensible or nonsensical, Transly can catch 'em all!
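As a sketch of the "all capitals" preparation step, the helper below upper-cases a romanised word and keeps only the letters A to Z. The name to_query is hypothetical and not part of Transly, and dropping accents and non-letter characters is an assumption about what the pre-trained model expects, not documented behaviour.

```python
import unicodedata


def to_query(word):
    """Upper-case a romanised word and keep only the letters A-Z."""
    # Decompose accented letters so that 'é' becomes 'e' plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", word)
    # Assumption: anything outside A-Z (spaces, digits, marks) is discarded.
    letters = [ch for ch in decomposed.upper() if "A" <= ch <= "Z"]
    return "".join(letters)


query = to_query(" makaan ")
```

The resulting string can then be passed to the pronunciation model in place of a hand-typed query.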

The other module in Transly is the transliteration module. It currently supports Hindi-to-English and English-to-Hindi transliteration.

Pre-trained models can be found inside the respective trained_models folders. New models can also be trained on custom data.

Installation

Use the package manager pip to install transly:

pip install transly

Usage

Pronunciation

Using the pre-trained pronunciation model

import transly.pronunciation as tp

# Let's try a Hindi word, written in capital English letters.
# The predicted pronunciation uses an American accent.
QUERY = 'MAKAAN'
model = tp.load_model(model_path='cmu')
model.infer(QUERY, separator=" ")
# Use the infer_batch function to infer batches.
# Use the beamsearch function to perform a beam search.

>> 'M AH0 K AA1 N'
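The output uses the CMU Pronouncing Dictionary notation, in which vowel phonemes carry a trailing digit for lexical stress (0 for none, 1 for primary, 2 for secondary). A small sketch for splitting such a string into phonemes and stress values follows; split_phonemes is a hypothetical helper, not a Transly function.

```python
def split_phonemes(pronunciation, separator=" "):
    """Split a pronunciation string into (phoneme, stress) pairs."""
    pairs = []
    for token in pronunciation.split(separator):
        if not token:
            continue  # ignore empty pieces caused by repeated separators
        if token[-1].isdigit():
            # Vowel: strip the stress digit and keep it as an integer.
            pairs.append((token[:-1], int(token[-1])))
        else:
            # Consonant: no stress marker.
            pairs.append((token, None))
    return pairs


pairs = split_phonemes("M AH0 K AA1 N")
stressed = [phoneme for phoneme, stress in pairs if stress == 1]
```

Here stressed holds the vowel that carries primary stress, which for 'M AH0 K AA1 N' is AA.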

Training a new model on custom data

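from transly.seq2seq.config import SConfig
from transly.seq2seq.version0 import Seq2Seq

# Placeholder values; replace them with your own paths and file name.
training_data_path = 'path/to/training_data.csv'
model_path = 'path/to/model_directory'
model_file_name = 'my_model'

config = SConfig(training_data_path=training_data_path, input_mode='character_level', output_mode='word_level')
s2s = Seq2Seq(config)
s2s.fit()
s2s.save_model(path_to_model=model_path, model_file_name=model_file_name)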

The training data file should be a CSV with two columns: the input and the output.

Input       Output
AA          AA1
AABERG      AA1 B ER0 G
AACHEN      AA1 K AH0 N
AACHENER    AA1 K AH0 N ER0
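One way to produce such a file is with Python's csv module. The sketch below builds the CSV text for the four rows above. Whether Transly expects a header row is not stated here, so the sketch assumes none; to_csv_text is a hypothetical helper.

```python
import csv
import io

rows = [
    ("AA", "AA1"),
    ("AABERG", "AA1 B ER0 G"),
    ("AACHEN", "AA1 K AH0 N"),
    ("AACHENER", "AA1 K AH0 N ER0"),
]


def to_csv_text(pairs):
    """Serialise (input, output) pairs as two-column CSV text."""
    buffer = io.StringIO()
    # Assumption: no header row; add one here if your setup needs it.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
```

Writing csv_text to a file (with UTF-8 encoding) gives a path that can be supplied as training_data_path.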

Transliteration

Hindi to English

Using the pre-trained model

import transly.transliteration as tl

QUERY = 'निखिल'
model = tl.load_model(model_path='hi2en')
model.infer(QUERY)
# Use the infer_batch function to infer batches.
# Use the beamsearch function to perform a beam search.

>> 'NIKHIL'
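For readers unfamiliar with beam search, the following is a generic, self-contained sketch of the technique on a toy next-token table. It does not show Transly's beamsearch function, whose signature is not documented here; beam_search, toy_model and the probabilities are all hypothetical.

```python
import math


def beam_search(step_fn, start, end, beam_width=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) must return a dict mapping each candidate next token
    to its probability given the prefix.
    """
    beams = [(0.0, [start])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token]))
        if not candidates:
            break
        # Keep only the beam_width best extensions.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    pool = finished or beams
    return max(pool, key=lambda item: item[0])[1]


def toy_model(prefix):
    """Toy next-token probabilities that depend only on the last token."""
    table = {
        "<s>": {"A": 0.6, "B": 0.4},
        "A": {"X": 0.5, "Y": 0.5},
        "B": {"Z": 0.9, "</s>": 0.1},
        "X": {"</s>": 1.0},
        "Y": {"</s>": 1.0},
        "Z": {"</s>": 1.0},
    }
    return table[prefix[-1]]


greedy = beam_search(toy_model, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_model, "<s>", "</s>", beam_width=2)
```

With a beam width of 1 the search commits to A, the locally best first token, and ends with a sequence of probability 0.30. With a width of 2 it also keeps B alive and finds the sequence through B and Z, whose probability is 0.36.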

English to Hindi

Using the pre-trained model

import transly.transliteration as tl

QUERY = 'NIKHIL'
model = tl.load_model(model_path='en2hi')
model.infer(QUERY)
# Use the infer_batch function to infer batches.
# Use the beamsearch function to perform a beam search.

>> 'निखिल'
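Since the two directions use different model_path values, a caller may want to choose one from the script of the input. A minimal sketch, assuming any character in the Unicode Devanagari block (U+0900 to U+097F) marks the input as Hindi; pick_model_path is a hypothetical helper, not part of Transly.

```python
def pick_model_path(text):
    """Return 'hi2en' for Devanagari input and 'en2hi' otherwise."""
    # U+0900 to U+097F is the Unicode Devanagari block.
    has_devanagari = any("\u0900" <= ch <= "\u097f" for ch in text)
    return "hi2en" if has_devanagari else "en2hi"


hindi_direction = pick_model_path("निखिल")
english_direction = pick_model_path("NIKHIL")
```

The returned string matches the model_path values used in the two examples above.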

Training a new model on custom data

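from transly.seq2seq.config import SConfig
from transly.seq2seq.version0 import Seq2Seq

# Placeholder values; replace them with your own paths and file name.
training_data_path = 'path/to/training_data.csv'
model_path = 'path/to/model_directory'
model_file_name = 'my_model'

config = SConfig(training_data_path=training_data_path)
s2s = Seq2Seq(config)
s2s.fit()
s2s.save_model(path_to_model=model_path, model_file_name=model_file_name)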

License

The Python code in this module is distributed under the Apache License 2.0.
