Transliterations to/from Indian languages
Transliterations to and from Indian languages are still generally of low quality. One problem is access to data. Another is that there is no standard transliteration. For Hindi–English, we build a novel dataset of names using ESPNcricinfo, which publishes Hindi versions of its English scorecards. We also create a dataset from election affidavits, and we exploit the Google Dakshina dataset.
To overcome the fact that there isn’t one standard way of transliteration, we provide k-best transliterations.
Install
We strongly recommend installing indicate inside a Python virtual environment (see the venv documentation).
pip install indicate
General API
transliterate.hindi2english takes Hindi text and transliterates it into English.
Examples
from indicate import transliterate
english_translated = transliterate.hindi2english("हिंदी")
print(english_translated)
output - hindi
Functions
We expose one function, which takes Hindi text and transliterates it to English.
transliterate.hindi2english(input)
What it does:
Converts the given Hindi text into the English alphabet.
Output
Returns text in English
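Since the function above is documented as taking a single piece of text, transliterating a list of names means calling it once per item. The following is a minimal sketch of such a wrapper, not part of the package: the helper name transliterate_many and its parameters are hypothetical, and the transliteration function is passed in as an argument so the helper makes no assumptions about indicate beyond "one string in, one string out".

```python
from typing import Callable, Dict, Iterable, List


def transliterate_many(
    texts: Iterable[str],
    transliterate_fn: Callable[[str], str],
) -> List[str]:
    """Apply a single-string transliteration function to many strings.

    Hypothetical helper, not part of indicate. Repeated inputs are
    transliterated once and the result is reused, which helps when a
    list of names contains duplicates.
    """
    cache: Dict[str, str] = {}
    results: List[str] = []
    for text in texts:
        if text not in cache:
            # First time this input is seen: call the underlying function.
            cache[text] = transliterate_fn(text)
        results.append(cache[text])
    return results
```

With the package installed, this could be called as transliterate_many(names, transliterate.hindi2english), where names is a list of Hindi strings.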
Data
The datasets used to train the model:
ESPNcricinfo, for the Hindi version of the English scorecard.
Evaluation
The model was evaluated on the test set of the Google Dakshina dataset, where it predicted 73.64% exact matches. Indic-trans predicted 63.12% exact matches on the Google Dakshina dataset. Below are the edit distance metrics on the test set (0.0 means an exact match; the farther from 0.0, the greater the difference between the predicted text and the actual text).
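To make the two metrics concrete, here is a minimal sketch of how an exact-match percentage and an edit distance could be computed for a list of predictions. It is an illustration under assumptions, not the project's evaluation code: the source does not say which edit distance was used or whether it was normalized, so plain Levenshtein distance is assumed here, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed metric): the minimum number of
    single-character insertions, deletions, and substitutions needed
    to turn `a` into `b`. A result of 0 means an exact match."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def exact_match_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference string."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)
```

For example, comparing the predictions ["hindi", "dilli"] against the references ["hindi", "delhi"] gives one exact match out of two, and the second pair differs by two substitutions.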
Contributor Code of Conduct
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.
License
The package is released under the MIT License.