State of the art translation for Indic languages.

Anuvaad

State of the art open-source translation models for Indic languages.

Installation

# The CPU build of PyTorch is installed if torch is not already installed
pip install --upgrade anuvaad

Usage

As a Python module

from anuvaad import Anuvaad
anu = Anuvaad("english-telugu")

# Single sentence translation
# beam_size is optional and defaults to 4
anu.anuvaad("YS Jagan is the chief minister of Andhra Pradesh.")
# "వైఎస్ జగన్ ఆంధ్రప్రదేశ్ ముఖ్యమంత్రి."

# Batch translation
anu.anuvaad(["YS Jagan is the chief minister of Andhra Pradesh.",
            "Nara Lokesh suffered a humiliating defeat in Mangalagiri."])
# ['వైఎస్ జగన్ ఆంధ్రప్రదేశ్ ముఖ్యమంత్రి.', 'మంగళగిరిలో నారా లోకేష్కు అవమానకరమైన ఓటమి ఎదురైంది.']
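For inputs much longer than a couple of sentences, it may help to pass the text to the model in fixed-size chunks. The sketch below is not part of the Anuvaad API: `translate_in_chunks`, `chunk_size` and `fake_translate` are hypothetical names. It assumes only what the examples above show, namely that the translate callable accepts a list of strings and returns a list of translations in the same order. A stand-in function is used so the snippet runs without downloading a model.

```python
from typing import Callable, List


def translate_in_chunks(
    translate_fn: Callable[[List[str]], List[str]],
    sentences: List[str],
    chunk_size: int = 8,
) -> List[str]:
    """Call translate_fn on slices of at most chunk_size sentences and join the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    translated: List[str] = []
    for start in range(0, len(sentences), chunk_size):
        chunk = sentences[start:start + chunk_size]
        translated.extend(translate_fn(chunk))
    return translated


def fake_translate(batch: List[str]) -> List[str]:
    # Stand-in for a real model call such as anu.anuvaad(batch) (assumption).
    return [text.upper() for text in batch]


result = translate_in_chunks(fake_translate, ["a", "b", "c"], chunk_size=2)
print(result)
```

With the real library, `translate_in_chunks(anu.anuvaad, sentences)` would be the equivalent call, provided the batch behaviour matches the example above.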

As a service

# Start the API service
docker run -it -e BATCH_SIZE=1 -p 8080:8080 notaitech/anuvaad:english-telugu

# Run a prediction
curl -d '{"data": ["YS Jagan is the chief minister of Andhra Pradesh."]}' -H "Content-Type: application/json" -X POST http://localhost:8080/sync
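The same request can be made from Python with the standard library alone. This is a minimal sketch that mirrors the curl command: the URL, the `data` key and the `Content-Type` header come from that command, while `build_request`, `translate_via_service` and the `timeout` value are illustrative names and choices. The shape of the JSON response is not documented here, so the function returns the decoded body as-is.

```python
import json
import urllib.request
from typing import List

# Endpoint taken from the curl example; adjust host and port to your deployment.
SYNC_URL = "http://localhost:8080/sync"


def build_request(sentences: List[str], url: str = SYNC_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"data": sentences}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_via_service(sentences: List[str], url: str = SYNC_URL, timeout: float = 60.0):
    """Send the request and return the decoded JSON body (response shape is an unknown here)."""
    with urllib.request.urlopen(build_request(sentences, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so this runs without the container.
req = build_request(["YS Jagan is the chief minister of Andhra Pradesh."])
print(req.get_method(), req.full_url)
```

Once the container is running, `translate_via_service([...])` sends the request. Inspect the returned object before relying on any particular structure.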

Available Models	Anuvaad BLEU	Google BLEU
english-telugu	12.72	6.84
english-tamil	12.74	5.56
english-malayalam	17.79	19.57
english-kannada	7.89	3.28
english-marathi	23.03	12.89
english-hindi	29.18	18.13
english-bengali
english-punjabi
english-gujarati

Google BLEU is calculated from translations generated by the GOOGLETRANSLATE() function in Google Sheets.
The testing scripts and the Tatoeba data are available at https://github.com/notAI-tech/Anuvaad-testing-scripts
The sheet at https://docs.google.com/spreadsheets/d/1tYYZObELj-k6mJCnM6uf7xg3JChbSkjs8YvZsOgHacQ/edit?usp=sharing contains the predictions from Anuvaad and the GOOGLETRANSLATE function on the Tatoeba data, from which the above scores are calculated.

My thoughts on the evaluation/accuracy of the model(s):

Unlike classification or sequence-labelling tasks, the accuracy of open-domain translation or summarization systems is very hard to quantify with numbers.
This is because most accuracy metrics actually measure the overlap of character or word n-grams between the expected output and the predicted output.
These scores definitely help when comparing multiple models on a particular dataset, but the numbers don't translate well to open-domain models.
For example, Anuvaad translates the sentence "An advance is placed with the Medical Superintendents of such hospitals who then provide assistance on a case to case basis." (taken from the http://data.statmt.org/pmindia/v1/parallel corpus) to "ऐसे अस्पतालों के चिकित्सा अधीक्षकों के साथ एडवांस रखा जाता है, जिसके बाद मामले के आधार पर सहायता प्रदान की जाती है।", whereas the expected translation in the dataset is "अग्रिम धन राशि इन अस्पतालों को चिकित्सा निरीक्षकों को दी जाएगी, जो हर मामले को देखते हुए सहायता प्रदान करेंगे।".
In this example, although Anuvaad's translation is correct (in the sense that it conveys the same meaning as the original sentence), the BLEU score with n=3 will be 0, because the two translations share no word 3-gram.
Similarly, a model trained on the pmindia dataset will score poorly on a different dataset that uses a different style of writing, even if its translations are semantically correct.
Our aim in building Anuvaad is a general-purpose, open-domain translation module that can flexibly translate text from various domains.
The sheet at https://docs.google.com/spreadsheets/d/1_TTtBEvVgemQfGbRBSZYkECMMt5r7L9-dt0FGVUbmOY/edit?usp=sharing compares translations from Anuvaad, ilmulti (https://github.com/jerinphilip/ilmulti) and Google Translate (the =GOOGLETRANSLATE(text, "en", "language") function in Google Sheets) on 100 randomly selected English sentences from Tatoeba.
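The Hindi example above can be checked by counting shared word n-grams directly. The sketch below is not the evaluation script used for the table: it splits on whitespace only (an assumption; real BLEU implementations tokenize more carefully and also apply a brevity penalty), and `ngrams` and `clipped_matches` are illustrative helper names. Because BLEU combines the n-gram precisions with a geometric mean, a single precision of zero drives the unsmoothed score to 0.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous window of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(candidate: str, reference: str, n: int) -> int:
    """Candidate n-grams that also occur in the reference, clipped by reference counts."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())


# Sentence pair quoted in the text above.
anuvaad_output = "ऐसे अस्पतालों के चिकित्सा अधीक्षकों के साथ एडवांस रखा जाता है, जिसके बाद मामले के आधार पर सहायता प्रदान की जाती है।"
expected = "अग्रिम धन राशि इन अस्पतालों को चिकित्सा निरीक्षकों को दी जाएगी, जो हर मामले को देखते हुए सहायता प्रदान करेंगे।"

for n in (1, 2, 3):
    total = sum(ngrams(anuvaad_output.split(), n).values())
    matched = clipped_matches(anuvaad_output, expected, n)
    print(f"{n}-gram matches: {matched} of {total}")
```

The pair shares some individual words but no 3-gram, so the 3-gram precision is zero even though both sentences convey the same meaning.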

Release history

Version	Release date
1.0.6	Apr 11, 2021
1.0.5	Dec 12, 2020
1.0.4	Dec 1, 2020
1.0.3	Nov 30, 2020
1.0.2	Nov 28, 2020
1.0.1	Nov 21, 2020
1.0.0	Nov 21, 2020

Download files

Source Distribution

anuvaad-1.0.6.tar.gz (6.6 kB)

Uploaded Apr 11, 2021 Source

Built Distribution

anuvaad-1.0.6-py2.py3-none-any.whl (17.6 kB)

Uploaded Apr 11, 2021 Python 2, Python 3

File details

Details for the file anuvaad-1.0.6.tar.gz.

File metadata

Download URL: anuvaad-1.0.6.tar.gz
Upload date: Apr 11, 2021
Size: 6.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for anuvaad-1.0.6.tar.gz
Algorithm	Hash digest
SHA256	`6e647ba7aa29f0d501777c2951f1cb6746f3a182a935108f225f8f717b28676b`
MD5	`139fd68a165f369f4aca2f00de7de29f`
BLAKE2b-256	`3f1cc445f65fabec1f31e5d1b46cb7c562d5d2423eb6aadfaffaa5b78bc403d8`


File details

Details for the file anuvaad-1.0.6-py2.py3-none-any.whl.

File metadata

Download URL: anuvaad-1.0.6-py2.py3-none-any.whl
Upload date: Apr 11, 2021
Size: 17.6 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for anuvaad-1.0.6-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`56f4d6efd9ef890dba65fec0a547ac7195e957d74366f23daf140484b9c8438b`
MD5	`0da940a216385e8dd87dac2d189272dc`
BLAKE2b-256	`c5b9ac235b904de10b2e10d2ae6d2a7de62b57e765e05cc84b6671d39c4d3111`

