Anuvaad

State of the art open-source translation models for Indic languages.

Installation

# The CPU build of PyTorch will be installed if torch is not already installed
pip install --upgrade anuvaad

Usage

As a Python module

from anuvaad import Anuvaad
anu = Anuvaad("english-telugu")

# Single sentence translation
# beam_size is optional and defaults to 4
anu.anuvaad("YS Jagan is the chief minister of Andhra Pradesh.")
# "వైఎస్ జగన్ ఆంధ్రప్రదేశ్ ముఖ్యమంత్రి."

# Batch translation
anu.anuvaad(["YS Jagan is the chief minister of Andhra Pradesh.",
            "Nara Lokesh suffered a humiliating defeat in Mangalagiri."])
# ['వైఎస్ జగన్ ఆంధ్రప్రదేశ్ ముఖ్యమంత్రి.', 'మంగళగిరిలో నారా లోకేష్కు అవమానకరమైన ఓటమి ఎదురైంది.']

As a service

# Starting the api service
docker run -it -e BATCH_SIZE=1 -p 8080:8080 notaitech/anuvaad:english-telugu

# Running a prediction
curl -d '{"data": ["YS Jagan is the chief minister of Andhra Pradesh."]}' -H "Content-Type: application/json" -X POST http://localhost:8080/sync
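
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the container started above is listening on localhost:8080. The helper names `build_sync_request` and `translate` are illustrative and are not part of the anuvaad package. The layout of the response is not described here, so the sketch returns the raw response text instead of guessing at its fields.

```python
import json
import urllib.request

# Assumption: the docker container from the command above is running locally.
SYNC_URL = "http://localhost:8080/sync"


def build_sync_request(sentences, url=SYNC_URL):
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"data": list(sentences)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(sentences, url=SYNC_URL, timeout=60):
    """Send the request and return the raw response body as text."""
    request = build_sync_request(sentences, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running, `translate(["YS Jagan is the chief minister of Andhra Pradesh."])` should return whatever the curl command prints.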

Available Models

english-telugu
english-tamil
english-malayalam
english-kannada
english-marathi
english-hindi

My thoughts on the evaluation/accuracy of the model(s):

  1. Unlike classification or sequence labelling tasks, open-domain translation and summarization systems are very hard to evaluate with a single accuracy number.
  2. This is because most accuracy metrics measure the overlap of character or word n-grams between the expected output and the predicted output.
  3. These scores definitely help when comparing multiple models on a particular dataset, but the numbers don't translate well to open-domain models.
  4. For example, Anuvaad translates the sentence "An advance is placed with the Medical Superintendents of such hospitals who then provide assistance on a case to case basis." (taken from http://data.statmt.org/pmindia/v1/parallel corpus) to "ऐसे अस्पतालों के चिकित्सा अधीक्षकों के साथ एडवांस रखा जाता है, जिसके बाद मामले के आधार पर सहायता प्रदान की जाती है।", whereas the expected translation of the sentence from the dataset is "अग्रिम धन राशि इन अस्पतालों को चिकित्सा निरीक्षकों को दी जाएगी, जो हर मामले को देखते हुए सहायता प्रदान करेंगे।".
  5. In the above example, although Anuvaad's translation is correct (in the sense that it conveys the same meaning as the original sentence), the BLEU score with n=3 will be 0, because the two Hindi sentences do not share a single word trigram.
  6. Similarly, a model trained on the pmindia dataset will score poorly on a different dataset that uses a different style of writing, even if its translations are semantically correct.
  7. Our aim in building Anuvaad is a general-purpose, open-domain translation module that can flexibly translate text from various domains.
  8. https://docs.google.com/spreadsheets/d/1_TTtBEvVgemQfGbRBSZYkECMMt5r7L9-dt0FGVUbmOY/edit?usp=sharing is a sheet comparing translations from Anuvaad, ilmulti (https://github.com/jerinphilip/ilmulti) and Google Translate (the =GOOGLETRANSLATE(text, "en", "language") function in Google Sheets) on 100 randomly selected English sentences from Tatoeba.
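
The n-gram overlap argument in points 2 to 5 can be checked by hand. The sketch below computes the clipped n-gram precision that BLEU-style metrics are built on, for the Hindi pair from point 4. It is a simplification and not a full BLEU implementation: it splits on whitespace only and leaves out the brevity penalty and the averaging across n-gram orders, so its numbers will not match a dedicated BLEU tool exactly. The function names `ngram_counts` and `clipped_precision` are illustrative.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference.
    Tokenization is plain whitespace splitting (a simplifying assumption).
    """
    hyp = ngram_counts(hypothesis.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


# Anuvaad's output and the pmindia reference from point 4.
anuvaad_output = (
    "ऐसे अस्पतालों के चिकित्सा अधीक्षकों के साथ एडवांस रखा जाता है, "
    "जिसके बाद मामले के आधार पर सहायता प्रदान की जाती है।"
)
dataset_reference = (
    "अग्रिम धन राशि इन अस्पतालों को चिकित्सा निरीक्षकों को दी जाएगी, "
    "जो हर मामले को देखते हुए सहायता प्रदान करेंगे।"
)

precisions = {
    n: clipped_precision(anuvaad_output, dataset_reference, n) for n in (1, 2, 3)
}
for n, precision in precisions.items():
    print(f"{n}-gram precision: {precision:.3f}")
```

A few individual words match, but the trigram precision is zero. Because BLEU combines the per-order precisions with a geometric mean, a single zero drives the whole score to 0, even though both sentences say the same thing.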
