Natural Language Toolkit for Indian Languages (iNLTK)

Natural Language Toolkit for Indic Languages (iNLTK)

iNLTK aims to provide out of the box support for various NLP tasks that an application developer might need for Indic languages.

Documentation

Check out the detailed docs at https://inltk.readthedocs.io

Supported languages

| Language | Code |
| --- | --- |
| Hindi | hi |
| Punjabi | pa |
| Sanskrit | sa |
| Gujarati | gu |
| Kannada | kn |
| Malayalam | ml |
| Nepali | ne |
| Odia | or |
| Marathi | mr |
| Bengali | bn |
| Tamil | ta |
| Urdu | ur |
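
As a small illustration of how the codes in the table above might be used in application code, the sketch below stores them in a dictionary and validates user input against it. The names `SUPPORTED_LANGUAGES`, `language_code`, and `is_supported` are hypothetical helpers written for this example; they are not part of iNLTK.

```python
# Language names and codes copied from the supported-languages table.
# This mapping and the helpers below are illustrative, not part of iNLTK.
SUPPORTED_LANGUAGES = {
    "Hindi": "hi",
    "Punjabi": "pa",
    "Sanskrit": "sa",
    "Gujarati": "gu",
    "Kannada": "kn",
    "Malayalam": "ml",
    "Nepali": "ne",
    "Odia": "or",
    "Marathi": "mr",
    "Bengali": "bn",
    "Tamil": "ta",
    "Urdu": "ur",
}


def language_code(name: str) -> str:
    """Return the code for a language name, ignoring case and outer spaces."""
    lookup = {key.lower(): value for key, value in SUPPORTED_LANGUAGES.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"{name!r} is not in the supported-languages table") from None


def is_supported(code: str) -> bool:
    """Return True if the code appears in the supported-languages table."""
    return code in SUPPORTED_LANGUAGES.values()
```

With iNLTK installed, a validated code is the value you would pass wherever the library asks for a language code; the docs linked above describe the exact calls.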

Repositories containing models used in iNLTK

| Language | Repository | Perplexity of language model | Wikipedia articles dataset | Classification accuracy | Classification Kappa score | Embeddings visualization on Embedding Projector |
| --- | --- | --- | --- | --- | --- | --- |
| Hindi | NLP for Hindi | ~36 | 55,000 articles | ~79 (News Classification) | ~30 (Movie Review Classification) | Hindi Embeddings projection |
| Punjabi | NLP for Punjabi | ~13 | 44,000 articles | ~89 (News Classification) | ~60 (News Classification) | Punjabi Embeddings projection |
| Sanskrit | NLP for Sanskrit | ~6 | 22,273 articles | ~70 (Shloka Classification) | ~56 (Shloka Classification) | Sanskrit Embeddings projection |
| Gujarati | NLP for Gujarati | ~34 | 31,913 articles | ~91 (News Classification) | ~85 (News Classification) | Gujarati Embeddings projection |
| Kannada | NLP for Kannada | ~70 | 32,997 articles | ~94 (News Classification) | ~90 (News Classification) | Kannada Embeddings projection |
| Malayalam | NLP for Malayalam | ~26 | 12,388 articles | ~94 (News Classification) | ~91 (News Classification) | Malayalam Embeddings projection |
| Nepali | NLP for Nepali | ~32 | 38,757 articles | ~97 (News Classification) | ~96 (News Classification) | Nepali Embeddings projection |
| Odia | NLP for Odia | ~27 | 17,781 articles | ~95 (News Classification) | ~92 (News Classification) | Odia Embeddings projection |
| Marathi | NLP for Marathi | ~18 | 85,537 articles | ~91 (News Classification) | ~84 (News Classification) | Marathi Embeddings projection |
| Bengali | NLP for Bengali | ~41 | 72,374 articles | ~94 (News Classification) | ~92 (News Classification) | Bengali Embeddings projection |
| Tamil | NLP for Tamil | ~20 | >127,000 articles | ~97 (News Classification) | ~95 (News Classification) | Tamil Embeddings projection |
| Urdu | NLP for Urdu | ~13 | >150,000 articles | ~94 (News Classification) | ~90 (News Classification) | Urdu Embeddings projection |
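
For readers unfamiliar with the metrics in the table above, here is a minimal sketch of the standard textbook definitions of accuracy, Cohen's kappa, and perplexity. It is not taken from the iNLTK repositories, which may compute these values differently, and it assumes the table reports accuracy and kappa on a 0 to 100 scale while the functions below return fractions.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the agreement
    expected if labels were assigned independently with the same
    class frequencies. Undefined (division by zero) when p_e == 1.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def perplexity(token_log_probs):
    """Exponential of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Because kappa subtracts chance agreement, it is always at or below accuracy for a classifier that beats chance, and the gap widens when classes are imbalanced.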

Contributing

Add a new language support

If you would like to add support for a language of your choice to iNLTK, please start by checking for or raising an issue here

Please check out the steps I mentioned here for Telugu to begin with. They should be nearly the same for other languages.

Improving models/using models for your own research

If you would like to take iNLTK's models and refine them with your own dataset, or build your own custom models on top of them, please check out the repository for your language in the table above. Each repository contains links to the datasets, pretrained models, classifiers, and all of the code used to build them.

Add new functionality

If you would like a particular feature in iNLTK, start by checking for or raising an issue here

What's next

..and being worked upon

Shout out if you want to help :)

  • Add Telugu and Maithili support
  • Add NER support
  • Add Textual Entailment support
  • Add English to iNLTK

..and NOT being worked upon

Shout out if you want to lead :)

iNLTK's Appreciation

Download files

Source distribution: inltk-0.7.4.tar.gz (8.2 kB)

Built distribution: inltk-0.7.4-py3-none-any.whl (9.7 kB, Python 3)
