Natural Language Toolkit for Indian Languages (iNLTK)
Installation
pip install http://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl
pip install inltk
iNLTK runs on CPU, not GPU, which is the desired behaviour for most deep learning models in production.
The first command above installs the CPU-only build of PyTorch 1.0.0, which has no CUDA support. The wheel filename targets CPython 3.6 on 64-bit Linux, so other Python versions or platforms need the matching PyTorch wheel.
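To confirm which PyTorch build ended up in the environment, a small check like the following may help. It is a minimal sketch: `describe_torch_build` is a hypothetical helper name, not part of iNLTK, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_torch_build():
    """Return basic facts about the installed torch build, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "version": torch.__version__,
        # torch.version.cuda is None for builds compiled without CUDA.
        "built_with_cuda": torch.version.cuda is not None,
        # False on CPU-only builds and on machines without a usable GPU.
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

With the CPU wheel from the first command, both `built_with_cuda` and `cuda_available` would be expected to be false.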
Supported languages
Language | Code |
---|---|
Hindi | hi |
Punjabi | pa |
Sanskrit | sa |
Gujarati | gu |
Kannada | kn |
Malayalam | ml |
Nepali | ne |
Odia | or |
Marathi | mr |
Bengali | bn |
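The codes in the table can be kept in a small lookup so that a mistyped code fails early instead of reaching the library. This is only a sketch: `SUPPORTED_LANGUAGES` and `validate_language_code` are hypothetical names introduced here, not part of iNLTK.

```python
# Language codes copied from the table above (hypothetical helper, not an iNLTK API).
SUPPORTED_LANGUAGES = {
    "hi": "Hindi",
    "pa": "Punjabi",
    "sa": "Sanskrit",
    "gu": "Gujarati",
    "kn": "Kannada",
    "ml": "Malayalam",
    "ne": "Nepali",
    "or": "Odia",
    "mr": "Marathi",
    "bn": "Bengali",
}


def validate_language_code(code):
    """Return the normalized code, or raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {known}")
    return normalized
```

The returned value can then be passed wherever '<code-of-language>' appears in the usage examples.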
Usage
Set up the language
from inltk.inltk import setup
setup('<code-of-language>')  # for Hindi, this would be setup('hi')
Note: You need to run setup('<code-of-language>') only the FIRST TIME you use a language. It downloads all the models required to do inference for that language.
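Since setup is only needed once per language, a script that runs repeatedly may want to remember which languages are already done. The sketch below does that with a small JSON marker file. Both the helper name `ensure_setup` and the marker file are assumptions made for illustration; iNLTK itself does not provide or require them. The setup function is passed in as an argument so the helper does not import iNLTK itself.

```python
import json
from pathlib import Path


def ensure_setup(code, setup_fn, marker_path):
    """Call setup_fn(code) only if code is not yet recorded in the marker file.

    Returns True if setup_fn was called, False if it was skipped.
    The marker file is a hypothetical convention, not something iNLTK creates.
    """
    marker = Path(marker_path)
    done = set(json.loads(marker.read_text())) if marker.exists() else set()
    if code in done:
        return False
    setup_fn(code)  # e.g. the setup function imported from inltk.inltk
    done.add(code)
    marker.write_text(json.dumps(sorted(done)))
    return True
```

For example, with `setup` imported from `inltk.inltk`, `ensure_setup('hi', setup, 'inltk_setup_done.json')` would call `setup('hi')` on the first run and skip it afterwards.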
Tokenize
from inltk.inltk import tokenize
tokenize(text, '<code-of-language>')  # where text is a string in <code-of-language>
Predict Next 'n' words
from inltk.inltk import predict_next_words
predict_next_words(text, n, '<code-of-language>')
# text --> string in <code-of-language>
# n --> number of words you want to predict (integer)
Note: You can also pass a fourth parameter, randomness, to predict_next_words. Its default value is 0.8.
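A thin wrapper can check the arguments before handing them to predict_next_words. This is a sketch under stated assumptions: `predict_continuation` is a hypothetical name, the argument checks are not something iNLTK is known to enforce, and randomness is passed positionally as the fourth parameter, as described in the note above. iNLTK is imported inside the function so the module loads even where it is not installed.

```python
def predict_continuation(text, n, code, randomness=0.8):
    """Validate inputs, then delegate to iNLTK's predict_next_words.

    Hypothetical convenience wrapper; the checks below are assumptions
    added for illustration, not documented iNLTK behaviour.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    # Imported lazily: requires iNLTK and a prior setup(code) for this language.
    from inltk.inltk import predict_next_words
    return predict_next_words(text, n, code, randomness)
```

Calling `predict_continuation(text, 5, 'hi', 0.5)` would then request five words for a Hindi string with a randomness value below the default.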
Repositories containing models used in iNLTK
Language | Repository | Perplexity of Language model | Wikipedia Articles Dataset | Classification accuracy | Classification Kappa score |
---|---|---|---|---|---|
Hindi | NLP for Hindi | ~36 | 55,000 articles | ~79 (News Classification) | ~30 (Movie Review Classification) |
Punjabi | NLP for Punjabi | ~13 | 44,000 articles | ~89 (News Classification) | ~60 (News Classification) |
Sanskrit | NLP for Sanskrit | ~6 | 22,273 articles | ~70 (Shloka Classification) | ~56 (Shloka Classification) |
Gujarati | NLP for Gujarati | ~34 | 31,913 articles | ~91 (News Classification) | ~85 (News Classification) |
Kannada | NLP for Kannada | ~70 | 32,997 articles | ~94 (News Classification) | ~90 (News Classification) |
Malayalam | NLP for Malyalam | ~26 | 12,388 articles | ~94 (News Classification) | ~91 (News Classification) |
Nepali | NLP for Nepali | ~32 | 38,757 articles | ~97 (News Classification) | ~96 (News Classification) |
Odia | NLP for Odia | ~27 | 17,781 articles | ~95 (News Classification) | ~92 (News Classification) |
Marathi | NLP for Marathi | ~18 | 85,537 articles | ~91 (News Classification) | ~84 (News Classification) |
Bengali | NLP for Bengali | ~41 | 72,374 articles | ~94 (News Classification) | ~92 (News Classification) |
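For readers unfamiliar with the two less common metrics in the table, the sketch below shows their textbook definitions: perplexity as the exponential of the mean negative log-probability per token, and Cohen's kappa as agreement corrected for chance. The function names are hypothetical, and the source does not state how the linked repositories compute these figures. The kappa values in the table appear to be scaled by 100, which is also an assumption.

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-probability the model gave each true token."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)


def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    return (observed - expected) / (1 - expected)
```

A model that assigns probability 0.25 to every true token has perplexity 4, so lower perplexity means the model is less "surprised" by the text. Kappa is 0 for chance-level agreement and 1 for perfect agreement.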