Tokenizer by Anuvaad
Project description
Anuvaad Tokenizer
Anuvaad Tokenizer is a Python package that splits paragraphs into sentences. It supports English and most Indian languages. The tokenizer is built on regular expressions.
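To illustrate what regular-expression sentence splitting looks like in general, here is a minimal sketch. It is not the package's implementation: the function name `split_sentences`, the abbreviation list, and the boundary pattern are all assumptions made for this example, and the real tokenizers may apply different rules.

```python
import re

# Hypothetical abbreviation list for this sketch; the package's own rules
# are not described in this document.
ABBREVIATIONS = ("Mr.", "Mrs.", "Dr.")

# Split on whitespace that follows '.', '?', '!' or the danda (U+0964),
# the sentence-final mark used in Hindi and several other Indian scripts.
SENTENCE_END = re.compile(r"(?<=[.?!\u0964])\s+")


def split_sentences(para):
    """Return the sentences of para as a list, using a simple regex rule."""
    sentences = []
    buffer = ""
    for piece in SENTENCE_END.split(para.strip()):
        if not piece:
            continue
        candidate = f"{buffer} {piece}" if buffer else piece
        if candidate.endswith(ABBREVIATIONS):
            # The split happened after an abbreviation, not at a real
            # sentence boundary, so keep accumulating.
            buffer = candidate
        else:
            sentences.append(candidate)
            buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences
```

A plain split on punctuation breaks on abbreviations such as "Dr.", which is why the sketch rejoins those pieces before emitting a sentence.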
Prerequisites
- Python >= 3.6
Installation
pip install Anuvaad_Tokenizer==0.0.2
Author
Anuvaad (nlp-nmt@tarento.com)
Usage Example
For English
from Anuvaad_Tokenizer.AnuvaadEnTokenizer import AnuvaadEnTokenizer
para = " "  # replace with the English paragraph to tokenize
tokenized_text = AnuvaadEnTokenizer().tokenize(para)
For Hindi
from Anuvaad_Tokenizer.AnuvaadHiTokenizer import AnuvaadHiTokenizer
para = " "  # replace with the Hindi paragraph to tokenize
tokenized_text = AnuvaadHiTokenizer().tokenize(para)
For Kannada
from Anuvaad_Tokenizer.AnuvaadKnTokenizer import AnuvaadKnTokenizer
para = " "  # replace with the Kannada paragraph to tokenize
tokenized_text = AnuvaadKnTokenizer().tokenize(para)
For Telugu
from Anuvaad_Tokenizer.AnuvaadTeTokenizer import AnuvaadTeTokenizer
para = " "  # replace with the Telugu paragraph to tokenize
tokenized_text = AnuvaadTeTokenizer().tokenize(para)
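The four examples above follow the same pattern, so the tokenizer can be chosen at run time from a language code. The sketch below assumes the module and class names shown in the examples. The two-letter codes (`en`, `hi`, `kn`, `te`) and the helper names `tokenizer_location` and `tokenize` are hypothetical, introduced only for this illustration.

```python
import importlib

# Language code -> class name. The class names come from the usage examples;
# the two-letter keys are an assumption made for this sketch.
TOKENIZER_CLASSES = {
    "en": "AnuvaadEnTokenizer",
    "hi": "AnuvaadHiTokenizer",
    "kn": "AnuvaadKnTokenizer",
    "te": "AnuvaadTeTokenizer",
}


def tokenizer_location(lang):
    """Return (module_path, class_name) for a language code."""
    try:
        class_name = TOKENIZER_CLASSES[lang]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None
    # In the examples, each class lives in a module of the same name.
    return f"Anuvaad_Tokenizer.{class_name}", class_name


def tokenize(para, lang="en"):
    """Import the matching tokenizer lazily and tokenize para.

    Requires the Anuvaad_Tokenizer package to be installed.
    """
    module_path, class_name = tokenizer_location(lang)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)().tokenize(para)
```

With the package installed, `tokenize(para, lang="hi")` would be equivalent to the Hindi example above.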
LICENSE
MIT License 2021 Developer - Anuvaad
Download files
Built Distributions
Anuvaad_Tokenizer-0.0.2-py3.9.egg (33.5 kB)
Hashes for Anuvaad_Tokenizer-0.0.2-py3.9.egg
Algorithm | Hash digest
---|---
SHA256 | 485810ecebd5c6c3709a6e86b941b73ccb5eb7d94b8b99b3a96c918846a87e6a
MD5 | 96613ea207835ffd979d1142d5f89820
BLAKE2b-256 | 9df021b5d0ac4274907c2c1e2bfdc07947d2209d35eaed51b9508f108cc13e82
Hashes for Anuvaad_Tokenizer-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3dc7b8322d47416243c51afe440a489f7d40c6bdf290f81b21b7cf25b72796af
MD5 | 2190f03c887e8d7285070a66f8a5b13e
BLAKE2b-256 | feab44fd4a9de3811d5b3b5a3aa1e9d341b8d6acd59d261c6daff01fe8e8504b
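A downloaded file can be checked against the SHA256 digests listed above using Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the file path passed in is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals expected_hex."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the wheel, for example, `matches_digest(path_to_wheel, "3dc7b8322d47416243c51afe440a489f7d40c6bdf290f81b21b7cf25b72796af")` should return `True` for an unmodified download, where `path_to_wheel` is the local path of the file.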