Tokenizer by Anuvaad
Anuvaad Tokenizer
Anuvaad Tokenizer is a Python package for splitting paragraphs into sentences. It supports English and most major Indian languages; the usage examples below cover English, Hindi, Kannada, Telugu, and Tamil. The tokenizer is built on regular expressions.
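To illustrate what regular-expression sentence tokenization means in general, here is a minimal sketch. It is not the package's implementation: the function name `split_sentences` and the single rule it uses (a sentence ends at `.`, `?`, `!`, or the danda `।` when followed by whitespace) are assumptions made for this example only.

```python
import re

# Assumed, simplified rule for illustration: split on whitespace that
# directly follows '.', '?', '!' or the danda '।'. The real package is
# expected to use a richer, language-specific set of patterns.
_SENTENCE_END = re.compile(r"(?<=[.?!।])\s+")


def split_sentences(paragraph):
    """Split a paragraph into sentences using one regular expression."""
    parts = _SENTENCE_END.split(paragraph.strip())
    # Drop the empty string produced when the input is blank.
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in split_sentences("It is raining. Should we go? Yes!"):
        print(sentence)
```

A single rule like this splits incorrectly after abbreviations such as "Dr." or after decimal numbers, which is one reason a dedicated tokenizer for each language is useful.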
Prerequisites
- Python >= 3.6
Installation
pip install Anuvaad_Tokenizer==0.0.3
Author
Anuvaad (nlp-nmt@tarento.com)
Usage Example
For English
from Anuvaad_Tokenizer.AnuvaadEnTokenizer import AnuvaadEnTokenizer
para = " "  # replace with the English paragraph to split into sentences
tokenized_text = AnuvaadEnTokenizer().tokenize(para)
For Hindi
from Anuvaad_Tokenizer.AnuvaadHiTokenizer import AnuvaadHiTokenizer
para = " "  # replace with the Hindi paragraph to split into sentences
tokenized_text = AnuvaadHiTokenizer().tokenize(para)
For Kannada
from Anuvaad_Tokenizer.AnuvaadKnTokenizer import AnuvaadKnTokenizer
para = " "  # replace with the Kannada paragraph to split into sentences
tokenized_text = AnuvaadKnTokenizer().tokenize(para)
For Telugu
from Anuvaad_Tokenizer.AnuvaadTeTokenizer import AnuvaadTeTokenizer
para = " "  # replace with the Telugu paragraph to split into sentences
tokenized_text = AnuvaadTeTokenizer().tokenize(para)
For Tamil
from Anuvaad_Tokenizer.AnuvaadTaTokenizer import AnuvaadTaTokenizer
para = " "  # replace with the Tamil paragraph to split into sentences
tokenized_text = AnuvaadTaTokenizer().tokenize(para)
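The five examples above follow one naming pattern: module `Anuvaad_Tokenizer.Anuvaad<XX>Tokenizer` exposes a class of the same name with a `tokenize` method. When the language is only known at run time, a small dispatcher can build on that pattern. The sketch below is not part of the package: the helper names `tokenizer_location` and `tokenize`, and the lowercase language codes (`"en"`, `"hi"`, and so on), are hypothetical and chosen only for this example.

```python
import importlib

# Hypothetical language codes mapped to the infix used in the module and
# class names shown in the usage examples (En, Hi, Kn, Te, Ta).
_LANG_INFIX = {"en": "En", "hi": "Hi", "kn": "Kn", "te": "Te", "ta": "Ta"}


def tokenizer_location(lang):
    """Return (module path, class name) for a language code."""
    try:
        infix = _LANG_INFIX[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None
    class_name = f"Anuvaad{infix}Tokenizer"
    return f"Anuvaad_Tokenizer.{class_name}", class_name


def tokenize(lang, paragraph):
    """Tokenize a paragraph with the tokenizer for the given language."""
    module_path, class_name = tokenizer_location(lang)
    # The import happens here, so Anuvaad_Tokenizer must be installed
    # before this function is called.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)().tokenize(paragraph)
```

With the package installed, `tokenize("hi", para)` would then be equivalent to the Hindi example above.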
LICENSE
MIT License 2021 Developer - Anuvaad