Tokenizer by Anuvaad
Project description
Anuvaad Tokenizer
Anuvaad Tokenizer is a Python package that splits paragraphs into sentences. It supports English and most Indian languages, and is built using regular expressions.
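To give a feel for what regular-expression sentence splitting looks like, here is a minimal sketch. It is not the package's actual implementation: the pattern and the helper name `split_sentences` are assumptions made for illustration. It splits after `.`, `!`, `?`, or the Devanagari danda `।` when whitespace follows.

```python
import re
from typing import List

# Hypothetical pattern for illustration only; NOT the one Anuvaad Tokenizer uses.
# Matches the whitespace that follows a sentence-ending mark ('.', '!', '?', '।').
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?।])\s+")


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences at simple punctuation boundaries."""
    parts = _SENTENCE_BOUNDARY.split(paragraph.strip())
    # Drop the empty string produced when the input is empty or whitespace only.
    return [part for part in parts if part]


print(split_sentences("Hello world. How are you? Fine!"))
# ['Hello world.', 'How are you?', 'Fine!']
```

A pattern this simple also splits after abbreviations such as "Dr.", which is the kind of case a dedicated tokenizer has to handle with additional rules.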
Prerequisites
- Python >= 3.6
Installation
pip install Anuvaad_Tokenizer==0.0.1
Author
Anuvaad (nlp-nmt@tarento.com)
Usage Example
For English
from Anuvaad_Tokenizer.AnuvaadEnTokenizer import AnuvaadEnTokenizer
para = " "  # replace with the English paragraph to tokenize
tokenized_text = AnuvaadEnTokenizer().tokenize(para)
For Hindi
from Anuvaad_Tokenizer.AnuvaadHiTokenizer import AnuvaadHiTokenizer
para = " "  # replace with the Hindi paragraph to tokenize
tokenized_text = AnuvaadHiTokenizer().tokenize(para)
For Kannada
from Anuvaad_Tokenizer.AnuvaadKnTokenizer import AnuvaadKnTokenizer
para = " "  # replace with the Kannada paragraph to tokenize
tokenized_text = AnuvaadKnTokenizer().tokenize(para)
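The three examples follow the same pattern: a class named after the language, in a module of the same name, with a `tokenize` method. A small dispatcher can pick the class from a language code. This is a sketch, not part of the package: the helper `tokenize_paragraph` and the codes `en`, `hi`, `kn` are assumptions, and only the three tokenizers shown above are listed.

```python
import importlib

# Hypothetical mapping from a language code to the class names used above.
# Only the three tokenizers shown in this README are listed.
_TOKENIZER_CLASSES = {
    "en": "AnuvaadEnTokenizer",
    "hi": "AnuvaadHiTokenizer",
    "kn": "AnuvaadKnTokenizer",
}


def tokenize_paragraph(para, lang):
    """Tokenize `para` with the tokenizer class registered for `lang`."""
    try:
        class_name = _TOKENIZER_CLASSES[lang]
    except KeyError:
        raise ValueError(f"no tokenizer registered for language code {lang!r}")
    # Each class lives in a module of the same name, as in the imports above.
    module = importlib.import_module(f"Anuvaad_Tokenizer.{class_name}")
    return getattr(module, class_name)().tokenize(para)
```

With the package installed, `tokenize_paragraph(para, "hi")` would be equivalent to the Hindi example above.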
LICENSE
MIT License 2021 Developer - Anuvaad
Built Distributions
Anuvaad_Tokenizer-0.0.1-py3.9.egg (25.4 kB)
Hashes for Anuvaad_Tokenizer-0.0.1-py3.9.egg
Algorithm | Hash digest
---|---
SHA256 | 89e900fe8cd9544e21b10c698f29ef44c38958d3b75f2ea0b83ac6f5dd757334
MD5 | 62db588c928d594dabcc18ad31bd4a4e
BLAKE2b-256 | 392015027405adb3a1e1586fdda140d9fdb36dcbe09f3cb8e4363ac12ce5097a
Hashes for Anuvaad_Tokenizer-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a0f8a36174dc395ec348acad980abf33669e5f01aefa34d358695c07f041b62c
MD5 | bbcef8a1f9de83a7a63bf609734d8dc6
BLAKE2b-256 | 9436a3850131fb18a9f37ea8e15575d19dd0d3d6099aeeb2fabf8f059d4c4730
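A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do that. The helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the wheel was saved to the current directory, `matches_published_digest("Anuvaad_Tokenizer-0.0.1-py3-none-any.whl", "a0f8a36174dc395ec348acad980abf33669e5f01aefa34d358695c07f041b62c")` should return `True` for an intact download.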