Tokenizer by Anuvaad

Anuvaad Tokenizer

Anuvaad Tokenizer is a Python package that splits paragraphs into sentences. It supports English and most Indian languages. The tokenizer is built on regular expressions.
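
To give a feel for what regex-based sentence splitting involves, here is a minimal sketch. It is not the package's implementation: the pattern and the function name `split_sentences` are assumptions made for illustration, and the sketch only splits after `.`, `?`, `!`, or the Devanagari danda (`।`) when whitespace follows.

```python
import re

# Hypothetical pattern, not taken from the Anuvaad source: a sentence ends at
# '.', '?', '!' or the Devanagari danda (U+0964) when whitespace follows it.
_SENTENCE_END = re.compile(r"(?<=[.?!\u0964])\s+")


def split_sentences(paragraph):
    """Return the non-empty sentences found in `paragraph`."""
    parts = _SENTENCE_END.split(paragraph.strip())
    return [part for part in parts if part]


print(split_sentences("Hello world. How are you? Fine!"))
print(split_sentences("यह पहला वाक्य है। यह दूसरा वाक्य है।"))
```

A pattern this simple also splits after abbreviations such as "Dr." and after decimal-like tokens followed by a space, which is the kind of case a dedicated tokenizer has to handle with additional rules.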

Prerequisites

  • Python >= 3.6

Installation

pip install Anuvaad_Tokenizer==0.0.3

Author

Anuvaad (nlp-nmt@tarento.com)

Usage Example

For English

from Anuvaad_Tokenizer.AnuvaadEnTokenizer import AnuvaadEnTokenizer

# Replace the placeholder with the English paragraph to tokenize.
para = " "
tokenized_text = AnuvaadEnTokenizer().tokenize(para)

For Hindi

from Anuvaad_Tokenizer.AnuvaadHiTokenizer import AnuvaadHiTokenizer

# Replace the placeholder with the Hindi paragraph to tokenize.
para = " "
tokenized_text = AnuvaadHiTokenizer().tokenize(para)

For Kannada

from Anuvaad_Tokenizer.AnuvaadKnTokenizer import AnuvaadKnTokenizer

# Replace the placeholder with the Kannada paragraph to tokenize.
para = " "
tokenized_text = AnuvaadKnTokenizer().tokenize(para)

For Telugu

from Anuvaad_Tokenizer.AnuvaadTeTokenizer import AnuvaadTeTokenizer

# Replace the placeholder with the Telugu paragraph to tokenize.
para = " "
tokenized_text = AnuvaadTeTokenizer().tokenize(para)

For Tamil

from Anuvaad_Tokenizer.AnuvaadTaTokenizer import AnuvaadTaTokenizer

# Replace the placeholder with the Tamil paragraph to tokenize.
para = " "
tokenized_text = AnuvaadTaTokenizer().tokenize(para)
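
The five examples above follow one naming pattern (`Anuvaad_Tokenizer.Anuvaad<Xx>Tokenizer`), so the tokenizer can also be picked at runtime from a language code. The sketch below assumes that pattern holds for exactly the five languages shown; the helper names `tokenizer_location` and `load_tokenizer` are hypothetical and not part of the package.

```python
import importlib

# Language codes taken from the usage examples; other languages are not
# assumed here.
_LANG_SUFFIXES = {"en": "En", "hi": "Hi", "kn": "Kn", "te": "Te", "ta": "Ta"}


def tokenizer_location(lang):
    """Return (module_path, class_name) for a two-letter language code."""
    try:
        suffix = _LANG_SUFFIXES[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None
    class_name = f"Anuvaad{suffix}Tokenizer"
    return f"Anuvaad_Tokenizer.{class_name}", class_name


def load_tokenizer(lang):
    """Import and instantiate the tokenizer (hypothetical helper).

    Requires Anuvaad_Tokenizer to be installed; the import happens lazily.
    """
    module_path, class_name = tokenizer_location(lang)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)()
```

With the package installed, `load_tokenizer("hi").tokenize(para)` would then be equivalent to the Hindi example above.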

LICENSE

MIT License 2021 Developer - Anuvaad
