Tokenizer by Anuvaad

Anuvaad Tokenizer

Anuvaad Tokenizer is a Python package that tokenizes paragraphs into sentences. It supports English and most Indian languages. The tokenizer is built using regular expressions.
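To illustrate what regex-based sentence tokenization means in general, here is a minimal sketch. It is not the package's actual rule set: the pattern, the `split_sentences` name, and the choice of end-of-sentence marks (`.`, `?`, `!`, and the Devanagari danda `।`) are assumptions made for this example only.

```python
import re

# Hypothetical boundary rule (an assumption, not Anuvaad's real pattern):
# a sentence ends at ., ?, ! or the danda, followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.?!।])\s+")


def split_sentences(paragraph):
    """Split a paragraph at whitespace that follows sentence-final punctuation."""
    # strip() avoids empty leading/trailing pieces; the filter drops any leftovers.
    return [s for s in _SENTENCE_END.split(paragraph.strip()) if s]


print(split_sentences("It rained all day. Did you go out? No!"))
print(split_sentences("यह पहला वाक्य है। यह दूसरा वाक्य है।"))
```

A rule this simple also splits after abbreviations such as "Dr.", which is one reason a practical tokenizer needs additional language-specific patterns.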

Prerequisites

  • Python >= 3.6

Installation

pip install Anuvaad_Tokenizer==0.0.3

Author

Anuvaad (nlp-nmt@tarento.com)

Usage Example

For English

from Anuvaad_Tokenizer.AnuvaadEnTokenizer import AnuvaadEnTokenizer

para = " "  # replace with the English paragraph to tokenize
tokenized_text = AnuvaadEnTokenizer().tokenize(para)  # paragraph tokenized into sentences

For Hindi

from Anuvaad_Tokenizer.AnuvaadHiTokenizer import AnuvaadHiTokenizer

para = " "  # replace with the Hindi paragraph to tokenize
tokenized_text = AnuvaadHiTokenizer().tokenize(para)  # paragraph tokenized into sentences

For Kannada

from Anuvaad_Tokenizer.AnuvaadKnTokenizer import AnuvaadKnTokenizer

para = " "  # replace with the Kannada paragraph to tokenize
tokenized_text = AnuvaadKnTokenizer().tokenize(para)  # paragraph tokenized into sentences

For Telugu

from Anuvaad_Tokenizer.AnuvaadTeTokenizer import AnuvaadTeTokenizer

para = " "  # replace with the Telugu paragraph to tokenize
tokenized_text = AnuvaadTeTokenizer().tokenize(para)  # paragraph tokenized into sentences

For Tamil

from Anuvaad_Tokenizer.AnuvaadTaTokenizer import AnuvaadTaTokenizer

para = " "  # replace with the Tamil paragraph to tokenize
tokenized_text = AnuvaadTaTokenizer().tokenize(para)  # paragraph tokenized into sentences
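The five examples differ only in the two-letter tag inside the module and class name, so the tokenizer can be picked from a language code at run time. The sketch below is one possible way to do that. The helper names `tokenizer_location` and `tokenize_paragraph` and the `SUPPORTED` table are hypothetical and not part of the package; only the module and class names come from the examples above.

```python
import importlib

# Language code -> tag used in the module and class names shown in the examples.
# This table is an assumption limited to the five tokenizers documented here.
SUPPORTED = {"en": "En", "hi": "Hi", "kn": "Kn", "te": "Te", "ta": "Ta"}


def tokenizer_location(lang):
    """Return (module path, class name) for a supported language code."""
    try:
        tag = SUPPORTED[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None
    class_name = f"Anuvaad{tag}Tokenizer"
    return f"Anuvaad_Tokenizer.{class_name}", class_name


def tokenize_paragraph(para, lang="en"):
    """Import the matching tokenizer on demand and run it on one paragraph."""
    module_path, class_name = tokenizer_location(lang)
    # Requires Anuvaad_Tokenizer to be installed; raises ImportError otherwise.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)().tokenize(para)


print(tokenizer_location("hi"))
```

With the package installed, `tokenize_paragraph(para, lang="ta")` would be equivalent to the Tamil example above.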

LICENSE

MIT License 2021 Developer - Anuvaad
