Khodnevis Normalizer

Project description

Khoshnevis (خوشنويس)

Python package for normalizing Persian text.

  • Text Cleaning
  • URL Remover
  • Emoji Remover
  • Text Tokenization
  • Punctuation Space Correction
  • Half Space Correction (using Parsivar)
  • Standardize Alphabet
  • NLTK compatible
  • Python 3 support
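
To make two of the features above concrete, here is a minimal sketch of what "Standardize Alphabet" and "Punctuation Space Correction" usually mean for Persian text. It is not Khoshnevis's own code: the function names `standardize_alphabet` and `fix_punctuation_spacing`, the two-entry character table, and the spacing rules are assumptions made for illustration, using only the standard library.

```python
import re

# Arabic code points that are commonly replaced by their Persian equivalents.
# Illustrative only: a real normalizer would cover many more characters.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Latin and Persian punctuation handled by this sketch.
_PUNCT = ".,;:!?\u060c\u061b\u061f"          # includes Persian comma, semicolon, question mark
_PUNCT_NEEDS_SPACE = ",;!?\u060c\u061b\u061f"  # '.' and ':' are left out to spare URLs and decimals


def standardize_alphabet(text: str) -> str:
    """Map Arabic yeh and kaf to the Persian forms (hypothetical helper)."""
    return text.translate(ARABIC_TO_PERSIAN)


def fix_punctuation_spacing(text: str) -> str:
    """Remove whitespace before punctuation and add one space after it
    when a letter follows directly (hypothetical helper)."""
    text = re.sub(r"\s+([" + re.escape(_PUNCT) + r"])", r"\1", text)
    text = re.sub(
        r"([" + re.escape(_PUNCT_NEEDS_SPACE) + r"])(?=[^\W\d_])", r"\1 ", text
    )
    return text


if __name__ == "__main__":
    print(standardize_alphabet("\u0645\u064a \u0643\u0646\u062f"))
    print(fix_punctuation_spacing("hello ,world !"))
```

The half-space (ZWNJ) correction is deliberately not sketched here, since the package delegates it to Parsivar.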

Usage

>>> from khoshnevis import Normalizer

>>> normalizer = Normalizer()

>>> normalizer.normalize(text="استفاده از نیم‌فاصله متن را زیبا مي كند", zwnj="\u200c", 
                         clean_url=False, remove_emoji=False)

Parameters of normalize():

  • text (str): input text.
  • zwnj (str, optional): zero-width non-joiner character. Defaults to "\u200c".
  • clean_url (bool, optional): removes all URLs from the text. Defaults to True.
  • remove_emoji (bool, optional): removes all emojis from the text. Defaults to True.
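
The effect of the clean_url and remove_emoji flags can be approximated with the standard library alone. The sketch below is an assumption about the general technique, not the package's implementation: the helper name `strip_urls_and_emoji`, the URL pattern, and the emoji code-point ranges are hypothetical and not exhaustive (regional-indicator flags, for example, are not covered). The zero-width non-joiner is deliberately left untouched.

```python
import re

# Very loose URL matcher: anything starting with http://, https:// or www.
# up to the next whitespace. Assumed pattern, for illustration only.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

# A partial set of emoji ranges (assumption; not a complete emoji definition).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 that follows some emoji
    "]+"
)


def strip_urls_and_emoji(text: str, clean_url: bool = True,
                         remove_emoji: bool = True) -> str:
    """Hypothetical helper mirroring the two boolean flags described above."""
    if clean_url:
        text = URL_PATTERN.sub("", text)
    if remove_emoji:
        text = EMOJI_PATTERN.sub("", text)
    # Collapse the gaps left behind; only spaces and tabs, so ZWNJ survives.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


if __name__ == "__main__":
    sample = "\u0633\u0644\u0627\u0645 \U0001F600 https://example.com \u062f\u0646\u06cc\u0627"
    print(strip_urls_and_emoji(sample))
    print(strip_urls_and_emoji(sample, clean_url=False))
```

Passing clean_url=False or remove_emoji=False, as in the usage example above, skips the corresponding step and leaves those parts of the text in place.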

Installation

The latest stable version of Khoshnevis can be installed through pip:

pip install khoshnevis

Citation info

@misc{khoshnevis,
  author = {HamidReza Attar and Milad Lotfi and Saied Alimoradi},
  title = {Khoshnevis, a Python library for Persian text preprocessing},
  year = {2022},
  url= {https://www.khodnevisai.com/},
}

Download files

Source Distribution

khoshnevis-0.1.5.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

khoshnevis-0.1.5-py3-none-any.whl (7.5 kB)

Uploaded Python 3