Khodnevis Normalizer
Khoshnevis (خوشنويس)
A Python package for normalizing Persian text.
- Text Cleaning
- URL Remover
- Emoji Remover
- Text Tokenization
- Punctuation Space Correction
- Half-Space (ZWNJ) Correction (using Parsivar)
- Alphabet Standardization
- NLTK compatible
- Python 3 support
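The feature list names alphabet standardization and half-space correction without showing what they involve. Below is a minimal, hypothetical sketch, not khoshnevis's implementation: the names `standardize_alphabet` and `join_mi_prefix` are illustrative. It assumes that standardization means mapping the Arabic Yeh and Kaf code points to their Persian forms. It also replaces the Parsivar-based half-space correction the package relies on with one naive regex for the verbal prefix "می".

```python
import re

# Hypothetical illustration only; not the khoshnevis code.
# Arabic code points that often appear in Persian text, mapped to
# their standard Persian counterparts.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
})

ZWNJ = "\u200c"  # zero-width non-joiner, the Persian half-space

# A standalone "می" (U+0645 U+06CC), optionally negated with "ن" (U+0646),
# followed by whitespace and then the next word.
MI_PREFIX = re.compile(r"(?<!\S)(\u0646?\u0645\u06cc)\s+(?=\S)")


def standardize_alphabet(text: str) -> str:
    """Replace Arabic Yeh/Kaf variants with their Persian forms."""
    return text.translate(ARABIC_TO_PERSIAN)


def join_mi_prefix(text: str, zwnj: str = ZWNJ) -> str:
    """Attach the verbal prefix "می"/"نمی" to the next word with a ZWNJ.

    Deliberately naive: a real normalizer uses linguistic rules and a
    lexicon rather than a single regular expression.
    """
    return MI_PREFIX.sub(lambda m: m.group(1) + zwnj, text)
```

Applied in order to the usage sentence, these two steps would turn "مي كند" into "می‌کند". They leave "نیمفاصله" untouched, because splitting compound words correctly needs a lexicon such as the one Parsivar provides.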
Usage
>>> from khoshnevis import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize(text="استفاده از نیمفاصله متن را زیبا مي كند", zwnj="\u200c",
...                      clean_url=False, remove_emoji=False)
text (str): Input text.
zwnj (str, optional): Zero-width non-joiner (ZWNJ) character used as the half-space. Defaults to "\u200c".
clean_url (bool, optional): If True, removes all URLs from the text. Defaults to True.
remove_emoji (bool, optional): If True, removes all emojis from the text. Defaults to True.
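The `clean_url` and `remove_emoji` flags suggest a regex-based cleanup step. The sketch below is a hypothetical stand-in that mirrors the documented defaults of `True` for both flags. The function `clean_text` and both patterns are assumptions, not khoshnevis internals, and the emoji pattern covers only a few common Unicode blocks.

```python
import re

# Hypothetical patterns; khoshnevis's own expressions are not documented here.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# A few common emoji blocks; full emoji coverage needs many more ranges.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16
    "]+"
)


def clean_text(text: str, clean_url: bool = True, remove_emoji: bool = True) -> str:
    """Strip URLs and/or emojis, then collapse the leftover whitespace."""
    if clean_url:
        text = URL_PATTERN.sub(" ", text)
    if remove_emoji:
        text = EMOJI_PATTERN.sub(" ", text)
    # \s does not match U+200C (ZWNJ), so half-spaces survive this step.
    return re.sub(r"\s+", " ", text).strip()
```

Each match is replaced with a space rather than deleted so that the words on either side of a removed URL do not run together. The final whitespace collapse then removes the extra spaces.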
Installation
The latest stable version of Khoshnevis can be installed through pip:
pip install khoshnevis
Citation info
@misc{khoshnevis,
  author = {HamidReza Attar and Milad Lotfi and Saied Alimoradi},
  title = {Khoshnevis, a Python library for Persian text preprocessing},
  year = {2022},
  url = {https://www.khodnevisai.com/},
}
Download files
Source Distribution
khoshnevis-0.1.5.tar.gz (6.0 kB)
Built Distribution
Hashes for khoshnevis-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 88531d48aa4bf1251dbdc9817e3aeca06d5bbdb41614192ff2df19851001ab39
MD5 | 284cf9d6596634c9f20d47f867dc2e80
BLAKE2b-256 | 22b32a70654ca6a6c25e24ce131983db3f7bc3c6135a9da4a04da083a3a7cbfc
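You can use the digests listed for the wheel to check a manually downloaded copy before installing it. The following minimal sketch uses only the standard-library `hashlib`. The helper names `sha256_of` and `verify_wheel` are hypothetical, and the expected value is the SHA256 digest published for khoshnevis-0.1.5-py3-none-any.whl.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for khoshnevis-0.1.5-py3-none-any.whl.
EXPECTED_SHA256 = "88531d48aa4bf1251dbdc9817e3aeca06d5bbdb41614192ff2df19851001ab39"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For an unmodified download, `verify_wheel(Path("khoshnevis-0.1.5-py3-none-any.whl"))` should return `True`. pip's hash-checking mode performs an equivalent check at install time when a requirement in a requirements file carries a `--hash=sha256:...` option.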