Persian normalizer for text processing
Project description
Pormalizer
Many Persian characters have several different Unicode representations, and computers treat each representation as a distinct character. So before any NLP task, we first need to normalize the text so that every character has a single canonical form. Pormalizer also removes all non-alphabetic characters and converts all white-space characters into a single space.
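The three steps described above can be sketched roughly as follows. This is only an illustration using the Python standard library, not Pormalizer's actual implementation: the names `CHAR_MAP` and `sketch_normalize` are hypothetical, the mapping covers just three well-known Arabic look-alikes, and trimming leading and trailing spaces is an added assumption.

```python
import re

# Hypothetical mapping (NOT Pormalizer's real table): Arabic code points
# that often appear in place of the corresponding Persian letters.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
})


def sketch_normalize(text: str) -> str:
    """Illustrative normalizer: unify variants, drop non-letters, collapse spaces."""
    # Step 1: map variant code points to a single canonical form.
    text = text.translate(CHAR_MAP)
    # Step 2: keep only alphabetic characters and white-space.
    text = "".join(ch for ch in text if ch.isalpha() or ch.isspace())
    # Step 3: collapse every run of white-space into one space.
    # Trimming the ends is an assumption made for this sketch.
    return re.sub(r"\s+", " ", text).strip()
```

Note that this simple filter also drops diacritics and the zero-width non-joiner, which a real Persian normalizer would probably handle more carefully.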
Installation
You can install it from PyPI with the following command:
pip install -U pormalizer
Or, if you prefer the latest development version, you can install it from source:
git clone https://github.com/xurvan/pormalizer.git
cd pormalizer
python setup.py install
Quickstart
A minimal usage example:
from pormalizer import Pormalizer

# Create a normalizer instance
pormalizer = Pormalizer()
# "متن امتحانی" means "test text"; print the normalized result
print(pormalizer.normalize("متن امتحانی"))
Download files
Source Distribution
pormalizer-0.1.0.tar.gz (8.7 kB)
Built Distribution
pormalizer-0.1.0-py3-none-any.whl (8.8 kB)
File details
Details for the file pormalizer-0.1.0.tar.gz.
File metadata
- Download URL: pormalizer-0.1.0.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 86c9f5818b9a2a65c79ed7c89b62955e4f6661af2a0afbd2877236f31e6ab39d
MD5 | 9ece367a1e61ec8a1c85a1b9cef00655
BLAKE2b-256 | 442c41222d9fb378b862fd203c9211a37735661f43f3c5695cc202dd6032590e
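If you download the source archive manually, you may want to check it against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper name `sha256_of` and the assumption that the archive sits in the current directory are illustrative and not part of Pormalizer.

```python
import hashlib

# SHA256 digest listed above for pormalizer-0.1.0.tar.gz
EXPECTED_SHA256 = "86c9f5818b9a2a65c79ed7c89b62955e4f6661af2a0afbd2877236f31e6ab39d"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the archive in the current directory, `sha256_of("pormalizer-0.1.0.tar.gz") == EXPECTED_SHA256` should evaluate to `True` for an unmodified download.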
File details
Details for the file pormalizer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pormalizer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3861e4062b00c0da62852deb171f6576ca08e338aa62d0833dda71382d2499d0
MD5 | 481911fe7c643d68568d25b356b41dd7
BLAKE2b-256 | 69c077a66e63f070a3132c1b82575f30f4071bd2c29fc447a64edd94090077ca