Persian normalizer for text processing

Project description

Pormalizer

Many Persian characters have more than one Unicode representation, and software treats each representation as a different character. So before any NLP task, the text first needs to be normalized so that every character has a single form. Pormalizer also removes all non-alphabetic characters and replaces all whitespace characters with a single space.
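To make the idea concrete, here is a minimal sketch of this kind of normalization using only the Python standard library. It is not Pormalizer's actual implementation: the function name `normalize_sketch`, the `CHAR_MAP` table, and the choice of which characters to map are assumptions made for illustration, and the real package may cover more characters and treat edge cases differently.

```python
import re
import unicodedata

# Hypothetical mapping, for illustration only: Arabic code points that often
# appear in Persian text, mapped to the Persian forms.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
})


def normalize_sketch(text: str) -> str:
    """Unify character variants, drop non-letters, collapse whitespace."""
    # 1. Map variant code points to a single form.
    text = text.translate(CHAR_MAP)

    # 2. Keep letters, drop combining marks (diacritics), and turn every
    #    other character (digits, punctuation, whitespace) into a space.
    cleaned = []
    for ch in text:
        if ch.isalpha():
            cleaned.append(ch)
        elif unicodedata.category(ch).startswith("M"):
            continue
        else:
            cleaned.append(" ")

    # 3. Collapse runs of spaces into one and trim the ends.
    return re.sub(r"\s+", " ", "".join(cleaned)).strip()


if __name__ == "__main__":
    # Arabic kaf and yeh, a tab, punctuation, and digits in the input.
    print(normalize_sketch("\u0643\u062A\u0627\u0628\t\u0639\u0644\u064A\u060C 123"))
```

One simplification to be aware of: this sketch turns the zero-width non-joiner (U+200C) into a space because it is not a letter, which may not be what you want for Persian text.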

Installation

You can install it from PyPI with the following command:

pip install -U pormalizer

Or, if you prefer the latest development version, you can install it from source:

git clone https://github.com/xurvan/pormalizer.git
cd pormalizer
python setup.py install

Quickstart

A very simple usage example:

from pormalizer import Pormalizer

pormalizer = Pormalizer()

pormalizer.normalize("متن امتحانی")

Download files

Source Distribution

pormalizer-0.1.0.tar.gz (8.7 kB)

Uploaded Source

Built Distribution

pormalizer-0.1.0-py3-none-any.whl (8.8 kB)

Uploaded Python 3
