Piraye: NLP Utils

A utility for normalizing Persian, Arabic and English texts.

Requirements

  • Python 3.8+
  • spacy 3.1.3+

Installation

Install the latest version with pip:

pip install piraye

Usage

Create a Normalizer instance with NormalizerBuilder, then call its normalize method. See the Configs section for the full list of available configs.

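from piraye import NormalizerBuilder
from piraye.normalizer_builder import Config

text = "این یک متن تسة اسﺘ       , 24/12/1400 "
# Start from the PUNCTUATION_FA config, then chain further options
normalizer = (
    NormalizerBuilder([Config.PUNCTUATION_FA])
    .alphabet_fa()          # map alphabet characters to Persian
    .digit_fa()             # convert digits to Persian digits
    .tokenizing()
    .remove_extra_spaces()  # drop redundant spaces
    .build()
)
normalizer.normalize(text)  # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"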
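To give a feel for what the chain above does to the sample text, here is a rough standard-library approximation. It is a hypothetical sketch, not piraye's implementation: the function `rough_normalize_fa` and its small mapping tables are assumptions for illustration, it skips `tokenizing()`, and it does not reproduce piraye-specific letter mappings such as turning تسة into تست.

```python
import re
import unicodedata

# Hypothetical mapping tables for illustration; piraye's own tables are far larger.
FA_DIGITS = str.maketrans("0123456789", "".join(chr(0x06F0 + i) for i in range(10)))
FA_LETTERS = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})  # Arabic yeh/kaf -> Persian yeh/keheh
FA_PUNCTUATION = str.maketrans({",": "\u060c", ";": "\u061b", "?": "\u061f"})  # Arabic comma, semicolon, question mark


def rough_normalize_fa(text: str) -> str:
    """Roughly approximate alphabet_fa + digit_fa + PUNCTUATION_FA + remove_extra_spaces."""
    # NFKC folds Arabic presentation forms (e.g. U+FE98, a medial teh) into base letters
    # and turns NO-BREAK SPACE into an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(FA_LETTERS).translate(FA_DIGITS).translate(FA_PUNCTUATION)
    # Collapse every whitespace run into a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

On the sample text this yields the same letters, punctuation and digits as the expected output above, except that تسة is left as is.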

Configs

Config              Function            Description
ALPHABET_AR         alphabet_ar         Map alphabet characters to Arabic
ALPHABET_EN         alphabet_en         Map alphabet characters to English
ALPHABET_FA         alphabet_fa         Map alphabet characters to Persian
DIGIT_AR            digit_ar            Convert digits to Arabic digits
DIGIT_EN            digit_en            Convert digits to English digits
DIGIT_FA            digit_fa            Convert digits to Persian digits
DIACRITIC_DELETE    diacritic_delete    Remove all diacritics
SPACE_DELETE        space_delete        Remove all spaces
SPACE_NORMAL        space_normal        Normalize spaces (e.g. NO-BREAK SPACE, tab)
SPACE_KEEP          space_keep          Map spaces without normalizing them
PUNCTUATION_AR      punctuation_ar      Map punctuation to Arabic punctuation
PUNCTUATION_FA      punctuation_fa      Map punctuation to Persian punctuation
PUNCTUATION_EN      punctuation_en      Map punctuation to English punctuation
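The DIGIT_* configs amount to a character-for-character substitution between three digit sets. A minimal sketch of that idea with `str.translate`, assuming ASCII digits for English, U+0660 to U+0669 (Arabic-Indic) for Arabic and U+06F0 to U+06F9 (Extended Arabic-Indic) for Persian; the helper `convert_digits` and its `target` parameter are hypothetical and not part of piraye.

```python
# Hypothetical helper illustrating the DIGIT_EN / DIGIT_AR / DIGIT_FA configs.
DIGIT_SETS = {
    "en": "0123456789",
    "ar": "".join(chr(0x0660 + i) for i in range(10)),  # Arabic-Indic digits
    "fa": "".join(chr(0x06F0 + i) for i in range(10)),  # Extended Arabic-Indic (Persian) digits
}


def convert_digits(text: str, target: str) -> str:
    """Map every digit from any of the three sets to the same value in `target`."""
    dest = DIGIT_SETS[target]
    table = {}
    for digits in DIGIT_SETS.values():
        for value, ch in enumerate(digits):
            table[ord(ch)] = dest[value]
    return text.translate(table)
```

Building one table from all three sets means mixed input (for example Persian and Arabic digits in the same string) converges on a single target set in one pass.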

Versions

  • 0.0.2
    • Fixed pylint errors
    • Updated the normalizer builder
  • 0.0.1
    • Initial release of piraye

License

GNU Lesser General Public License v2.1

Primarily used for software libraries, the GNU LGPL requires that derived works be licensed under the same license, but works that only link to it do not fall under this restriction. There are two commonly used versions of the GNU LGPL.

See LICENSE

About

Arusha

Download files

Source Distribution

piraye-0.0.3.tar.gz (41.9 kB)

Built Distribution

piraye-0.0.3-py3-none-any.whl (44.5 kB)

File details

Details for the file piraye-0.0.3.tar.gz.

File metadata

  • Download URL: piraye-0.0.3.tar.gz
  • Upload date:
  • Size: 41.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for piraye-0.0.3.tar.gz
Algorithm     Hash digest
SHA256        f452483125e48e11fa92bda449866dcc089f99db98c32cde1f68fb419ae35f53
MD5           7d63638ce32cc384adec33d29357c81e
BLAKE2b-256   7cc501ffa2ea204952503c78e4266f5daac9330dc138361b0b5efd1f5f13c75b
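The published digests let you check a download before installing it. A minimal sketch using Python's `hashlib`; the helper names `sha256_of` and `matches_expected` are hypothetical, and the path you pass is wherever you saved the archive.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for piraye-0.0.3.tar.gz.
SDIST_SHA256 = "f452483125e48e11fa92bda449866dcc089f99db98c32cde1f68fb419ae35f53"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = SDIST_SHA256) -> bool:
    """Compare a file's digest with a published one, ignoring hex letter case."""
    return sha256_of(path) == expected.strip().lower()


# Usage (hypothetical local path to the downloaded archive):
# print(matches_expected(Path("piraye-0.0.3.tar.gz")))
```

Reading in chunks keeps memory use flat regardless of file size; the same helper works for the wheel if you pass its SHA256 digest as `expected`.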

File details

Details for the file piraye-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: piraye-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 44.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for piraye-0.0.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        a86ab2230775ccf218f3ff08523e9f01e3ddfc77adbe5761a4b23486b9b1f8be
MD5           3691aa14bc280bf2e1a86d9d600f7b97
BLAKE2b-256   f609353d08a3470a072010b1c884e45918e2760740c9fc241c2203051134935b
