Olchiki Unicode Normalization Toolkit

olunicodenormalizer

Word-level Unicode normalization for ᱚᱞ-ᱪᱦᱤᱠᱤ (Ol Chiki) text

install

pip install olunicodenormalizer

usage

initialization and cleaning

# import
from olunicodenormalizer import Normalizer
from pprint import pprint

# initialize
bnorm = Normalizer()
# normalize a single word
word = 'ᱡᱚᱦᱟᱨ'
result = bnorm(word)
print(f"Non-norm:{word}; Norm:{result['normalized']}")
print("--------------------------------------------------")
pprint(result)

output

Non-norm:ᱡᱚᱦᱟᱨ; Norm:ᱡᱚᱦᱟᱨ
--------------------------------------------------
{'given': 'ᱡᱚᱦᱟᱨ', 'normalized': 'ᱡᱚᱦᱟᱨ', 'ops': []}
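
The result shown above is a plain dictionary with the keys given, normalized and ops. As a minimal sketch of how such a result might be inspected, the helper below uses only those three keys. The name summarize is hypothetical and not part of the package, and the entries of ops are treated as opaque because their structure is not documented here.

```python
def summarize(result: dict) -> str:
    """Describe a normalizer result using only the keys shown in the output above."""
    given = result["given"]
    normalized = result["normalized"]
    if normalized is None:
        # The normalizer returned no normalized form for this word.
        return f"{given!r} was rejected"
    if normalized == given:
        return f"{given!r} is already normalized"
    # Only the number of ops is reported; their contents are not assumed.
    return f"{given!r} -> {normalized!r} ({len(result['ops'])} ops)"


# The dictionary printed in the output above, copied verbatim.
example = {'given': 'ᱡᱚᱦᱟᱨ', 'normalized': 'ᱡᱚᱦᱟᱨ', 'ops': []}
print(summarize(example))
```
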
english handling

# initialize without english (default)
norm = Normalizer()
# "normalized" is None because english is not allowed by default
print("without english:", norm("ASD123")["normalized"])
# initialize with english allowed
norm = Normalizer(allow_english=True)
print("with english:", norm("ASD123")["normalized"])

output

without english: None
with english: ASD123
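
Because the normalizer works on one word at a time and may return None for a word, longer text has to be split before normalizing. The sketch below shows one possible way to do that. The function normalize_words is hypothetical and not part of the package; it accepts any callable that behaves like a Normalizer instance, and it assumes that splitting on whitespace is an acceptable way to find words.

```python
from typing import Callable, List


def normalize_words(normalizer: Callable[[str], dict], text: str) -> List[str]:
    """Normalize each whitespace-separated word and drop words that come back as None."""
    cleaned = []
    for word in text.split():
        normalized = normalizer(word)["normalized"]
        if normalized is not None:
            cleaned.append(normalized)
    return cleaned
```

With the package installed, normalize_words(Normalizer(), text) would return the normalized words of text, and with the default settings any word that normalizes to None (such as "ASD123" above) would be left out.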

Change Log

0.0.5 (9/03/2022)

  • added details for execution map
  • checkop typo correction

0.0.6 (9/03/2022)

  • broken diacritics op addition

0.0.7 (11/03/2022)

  • Assamese replacement
  • word op and unicode op mapping
  • modifier list modification
  • doc string for call and initialization
  • verbosity removal
  • typo correction for operation
  • unit test updates
  • 'এ' replacement correction
  • NonGylphUnicodes
  • Legacy symbols option
  • legacy mapper added
  • added bn:bd declaration

0.0.8 (14/03/2022)

  • MultipleConsonantDiacritics handling change
  • to+hosonto correction
  • invalid hosonto correction

0.0.9 (15/04/2022)

  • base normalizer
  • language class
  • olchiki extension
  • complex root normalization

0.0.10 (15/04/2022)

  • added conjuncts
  • exception for english words

0.0.11 (15/04/2022)

  • fixed no space char issue for olchiki

0.0.12 (26/04/2022)

  • fixed consonants orders

0.0.13 (26/04/2022)

  • fixed non char followed by diacritics

0.0.14 (01/05/2022)

  • word based normalization
  • encoding fix

0.0.15 (02/05/2022)

  • import correction

0.0.16 (02/05/2022)

  • local variable issue

0.0.17 (17/05/2022)

  • nukta mod break

0.0.18 (08/06/2022)

  • no space chars fix

0.0.19 (15/06/2022)

  • no space chars further fix
  • base_olchiki_compose to avoid false op flags
  • added foreign conjuncts

0.0.20 (01/08/2022)

  • এ্যা replacement correction

0.0.21 (01/08/2022)

  • "য","ব" + hosonto combination correction
  • added 'ব্ল্য' in conjuncts

0.0.22 (22/08/2022)

  • \u200d combination limiting

0.0.23 (23/08/2022)

  • \u200d condition change

0.0.24 (26/08/2022)

  • \u200d error handling

0.0.25 (10/09/22)

  • removed unnecessary operations: fixRefOrder,fixOrdersForCC
  • added conjuncts: 'র্ন্ত','ঠ্য','ভ্ল'

0.1.0 (20/10/22)

  • added indic parser
  • fixed language class

0.1.1 (21/10/22)

  • added nukta and diacritic maps for indics
  • cleaned conjuncts for now
  • fixed issues with no-space and connector

0.1.2 (10/12/22)

  • allow halant ending for indic language except olchiki

0.1.3 (10/12/22)

  • broken char break cases for halant

0.1.4 (01/01/23)

  • added sylhetinagri

0.1.5 (01/01/23)

  • cleaned Punjabi double quotes in diac map

0.0.1 (26/08/23)

  • added olchiki punctuations
