Python library for digesting Persian text.

  • Text cleaning

  • Sentence and word tokenizer

  • Word lemmatizer

  • POS tagger

  • Dependency parser

  • Interfaces for Persian corpora

  • NLTK compatible

  • Python 2.7, 3.2, 3.3 and 3.4 support


Usage

>>> from __future__ import unicode_literals

>>> from hazm import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'

>>> from hazm import sent_tokenize, word_tokenize
>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']

>>> from hazm import Stemmer, Lemmatizer
>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'
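
For a verb, the lemmatizer result above joins two stems with `#`: رفت is the past stem and رو is the present stem. A minimal sketch of splitting such a result follows; `split_verb_lemma` is a hypothetical helper for illustration and is not part of hazm.

```python
def split_verb_lemma(lemma):
    """Split a 'past#present' verb lemma into its stems.

    A lemma without '#' (for example a noun) comes back as a one-element tuple.
    """
    return tuple(lemma.split("#"))


# The lemma string is copied from the usage example above.
past_stem, present_stem = split_verb_lemma("رفت#رو")
```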

>>> from hazm import POSTagger
>>> tagger = POSTagger()
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]
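
The tagger returns a list of (word, tag) pairs, so selecting words by tag needs only plain Python. The sketch below reuses the tagged output shown above as literal data; `words_with_tag` is a hypothetical helper name, not a hazm function.

```python
def words_with_tag(tagged, tag):
    """Return the words whose POS tag equals `tag`, in sentence order."""
    return [word for word, word_tag in tagged if word_tag == tag]


# Tagged sentence copied from the usage example above.
tagged = [('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]
nouns = words_with_tag(tagged, 'N')
```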

>>> from hazm import DependencyParser
>>> parser = DependencyParser(tagger=POSTagger())
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید؟'))
<DependencyGraph with 8 nodes>
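
The steps above can be chained into one pass: normalize, split into sentences, split into words, then lemmatize. The following is a rough sketch rather than a hazm API; `digest` is a hypothetical name, and the four processing steps are passed in as callables so the function itself does not depend on any particular model or version.

```python
def digest(text, normalize, split_sentences, split_words, lemmatize):
    """Normalize `text`, then return one list of lemmas per sentence."""
    cleaned = normalize(text)
    return [
        [lemmatize(word) for word in split_words(sentence)]
        for sentence in split_sentences(cleaned)
    ]
```

With hazm installed, the callables would be the ones shown in the usage examples:

>>> from hazm import Normalizer, Lemmatizer, sent_tokenize, word_tokenize
>>> digest('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟', Normalizer().normalize, sent_tokenize, word_tokenize, Lemmatizer().lemmatize)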

Installation

pip install hazm

Pretrained tagger and parser models are also available. To use them, place the model files in the resources folder of your project.



Download files

Source Distribution

hazm-0.4.tar.gz (145.2 kB)

Built Distribution

hazm-0.4.linux-x86_64.exe (210.6 kB)

File details

Details for the file hazm-0.4.tar.gz.

File metadata

  • Download URL: hazm-0.4.tar.gz
  • Upload date:
  • Size: 145.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hazm-0.4.tar.gz

Algorithm    Hash digest
SHA256       5c9defbbf914af91008a59a2aae98ca3c52e674046f8a07db46c1ded855d41ba
MD5          6dc58a77c7ae034c8c7632a7c7515b61
BLAKE2b-256  f8093169698f8da06d1ca07be1b8733dc9bdf1febc8ad00348fc66c8ff6e45c2
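
A downloaded file can be checked against a published SHA256 digest using only the Python standard library. This is a minimal sketch: `sha256_of` and `matches_published_digest` are hypothetical helper names, and the path passed in is wherever the file was saved locally.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `expected_hex` would be the SHA256 value listed for hazm-0.4.tar.gz.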


File details

Details for the file hazm-0.4.linux-x86_64.exe.

File metadata

  • Download URL: hazm-0.4.linux-x86_64.exe
  • Upload date:
  • Size: 210.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hazm-0.4.linux-x86_64.exe

Algorithm    Hash digest
SHA256       420c987fcd9bb846c621f42655844e5a8c30f19e2f68212ce1cefb83e5ec38be
MD5          977ac596af85d509ead96b1d4f445345
BLAKE2b-256  0ea85f049b877d8fed50669f04818e6285085c30e723ed2ab68643a0aa0ed74e
