Python library for digesting Persian text.

  • Text cleaning

  • Sentence and word tokenizer

  • Word lemmatizer

  • POS tagger

  • Dependency parser

  • Interfaces for Persian corpora

  • NLTK compatible

  • Python 2.7, 3.2, 3.3 and 3.4 support

Usage

>>> from __future__ import unicode_literals

>>> from hazm import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'
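One part of the normalization shown above appears to be replacing Arabic code points with their Persian counterparts (Arabic yeh and kaf become Persian yeh and kaf). The following is a minimal sketch of that single step, written for Python 3. The names ARABIC_TO_PERSIAN and unify_characters are hypothetical and are not part of hazm, and the real Normalizer does more than this (for example, it also inserts half-spaces).

```python
# Illustrative only: a tiny subset of Persian character normalization,
# not hazm's actual implementation.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def unify_characters(text):
    """Replace Arabic yeh and kaf with their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)
```

This sketch only maps characters one to one; spacing is left untouched, so joining a prefix to its verb with a half-space would still need a separate step.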

>>> from hazm import sent_tokenize, word_tokenize
>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']
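To make the tokenizer output above easier to read, here is a rough regex-based sketch of the same idea for Python 3: split sentences after terminal punctuation, then separate punctuation marks from words. The names simple_sent_tokenize and simple_word_tokenize are hypothetical, and this is not how hazm implements its tokenizers; it ignores abbreviations, numbers and many other cases.

```python
import re

# Illustrative only: regex-based splitting, not hazm's tokenizer.
# \u060c is the Arabic comma, \u061b the Arabic semicolon,
# \u061f the Arabic question mark.
SENTENCE_END = re.compile(r"(?<=[.!?\u061f])\s+")
TOKEN = re.compile(r"[^\s.!?:\u060c\u061b\u061f]+|[.!?:\u060c\u061b\u061f]")


def simple_sent_tokenize(text):
    """Split text at whitespace that follows sentence-final punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def simple_word_tokenize(sentence):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN.findall(sentence)
```

Because the half-space (U+200C) is not treated as whitespace by the pattern, words joined with it stay together as a single token.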

>>> from hazm import Stemmer, Lemmatizer
>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'
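The lemma returned for the verb above contains a '#' character, which appears to separate the past stem from the present stem. Assuming that convention, a small helper such as the following could split the two parts. The name split_verb_lemma is hypothetical and not part of hazm.

```python
def split_verb_lemma(lemma):
    """Split a 'past#present' verb lemma into its two stems.

    Assumes '#' separates the past stem from the present stem.
    Lemmas without '#' are returned unchanged, with None as the second item.
    """
    if "#" in lemma:
        past, present = lemma.split("#", 1)
        return past, present
    return lemma, None
```

Under this assumption, passing the verb lemma shown above would yield its two stems as a tuple, while a noun lemma would come back with None in the second position.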

>>> from hazm import POSTagger
>>> tagger = POSTagger()
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]
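The tagger returns a list of (word, tag) pairs, so its output can be processed with ordinary Python. As a sketch, the helper below groups words by tag; the name group_by_tag is hypothetical, and the sample data simply mirrors the shape of the output shown above.

```python
from collections import defaultdict


def group_by_tag(tagged):
    """Group words from a list of (word, tag) pairs by their tag."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


# Sample data in the same shape as the tagger output.
tagged = [("w1", "PRO"), ("w2", "ADV"), ("w3", "N"), ("w4", "V")]
by_tag = group_by_tag(tagged)
```

Looking up by_tag.get('N', []) would then give the words tagged 'N', which presumably marks nouns.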

>>> from hazm import DependencyParser
>>> parser = DependencyParser(tagger=POSTagger())
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید؟'))
<DependencyGraph with 8 nodes>

Installation

pip install hazm

Trained tagger and parser models are also available. To use them, put the model files in the resources folder of your project.
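Before constructing a tagger or parser, it may help to check that the model files are actually in place. The sketch below is one way to do that with the standard library; the function name missing_models is hypothetical, and the model file names are not stated here, so they must be supplied by the caller.

```python
import os


def missing_models(filenames, folder="resources"):
    """Return the names from `filenames` that are not files inside `folder`.

    The file names are supplied by the caller; none are assumed here.
    """
    return [
        name for name in filenames
        if not os.path.isfile(os.path.join(folder, name))
    ]
```

An empty result means every listed file was found in the folder.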
