Python library for digesting Persian text.

  • Text cleaning

  • Sentence and word tokenizer

  • Word lemmatizer

  • POS tagger

  • Dependency parser

  • Corpus readers for Hamshahri and Bijankhan

  • NLTK compatible

  • Python 3.3 and 2.7 support

Usage

>>> from hazm import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'

>>> from hazm import sent_tokenize, word_tokenize
>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']

>>> from hazm import Stemmer, Lemmatizer
>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'

>>> from hazm import POSTagger
>>> tagger = POSTagger()
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PR'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]

>>> from hazm import DependencyParser
>>> parser = DependencyParser(tagger=POSTagger())
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید ؟'))
<DependencyGraph with 8 nodes>

Installation

pip install hazm

We have also trained tagger and parser models, which you can place in the resources folder of your project.

Thanks

Download files

Source Distribution

hazm-0.1.tar.gz (134.9 kB view details)

Uploaded Source

Built Distribution

hazm-0.1.linux-x86_64.exe (198.6 kB view details)

Uploaded Source

File details

Details for the file hazm-0.1.tar.gz.

File metadata

  • Download URL: hazm-0.1.tar.gz
  • Upload date:
  • Size: 134.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hazm-0.1.tar.gz
Algorithm Hash digest
SHA256 1101cf8b66884e9f64c8fa88c129f66150a466551578812d5da7b876213c862e
MD5 29bd9c844d18547b3163271b6d61c4dc
BLAKE2b-256 da4d7065524c9cede2f2a0e57f0a1d6663af036495f084fc60834da31bce33b3
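
The digests listed above can be used to check a downloaded archive. The following is a minimal sketch using only Python's standard hashlib module. The helper names sha256_of and matches_published_digest are illustrative and not part of hazm, and the local file name in the usage note is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest copied from the "File hashes" listing for hazm-0.1.tar.gz.
EXPECTED_SHA256 = "1101cf8b66884e9f64c8fa88c129f66150a466551578812d5da7b876213c862e"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_digest('hazm-0.1.tar.gz') should return True for an intact copy of the source distribution saved under that (assumed) name in the current directory. To check the built distribution, pass its listed SHA256 as the expected argument.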

File details

Details for the file hazm-0.1.linux-x86_64.exe.

File metadata

  • Download URL: hazm-0.1.linux-x86_64.exe
  • Upload date:
  • Size: 198.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hazm-0.1.linux-x86_64.exe
Algorithm Hash digest
SHA256 53548952bd94091256e3bb1cd9e7adc6b615bfeae2e83d83cb9ec4305828adf5
MD5 2b6ea27cdd0b84153007884132efe995
BLAKE2b-256 942e5aa15812f9ca87c20210c1a94645770dd3e2fdb0b4a91279c75ab22381ca
