
Project description

Hazm - Persian NLP Toolkit


Hazm is a Python library for natural language processing on Persian text. It offers a range of features for analyzing, processing, and understanding Persian: you can use Hazm to normalize text, tokenize sentences and words, lemmatize words, assign part-of-speech tags, identify dependency relations, create word and sentence embeddings, and read popular Persian corpora.


Features

  • Normalization: Converts text to a standard form (diacritics removal, ZWNJ correction, etc.; demonstrated right after this list).
  • Tokenization: Splits text into sentences and words.
  • Lemmatization: Reduces words to their base forms.
  • POS tagging: Assigns a part of speech to each word.
  • Dependency parsing: Identifies the syntactic relations between words.
  • Embedding: Creates vector representations of words and sentences.
  • Hugging Face Integration: Automatically downloads and caches pretrained models from the Hub.
  • Persian corpora reading: Reads popular Persian corpora with ready-made scripts.
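
For a quick taste, here is the normalization call from the Usage section below; extra spaces, kashida stretching, and detached suffixes are all corrected:

from hazm import Normalizer

normalizer = Normalizer()
# Fixes spacing, removes kashida, and attaches "های" with a ZWNJ.
print(normalizer.normalize('من کتاب های زیــــادی دارم .'))  # من کتاب‌های زیادی دارم.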

Installation

To install the latest version of Hazm (requires Python 3.12+), run:

pip install hazm

To use the pretrained models from Hugging Face, ensure you have the huggingface-hub package:

pip install huggingface-hub
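
Hazm performs this download for you, but if you want to pre-fetch a model into the local cache yourself (for example, before going offline), here is a minimal sketch using huggingface-hub directly, with the same repo_id/filename pair as the POS tagger example below:

from huggingface_hub import hf_hub_download

# Fetch the file into the local Hugging Face cache (a no-op if it is
# already cached) and return its local path.
path = hf_hub_download(repo_id="roshan-research/hazm-postagger",
                       filename="pos_tagger.model")
print(path)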

Pretrained Models

Hazm supports automatic downloading of pretrained models. You can find all available models (POS Tagger, Chunker, Embeddings, etc.) on our official Hugging Face page:

👉 Roshan Research on Hugging Face: https://huggingface.co/roshan-research

When using Hazm, simply provide the repo_id and model_filename as shown in the examples below, and the library will handle the rest.
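
For instance, loading the POS tagger takes just these two identifiers (this is the same call used in the Usage section below):

from hazm import POSTagger

# Downloads pos_tagger.model from the Hub on first use; later runs
# load it from the local cache.
tagger = POSTagger(repo_id="roshan-research/hazm-postagger",
                   model_filename="pos_tagger.model")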

Usage

from hazm import *

# ===============================
# Stemming
# ===============================
stemmer = Stemmer()
stem = stemmer.stem('کتاب‌ها')
print(stem) # کتاب

# ===============================
# Normalizing
# ===============================
normalizer = Normalizer()
normalized_text = normalizer.normalize('من کتاب های زیــــادی دارم .')
print(normalized_text) # من کتاب‌های زیادی دارم.

# ===============================
# Lemmatizing
# ===============================
lemmatizer = Lemmatizer()
lem = lemmatizer.lemmatize('می‌نویسیم')
print(lem) # نوشت#نویس

# ===============================
# Sentence tokenizing
# ===============================
sentence_tokenizer = SentenceTokenizer()
sent_tokens = sentence_tokenizer.tokenize('ما کتاب می‌خوانیم. یادگیری خوب است.')
print(sent_tokens) # ['ما کتاب می\u200cخوانیم.', 'یادگیری خوب است.']

# ===============================
# Word tokenizing
# ===============================
word_tokenizer = WordTokenizer()
word_tokens = word_tokenizer.tokenize('ما کتاب می‌خوانیم')
print(word_tokens) # ['ما', 'کتاب', 'می\u200cخوانیم']

# ===============================
# Part of speech tagging
# ===============================
tagger = POSTagger(repo_id="roshan-research/hazm-postagger", model_filename="pos_tagger.model")
tagged_words = tagger.tag(word_tokens)
print(tagged_words) # [('ما', 'PRON'), ('کتاب', 'NOUN'), ('می\u200cخوانیم', 'VERB')]

# ===============================
# Chunking
# ===============================
chunker = Chunker(repo_id="roshan-research/hazm-chunker", model_filename="chunker.model")
chunked_tree = tree2brackets(chunker.parse(tagged_words))
print(chunked_tree) # [ما NP] [کتاب NP] [می‌خوانیم VP]

# ===============================
# Word embedding
# ===============================
word_embedding = WordEmbedding.load(repo_id='roshan-research/hazm-word-embedding', model_filename='fasttext_skipgram_300.bin', model_type='fasttext')
odd_word = word_embedding.doesnt_match(['کتاب', 'دفتر', 'قلم', 'پنجره'])
print(odd_word) # پنجره

# ===============================
# Sentence embedding
# ===============================
sent_embedding = SentEmbedding.load(repo_id='roshan-research/hazm-sent-embedding', model_filename='sent2vec-naab.model')
sentence_similarity = sent_embedding.similarity('او شیر میخورد', 'شیر غذا می‌خورد')
print(sentence_similarity) # 0.4643607437610626

# ===============================
# Dependency parsing
# ===============================
parser = DependencyParser(tagger=tagger, lemmatizer=lemmatizer, repo_id="roshan-research/hazm-dependency-parser", model_filename="langModel.mco")
dependency_graph = parser.parse(word_tokens)
print(dependency_graph)
"""
{0:  {'address': 0,
      'ctag': 'TOP',
      'deps': defaultdict(<class 'list'>, {'root': [3]}),
      'feats': None,
      'head': None,
      'lemma': None,
      'rel': None,
      'tag': 'TOP',
      'word': None},
  1: {'address': 1,
      'ctag': 'PRON',
      'deps': defaultdict(<class 'list'>, {}),
      'feats': '_',
      'head': 3,
      'lemma': 'ما',
      'rel': 'SBJ',
      'tag': 'PRON',
      'word': 'ما'},
  2: {'address': 2,
      'ctag': 'NOUN',
      'deps': defaultdict(<class 'list'>, {}),
      'feats': '_',
      'head': 3,
      'lemma': 'کتاب',
      'rel': 'OBJ',
      'tag': 'NOUN',
      'word': 'کتاب'},
  3: {'address': 3,
      'ctag': 'VERB',
      'deps': defaultdict(<class 'list'>, {'SBJ': [1], 'OBJ': [2]}),
      'feats': '_',
      'head': 0,
      'lemma': 'خواند#خوان',
      'rel': 'root',
      'tag': 'VERB',
      'word': 'می\u200cخوانیم'}})

"""

Documentation

Visit https://roshan-ai.ir/hazm to view the full documentation.

Evaluation

Module name                      Score
DependencyParser                 85.6%
POSTagger                        98.8%
Chunker                          93.4%
Lemmatizer                       89.9%

Model                            Metric            Value
SpacyPOSTagger                   Precision         0.99250
                                 Recall            0.99249
                                 F1-Score          0.99249
EZ Detection in SpacyPOSTagger   Precision         0.99301
                                 Recall            0.99297
                                 F1-Score          0.99298
SpacyChunker                     Accuracy          96.53%
                                 F-Measure         95.00%
                                 Recall            95.17%
                                 Precision         94.83%
SpacyDependencyParser            TOK Accuracy      99.06
                                 UAS               92.30
                                 LAS               89.15
                                 SENT Precision    98.84
                                 SENT Recall       99.38
                                 SENT F-Measure    99.11

[Star history chart]

Download files

Download the file for your platform.

Source Distribution

hazm-0.12.0.tar.gz (866.7 kB)


Built Distribution


hazm-0.12.0-py3-none-any.whl (887.2 kB)


File details

Details for the file hazm-0.12.0.tar.gz.

File metadata

  • Download URL: hazm-0.12.0.tar.gz
  • Upload date:
  • Size: 866.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.3 CPython/3.12.3 Linux/6.17.0-1008-azure

File hashes

Hashes for hazm-0.12.0.tar.gz

  • SHA256: d7776d792f7bba4b96fa031b968e8dff74c2cfdca74ac29c18f1ca694e08dc98
  • MD5: 6d53545452ec62d2f391e2e03ebb0dc5
  • BLAKE2b-256: b0aa03aa8a4beaa481a2fc1b503eeccdbe36ad4432356a117ebc40211f5916dc

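To verify a downloaded archive against the published SHA256 digest, here is a minimal sketch using Python's standard hashlib (the same check works for the wheel below):

import hashlib

expected = "d7776d792f7bba4b96fa031b968e8dff74c2cfdca74ac29c18f1ca694e08dc98"
with open("hazm-0.12.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()
print(actual == expected)  # True if the download is intact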

File details

Details for the file hazm-0.12.0-py3-none-any.whl.

File metadata

  • Download URL: hazm-0.12.0-py3-none-any.whl
  • Upload date:
  • Size: 887.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.3 CPython/3.12.3 Linux/6.17.0-1008-azure

File hashes

Hashes for hazm-0.12.0-py3-none-any.whl

  • SHA256: 27dadf2c1c4ef9bdeb77f7a66eb0d75822c49cd50b9cc6e66e0328e33e531abf
  • MD5: c81086378fe51c995cbdf7a076c96402
  • BLAKE2b-256: 0deadb66be243ff7f3f8047d3e461377a5136b20594fa0a628bd0326f9875447

