This project is a collection of Natural Language Processing tools for the Kurdish language.

Project description

Aamraz - Kurdish NLP collection

Overview

Aamraz, written "ئامراز" in Kurdish script, means "instrument". This project is a collection of Natural Language Processing (NLP) tools for the Kurdish language. Although Kurdish is spoken by millions of people, it remains an under-resourced language in NLP. Recognizing the rich cultural heritage and historical significance of the Kurdish people, we are committed, regardless of ethnicity, to advancing tools and pre-trained models that support the Kurdish language in modern research and technology. Our work aims to encourage further development and to provide a foundation for future NLP research and applications. See the project's GitHub repository for more.

Installation

pip install aamraz

Base Features

  • Normalization
  • Tokenization
  • Stemming
  • Word Embedding: Creates vector representations of words.
  • Sentence Embedding: Creates vector representations of sentences.
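
To make the last two features more concrete, here is a minimal sketch of one common way to build a sentence vector from word vectors: averaging them (mean pooling). This is an illustration only and is not necessarily how aamraz computes sentence embeddings. The function name `mean_pool`, the `toy_vectors` lookup table, and the placeholder tokens are all hypothetical.

```python
from typing import Dict, List, Sequence


def mean_pool(tokens: Sequence[str],
              word_vectors: Dict[str, Sequence[float]],
              dim: int) -> List[float]:
    """Average the vectors of all known tokens (hypothetical helper)."""
    total = [0.0] * dim
    found = 0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue  # skip tokens that have no vector
        for i, value in enumerate(vector):
            total[i] += value
        found += 1
    if found == 0:
        return total  # no known tokens: fall back to the zero vector
    return [value / found for value in total]


# Toy 2-dimensional vectors standing in for real word embeddings.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(mean_pool(["a", "b", "missing"], toy_vectors, dim=2))
```

Unknown tokens are skipped here so that a single out-of-vocabulary word does not distort the average; other designs are possible.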

Usage

import aamraz

# Normalization
normalizer = aamraz.Normalizer()
sample_sentence="قڵبە‌کە‌م‌ بە‌  کوردی‌  قسە‌ دە‌کات‌."
normalized_sentence = normalizer.normalize(sample_sentence)
print(normalized_sentence)

# Tokenization
tokenizer = aamraz.WordTokenizer()
sample_sentence = "زوانی له دربره"
tokens = tokenizer.tokenize(sample_sentence)
print(tokens)

# Embedding with fastText
model_path = 'kurdish_fasttext_skipgram_dim300_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, dim=50)

sample_word = "ئامراز"
sample_sentence = "زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Embedding with word2vec
model_path = 'kurdish_word2vec_model_dim100_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, type='word2vec')

sample_word = "ئامراز"
sample_sentence = "زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Stemming
stemmer = aamraz.Stemmer(method='simple')
stemmed = stemmer.stem("کتێبەکانمان")
print(stemmed)
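
A typical next step with embedding vectors is comparing them. The sketch below computes cosine similarity in plain Python, assuming the vectors returned by `word_embedding` and `sentence_embedding` behave like sequences of numbers (that is an assumption, not something this page states). The helper `cosine_similarity` and the toy vectors are illustrative and are not part of aamraz.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_u * norm_v)


# Toy vectors standing in for embedding output (not real model values).
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [-3.0, 0.0, 1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1
print(cosine_similarity(vec_a, vec_c))  # close to 0
```

In practice, `vec_a` and `vec_b` would be replaced by two vectors obtained from the same embedding model, since vectors from different models are not comparable.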

Download files

Source Distribution

aamraz-0.1.0.tar.gz (10.1 kB)

Uploaded Source

Built Distribution

aamraz-0.1.0-py3-none-any.whl (10.7 kB)

Uploaded Python 3

File details

Details for the file aamraz-0.1.0.tar.gz.

File metadata

  • Download URL: aamraz-0.1.0.tar.gz
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for aamraz-0.1.0.tar.gz

Algorithm    Hash digest
SHA256       2fa62ce66711a0436d39fa2a07140dc2b9a92eb3e939157f2f54849a2705154d
MD5          01389600a3d83739f531dc985e3e5acf
BLAKE2b-256  803ae215f33c52e27b0223400bbd6c9c0466de944566b9ec22801b9e234aa6e1
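
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib`. The helper names and the local path `aamraz-0.1.0.tar.gz` are assumptions for illustration; the expected SHA256 value is the one listed above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed on this page for the source distribution.
EXPECTED_SHA256 = "2fa62ce66711a0436d39fa2a07140dc2b9a92eb3e939157f2f54849a2705154d"

# Assumed download location; adjust to wherever the file was saved.
archive = Path("aamraz-0.1.0.tar.gz")
if archive.exists():
    print(matches_published_digest(archive, EXPECTED_SHA256))
```

Reading in chunks keeps memory use constant regardless of file size. pip can perform a similar check itself through its hash-checking mode with a requirements file.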

File details

Details for the file aamraz-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: aamraz-0.1.0-py3-none-any.whl
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for aamraz-0.1.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       e261f08030a0a64391fccdd140fc2ba5321a183b7d3bf7f1aea4a843a1d46046
MD5          4077e8efbc989d24986b4d4b53066aba
BLAKE2b-256  6afe4003eec358ec3754bf161ee4774d18f2560346d5ad300d67d129155289ec
