This project is a collection of Natural Language Processing tools for the Kurdish language.

Project description

Aamraz - Kurdish NLP collection

Overview

Aamraz, written "ئامراز" in Kurdish script, means "instrument". This project is a collection of Natural Language Processing tools for the Kurdish language.

Base Features

  • Normalization
  • Tokenization
  • Word Embedding: Creates vector representations of words.
  • Sentence Embedding: Creates vector representations of sentences.
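
To make the first two features concrete, here is a minimal sketch of what normalization and tokenization typically involve. It is not aamraz's implementation: the character map, `toy_normalize`, and `toy_tokenize` are hypothetical names chosen for illustration, and the real library may apply different or additional rules.

```python
import re

# Illustrative mapping only (an assumption, not aamraz's rule set):
# Arabic code points that are commonly replaced by their Kurdish counterparts.
CHAR_MAP = {
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}


def toy_normalize(text: str) -> str:
    """Unify look-alike characters and collapse runs of whitespace."""
    text = text.translate(str.maketrans(CHAR_MAP))
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def toy_tokenize(text: str) -> list[str]:
    """Split into word tokens; each punctuation mark becomes its own token."""
    # \w+ is Unicode-aware for str patterns, so it also matches Arabic-script letters.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    print(toy_normalize("  hello   world  "))
    print(toy_tokenize("hello, world."))
```

The actual `Normalizer` and `WordTokenizer` classes are shown in the Usage section.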

Tools

Installation

pip install aamraz
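
After installing, a quick way to confirm that the package is importable from the current environment is sketched below. It uses only the standard library; `is_installed` is a hypothetical helper name, not part of aamraz.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("aamraz installed:", is_installed("aamraz"))
```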

Pretrained Models

Some useful pre-trained models:

Model | Description | Size
FastText WordEmbedding | Model trained using the FastText method on our own corpus. This is both the fasttext and skip-gram model itself (fasttext model). | ~ 2.3 GB
FastText WordEmbedding - Lite | Model trained using the FastText method on our own corpus. This is both the fasttext and skip-gram model itself (fasttext model). | ~ 800 MB
Word2vec Model | Includes the needed .bin and .npy files | ~ 92 MB
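
Because these files are large and are obtained separately from the package, it can help to check that a model file is present and roughly the expected size before loading it. The following is a minimal sketch using only the standard library; `model_size_mb` is a hypothetical helper, and the file name is the one used in the Usage section.

```python
from pathlib import Path


def model_size_mb(path):
    """Return the file size in MiB, or None if the file does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return p.stat().st_size / (1024 ** 2)


if __name__ == "__main__":
    size = model_size_mb("kurdish_fasttext_skipgram_dim300_v1.bin")
    if size is None:
        print("Model file not found; download it first.")
    else:
        print(f"Model file found ({size:.0f} MiB).")
```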

Usage

import aamraz

# Normalization
normalizer= aamraz.Normalizer()
sample_sentence="قڵبە‌کە‌م‌ بە‌  کوردی‌  قسە‌ دە‌کات‌."
normalized_sentence=normalizer.normalize(sample_sentence)
print(normalized_sentence)

# Tokenization
tokenizer = aamraz.WordTokenizer()
sample_sentence="زوانی له دربره"
tokens = tokenizer.tokenize(sample_sentence)
print(tokens)

# Embedding by fasttext
model_path = 'kurdish_fasttext_skipgram_dim300_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, dim=50)

sample_word="ئامراز"
sample_sentence="زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)

# Embedding by word2vec
model_path = 'kurdish_word2vec_model_dim100_v1.bin'
embedding_model = aamraz.EmbeddingModel(model_path, type='word2vec')

sample_word="ئامراز"
sample_sentence="زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)
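
A common next step with embeddings is to compare two vectors. The sketch below computes cosine similarity in pure Python. It assumes that the vectors returned by `word_embedding` and `sentence_embedding` are sequences of numbers of equal length (for example NumPy arrays), which the source does not state explicitly; `cosine_similarity` is a hypothetical helper, not part of aamraz.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    a = [float(x) for x in a]
    b = [float(x) for x in b]
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; return 0.0 by convention here.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With the objects from the example above, this could be called as `cosine_similarity(word_vector, sentence_vector)`, provided both vectors come from the same model and have the same dimension.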

License

This project is licensed under the MIT License. You are free to use, distribute, modify, and build upon this work, even for commercial purposes, as long as you include a copy of the original MIT License and provide proper attribution.

To view a copy of this license, visit: https://opensource.org/licenses/MIT

