This project is a collection of natural language processing (NLP) tools for the Kurdish language.

Aamraz - Kurdish NLP collection

Overview

Aamraz, written "ئامراز" in Kurdish script, means "instrument". This project is a collection of natural language processing (NLP) tools for the Kurdish language.

Base Features

  • Normalization
  • Tokenization
  • Word Embedding: Creates vector representations of words.
  • Sentence Embedding: Creates vector representations of sentences.
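
To make the first two features concrete, here is a minimal sketch of what normalization and tokenization typically involve for Kurdish text in Arabic script. It is not Aamraz's implementation: the helper names `simple_normalize` and `simple_tokenize` are hypothetical, and the rules chosen (mapping Arabic Yeh and Kaf to their Kurdish forms, dropping zero-width non-joiners, collapsing whitespace, splitting on word characters) are assumptions made for illustration only.

```python
import re

# Illustrative character mappings (assumed, not taken from Aamraz):
# Arabic Yeh (U+064A) -> Farsi/Kurdish Yeh (U+06CC),
# Arabic Kaf (U+0643) -> Keheh (U+06A9).
CHAR_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})
ZWNJ = "\u200C"  # zero-width non-joiner


def simple_normalize(text):
    """Unify a few character variants, drop ZWNJ, and collapse whitespace."""
    text = text.translate(CHAR_MAP)
    text = text.replace(ZWNJ, "")
    return re.sub(r"\s+", " ", text).strip()


def simple_tokenize(text):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


sample = "زوانی   له دربره."
print(simple_tokenize(simple_normalize(sample)))
```

A real normalizer usually covers many more cases (digits, diacritics, punctuation variants), so treat this only as a picture of the kind of work these two steps do.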

Tools

Installation

pip install aamraz
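
After installing, it can be useful to confirm which version is present. The sketch below uses only the Python standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of Aamraz.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if aamraz is installed, otherwise None.
print(installed_version("aamraz"))
```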

Pretrained Models

Some useful pretrained models:

Model | Description | Size
FastText WordEmbedding | Model trained using the FastText method on our own corpus (FastText skip-gram model). | ~2.3 GB
FastText WordEmbedding - Lite | Model trained using the FastText method on our own corpus (FastText skip-gram model). | ~800 MB

Usage

import aamraz

# Normalization
normalizer = aamraz.Normalizer()
sample_sentence="قڵبە‌کە‌م‌ بە‌  کوردی‌  قسە‌ دە‌کات‌."
normalized_sentence = normalizer.normalize(sample_sentence)
print(normalized_sentence)

# Tokenization
tokenizer = aamraz.WordTokenizer()
sample_sentence="زوانی له دربره"
tokens = tokenizer.tokenize(sample_sentence)
print(tokens)

# Embedding
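# Path to a downloaded pretrained model file (see the models listed above)
model_path = 'kurdish_fasttext_skipgram_dim300_v1.bin'
# Note: the file name indicates 300-dimensional vectors, while dim=50 is requested here.
embedding_model = aamraz.EmbeddingModel(model_path, dim=50)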

sample_word="ئامراز"
sample_sentence="زوانی له دربره"

word_vector = embedding_model.word_embedding(sample_word)
sentence_vector = embedding_model.sentence_embedding(sample_sentence)

print(word_vector)
print(sentence_vector)
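
Once vectors are available, a common next step is to compare them. The following is a minimal sketch, assuming that `word_embedding` and `sentence_embedding` return one-dimensional sequences of numbers (the source does not state the return type). It uses short stand-in lists so it runs without a model file. The helpers `cosine_similarity` and `mean_vector` are hypothetical and not part of Aamraz, and averaging word vectors is only one common way to build a sentence vector, not necessarily the one Aamraz uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Stand-in vectors; in practice these would come from the embedding model.
vec_a = [0.2, 0.1, 0.7]
vec_b = [0.1, 0.3, 0.6]

print(cosine_similarity(vec_a, vec_b))
print(mean_vector([vec_a, vec_b]))
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is usually read as semantic similarity.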

License

This project is licensed under the MIT License. You are free to use, distribute, modify, and build upon this work, even for commercial purposes, as long as you include a copy of the original MIT License and provide proper attribution.

To view a copy of this license, visit: https://opensource.org/licenses/MIT
