Adawat: Arabic Language Toolkit

Arabic Language Tools Library


Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com

| Features | Value |
|----------|-------|
| Authors  | Authors.md |
| Release  | 0.1 |
| License  | GPL |
| Tracker  | linuxscout/adawat/Issues |
| Source   | Github |
| Feedback | Comments |
| Accounts | [@Twitter](https://twitter.com/linuxscout) |

Description

Adawat: Arabic Language Toolkit

Features

This library gathers the tools commonly used in Arabic text processing:

  • Tashkeel (vocalization)
    • tashkeel : vocalize a text; we recommend using the mishkal library or the mishkal-console program instead.
    • tashkeel with suggestions : vocalize a text and suggest alternative vocalizations for every word.
    • reduce : strip unnecessary tashkeel from a vocalized text.
    • strip : remove all harakat and shadda.
    • compare : compare the tashkeel of a manually vocalized input text with the automatically vocalized text.
  • Transformation and conversion
    • romanize : convert Arabic-script text to a Latin representation.
    • arabize : convert transliterated (Latin-script) Arabic text to Arabic script.
    • inverse : reverse a text.
    • numbers to words : convert a numeric value to words.
    • normalize : normalize letters in Arabic text (unify hamza and alef forms).
    • unshape : unshape Arabic letters.
  • Analysis and generation
    • stem : morphological analysis of a given text.
    • tokenize : split a text into words and punctuation marks.
    • wordtag : classify words as nouns, verbs, or particles (stopwords).
    • affixate : generate all forms of a word by affixation.
  • Extraction
    • collocation : extract collocations from a text.
    • language : detect Arabic and Latin clauses in a text.
    • named : extract named entities from a text.
    • numbered : extract numeric phrases from a text.
  • Miscellaneous
    • poetry : lay out a classical (columnar) Arabic poem in columns.
    • random : get a random text.
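
A few of the transformation and analysis entries listed above can be approximated with the Python standard library alone. The sketch below is only an illustration of what normalize, unshape, and tokenize mean; it is not adawat's implementation, and the names `normalize_letters`, `unshape`, `tokenize`, `ALEF_FORMS`, and `TOKEN_RE` are hypothetical. The exact set of letters adawat unifies is an assumption here, and `NFKC` applies other compatibility mappings besides the Arabic presentation forms.

```python
import re
import unicodedata

# Assumed mapping (not taken from adawat): fold alef variants to bare alef.
ALEF_FORMS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
})

# A word is a run of word characters and Arabic diacritics (U+064B-U+0652);
# any other non-space character is emitted as a one-character token.
TOKEN_RE = re.compile(r"[\w\u064B-\u0652]+|[^\w\s]")


def normalize_letters(text):
    """Unify alef variants so that spelling variants compare equal."""
    return text.translate(ALEF_FORMS)


def unshape(text):
    """Map Arabic presentation forms (positional glyphs, ligatures)
    back to base letters using Unicode compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


def tokenize(text):
    """Split a text into word tokens and single punctuation marks."""
    return TOKEN_RE.findall(text)
```

Keeping each step as a separate pure function makes it easy to chain them, for example `tokenize(normalize_letters(unshape(text)))`.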

Citation

@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}

Usage

Install

pip install adawat

Import

>>> import adawat.adaat

Examples

Detailed examples for each feature are listed in Features.

Tashkeel

  • tashkeel : vocalize a text; we recommend using mishkal-console instead.
  • tashkeel with suggestions for every word.
  • reduce : strip unnecessary tashkeel from a vocalized text.
  • strip : remove all harakat and shadda.
  • compare : compare the tashkeel of the input text with the automatically vocalized text.

>>> lastmark = True
>>> text = u"تطلع الشمس صباحا"
>>> adawat.adaat.tashkeel_text(text, lastmark)
' تَطْلُعُ الشَّمْسُ صَبَاحًا'
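
The strip operation is the inverse direction of the call above: it turns a vocalized string back into bare letters. As a rough sketch of the idea, using only the standard library and not adawat's own code, the marks can be removed with a character range. The names `TASHKEEL_RE` and `strip_tashkeel` are hypothetical.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun.
TASHKEEL_RE = re.compile(r"[\u064B-\u0652]")


def strip_tashkeel(text):
    """Remove harakat, tanwin, shadda, and sukun; keep everything else."""
    return TASHKEEL_RE.sub("", text)
```

Applied to the vocalized sentence shown above, this would give back the unvocalized input letters, with spacing left untouched.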

Requirements

asmai>=0.1
mishkal>=0.3
naftawayh>=0.4
pyarabic>=0.6.8
qalsadi>=0.3.6
repr>=0.3.1
spellcheck>=1.0.2
sylajone>=0.2
tashaphyne>=0.3.4.1


Download files

Source Distribution

  • adawat-0.1.tar.gz (21.8 kB)

Built Distributions

  • adawat-0.1-py3-none-any.whl (28.0 kB, Python 3)
  • adawat-0.1-py2-none-any.whl (25.0 kB, Python 2)

File details

Details for the file adawat-0.1.tar.gz.

File metadata

  • Download URL: adawat-0.1.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.19.9 CPython/2.7.12

File hashes

Hashes for adawat-0.1.tar.gz
Algorithm Hash digest
SHA256 1740a0aa34a9b56ef385b846a2e79db4aa17a73d90cb9e0250f0fafeb467eb35
MD5 432bf2d4bebb22470fb719e8e979199e
BLAKE2b-256 35650c8252cc3da395fdf271933fd7fc8c41d8b1d008978c35330a50596cd4cf
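
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of` and `matches_published` are hypothetical and are not part of adawat or pip.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a local file's digest with a published one, ignoring case
    and surrounding whitespace in the published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published("adawat-0.1.tar.gz", "1740a0aa34a9b56ef385b846a2e79db4aa17a73d90cb9e0250f0fafeb467eb35")`, assuming the archive sits in the current directory.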


File details

Details for the file adawat-0.1-py3-none-any.whl.

File metadata

  • Download URL: adawat-0.1-py3-none-any.whl
  • Upload date:
  • Size: 28.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.19.9 CPython/2.7.12

File hashes

Hashes for adawat-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cd1f6574357e9cf3df359b7891052a4542cc459a2608f2ff7d79214e68417a5b
MD5 053aa55e4a15552e4520c2dbdc860cdf
BLAKE2b-256 2ba9c5ec2da0424cde5cbd79a77540df87dc598b523484d42c76dc76d2cf134f


File details

Details for the file adawat-0.1-py2-none-any.whl.

File metadata

  • Download URL: adawat-0.1-py2-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.19.9 CPython/2.7.12

File hashes

Hashes for adawat-0.1-py2-none-any.whl
Algorithm Hash digest
SHA256 3f47ff869db87ae1ab41202057ba5568d23ec7c619e29f763262df42422229bd
MD5 7cde2cc021a204a2bf79746a0a0c1a5b
BLAKE2b-256 aae9753bad792624601e11bd28758148e0e215bc89d9369433806c75ed773ece

