Adawat: Arabic Language Toolkit
مكتبة أدوات اللغة العربية (Arabic language tools library)
Developer: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com
Features | value |
---|---|
Authors | |
Release | 0.1 |
License | |
Tracker | |
Source | |
Feedbacks | |
Accounts | [@Twitter](https://twitter.com/linuxscout) |
Description
Adawat brings together the tools commonly used for processing Arabic text: vocalization (tashkeel), transformation and conversion, analysis and generation, extraction, and miscellaneous utilities.
Features
- Tashkeel
  - tashkeel: vocalize text; we recommend using mishkal-console instead.
  - tashkeel with suggestions for every word.
  - reduce: strip unnecessary tashkeel from a vocalized text.
  - strip: remove all harakat and shadda.
  - compare: compare the tashkeel of the input text with the automatically vocalized text.
- Transformation and conversion
  - romanize: convert Arabic-script text to a Latin representation.
  - arabize: convert transliterated (Latin-script) Arabic text back to Arabic script.
  - inverse: reverse a text.
  - numbers to words: convert a numeric value to words.
  - normalize: normalize letters in Arabic text (unify hamza and alef forms).
  - unshape: unshape Arabic letters.
- Analysis and generation
  - stem: morphological analysis of the given text.
  - tokenize: split a text into words and punctuation marks.
  - wordtag: classify words as nouns, verbs, or stopwords.
  - affixate: generate all word forms by affixation.
- Extraction
  - collocation: extract collocations from text.
  - language: detect Arabic and Latin clauses in text.
  - named: extract named entities from text.
  - numbered: extract numeric phrases from text.
- Miscellaneous
  - poetry: format poetry text into columns.
  - random: get a random text.
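To make a few of the text-cleaning features above concrete, here is a minimal standard-library sketch of what `strip`, `normalize`, and `unshape` could look like. It is not Adawat's implementation and does not call its API: the function names `strip_tashkeel`, `normalize_letters`, and `unshape` are hypothetical, and the set of letters unified by `normalize_letters` is an assumption.

```python
import unicodedata

# Arabic diacritics from fathatan (U+064B) to sukun (U+0652); shadda is U+0651.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}

# Assumed normalization table: map alef variants to bare alef (U+0627).
ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
})


def strip_tashkeel(text):
    """Remove harakat, tanwin, shadda and sukun from the text."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


def normalize_letters(text):
    """Unify the alef variants listed in ALEF_VARIANTS."""
    return text.translate(ALEF_VARIANTS)


def unshape(text):
    """Map Arabic presentation forms back to their base letters.

    Only characters in the presentation-form ranges are passed through
    NFKC, so the rest of the text is left untouched.
    """
    out = []
    for ch in text:
        if "\ufb50" <= ch <= "\ufdff" or "\ufe70" <= ch <= "\ufefc":
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(strip_tashkeel("كَتَبَ"))
    print(normalize_letters("أحمد"))
    # U+FEFB is the isolated lam-alef ligature; it expands to lam + alef.
    print(unshape("\ufefb") == "\u0644\u0627")
```

Restricting NFKC to the presentation-form ranges is a deliberate choice in this sketch: applying it to the whole string would also rewrite unrelated compatibility characters such as Latin ligatures.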
Citation
@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}
Usage
install
pip install adawat
import
>>> import adawat.adaat
Examples
Detailed examples and features in Features
Tashkeel
>>> lastmark = True
>>> text = u"تطلع الشمس صباحا"
>>> adawat.adaat.tashkeel_text(text, lastmark)
' تَطْلُعُ الشَّمْسُ صَبَاحًا'
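One way to sanity-check a vocalized result like the one above is to confirm that removing the diacritics gives back the input letters. The sketch below uses only the standard library. `strip_tashkeel` and `same_letters` are hypothetical helpers, not part of Adawat, and the vocalized string is copied from the example output rather than produced by running the library.

```python
def strip_tashkeel(text):
    """Drop Arabic diacritics in the range U+064B (fathatan) to U+0652 (sukun)."""
    return "".join(ch for ch in text if not "\u064b" <= ch <= "\u0652")


def same_letters(plain, vocalized):
    """Return True if the two texts differ only by diacritics and whitespace."""
    # split() ignores leading, trailing and repeated spaces, such as the
    # leading space in the example output.
    return strip_tashkeel(plain).split() == strip_tashkeel(vocalized).split()


plain = "تطلع الشمس صباحا"
vocalized = " تَطْلُعُ الشَّمْسُ صَبَاحًا"
print(same_letters(plain, vocalized))
```

This only checks that the letters are preserved; it says nothing about whether the chosen diacritics are linguistically correct, which is what the `compare` feature addresses.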
[requirement]
asmai>=0.1
mishkal>=0.3
naftawayh>=0.4
pyarabic>=0.6.8
qalsadi>=0.3.6
repr>=0.3.1
spellcheck>=1.0.2
sylajone>=0.2
tashaphyne>=0.3.4.1