Qalsadi Arabic Morphological Analyzer and lemmatizer for Python
Developers: Taha Zerrouki (http://tahadz.com), taha dot zerrouki at gmail dot com
Features | Value |
---|---|
Authors | |
Release | 0.3.6 |
License | |
Tracker | |
Website | |
Doc | |
Source | |
Download | |
Feedbacks | |
Accounts | [@Twitter](https://twitter.com/linuxscout) [@Sourceforge](http://sourceforge.net/projects/qalsadi/) |
Citation
If you cite Qalsadi in academic work, please use this citation:
T. Zerrouki, Qalsadi, Arabic morphological analyzer library for Python, https://pypi.python.org/pypi/qalsadi/
Features
Arabic word light stemming
Lemmatization
Vocalized text analysis
Verb analysis using the Qutrub library
Word frequency in modern Arabic usage
Requirements:
libQutrub: Qutrub verb conjugation library: http://pypi.pyton/LibQutrub
PyArabic: Arabic language tools library: http://pypi.pyton/pyarabic
Tashaphyne: Arabic light stemmer library: http://pypi.python.org/pypi/Tashaphyne/
Applications
Stemming texts
Text Classification and categorization
Sentiment Analysis
Named Entities Recognition
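For text classification, one common way to use lemmas is as a bag-of-lemmas feature. The sketch below is illustrative only and does not call Qalsadi: it assumes you already have a list of lemmas (the sample is a short excerpt of lemmas taken from the usage example in this README), and `bag_of_lemmas` is a hypothetical helper name.

```python
from collections import Counter

# Short excerpt of lemmas, copied from the usage example in this README.
lemmas = ['لغة', 'كلاسيكي', 'فصحى', 'موجود', 'في', 'كل', 'لغة', 'ذلك', 'لغة']


def bag_of_lemmas(lemma_list):
    """Count how often each lemma occurs (hypothetical helper, not part of Qalsadi)."""
    return Counter(lemma_list)


counts = bag_of_lemmas(lemmas)
# The most frequent lemma and its count.
print(counts.most_common(1))
```

Counting lemmas instead of surface forms groups inflected variants of a word under a single feature, which usually keeps the feature space smaller.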
Installation
pip install qalsadi
Requirements
pip install -r requirements.txt
libQutrub: Qutrub verb conjugation library: http://pypi.pyton/LibQutrub
PyArabic: Arabic language tools library: http://pypi.pyton/pyarabic
Tashaphyne: Arabic light stemmer library: http://pypi.python.org/pypi/Tashaphyne/
Naftawayh: Arabic word tagger: http://pypi.python.org/pypi/Naftawayh/
Arramooz-pysqlite: Arabic dictionary
CodernityDB: NoSQL database written in pure Python
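To check quickly whether the dependencies are visible to your interpreter, something like the following sketch may help. The import names in `ASSUMED_MODULES` are guesses derived from the package names listed above and may differ from the real module names; `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumption: these import names mirror the package names listed above.
# Adjust them if the installed modules are named differently.
ASSUMED_MODULES = ["pyarabic", "libqutrub", "tashaphyne", "naftawayh"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_modules(ASSUMED_MODULES))
```

An empty list means every assumed module was found; anything listed can be installed with pip.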
Usage
Example
>>> import qalsadi.lemmatizer
>>> text = u"""هل تحتاج إلى ترجمة كي تفهم خطاب الملك؟ اللغة "الكلاسيكية" (الفصحى) موجودة في كل اللغات وكذلك اللغة "الدارجة" .. الفرنسية التي ندرس في المدرسة ليست الفرنسية التي يستخدمها الناس في شوارع باريس .. وملكة بريطانيا لا تخطب بلغة شوارع لندن .. لكل مقام مقال"""
>>> lemmer = qalsadi.lemmatizer.Lemmatizer()
>>> # lemmatize a word
... lemmer.lemmatize("يحتاج")
'احتاج'
>>> # lemmatize a word with a specific pos
>>> lemmer.lemmatize("وفي")
'في'
>>> lemmer.lemmatize("وفي", pos="v")
'وفى'
>>>
>>> lemmas = lemmer.lemmatize_text(text)
>>> print(lemmas)
['هل', 'احتاج', 'إلى', 'ترجمة', 'كي', 'تفهم', 'خطاب', 'ملك', '؟', 'لغة', '"', 'كلاسيكي', '"(', 'فصحى', ')', 'موجود', 'في', 'كل', 'لغة', 'ذلك', 'لغة', '"', 'دارج', '"..', 'فرنسي', 'التي', 'درس', 'في', 'مدرسة', 'ليست', 'فرنسي', 'التي', 'استخدم', 'ناس', 'في', 'شوارع', 'باريس', '..', 'ملك', 'بريطانيا', 'لا', 'خطب', 'بلغة', 'شوارع', 'دنو', '..', 'كل', 'مقام', 'مقالي']
>>> # lemmatize a text and return lemma pos
... lemmas = lemmer.lemmatize_text(text, return_pos=True)
>>> print(lemmas)
[('هل', 'stopword'), ('احتاج', 'verb'), ('إلى', 'stopword'), ('ترجمة', 'noun'), ('كي', 'stopword'), ('تفهم', 'noun'), ('خطاب', 'noun'), ('ملك', 'noun'), '؟', ('لغة', 'noun'), '"', ('كلاسيكي', 'noun'), '"(', ('فصحى', 'noun'), ')', ('موجود', 'noun'), ('في', 'stopword'), ('كل', 'stopword'), ('لغة', 'noun'), ('ذلك', 'stopword'), ('لغة', 'noun'), '"', ('دارج', 'noun'), '"..', ('فرنسي', 'noun'), ('التي', 'stopword'), ('درس', 'verb'), ('في', 'stopword'), ('مدرسة', 'noun'), ('ليست', 'stopword'), ('فرنسي', 'noun'), ('التي', 'stopword'), ('استخدم', 'verb'), ('ناس', 'noun'), ('في', 'stopword'), ('شوارع', 'noun'), ('باريس', 'all'), '..', ('ملك', 'noun'), ('بريطانيا', 'noun'), ('لا', 'stopword'), ('خطب', 'verb'), ('بلغة', 'noun'), ('شوارع', 'noun'), ('دنو', 'verb'), '..', ('كل', 'stopword'), ('مقام', 'noun'), ('مقالي', 'noun')]
>>>
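As the output above shows, `lemmatize_text(text, return_pos=True)` returns a mixed list: tagged tokens are `(lemma, pos)` tuples, while punctuation comes back as bare strings. A minimal sketch for keeping only content words follows. It works on a shortened copy of that output so it runs without Qalsadi, and `content_lemmas` is a hypothetical helper, not part of the library.

```python
# Shortened copy of the return_pos=True output shown above.
tagged = [
    ('هل', 'stopword'), ('احتاج', 'verb'), ('إلى', 'stopword'),
    ('ترجمة', 'noun'), ('كي', 'stopword'), ('تفهم', 'noun'),
    ('خطاب', 'noun'), ('ملك', 'noun'), '؟',
]


def content_lemmas(tagged_lemmas, keep=("noun", "verb")):
    """Keep lemmas whose POS tag is in `keep`; skip untagged items such as punctuation."""
    kept = []
    for item in tagged_lemmas:
        # Tagged tokens are (lemma, pos) tuples; punctuation is a plain string.
        if isinstance(item, tuple) and len(item) == 2 and item[1] in keep:
            kept.append(item[0])
    return kept


print(content_lemmas(tagged))
```

Passing `keep=("verb",)` would return only the verb lemmas.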
import qalsadi.analex as qa

filename = "samples/text.txt"
try:
    # Read the sample file as UTF-8 text.
    with open(filename, encoding="utf8") as myfile:
        text = myfile.read()
    if not text:
        text = u"السلام عليكم"
except (OSError, UnicodeDecodeError):
    # Fall back to a single word if the file is missing or unreadable.
    text = u"أسلم"
print(" given text")
debug = False
limit = 500
analyzer = qa.Analex()
analyzer.set_debug(debug)
result = analyzer.check_text(text)
print('----------------python format result-------')
print(result)
# result holds one list of candidate analyses per word.
for i in range(len(result)):
    print("-------------One word detailed case------")
    for analyzed in result[i]:
        print("-------------one case for word------")
        print(repr(analyzed))
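The loop above treats the result of `check_text` as a list with one inner list of candidate analyses per word. Under that assumption only, a small sketch for spotting ambiguous words could look like this. The placeholder strings stand in for the real analysis objects, whose structure is not shown here, and both helper names are hypothetical.

```python
def cases_per_word(result):
    """Return, for each word position, the number of candidate analyses."""
    return [len(cases) for cases in result]


def ambiguous_positions(result):
    """Return the positions of words that have more than one candidate analysis."""
    return [i for i, cases in enumerate(result) if len(cases) > 1]


# Placeholder data shaped like the nested result; not real analyzer output.
fake_result = [["case-a", "case-b"], ["case-c"], []]

print(cases_per_word(fake_result))
print(ambiguous_positions(fake_result))
```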