Arabic language processing toolkit
Arabic Natural Language Toolkit (ANLTK)
ANLTK is a set of Arabic natural language processing tools, developed with a focus on performance.
ANLTK is a C++ library with Python bindings.
Installation
For Python:
pip install pybind11
pip install anltk
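After installing, a quick check that Python can locate the package may save some debugging later. The sketch below uses only the standard library; the helper name is_importable is made up for this illustration and is not part of ANLTK.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    # find_spec returns None when no such top-level module is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    status = "found" if is_importable("anltk") else "not found"
    print(f"anltk: {status}")
```

If the package is reported as not found, the pip commands above were probably run for a different Python interpreter than the one running the check.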
Building
Note: Currently tested only on Linux.
The library depends on https://github.com/nemtrif/utfcpp.git, which is cloned automatically.
You also need a modern C++ compiler that supports C++17.
Meson and Ninja also need to be installed. The simplest way is with pip:
pip install meson
pip install ninja
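Before starting the build, it can help to confirm that the required command-line tools are reachable on PATH. This is a minimal sketch using the standard library only; missing_tools is a hypothetical helper, and the check does not verify the C++17 compiler or any tool versions.

```python
import shutil


def missing_tools(names):
    """Return the command names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # git is needed for the clone step; meson and ninja drive the build.
    missing = missing_tools(["git", "meson", "ninja"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found.")
```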
git clone --recurse-submodules https://github.com/Abdullah-AlAttar/anltk.git \
&& cd anltk/anltk \
&& meson build --buildtype=release -Dbuild_tests=false \
&& cd build \
&& ninja \
&& cd ../../ \
&& python3 setup.py install
Usage Examples:
C++ API:
#include "anltk/anltk.hpp"
#include <iostream>
#include <string>
int main()
{
    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";
    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";
    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second argument is a stop list: characters in this list won't be removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان
}
Python API:
import anltk
ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg
print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))
# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.
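To make the two operations above more concrete, here is a pure-Python sketch of what Buckwalter transliteration and tashkeel removal amount to at the code point level. It is not ANLTK's implementation: the names to_buckwalter and strip_tashkeel are invented for this illustration, the table is the commonly used Buckwalter mapping, and the diacritic range U+064B to U+0652 is an assumption about which marks count as tashkeel (the library may handle more).

```python
# Buckwalter symbols listed in Unicode code point order for two contiguous
# Arabic ranges: U+0621..U+063A (letters) and U+0640..U+0652 (tatweel,
# remaining letters, then the eight diacritics).
_RANGES = (
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),
    (0x0640, "_fqklmnhwYyFNKaui~o"),
)

AR2BW = {}
for start, symbols in _RANGES:
    for offset, symbol in enumerate(symbols):
        AR2BW[start + offset] = symbol
AR2BW[0x0670] = "`"  # superscript alef
AR2BW[0x0671] = "{"  # alef wasla

# Assumed tashkeel set: fathatan (U+064B) through sukun (U+0652).
# Mapping a code point to None makes str.translate delete it.
_TASHKEEL = {code_point: None for code_point in range(0x064B, 0x0653)}


def to_buckwalter(text: str) -> str:
    """Replace each mapped Arabic character; leave everything else as-is."""
    return text.translate(AR2BW)


def strip_tashkeel(text: str) -> str:
    """Delete the diacritics in the assumed tashkeel range."""
    return text.translate(_TASHKEEL)


if __name__ == "__main__":
    print(to_buckwalter("أبجد هوز"))
    # >bjd hwz
    print(strip_tashkeel("فَرَاشَةٌ"))
    # فراشة
```

Both helpers rely on str.translate with a table keyed by code point, so characters outside the table (spaces, Latin text, punctuation) pass through unchanged.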
For a list of features, see Features.md.