Arabic language processing toolkit
Arabic Natural Language Toolkit (ANLTK)
ANLTK is a set of Arabic natural language processing tools, developed with a focus on simplicity and performance.
ANLTK is a C++ library with Python bindings.
Installation
For Python:
pip install anltk
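To confirm that the package is visible to the interpreter you intend to use, a quick check along the following lines can help. This is only a sketch: `is_installed` is a hypothetical helper name, not part of ANLTK, and it relies solely on the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True after a successful `pip install anltk` in this environment.
    print("anltk installed:", is_installed("anltk"))
```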
Building
Note: Building from source is currently only tested on Linux. Prebuilt Python wheels are available on PyPI for Linux, Windows, and macOS.
Dependencies:
- utfcpp, automatically downloaded.
- utf8proc, automatically downloaded.
- C++ compiler that supports C++17.
- Python 3, meson, ninja
pip install meson
pip install ninja
git clone https://github.com/Abdullah-AlAttar/anltk.git \
&& cd anltk/ \
&& meson build --buildtype=release -Dbuild_tests=false \
&& cd build \
&& ninja \
&& cd ../ \
&& pip install -e .
Usage Examples:
C++ API:
#include "anltk/anltk.hpp"
#include <iostream>
#include <string>
int main()
{
std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";
std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
// >bjd hwz HTy klmn sEfS qr$t vx* DZg
std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";
std::cout << anltk::remove_tashkeel(text) << '\n';
// فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.
// The second argument is a stop list; characters in this list won't be removed
std::cout << anltk::remove_non_alpha(text, " ") << '\n';
// فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان
anltk::TafqitOptions opts;
std::cout << anltk::tafqit(15000120, opts) << '\n';
// خمسة عشر مليونًا ومائة وعشرون
}
Python API
import anltk
ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg
print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))
# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.
print(anltk.tafqit(15000120))
# خمسة عشر مليونًا ومائة وعشرون
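For readers who want to see what two of these operations amount to conceptually, here is a minimal pure-Python sketch. It is not ANLTK's implementation, which is written in C++. The names `strip_tashkeel`, `to_buckwalter`, and `AR2BW_SKETCH` are hypothetical, the diacritic range is assumed to be U+064B through U+0652, and the mapping covers only the 28 letters that appear in the example string above, following the standard Buckwalter scheme.

```python
import re

# Assumption: "tashkeel" here means the combining marks U+064B..U+0652
# (fathatan through sukun). The library may handle a wider set.
_TASHKEEL = re.compile("[\u064B-\u0652]")

# Partial Arabic-to-Buckwalter table, limited to the letters in the example.
AR2BW_SKETCH = {
    "\u0623": ">",  # alef with hamza above
    "ب": "b", "ج": "j", "د": "d",
    "ه": "h", "و": "w", "ز": "z",
    "ح": "H", "ط": "T", "ي": "y",
    "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "س": "s", "ع": "E", "ف": "f", "ص": "S",
    "ق": "q", "ر": "r", "ش": "$", "ت": "t",
    "ث": "v", "خ": "x", "ذ": "*",
    "ض": "D", "ظ": "Z", "غ": "g",
}
_BW_TABLE = str.maketrans(AR2BW_SKETCH)


def strip_tashkeel(text):
    """Remove the diacritics in the assumed range; everything else is kept."""
    return _TASHKEEL.sub("", text)


def to_buckwalter(text):
    """Map known letters to Buckwalter; unmapped characters pass through unchanged."""
    return text.translate(_BW_TABLE)


if __name__ == "__main__":
    print(strip_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ"))
    print(to_buckwalter("أبجد هوز"))
```

A regular expression and `str.translate` keep the sketch short. A complete transliterator would also need the remaining letters, hamza forms, and diacritics.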
For a list of features, see Features.md
Benchmarks
Processing a file containing 500,000 lines, 6,787,731 words, and 112,704,541 characters. The task is to remove diacritics or transliterate to Buckwalter.
Buckwalter transliteration
Method | Time |
---|---|
anltk python-api | 1.379 seconds |
python camel_tools | 11.46 seconds |
Remove Diacritics
Method | Time |
---|---|
anltk python-api | 0.989 seconds |
python camel_tools | 4.892 seconds |
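The README does not say how these timings were measured. One way to run a comparable measurement yourself is a small harness like the sketch below. `time_per_line` is a hypothetical helper, and `str.upper` is only a stand-in workload; to benchmark ANLTK you would pass a function such as `anltk.remove_tashkeel` and the lines of your own corpus.

```python
import time


def time_per_line(func, lines, repeats=3):
    """Apply func to every line and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            func(line)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in data and workload; replace with real corpus lines and the
    # function under test.
    sample_lines = ["sample line of text"] * 10000
    elapsed = time_per_line(str.upper, sample_lines)
    print(f"{elapsed:.3f} seconds")
```

Taking the best of several runs reduces noise from other processes. Absolute numbers will vary with hardware and corpus.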