Arabic language processing toolkit
Arabic Natural Language Toolkit (ANLTK)
ANLTK is a set of Arabic natural language processing tools, developed with a focus on simplicity and performance.
ANLTK is a C++ library with Python bindings.
Installation
For Python:
pip install anltk
Building
Note: Building from source is currently tested only on Linux. Prebuilt Python wheels are available on PyPI for Linux, Windows, and macOS.
Dependencies:
- utfcpp, automatically downloaded.
- utf8proc, automatically downloaded.
- A C++ compiler that supports C++17.
- Python 3, Meson, Ninja
pip install meson
pip install ninja
git clone https://github.com/Abdullah-AlAttar/anltk.git \
&& cd anltk/ \
&& meson build --buildtype=release -Dbuild_tests=false \
&& cd build \
&& ninja \
&& cd ../ \
&& pip install -e .
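Before running the build commands above, it can help to confirm that the listed tools are on `PATH`. The following is a minimal sketch, not part of ANLTK: the function name `missing_build_tools` and the list of compiler executables are assumptions made for illustration, and the check does not verify that the compiler actually supports C++17.

```python
import shutil

# Tools named in the dependency list above.
REQUIRED_TOOLS = ("meson", "ninja")
# Assumption: common compiler driver names; adjust for your platform.
CXX_CANDIDATES = ("c++", "g++", "clang++")


def missing_build_tools(which=shutil.which):
    """Return the names of build prerequisites that are not found on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not any(which(cxx) is not None for cxx in CXX_CANDIDATES):
        missing.append("C++ compiler")
    return missing


if __name__ == "__main__":
    problems = missing_build_tools()
    print("ready to build" if not problems else f"missing: {', '.join(problems)}")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.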
Usage Examples:
C++ API:
#include "anltk/anltk.hpp"
#include <iostream>
#include <string>
int main()
{
    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";
    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";
    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second argument is a stop_list; characters in this list won't be removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان

    anltk::TafqitOptions opts;
    std::cout << anltk::tafqit(15000120, opts) << '\n';
    // خمسة عشر مليونًا ومائة وعشرون
}
Python API
import anltk
ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg
print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))
# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.
print(anltk.tafqit(15000120))
# خمسة عشر مليونًا ومائة وعشرون
For a list of features, see Features.md.
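To illustrate what the `AR2BW` transliteration in the examples above produces, here is a small pure-Python sketch of the standard Buckwalter scheme. It is not ANLTK's implementation: the names `AR2BW_SKETCH` and `transliterate_sketch` are hypothetical, and the table covers only the basic letters, tatweel, and the eight common diacritics.

```python
# Buckwalter symbols for consecutive code points of the Unicode Arabic block.
# This follows the standard Buckwalter scheme; it is not taken from ANLTK's source.
_RANGES = (
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza .. ghain
    (0x0640, "_"),                            # tatweel
    (0x0641, "fqklmnhwYy"),                   # feh .. yeh
    (0x064B, "FNKaui~o"),                     # fathatan .. sukun
)

# Map each code point to its one-character Buckwalter symbol.
AR2BW_SKETCH = {
    first + offset: symbol
    for first, symbols in _RANGES
    for offset, symbol in enumerate(symbols)
}


def transliterate_sketch(text: str) -> str:
    """Map Arabic characters to Buckwalter; leave everything else unchanged."""
    return text.translate(AR2BW_SKETCH)


print(transliterate_sketch("أبجد هوز"))  # >bjd hwz
```

Because every mapping is one character to one character, `str.translate` handles the whole string in a single pass. A complete implementation would also need the less common letters and the reverse direction.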
Benchmarks
The benchmark processes a file containing 500,000 lines, 6,787,731 words, and 112,704,541 characters. The task is to remove diacritics or to transliterate to Buckwalter.
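The README does not include the benchmark script, so the following is only a sketch of how such a measurement could be set up. It assumes that "diacritics" means the eight tashkeel marks U+064B to U+0652 and that characters are counted as Unicode code points; the helper names `strip_tashkeel_baseline`, `corpus_stats`, and `time_per_line` are hypothetical.

```python
import time

# Assumption: "diacritics" means the eight tashkeel marks U+064B..U+0652.
# ANLTK and camel_tools may strip a different set of characters.
_TASHKEEL = dict.fromkeys(range(0x064B, 0x0653))


def strip_tashkeel_baseline(line: str) -> str:
    """Pure-Python stand-in for a diacritic remover."""
    return line.translate(_TASHKEEL)


def corpus_stats(lines):
    """Return (line count, word count, character count) for a list of lines."""
    words = sum(len(line.split()) for line in lines)
    chars = sum(len(line) for line in lines)
    return len(lines), words, chars


def time_per_line(func, lines):
    """Apply func to every line and return (elapsed seconds, outputs)."""
    start = time.perf_counter()
    outputs = [func(line) for line in lines]
    return time.perf_counter() - start, outputs


lines = ["فَرَاشَةٌ مُلَوَّنَةٌ"] * 1000
elapsed, cleaned = time_per_line(strip_tashkeel_baseline, lines)
print(corpus_stats(lines), f"{elapsed:.4f} s")
```

With the package installed, `anltk.remove_tashkeel` from the Python example can be passed to `time_per_line` in place of the baseline function.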
Buckwalter transliteration
Method | Time |
---|---|
anltk python-api | 1.379 seconds |
python camel_tools | 11.46 seconds |
Remove Diacritics
Method | Time |
---|---|
anltk python-api | 0.989 seconds |
python camel_tools | 4.892 seconds |
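The timings in the two tables can be turned into relative speedups and throughput with a few lines of arithmetic. This sketch only restates the published figures; the dictionary layout and the function names are illustrative.

```python
# Timings copied from the two benchmark tables, in seconds.
TIMINGS = {
    "Buckwalter transliteration": {"anltk": 1.379, "camel_tools": 11.46},
    "Remove diacritics": {"anltk": 0.989, "camel_tools": 4.892},
}
TOTAL_CHARACTERS = 112_704_541  # corpus size stated in the benchmark description


def speedup(task: str) -> float:
    """How many times faster anltk ran than camel_tools on this task."""
    timing = TIMINGS[task]
    return timing["camel_tools"] / timing["anltk"]


def throughput(task: str, tool: str) -> float:
    """Characters processed per second for the given task and tool."""
    return TOTAL_CHARACTERS / TIMINGS[task][tool]


for task in TIMINGS:
    print(f"{task}: {speedup(task):.1f}x faster, "
          f"{throughput(task, 'anltk') / 1e6:.0f}M characters/s with anltk")
```

On these figures the anltk Python API is roughly 8.3 times faster for transliteration and roughly 4.9 times faster for diacritic removal.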