Arabic language processing toolkit

Project description

Arabic Natural Language Toolkit (ANLTK)

ANLTK is a set of Arabic natural language processing tools, developed with a focus on simplicity and performance.

ANLTK is a C++ library with Python bindings.

Installation

For Python:

pip install anltk

Building

Note: Currently tested only on Linux. Prebuilt Python wheels are available on PyPI for Linux, Windows, and macOS.

Dependencies:

  • utfcpp, downloaded automatically.
  • utf8proc, downloaded automatically.
  • A C++ compiler that supports C++17.
  • Python 3, meson, ninja.

pip install meson
pip install ninja
git clone https://github.com/Abdullah-AlAttar/anltk.git \
    && cd anltk/ \
    && meson build --buildtype=release -Dbuild_tests=false \
    && cd build \
    && ninja \
    && cd ../ \
    && pip install -e .

Usage Examples:

C++ API:

#include "anltk/anltk.hpp"
#include <iostream>
#include <string>

int main()
{

    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";

    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";

    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second argument is a stop list; characters in this list won't be removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان

    anltk::TafqitOptions opts;
    std::cout << anltk::tafqit(15000120, opts) << '\n';
    // خمسة عشر مليونًا ومائة وعشرون
}

Python API

import anltk


ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg

print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))

# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

print(anltk.tafqit(15000120))
# خمسة عشر مليونًا ومائة وعشرون

For a list of features, see Features.md.
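
To make the AR2BW output in the examples easier to read, here is a minimal pure-Python sketch of a character-by-character Buckwalter mapping. It is not how ANLTK implements transliteration; the names AR2BW_SKETCH and to_buckwalter are hypothetical, and the table covers only the bare letters (plus alef and alef with hamza above), not diacritics or other symbols.

```python
# Partial Arabic-to-Buckwalter table (illustrative only, not ANLTK's table).
AR2BW_SKETCH = {
    "ا": "A", "أ": ">", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H",
    "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s", "ش": "$",
    "ص": "S", "ض": "D", "ط": "T", "ظ": "Z", "ع": "E", "غ": "g", "ف": "f",
    "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w",
    "ي": "y",
}

# str.translate needs a table keyed by code point; str.maketrans builds it.
_TABLE = str.maketrans(AR2BW_SKETCH)


def to_buckwalter(text: str) -> str:
    """Map each known Arabic letter to its Buckwalter symbol.

    Characters missing from the table (spaces, punctuation) pass through unchanged.
    """
    return text.translate(_TABLE)


print(to_buckwalter("أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"))
# >bjd hwz HTy klmn sEfS qr$t vx* DZg
```

Because every Arabic letter maps to exactly one ASCII symbol, the mapping can be reversed by inverting the dictionary.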

Benchmarks

The benchmark processes a file containing 500,000 lines, 6,787,731 words, and 112,704,541 characters. The tasks are removing diacritics and transliterating to Buckwalter.
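
The README does not say how the timings were measured. One plausible way to time a per-line text task is sketched here; the helper name time_task, the stand-in corpus, and the stand-in function are all hypothetical. To reproduce the comparison, pass the real function (for example a wrapper around anltk.remove_tashkeel) and the lines of the real file.

```python
import time
from typing import Callable, Iterable


def time_task(func: Callable[[str], str], lines: Iterable[str]) -> float:
    """Return the wall-clock seconds spent applying func to every line."""
    start = time.perf_counter()
    for line in lines:
        func(line)
    return time.perf_counter() - start


# Stand-in corpus and stand-in task; replace both to run a real benchmark.
sample_lines = ["أبجد هوز حطي كلمن"] * 1000
elapsed = time_task(str.strip, sample_lines)
print(f"{elapsed:.3f} seconds")
```

Reading the file into memory before starting the timer keeps disk I/O out of the measurement, so the number reflects only the text processing.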

Buckwalter transliteration

Method Time
anltk python-api 1.379 seconds
python camel_tools 11.46 seconds

Remove Diacritics

Method Time
anltk python-api 0.989 seconds
python camel_tools 4.892 seconds
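
For readers unfamiliar with the diacritics task, here is a minimal sketch using only the Python standard library. It is not ANLTK's implementation, and strip_marks is a hypothetical name. It drops every Unicode nonspacing mark (category Mn), which includes the Arabic tashkeel marks but may be broader than what anltk.remove_tashkeel removes.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove all nonspacing marks (Unicode category Mn) from text.

    Arabic fatha, damma, kasra, sukun, shadda and the tanwin marks are all
    in this category, so they are dropped while base letters are kept.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


print(strip_marks("فَرَاشَةٌ مُلَوَّنَةٌ"))
```

A pure-Python loop like this checks every character through the interpreter, which is the kind of overhead a C++ implementation avoids.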

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anltk-1.0.1.tar.gz (163.9 kB): Source

Built Distributions

anltk-1.0.1-pp37-pypy37_pp73-win32.whl (136.0 kB): PyPy, Windows x86
anltk-1.0.1-pp37-pypy37_pp73-manylinux2010_x86_64.whl (247.4 kB): PyPy, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-pp36-pypy36_pp73-win32.whl (135.9 kB): PyPy, Windows x86
anltk-1.0.1-pp36-pypy36_pp73-manylinux2010_x86_64.whl (247.4 kB): PyPy, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-cp39-cp39-win_amd64.whl (154.1 kB): CPython 3.9, Windows x86-64
anltk-1.0.1-cp39-cp39-win32.whl (137.0 kB): CPython 3.9, Windows x86
anltk-1.0.1-cp39-cp39-manylinux2010_x86_64.whl (249.1 kB): CPython 3.9, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-cp39-cp39-manylinux2010_i686.whl (261.4 kB): CPython 3.9, manylinux: glibc 2.12+ i686
anltk-1.0.1-cp38-cp38-win_amd64.whl (157.0 kB): CPython 3.8, Windows x86-64
anltk-1.0.1-cp38-cp38-win32.whl (136.9 kB): CPython 3.8, Windows x86
anltk-1.0.1-cp38-cp38-manylinux2010_x86_64.whl (248.6 kB): CPython 3.8, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-cp38-cp38-manylinux2010_i686.whl (260.9 kB): CPython 3.8, manylinux: glibc 2.12+ i686
anltk-1.0.1-cp37-cp37m-win_amd64.whl (156.5 kB): CPython 3.7m, Windows x86-64
anltk-1.0.1-cp37-cp37m-win32.whl (137.7 kB): CPython 3.7m, Windows x86
anltk-1.0.1-cp37-cp37m-manylinux2010_x86_64.whl (253.4 kB): CPython 3.7m, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-cp37-cp37m-manylinux2010_i686.whl (264.5 kB): CPython 3.7m, manylinux: glibc 2.12+ i686
anltk-1.0.1-cp36-cp36m-win_amd64.whl (156.5 kB): CPython 3.6m, Windows x86-64
anltk-1.0.1-cp36-cp36m-win32.whl (137.7 kB): CPython 3.6m, Windows x86
anltk-1.0.1-cp36-cp36m-manylinux2010_x86_64.whl (253.0 kB): CPython 3.6m, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-cp36-cp36m-manylinux2010_i686.whl (264.4 kB): CPython 3.6m, manylinux: glibc 2.12+ i686
anltk-1.0.1-cp35-cp35m-win_amd64.whl (156.5 kB): CPython 3.5m, Windows x86-64
anltk-1.0.1-cp35-cp35m-win32.whl (137.7 kB): CPython 3.5m, Windows x86
anltk-1.0.1-cp35-cp35m-manylinux2010_x86_64.whl (253.0 kB): CPython 3.5m, manylinux: glibc 2.12+ x86-64
anltk-1.0.1-cp35-cp35m-manylinux2010_i686.whl (264.4 kB): CPython 3.5m, manylinux: glibc 2.12+ i686

File details

Details for the file anltk-1.0.1.tar.gz.

File metadata

  • Download URL: anltk-1.0.1.tar.gz
  • Upload date:
  • Size: 163.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1.tar.gz
Algorithm Hash digest
SHA256 b8f8e77c124358883b29fa32946bc0d9ef464efe66598f8e9d2f3dad7c60479a
MD5 6369ad2bfec43df5066a99ce8ffe71dd
BLAKE2b-256 4b39098a65dd7546223dbd75978c1268c900edba1ebcb97a6399b75bfa92df66

See more details on using hashes here.
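
As one way to use the published digests, the sketch below computes the SHA256 of a downloaded file with Python's hashlib and compares it to the value listed for anltk-1.0.1.tar.gz. The function names sha256_of and matches_published_hash are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for anltk-1.0.1.tar.gz.
EXPECTED_SHA256 = "b8f8e77c124358883b29fa32946bc0d9ef464efe66598f8e9d2f3dad7c60479a"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Calling matches_published_hash("anltk-1.0.1.tar.gz") on the downloaded archive should return True when the file is intact; for a wheel, pass that wheel's own SHA256 value as the second argument.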

File details

Details for the file anltk-1.0.1-pp37-pypy37_pp73-win32.whl.

File hashes

Hashes for anltk-1.0.1-pp37-pypy37_pp73-win32.whl
Algorithm Hash digest
SHA256 dd47966a7937f0f7c1bcf56fa27a8a312bd740f7855be6ff58b277ab535dc9b0
MD5 c22dbc49ab4bd8fa9096942496d2cffb
BLAKE2b-256 a109b85a50b468d59d697b0cd44ff96cf53ab487e0c5a76998fc980d44ca6ff7

File details

Details for the file anltk-1.0.1-pp37-pypy37_pp73-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-pp37-pypy37_pp73-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 57783295019ae1aebf716c0159fb42e8542a52915b4cfae52b3a7d557917c4a7
MD5 20bb49b032d555aaf4b375ad056b7f0e
BLAKE2b-256 f4a40db73541380281d4acb8a07cc9cbfb4e7f9d8e3dfe69acf9409c8d73ebf7

File details

Details for the file anltk-1.0.1-pp36-pypy36_pp73-win32.whl.

File hashes

Hashes for anltk-1.0.1-pp36-pypy36_pp73-win32.whl
Algorithm Hash digest
SHA256 006eb71801e947853ca45e91f6bca0da4450e6b7a5ea10205ca392ab913ff812
MD5 b5000a4cc71f46e5edbfca7821e90802
BLAKE2b-256 8b310cd31521e1763a49b7516d82cdb8615ca0dc1f2ead98f46a133f5a954d62

File details

Details for the file anltk-1.0.1-pp36-pypy36_pp73-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-pp36-pypy36_pp73-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 be8f593889e130859d6bc64046b208cf253c6535c22b6803cd6b19b4a7771a71
MD5 0318bfb31b92017a1858b36b5b55cdfd
BLAKE2b-256 5cf8ccd773864c98df0a31814c59a325d7864f8c5d981aaeb807025eb57b7e8c

File details

Details for the file anltk-1.0.1-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: anltk-1.0.1-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 154.1 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 720e0b45ed8fbd4534f3e0e369cdc842f346ac8a1156defe2802095b4fefdf42
MD5 9bfd57f829fc22cb982dd5b9550468da
BLAKE2b-256 c20929d0ac2e2c5742866b8dae14bdca8179598944808438ce7a357119a96aef

File details

Details for the file anltk-1.0.1-cp39-cp39-win32.whl.

File metadata

  • Download URL: anltk-1.0.1-cp39-cp39-win32.whl
  • Upload date:
  • Size: 137.0 kB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 9bc3ad8e733bcccbafadf59a6a86ad48431a21eb82a4e85a3a081e05bbaedf82
MD5 ec650e8f08aa8c95f5d87dc24ea1e06e
BLAKE2b-256 e9c58fcb619957b718d06ee8408322264426c1f85d6e4e3243cf24e427e69932

File details

Details for the file anltk-1.0.1-cp39-cp39-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-cp39-cp39-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 b339d37f57269589e8b40d507b00e24c8d8c0adc9c96f421655643e8e344d232
MD5 424c00f19e0a853df6a105a8eb78d8df
BLAKE2b-256 3aa6c52f63c68fabee9817935c8c2dbb3ec131b15ab61c3abdce58dcdd1bd1fb

File details

Details for the file anltk-1.0.1-cp39-cp39-manylinux2010_i686.whl.

File hashes

Hashes for anltk-1.0.1-cp39-cp39-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 0d946945c819f6dfd354def31ac04ef418e8e20863c6993a5cb1be2b2771ec9f
MD5 9d351fb8ca44eeee0e71e10385cd2552
BLAKE2b-256 59274f41b4e0c74bcc35699b84c04ba117e8b8d8125f5d05730511143cb624b8

File details

Details for the file anltk-1.0.1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: anltk-1.0.1-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 157.0 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 3d1b362653071e0f0469249eae3be84e0d58600b28e1c6d0dfb68286dbf3816f
MD5 1889fd14cbcc7c4265f545df810dd504
BLAKE2b-256 e6c935d8a749d3de82ff88a002b23c2e42aced27bdb8efc0fd28f0929e5aaf54

File details

Details for the file anltk-1.0.1-cp38-cp38-win32.whl.

File metadata

  • Download URL: anltk-1.0.1-cp38-cp38-win32.whl
  • Upload date:
  • Size: 136.9 kB
  • Tags: CPython 3.8, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 4d179d313c8418cb6a48f992e3dd28fc8dd47a8f9bfb67f1865da2fd29e92512
MD5 7cdfd0389c10b7990964478afced9efe
BLAKE2b-256 d4dfbebbebb0cb5a4114157404096d0d5e687ac35dec875cfd9a6c5c6b232e94

File details

Details for the file anltk-1.0.1-cp38-cp38-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 73425d363ca760ca20bf2be39e07f3fb5d30a691093c4e4bc62a9a195e4cfec6
MD5 28749ac0e7cefeda6ae8cab8636340b2
BLAKE2b-256 59e0109741ddc8c4dea22bc09ab3c2a202b3097dd5edb65e0b8be5518e6c3f2d

File details

Details for the file anltk-1.0.1-cp38-cp38-manylinux2010_i686.whl.

File hashes

Hashes for anltk-1.0.1-cp38-cp38-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 9fb72fa77274debc9453655c325268879cb96e8a356df9d2ad50d0f463b8aa5b
MD5 74748340036a07dc2049462ef68b6f34
BLAKE2b-256 37c6d6f5f99538787ced00bee502053d1ba9c793f4771a4d1f6176984e7fe104

File details

Details for the file anltk-1.0.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: anltk-1.0.1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 156.5 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 cf5d1dc70b36a7d0d5222436c555ef065aa3fb58f9987ef02437df51ac20ed7d
MD5 927154975b7a9afc15c527fec372bcfe
BLAKE2b-256 1252857cd257f013054cc93d068b1042fda15138321eb9135ceb7f9fdab3b03e

File details

Details for the file anltk-1.0.1-cp37-cp37m-win32.whl.

File metadata

  • Download URL: anltk-1.0.1-cp37-cp37m-win32.whl
  • Upload date:
  • Size: 137.7 kB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 f3411f4168a453e039e7df94cd8992a547c6bcec618d9b366d60ea43e7021a4d
MD5 e5d623d2acc11a366eda8465e04be982
BLAKE2b-256 494c355d314070139b725a478f9a7fc8d369db00973bebef0265397c1c2dc645

File details

Details for the file anltk-1.0.1-cp37-cp37m-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 420cd4400dd8068b2c06826e1d6d8d7455b32d47480acbfac84be6e164c12e8b
MD5 b3de62cdc66748a2a2df0cd32703ced7
BLAKE2b-256 4c1902b05bf1459cffa481689a30d3dc0b5e10d1836e08025632a63b4092b653

File details

Details for the file anltk-1.0.1-cp37-cp37m-manylinux2010_i686.whl.

File hashes

Hashes for anltk-1.0.1-cp37-cp37m-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 bed043e41f79b690bfb29a42ac3fdfacb7b13603472dd5b1266ca4781ccc5920
MD5 9b659ed13ac23d7720cf64671dc77af8
BLAKE2b-256 060c28432b603191226bb8ede87346d7e3e536371da522dbf4bd59b690db99cd

File details

Details for the file anltk-1.0.1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: anltk-1.0.1-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 156.5 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 243a89652ac28405f012464f3214a16f58f5122f4382b8ec9c40ccea2b20f673
MD5 86a8afa201f931e4a6e7eeca28ed9b42
BLAKE2b-256 8cb3bbb09c0046b309c63cf304e6e793d0b1bcbd4bc9a6533756c4c9ad85eb26

File details

Details for the file anltk-1.0.1-cp36-cp36m-win32.whl.

File metadata

  • Download URL: anltk-1.0.1-cp36-cp36m-win32.whl
  • Upload date:
  • Size: 137.7 kB
  • Tags: CPython 3.6m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 5d4592c6c70f3c986320cf333c3424dde3f5656a99d9f79f1be35d646c00b88c
MD5 d566c330d9d917ab76feda74dacf1619
BLAKE2b-256 829eb26d4536e85986fe67124d079761080e9754904194464c0bad05fba1c4a7

File details

Details for the file anltk-1.0.1-cp36-cp36m-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 0ffd56310849dec6a4404bc5eedbbcf0d47108c679e2b81973535b81a949e157
MD5 d2c01705a2ecc3d1813476b0cc3fb0ff
BLAKE2b-256 808f8ec4d564859ff134dcb5e52a787b83171679a00dc8e9e802c86da7874fab

File details

Details for the file anltk-1.0.1-cp36-cp36m-manylinux2010_i686.whl.

File hashes

Hashes for anltk-1.0.1-cp36-cp36m-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 fa6e0c4816cb92e326f636fb70bd2a8887e0726e5fc85d4be7b4355f142b4234
MD5 6f09e31965a9740bdf7d5d5d4a948d16
BLAKE2b-256 36071d3d9ab14b1f648acefdfc81768f3b6c58a6ad14f3772fef082336cc62cd

File details

Details for the file anltk-1.0.1-cp35-cp35m-win_amd64.whl.

File metadata

  • Download URL: anltk-1.0.1-cp35-cp35m-win_amd64.whl
  • Upload date:
  • Size: 156.5 kB
  • Tags: CPython 3.5m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 3a9ac86910364ce90369b884327c88ad3a8c9dc75ea97c215f7d46012cd23474
MD5 1604e7617bb10b07eba80ea62349ca1d
BLAKE2b-256 33fd94387cc4edc67b038a0aa84c348b60dc831954410b0cfdb5a7dfb3cd5c96

File details

Details for the file anltk-1.0.1-cp35-cp35m-win32.whl.

File metadata

  • Download URL: anltk-1.0.1-cp35-cp35m-win32.whl
  • Upload date:
  • Size: 137.7 kB
  • Tags: CPython 3.5m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for anltk-1.0.1-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 8d61907b79208f70546dc3e497a022dd74aa162cd43ef5317be3313a97bbcaa2
MD5 c229d649b757ebf3a8cff884056f3978
BLAKE2b-256 cd75352f31dcb56989763ff472378fa1ccf0de83557849f00af24aacd1c808a9

File details

Details for the file anltk-1.0.1-cp35-cp35m-manylinux2010_x86_64.whl.

File hashes

Hashes for anltk-1.0.1-cp35-cp35m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 d6bbfd1346be78010a6425faa376d2b7398cbd5b1fee0d1ae66f5a283f26aad6
MD5 8212aad6c379d9bc9962b6f0554cb7ac
BLAKE2b-256 230744a698843415ee4b53c0f1278827f14f0e8e94febb9036113d45f1ec9f5f

File details

Details for the file anltk-1.0.1-cp35-cp35m-manylinux2010_i686.whl.

File hashes

Hashes for anltk-1.0.1-cp35-cp35m-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 23df6ac81212e598e664d8b35cdde6b8c2dabcecac968d34f9c7b9d2e18567bc
MD5 979a18be21111156834f79846cd7133c
BLAKE2b-256 39afc15bb468335fc929e909f1e9db7a03e154896c3fa7ad00e8539388d1b4d8
