Arabic language processing toolkit

Arabic Natural Language Toolkit (ANLTK)

ANLTK is a set of Arabic natural language processing tools, developed with a focus on performance.

ANLTK is a C++ library with Python bindings.

Installation

For Python:

pip install pybind11
pip install anltk

Building

Note: Currently tested only on Linux.
The library depends on https://github.com/nemtrif/utfcpp.git, which is cloned automatically.
You also need a modern C++ compiler that supports C++17, and Meson and Ninja must be installed.
Both can be installed with pip:

pip install meson
pip install ninja
git clone --recurse-submodules https://github.com/Abdullah-AlAttar/anltk.git \
    && cd anltk/anltk \
    && meson build --buildtype=release -Dbuild_tests=false \
    && cd build \
    && ninja \
    && cd ../../ \
    && python3 setup.py install

Usage Examples:

C++ API:

#include "anltk/anltk.hpp"
#include <iostream>
#include <string>

int main()
{

    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";

    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";

    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second parameter is a stop list; characters in this list won't be removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان
}

Python API

import anltk


ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg

print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))

# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.
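
To make the two operations above concrete, here is a minimal pure-Python sketch of the same ideas. It is not ANLTK's implementation and does not use the library. The function names `to_buckwalter` and `strip_tashkeel` are hypothetical. The mapping table covers only the 28 letters that appear in the transliteration example above, and treating U+064B through U+0652 as the set of tashkeel marks is an assumption; ANLTK may handle more characters.

```python
import re
import unicodedata

# Arabic letters and their Buckwalter symbols, taken from the example above.
# Only these 28 letters are covered; this is an illustrative subset.
_ARABIC = unicodedata.normalize("NFC", "أبجدهوزحطيكلمنسعفصقرشتثخذضظغ")
_LATIN = ">bjdhwzHTyklmnsEfSqr$tvx*DZg"
_AR2BW = str.maketrans(_ARABIC, _LATIN)

# Assumed tashkeel set: fathatan (U+064B) through sukun (U+0652).
_TASHKEEL = re.compile("[\u064B-\u0652]")


def to_buckwalter(text: str) -> str:
    """Replace each known Arabic letter with its Buckwalter symbol.

    Characters outside the table (spaces, Latin text, digits) pass through unchanged.
    """
    return unicodedata.normalize("NFC", text).translate(_AR2BW)


def strip_tashkeel(text: str) -> str:
    """Delete the diacritic marks matched by _TASHKEEL and keep everything else."""
    return _TASHKEEL.sub("", text)


print(to_buckwalter("أبجد هوز"))
print(strip_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ"))
```

A character-level translation table keeps the sketch to a single pass over the input, which is also why characters missing from the table are simply left in place rather than raising an error.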

For a list of features, see Features.md.


