Arabic language processing toolkit

Arabic Natural Language Toolkit (ANLTK)

ANLTK is a set of Arabic natural language processing tools, developed with a focus on simplicity and performance.

ANLTK is a C++ library with Python bindings.

Installation

For Python:

pip install anltk

Building

Note: Building from source is currently only tested on Linux. Prebuilt Python wheels for Linux, Windows, and macOS are available on PyPI.

Dependencies:

  • utfcpp, automatically downloaded.
  • utf8proc, automatically downloaded.
  • A C++ compiler that supports C++17.
  • Python 3, meson, ninja

pip install meson
pip install ninja
git clone https://github.com/Abdullah-AlAttar/anltk.git \
    && cd anltk/ \
    && meson build --buildtype=release -Dbuild_tests=false \
    && cd build \
    && ninja \
    && cd ../ \
    && pip install -e .

Usage Examples:

C++ API :

#include "anltk/anltk.hpp"
#include <iostream>
#include <string>

int main()
{

    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";

    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";

    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second parameter is a stop list; characters in this list won't be removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان

    anltk::TafqitOptions opts;
    std::cout << anltk::tafqit(15000120, opts) << '\n';
    // خمسة عشر مليونًا ومائة وعشرون
}

Python API

import anltk


ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg

print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))

# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

print(anltk.tafqit(15000120))
# خمسة عشر مليونًا ومائة وعشرون
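
To make the transliteration step more concrete, here is a minimal pure-Python sketch of a character-for-character Buckwalter mapping. It is not ANLTK's implementation: the table is read off the sample input and output shown above, so it covers only the 28 letters in that sample, and the name `to_buckwalter_sample` is hypothetical.

```python
# The README's sample pair: each Arabic letter lines up with one Latin symbol.
AR_SAMPLE = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
BW_SAMPLE = ">bjd hwz HTy klmn sEfS qr$t vx* DZg"

# Build a translation table from the letters only (spaces are dropped first).
# str.maketrans raises ValueError if the two strings differ in length.
SAMPLE_TABLE = str.maketrans(AR_SAMPLE.replace(" ", ""), BW_SAMPLE.replace(" ", ""))


def to_buckwalter_sample(text):
    """Map each known Arabic letter to its Buckwalter symbol.

    Characters outside the sample (bare alef, diacritics, punctuation, ...)
    pass through unchanged, which a full transliterator would not do.
    """
    return text.translate(SAMPLE_TABLE)


print(to_buckwalter_sample(AR_SAMPLE))
```

For anything beyond illustration, `anltk.transliterate(text, anltk.AR2BW)` is the call to use.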

For a list of features, see Features.md.

Benchmarks

Processing a file containing 500,000 lines, 6,787,731 words, and 112,704,541 characters. The task is to remove diacritics or transliterate to Buckwalter.

Buckwalter transliteration

Method              Time
anltk python-api    1.379 seconds
python camel_tools  11.46 seconds

Remove Diacritics

Method              Time
anltk python-api    0.989 seconds
python camel_tools  4.892 seconds
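
The benchmark script itself is not shown here, so the following is only a sketch of how a per-line timing like the one above could be taken. The helper names (`strip_tashkeel_py`, `time_over_lines`) and the tiny repeated corpus are hypothetical. The pure-Python baseline removes only the marks U+064B through U+0652 and is not claimed to match `anltk.remove_tashkeel` on every input. `anltk` is timed only if it is installed.

```python
import time

try:
    import anltk  # optional; only needed to time the real library
except ImportError:
    anltk = None

# Arabic tashkeel marks from fathatan (U+064B) through sukun (U+0652).
# Mapping a code point to None makes str.translate delete it.
TASHKEEL_TABLE = {cp: None for cp in range(0x064B, 0x0653)}


def strip_tashkeel_py(text):
    """Pure-Python baseline: delete the eight marks in TASHKEEL_TABLE."""
    return text.translate(TASHKEEL_TABLE)


def time_over_lines(func, lines):
    """Apply func to every line; return (elapsed_seconds, results)."""
    start = time.perf_counter()
    results = [func(line) for line in lines]
    return time.perf_counter() - start, results


# Hypothetical stand-in corpus: one diacritized phrase repeated many times.
lines = ["فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ"] * 10000

candidates = {"pure python baseline": strip_tashkeel_py}
if anltk is not None:
    candidates["anltk python-api"] = anltk.remove_tashkeel

for name, func in candidates.items():
    elapsed, _ = time_over_lines(func, lines)
    print(f"{name}: {elapsed:.3f} seconds")
```

Timings from a small repeated corpus like this are not comparable to the figures in the tables, which were measured on the 500,000-line file.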
