Arabic Natural Language Toolkit (ANLTK)

ANLTK is a set of Arabic natural language processing tools, developed with a focus on simplicity and performance.

ANLTK is a C++ library with Python bindings.

Installation

For Python:

pip install anltk
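
After installing, a quick way to confirm that the package is visible to the interpreter you are using is sketched below. It relies only on the standard library; the helper name `is_installed` is made up for this illustration and is not part of ANLTK.

```python
import importlib.util


def is_installed(package: str = "anltk") -> bool:
    """Return True if `package` can be located by the current interpreter."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("anltk importable:", is_installed())
```

If this prints `False` right after a successful `pip install`, the most likely cause is that `pip` and `python` point at different environments.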

Building

Note: Building from source is currently only tested on Linux. Prebuilt Python wheels are available on PyPI for Linux, Windows, and macOS.

Dependencies

  • utfcpp, automatically downloaded.
  • utf8proc, automatically downloaded.
  • A C++ compiler that supports C++17.
  • Python 3, Meson, Ninja
  • Task (optional, for simplified build commands)

Building C++ Library

git clone https://github.com/Abdullah-AlAttar/anltk.git
cd anltk/

# Using taskfile (recommended)
task configure
task build
task test

# Or manually with meson
meson build --buildtype=release -Dbuild_tests=false
cd build
ninja

Building Python Bindings

# Complete setup (creates venv, installs deps, builds package)
task py:setup

# Or step by step:
task py:venv              # Create virtual environment
task py:deps              # Install build dependencies
task py:install           # Install in development mode

# Test the installation
task py:test              # Run quick tests

# Build wheel for distribution
task py:wheel             # Build wheel package

# Clean build artifacts
task clean                # Clean all build artifacts

Manual Python Build (without taskfile)

python3 -m venv .venv
.venv/bin/pip install --upgrade pip meson-python build pybind11 ninja patchelf
.venv/bin/pip install -e .

Usage Examples

C++ API

#include "anltk/anltk.hpp"
#include <iostream>
#include <string>

int main()
{

    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";

    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";

    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second parameter is a stop list; characters in this list won't be removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان

    anltk::TafqitOptions opts;
    std::cout << anltk::tafqit(15000120, opts) << '\n';
    // خمسة عشر مليونًا ومائة وعشرون
}

Python API

import anltk


ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg

print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))

# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

print(anltk.tafqit(15000120))
# خمسة عشر مليونًا ومائة وعشرون
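
To make the Buckwalter output above easier to read, here is a minimal pure-Python sketch of a table-driven Arabic-to-Buckwalter mapping. It is not ANLTK's implementation: the names `AR2BW_SKETCH` and `transliterate_sketch` are hypothetical, and the table covers only the basic letters (U+0621..U+063A and U+0641..U+064A), not diacritics or extended characters.

```python
# Buckwalter symbols for two contiguous Unicode letter ranges, in code point order.
# Assumption: this covers the basic letters only; diacritics are left out.
_BW_BLOCK_1 = "'|>&<}AbptvjHxd*rzs$SDTZEg"  # U+0621 (hamza) .. U+063A (ghain)
_BW_BLOCK_2 = "fqklmnhwYy"                  # U+0641 (feh) .. U+064A (yeh)

# str.translate accepts a dict that maps code points to replacement strings.
AR2BW_SKETCH = {
    **{0x0621 + i: ch for i, ch in enumerate(_BW_BLOCK_1)},
    **{0x0641 + i: ch for i, ch in enumerate(_BW_BLOCK_2)},
}


def transliterate_sketch(text: str) -> str:
    """Replace each mapped Arabic letter; leave every other character unchanged."""
    return text.translate(AR2BW_SKETCH)


if __name__ == "__main__":
    print(transliterate_sketch("أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"))
```

For the sample sentence, this sketch should agree with the `anltk.transliterate(ar, anltk.AR2BW)` output shown above, since every character in it is either a basic letter or a space.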

For a list of features, see Features.md.

Benchmarks

Processing a file containing 500,000 lines, 6,787,731 words, and 112,704,541 characters. The tasks are removing diacritics and transliterating to Buckwalter.

Buckwalter transliteration

Method              | Time
anltk python-api    | 1.379 seconds
python camel_tools  | 11.46 seconds

Remove Diacritics

Method              | Time
anltk python-api    | 0.989 seconds
python camel_tools  | 4.892 seconds
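
The benchmark script and input file are not included here, so the figures above cannot be reproduced from this page alone. As a rough sketch of how a comparable timing could be taken, the following harness applies a function to every line and reports the best of several runs. Both `strip_diacritics_baseline` and `time_over_lines` are hypothetical helpers written for this illustration; the baseline is a plain-Python stand-in, not ANLTK code, and the sample corpus is synthetic.

```python
import time


def strip_diacritics_baseline(text: str) -> str:
    """Pure-Python stand-in: drop the Arabic marks U+064B..U+0652 (tanween, harakat, shadda, sukun)."""
    return "".join(ch for ch in text if not ("\u064b" <= ch <= "\u0652"))


def time_over_lines(func, lines, repeats=3):
    """Return the best wall-clock time, in seconds, for applying func to every line."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            func(line)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Synthetic corpus (assumption): one short diacritized phrase repeated many times.
    corpus = ["فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ"] * 10_000
    seconds = time_over_lines(strip_diacritics_baseline, corpus)
    print(f"baseline: {seconds:.3f} seconds")
```

To time ANLTK itself, pass `anltk.remove_tashkeel` as `func` in place of the baseline. Taking the minimum over a few repeats reduces the influence of background noise on the measurement.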
