Arabic Natural Language Toolkit (ANLTK)
Project description
ANLTK is a set of Arabic natural language processing tools, developed with a focus on simplicity and performance.
ANLTK is a C++ library with Python bindings.
Installation
For Python:
```bash
pip install anltk
```
Building
Note: building from source is currently only tested on Linux. Prebuilt Python wheels are available on PyPI for Linux, Windows, and macOS.
Dependencies
- utfcpp (downloaded automatically)
- utf8proc (downloaded automatically)
- A C++ compiler with C++17 support
- Python 3, Meson, Ninja
- Task (optional, for simplified build commands)
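Before configuring, it can help to confirm that the build tools from the dependency list are reachable. The following is a minimal sketch using only the Python standard library; the helper names `check_build_tools` and `missing_required` are hypothetical and not part of ANLTK, and the check only looks for executables on `PATH` (it does not verify versions or that the compiler supports C++17).

```python
import shutil

# Tool names mirror the dependency list above. This only checks that the
# executables are on PATH; versions and C++17 support are not verified.
REQUIRED_TOOLS = ("meson", "ninja")
OPTIONAL_TOOLS = ("task",)  # Task is only needed for the `task ...` shortcuts


def check_build_tools(required=REQUIRED_TOOLS, optional=OPTIONAL_TOOLS):
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in (*required, *optional)}


def missing_required(found, required=REQUIRED_TOOLS):
    """Return the required tools that were not found."""
    return [tool for tool in required if not found.get(tool, False)]
```

`missing_required(check_build_tools())` returns an empty list when both Meson and Ninja are on `PATH`; Task is reported but never treated as required.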
Building C++ Library
```bash
git clone https://github.com/Abdullah-AlAttar/anltk.git
cd anltk/

# Using Taskfile (recommended)
task configure
task build
task test

# Or manually with Meson
meson setup build --buildtype=release -Dbuild_tests=false
cd build
ninja
```
Building Python Bindings
```bash
# Complete setup (creates venv, installs deps, builds package)
task py:setup

# Or step by step:
task py:venv     # Create virtual environment
task py:deps     # Install build dependencies
task py:install  # Install in development mode

# Test the installation
task py:test     # Run quick tests

# Build wheel for distribution
task py:wheel    # Build wheel package

# Clean build artifacts
task clean       # Clean all build artifacts
```
Manual Python Build (without taskfile)
```bash
python3 -m venv .venv
.venv/bin/pip install --upgrade pip meson-python build pybind11 ninja patchelf
.venv/bin/pip install -e .
```
Usage Examples
C++ API
```cpp
#include "anltk/anltk.hpp"
#include <iostream>
#include <string>

int main()
{
    std::string ar_text = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ";
    std::cout << anltk::transliterate(ar_text, anltk::CharMapping::AR2BW) << '\n';
    // >bjd hwz HTy klmn sEfS qr$t vx* DZg

    std::string text = "فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ.";
    std::cout << anltk::remove_tashkeel(text) << '\n';
    // فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

    // The second argument is a stop list: characters in it are not removed
    std::cout << anltk::remove_non_alpha(text, " ") << '\n';
    // فراشة ملونة تطير في البستان حلوة مهندمة تدهش الإنسان

    anltk::TafqitOptions opts;
    std::cout << anltk::tafqit(15000120, opts) << '\n';
    // خمسة عشر مليونًا ومائة وعشرون
}
```
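To illustrate the stop-list behaviour of `remove_non_alpha` shown above, here is a pure-Python sketch. It is not ANLTK's implementation: it assumes that "alpha" means the basic Arabic letters U+0621 to U+063A and U+0641 to U+064A, so diacritics, tatweel, punctuation, digits, and Latin letters are all dropped unless they appear in the stop list. ANLTK's own definition may differ, and the names `is_arabic_letter` and `remove_non_alpha_sketch` are hypothetical.

```python
def is_arabic_letter(ch: str) -> bool:
    """Assumed definition of 'alpha': the basic Arabic letters only."""
    cp = ord(ch)
    # U+0621..U+063A: hamza .. ghain; U+0641..U+064A: feh .. yeh.
    # Tatweel (U+0640) and the diacritics (U+064B onwards) fall outside both ranges.
    return 0x0621 <= cp <= 0x063A or 0x0641 <= cp <= 0x064A


def remove_non_alpha_sketch(text: str, stop_list: str = "") -> str:
    """Keep Arabic letters plus any character that appears in stop_list."""
    keep = set(stop_list)
    return "".join(ch for ch in text if is_arabic_letter(ch) or ch in keep)
```

With this sketch, `remove_non_alpha_sketch("فَرَاشَةٌ، حُلْوَةٌ.", " ")` returns `فراشة حلوة`: the diacritics, the Arabic comma, and the full stop are dropped, while the space survives because it is in the stop list. Without the stop list the space is removed as well.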
Python API
```python
import anltk

ar = "أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ"
bw = anltk.transliterate(ar, anltk.AR2BW)
print(bw)
# >bjd hwz HTy klmn sEfS qr$t vx* DZg

print(anltk.remove_tashkeel("فَرَاشَةٌ مُلَوَّنَةٌ تَطِيْرُ في البُسْتَانِ، حُلْوَةٌ مُهَنْدَمَةٌ تُدْهِشُ الإِنْسَانَ."))
# فراشة ملونة تطير في البستان، حلوة مهندمة تدهش الإنسان.

print(anltk.tafqit(15000120))
# خمسة عشر مليونًا ومائة وعشرون
```
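Buckwalter transliteration is a one-to-one character mapping, which is why it can be reversed without loss. The sketch below illustrates the idea with a hand-written table covering the basic Arabic letters, tatweel, and the eight diacritics U+064B to U+0652, using Python's `str.maketrans` and `str.translate`. It is an illustration under those assumptions, not ANLTK's `AR2BW` table; extended characters such as U+0670 (superscript alef) and U+0671 (alef wasla) are deliberately left out, and `to_buckwalter` / `from_buckwalter` are hypothetical names.

```python
# Standard Buckwalter symbols for the basic letters, tatweel, and diacritics.
# Illustrative subset only; not ANLTK's mapping table.
AR2BW = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
    "\u0629": "p", "\u062a": "t", "\u062b": "v", "\u062c": "j",
    "\u062d": "H", "\u062e": "x", "\u062f": "d", "\u0630": "*",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
    "\u0639": "E", "\u063a": "g", "\u0640": "_", "\u0641": "f",
    "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",
    "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
    "\u064a": "y",
    # Diacritics: tanween, short vowels, shadda, sukun.
    "\u064b": "F", "\u064c": "N", "\u064d": "K", "\u064e": "a",
    "\u064f": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
}
# The mapping is one-to-one, so inverting it gives the reverse direction.
BW2AR = {bw: ar for ar, bw in AR2BW.items()}

_AR2BW_TABLE = str.maketrans(AR2BW)
_BW2AR_TABLE = str.maketrans(BW2AR)


def to_buckwalter(text: str) -> str:
    """Characters without a mapping (spaces, punctuation) pass through unchanged."""
    return text.translate(_AR2BW_TABLE)


def from_buckwalter(text: str) -> str:
    """Inverse of to_buckwalter for text that is known to be Buckwalter."""
    return text.translate(_BW2AR_TABLE)
```

With this table, `to_buckwalter("أبجد هوز حطي كلمن سعفص قرشت ثخذ ضظغ")` gives `>bjd hwz HTy klmn sEfS qr$t vx* DZg`, matching the output shown above, and `from_buckwalter` maps it back. Because `from_buckwalter` converts every Latin letter that appears in the table, it should only be applied to text that is already in Buckwalter form.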
For a list of features, see Features.md.
Benchmarks
The benchmark processes a file containing 500,000 lines, 6,787,731 words, and 112,704,541 characters. The tasks are removing diacritics and transliterating to Buckwalter.
Buckwalter transliteration
| Method | Time (seconds) |
|---|---|
| anltk (Python API) | 1.379 |
| camel_tools (Python) | 11.46 |
Remove Diacritics
| Method | Time (seconds) |
|---|---|
| anltk (Python API) | 0.989 |
| camel_tools (Python) | 4.892 |
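The benchmark script behind these figures is not shown here. As a rough sketch of how such a comparison could be reproduced, the harness below counts lines, words, and characters and times a per-line function with `time.perf_counter`. `strip_tashkeel` is a pure-Python stand-in that deletes U+064B to U+0652; it is not ANLTK's implementation, and the counts reported by `corpus_stats` may differ from the figures above depending on how whitespace and newlines are counted. All helper names are hypothetical.

```python
import time
from typing import Callable, Iterable, Tuple

# Pure-Python stand-in for diacritic removal: maps U+064B..U+0652
# (tanween, short vowels, shadda, sukun) to None so translate() deletes them.
_TASHKEEL = dict.fromkeys(range(0x064B, 0x0653))


def strip_tashkeel(line: str) -> str:
    return line.translate(_TASHKEEL)


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Count lines, whitespace-separated words, and characters (newlines excluded)."""
    n_lines = n_words = n_chars = 0
    for line in lines:
        n_lines += 1
        n_words += len(line.split())
        n_chars += len(line)
    return n_lines, n_words, n_chars


def time_per_line(fn: Callable[[str], str], lines: Iterable[str]) -> float:
    """Apply fn to every line and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for line in lines:
        fn(line)
    return time.perf_counter() - start
```

To compare implementations, load the corpus once (for example `lines = open("corpus.txt", encoding="utf-8").read().splitlines()`, where `corpus.txt` is a placeholder) so that file I/O is excluded, then call `time_per_line(strip_tashkeel, lines)` and, with ANLTK installed, `time_per_line(anltk.remove_tashkeel, lines)`.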
Project details
Download files
Source Distributions
Built Distributions
File details
Details for the file anltk-1.0.8-cp314-cp314-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp314-cp314-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.14, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 661563b6096fd085afbed55fca86a6562479c3f56e3deb2bbd8e4a08c602ab49 |
| MD5 | 04329b63c447dcd46b9e70d3b35b7dbc |
| BLAKE2b-256 | ab718953baebb8a03a334a6b36e0bb8f24c8befeacadcfb1d6f92770b410da87 |
File details
Details for the file anltk-1.0.8-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 277.0 kB
- Tags: CPython 3.14, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a919b1b4ef0815b04c01e9f85631d6d52091c923c215810be2734529b614701f |
| MD5 | d50dc0477d9f7bf2fdf32c367440b680 |
| BLAKE2b-256 | cf92960f7ab0575fc206e384fc5c48d5a60d8c78c76c3f1137f9c6d43f67060d |
File details
Details for the file anltk-1.0.8-cp314-cp314-macosx_10_15_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp314-cp314-macosx_10_15_x86_64.whl
- Upload date:
- Size: 256.7 kB
- Tags: CPython 3.14, macOS 10.15+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 55072bd48f9e4230e246d80c46b5768bd28e4c611ea7823a2f937a8d8e27d956 |
| MD5 | 8b03f76d3429dd5f745020a33bd6d838 |
| BLAKE2b-256 | 01743521736f6ae38bb7e1309c1046a917148f2d364fcac9fa315b95d721d4a5 |
File details
Details for the file anltk-1.0.8-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4c7e5f68ba5ad2bb1e99e63c6506faf2145ed52f2296d7a26d859daab633ee55 |
| MD5 | 29b466105370cb4fb312f91c18b0c7fb |
| BLAKE2b-256 | 99d0ae3bbb5fdfaa3a2149b92409b151496c4e46803d92cfb6ea0c28c39654e0 |
File details
Details for the file anltk-1.0.8-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 276.5 kB
- Tags: CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8885a2348b18bb8cfa834d6b77f054ebb9d9a771ea7393793d950b8d3a6e35c |
| MD5 | 0bb31110d95331373a887b652a6bebc4 |
| BLAKE2b-256 | b3fe6ada7265c094d85d1779c17e8fb389bac30f989c4992e217397833975b9e |
File details
Details for the file anltk-1.0.8-cp313-cp313-macosx_10_13_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp313-cp313-macosx_10_13_x86_64.whl
- Upload date:
- Size: 257.8 kB
- Tags: CPython 3.13, macOS 10.13+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29a7d69305c2252a2442e644a85e212e0aa21326a81b50ec4c8d374f65ca6fc2 |
| MD5 | b3e63d9fd0f9bbb76fbfa35d024d7d6a |
| BLAKE2b-256 | 6895f4b2b72599f4d5f8c76ca3713b39334804091d704d5d87dc51fd69a35888 |
File details
Details for the file anltk-1.0.8-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f02fb704956a70dcfac9cb2ed23fa0569c0e4db163e5c2b86e78ee427dfd3b75 |
| MD5 | 19b09d022496c8e50ba3b67c61c9a192 |
| BLAKE2b-256 | cb888489baf35a6474ea4755927b0633432f26cb7fe779ba07f4607fb36426c5 |
File details
Details for the file anltk-1.0.8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 277.0 kB
- Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ebf6b45d8c435c91cf4d9c3851e89cf2c14827d14a72e58fca4a1cecb4284c5 |
| MD5 | 340dae81905fb3c699ee29af2ade8b4e |
| BLAKE2b-256 | 8af8c4a66ea8d38df96d0aa0ca9d71982079bcddcb0b9fa3bdd88190f6f6a6a9 |
File details
Details for the file anltk-1.0.8-cp312-cp312-macosx_10_13_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp312-cp312-macosx_10_13_x86_64.whl
- Upload date:
- Size: 257.8 kB
- Tags: CPython 3.12, macOS 10.13+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 39743e8cff0823d856fcb9b5255fa0dcffed9c4eff91e507bc63fd16dc5a0050 |
| MD5 | c21da18561e65407ee6e888e95be7977 |
| BLAKE2b-256 | 3f9e01c9f3b6e6dd204c1d8d4815c591599badc0a82a606f3b725f3647e54bb4 |
File details
Details for the file anltk-1.0.8-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bf05c32fda9b8ef72fecec6de483e34eab934f593792601a4813fc759c5a0f8 |
| MD5 | 1a531ccde4d7e43bb013788ba4b9bfa4 |
| BLAKE2b-256 | 5599600db229ce16cc99e090b6c575ccd80617985921e7bad427b5bacd1d6b07 |
File details
Details for the file anltk-1.0.8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 275.6 kB
- Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8385adfe365cdfd53b4808b9743a85ecbff56fdf614e52231b77d3ea3ff6b57c |
| MD5 | bb878c170d7c6e25f2d1de50ff9c03f0 |
| BLAKE2b-256 | 5adea744df5556caafcfd0079b2dfe02fbff0d109188d7e9c0762e8134bc3916 |
File details
Details for the file anltk-1.0.8-cp311-cp311-macosx_10_9_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp311-cp311-macosx_10_9_x86_64.whl
- Upload date:
- Size: 252.9 kB
- Tags: CPython 3.11, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ed7ebb4c9b86251be69e58e3f2f35e27477ac6a8d5ba58f06e023ca2124f645 |
| MD5 | 0e78c831b6d1ea99170d870c6d4d2999 |
| BLAKE2b-256 | 31ba3622e06617ff80df288880c689bb0dd8b59c70e514fcd736201313cf2865 |
File details
Details for the file anltk-1.0.8-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be502b1db75d96817572891bdde663ed6e26dafee9e77b5658b1a6deae46f6ba |
| MD5 | 2f9b367ab042f12e6c6b9ccaada1cbaa |
| BLAKE2b-256 | e2f2c90c7fd8b1b6b25a9f9b7195cf3d0da11c590c0ba8e0608c604a97c9d8b1 |
File details
Details for the file anltk-1.0.8-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 274.1 kB
- Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f6d6caa92ea8d2062719d13b2f515bef169b8f6a706fc38c47d00a42195a6a3 |
| MD5 | 01e943368ef87db0d0399f7680193cca |
| BLAKE2b-256 | 3ef782f0032f63e7c2d76a89cd62933ed2c418db35e73e596ac2fe83a82a5d03 |
File details
Details for the file anltk-1.0.8-cp310-cp310-macosx_10_9_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp310-cp310-macosx_10_9_x86_64.whl
- Upload date:
- Size: 251.3 kB
- Tags: CPython 3.10, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1b2c7b01da6d6244a9fde5bb5642aadd454a3e0510d8a9bd03482af85fd379f |
| MD5 | a1e534326523bf8edf8ef8baf001f8d2 |
| BLAKE2b-256 | 6b1b63c6363a8937c0146349589d71e5c2344b8d81d97612f22625049835ef76 |
File details
Details for the file anltk-1.0.8-cp39-cp39-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp39-cp39-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | db715c27aa99d190f9457f4f3dac89a439f97751e90936de1b5b30fcf6a94c1d |
| MD5 | 51a566a70a167bdb05d54df73fcb2631 |
| BLAKE2b-256 | 7a4676908bce51a63ec8318cdd1cc3537237dde3e4c5eabb9bab89ca5b751d0c |
File details
Details for the file anltk-1.0.8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 274.2 kB
- Tags: CPython 3.9, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7087a0847fc77c2db8cea95f93fb72b245da7fb2caf4257dd271ae27d93c187e |
| MD5 | cf5f19e70b094e969c64c7ac4a00241c |
| BLAKE2b-256 | 4689ed747557ca08b8ad9b9ac4e1e09d5f7c8804829b061882e14e3f33c5d844 |
File details
Details for the file anltk-1.0.8-cp39-cp39-macosx_10_9_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp39-cp39-macosx_10_9_x86_64.whl
- Upload date:
- Size: 251.5 kB
- Tags: CPython 3.9, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 570bd76078c6c059eb52be3f76864982d2baabfbed0ac030a7c1ccfdef160eac |
| MD5 | 10ebd26cde2e6baced178c1c50f0e630 |
| BLAKE2b-256 | 965b8349019a7687d2ec3e7f0cebb8f4b82fd8235c33f077c282dbd974796e22 |
File details
Details for the file anltk-1.0.8-cp38-cp38-win_amd64.whl.
File metadata
- Download URL: anltk-1.0.8-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc2557ca82ae0792c5949ebb4c0bb6b3ce0de1951f1e089654fd44bffc09d830 |
| MD5 | 1bdb488300838269d8d567dcce82b4b0 |
| BLAKE2b-256 | 078c814929096092ca44559de998b31f006936cb0fc6a51da8a6cd4b663ace47 |
File details
Details for the file anltk-1.0.8-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 274.9 kB
- Tags: CPython 3.8, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36929c2637f938a81285d790bba901a62ff469b1eca61cd31a527b91a5c62329 |
| MD5 | 857e9d2d4ed2a7117030af6685c46a5a |
| BLAKE2b-256 | ae509f3822c82d9e4e1ef7ffb523c5178e662013d833eaa734e3be2e8e352abb |
File details
Details for the file anltk-1.0.8-cp38-cp38-macosx_10_9_x86_64.whl.
File metadata
- Download URL: anltk-1.0.8-cp38-cp38-macosx_10_9_x86_64.whl
- Upload date:
- Size: 251.7 kB
- Tags: CPython 3.8, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9576300c09487572618855c95cef981d38a3f489cd9e10c5ba636b92f37f902c |
| MD5 | d4c7d08e85ccea65609f565a32470b8b |
| BLAKE2b-256 | 57455c52b8d3a0025ccfea70540ff84dbd671e1c94b027331203908f8a140417 |
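Each wheel above is published with SHA256, MD5, and BLAKE2b-256 digests. A minimal sketch for checking a downloaded wheel against its SHA256 digest with the standard `hashlib` module follows; `digest_of_file` and `verify_wheel` are hypothetical helper names, not part of ANLTK or pip.

```python
import hashlib


def digest_of_file(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_wheel(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return digest_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel("anltk-1.0.8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", "6ebf6b45d8c435c91cf4d9c3851e89cf2c14827d14a72e58fca4a1cecb4284c5")` should return `True` for an intact download of that wheel. pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:<digest>` entries in a requirements file).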