A complete production-ready Python package for processing Arabic text
Arabic NLP Toolkit
A complete, production-ready Python package for processing Arabic text following MLOps best practices. Designed for use in machine learning pipelines, it includes utilities for cleaning, normalization, tokenization, and basic sentiment analysis.
Features
- Text Cleaning: Strip diacritics, remove special characters, and normalize Arabic text.
- Tokenization: Simple and rule-based tokenizers and sentence splitters tailored for Arabic text.
- Sentiment Analysis: Basic rule-based sentiment scoring (positive, negative, neutral).
- MLOps Ready: Designed with modularity, testing, and continuous integration in mind.
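To make the tokenization and sentiment features concrete, here is a minimal, self-contained sketch of how a rule-based tokenizer, sentence splitter, and lexicon-based sentiment scorer could work. It is not the package's implementation: the function names (`tokenize`, `split_sentences`, `sentiment`) and the tiny word lists are hypothetical and chosen only for illustration. The tokenizer assumes diacritics have already been stripped.

```python
import re
from typing import List

# Hypothetical mini-lexicons for illustration only; a real lexicon would be much larger.
POSITIVE_WORDS = {"جميل", "رائع", "ممتاز"}
NEGATIVE_WORDS = {"سيء", "رديء", "ممل"}

# Runs of Arabic letters (U+0621 to U+064A). Punctuation, digits, and
# diacritics fall outside this range and act as token boundaries.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")

# Split after ".", "!", "?" or the Arabic question mark (U+061F) followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def tokenize(text: str) -> List[str]:
    """Return the Arabic word tokens found in text."""
    return ARABIC_WORD.findall(text)


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at terminal punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def sentiment(text: str) -> str:
    """Label text by counting lexicon hits: positive minus negative."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(tokenize("الفيلم رائع، والتصوير جميل!"))
    print(sentiment("الفيلم ممل"))
```

An exact-match lexicon like this misses words with attached prefixes or suffixes, which is one reason rule-based scoring is only a baseline.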
Installation
From Source
Ensure you have Python 3.8+ installed, then clone the repository and install the package:
git clone https://github.com/Melad98/MLOps-package-structure.git
cd MLOps-package-structure
pip install .
For development (including dependencies for testing and building):
pip install -e ".[dev]"
Usage Example
from arabic_nlp_toolkit.cleaning import normalize_arabic
text = normalize_arabic("النّص العَرَبِيُّ!")
print(text)
# Expected output is something like "النص العربي!", depending on which normalization steps are applied
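For readers curious about what such a normalization step may involve, the following is a standalone sketch and not the code behind `normalize_arabic`. It assumes three common steps: removing diacritics (tashkeel), removing the tatweel elongation character, and unifying alef variants. The names `strip_diacritics` and `normalize_sketch` are hypothetical.

```python
import re

# Arabic diacritics (tashkeel): fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")

# Tatweel (kashida), used only to stretch words visually.
TATWEEL = "\u0640"

# Alef with madda, hamza above, or hamza below; mapped to bare alef (U+0627).
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def strip_diacritics(text: str) -> str:
    """Remove short-vowel marks, tanween, shadda, and sukun."""
    return DIACRITICS.sub("", text)


def normalize_sketch(text: str) -> str:
    """Apply the three assumed normalization steps in order."""
    text = strip_diacritics(text)
    text = text.replace(TATWEEL, "")
    return ALEF_VARIANTS.sub("\u0627", text)


if __name__ == "__main__":
    print(normalize_sketch("النّص العَرَبِيُّ!"))
```

Unifying alef variants discards information that some downstream tasks need, so whether to apply that step is a design choice that depends on the pipeline.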
Testing
To run the unit tests, install development requirements and use pytest:
pip install -e ".[dev]"
pytest tests/
Packaging
To check that the package builds, create the source distribution and wheel with the build tool:
python -m build
File details
Details for the file arabic_nlp_toolkit-0.1.0.tar.gz.
File metadata
- Download URL: arabic_nlp_toolkit-0.1.0.tar.gz
- Upload date:
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f59dbe6b18057ee04a3d16f7d4a8a6d95a372b81cd7ff2808a060968af5fd44 |
| MD5 | 7cde82c411e1551e6354d7e0c7b3fab9 |
| BLAKE2b-256 | 68aadcdcc3d043604c6f21bb8fcce8c789eb1b1665fff1e47aa07ae36ee6e175 |
File details
Details for the file arabic_nlp_toolkit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: arabic_nlp_toolkit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 904a88272300b6c6edfaa9afbb819e46b2fc3baf98c6c94d4eaada2bf8f362a7 |
| MD5 | 88cf5bd80d5974241ddec95603239485 |
| BLAKE2b-256 | 48515f8aaa024a60cb8415306e0fc028520344ade82247867dda8eab8c939593 |
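The published digests can be used to check a downloaded file. The sketch below uses only the standard-library `hashlib` module; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the example assumes the source distribution has been saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the digest is the SHA256 value listed for the sdist.
    sdist = Path("arabic_nlp_toolkit-0.1.0.tar.gz")
    expected = "2f59dbe6b18057ee04a3d16f7d4a8a6d95a372b81cd7ff2808a060968af5fd44"
    if sdist.exists():
        print("digest matches:", matches_published_digest(sdist, expected))
```

SHA256 is the digest to prefer for this check; MD5 is listed for completeness but is not suitable for detecting deliberate tampering.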