A library that assists in translating Moroccan Darija into English. It provides a list of potential translations for a given Darija word and also supports translating full sentences with LLMs (e.g., OpenAI models).
DarijaAssistant Library
DarijaAssistant is a Python library designed to assist in translating Moroccan Darija (a dialect of Arabic) into English. It integrates two main functionalities:
- Assisted Translation: The DarijaAssistant class helps translate words and sentences with a custom word-distance algorithm, improving translation accuracy, especially for difficult or ambiguous phrases.
- LLM Client: A client for interacting with any large language model (LLM) hosted at any URL. The library also has built-in support for OpenAI's GPT models: supply an OpenAI API key and a model name, and it works out of the box.
Users can perform both raw (LLM-only) and assisted translations. Assisted translation improves the contextual understanding of Moroccan Darija sentences through caching, normalization, and additional linguistic analysis.
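The README does not describe how the library normalizes or caches input, so the following is only a hypothetical sketch of what those two steps could look like for Arabizi-style Darija (Latin letters plus digits such as 3, 7 and 9 standing in for Arabic sounds). The function name normalize_darija and its rules (lowercasing, stripping punctuation, collapsing runs of three or more repeated characters) are assumptions, not DarijaTranslatorAssistant's actual behavior; caching uses the standard functools.lru_cache.

```python
import re
from functools import lru_cache

# HYPOTHETICAL normalization rules, not the library's own:
# 1. lowercase the text,
# 2. replace punctuation with spaces (digits are kept, since Arabizi uses them as letters),
# 3. collapse a character repeated 3+ times to one copy ("saaahbi" -> "sahbi"),
# 4. squeeze runs of whitespace and trim the ends.
_NON_WORD = re.compile(r"[^\w\s]")
_REPEATS = re.compile(r"(\w)\1{2,}")
_SPACES = re.compile(r"\s+")


@lru_cache(maxsize=4096)  # repeated inputs are normalized only once
def normalize_darija(text: str) -> str:
    text = text.lower()
    text = _NON_WORD.sub(" ", text)
    text = _REPEATS.sub(r"\1", text)
    return _SPACES.sub(" ", text).strip()


print(normalize_darija("Law3lm,   SAAAHBI!!"))  # -> law3lm sahbi
```

Collapsing only runs of three or more keeps legitimate doubled letters (as in "3rram") intact while still absorbing emphatic spellings.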
Installation
To install the library, run:
```bash
pip install DarijaTranslatorAssistant
```
Usage
1. Initializing the Translation Model
You can use either an LLM hosted at any URL or an OpenAI model. Here's how to initialize the client in each case:
```python
from DarijaTranslatorAssistant.llm_client import LLMClient

# Option 1: an OpenAI GPT model
llm_client = LLMClient(use_openai=True, openai_api_key="your_openai_api_key", openai_model="gpt-4o")

# Option 2: an LLM hosted at a specific URL (use one option or the other)
llm_client = LLMClient(llm_url="http://your-llm-url.com", use_openai=False)
```
2. Simple Translation
You can perform a direct translation using the LLM client.
```python
sentence = "law3lm asahbi"

# Uses only the LLM (here OpenAI's gpt-4o), with no assistance
translation_without_assistance = llm_client.translate(sentence)
print(translation_without_assistance)
# [output]: The world, my friend.
```
3. Assisted Translation
For more context-aware translation, use the DarijaAssistant class, which assists the translation process with a word-distance algorithm.
```python
from DarijaTranslatorAssistant.darija_assistant import DarijaAssistant

# Initialize DarijaAssistant with the LLM client
assistant = DarijaAssistant(llm_client=llm_client)

# Use assisted translation: OpenAI's gpt-4o + DarijaAssistant
sentence = "law3lm asahbi"
result = assistant.assist_and_translate(sentence)
print(result)
# [output]: I do not know my friend.
```
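The README says assisted translation leverages a word-distance algorithm but does not show the mechanics. One plausible pattern, sketched here under stated assumptions, is to look up close dictionary matches for each Darija word and hand them to the LLM as hints inside the prompt. This sketch swaps in the standard library's difflib.get_close_matches as a stand-in similarity measure; the toy LEXICON, the cutoff value, and the helpers candidate_translations and build_assisted_prompt are hypothetical and are not the DarijaAssistant implementation.

```python
import difflib
from typing import Dict, List

# Toy lexicon with hypothetical entries, for illustration only.
LEXICON: Dict[str, str] = {
    "law3lm": "I do not know",
    "sahbi": "my friend",
    "khona": "brother",
    "chof": "look",
}


def candidate_translations(word: str, cutoff: float = 0.75, n: int = 3) -> List[str]:
    """Return "entry = gloss" strings for lexicon entries similar to `word`.

    difflib's similarity ratio stands in for the library's own word-distance algorithm.
    """
    matches = difflib.get_close_matches(word, LEXICON.keys(), n=n, cutoff=cutoff)
    return [f"{m} = {LEXICON[m]}" for m in matches]


def build_assisted_prompt(sentence: str) -> str:
    """Build an LLM prompt that includes per-word dictionary hints."""
    hints = []
    for word in sentence.lower().split():
        found = candidate_translations(word)
        if found:
            hints.append(f"- {word}: " + "; ".join(found))
    hint_block = "\n".join(hints) if hints else "- (no dictionary matches)"
    return (
        "Translate this Moroccan Darija sentence into English.\n"
        f"Sentence: {sentence}\n"
        "Possible word meanings from a dictionary:\n"
        f"{hint_block}"
    )


print(build_assisted_prompt("law3lm asahbi"))
```

Here the misspelled "asahbi" still matches the lexicon entry "sahbi", which is the kind of fuzzy lookup that lets the LLM see a relevant gloss it might otherwise miss.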
4. Example Translations
Here's how GPT-4o's unassisted translations compare with our assisted approach on several Darija sentences.
| Darija Sentence | GPT-4o Translation (Without Assistance) | Assisted Translation (GPT-4o + DarijaAssistant) |
|---|---|---|
| law3lm asahbi | The world, my friend. | I do not know my friend. |
| kbchlaba9ich | I feel thirsty. | Fill my cup. |
| 3rram dyal lbrahch | Brahch's pen. | Plenty of kids. |
| chof 3la tfrnisa | Check the outlet. | Look at the smile. |
5. Expanding the Dictionary
You can add new words and translations using the DarijaDataManager from the DarijaDistance package, which the DarijaAssistant library relies on.
```python
from DarijaDistance.preprocess import DarijaDataManager

data_manager = DarijaDataManager()
data_manager.add_translations([('khona', 'brother')])
```
The word "khona" will now be recognized and translated as "brother". The addition is persistent: it is saved to the library's data rather than kept only for the current session, so future DarijaAssistant instances recognize and apply it without your having to re-add it.
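DarijaDistance's storage format is not documented in this README. To illustrate what "persistent" means in practice, here is a hypothetical JSON-file-backed store whose add_translations accepts the same list of (darija, english) tuples as the example above and writes to disk immediately. The class name JsonLexicon, its lookup method, and the file layout are assumptions, not DarijaDataManager's implementation.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


class JsonLexicon:
    """HYPOTHETICAL persistent Darija -> English store backed by a JSON file."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.entries: Dict[str, List[str]] = {}
        # Load existing entries so additions from earlier sessions are visible.
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))

    def add_translations(self, pairs: Iterable[Tuple[str, str]]) -> None:
        """Add (darija, english) pairs and save them to disk right away."""
        for darija, english in pairs:
            glosses = self.entries.setdefault(darija.lower(), [])
            if english not in glosses:  # avoid duplicate glosses
                glosses.append(english)
        self.path.write_text(
            json.dumps(self.entries, ensure_ascii=False, indent=2), encoding="utf-8"
        )

    def lookup(self, word: str) -> List[str]:
        """Return all stored glosses for `word` (case-insensitive)."""
        return self.entries.get(word.lower(), [])
```

Because each new JsonLexicon reloads the file in its constructor, a word added in one session is visible to every later instance, which mirrors the persistence behavior the README describes.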
6. Access to Word-Distance Methods
As a user of the DarijaAssistant library, you also have access to all the methods of the word-distance algorithm, such as checking translation confidence and retrieving exact matches.
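The README does not name the word-distance algorithm or its confidence formula. As an illustrative stand-in only, here is the classic Levenshtein edit distance together with a naive confidence score, 1 - distance / max(len(a), len(b)). Both functions are hypothetical and will not reproduce the library's own scores; they only show the general idea of turning a distance into a confidence value.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def confidence(a: str, b: str) -> float:
    """Naive similarity in [0, 1]; 1.0 means an exact match."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("asahbi", "sahbi"))           # 1
print(round(confidence("asahbi", "sahbi"), 3))  # 0.833
```

Normalizing by the longer word's length keeps short and long words on a comparable scale, so a one-letter typo costs more confidence in a four-letter word than in a ten-letter one.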
Contributing
Contributions are welcome! If you have ideas or suggestions, or you find a bug, please open an issue or submit a pull request on the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contact
If you have any questions or feedback, you can find me on LinkedIn (Aissam Outchakoucht) or on X (@aissam_out).
Download files
Source Distribution
Hashes for darijatranslatorassistant-1.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2db495bb074306733c471f8f7750b50302ab0180ad36bc6805ea1eae9f7ffe46 |
| MD5 | dc0cd0c705efc354f9d7d9bee740cc09 |
| BLAKE2b-256 | faa5893dfa25716818e768d3a459664533865032cafecc709b0f459d27a72adb |
Built Distribution
Hashes for DarijaTranslatorAssistant-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd8043797ce86613a7b244165d91da2fe2577ddf3e5742c106a56bcb53b683e4 |
| MD5 | 47459e4bd115c54004d025beea5eaa52 |
| BLAKE2b-256 | 8765f8a6e53bf3b72e27a815aacd4171ba7802eade4a766279a0cd7dd8cc52f2 |
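To check a downloaded file against the SHA256 digests listed above, you can hash it with Python's standard hashlib module. The helper name sha256_of is our own, and the expected value is copied from the wheel's SHA256 row; the file path assumes the wheel was downloaded into the current directory. pip can also enforce hashes automatically in its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Copied from the wheel's SHA256 row above.
EXPECTED_WHEEL_SHA256 = "fd8043797ce86613a7b244165d91da2fe2577ddf3e5742c106a56bcb53b683e4"

wheel = Path("DarijaTranslatorAssistant-1.0.1-py3-none-any.whl")
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED_WHEEL_SHA256 else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of the file size.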