This project forward-translates window titles into English and back-translates them to their original text with the help of a Llama model.
Project description
Forward Translation
- Word-wise translation of input strings from any detected source language → English.
- Optional token-level mapping (getTokenList) for round-trip tracking.
- Preservation of specified “protected” tokens (e.g. IDs, placeholders, rules).
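Protected tokens have to survive a model that would otherwise translate or rewrite them. One common way to do this is to mask them with opaque placeholders before translation and put them back afterwards. The sketch below shows that masking technique under stated assumptions: mask_protected, unmask and translate_preserving are hypothetical names, the placeholder format __P0__ is an arbitrary choice, and translate stands in for whatever model call the project actually makes (it must leave the placeholders untouched). It is not necessarily how this package preserves tokens.

```python
import re
from typing import Callable, Dict, List, Tuple


def mask_protected(text: str, protected: List[str]) -> Tuple[str, Dict[str, str]]:
    """Swap every protected word for an opaque placeholder such as __P0__ (hypothetical helper)."""
    mapping: Dict[str, str] = {}
    # Longest entries first, so "Microsoft Word" is masked before "Word" alone.
    words = sorted(set(protected), key=lambda w: (-len(w), w))
    for i, word in enumerate(words):
        placeholder = f"__P{i}__"
        # Lookarounds instead of \b so entries like "{id}" still match as whole tokens.
        pattern = re.compile(rf"(?<!\w){re.escape(word)}(?!\w)")
        text, count = pattern.subn(placeholder, text)
        if count:
            mapping[placeholder] = word
    return text, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Put the original protected words back in place of their placeholders."""
    for placeholder, word in mapping.items():
        text = text.replace(placeholder, word)
    return text


def translate_preserving(text: str, protected: List[str],
                         translate: Callable[[str], str]) -> str:
    """Mask, translate, unmask. `translate` is any str -> str function (assumed)."""
    masked, mapping = mask_protected(text, protected)
    return unmask(translate(masked), mapping)


# Usage with a toy "translator" that only knows one phrase:
fake_translate = lambda s: s.replace("Fenêtre de", "Window of")
print(translate_preserving("Fenêtre de Word", ["Word"], fake_translate))
```

Masking before translation keeps the model from ever seeing the protected words, which is more robust than trying to repair them after the fact.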
Preserving Semantics during Translation
- Chunk-wise translation of input strings from any detected source language → English.
- Merges common prepositional phrases, translates whole sentences when possible, then aligns individual parts for better context.
- Uses BERT-based alignment to keep multi-word units intact.
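The README does not spell out how the BERT-based alignment works. One widely used approach for aligning tokens from contextual embeddings is mutual argmax (as popularised by SimAlign): token i in the source is aligned to token j in the translation only if each is the other's most similar token by cosine similarity. The sketch below illustrates that technique with toy 2-D vectors standing in for BERT token embeddings; mutual_argmax_align is a hypothetical name, and the project's own alignment may differ.

```python
from math import sqrt
from typing import List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_argmax_align(src: List[Sequence[float]],
                        tgt: List[Sequence[float]]) -> Set[Tuple[int, int]]:
    """Keep (i, j) only if tgt[j] is src[i]'s best match AND src[i] is tgt[j]'s."""
    if not src or not tgt:
        return set()
    sim = [[cosine(s, t) for t in tgt] for s in src]
    # Best target for every source token.
    forward = {(i, max(range(len(tgt)), key=lambda j: sim[i][j]))
               for i in range(len(src))}
    # Best source for every target token.
    backward = {(max(range(len(src)), key=lambda i: sim[i][j]), j)
                for j in range(len(tgt))}
    # The intersection drops one-sided (less reliable) links.
    return forward & backward


# Toy embeddings: source token 0 resembles target token 1 and vice versa.
print(mutual_argmax_align([[1, 0], [0, 1]], [[0, 1], [1, 0]]))
```

Requiring agreement in both directions trades some recall for precision, which suits short strings such as window titles where a wrong link is costlier than a missing one.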
Backward Translation
- Restoration of original text by inverting the mapping produced during forward translation.
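Backward translation relies on inverting the forward token mapping. A minimal sketch of that inversion, assuming tokenMapping is a one-to-one Dict[int, int] from original token index to translated token index (as the configuration options describe) and that the original tokens are still available. invert_mapping and trace_back are hypothetical names, and the mapping {0: 1, 2: 0} for "Fenêtre de terminal" → "Terminal window" is an illustrative example, not output produced by the package.

```python
from typing import Dict, List, Tuple


def invert_mapping(mapping: Dict[int, int]) -> Dict[int, int]:
    """Turn an original->translated index mapping into translated->original."""
    inverse: Dict[int, int] = {}
    for src_idx, tgt_idx in mapping.items():
        if tgt_idx in inverse:
            # Two original tokens landing on one translated token cannot be
            # inverted unambiguously with a plain Dict[int, int].
            raise ValueError(f"translated index {tgt_idx} is mapped more than once")
        inverse[tgt_idx] = src_idx
    return inverse


def trace_back(translated: List[str], mapping: Dict[int, int],
               original: List[str]) -> List[Tuple[str, str]]:
    """Pair each translated token with the original token it came from."""
    inverse = invert_mapping(mapping)
    return [(tok, original[inverse[j]])
            for j, tok in enumerate(translated) if j in inverse]


original = ["Fenêtre", "de", "terminal"]
translated = ["Terminal", "window"]
mapping = {0: 1, 2: 0}  # "de" has no counterpart in the translation
print(trace_back(translated, mapping, original))
```

Raising on a many-to-one mapping makes the limitation of a flat integer mapping explicit instead of silently keeping only the last original token.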
📦 Installation
To install the package without cloning the repository, run:
```bash
pip install windowtitles-translation@git+https://github.com/paxray/windowtitles-translation-llm.git
```
If the repository is already cloned, running the following in the root folder also works:
```bash
pip install .
```
Features
- Forward translation using Hugging Face MarianMT.
- Back-translation to reconstruct original window titles.
- Token mapping for granular translation control.
- Preserve placeholders and specified words.
Usage
Forward Translation
Use the getTranslations function from main.py:
```python
from main import getTranslations

payload = {
    "windowTitles": ["Fenêtre de terminal", "Título de documento"],
    "preserveWordsList": ["Word"],
    "getTokenList": True,
    "windowTitlesLanguage": None,
    "translationType": "forward",
}

translations = getTranslations(payload)
print(translations)
```
Output example:
```
{
    "Fenêtre de terminal": {
        "language": "fr",
        "translation": "Terminal window",
        "tokenMapping": { ... }
    },
    "Título de documento": {
        "language": "es",
        "translation": "Document title",
        "tokenMapping": { ... }
    }
}
```
Backward Translation
Use the getOriginalData function to reverse a translation:
```python
from main import getOriginalData

payload = {
    "alteredWindowTitles": {
        "alteredWindowTitle": { ... }
    },
    "translationType": "backward",
}

originals = getOriginalData(payload)
print(originals)
```
CLI
By default, running main.py performs a backward translation on the sample data:
```bash
python main.py
```
Results are saved to data/output-data/output_file_name.json.
Configuration Options
Below are the configuration options for both forward and backward translation payloads. Each field’s datatype and whether it is mandatory are listed.
Forward Translation Payload:
- windowTitles (List[str], mandatory): A list of window titles to translate.
- preserveWordsList (List[str], optional, default=[]): Words or tokens that must remain unchanged during translation.
- getTokenList (bool, optional, default=False): If True, the output includes a mapping of source-to-target tokens.
- windowTitlesLanguage (str, optional, default=None): ISO 639-1 code (e.g. "fr", "es") that sets the source language explicitly instead of relying on automatic detection.
- translationType (str, mandatory): Must be "forward" for forward translation.
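Callers that build payloads programmatically may want to check them before calling getTranslations. The sketch below encodes the forward-payload fields and defaults listed above; normalize_forward_payload is a hypothetical helper, not part of the package, and the check that windowTitlesLanguage is a two-letter lowercase code is an assumption based on the ISO 639-1 description.

```python
from typing import Any, Dict

# Defaults as listed in the forward configuration options.
FORWARD_DEFAULTS: Dict[str, Any] = {
    "preserveWordsList": [],
    "getTokenList": False,
    "windowTitlesLanguage": None,
}


def normalize_forward_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check the mandatory fields and fill in the documented defaults (hypothetical)."""
    if payload.get("translationType") != "forward":
        raise ValueError('translationType must be "forward"')
    titles = payload.get("windowTitles")
    if not isinstance(titles, list) or not all(isinstance(t, str) for t in titles):
        raise ValueError("windowTitles must be a list of strings")

    result = {**FORWARD_DEFAULTS, **payload}
    # Copy the list so callers never share (and mutate) the default object.
    result["preserveWordsList"] = list(result["preserveWordsList"])

    if not isinstance(result["getTokenList"], bool):
        raise ValueError("getTokenList must be a bool")
    lang = result["windowTitlesLanguage"]
    # Assumption: a two-letter lowercase ISO 639-1 code such as "fr" or "es".
    if lang is not None and not (isinstance(lang, str) and len(lang) == 2
                                 and lang.isalpha() and lang.islower()):
        raise ValueError("windowTitlesLanguage must be an ISO 639-1 code or None")
    return result


print(normalize_forward_payload({
    "windowTitles": ["Fenêtre de terminal"],
    "translationType": "forward",
}))
```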
Backward Translation Payload (fields within alteredWindowTitles entries):
- alteredWindowTitles (str, mandatory): The translated window title you wish to revert.
- tokenMapping (Dict[int, int], mandatory): Original-to-translated token index mapping used during forward translation.
- translationType (str, mandatory): Must be "backward" for backward translation.
config.py
```python
device = "cpu"  # or "cuda"
```
The device is set to "cpu" by default. Change it to "cuda" to run on a GPU.
Project Structure
```
└── src
    └── windowtitles_translation
        ├── constants.py
        ├── config.py
        ├── commonMethods.py
        ├── backwardTranslation.py
        ├── forwardTranslation.py
        ├── forwardTranslationWithSemantics.py
        ├── main.py
        └── data
            ├── input-data
            └── output-data
```
Download files
File details
Details for the file windowtitles_translation_llm-0.1.0.tar.gz.
File metadata
- Download URL: windowtitles_translation_llm-0.1.0.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d6398174c86152abf2299ea2c320fa72590c97eab809dedc6033daf1b58df7fe |
| MD5 | 6a84093090aee413cd9e3a44af8fdc95 |
| BLAKE2b-256 | 01a04467394ef1bc278baa0b837b6b705f1b618cb234c17ec49d4795f7f968da |
File details
Details for the file windowtitles_translation_llm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: windowtitles_translation_llm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 843711beffd0b050b972c4c326ba8fa22e39bad85b112a8cd5716772782b1acb |
| MD5 | b74aa0f7190b31dbaf443c7a1b89cba4 |
| BLAKE2b-256 | 4b9d72d94223664dc98cb59c913f34c56a83deeb1440753bf31f7e45364243c5 |