
This project forward-translates and backward-translates window titles with the help of a Llama model.

Project description

  • Forward Translation

    • Word-wise translation of input strings from any detected source language → English.
    • Optional token-level mapping (getTokenList) for round-trip tracking.
    • Preserve specified “protected” tokens (e.g. IDs, placeholders, rules).
  • Preserving Semantics during Translation

    • Chunk-wise translation of input strings from any detected source language → English.
    • Merges common prepositional phrases, translates whole sentences when possible, then aligns individual parts for better context.
    • Uses BERT-based alignment to keep multi-word units intact.
  • Backward Translation

    • Restoration of original text by inverting the mapping produced during forward translation.
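
The round trip described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon stands in for the real translation model, the function names are hypothetical, and the mapping format (position to source/target pair) is a simplification rather than the tokenMapping format this package produces.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for the real translation model.
TOY_LEXICON: Dict[str, str] = {"fenêtre": "window", "de": "of"}

Mapping = Dict[int, Tuple[str, str]]


def forward_translate(title: str, preserve: List[str]) -> Tuple[str, Mapping]:
    """Translate word by word, keep protected tokens, and record a mapping."""
    protected = {word.lower() for word in preserve}
    translated: List[str] = []
    mapping: Mapping = {}
    for index, token in enumerate(title.split()):
        if token.lower() in protected:
            target = token  # protected tokens pass through unchanged
        else:
            target = TOY_LEXICON.get(token.lower(), token)
        translated.append(target)
        mapping[index] = (token, target)
    return " ".join(translated), mapping


def backward_translate(translated: str, mapping: Mapping) -> str:
    """Restore the original title by inverting the recorded mapping."""
    restored: List[str] = []
    for index, token in enumerate(translated.split()):
        source, target = mapping.get(index, (token, token))
        # Only restore when the token still matches what was produced earlier.
        restored.append(source if token == target else token)
    return " ".join(restored)


english, token_map = forward_translate("Fenêtre de terminal", ["terminal"])
print(english)
print(backward_translate(english, token_map))
```

Keeping the source token next to its translation lets the backward step restore the original casing and accents without calling a model again.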

📦 Installation

To install the package without cloning the repository, run

pip install windowtitles-translation@git+https://github.com/paxray/windowtitles-translation-llm.git

If the repository is already cloned, running

pip install .

in the root folder also works.

Features

  • Forward translation using Hugging Face MarianMT.
  • Back-translation to reconstruct original window titles.
  • Token mapping for granular translation control.
  • Preserve placeholders and specified words.

Usage

Forward Translation

Use the getTranslations function from main.py:

from main import getTranslations

payload = {
    "windowTitles": ["Fenêtre de terminal", "Título de documento"],
    "preserveWordsList": ["Word"],
    "getTokenList": True,
    "windowTitlesLanguage": None,
    "translationType": "forward" 
}

translations = getTranslations(payload)
print(translations)

Output example:

{
  "Fenêtre de terminal": {
    "language": "fr",
    "translation": "Terminal window",
    "tokenMapping": { ... }
  },
  "Título de documento": {
    "language": "es",
    "translation": "Document title",
    "tokenMapping": { ... }
  }
} 
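
As a sketch of how a result of this shape might be consumed, the snippet below groups translations by detected language. The sample dictionary mirrors the output example, with "tokenMapping" left out because its contents are not shown here; group_by_language is a hypothetical helper, not part of the package.

```python
from collections import defaultdict
from typing import Dict, List

# Sample shaped like the output example; "tokenMapping" is omitted because
# its contents are not shown in this README.
sample_output = {
    "Fenêtre de terminal": {"language": "fr", "translation": "Terminal window"},
    "Título de documento": {"language": "es", "translation": "Document title"},
}


def group_by_language(results: Dict[str, dict]) -> Dict[str, List[str]]:
    """Collect the English translations under each detected source language."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in results.values():
        grouped[entry["language"]].append(entry["translation"])
    return dict(grouped)


print(group_by_language(sample_output))
```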

Backward Translation

Use the getOriginalData function to reverse a translation:

from main import getOriginalData

payload = {
    "alteredWindowTitles": {
        "alteredWindowTitle": { ... }
    },
    "translationType": "backward"
}

originals = getOriginalData(payload)
print(originals)

CLI

By default, running main.py performs a backward translation on sample data:

python main.py

Results are saved to data/output-data/output_file_name.json.

Configuration Options

Below are the configuration options for both forward and backward translation payloads. Each field’s datatype and whether it is mandatory are listed.

Forward Translation Payload:

  • windowTitles (List[str], mandatory):
    • A list of strings representing the window titles to be translated.
  • preserveWordsList (List[str], optional, default=[]):
    • Words or tokens you want to remain unchanged during translation.
  • getTokenList (bool, optional, default=False):
    • If True, includes a mapping of source-to-target tokens in the output.
  • windowTitlesLanguage (str, optional, default=None):
    • ISO 639-1 code (e.g., "fr", "es"). When set, it is used as the source language instead of automatic detection.
  • translationType (str, mandatory):
    • For forward translation, use the keyword "forward".
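
The forward payload fields and defaults listed above could be assembled with a small helper like the following. build_forward_payload is a hypothetical function written for illustration; the package itself only documents the dictionary keys.

```python
from typing import Any, Dict, List, Optional


def build_forward_payload(
    window_titles: List[str],
    preserve_words: Optional[List[str]] = None,
    get_token_list: bool = False,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a forward-translation payload using the documented defaults."""
    if not isinstance(window_titles, list) or not all(
        isinstance(title, str) for title in window_titles
    ):
        raise TypeError("windowTitles must be a list of strings")
    return {
        "windowTitles": window_titles,
        "preserveWordsList": list(preserve_words or []),  # default: []
        "getTokenList": bool(get_token_list),             # default: False
        "windowTitlesLanguage": language,                 # default: None
        "translationType": "forward",                     # mandatory keyword
    }


payload = build_forward_payload(["Fenêtre de terminal"], preserve_words=["Word"])
print(payload)
```

Checking the mandatory field up front surfaces a malformed payload before any model is loaded.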

Backward Translation Payload (fields within alteredWindowTitles entries):

  • alteredWindowTitles (str, mandatory):
    • The translated window title you wish to revert.
  • tokenMapping (Dict[int, int], mandatory):
    • Original-to-translated token index mapping used during forward translation.
  • translationType (str, mandatory):
    • For backward translation, use the keyword "backward".

config.py

  • device = "cpu"  # or "cuda"
  • Defaults to "cpu". Set it to "cuda" to run on a GPU.
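
The device value can also be chosen defensively at runtime. The sketch below assumes the models run on PyTorch, which this README does not state; pick_device is a hypothetical helper and not part of the package.

```python
import importlib.util


def pick_device(preferred: str = "cpu") -> str:
    """Return "cuda" only when it is requested and a GPU is usable."""
    if preferred.lower() != "cuda":
        return "cpu"
    # Assumption: PyTorch is the backend. Fall back if it is not installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device("cuda")
print(device)
```

Falling back to "cpu" avoids a runtime error on machines without a GPU.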

Project Structure

├── src
│   └── windowtitles_translation
│       ├── constants.py
│       ├── config.py
│       ├── commonMethods.py
│       ├── backwardTranslation.py
│       ├── forwardTranslation.py
│       ├── forwardTranslationWithSemantics.py
│       └── main.py
└── data
    ├── input-data
    └── output-data
