AI-native localization pipeline with automated quality control

Omni-Localizer (OL)

AI-native localization pipeline that translates documents through intelligent LLM routing with built-in quality control.

What It Does

Translate documents (Markdown, XLIFF) using LLM APIs
Automatic failover — switches to backup model if primary fails
Quality preservation — shields code blocks, links, images during translation
LLM-based judging — evaluates translation accuracy and fluency
Restoration layer — uses LLM to restore placeholders after translation
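
To make the shielding idea concrete, here is a minimal sketch of swapping protected spans for placeholders before translation and restoring them afterwards. It is not OL's implementation: the `__OL_n__` placeholder format, the regex, and the function names `shield` and `unshield` are assumptions made for illustration.

```python
import re

FENCE = "`" * 3  # built this way to keep a literal fence out of the example

# Order matters: fenced blocks first, and images before links, so the link
# pattern cannot claim part of an image.
SHIELD_PATTERN = re.compile(
    FENCE + r".*?" + FENCE        # fenced code block
    + r"|`[^`\n]+`"               # inline code
    + r"|!\[[^\]]*\]\([^)]*\)"    # image
    + r"|\[[^\]]*\]\([^)]*\)",    # link
    re.DOTALL,
)


def shield(text):
    """Replace protected spans with placeholders; return (text, mapping)."""
    mapping = {}

    def _swap(match):
        key = f"__OL_{len(mapping)}__"  # hypothetical placeholder format
        mapping[key] = match.group(0)
        return key

    return SHIELD_PATTERN.sub(_swap, text), mapping


def unshield(text, mapping):
    """Put the original spans back in place of their placeholders."""
    for key, original in mapping.items():
        text = text.replace(key, original)
    return text
```

Only the shielded text would be sent to the model; `unshield` runs on the translated result. Note that this sketch shields whole links, so link text is left untranslated.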

Quick Start

1. Install

pip install -e .

2. Configure API Keys

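Create a Windows batch file, for example test_en_to_zh.bat, that sets your API keys and runs the CLI. Keep the file out of version control (add it to .gitignore):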

@echo off
set OPENAI_API_KEY=your_api_key
set PYTHONPATH=src
python -m ol_cli translate-md %* -c config/default.yaml -s en -t zh

3. Run

test_en_to_zh.bat your_document.md -o output/

Configuration

config/default.yaml — Example LLM pool configuration:

llm_pool:
  translation:
    - provider: "openai"
      model: "gpt-4o-mini"
      priority: 1
      api_key: "${OPENAI_API_KEY}"
      role: "translation"
    - provider: "openai"
      model: "gpt-4o"
      priority: 2
      api_key: "${OPENAI_API_KEY}"
      role: "translation"
  judging:
    - provider: "openai"
      model: "gpt-4o-mini"
      priority: 1
      api_key: "${OPENAI_API_KEY}"
      role: "judging"
  restoration:
    - provider: "openai"
      model: "gpt-4o-mini"
      priority: 1
      api_key: "${OPENAI_API_KEY}"
      role: "restoration"
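
The pool above can be read as "try models in priority order, with keys taken from the environment". The sketch below illustrates that reading in plain Python. OL delegates routing to LiteLLM, so this is not its code; `resolve_pool`, `call_with_failover`, and the rule that a lower priority number is tried first are assumptions.

```python
import os
from string import Template

# Hypothetical in-memory copy of the "translation" pool from the YAML above.
translation_pool = [
    {"provider": "openai", "model": "gpt-4o", "priority": 2,
     "api_key": "${OPENAI_API_KEY}"},
    {"provider": "openai", "model": "gpt-4o-mini", "priority": 1,
     "api_key": "${OPENAI_API_KEY}"},
]


def resolve_pool(pool, env=None):
    """Sort entries by priority and expand ${VAR} references in api_key."""
    env = os.environ if env is None else env
    ordered = sorted(pool, key=lambda entry: entry["priority"])
    return [
        {**entry, "api_key": Template(entry["api_key"]).safe_substitute(env)}
        for entry in ordered
    ]


def call_with_failover(pool, call):
    """Try each model in order; return the first successful result."""
    errors = []
    for entry in pool:
        try:
            return call(entry)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{entry['model']}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

`safe_substitute` leaves an unknown `${VAR}` untouched instead of raising, which makes a missing key easy to spot in the resolved config.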

CLI Commands

# Translate markdown
ol translate-md <file.md> -c <config.yaml> -s en -t zh -o output/

# Translate XLIFF
ol translate-xliff <file.xlf> -c <config.yaml> -s en -t zh -o output/

# Extract warnings from file
ol extract-warnings <file> -o warnings.md

Output Metadata

YAML Frontmatter (Markdown)

When translating Markdown files, OL automatically adds YAML frontmatter to the output:

---
source_lang: en
target_lang: zh
original_file: input.md
processor: "OL"
version: "0.2.0"
translated_at: 2026-05-22T15:00:00Z
---

# Content follows...
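
Downstream tools may want to read this metadata back. The following is a small sketch that splits the frontmatter from the body using only the standard library. `split_frontmatter` is a hypothetical helper, not part of OL, and it handles only flat `key: value` lines like the ones shown above.

```python
SAMPLE = """---
source_lang: en
target_lang: zh
original_file: input.md
processor: "OL"
version: "0.2.0"
translated_at: 2026-05-22T15:00:00Z
---

# Content follows...
"""


def split_frontmatter(document):
    """Return (metadata, body); metadata values are kept as plain strings."""
    if not document.startswith("---\n"):
        return {}, document
    header, sep, body = document[4:].partition("\n---\n")
    if not sep:  # no closing delimiter, treat the whole file as body
        return {}, document
    metadata = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")  # split at the first colon only
        metadata[key.strip()] = value.strip().strip('"')
    return metadata, body.lstrip("\n")


metadata, body = split_frontmatter(SAMPLE)
```

For anything beyond flat keys, a real YAML parser is the safer choice.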

CLI Control:

# Enable frontmatter (default)
ol translate-md input.md -s en -t zh -o output/

# Disable frontmatter
ol translate-md input.md -s en -t zh -o output/ --no-frontmatter

XLIFF Header Note

When translating XLIFF files, OL adds a header note with translation metadata:

<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <header>
    <note from="OL">Translated from en to zh by OL</note>
  </header>
  <file original="input.xlf" source-language="en" target-language="zh">
    ...
  </file>
</xliff>
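
To check whether a file has already been processed, the note can be read back with the standard library. This is a sketch under the assumption that the note keeps the `from="OL"` attribute shown above; `read_ol_notes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 elements live in this namespace, so lookups must be qualified.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <header>
    <note from="OL">Translated from en to zh by OL</note>
  </header>
  <file original="input.xlf" source-language="en" target-language="zh"/>
</xliff>"""


def read_ol_notes(xliff_text):
    """Return the text of every <note from="OL"> element in the document."""
    root = ET.fromstring(xliff_text)
    notes = root.findall(".//x:note[@from='OL']", XLIFF_NS)
    return [note.text or "" for note in notes]


notes = read_ol_notes(SAMPLE)
```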

Batch Processing

The translate-batch command supports the same frontmatter options:

# With frontmatter (default)
ol translate-batch ./docs/ -s en -t zh -o output/

# Without frontmatter
ol translate-batch ./docs/ -s en -t zh -o output/ --no-frontmatter

Key Features

Feature	Description
Model Pool Failover	LiteLLM router with primary + backup models per role
Content Shielding	Code blocks, links, images preserved during translation
4-Layer Repair	Regex → Span alignment → LLM restoration → Safe fallback
Translation + Judging	JudgeService evaluates quality (adequacy, fluency, terminology)
TM Integration	hypomnema for translation memory lookups
TM/TB/SG Automation	Pre-injection of TM matches + glossary terms for context-aware translation
Term Disambiguation	LLM-based polyseme resolution with confidence fallback
QA Rules Subset	translate-toolkit pofilter rules (accelerators, brackets, printf, variables, xmltags)
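
The 4-layer repair row describes a chain of increasingly expensive fixes. Below is a minimal sketch of such a chain: each layer either returns a candidate or gives up, and the first candidate with all placeholders intact wins. The `__OL_n__` placeholder format, the single regex layer, and returning the source text as the "safe fallback" are assumptions, not OL's behavior.

```python
import re


def placeholders_intact(text, placeholders):
    """True when every placeholder appears exactly once."""
    return all(text.count(p) == 1 for p in placeholders)


def regex_layer(translated, source, placeholders):
    """Layer 1: undo whitespace the model inserted inside placeholders."""
    return re.sub(r"_\s*_\s*OL\s*_\s*(\d+)\s*_\s*_", r"__OL_\1__", translated)


def repair(translated, source, placeholders, layers):
    """Run layers in order; return (text, name of the layer that fixed it)."""
    if placeholders_intact(translated, placeholders):
        return translated, "none"
    for name, layer in layers:
        candidate = layer(translated, source, placeholders)
        if candidate is not None and placeholders_intact(candidate, placeholders):
            return candidate, name
    # Assumed fallback: keep the untranslated source rather than broken output.
    return source, "safe_fallback"


layers = [("regex", regex_layer)]  # alignment and LLM layers would follow
```

Putting the cheap regex layer first means the costlier layers only run on segments that still fail the placeholder check.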

Architecture

MD Channel: Token Stream + 4-layer semantic repair
XLIFF Channel: translate-toolkit based
LLM Routing: LiteLLM with model pool failover
LQA: openevalkit Scorer→Judge + COMET
TM: hypomnema (TMX)
Alignment: span-aligner + VectorAlign
TM/TB/SG Automation: Plan B pre-injection (query TM/glossary before translate(), inject into prompt)

TM/TB/SG Automation (MVP Phase 1)

Omni-Localizer supports agent-native translation memory and terminology workflows for higher-quality, consistent translations.

Glossary Format

JSON glossary with nested structure:

{
  "API endpoint": {
    "translation": "API 端点",
    "variants": {"API endpoint": "API 端点", "API endpoints": "API 端点"},
    "confidence": 0.95
  }
}
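
As an illustration of how this structure can be consumed, the sketch below loads the glossary and resolves a surface form through the `variants` map. `lookup` is a hypothetical helper written for this example, not an OL function.

```python
import json

GLOSSARY_JSON = """
{
  "API endpoint": {
    "translation": "API 端点",
    "variants": {"API endpoint": "API 端点", "API endpoints": "API 端点"},
    "confidence": 0.95
  }
}
"""


def lookup(glossary, surface_form):
    """Return the translation for a surface form, or None if unknown."""
    for entry in glossary.values():
        variants = entry.get("variants", {})
        if surface_form in variants:
            return variants[surface_form]
    # Fall back to the entry's main translation when the key matches directly.
    entry = glossary.get(surface_form)
    return entry["translation"] if entry else None


glossary = json.loads(GLOSSARY_JSON)
```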

Translation Memory + Glossary Injection

When BatchProcessor is initialized with a tm_service and a glossary, the following steps run before translate():

1. TM lookup: TMService.search() queries the source text against the TMX translation memory.
2. The top 3 matches (threshold 0.85) are selected.
3. Relevant glossary terms are extracted via get_relevant_terms() (top 5, selected by relevance, not at random).
4. build_translate_prompt() pre-injects this context into the LLM prompt.
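
The flow above can be sketched end to end in plain Python. This is an illustration only: it uses `difflib.SequenceMatcher` as a stand-in similarity measure and simple substring matching for term relevance, and the function names are deliberately different from OL's to avoid implying they share signatures.

```python
from difflib import SequenceMatcher


def top_tm_matches(source, tm_pairs, threshold=0.85, limit=3):
    """Score each (tm_source, tm_target) pair and keep the best few."""
    scored = [
        (SequenceMatcher(None, source, tm_src).ratio(), tm_src, tm_tgt)
        for tm_src, tm_tgt in tm_pairs
    ]
    kept = [match for match in scored if match[0] >= threshold]
    return sorted(kept, key=lambda match: match[0], reverse=True)[:limit]


def relevant_terms(source, glossary, limit=5):
    """Pick glossary terms that occur in the source, highest confidence first."""
    hits = [
        (entry.get("confidence", 0.0), term, entry["translation"])
        for term, entry in glossary.items()
        if term.lower() in source.lower()
    ]
    hits.sort(reverse=True)
    return [(term, translation) for _, term, translation in hits[:limit]]


def build_prompt(source, matches, terms, source_lang="en", target_lang="zh"):
    """Assemble a prompt with TM matches and glossary terms ahead of the text."""
    lines = [f"Translate from {source_lang} to {target_lang}."]
    if matches:
        lines.append("Translation memory matches:")
        lines += [f"- {src} => {tgt} ({score:.2f})" for score, src, tgt in matches]
    if terms:
        lines.append("Glossary:")
        lines += [f"- {term} => {translation}" for term, translation in terms]
    lines += ["Text:", source]
    return "\n".join(lines)
```

Both context sections are optional, so the same prompt builder works when no match or term is found.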

Terminology Extraction

Build a glossary automatically from source texts using KeyBERT (with sentence-transformers), falling back to YAKE:

from ol_terminology.extractor import extract_terms
terms = extract_terms(["source text 1", "source text 2"])
# Returns dict[str, float]: term -> importance_score
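
One possible next step is turning those scores into glossary stubs for a translator to fill in. The sketch below assumes that a higher score means a more important term and reuses the entry layout from the Glossary Format section; `seed_glossary` and its thresholds are hypothetical.

```python
def seed_glossary(term_scores, min_score=0.3, limit=20):
    """Build empty glossary entries for the highest-scoring terms."""
    ranked = sorted(term_scores.items(), key=lambda item: item[1], reverse=True)
    return {
        term: {"translation": "", "variants": {}, "confidence": 0.0}
        for term, score in ranked[:limit]
        if score >= min_score  # assumed: higher score = more important
    }


# Sample scores in the documented dict[str, float] shape (made-up values).
stubs = seed_glossary({"API endpoint": 0.8, "the": 0.1, "rate limit": 0.5})
```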

Term Disambiguation

Resolve polysemous terms with LLM-based context understanding:

from ol_terminology.disambiguator import disambiguate
resolved = disambiguate(text, glossary, model_pool=model_pool)
# Returns dict[str, str]: term -> resolved_translation

QA Rules Subset

Run a focused set of translate-toolkit pofilter checks:

from ol_lqa.qa_rules import check_pair, QAWarning
warnings = check_pair(source, target)
# Selected rules: accelerators, brackets, printf, variables, xmltags
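
For readers unfamiliar with these checks, here is a simplified illustration of what the printf and brackets rules look for. It is not the translate-toolkit implementation: the regex covers only common printf specifiers, and the bracket check counts ASCII brackets only, so full-width brackets in a target text would be reported as a mismatch.

```python
import re

# Common printf-style specifiers such as %s, %d, %1$s, %.2f (simplified).
PRINTF = re.compile(r"%(?:\d+\$)?[-+0#]*\d*(?:\.\d+)?[sdifuxXeEgGc]")
BRACKETS = "()[]{}"


def check_printf(source, target):
    """True when source and target contain the same set of specifiers."""
    return sorted(PRINTF.findall(source)) == sorted(PRINTF.findall(target))


def check_brackets(source, target):
    """True when each ASCII bracket occurs equally often in both texts."""
    return all(source.count(ch) == target.count(ch) for ch in BRACKETS)
```

Comparing sorted lists lets the translation reorder specifiers, which is common between English and Chinese word order.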

Graceful Degradation

If TM service or glossary is unavailable, translation proceeds without context injection—no blocking errors.
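
The pattern behind this behavior is to treat context lookup as optional and never let it raise. A minimal sketch, assuming a hypothetical `tm_search` callable and a glossary dict in the format shown earlier:

```python
import logging

logger = logging.getLogger("ol_sketch")  # hypothetical logger name


def gather_context(source, tm_search=None, glossary=None):
    """Collect optional TM and glossary context; failures yield empty context."""
    context = {"tm_matches": [], "terms": {}}
    if tm_search is not None:
        try:
            context["tm_matches"] = list(tm_search(source))
        except Exception as exc:
            # Log and continue: a TM outage should not block translation.
            logger.warning("TM lookup failed, continuing without it: %s", exc)
    if glossary:
        context["terms"] = {
            term: entry
            for term, entry in glossary.items()
            if term.lower() in source.lower()
        }
    return context
```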

Dependencies

TM/TB/SG features require additional packages:

pip install -e ".[ml]"  # sentence-transformers + torch
pip install "keybert>=0.9.0" "yake>=0.5.0"  # quoted so the shell does not treat > as a redirect

Agent Usage

Omni-Localizer can be used as a skill by coding agents (OpenCode, Hermes). Agents read the SKILL.md file to understand how to invoke translation.

OpenCode

Add the skill to your project:

cp -r src/.opencode/skills/ol-localizer <your-project>/.opencode/skills/

Reference it in your OpenCode configuration if needed.

For detailed usage, see src/.opencode/skills/ol-localizer/SKILL.md

Hermes

Copy or symlink the skill:

cp -r src/.hermes/skills/ol-localizer ~/.hermes/skills/

Restart Hermes to activate the skill.

For detailed usage, see src/.hermes/skills/ol-localizer/SKILL.md

Environment Variables

Configure your LLM provider API keys in your shell environment, for example OPENAI_API_KEY, which config/default.yaml references as ${OPENAI_API_KEY}.

Testing the Agent Integration

Verify skill files exist:

ls src/.opencode/skills/ol-localizer/SKILL.md
ls src/.hermes/skills/ol-localizer/SKILL.md

Test JSON output (machine-readable for agents):

python -m ol_cli translate-md input.md -c config/default.yaml -s en -t zh -o output/ --json

Expected JSON output:

{"success": true, "input_file": "input.md", "output_file": "output/input.md", "source_lang": "en", "target_lang": "zh"}
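
An agent or script can consume this output as follows. The sketch assumes the JSON object is the last non-empty line on stdout, which the README does not state; `parse_result` is a hypothetical helper.

```python
import json


def parse_result(stdout):
    """Parse the CLI's JSON result and raise if the run did not succeed."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    result = json.loads(lines[-1])  # assumed: JSON is the final line
    if not result.get("success"):
        raise RuntimeError(f"translation failed: {result}")
    return result


sample = (
    '{"success": true, "input_file": "input.md", '
    '"output_file": "output/input.md", "source_lang": "en", "target_lang": "zh"}'
)
result = parse_result(sample)
```

In practice `stdout` would come from running the command above, for example via `subprocess.run(..., capture_output=True, text=True)`.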

Run skill tests:

pytest tests/test_opencode_skill.py tests/test_hermes_skill.py -v

Verify --json flag in help:

python -m ol_cli translate-md --help | grep json

License

MIT

Project details

Maintainers

1StepMore

Release history

0.2.3 (this version): May 23, 2026
0.2.1: May 22, 2026
0.2.0: May 22, 2026
0.1.0: May 19, 2026

Download files

Source Distribution

omni_localizer-0.2.3.tar.gz (109.1 kB)

Uploaded May 23, 2026 Source

Built Distribution

omni_localizer-0.2.3-py3-none-any.whl (66.7 kB)

Uploaded May 23, 2026 Python 3

File details

Details for the file omni_localizer-0.2.3.tar.gz.

File metadata

Download URL: omni_localizer-0.2.3.tar.gz
Upload date: May 23, 2026
Size: 109.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omni_localizer-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`d1cf246f3f7874d41c784cff23144f5a2baedf87bf858976918371304103a8cd`
MD5	`38387d798014b021cb6667e647c80ae9`
BLAKE2b-256	`d97206759c2998599cac34849040a331c2cab5c1f1ff5798b0cc57a4fedd191d`

Provenance

The following attestation bundles were made for omni_localizer-0.2.3.tar.gz:

Publisher: publish.yml on 1StepMore/Omni_Localizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: omni_localizer-0.2.3.tar.gz
- Subject digest: d1cf246f3f7874d41c784cff23144f5a2baedf87bf858976918371304103a8cd
- Sigstore transparency entry: 1610237083
- Sigstore integration time: May 23, 2026
Source repository:
- Permalink: 1StepMore/Omni_Localizer@dce0b52c9206518622cfd424f3252fa466d147ef
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/1StepMore
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dce0b52c9206518622cfd424f3252fa466d147ef
- Trigger Event: push

File details

Details for the file omni_localizer-0.2.3-py3-none-any.whl.

File metadata

Download URL: omni_localizer-0.2.3-py3-none-any.whl
Upload date: May 23, 2026
Size: 66.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omni_localizer-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`17f802d85fe08a8ce845163a9d9fb811b9be1688d7ad33a5bdff88c0fae4d120`
MD5	`36a482d5a18846db421d851f3208ccf6`
BLAKE2b-256	`f7cfd2de44019d0ab36f84b6e03e30890712bee105a71ee0c763bdfb02906f57`

Provenance

The following attestation bundles were made for omni_localizer-0.2.3-py3-none-any.whl:

Publisher: publish.yml on 1StepMore/Omni_Localizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: omni_localizer-0.2.3-py3-none-any.whl
- Subject digest: 17f802d85fe08a8ce845163a9d9fb811b9be1688d7ad33a5bdff88c0fae4d120
- Sigstore transparency entry: 1610237311
- Sigstore integration time: May 23, 2026
Source repository:
- Permalink: 1StepMore/Omni_Localizer@dce0b52c9206518622cfd424f3252fa466d147ef
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/1StepMore
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dce0b52c9206518622cfd424f3252fa466d147ef
- Trigger Event: push
