AI-native localization pipeline with automated quality control
Omni-Localizer (OL)
AI-native localization pipeline that translates documents through intelligent LLM routing with built-in quality control.
What It Does
- Translate documents (Markdown, XLIFF) using LLM APIs
- Automatic failover — switches to backup model if primary fails
- Quality preservation — shields code blocks, links, images during translation
- LLM-based judging — evaluates translation accuracy and fluency
- Restoration layer — uses LLM to restore placeholders after translation
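The shielding and restoration steps listed above can be illustrated with a minimal sketch. It assumes a plain regex approach and a hypothetical `@@OL<n>@@` placeholder format; OL's actual token format and shielding rules are not documented here, and its restoration layer uses an LLM rather than the simple lookup shown.

```python
import re

# One combined pattern, scanned left to right:
# fenced code, inline code, whole images, then link targets (URL only,
# so the visible link text stays translatable).
SHIELD_RE = re.compile(
    r"```.*?```"                    # fenced code blocks
    r"|`[^`\n]+`"                   # inline code
    r"|!\[[^\]]*\]\([^)]*\)"        # images
    r"|(?<=\]\()[^)\s]+(?=\))",     # link URLs
    re.DOTALL,
)


def shield(text):
    """Swap protected spans for numbered placeholders (hypothetical format)."""
    stash = []

    def _swap(match):
        stash.append(match.group(0))
        return f"@@OL{len(stash) - 1}@@"

    return SHIELD_RE.sub(_swap, text), stash


def unshield(text, stash):
    """Put the original spans back in place of their placeholders."""
    return re.sub(r"@@OL(\d+)@@", lambda m: stash[int(m.group(1))], text)
```

Only the shielded text would be sent to the translation model; `unshield` is applied to the model's output once the placeholders have been checked or repaired.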
Quick Start
1. Install
pip install -e .
2. Configure API Keys
Create a .bat file such as test_en_to_zh.bat (keep it out of version control via .gitignore) that sets your API keys and invokes the CLI:
@echo off
set OPENAI_API_KEY=your_api_key
set PYTHONPATH=src
python -m ol_cli translate-md %* -c config/default.yaml -s en -t zh
3. Run
test_en_to_zh.bat your_document.md -o output/
Configuration
config/default.yaml — Example LLM pool configuration:
llm_pool:
  translation:
    - provider: "openai"
      model: "gpt-4o-mini"
      priority: 1
      api_key: "${OPENAI_API_KEY}"
      role: "translation"
    - provider: "openai"
      model: "gpt-4o"
      priority: 2
      api_key: "${OPENAI_API_KEY}"
      role: "translation"
  judging:
    - provider: "openai"
      model: "gpt-4o-mini"
      priority: 1
      api_key: "${OPENAI_API_KEY}"
      role: "judging"
  restoration:
    - provider: "openai"
      model: "gpt-4o-mini"
      priority: 1
      api_key: "${OPENAI_API_KEY}"
      role: "restoration"
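How a pool like this could drive failover is sketched below. OL delegates routing to LiteLLM, so this is not its implementation; the sketch assumes that a lower `priority` number means "try first" and that `${NAME}` references are expanded from the environment. `call_model` is a hypothetical callable standing in for the real provider call.

```python
import os
import re

_ENV_REF = re.compile(r"\$\{(\w+)\}")


def expand_env(value, env=None):
    """Replace ${NAME} references with values from the environment."""
    env = os.environ if env is None else env
    return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), value)


def call_with_failover(pool, prompt, call_model, env=None):
    """Try each pool entry in ascending priority until one succeeds."""
    errors = []
    for entry in sorted(pool, key=lambda e: e["priority"]):
        api_key = expand_env(entry["api_key"], env)
        try:
            return call_model(entry["model"], api_key, prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the backup
            errors.append(f"{entry['model']}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

With the translation pool above, a failed `gpt-4o-mini` request would be retried on `gpt-4o`; the error is raised only once every entry for that role has failed.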
CLI Commands
# Translate markdown
ol translate-md <file.md> -c <config.yaml> -s en -t zh -o output/
# Translate XLIFF
ol translate-xliff <file.xlf> -c <config.yaml> -s en -t zh -o output/
# Extract warnings from file
ol extract-warnings <file> -o warnings.md
Output Metadata
YAML Frontmatter (Markdown)
When translating Markdown files, OL automatically adds YAML frontmatter to the output:
---
source_lang: en
target_lang: zh
original_file: input.md
processor: "OL"
version: "0.2.0"
translated_at: 2026-05-22T15:00:00Z
---
# Content follows...
CLI Control:
# Enable frontmatter (default)
ol translate-md input.md -s en -t zh -o output/
# Disable frontmatter
ol translate-md input.md -s en -t zh -o output/ --no-frontmatter
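For readers who post-process OL output, the frontmatter block shown above could be produced along these lines. This is a sketch that mirrors the documented fields; the function names and the `enabled` switch (standing in for `--no-frontmatter`) are illustrative, not OL's internal API.

```python
from datetime import datetime, timezone


def build_frontmatter(source_lang, target_lang, original_file, version, now=None):
    """Render the metadata fields as a YAML frontmatter block."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fields = [
        ("source_lang", source_lang),
        ("target_lang", target_lang),
        ("original_file", original_file),
        ("processor", '"OL"'),
        ("version", f'"{version}"'),
        ("translated_at", now.strftime("%Y-%m-%dT%H:%M:%SZ")),
    ]
    body = "\n".join(f"{key}: {value}" for key, value in fields)
    return f"---\n{body}\n---\n"


def add_frontmatter(markdown, enabled=True, **meta):
    """Prepend frontmatter unless disabled."""
    return build_frontmatter(**meta) + markdown if enabled else markdown
```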
XLIFF Header Note
When translating XLIFF files, OL adds a header note with translation metadata:
<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<header>
<note from="OL">Translated from en to zh by OL</note>
</header>
<file original="input.xlf" source-language="en" target-language="zh">
...
</file>
</xliff>
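A comparable note can be added with the standard library, as the sketch below shows. It follows the layout of the sample above (a `header` element directly under `xliff`) and is only an illustration; OL's XLIFF channel is built on translate-toolkit, and `add_header_note` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Serialize XLIFF elements without an "ns0:" prefix.
ET.register_namespace("", XLIFF_NS)


def add_header_note(xliff_text, source_lang, target_lang):
    """Insert an OL note into the header, creating the header if missing."""
    root = ET.fromstring(xliff_text)
    header = root.find(f"{{{XLIFF_NS}}}header")
    if header is None:
        header = ET.Element(f"{{{XLIFF_NS}}}header")
        root.insert(0, header)
    note = ET.SubElement(header, f"{{{XLIFF_NS}}}note", {"from": "OL"})
    note.text = f"Translated from {source_lang} to {target_lang} by OL"
    return ET.tostring(root, encoding="unicode")
```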
Batch Processing
The translate-batch command accepts the same frontmatter options:
# With frontmatter (default)
ol translate-batch ./docs/ -s en -t zh -o output/
# Without frontmatter
ol translate-batch ./docs/ -s en -t zh -o output/ --no-frontmatter
Key Features
| Feature | Description |
|---|---|
| Model Pool Failover | LiteLLM router with primary + backup models per role |
| Content Shielding | Code blocks, links, images preserved during translation |
| 4-Layer Repair | Regex → Span alignment → LLM restoration → Safe fallback |
| Translation + Judging | JudgeService evaluates quality (adequacy, fluency, terminology) |
| TM Integration | hypomnema for translation memory lookups |
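The layered repair in the table can be read as a fallback chain: each layer proposes a fix, and the first result that passes a placeholder check wins. The sketch below assumes that reading. The `@@OL<n>@@` token format, the validity check, and returning the untranslated source as the safe fallback are all assumptions; only a toy regex layer is implemented, and the span alignment and LLM layers would be further callables with the same signature.

```python
import re

PLACEHOLDER = re.compile(r"@@OL\d+@@")


def placeholders_intact(source, candidate):
    """True when the candidate carries exactly the source's placeholders."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(candidate))


def regex_repair(source, translated):
    """Layer 1 (illustrative): remove spaces a model inserted inside tokens."""
    return re.sub(r"@@\s*OL\s*(\d+)\s*@@", r"@@OL\1@@", translated)


def repair(source, translated, layers):
    """Run layers in order; fall back to the source text if none succeeds."""
    if placeholders_intact(source, translated):
        return translated, "untouched"
    for name, layer in layers:
        candidate = layer(source, translated)
        if candidate is not None and placeholders_intact(source, candidate):
            return candidate, name
    return source, "safe_fallback"
```

Returning the name of the layer that succeeded makes it easy to log how often the cheaper layers suffice before an LLM call is needed.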
Architecture
- MD Channel: Token Stream + 4-layer semantic repair
- XLIFF Channel: translate-toolkit based
- LLM Routing: LiteLLM with model pool failover
- LQA: openevalkit Scorer→Judge + COMET
- TM: hypomnema (TMX)
- Alignment: span-aligner + VectorAlign
Agent Usage
Omni-Localizer can be used as a skill by coding agents (OpenCode, Hermes). Agents read the SKILL.md file to understand how to invoke translation.
OpenCode
- Add the skill to your project:
cp -r src/.opencode/skills/ol-localizer <your-project>/.opencode/skills/
- Reference it in your OpenCode configuration if needed
For detailed usage, see src/.opencode/skills/ol-localizer/SKILL.md
Hermes
- Copy or symlink the skill:
cp -r src/.hermes/skills/ol-localizer ~/.hermes/skills/
- Restart Hermes to activate
For detailed usage, see src/.hermes/skills/ol-localizer/SKILL.md
Environment Variables
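Set your LLM provider API keys in your shell environment before running OL. For example, config/default.yaml reads OPENAI_API_KEY through the ${OPENAI_API_KEY} reference.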
Testing the Agent Integration
Verify skill files exist:
ls src/.opencode/skills/ol-localizer/SKILL.md
ls src/.hermes/skills/ol-localizer/SKILL.md
Test JSON output (machine-readable for agents):
python -m ol_cli translate-md input.md -c config/default.yaml -s en -t zh -o output/ --json
Expected JSON output:
{"success": true, "input_file": "input.md", "output_file": "output/input.md", "source_lang": "en", "target_lang": "zh"}
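An agent or wrapper script could consume this output roughly as follows. The sketch assumes the result is a single-line JSON object that may be preceded by log lines on stdout, and that the keys are exactly those in the sample above; `parse_ol_result` is a hypothetical helper, not part of OL.

```python
import json

REQUIRED_KEYS = {"success", "input_file", "output_file", "source_lang", "target_lang"}


def parse_ol_result(stdout):
    """Return the result dict from the last JSON line in captured stdout."""
    for line in reversed(stdout.strip().splitlines()):
        line = line.strip()
        if line.startswith("{"):
            result = json.loads(line)
            missing = REQUIRED_KEYS - result.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return result
    raise ValueError("no JSON object found in output")
```

The input would typically come from `subprocess.run(..., capture_output=True, text=True).stdout` for the `translate-md ... --json` command shown above; checking `result["success"]` before reading `output_file` keeps failures explicit.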
Run skill tests:
pytest tests/test_opencode_skill.py tests/test_hermes_skill.py -v
Verify --json flag in help:
python -m ol_cli translate-md --help | grep json
License
MIT