openai-llm-translate
A Python translation toolkit for OpenAI-compatible LLM APIs. It supports plain text, long documents, Markdown, HTML/rich text, batch translation, retries, glossary hints, and controlled async concurrency.
Features
- OpenAI SDK-compatible sync and async clients
- Plain text translation for short strings
- Long document splitting with context windows
- Markdown protection for code blocks, inline code, and URLs
- DOM-aware HTML translation with preserved tags and skipped code/script blocks
- Batched HTML text-node translation to reduce API calls
- Batch document translation
- Retry handling for transient API errors
- Configurable async concurrency via max_concurrent
Installation
This project uses uv:
uv sync
Or install the package in editable mode from this repository:
uv pip install -e .
Configuration
The package works with OpenAI-compatible APIs. Keep secrets in environment variables:
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"
OPENAI_BASE_URL is optional when using the default OpenAI endpoint.
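As a minimal sketch of wiring these variables into the constructor, the helper below collects them into keyword arguments. The helper name `translator_kwargs_from_env` and the fallback model value are assumptions made for illustration; they are not part of the package.

```python
import os
from typing import Mapping


def translator_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect LLMTranslator keyword arguments from environment variables.

    Hypothetical helper, not part of llm_translate.
    """
    kwargs = {
        "api_key": env["OPENAI_API_KEY"],  # required; raises KeyError if unset
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),  # fallback is an assumption
    }
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:  # omit the key entirely to use the default endpoint
        kwargs["base_url"] = base_url
    return kwargs
```

With the variables exported, `LLMTranslator(**translator_kwargs_from_env())` would then keep the API key out of source code.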
Quick start
from llm_translate import LLMTranslator

translator = LLMTranslator(
    api_key="your-api-key",
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
)
result = translator.translate(
    "Large language models can translate text.",
    source_lang="en",
    target_lang="zh-CN",
)
print(result.text)
Long document translation
Long documents are split into chunks, translated with neighboring context, and merged back in source order.
result = translator.translate_document(
    long_text,
    source_lang="en",
    target_lang="zh-CN",
)
print(result.text)
print(len(result.chunks))
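The split-with-context idea can be illustrated with a small standalone sketch. It is not the package's implementation: the paragraph-based greedy packing and the function names `split_into_chunks` and `with_context` are assumptions chosen to show the general technique.

```python
def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate  # a single oversized paragraph is kept whole
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def with_context(chunks: list[str]) -> list[tuple[str, str, str]]:
    """Pair each chunk with its previous and next neighbor ("" at the edges)."""
    padded = [""] + chunks + [""]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]
```

Each `(previous, chunk, next)` triple could be sent as one request, with only the middle element translated; joining the translated chunks in list order restores the source order.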
Markdown translation
Markdown mode protects fenced code blocks, inline code, and URLs before translation, then restores them after translation.
result = translator.translate_document(
    markdown_text,
    source_lang="en",
    target_lang="zh-CN",
    format="markdown",
)
The older markdown=True option is also supported:
result = translator.translate_document(markdown_text, target_lang="zh-CN", markdown=True)
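The protect-then-restore step can be sketched with placeholders, as below. This is an illustration of the technique rather than the package's code: the regular expression, the `@@N@@` placeholder format, and the names `protect` and `restore` are all assumptions.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Order matters: fenced blocks first, then inline code, then bare URLs.
PROTECTED = re.compile(
    FENCE + r".*?" + FENCE + r"|`[^`\n]+`|https?://[^\s)>\]]+",
    re.DOTALL,
)


def protect(text: str) -> tuple[str, list[str]]:
    """Replace protected spans with numbered placeholders."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"@@{len(saved) - 1}@@"

    return PROTECTED.sub(stash, text), saved


def restore(text: str, saved: list[str]) -> str:
    """Put the original spans back in place of their placeholders."""
    return re.sub(r"@@(\d+)@@", lambda m: saved[int(m.group(1))], text)
```

Only the masked text would be sent for translation, so code and links cannot be altered by the model as long as the placeholders survive the round trip.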
HTML translation
HTML mode parses the document with BeautifulSoup, translates visible text nodes, preserves the DOM structure, and skips tags such as script, style, code, pre, svg, noscript, and textarea.
result = translator.translate_document(
    "<article><h1>Hello</h1><p>World</p></article>",
    source_lang="en",
    target_lang="zh-CN",
    format="html",
)
print(result.text)
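To make the idea of visible text nodes concrete, here is a rough sketch that collects them while skipping the tags listed above. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies; the class and function names are hypothetical, and the package's own traversal may differ.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "code", "pre", "svg", "noscript", "textarea"}


class VisibleTextCollector(HTMLParser):
    """Collect text that sits outside any skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # how many skipped tags are currently open
        self.texts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.texts.append(data.strip())


def visible_text_nodes(html: str) -> list[str]:
    collector = VisibleTextCollector()
    collector.feed(html)
    collector.close()
    return collector.texts
```

For example, `visible_text_nodes("<p>Hi</p><pre>x = 1</pre>")` yields only the paragraph text, leaving the `pre` content untouched.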
For async HTML translation, text nodes are grouped into segment batches and requests are limited by max_concurrent.
import asyncio

async def main() -> None:
    translator = LLMTranslator(
        api_key="your-api-key",
        model="gpt-4o-mini",
        max_chunk_chars=1000,
        max_concurrent=3,
    )
    result = await translator.atranslate_document(
        html_text,
        source_lang="zh-CN",
        target_lang="en",
        format="html",
    )
    print(result.text)

asyncio.run(main())
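Grouping text nodes into segment batches might look like the sketch below. It assumes that batches are bounded by a character budget (for instance the value of max_chunk_chars); that link, and the function name `batch_segments`, are assumptions rather than documented behavior.

```python
def batch_segments(segments: list[str], max_chars: int) -> list[list[str]]:
    """Group consecutive segments so each batch stays within max_chars.

    A segment longer than max_chars ends up alone in its own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for segment in segments:
        if current and size + len(segment) > max_chars:
            batches.append(current)  # flush before the budget is exceeded
            current, size = [], 0
        current.append(segment)
        size += len(segment)
    if current:
        batches.append(current)
    return batches
```

Each batch then maps to one API request instead of one request per text node, which is where the reduction in calls comes from.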
Batch translation
results = translator.translate_document_batch(
    ["First document.", "Second document."],
    source_lang="en",
    target_lang="zh-CN",
)
print([result.text for result in results])
Async batch translation preserves input order while limiting concurrency:
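import asyncio

async def main() -> None:
    results = await translator.atranslate_document_batch(
        documents,
        source_lang="en",
        target_lang="zh-CN",
    )
    print([result.text for result in results])

asyncio.run(main())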
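Order preservation under a concurrency cap is commonly achieved with a semaphore plus `asyncio.gather`. The sketch below shows that general pattern; it is not taken from the package, and `translate_all` and its `translate_one` callback are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def translate_all(
    documents: list[str],
    translate_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> list[str]:
    """Run translate_one over all documents, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(document: str) -> str:
        async with semaphore:  # wait here while the cap is reached
            return await translate_one(document)

    # gather returns results in argument order, not completion order.
    return await asyncio.gather(*(bounded(doc) for doc in documents))
```

Because `gather` keeps argument order, a slow first document does not change where its result appears in the output list.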
Glossary hints
Pass a glossary to bias term translation:
result = translator.translate_document(
    text,
    source_lang="en",
    target_lang="zh-CN",
    glossary={"watch": "手表", "battery life": "电池续航"},
)
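One plausible way a glossary becomes a hint is to render the matching entries as extra prompt text. The sketch below illustrates that idea only; the prompt wording, the case-insensitive substring match, and the name `glossary_hint` are assumptions, and the package may build its prompts differently.

```python
def glossary_hint(text: str, glossary: dict[str, str]) -> str:
    """Render only the glossary entries whose source term occurs in text."""
    lowered = text.lower()
    lines = [
        f"- {source} -> {target}"
        for source, target in glossary.items()
        if source.lower() in lowered  # simple substring match, case-insensitive
    ]
    if not lines:
        return ""
    return "Use these term translations:\n" + "\n".join(lines)
```

Filtering to terms that actually occur keeps the prompt short when a large glossary is shared across many chunks.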
Real API example
Run the included example after setting environment variables:
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o-mini"
uv run python examples/real_translation.py
Optional tuning:
export OPENAI_BASE_URL="https://api.openai.com/v1"
export TRANSLATE_MAX_CHUNK_CHARS="1000"
export TRANSLATE_MAX_CONCURRENT="3"
export TRANSLATE_MAX_RETRIES="2"
Development
Run tests:
uv run pytest
Current test coverage includes text splitting, Markdown protection, client APIs, retry behavior, batch flows, async document concurrency, and HTML translation.
Release
Publishing is handled by GitHub Actions when a version tag is pushed.
Before the first release, configure PyPI Trusted Publishing for this repository:
- PyPI project name: openai-llm-translate
- Repository owner: yunhai-dev
- Repository name: llm-translate
- Workflow name: publish.yml
- Environment name: pypi
Then create and push a version tag:
git tag v0.1.0
git push origin v0.1.0
The workflow runs the test suite, builds the package with uv build, and publishes with uv publish.
Project status
This package is under active development. The current implementation focuses on robust OpenAI-compatible translation flows for text, Markdown, and HTML documents.