


openai-llm-translate

A Python translation toolkit for OpenAI-compatible LLM APIs. It supports plain text, long documents, Markdown, HTML/rich text, batch translation, retries, glossary hints, and controlled async concurrency.

Features

  • OpenAI SDK-compatible sync and async clients
  • Plain text translation for short strings
  • Long document splitting with context windows
  • Markdown protection for code blocks, inline code, and URLs
  • DOM-aware HTML translation with preserved tags and skipped code/script blocks
  • Batched HTML text-node translation to reduce API calls
  • Batch document translation
  • Retry handling for transient API errors
  • Configurable async concurrency via max_concurrent

Installation

This project uses uv:

uv sync

Or install the package in editable mode from this repository:

uv pip install -e .

Configuration

The package works with OpenAI-compatible APIs. Keep secrets in environment variables:

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"

OPENAI_BASE_URL is optional when using the default OpenAI endpoint.
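A minimal sketch of collecting these variables before building a translator follows. The helper name load_openai_settings and the gpt-4o-mini fallback are illustrative assumptions, not part of the package. base_url is left out entirely when OPENAI_BASE_URL is unset, so the constructor's default endpoint applies.

```python
import os
from typing import Mapping


def load_openai_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect translator settings from environment variables (hypothetical helper)."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    settings = {
        "api_key": api_key,
        # Assumed fallback model; use whatever your endpoint actually serves.
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
    }
    # Only pass base_url when it is set, so the default endpoint is used otherwise.
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        settings["base_url"] = base_url
    return settings
```

The resulting dict uses the same keyword names as the Quick start, so it can be unpacked with LLMTranslator(**load_openai_settings()).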

Quick start

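import os

from llm_translate import LLMTranslator

# Read the key from the environment instead of hard-coding it in source.
translator = LLMTranslator(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
)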

result = translator.translate(
    "Large language models can translate text.",
    source_lang="en",
    target_lang="zh-CN",
)

print(result.text)

Long document translation

Long documents are split into chunks, translated with neighboring context, and merged back in source order.

result = translator.translate_document(
    long_text,
    source_lang="en",
    target_lang="zh-CN",
)

print(result.text)
print(len(result.chunks))
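The split, context, and merge steps can be sketched as follows. This is an illustration, not the package's splitter. It assumes paragraphs are separated by blank lines and uses a character budget similar in spirit to max_chunk_chars. Chunk, split_into_chunks, and merge_translations are hypothetical names. The tail of the previous chunk is kept as read-only context for the next request.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    text: str
    context_before: str  # tail of the previous chunk, sent as context only


def split_into_chunks(text: str, max_chars: int = 1000, context_chars: int = 200) -> list[Chunk]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk
    in this sketch; a real splitter would break it further (e.g. by sentence).
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    pieces: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = para
        else:
            current = candidate
    if current:
        pieces.append(current)

    chunks: list[Chunk] = []
    for i, piece in enumerate(pieces):
        previous = pieces[i - 1][-context_chars:] if i > 0 and context_chars > 0 else ""
        chunks.append(Chunk(index=i, text=piece, context_before=previous))
    return chunks


def merge_translations(translated: dict[int, str]) -> str:
    """Reassemble translated chunks in source order, whatever order they finished in."""
    return "\n\n".join(translated[i] for i in sorted(translated))
```

Keying results by chunk index means the merge stays correct even when chunks are translated concurrently and complete out of order.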

Markdown translation

Markdown mode protects fenced code blocks, inline code, and URLs before translation, then restores them after translation.

result = translator.translate_document(
    markdown_text,
    source_lang="en",
    target_lang="zh-CN",
    format="markdown",
)

The older markdown=True flag is still accepted as an alternative to format="markdown":

result = translator.translate_document(markdown_text, target_lang="zh-CN", markdown=True)
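The protect-then-restore idea can be sketched with numbered placeholders. The placeholder format @@P0@@ and the regular expressions are assumptions for illustration; the package may mark protected spans differently. The sketch handles only backtick fences. The model would be asked to leave the placeholders untouched.

```python
import re

# Matches are found left to right, so a URL inside a fenced block or inline
# code is captured as part of that larger span rather than on its own.
_PROTECTED = re.compile(
    r"`{3}.*?`{3}"            # fenced code blocks (backtick fences only)
    r"|`[^`\n]+`"             # inline code
    r"|https?://[^\s)>\]]+",  # bare URLs and link targets
    re.DOTALL,
)


def protect(text: str) -> tuple[str, list[str]]:
    """Swap protected spans for numbered placeholders before translation."""
    saved: list[str] = []

    def _stash(match: re.Match[str]) -> str:
        saved.append(match.group(0))
        return f"@@P{len(saved) - 1}@@"

    return _PROTECTED.sub(_stash, text), saved


def restore(text: str, saved: list[str]) -> str:
    """Put the original spans back after translation."""
    return re.sub(r"@@P(\d+)@@", lambda m: saved[int(m.group(1))], text)
```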

HTML translation

HTML mode parses the document with BeautifulSoup, translates visible text nodes, preserves the DOM structure, and skips tags such as script, style, code, pre, svg, noscript, and textarea.

result = translator.translate_document(
    "<article><h1>Hello</h1><p>World</p></article>",
    source_lang="en",
    target_lang="zh-CN",
    format="html",
)

print(result.text)
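The package uses BeautifulSoup for this step. To stay dependency-free, this sketch substitutes the standard library's html.parser. It is not the package's implementation, but it shows the same idea: copy every tag verbatim, leave text under the skipped tags unchanged, and pass only visible text nodes to a translate callable. The callable stands in for the LLM call.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable

# Same skip list as described above; text under these tags is copied unchanged.
SKIP_TAGS = {"script", "style", "code", "pre", "svg", "noscript", "textarea"}
RAW_TEXT_TAGS = {"script", "style"}  # html.parser delivers their contents undecoded


class TextNodeRewriter(HTMLParser):
    """Re-emit an HTML document, passing each visible text node through translate."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.out: list[str] = []
        self.skip_depth = 0
        self.in_raw_text = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag in RAW_TEXT_TAGS:
            self.in_raw_text = True
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, no closing tag emitted

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        if tag in RAW_TEXT_TAGS:
            self.in_raw_text = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_raw_text:
            self.out.append(data)  # script/style source was not entity-decoded
        elif self.skip_depth or not data.strip():
            self.out.append(escape(data, quote=False))  # re-escape decoded text
        else:
            core = data.strip()
            leading = data[: len(data) - len(data.lstrip())]
            trailing = data[len(data.rstrip()):]
            self.out.append(leading + escape(self.translate(core), quote=False) + trailing)

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_html(html_text: str, translate: Callable[[str], str]) -> str:
    parser = TextNodeRewriter(translate)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Leading and trailing whitespace of each text node is kept outside the translated core, so inline spacing between elements survives.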

For async HTML translation, text nodes are grouped into segment batches and requests are limited by max_concurrent.

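import asyncio
import os

from llm_translate import LLMTranslator

# Sample input; any HTML string works here.
html_text = "<article><h1>你好</h1><p>世界</p></article>"


async def main() -> None:
    translator = LLMTranslator(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o-mini",
        max_chunk_chars=1000,
        max_concurrent=3,  # at most three requests in flight
    )
    result = await translator.atranslate_document(
        html_text,
        source_lang="zh-CN",
        target_lang="en",
        format="html",
    )
    print(result.text)

asyncio.run(main())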
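Grouping text nodes into segment batches can be sketched as a size-bounded packing step, plus a numbered format that lets the reply be mapped back to each node. group_segments, pack_batch, unpack_batch, and the [[n]] line format are hypothetical. This sketch assumes segments contain no newlines; the package's own batching format is not documented here.

```python
import re


def group_segments(segments: list[str], max_chars: int = 1000) -> list[list[int]]:
    """Group segment indices into batches whose combined text stays within max_chars.

    A segment longer than max_chars gets a batch of its own.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    size = 0
    for i, seg in enumerate(segments):
        if current and size + len(seg) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(i)
        size += len(seg)
    if current:
        batches.append(current)
    return batches


def pack_batch(segments: list[str], indices: list[int]) -> str:
    """Number each segment so the reply can be mapped back to its text node."""
    return "\n".join(f"[[{i}]] {segments[i]}" for i in indices)


def unpack_batch(reply: str) -> dict[int, str]:
    """Parse '[[n]] text' lines from a model reply (assumed output format)."""
    out: dict[int, str] = {}
    for match in re.finditer(r"^\[\[(\d+)\]\] ?(.*)$", reply, re.MULTILINE):
        out[int(match.group(1))] = match.group(2)
    return out
```

Each batch becomes one request, which is how batching reduces API calls. The batches themselves can then be sent concurrently under the max_concurrent limit.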

Batch translation

results = translator.translate_document_batch(
    ["First document.", "Second document."],
    source_lang="en",
    target_lang="zh-CN",
)

print([result.text for result in results])

Async batch translation preserves input order while limiting concurrency:

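import asyncio

async def main() -> None:
    # translator is created as in Quick start; documents is a list of strings.
    results = await translator.atranslate_document_batch(
        documents,
        source_lang="en",
        target_lang="zh-CN",
    )
    print([result.text for result in results])

asyncio.run(main())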

Glossary hints

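Pass a glossary that maps source terms to preferred translations. Entries are passed to the model as hints, so they bias term choice rather than guarantee it: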

result = translator.translate_document(
    text,
    source_lang="en",
    target_lang="zh-CN",
    glossary={"watch": "手表", "battery life": "电池续航"},
)
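One way to turn a glossary into a prompt hint is to list only the terms that actually occur in the text being translated. Both the filtering and the prompt wording are assumptions; the package may format its glossary hint differently. glossary_hint is a hypothetical helper.

```python
def glossary_hint(text: str, glossary: dict[str, str]) -> str:
    """Build a prompt fragment listing only the glossary terms that occur in text.

    Matching is a case-insensitive substring test, which is a simplification:
    "watch" would also match inside "watchful".
    """
    lowered = text.lower()
    hits = {src: dst for src, dst in glossary.items() if src.lower() in lowered}
    if not hits:
        return ""
    lines = [f"- {src} -> {dst}" for src, dst in sorted(hits.items())]
    return "Use these term translations where they apply:\n" + "\n".join(lines)
```

Filtering per chunk keeps prompts short when a large glossary is applied to a long document.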

Real API example

Run the included example after setting environment variables:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o-mini"
uv run python examples/real_translation.py

Optional tuning:

export OPENAI_BASE_URL="https://api.openai.com/v1"
export TRANSLATE_MAX_CHUNK_CHARS="1000"
export TRANSLATE_MAX_CONCURRENT="3"
export TRANSLATE_MAX_RETRIES="2"
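A retry limit such as TRANSLATE_MAX_RETRIES can drive exponential backoff, as sketched below. It rests on several assumptions that the package may not share. The value counts retries after the first attempt. The default of 2 simply mirrors the example value above. TransientError stands in for retryable API failures such as timeouts, 429s, and 5xx responses. call_with_retries is a hypothetical helper.

```python
import os
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Placeholder for retryable failures (timeouts, rate limits, 5xx)."""


def call_with_retries(
    func: Callable[[], T],
    max_retries: Optional[int] = None,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying TransientError with exponential backoff and jitter.

    max_retries counts retries after the first attempt, so 2 means up to 3 calls.
    """
    if max_retries is None:
        max_retries = int(os.environ.get("TRANSLATE_MAX_RETRIES", "2"))
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # 0.5s, 1s, 2s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable for max_retries >= 0")
```

Injecting sleep keeps the function testable without real delays.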

Development

Run tests:

uv run pytest

Current test coverage includes text splitting, Markdown protection, client APIs, retry behavior, batch flows, async document concurrency, and HTML translation.

Release

Publishing is handled by GitHub Actions when a version tag is pushed.

Before the first release, configure PyPI Trusted Publishing for this repository:

  • PyPI project name: openai-llm-translate
  • Repository owner: yunhai-dev
  • Repository name: llm-translate
  • Workflow name: publish.yml
  • Environment name: pypi

Then create and push a version tag:

git tag v0.1.0
git push origin v0.1.0

The workflow runs the test suite, builds the package with uv build, and publishes with uv publish.

Project status

This package is under active development. The current implementation focuses on OpenAI-compatible translation flows for plain text, Markdown, and HTML documents.



Download files


Source Distribution

openai_llm_translate-0.1.2.tar.gz (9.2 kB)

Uploaded Source

Built Distribution


openai_llm_translate-0.1.2-py3-none-any.whl (15.6 kB)

Uploaded Python 3

File details

Details for the file openai_llm_translate-0.1.2.tar.gz.

File metadata

  • Download URL: openai_llm_translate-0.1.2.tar.gz
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.17 (uv publish, running in CI on Ubuntu 24.04)

File hashes

Hashes for openai_llm_translate-0.1.2.tar.gz
Algorithm Hash digest
SHA256 438a92721fad720ccc7c5f6e613e61c825f6bfd3e3f5b63090a1a4d756933679
MD5 a9e0fd470c4e4a856c74ca3f0e725dbf
BLAKE2b-256 b81cc14861752c08c0756ba631cf010377bbb622bfdc7de833611040f1201503


File details

Details for the file openai_llm_translate-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: openai_llm_translate-0.1.2-py3-none-any.whl
  • Size: 15.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.17 (uv publish, running in CI on Ubuntu 24.04)

File hashes

Hashes for openai_llm_translate-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 aef981f80314cfa3a89718b1767387a0cf01630fa212b0dd401fb3e0edf9d9a4
MD5 02e400f1494f1e4eefe569bfd5528c70
BLAKE2b-256 1e973957f71318f5e78c07908e6e53e43405b298a898364330a6db9f063953c7
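A downloaded file can be checked against the SHA256 digests listed above using only hashlib. sha256_of and verify are illustrative helper names. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest, ignoring letter case."""
    return sha256_of(path) == expected_sha256.lower()
```

For the wheel, call verify(Path("openai_llm_translate-0.1.2-py3-none-any.whl"), "aef981f80314cfa3a89718b1767387a0cf01630fa212b0dd401fb3e0edf9d9a4"). It returns True only if the bytes match the published digest.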

