openai-llm-translate

A Python translation toolkit for OpenAI-compatible LLM APIs. It supports plain text, long documents, Markdown, HTML/rich text, batch translation, retries, glossary hints, and controlled async concurrency.

Features

  • OpenAI SDK-compatible sync and async clients
  • Plain text translation for short strings
  • Long document splitting with context windows
  • Markdown protection for code blocks, inline code, and URLs
  • DOM-aware HTML translation with preserved tags and skipped code/script blocks
  • Batched HTML text-node translation to reduce API calls
  • Batch document translation
  • Retry handling for transient API errors
  • Configurable async concurrency via max_concurrent

Installation

This project uses uv:

uv sync

Or install the package in editable mode from this repository:

uv pip install -e .

Configuration

The package works with OpenAI-compatible APIs. Keep secrets in environment variables:

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4o-mini"

OPENAI_BASE_URL is optional when using the default OpenAI endpoint.
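
As a minimal sketch of wiring these variables into the constructor, the helper below collects them into keyword arguments. The name load_settings and the fallback model are assumptions for illustration and are not part of the package.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect translator settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {
        # Required: a missing key raises KeyError early.
        "api_key": env["OPENAI_API_KEY"],
        # The fallback model name is an assumption for this sketch.
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
    }
    # Pass base_url only when it is set, so the default endpoint applies otherwise.
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        settings["base_url"] = base_url
    return settings
```

The result can then be unpacked into the constructor, for example LLMTranslator(**load_settings()).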

Quick start

import os

from llm_translate import LLMTranslator

translator = LLMTranslator(
    api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment, not source code
    base_url="https://api.openai.com/v1",
    model="gpt-4o-mini",
)

result = translator.translate(
    "Large language models can translate text.",
    source_lang="en",
    target_lang="zh-CN",
)

print(result.text)

Long document translation

Long documents are split into chunks, translated with neighboring context, and merged back in source order.

result = translator.translate_document(
    long_text,
    source_lang="en",
    target_lang="zh-CN",
)

print(result.text)
print(len(result.chunks))
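
To make the split-with-context idea concrete, here is a rough sketch of one way to do it. It is not the package's implementation: the function names, the paragraph-based splitting rule, and the one-neighbor context window are all assumptions.

```python
from typing import List, Tuple


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def with_context(chunks: List[str]) -> List[Tuple[str, str, str]]:
    """Pair each chunk with its previous and next neighbor ("" at the edges)."""
    result = []
    for i, chunk in enumerate(chunks):
        before = chunks[i - 1] if i > 0 else ""
        after = chunks[i + 1] if i + 1 < len(chunks) else ""
        result.append((before, chunk, after))
    return result
```

Each (before, chunk, after) triple could be sent as one request in which only the middle part is translated. Joining the translated chunks in list order then gives the merged document.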

Markdown translation

Markdown mode protects fenced code blocks, inline code, and URLs before translation, then restores them after translation.

result = translator.translate_document(
    markdown_text,
    source_lang="en",
    target_lang="zh-CN",
    format="markdown",
)

The older markdown=True option is also supported:

result = translator.translate_document(markdown_text, target_lang="zh-CN", markdown=True)
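
The protect-then-restore step can be pictured with the sketch below. The placeholder format, the regular expression, and the function names are assumptions made for illustration, and the package may handle more Markdown constructs than this does.

```python
import re
from typing import Dict, Tuple

# Fenced blocks are tried first, then inline code, then bare URLs.
PROTECTED = re.compile(r"```.*?```|`[^`\n]+`|https?://[^\s)>\]]+", re.DOTALL)


def protect(markdown: str) -> Tuple[str, Dict[str, str]]:
    """Swap protected spans for opaque placeholder tokens."""
    saved: Dict[str, str] = {}

    def stash(match: re.Match) -> str:
        token = f"@@P{len(saved)}@@"
        saved[token] = match.group(0)
        return token

    return PROTECTED.sub(stash, markdown), saved


def restore(text: str, saved: Dict[str, str]) -> str:
    """Put the original spans back in place of their tokens."""
    for token, original in saved.items():
        text = text.replace(token, original)
    return text
```

Only the masked text would be sent to the model. The scheme depends on the model copying the tokens through unchanged, so a real implementation would likely check that every token survived before restoring.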

HTML translation

HTML mode parses the document with BeautifulSoup, translates visible text nodes, preserves the DOM structure, and skips tags such as script, style, code, pre, svg, noscript, and textarea.

result = translator.translate_document(
    "<article><h1>Hello</h1><p>World</p></article>",
    source_lang="en",
    target_lang="zh-CN",
    format="html",
)

print(result.text)
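
The sketch below illustrates the idea of translating visible text nodes while leaving tags and skipped elements alone. To stay dependency-free it uses the standard library's html.parser rather than BeautifulSoup, so it is a stand-in technique and not how the package works internally. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List

SKIP_TAGS = {"script", "style", "code", "pre", "svg", "noscript", "textarea"}


class TextNodeTranslator(HTMLParser):
    """Re-emit HTML unchanged except for visible text nodes."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        # Keep character references as-is so they can be passed through verbatim.
        super().__init__(convert_charrefs=False)
        self.translate = translate
        self.out: List[str] = []
        self.skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Whitespace-only nodes and skipped content pass through untouched.
        if self.skip_depth or not data.strip():
            self.out.append(data)
        else:
            self.out.append(self.translate(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    parser = TextNodeTranslator(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Passing str.upper as the translate callable is a quick way to see which nodes would be sent for translation. Note that this sketch splits text at character references, which a DOM-based approach avoids.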

For async HTML translation, text nodes are grouped into segment batches and requests are limited by max_concurrent.

import asyncio

# Sample input for illustration; any HTML string works here.
html_text = "<article><h1>你好</h1><p>世界</p></article>"

async def main() -> None:
    translator = LLMTranslator(
        api_key="your-api-key",
        model="gpt-4o-mini",
        max_chunk_chars=1000,
        max_concurrent=3,
    )
    result = await translator.atranslate_document(
        html_text,
        source_lang="zh-CN",
        target_lang="en",
        format="html",
    )
    print(result.text)

asyncio.run(main())
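
Grouping text nodes into segment batches might look roughly like the sketch below. Whether the package sizes batches by character count, and how it relates the budget to max_chunk_chars, is an assumption here. The function name is hypothetical.

```python
from typing import List


def batch_segments(segments: List[str], max_chars: int) -> List[List[str]]:
    """Group consecutive segments so each batch stays within max_chars.

    A segment longer than max_chars gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for segment in segments:
        if current and size + len(segment) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(segment)
        size += len(segment)
    if current:
        batches.append(current)
    return batches
```

Each batch would then map to one API request, so a page with many short text nodes needs far fewer calls than one request per node.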

Batch translation

results = translator.translate_document_batch(
    ["First document.", "Second document."],
    source_lang="en",
    target_lang="zh-CN",
)

print([result.text for result in results])

Async batch translation preserves input order while limiting concurrency:

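import asyncio

async def main() -> None:
    results = await translator.atranslate_document_batch(
        documents,
        source_lang="en",
        target_lang="zh-CN",
    )
    print([result.text for result in results])

asyncio.run(main())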
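
A common way to get order-preserving, bounded concurrency in asyncio is a semaphore combined with asyncio.gather. The sketch below shows that pattern under the assumption that the package does something similar. The names gather_limited and worker are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_limited(
    items: List[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrent: int,
) -> List[str]:
    """Run worker over items with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(item: str) -> str:
        async with semaphore:
            return await worker(item)

    # gather returns results in argument order, not completion order.
    return await asyncio.gather(*(run(item) for item in items))
```

Because gather keeps argument order, a slow first document does not shift the position of later results.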

Glossary hints

Pass a glossary to bias term translation:

result = translator.translate_document(
    text,
    source_lang="en",
    target_lang="zh-CN",
    glossary={"watch": "手表", "battery life": "电池续航"},
)
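
One plausible way a glossary can bias the output is by rendering it into the prompt as a list of preferred term pairs. The sketch below assumes that approach. The wording of the instruction and the function name are hypothetical, not taken from the package.

```python
from typing import Dict


def glossary_prompt(glossary: Dict[str, str]) -> str:
    """Render glossary pairs as an instruction block for the prompt."""
    if not glossary:
        return ""
    lines = [f"- {source} -> {target}" for source, target in glossary.items()]
    header = "Prefer these term translations when they fit the context:"
    return header + "\n" + "\n".join(lines)
```

Since the terms are hints rather than hard replacements, the model can still translate "watch" as a verb where the noun 手表 would be wrong.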

Real API example

Run the included example after setting environment variables:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o-mini"
uv run python examples/real_translation.py

Optional tuning:

export OPENAI_BASE_URL="https://api.openai.com/v1"
export TRANSLATE_MAX_CHUNK_CHARS="1000"
export TRANSLATE_MAX_CONCURRENT="3"
export TRANSLATE_MAX_RETRIES="2"
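
For intuition about what a retry limit such as TRANSLATE_MAX_RETRIES bounds, here is a generic retry loop with exponential backoff. It is a sketch, not the package's retry code: the exception types, delays, and function name are assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 2,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to max_retries times on the given exceptions."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # Out of retries: surface the last error.
            # Exponential backoff with a little jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise AssertionError("unreachable when max_retries >= 0")
```

With max_retries=2 a request is attempted at most three times. Errors outside retry_on propagate immediately.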

Development

Run tests:

uv run pytest

Current test coverage includes text splitting, Markdown protection, client APIs, retry behavior, batch flows, async document concurrency, and HTML translation.

Release

Publishing is handled by GitHub Actions when a version tag is pushed.

Before the first release, configure PyPI Trusted Publishing for this repository:

  • PyPI project name: openai-llm-translate
  • Repository owner: yunhai-dev
  • Repository name: llm-translate
  • Workflow name: publish.yml
  • Environment name: pypi

Then create and push a version tag:

git tag v0.1.0
git push origin v0.1.0

The workflow runs the test suite, builds the package with uv build, and publishes with uv publish.

Project status

This package is under active development. The current implementation focuses on robust OpenAI-compatible translation flows for text, Markdown, and HTML documents.
