Skip to main content

No project description provided

Project description

abersetz

Translate entire directories of text and Markdown files using modern AI. Feed it a folder; get back a translated folder. No boilerplate, no broken formatting.

What it does

Abersetz takes a file or directory tree, detects the text format (plain text or HTML), slices large documents into chunks at natural sentence and paragraph boundaries, feeds each chunk to a translation engine, and stitches the results back together preserving the original layout.

Translation memory carries vocabulary terms forward across chunks so "widget" in paragraph 1 still means "widget" in paragraph 47.

Engines

Abersetz speaks to several translation backends through a unified interface:

Selector What it uses
tr/google, tr/bing, … translators package — scrapes web translation endpoints
dt/google, dt/deepl, dt/microsoft, … deep-translator — more stable, official APIs
ullm/<profile> Any OpenAI-compatible LLM endpoint (OpenAI, Anthropic, SiliconFlow, local Ollama)
hysf Tencent Hunyuan via SiliconFlow API
mthy/mlx, mthy/gguf Tencent Hunyuan-MT running locally (Apple Silicon or CPU via llama.cpp)
gemma/mlx, gemma/gguf Google TranslateGemma running locally

LLM engines wrap text in XML tags and extract the <output> block from the response, which makes them tolerant of chatty models that add extra commentary.

Install

pip install abersetz
# or
uv pip install abersetz

Quick start

# Translate a single file to Spanish using Google (via translators)
abersetz translate file.md --to-lang es --engine tr/google

# Translate a directory tree to Polish using OpenAI
abersetz translate ./docs --to-lang pl --engine ullm/openai

# Dry run — verify paths and settings without burning API credits
abersetz translate ./docs --to-lang de --dry-run

# Validate that your API keys work
abersetz validate

Output files land in a subdirectory named after the target language by default (e.g. ./docs/pl/). Use --output-dir to redirect them, or --write-over to replace files in place.

CLI reference

abersetz translate <path> [options]

  --engine TEXT      Engine selector, e.g. tr/google, ullm/openai, mthy/mlx
  --from-lang TEXT   Source language code (default: auto-detect)
  --to-lang TEXT     Target language code (required)
  --output-dir PATH  Where to write translated files
  --write-over       Overwrite source files instead of creating a subdirectory
  --chunk-size INT   Max tokens per chunk for LLM engines
  --save-voc         Write a .voc.json sidecar file with accumulated terminology
  --dry-run          Show what would be translated without calling any API
  --recurse / --no-recurse  Walk subdirectories (default: on)
  --xclude PATTERN   Glob pattern(s) to skip

abersetz validate    Ping all configured engines with a test phrase
abersetz list        Show available engines from config

Configuration

Drop an abersetz.toml in your project root or ~/.config/abersetz/config.toml. Example with OpenAI:

[defaults]
engine = "ullm/openai"
to_lang = "pl"
chunk_size = 2000

[engines.ullm.options.profiles.openai]
model = "gpt-4o-mini"
base_url = ""  # leave empty for official OpenAI endpoint

[credentials]
openai = "sk-..."  # or set OPENAI_API_KEY env var

For local Hunyuan-MT on Apple Silicon:

[engines.mthy.options]
backend = "mlx"
mlx_path = "/path/to/Tencent-HunyuanMT-mlx"
max_tokens = 2048

Python API

from abersetz.pipeline import TranslatorOptions, translate_path
from pathlib import Path

results = translate_path(
    Path("./docs"),
    TranslatorOptions(engine="tr/google", to_lang="es"),
)
for r in results:
    print(f"{r.source} -> {r.destination} ({r.chunks} chunks)")

How chunking works

Translation APIs reject large inputs. LLMs have context windows. Abersetz handles both:

  • HTML: sent as one piece so tags stay intact.
  • Plain text / Markdown: split by the semantic-text-splitter library at sentence and paragraph boundaries, respecting the chunk_size setting. Falls back to brute-force character slicing if the library is unavailable.

Vocabulary accumulated during earlier chunks is included in the prompt for later ones (for LLM engines), so terminology stays consistent across the whole document.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

abersetz-1.0.24.tar.gz (2.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

abersetz-1.0.24-py3-none-any.whl (43.1 kB view details)

Uploaded Python 3

File details

Details for the file abersetz-1.0.24.tar.gz.

File metadata

  • Download URL: abersetz-1.0.24.tar.gz
  • Upload date:
  • Size: 2.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for abersetz-1.0.24.tar.gz
Algorithm Hash digest
SHA256 415a949cb4260db21ba913922488d6731c6ac7dbd9fe710cf0c1cf7e24f9d45b
MD5 3d181f096e7eb415f49d56978e9664be
BLAKE2b-256 8c8634a5e1e729cc1dbd557ed1499a2157d30461a0bfcea6c2186ce3c2a5dcc3

See more details on using hashes here.

File details

Details for the file abersetz-1.0.24-py3-none-any.whl.

File metadata

  • Download URL: abersetz-1.0.24-py3-none-any.whl
  • Upload date:
  • Size: 43.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for abersetz-1.0.24-py3-none-any.whl
Algorithm Hash digest
SHA256 0c9af6be092de5772c6c0e154ccec0e40bd0673a350424bb21ce68c854d0e724
MD5 c2ecff37cd8ccba86114d1afb7d85ea0
BLAKE2b-256 738baf104e8b175d260c8d043176f37434a29f444185564f3945dab38d15dd2a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page