abersetz
Translate entire directories of text and Markdown files using modern AI. Feed it a folder; get back a translated folder. No boilerplate, no broken formatting.
What it does
Abersetz takes a file or directory tree, detects the text format (plain text or HTML), slices large documents into chunks at natural sentence and paragraph boundaries, feeds each chunk to a translation engine, and stitches the results back together preserving the original layout.
Translation memory carries vocabulary terms forward across chunks so "widget" in paragraph 1 still means "widget" in paragraph 47.
Engines
Abersetz speaks to several translation backends through a unified interface:
| Selector | What it uses |
|---|---|
| tr/google, tr/bing, … | translators package: scrapes web translation endpoints |
| dt/google, dt/deepl, dt/microsoft, … | deep-translator: more stable, official APIs |
| ullm/<profile> | Any OpenAI-compatible LLM endpoint (OpenAI, Anthropic, SiliconFlow, local Ollama) |
| hysf | Tencent Hunyuan via SiliconFlow API |
| mthy/mlx, mthy/gguf | Tencent Hunyuan-MT running locally (Apple Silicon or CPU via llama.cpp) |
| gemma/mlx, gemma/gguf | Google TranslateGemma running locally |
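Every selector above has the shape family/variant, with hysf as a bare family name. As a rough illustration of how such a string could be split, here is a minimal sketch; `Selector` and `parse_selector` are hypothetical names invented for this example and are not part of the documented abersetz API.

```python
from typing import NamedTuple, Optional


class Selector(NamedTuple):
    family: str             # e.g. "tr", "dt", "ullm", "hysf", "mthy", "gemma"
    variant: Optional[str]  # e.g. "google", "openai", "mlx"; None for bare selectors


def parse_selector(selector: str) -> Selector:
    """Split 'family/variant' into its parts (hypothetical helper)."""
    family, _, variant = selector.strip().partition("/")
    if not family:
        raise ValueError(f"empty engine selector: {selector!r}")
    return Selector(family, variant or None)


print(parse_selector("tr/google"))  # Selector(family='tr', variant='google')
print(parse_selector("hysf"))       # Selector(family='hysf', variant=None)
```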
LLM engines wrap text in XML tags and extract the <output> block from the response, which makes them tolerant of chatty models that add extra commentary.
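The extraction step can be pictured with a small regular expression. This is a sketch under assumptions, not the code abersetz ships: the function name `extract_output` and the fallback of returning the raw reply when no `<output>` block is found are choices made for this example.

```python
import re

# Assumption: the model was asked to put its translation inside <output>...</output>.
_OUTPUT_RE = re.compile(r"<output>(.*?)</output>", re.DOTALL | re.IGNORECASE)


def extract_output(response: str) -> str:
    """Return the text inside the first <output> block, or the whole reply if none."""
    match = _OUTPUT_RE.search(response)
    if match is None:
        # Fallback chosen for this sketch: keep the raw reply rather than fail.
        return response.strip()
    return match.group(1).strip()


reply = (
    "Sure! Here is the translation:\n"
    "<output>Hola, mundo.</output>\n"
    "Let me know if you need anything else."
)
print(extract_output(reply))  # Hola, mundo.
```

Everything outside the tags is discarded, which is why extra commentary from the model does not leak into the translated file.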
Install
```bash
pip install abersetz
# or
uv pip install abersetz
```
Quick start
```bash
# Translate a single file to Spanish using Google (via translators)
abersetz translate file.md --to-lang es --engine tr/google

# Translate a directory tree to Polish using OpenAI
abersetz translate ./docs --to-lang pl --engine ullm/openai

# Dry run: verify paths and settings without burning API credits
abersetz translate ./docs --to-lang de --dry-run

# Validate that your API keys work
abersetz validate
```
Output files land in a subdirectory named after the target language by default (e.g. ./docs/pl/). Use --output-dir to redirect them, or --write-over to replace files in place.
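The three output modes can be summarized as a path-mapping rule. The sketch below is illustrative only: `destination_for` is a hypothetical helper, and mirroring the source layout below the root is an assumption, not confirmed behavior of abersetz.

```python
from pathlib import Path
from typing import Optional


def destination_for(
    source: Path,
    root: Path,
    to_lang: str,
    output_dir: Optional[Path] = None,
    write_over: bool = False,
) -> Path:
    """Map a source file to its output path (illustrative, not abersetz's code)."""
    if write_over:
        return source  # --write-over: replace the file in place
    # --output-dir if given, otherwise <root>/<to_lang>/
    base = output_dir if output_dir is not None else root / to_lang
    # Assumption: the directory layout below the root is mirrored in the output tree.
    return base / source.relative_to(root)


dest = destination_for(Path("docs/guide/intro.md"), Path("docs"), "pl")
print(dest.as_posix())  # docs/pl/guide/intro.md
```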
CLI reference
```
abersetz translate <path> [options]
  --engine TEXT             Engine selector, e.g. tr/google, ullm/openai, mthy/mlx
  --from-lang TEXT          Source language code (default: auto-detect)
  --to-lang TEXT            Target language code (required)
  --output-dir PATH         Where to write translated files
  --write-over              Overwrite source files instead of creating a subdirectory
  --chunk-size INT          Max tokens per chunk for LLM engines
  --save-voc                Write a .voc.json sidecar file with accumulated terminology
  --dry-run                 Show what would be translated without calling any API
  --recurse / --no-recurse  Walk subdirectories (default: on)
  --xclude PATTERN          Glob pattern(s) to skip

abersetz validate           Ping all configured engines with a test phrase
abersetz list               Show available engines from config
```
Configuration
Put an abersetz.toml in your project root, or use ~/.config/abersetz/config.toml. Example with OpenAI:
```toml
[defaults]
engine = "ullm/openai"
to_lang = "pl"
chunk_size = 2000

[engines.ullm.options.profiles.openai]
model = "gpt-4o-mini"
base_url = ""  # leave empty for official OpenAI endpoint

[credentials]
openai = "sk-..."  # or set OPENAI_API_KEY env var
```
For local Hunyuan-MT on Apple Silicon:
```toml
[engines.mthy.options]
backend = "mlx"
mlx_path = "/path/to/Tencent-HunyuanMT-mlx"
max_tokens = 2048
```
Python API
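```python
from pathlib import Path

from abersetz.pipeline import TranslatorOptions, translate_path

# Translate the ./docs tree into Spanish with Google (via translators).
results = translate_path(
    Path("./docs"),
    TranslatorOptions(engine="tr/google", to_lang="es"),
)

# Report the source, the destination, and the chunk count for each result.
for r in results:
    print(f"{r.source} -> {r.destination} ({r.chunks} chunks)")
```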
How chunking works
Translation APIs reject large inputs. LLMs have context windows. Abersetz handles both:
- HTML: sent as one piece so tags stay intact.
- Plain text / Markdown: split by the semantic-text-splitter library at sentence and paragraph boundaries, respecting the chunk_size setting. Falls back to brute-force character slicing if the library is unavailable.
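The idea behind boundary-aware chunking with a character-slicing fallback can be shown with the standard library alone. This is a simplified stand-in, not the semantic-text-splitter algorithm and not abersetz's code: `chunk_text` is a hypothetical function, it only looks at paragraph breaks, and it measures size in characters rather than tokens.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Greedily pack paragraphs into chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= chunk_size:
            current = candidate  # the paragraph still fits into the open chunk
            continue
        if current:
            chunks.append(current)  # close the open chunk at a paragraph boundary
            current = ""
        if len(paragraph) <= chunk_size:
            current = paragraph
        else:
            # Oversized paragraph: fall back to brute-force character slicing.
            pieces = [
                paragraph[i:i + chunk_size]
                for i in range(0, len(paragraph), chunk_size)
            ]
            chunks.extend(pieces[:-1])
            current = pieces[-1]
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("aaaa\n\nbbbb\n\ncccccccccccc", 10))
# ['aaaa\n\nbbbb', 'cccccccccc', 'cc']
```

The first two paragraphs fit together in one chunk; the third is longer than the limit, so it is sliced by character count.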
Vocabulary accumulated during earlier chunks is included in the prompt for later ones (for LLM engines), so terminology stays consistent across the whole document.
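The carry-forward loop might look roughly like the sketch below. All names here are hypothetical: the `Engine` callable, `build_prompt`, `translate_chunks`, and the `<vocabulary>` and `<input>` tag names are assumptions made for illustration. Only the `<output>` tag and the general idea of passing earlier terminology into later prompts come from the description above.

```python
from typing import Callable

# Hypothetical engine: takes a prompt, returns (translation, newly learned terms).
Engine = Callable[[str], tuple[str, dict[str, str]]]


def build_prompt(chunk: str, vocabulary: dict[str, str], to_lang: str) -> str:
    """Assemble an illustrative prompt that pins previously chosen terms."""
    terms = "\n".join(f"{src} = {dst}" for src, dst in vocabulary.items())
    return (
        f"Translate the text into {to_lang}. Reply inside <output> tags.\n"
        f"<vocabulary>\n{terms}\n</vocabulary>\n"
        f"<input>\n{chunk}\n</input>"
    )


def translate_chunks(
    chunks: list[str], to_lang: str, engine: Engine
) -> tuple[list[str], dict[str, str]]:
    """Translate chunks in order, feeding each prompt the terms learned so far."""
    vocabulary: dict[str, str] = {}
    translated: list[str] = []
    for chunk in chunks:
        text, new_terms = engine(build_prompt(chunk, vocabulary, to_lang))
        translated.append(text)
        vocabulary.update(new_terms)  # later prompts will include these terms
    return translated, vocabulary
```

The vocabulary returned at the end is the kind of data a sidecar file such as the one written by --save-voc could hold.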
License
MIT