abersetz

Translate entire directories of text and Markdown files using modern AI. Feed it a folder; get back a translated folder. No boilerplate, no broken formatting.

What it does

Abersetz takes a file or directory tree, detects each file's format (plain text/Markdown or HTML), slices large documents into chunks at natural sentence and paragraph boundaries, feeds each chunk to a translation engine, and stitches the results back together, preserving the original layout.

For LLM engines, translation memory carries vocabulary terms forward across chunks, so "widget" in paragraph 1 is translated the same way in paragraph 47.

Engines

Abersetz speaks to several translation backends through a unified interface:

Selector                              What it uses
tr/google, tr/bing, …                 translators package: scrapes web translation endpoints
dt/google, dt/deepl, dt/microsoft, …  deep-translator: more stable, official APIs
ullm/<profile>                        Any OpenAI-compatible LLM endpoint (OpenAI, Anthropic, SiliconFlow, local Ollama)
lms                                   LMStudio local models via the official lmstudio SDK
mthy/mlx, mthy/gguf                   Tencent Hunyuan-MT running locally (Apple Silicon or CPU via llama.cpp)
gemma/mlx, gemma/gguf                 Google TranslateGemma running locally

LLM engines wrap text in XML tags and extract the <output> block from the response, which makes them tolerant of chatty models that add extra commentary.
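
The extraction step described above can be pictured with a small sketch. This is not abersetz's actual code: the helper name extract_output and the fallback of returning the whole reply when no tag is found are assumptions made for illustration; only the <output> tag name comes from the description.

```python
import re

# Hypothetical helper: only the <output> tag name is taken from the README.
_OUTPUT_RE = re.compile(r"<output>(.*?)</output>", re.DOTALL | re.IGNORECASE)


def extract_output(response: str) -> str:
    """Return the text inside the first <output> block.

    Assumption: if the model forgot the tags, fall back to the whole reply.
    """
    match = _OUTPUT_RE.search(response)
    if match is None:
        return response.strip()
    return match.group(1).strip()


reply = (
    "Sure, here is the translation:\n"
    "<output>Hola, mundo</output>\n"
    "Let me know if you need anything else."
)
print(extract_output(reply))
```

Anything the model says outside the tags is simply dropped, which is why extra commentary does not leak into the translated file.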

Install

pip install abersetz
# or
uv pip install abersetz

Quick start

# Translate a single file to Spanish using Google (via translators)
abersetz translate file.md --to-lang es --engine tr/google

# Translate a directory tree to Polish using OpenAI
abersetz translate ./docs --to-lang pl --engine ullm/openai

# Dry run — verify paths and settings without burning API credits
abersetz translate ./docs --to-lang de --dry-run

# Validate that your API keys work
abersetz validate

Output files land in a subdirectory named after the target language by default (e.g. ./docs/pl/). Use --output-dir to redirect them, or --write-over to replace files in place.

CLI reference

abersetz translate <path> [options]

  --engine TEXT      Engine selector, e.g. tr/google, ullm/openai, mthy/mlx
  --from-lang TEXT   Source language code (default: auto-detect)
  --to-lang TEXT     Target language code (required)
  --output-dir PATH  Where to write translated files
  --write-over       Overwrite source files instead of creating a subdirectory
  --chunk-size INT   Max tokens per chunk for LLM engines
  --save-voc         Write a .voc.json sidecar file with accumulated terminology
  --dry-run          Show what would be translated without calling any API
  --recurse / --no-recurse  Walk subdirectories (default: on)
  --xclude PATTERN   Glob pattern(s) to skip

abersetz validate    Ping all configured engines with a test phrase
abersetz list        Show available engines from config

Configuration

Put your settings in an abersetz.toml in your project root, or in ~/.config/abersetz/config.toml. Example with OpenAI:

[defaults]
engine = "ullm/openai"
to_lang = "pl"
chunk_size = 2000

[engines.ullm.options.profiles.openai]
model = "gpt-4o-mini"
base_url = ""  # leave empty for official OpenAI endpoint

[credentials]
openai = "sk-..."  # or set OPENAI_API_KEY env var

For local Hunyuan-MT on Apple Silicon:

[engines.mthy.options]
backend = "mlx"
mlx_path = "/path/to/Tencent-HunyuanMT-mlx"
max_tokens = 2048

Python API

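from pathlib import Path

from abersetz.pipeline import TranslatorOptions, translate_path

# Translate everything under ./docs to Spanish with the tr/google engine
results = translate_path(
    Path("./docs"),
    TranslatorOptions(engine="tr/google", to_lang="es"),
)

# Each result reports the source file, the output file, and the chunk count
for r in results:
    print(f"{r.source} -> {r.destination} ({r.chunks} chunks)")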

How chunking works

Translation APIs reject large inputs, and LLMs have limited context windows. Abersetz handles both cases:

  • HTML: sent as one piece so tags stay intact.
  • Plain text / Markdown: split by the semantic-text-splitter library at sentence and paragraph boundaries, respecting the chunk_size setting. Falls back to brute-force character slicing if the library is unavailable.
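
The splitting strategy for plain text can be illustrated with a simplified, standard-library-only sketch. It is not the semantic-text-splitter algorithm and not abersetz's code: it packs whole paragraphs greedily and only slices by characters when a single paragraph is too long, and it measures size in characters rather than tokens. The name chunk_text and the max_chars parameter are hypothetical.

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Greedy paragraph packing with a brute-force slice for oversized paragraphs.

    Simplified stand-in for a semantic splitter; sizes are in characters.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the open chunk
            continue
        if current:
            chunks.append(current)  # close the open chunk at a paragraph boundary
            current = ""
        if len(para) <= max_chars:
            current = para
        else:
            # Fallback: no natural boundary fits, so slice by raw characters.
            for start in range(0, len(para), max_chars):
                chunks.append(para[start:start + max_chars])
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 25
for piece in chunk_text(sample, max_chars=20):
    print(len(piece), repr(piece))
```

Preferring paragraph boundaries keeps each chunk a self-contained unit of meaning, which tends to matter more for translation quality than filling every chunk to the limit.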

Vocabulary accumulated during earlier chunks is included in the prompt for later ones (for LLM engines), so terminology stays consistent across the whole document.
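
A rough sketch of that feedback loop, under stated assumptions: build_prompt, translate_chunks, the prompt wording, the <text> tag, and the shape of the engine callback are all hypothetical, and the fake engine stands in for a real LLM call. The point is only that terms learned from earlier chunks appear in the prompt for later ones.

```python
import json


def build_prompt(chunk: str, vocabulary: dict[str, str], to_lang: str) -> str:
    """Compose a prompt that repeats the terminology agreed on so far (assumed format)."""
    voc = json.dumps(vocabulary, ensure_ascii=False, sort_keys=True)
    return (
        f"Translate the text into '{to_lang}'. "
        f"Use these established term translations: {voc}\n"
        f"<text>{chunk}</text>"
    )


def translate_chunks(chunks, to_lang, call_engine):
    """call_engine(prompt) -> (translated_text, new_terms); a stand-in for an LLM call."""
    vocabulary: dict[str, str] = {}
    translated = []
    for chunk in chunks:
        text, new_terms = call_engine(build_prompt(chunk, vocabulary, to_lang))
        translated.append(text)
        vocabulary.update(new_terms)  # later chunks see terms from earlier ones
    return translated, vocabulary


prompts = []


def fake_engine(prompt):
    """Records the prompt and pretends the model reported one term."""
    prompts.append(prompt)
    return "(translated)", {"widget": "widżet"}


out, voc = translate_chunks(["The widget spins.", "Stop the widget."], "pl", fake_engine)
print(prompts[1])
```

The first prompt carries an empty vocabulary; the second already includes the term reported after the first chunk.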

License

MIT
