abersetz
A minimalist file translator that uses established machine translation engines while keeping configuration portable and repeatable. It follows a simple pipeline (locate → chunk → translate → merge) and provides both a Python API and a CLI powered by fire.
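The locate → chunk → translate → merge pipeline can be sketched in plain Python. This is a minimal illustration, not abersetz's implementation: the helper names (locate, chunk, merge, run_pipeline) are hypothetical, chunking here simply packs blank-line-separated paragraphs up to a character limit instead of using semantic-text-splitter, and translate is any callable you pass in.

```python
from pathlib import Path
from typing import Callable


def locate(root: Path, pattern: str = "*.txt") -> list[Path]:
    # Locate: every file under root (recursively) whose name matches pattern.
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def chunk(text: str, max_chars: int) -> list[str]:
    # Chunk: greedily pack blank-line-separated paragraphs into chunks of at
    # most max_chars characters; a single oversized paragraph stays whole.
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def merge(pieces: list[str]) -> str:
    # Merge: rejoin translated chunks with the separator used for splitting.
    return "\n\n".join(pieces)


def run_pipeline(root: Path, out_dir: Path, translate: Callable[[str], str],
                 max_chars: int = 1200) -> list[Path]:
    # Translate: apply the engine callable chunk by chunk, then mirror the
    # source tree under out_dir.
    written: list[Path] = []
    for src in locate(root):
        pieces = [translate(c) for c in chunk(src.read_text(encoding="utf-8"), max_chars)]
        dest = out_dir / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(merge(pieces), encoding="utf-8")
        written.append(dest)
    return written
```

Passing str.upper as the translator makes the pipeline easy to check offline, in the same spirit as a dry run.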
Why abersetz?
- Translates files, not just strings.
- Supports engines from translators, deep-translator, and pluggable LLM-based backends for consistent terminology.
- Stores engine preferences and credentials using platformdirs, supporting either raw values or environment variables.
- Shares vocabulary across chunks to maintain consistency in long documents.
- Keeps the codebase small: no custom infrastructure, just clear components doing their job.
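Sharing vocabulary across chunks can be illustrated as follows. This sketch assumes the LLM appends a <voc>{...}</voc> block of term-to-translation pairs after its output and that the accumulated glossary is passed back in with each chunk; split_voc, translate_chunks, and the call_llm(chunk, vocab) signature are hypothetical, and the exact tag layout abersetz expects may differ.

```python
import json
import re
from typing import Callable

# Assumed layout: the model appends <voc>{"term": "translation", ...}</voc>
# after the translated text.
VOC_RE = re.compile(r"<voc>(.*?)</voc>", re.DOTALL)


def split_voc(response: str) -> tuple[str, dict[str, str]]:
    """Separate translated text from any <voc> JSON blocks it carries."""
    vocab: dict[str, str] = {}
    for block in VOC_RE.findall(response):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed vocabulary rather than failing the chunk
        if isinstance(data, dict):
            vocab.update({str(k): str(v) for k, v in data.items()})
    return VOC_RE.sub("", response).strip(), vocab


def translate_chunks(
    chunks: list[str], call_llm: Callable[[str, dict[str, str]], str]
) -> tuple[list[str], dict[str, str]]:
    """Translate chunks in order, passing the growing glossary to each call."""
    vocab: dict[str, str] = {}
    out: list[str] = []
    for c in chunks:
        text, new_terms = split_voc(call_llm(c, dict(vocab)))
        # The first choice for a term wins, so terminology stays stable.
        for k, v in new_terms.items():
            vocab.setdefault(k, v)
        out.append(text)
    return out, vocab
```

Keeping the first translation chosen for a term (setdefault) is one way to stop later chunks from drifting; a real engine might instead let later choices override earlier ones.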
Key Features
- Recursive file discovery with include/exclude filters.
- Automatic HTML vs. plain-text detection to preserve markup where possible.
- Semantic chunking via semantic-text-splitter, with per-engine configurable lengths.
- Vocabulary-aware translation pipeline that merges <voc> JSON output from LLM engines.
- Dry-run mode for offline testing and demos.
- Optional vocabulary sidecar files when --save-voc is enabled.
- Built-in abersetz validate command that pings configured engines, reports latency, and shows pricing hints from the research catalog.
- Optional local MLX/GGUF engines for HY-MT and TranslateGemma when configured (mthy, gemma).
Installation
pip install abersetz
Quick Start
First-time Setup
# Auto-discover and configure available translation services
abersetz setup
# Test configured engines with a quick validation
abersetz validate --target-lang es
abersetz setup scans your environment for API keys, tests endpoints, and generates an optimized config. Add --include-community to also include community/self-hosted engines such as LibreTranslate in the defaults.
Basic Translation
# Translate using main CLI
abersetz tr pl ./docs --engine tr/google --output ./build/pl
# Or use the shorthand
abtr pl ./docs --engine tr/google --output ./build/pl
CLI Options
- to_lang: First positional argument; the target language.
- --from-lang: Source language (default: auto).
- --engine: One of:
  - tr/<provider> (e.g., tr/google)
  - dt/<provider> (e.g., dt/deepl)
  - hy
  - ll/<profile>, where profiles are defined in config.
  - Legacy selectors like translators/google still work and are auto-normalized.
- --recurse/--no-recurse: Traverse subdirectories (default: on).
- --write_over: Replace input files instead of writing to the output directory.
- --save-voc: Save merged vocabulary JSON next to each translated file.
- --chunk-size/--html-chunk-size: Override default chunk lengths.
- --verbose: Enable debug logging via loguru.
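Selector normalization can be pictured as a small alias lookup. Only the translators/google to tr/google mapping comes from the option list above; the deep-translator aliases, the accepted family set, and normalize_selector itself are assumptions for illustration.

```python
# Hypothetical alias table: only "translators" -> "tr" is documented above.
FAMILY_ALIASES = {
    "translators": "tr",
    "deep-translator": "dt",  # assumption
    "deep_translator": "dt",  # assumption
}
KNOWN_FAMILIES = {"tr", "dt", "hy", "ll"}


def normalize_selector(selector: str) -> str:
    """Rewrite a legacy family name to its short form, keeping the rest intact."""
    family, sep, rest = selector.strip().partition("/")
    family = FAMILY_ALIASES.get(family.lower(), family.lower())
    if family not in KNOWN_FAMILIES:
        raise ValueError(f"unknown engine family in selector: {selector!r}")
    return f"{family}{sep}{rest}"
```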
Extra options for abersetz engines:
- --family tr|dt|ll|hy: Filter by engine family.
- --configured-only: Show only configured engines.
Extra options for abersetz validate:
- --selectors tr/google,ll/default: Limit checks to specific engines (comma-separated).
- --target-lang es: Set the validation language (default: es).
- --sample-text "Hello!": Use custom text for validation.
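A validate-style check boils down to timing one sample translation per selector. The sketch below is hypothetical (validate, the engines mapping of selector to callable, and the result dictionary shape are illustrative names), and it omits the pricing hints the real command shows.

```python
import time
from typing import Callable


def validate(selectors: str, engines: dict[str, Callable[[str, str], str]],
             target_lang: str = "es", sample_text: str = "Hello!") -> list[dict]:
    """Translate sample_text once per selector and record success and latency."""
    results: list[dict] = []
    for sel in (s.strip() for s in selectors.split(",") if s.strip()):
        engine = engines.get(sel)
        if engine is None:
            results.append({"selector": sel, "ok": False, "error": "not configured"})
            continue
        start = time.perf_counter()
        try:
            output = engine(sample_text, target_lang)
            ok, error = bool(output.strip()), None
        except Exception as exc:  # report the failure instead of aborting the run
            ok, error = False, str(exc)
        latency_ms = round((time.perf_counter() - start) * 1000, 1)
        results.append({"selector": sel, "ok": ok, "error": error, "latency_ms": latency_ms})
    return results
```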
Configuration
abersetz saves runtime configuration under the user config path from platformdirs. The config file includes:
- Global defaults (engine, languages, chunk sizes)
- Engine-specific settings (endpoints, retry policies, HTML behavior)
- Credential entries, supporting either { "env": "ENV_NAME" } or { "value": "actual-secret" }
Example config.toml:
[defaults]
engine = "tr/google"
from_lang = "auto"
to_lang = "en"
chunk_size = 1200
html_chunk_size = 1800
[credentials.siliconflow]
name = "siliconflow"
env = "SILICONFLOW_API_KEY"
[engines.hysf]
chunk_size = 2400
[engines.hysf.credential]
name = "siliconflow"
[engines.hysf.options]
model = "tencent/Hunyuan-MT-7B"
base_url = "https://api.siliconflow.com/v1"
temperature = 0.3
[engines.ullm]
chunk_size = 2400
[engines.ullm.credential]
name = "siliconflow"
[engines.ullm.options.profiles.default]
base_url = "https://api.siliconflow.com/v1"
model = "tencent/Hunyuan-MT-7B"
temperature = 0.3
max_input_tokens = 32000
[engines.ullm.options.profiles.default.prolog]
Local engines (optional):
[engines.mthy]
name = "mthy"
chunk_size = 1200
[engines.mthy.options]
backend = "mlx" # or "gguf"
model_path = "/path/to/HY-MT1.5-7B-8bit"
[engines.gemma]
name = "gemma"
chunk_size = 1200
[engines.gemma.options]
backend = "gguf" # or "mlx"
model_path = "/path/to/translategemma-27b-it-Q8_0.gguf"
n_ctx = 4096
n_gpu_layers = -1
Use abersetz config show and abersetz config path to inspect the file.
CLI Tools
- abersetz: Main CLI exposing the tr (translate), validate, and config commands, plus setup and engines.
- abtr: Shorthand for translation (abersetz tr).
Python API
from abersetz import translate_path, TranslatorOptions
translate_path(
path="docs",
options=TranslatorOptions(to_lang="de", engine="tr/google"),
)
Examples
The examples/ folder includes ready-to-run demos:
- poem_en.txt: Source text.
- poem_pl.txt: Translated sample.
- vocab.json: Vocabulary generated during translation.
- walkthrough.md: Step-by-step CLI usage log.
- validate_report.sh: Captures a validation summary for quick audits.
Development Workflow
uv sync
python -m pytest --cov=. --cov-report=term-missing
ruff check src tests
ruff format src tests
Testing Philosophy
- Unit tests cover every helper directly.
- Integration tests simulate the full pipeline with a stub engine.
- Network calls are mocked; CI never touches real APIs.
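The stub-engine approach can be sketched with the standard library alone. StubEngine and translate_document are hypothetical stand-ins, not abersetz's test fixtures; patching urllib.request.urlopen to raise is one way to make an accidental network call fail the test loudly (it does not cover other HTTP clients).

```python
from unittest import mock


class StubEngine:
    """Deterministic stand-in engine: tags text instead of calling an API."""

    def __init__(self, to_lang: str):
        self.to_lang = to_lang
        self.calls: list[str] = []

    def translate(self, text: str) -> str:
        self.calls.append(text)
        return f"[{self.to_lang}] {text}"


def translate_document(text: str, engine: StubEngine) -> str:
    # Hypothetical pipeline under test: split on blank lines, translate, rejoin.
    return "\n\n".join(engine.translate(p) for p in text.split("\n\n"))


def test_pipeline_uses_stub_and_never_touches_network() -> None:
    engine = StubEngine("pl")
    with mock.patch("urllib.request.urlopen", side_effect=AssertionError("network call")):
        result = translate_document("Hello\n\nworld", engine)
    assert engine.calls == ["Hello", "world"]
    assert result == "[pl] Hello\n\n[pl] world"
```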
License
MIT
Download files
File details
Details for the file abersetz-1.0.22.tar.gz.
File metadata
- Download URL: abersetz-1.0.22.tar.gz
- Upload date:
- Size: 336.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.21 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54fedb86512c05502cbbc53190929aba9d57b1f9c9841ebb1c86f0317a0fa445 |
| MD5 | 900e73b70d8473b35f9f6c59d5c8605d |
| BLAKE2b-256 | 48673a87856955330e408eb7c927ecb5fcd745d5bfc1e689b70c0dbbd7b627e6 |
File details
Details for the file abersetz-1.0.22-py3-none-any.whl.
File metadata
- Download URL: abersetz-1.0.22-py3-none-any.whl
- Upload date:
- Size: 34.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.21 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b45925665cfa2dde0d51bd7b22d846d1d08c3ae6be4347e78d925eeadd0260b8 |
| MD5 | ea7bbf04132b2c29dc77b06d1976e9fa |
| BLAKE2b-256 | fde8b89ff1773afb921fcfea735304a1e0177255820fb5534618cb6a24eaabb4 |