abersetz
Minimalist file translator that reuses proven machine translation engines while keeping configuration portable and repeatable. The tool walks through a simple locate → chunk → translate → merge pipeline and exposes both a Python API and a fire-powered CLI.
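The four pipeline stages can be pictured with a small standalone sketch. This is not abersetz's internal code: the helper names (`locate`, `chunk`, `translate_file`), the paragraph-based chunker, and the callable "engine" are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def locate(root: Path, pattern: str = "*.txt", recurse: bool = True) -> list[Path]:
    """Locate: return matching files under root (or root itself if it is a file)."""
    if root.is_file():
        return [root]
    walker = root.rglob if recurse else root.glob
    return sorted(p for p in walker(pattern) if p.is_file())


def chunk(text: str, max_chars: int) -> list[str]:
    """Chunk: pack whole paragraphs together until max_chars would be exceeded."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_file(
    path: Path,
    out_dir: Path,
    engine: Callable[[str], str],  # hypothetical engine: text in, text out
    max_chars: int = 1200,
) -> Path:
    """Translate each chunk, then merge the results into one output file."""
    text = path.read_text(encoding="utf-8")
    translated = [engine(piece) for piece in chunk(text, max_chars)]
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / path.name
    target.write_text("\n\n".join(translated), encoding="utf-8")
    return target
```

Passing `str.upper` as the engine is enough to exercise the flow offline. The real tool delegates chunking to semantic-text-splitter rather than splitting on blank lines.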
Why abersetz?
- Focuses on translating files, not single strings.
- Reuses stable engines from `translators` and `deep-translator`, plus pluggable LLM-based engines for consistent terminology.
- Persists engine preferences and API secrets with `platformdirs`, supporting either raw values or the environment variable that stores them.
- Shares vocabulary between chunks so long documents stay consistent.
- Keeps a lean codebase: no custom infrastructure, just clear building blocks.
Key Features
- Recursive file discovery with include/exclude filters.
- Automatic HTML vs. plain-text detection to preserve markup when possible.
- Semantic chunking via `semantic-text-splitter`, with configurable lengths per engine.
- Vocabulary-aware translation pipeline that merges `<vocabulary>` JSON emitted by LLM engines.
- Offline-friendly dry-run mode for testing and demos.
- Optional vocabulary sidecar files when `--save-voc` is set.
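The vocabulary handling could look roughly like the sketch below. It assumes the LLM reply wraps a JSON object in `<vocabulary>...</vocabulary>` tags and that later chunks override earlier entries. Both the closing tag and the merge policy are assumptions, and the function names are hypothetical rather than part of the abersetz API.

```python
import json
import re

# Assumed wire format: translated text followed by <vocabulary>{...}</vocabulary>
VOCAB_RE = re.compile(r"<vocabulary>(.*?)</vocabulary>", re.DOTALL)


def split_vocabulary(reply: str) -> tuple[str, dict[str, str]]:
    """Separate the translated text from an optional vocabulary block."""
    match = VOCAB_RE.search(reply)
    if match is None:
        return reply.strip(), {}
    try:
        vocab = json.loads(match.group(1))
    except json.JSONDecodeError:
        vocab = {}  # malformed JSON: keep the text, drop the vocabulary
    if not isinstance(vocab, dict):
        vocab = {}
    text = VOCAB_RE.sub("", reply).strip()
    return text, vocab


def merge_vocabulary(running: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Return a new dict; entries from the newer chunk win (assumed policy)."""
    merged = dict(running)
    merged.update(new)
    return merged
```

The merged dictionary would then be passed along with the next chunk, and written out as a sidecar file when `--save-voc` is set.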
Installation
pip install abersetz
Quick Start
# Using the main CLI
abersetz tr pl ./docs --engine translators/google --output ./build/pl
# Or using the shorthand command
abtr pl ./docs --engine translators/google --output ./build/pl
CLI Options (preview)
- `to_lang`: first positional argument selecting the target language.
- `--from-lang`: source language (defaults to `auto`).
- `--engine`: one of
  - `translators/<provider>` (e.g. `translators/google`)
  - `deep-translator/<provider>` (e.g. `deep-translator/deepl`)
  - `hysf`
  - `ullm/<profile>` where profiles are defined in config.
- `--recurse/--no-recurse`: recurse into subdirectories (defaults to on).
- `--overwrite`: replace input files instead of writing to output dir.
- `--save-voc`: drop merged vocabulary JSON next to each translated file.
- `--chunk-size/--html-chunk-size`: override default chunk lengths.
- `--verbose`: enable debug logging via loguru.
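The engine selector strings follow a `family/variant` shape. The sketch below shows one way such a string could be split and validated. `EngineSelector` and `parse_engine` are hypothetical names and do not describe how abersetz parses the flag internally; only the four family names come from the option list.

```python
from typing import NamedTuple, Optional

# Engine families named in the option list above
KNOWN_FAMILIES = {"translators", "deep-translator", "hysf", "ullm"}


class EngineSelector(NamedTuple):
    family: str
    variant: Optional[str]  # provider or profile; None when omitted


def parse_engine(selector: str) -> EngineSelector:
    """Split 'family/variant' and reject unknown families."""
    family, _, variant = selector.partition("/")
    if family not in KNOWN_FAMILIES:
        raise ValueError(f"unknown engine family: {family!r}")
    return EngineSelector(family, variant or None)
```

For example, `parse_engine("ullm/default")` yields the family `ullm` with the profile `default`, while `parse_engine("hysf")` has no variant.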
Configuration
abersetz stores runtime configuration under the user config path determined by platformdirs. The config file keeps:
- Global defaults (engine, languages, chunk sizes).
- Engine-specific settings (API endpoints, retry policies, HTML behaviour).
- Credential entries, each allowing either `{ "env": "ENV_NAME" }` or `{ "value": "actual-secret" }`.
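A credential entry of either shape can be resolved with a few lines of Python. This is a minimal sketch, assuming that a literal `value` takes precedence over `env` when both are present. The function name `resolve_credential` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    entry: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the secret for a credential entry, or None if it cannot be resolved."""
    if "value" in entry:
        return entry["value"]  # raw secret stored directly in the config
    env_name = entry.get("env")
    if env_name:
        return environ.get(env_name)  # secret looked up in the environment
    return None
```

Keeping only the variable name in the config file means the file can be shared or committed without exposing the secret itself.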
Example snippet (stored in config.toml):
[defaults]
engine = "translators/google"
from_lang = "auto"
to_lang = "en"
chunk_size = 1200
html_chunk_size = 1800
[credentials.siliconflow]
name = "siliconflow"
env = "SILICONFLOW_API_KEY"
[engines.hysf]
chunk_size = 2400
[engines.hysf.credential]
name = "siliconflow"
[engines.hysf.options]
model = "tencent/Hunyuan-MT-7B"
base_url = "https://api.siliconflow.com/v1"
temperature = 0.3
[engines.ullm]
chunk_size = 2400
[engines.ullm.credential]
name = "siliconflow"
[engines.ullm.options.profiles.default]
base_url = "https://api.siliconflow.com/v1"
model = "tencent/Hunyuan-MT-7B"
temperature = 0.3
max_input_tokens = 32000
[engines.ullm.options.profiles.default.prolog]
Use `abersetz config show` and `abersetz config path` to inspect the file.
CLI Tools
- `abersetz`: Main CLI with `tr` (translate) and `config` commands
- `abtr`: Direct translation shorthand (equivalent to `abersetz tr`)
Python API
from abersetz import translate_path, TranslatorOptions
translate_path(
path="docs",
options=TranslatorOptions(to_lang="de", engine="translators/google"),
)
Examples
The examples/ directory holds ready-to-run demos:
- `poem_en.txt`: source text.
- `poem_pl.txt`: translated sample output.
- `vocab.json`: vocabulary generated during translation.
- `walkthrough.md`: step-by-step CLI invocation log.
Development Workflow
uv sync
python -m pytest --cov=. --cov-report=term-missing
ruff check src tests
ruff format src tests
Testing Philosophy
- Every helper has direct unit coverage.
- Integration tests exercise the pipeline with a stub engine.
- Network calls are mocked; real APIs are never hit in CI.
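A stub-engine test in this spirit might look like the following. The `StubEngine` class and the `translate_chunks` helper are illustrative stand-ins, not names taken from the abersetz test suite.

```python
class StubEngine:
    """Offline stand-in for a real engine: records calls and returns tagged text."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def translate(self, text: str, to_lang: str) -> str:
        self.calls.append((text, to_lang))
        return f"[{to_lang}] {text}"


def translate_chunks(chunks: list[str], engine: StubEngine, to_lang: str) -> list[str]:
    """Hypothetical pipeline step: translate each chunk in order."""
    return [engine.translate(chunk, to_lang) for chunk in chunks]


def test_pipeline_calls_engine_once_per_chunk() -> None:
    engine = StubEngine()
    result = translate_chunks(["Hello", "World"], engine, "pl")
    assert result == ["[pl] Hello", "[pl] World"]
    assert engine.calls == [("Hello", "pl"), ("World", "pl")]
```

Because the stub never touches the network, a test like this runs the same way locally and in CI.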
License
MIT