Skip to main content

No project description provided

Project description

abersetz

A minimalist file translator that uses established machine translation engines while keeping configuration portable and repeatable. It follows a simple pipeline: locate → chunk → translate → merge. Provides both a Python API and a CLI powered by fire.

Why abersetz?

  • Translates files, not just strings.
  • Supports engines from translators, deep-translator, and pluggable LLM-based backends for consistent terminology.
  • Stores engine preferences and credentials using platformdirs, supporting either raw values or environment variables.
  • Shares vocabulary across chunks to maintain consistency in long documents.
  • Keeps the codebase small: no custom infrastructure, just clear components doing their job.

Key Features

  • Recursive file discovery with include/exclude filters.
  • Automatic HTML vs. plain-text detection to preserve markup where possible.
  • Semantic chunking via semantic-text-splitter, with per-engine configurable lengths.
  • Vocabulary-aware translation pipeline that merges <voc> JSON output from LLM engines.
  • Dry-run mode for offline testing and demos.
  • Optional vocabulary sidecar files when --save-voc is enabled.
  • Built-in abersetz validate command that pings configured engines, reports latency, and shows pricing hints from the research catalog.
  • Optional local MLX/GGUF engines for HY-MT and TranslateGemma when configured (mthy, gemma).

Installation

pip install abersetz

Quick Start

First-time Setup

# Auto-discover and configure available translation services
abersetz setup

# Test configured engines with a quick validation
abersetz validate --target-lang es

Use abersetz setup --include-community to include community/self-hosted engines like LibreTranslate in the defaults.

This scans your environment for API keys, tests endpoints, and generates an optimized config.

Basic Translation

# Translate using main CLI
abersetz tr pl ./docs --engine tr/google --output ./build/pl

# Or use the shorthand
abtr pl ./docs --engine tr/google --output ./build/pl

CLI Options

  • to_lang: First positional argument specifying target language.
  • --from-lang: Source language (default: auto).
  • --engine: One of:
    • tr/<provider> (e.g., tr/google)
    • dt/<provider> (e.g., dt/deepl)
    • hy
    • ll/<profile> where profiles are defined in config.
      • Legacy selectors like translators/google still work and are auto-normalized.
  • --recurse/--no-recurse: Traverse subdirectories (default: on).
  • --write_over: Replace input files instead of writing to output directory.
  • --save-voc: Save merged vocabulary JSON next to each translated file.
  • --chunk-size / --html-chunk-size: Override default chunk lengths.
  • --verbose: Enable debug logging via loguru.

Extra options for abersetz engines:

  • --family tr|dt|ll|hy: Filter by engine family.
  • --configured-only: Show only configured engines.

Extra options for abersetz validate:

  • --selectors tr/google,ll/default: Limit checks to specific engines (comma-separated).
  • --target-lang es: Set validation language (default: es).
  • --sample-text "Hello!": Use custom text for validation.

Configuration

abersetz saves runtime configuration under the user config path from platformdirs. The config file includes:

  • Global defaults (engine, languages, chunk sizes)
  • Engine-specific settings (endpoints, retry policies, HTML behavior)
  • Credential entries, supporting { "env": "ENV_NAME" } or { "value": "actual-secret" }

Example config.toml:

[defaults]
engine = "tr/google"
from_lang = "auto"
to_lang = "en"
chunk_size = 1200
html_chunk_size = 1800

[credentials.siliconflow]
name = "siliconflow"
env = "SILICONFLOW_API_KEY"

[engines.hysf]
chunk_size = 2400

[engines.hysf.credential]
name = "siliconflow"

[engines.hysf.options]
model = "tencent/Hunyuan-MT-7B"
base_url = "https://api.siliconflow.com/v1"
temperature = 0.3

[engines.ullm]
chunk_size = 2400

[engines.ullm.credential]
name = "siliconflow"

[engines.ullm.options.profiles.default]
base_url = "https://api.siliconflow.com/v1"
model = "tencent/Hunyuan-MT-7B"
temperature = 0.3
max_input_tokens = 32000

[engines.ullm.options.profiles.default.prolog]

Local engines (optional):

[engines.mthy]
name = "mthy"
chunk_size = 1200

[engines.mthy.options]
backend = "mlx" # or "gguf"
model_path = "/path/to/HY-MT1.5-7B-8bit"

[engines.gemma]
name = "gemma"
chunk_size = 1200

[engines.gemma.options]
backend = "gguf" # or "mlx"
model_path = "/path/to/translategemma-27b-it-Q8_0.gguf"
n_ctx = 4096
n_gpu_layers = -1

Use abersetz config show and abersetz config path to inspect the file.

CLI Tools

  • abersetz: Main CLI exposing tr (translate), validate, and config commands.
  • abtr: Shorthand for translation (abersetz tr).

Python API

from abersetz import translate_path, TranslatorOptions

translate_path(
    path="docs",
    options=TranslatorOptions(to_lang="de", engine="tr/google"),
)

Examples

The examples/ folder includes ready-to-run demos:

  • poem_en.txt: Source text.
  • poem_pl.txt: Translated sample.
  • vocab.json: Vocabulary generated during translation.
  • walkthrough.md: Step-by-step CLI usage log.
  • validate_report.sh: Captures validation summary for quick audits.

Development Workflow

uv sync
python -m pytest --cov=. --cov-report=term-missing
ruff check src tests
ruff format src tests

Testing Philosophy

  • Unit tests cover every helper directly.
  • Integration tests simulate the full pipeline with a stub engine.
  • Network calls are mocked; CI never touches real APIs.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

abersetz-1.0.22.tar.gz (336.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

abersetz-1.0.22-py3-none-any.whl (34.4 kB view details)

Uploaded Python 3

File details

Details for the file abersetz-1.0.22.tar.gz.

File metadata

  • Download URL: abersetz-1.0.22.tar.gz
  • Upload date:
  • Size: 336.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.21 {"installer":{"name":"uv","version":"0.9.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for abersetz-1.0.22.tar.gz
Algorithm Hash digest
SHA256 54fedb86512c05502cbbc53190929aba9d57b1f9c9841ebb1c86f0317a0fa445
MD5 900e73b70d8473b35f9f6c59d5c8605d
BLAKE2b-256 48673a87856955330e408eb7c927ecb5fcd745d5bfc1e689b70c0dbbd7b627e6

See more details on using hashes here.

File details

Details for the file abersetz-1.0.22-py3-none-any.whl.

File metadata

  • Download URL: abersetz-1.0.22-py3-none-any.whl
  • Upload date:
  • Size: 34.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.21 {"installer":{"name":"uv","version":"0.9.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for abersetz-1.0.22-py3-none-any.whl
Algorithm Hash digest
SHA256 b45925665cfa2dde0d51bd7b22d846d1d08c3ae6be4347e78d925eeadd0260b8
MD5 ea7bbf04132b2c29dc77b06d1976e9fa
BLAKE2b-256 fde8b89ff1773afb921fcfea735304a1e0177255820fb5534618cb6a24eaabb4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page