abersetz

Minimalist file translator that reuses proven machine translation engines while keeping configuration portable and repeatable. The tool walks through a simple locate → chunk → translate → merge pipeline and exposes both a Python API and a fire-powered CLI.
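The four pipeline stages can be illustrated with a minimal, standard-library-only sketch. This is not abersetz's actual implementation: the function names `locate`, `chunk`, and `translate_file` are hypothetical, the chunker is a naive fixed-width split rather than a semantic one, and `fake_engine` stands in for a real translation engine.

```python
from pathlib import Path
from typing import Callable, List


def locate(root: Path, pattern: str = "*.txt") -> List[Path]:
    """Locate: collect matching files under root, sorted for repeatable runs."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def chunk(text: str, size: int) -> List[str]:
    """Chunk: naive fixed-width split (an assumption; not semantic splitting)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def translate_file(path: Path, engine: Callable[[str], str], size: int = 1200) -> str:
    """Translate every chunk, then merge the results back in original order."""
    pieces = chunk(path.read_text(encoding="utf-8"), size)
    return "".join(engine(piece) for piece in pieces)


def fake_engine(text: str) -> str:
    """Hypothetical stand-in engine: upper-cases instead of translating."""
    return text.upper()
```

Because the engine is just a callable, the same loop works with any backend, and a stand-in like `fake_engine` keeps the sketch runnable offline.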

Why abersetz?

  • Focuses on translating files, not single strings.
  • Reuses stable engines from translators and deep-translator, plus pluggable LLM-based engines for consistent terminology.
  • Persists engine preferences and API credentials in the platformdirs config location; each credential is stored either as a raw value or as the name of the environment variable that holds it.
  • Shares vocabulary between chunks so long documents stay consistent.
  • Keeps a lean codebase: no custom infrastructure, just clear building blocks.

Key Features

  • Recursive file discovery with include/exclude filters.
  • Automatic HTML vs. plain-text detection to preserve markup when possible.
  • Semantic chunking via semantic-text-splitter, with configurable lengths per engine.
  • Vocabulary-aware translation pipeline that merges <vocabulary> JSON emitted by LLM engines.
  • Offline-friendly dry-run mode for testing and demos.
  • Optional vocabulary sidecar files when --save-voc is set.
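As an illustration of the vocabulary-aware step, the sketch below separates a `<vocabulary>` block from an engine reply and folds it into a running glossary. The exact reply layout is an assumption: it supposes the tag wraps a flat JSON object mapping source terms to target terms, and that later chunks override earlier entries. The helper names `split_vocabulary` and `merge_vocabulary` are hypothetical.

```python
import json
import re
from typing import Dict, Tuple

# Assumed reply layout: translated text plus one <vocabulary>{...}</vocabulary> block.
VOCAB_RE = re.compile(r"<vocabulary>(.*?)</vocabulary>", re.DOTALL)


def split_vocabulary(reply: str) -> Tuple[str, Dict[str, str]]:
    """Return (translated text, vocabulary dict) from a raw engine reply."""
    match = VOCAB_RE.search(reply)
    if match is None:
        return reply.strip(), {}
    try:
        vocab = json.loads(match.group(1))
    except json.JSONDecodeError:
        vocab = {}  # malformed JSON: keep the text, drop the vocabulary
    if not isinstance(vocab, dict):
        vocab = {}
    return VOCAB_RE.sub("", reply).strip(), vocab


def merge_vocabulary(known: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Merge without mutating the inputs; newer entries win (an assumption)."""
    merged = dict(known)
    merged.update(new)
    return merged
```

Passing the merged dictionary into the prompt for the next chunk is one way a long document could keep its terminology consistent.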

Installation

pip install abersetz

Quick Start

# Using the main CLI
abersetz tr ./docs --to-lang pl --engine translators/google --output ./build/pl

# Or using the shorthand command
abtr ./docs --to-lang pl --engine translators/google --output ./build/pl

CLI Options (preview)

  • --from-lang: source language (defaults to auto).
  • --to-lang: target language (default en).
  • --engine: one of
    • translators/<provider> (e.g. translators/google)
    • deep-translator/<provider> (e.g. deep-translator/deepl)
    • hysf
    • ullm/<profile> where profiles are defined in config.
  • --recurse/--no-recurse: recurse into subdirectories (defaults to on).
  • --overwrite: replace input files in place instead of writing to the output directory.
  • --save-voc: write the merged vocabulary as a JSON sidecar next to each translated file.
  • --chunk-size / --html-chunk-size: override default chunk lengths.
  • --verbose: enable debug logging via loguru.
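The `--engine` values listed above follow a `family/variant` shape, with `hysf` as a bare family name. A small sketch of how such a selector might be split is shown here; `EngineSelector` and `parse_engine` are hypothetical names, not part of the abersetz API.

```python
from typing import NamedTuple, Optional


class EngineSelector(NamedTuple):
    family: str              # e.g. "translators", "deep-translator", "hysf", "ullm"
    variant: Optional[str]   # provider or profile name, None when absent


def parse_engine(selector: str) -> EngineSelector:
    """Split 'family/variant' into its parts; a bare family has no variant."""
    family, _, variant = selector.strip().partition("/")
    if not family:
        raise ValueError(f"empty engine family in {selector!r}")
    return EngineSelector(family, variant or None)
```

For example, `parse_engine("ullm/default")` yields the family `ullm` and the profile `default`, while `parse_engine("hysf")` yields a selector with no variant.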

Configuration

abersetz stores runtime configuration under the user config path determined by platformdirs. The config file keeps:

  • Global defaults (engine, languages, chunk sizes).
  • Engine-specific settings (API endpoints, retry policies, HTML behaviour).
  • Credential entries, each allowing either { "env": "ENV_NAME" } or { "value": "actual-secret" }.
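The two credential forms can be resolved with a few lines of Python. This is a sketch under assumptions rather than the tool's own code: `resolve_credential` is a hypothetical helper, and giving a literal `value` precedence over `env` is a choice made here for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    entry: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the secret for a credential entry, or None if it is unavailable."""
    env = os.environ if environ is None else environ
    if "value" in entry:   # literal secret stored in the config file
        return entry["value"]
    if "env" in entry:     # indirection: look the secret up in the environment
        return env.get(entry["env"])
    return None
```

The `env` form keeps secrets out of the config file, which makes the file safe to copy between machines.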

Example snippet (stored in config.json):

{
  "defaults": {
    "engine": "translators/google",
    "from_lang": "auto",
    "to_lang": "en",
    "chunk_size": 1200,
    "html_chunk_size": 1800
  },
  "credentials": {
    "siliconflow": {"env": "SILICONFLOW_API_KEY"}
  },
  "engines": {
    "hysf": {
      "chunk_size": 2400,
      "credential": {"name": "siliconflow"},
      "options": {
        "model": "tencent/Hunyuan-MT-7B",
        "base_url": "https://api.siliconflow.com/v1",
        "temperature": 0.3
      }
    },
    "ullm": {
      "chunk_size": 2400,
      "credential": {"name": "siliconflow"},
      "options": {
        "profiles": {
          "default": {
            "base_url": "https://api.siliconflow.com/v1",
            "model": "tencent/Hunyuan-MT-7B",
            "temperature": 0.3,
            "max_input_tokens": 32000,
            "prolog": {}
          }
        }
      }
    }
  }
}

Use abersetz config show and abersetz config path to inspect the file.
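In the example configuration above, `hysf` declares its own `chunk_size` while the `defaults` block carries the global one. A plausible lookup order (an explicit override first, then the engine entry, then the defaults) could be sketched as follows. The precedence and the helper name `effective_chunk_size` are assumptions, not confirmed behaviour.

```python
from typing import Any, Dict, Optional


def effective_chunk_size(
    config: Dict[str, Any],
    engine: str,
    override: Optional[int] = None,
    key: str = "chunk_size",
) -> int:
    """Pick a chunk size: explicit override, then engine entry, then defaults."""
    if override is not None:
        return override
    family = engine.split("/", 1)[0]  # "translators/google" -> "translators"
    engine_cfg = config.get("engines", {}).get(family, {})
    return engine_cfg.get(key, config["defaults"][key])


# Trimmed copy of the example configuration, enough for the lookup.
config = {
    "defaults": {"chunk_size": 1200, "html_chunk_size": 1800},
    "engines": {"hysf": {"chunk_size": 2400}},
}
```

With this data, `hysf` resolves to 2400, an engine without its own entry falls back to 1200, and passing `key="html_chunk_size"` reads the HTML-specific default.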

CLI Tools

  • abersetz: Main CLI with tr (translate) and config commands
  • abtr: Direct translation shorthand (equivalent to abersetz tr)

Python API

from abersetz import translate_path, TranslatorOptions

translate_path(
    path="docs",
    options=TranslatorOptions(to_lang="de", engine="translators/google"),
)

Examples

The examples/ directory holds ready-to-run demos:

  • poem_en.txt: source text.
  • poem_pl.txt: translated sample output.
  • vocab.json: vocabulary generated during translation.
  • walkthrough.md: step-by-step CLI invocation log.

Development Workflow

uv sync
python -m pytest --cov=. --cov-report=term-missing
ruff check src tests
ruff format src tests

Testing Philosophy

  • Every helper has direct unit coverage.
  • Integration tests exercise the pipeline with a stub engine.
  • Network calls are mocked; real APIs are never hit in CI.
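A stub engine in the spirit of the points above might look like the sketch below. `StubEngine` and `run_pipeline` are hypothetical illustrations and do not mirror the project's real test suite or engine interface.

```python
from typing import List, Tuple


class StubEngine:
    """Hypothetical test double: records calls and never touches the network."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def translate(self, text: str, to_lang: str) -> str:
        self.calls.append((text, to_lang))
        return f"[{to_lang}] {text}"


def run_pipeline(chunks: List[str], engine: StubEngine, to_lang: str) -> str:
    """Minimal stand-in for the translate and merge steps."""
    return "".join(engine.translate(c, to_lang) for c in chunks)


def test_pipeline_uses_stub() -> None:
    engine = StubEngine()
    assert run_pipeline(["a", "b"], engine, "pl") == "[pl] a[pl] b"
    assert engine.calls == [("a", "pl"), ("b", "pl")]
```

Recording the calls lets a test assert both the merged output and the order in which chunks reached the engine.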

License

MIT
