
Translate university lectures between languages using the Claude API


translate-lecture

Translates university lectures between languages using the Claude API. All visual formatting and styling is preserved exactly — only the text changes.

Note: The current release supports PowerPoint (.pptx) files only. Support for additional formats — including Jupyter notebooks and LaTeX files — is planned for future releases.

Features

  • Translates every text element: slide bodies, tables, speaker notes, grouped shapes, and elements inside AlternateContent XML wrappers
  • Fully config-driven: one JSON file per lecture controls the language pair, file list, and all terminology
  • Auto-generates an initial config by analysing your PPTX files with Claude
  • Respects domain-specific terminology: keep technical terms in the source language, force specific translations, never translate acronyms or author names
  • Batched API calls with automatic retry and exponential backoff on rate-limit errors
  • Post-processing fixes compound-word spacing artifacts (e.g. ClusteringAlgorithmen → Clustering-Algorithmen)
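The batching and retry features can be pictured with a small, stdlib-only sketch. It is not taken from the package: `batched`, `call_with_backoff`, and `RateLimitError` are hypothetical names, and the delay schedule (1 s, 2 s, 4 s, ... plus a little random jitter) is an assumption. A real client would catch the Anthropic SDK's own rate-limit exception instead of the placeholder class.

```python
import random
import time
from collections.abc import Callable, Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's rate-limit (HTTP 429) exception."""


def batched(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` texts (cf. model.batch_size)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on RateLimitError wait base_delay * 2**attempt (+ jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    return fn()  # final attempt: any RateLimitError now propagates to the caller


# Usage: 95 texts with batch_size 40 become three API calls.
texts = [f"text {i}" for i in range(95)]
print([len(b) for b in batched(texts, 40)])  # [40, 40, 15]

# A fake API call that is rate-limited twice before succeeding.
calls: list[int] = []


def fake_api_call() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise RateLimitError("429")
    return "ok"


waits: list[float] = []
print(call_with_backoff(fake_api_call, sleep=waits.append))  # ok
```

Injecting `sleep` keeps the sketch testable without real waiting; in normal use the default `time.sleep` applies.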

Setup

Prerequisites: Python 3.10+ and an Anthropic API key.

Export your API key once per shell session before running any commands:

export ANTHROPIC_API_KEY=<your-key>

Option A — install from PyPI

pip install translate-lecture

Option B — clone and install locally

git clone https://github.com/...
cd translate-lecture
pip install -e .

Both options install dependencies and register two commands:

translate-lecture my_lecture.json
translate-lecture-init-config lecture1.pptx --target-language German

A virtual environment is recommended for either option; create and activate it before running pip install:

python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

Workflow

1. Generate an initial config

Point translate-lecture-init-config at your source PPTX files and specify the target language. Claude analyses the slide text and produces a JSON config pre-populated with terminology for your domain.

translate-lecture-init-config \
    lecture1.pptx lecture2.pptx \
    --target-language German \
    --output my_lecture.json

translate-lecture-init-config uses claude-sonnet-4-6 by default for this analysis. Override it with --model if needed.

2. Review and edit the config

Open the generated JSON and adjust the term lists to match your lecture's conventions. See Config reference below for what each field does.
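Before translating, a quick self-check of the edited file can catch missing fields or contradictory term lists. The sketch below is a hypothetical helper, not part of translate-lecture (which may or may not validate its config itself); it only looks at the fields listed under Config reference, and the rule that a term should not appear in both keep_in_source_language and forced_translations is an assumption about what makes a sensible config.

```python
import json

LECTURE_KEYS = ("description", "source_language", "target_language")
LIST_KEYS = ("skip_exact", "keep_in_source_language", "spacing_fix_prefixes")


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed translate-lecture config."""
    problems: list[str] = []

    lecture = cfg.get("lecture", {})
    for key in LECTURE_KEYS:
        if not lecture.get(key):
            problems.append(f"lecture.{key} is missing or empty")

    files = cfg.get("files", [])
    if not files:
        problems.append("files: no source/target pairs listed")
    for i, pair in enumerate(files):
        if not isinstance(pair, dict) or not {"source", "target"}.issubset(pair):
            problems.append(f"files[{i}] needs both 'source' and 'target'")
        elif pair["source"] == pair["target"]:
            problems.append(f"files[{i}] would overwrite its source")

    tr = cfg.get("translation", {})
    for key in LIST_KEYS:
        if not isinstance(tr.get(key, []), list):
            problems.append(f"translation.{key} should be a list")
    forced = tr.get("forced_translations", {})
    if not isinstance(forced, dict):
        problems.append("translation.forced_translations should be an object")
    else:
        keep = tr.get("keep_in_source_language", [])
        if isinstance(keep, list):
            # A term cannot both stay in the source language and be force-translated.
            for term in sorted(set(keep).intersection(forced)):
                problems.append(
                    f"'{term}' is in both keep_in_source_language and forced_translations"
                )
    return problems


example = json.loads("""
{
  "lecture": {"description": "machine learning and AI lecture",
              "source_language": "English", "target_language": ""},
  "files": [{"source": "01_Introduction.pptx", "target": "01_Introduction.pptx"}],
  "translation": {"keep_in_source_language": ["Clustering"],
                  "forced_translations": {"Clustering": "Clusteranalyse"}}
}
""")
for problem in check_config(example):
    print(problem)
```

To check a real file, pass `json.load(f)` for an opened my_lecture.json to `check_config`; an empty list means none of these checks fired.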

3. Translate

translate-lecture my_lecture.json

Each source file is translated and saved to its configured target path. Slide counts are verified after saving.
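One way to picture the slide-count check, assuming it compares each output deck with its source, is to count the slide parts in the PPTX package directly. This stdlib-only sketch (`count_slides` and `verify_slide_count` are hypothetical names) relies on the fact that a .pptx file is a ZIP archive whose slides are stored as ppt/slides/slideN.xml; the package itself may well use python-pptx instead. Counting parts is an approximation, since an orphaned slide part that the presentation no longer lists would still be counted.

```python
import io
import re
import zipfile

SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def count_slides(pptx) -> int:
    """Count slide parts inside a .pptx (a ZIP package); accepts a path or file object."""
    with zipfile.ZipFile(pptx) as zf:
        return sum(1 for name in zf.namelist() if SLIDE_PART.fullmatch(name))


def verify_slide_count(source, target) -> None:
    """Raise if the translated deck does not have as many slides as its source."""
    expected, actual = count_slides(source), count_slides(target)
    if expected != actual:
        raise ValueError(f"slide count mismatch: source {expected}, target {actual}")


# Usage with tiny in-memory packages standing in for real decks.
def fake_pptx(n_slides: int) -> io.BytesIO:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i in range(1, n_slides + 1):
            zf.writestr(f"ppt/slides/slide{i}.xml", "<p:sld/>")
            # Relationship parts live under _rels/ and must not be counted.
            zf.writestr(f"ppt/slides/_rels/slide{i}.xml.rels", "<Relationships/>")
    buf.seek(0)
    return buf


print(count_slides(fake_pptx(3)))  # 3
verify_slide_count(fake_pptx(3), fake_pptx(3))  # passes silently
```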

Audit mode

Preview all extracted text without making any API calls:

translate-lecture my_lecture.json --audit
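A rough idea of what an audit pass has to collect can be had from the raw package. The hypothetical `audit_pptx` below is a stdlib-only sketch, not the package's extractor: it reads every DrawingML paragraph (a:p elements with a:t runs) from slide and speaker-notes parts. Because it walks the whole XML tree it also reaches grouped shapes, table cells, and mc:AlternateContent wrappers, but it will list text twice when both branches of such a wrapper carry it, and part numbers usually, though not always, follow presentation order.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"  # DrawingML namespace
TEXT_PART = re.compile(r"ppt/(slides/slide|notesSlides/notesSlide)(\d+)\.xml")


def _part_order(name: str) -> tuple[bool, int]:
    """Slides before notes, then by part number (slide2 before slide10)."""
    m = TEXT_PART.fullmatch(name)
    return (m.group(1) != "slides/slide", int(m.group(2)))


def audit_pptx(pptx) -> list[tuple[str, str]]:
    """Return (part name, paragraph text) for every non-empty text paragraph."""
    found: list[tuple[str, str]] = []
    with zipfile.ZipFile(pptx) as zf:
        parts = sorted((n for n in zf.namelist() if TEXT_PART.fullmatch(n)), key=_part_order)
        for name in parts:
            root = ET.fromstring(zf.read(name))
            # iter() visits the whole tree, including nested groups and wrappers.
            for para in root.iter(f"{A}p"):
                text = "".join(t.text or "" for t in para.iter(f"{A}t"))
                if text.strip():
                    found.append((name, text))
    return found


# Usage with a small in-memory package (the same wrapper is reused for the notes part).
NS = ('xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" '
      'xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"')


def slide_xml(*paragraphs: str) -> str:
    body = "".join(f"<a:p><a:r><a:t>{p}</a:t></a:r></a:p>" for p in paragraphs)
    return f"<p:sld {NS}><p:cSld><p:spTree><p:sp><p:txBody>{body}</p:txBody></p:sp></p:spTree></p:cSld></p:sld>"


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ppt/slides/slide10.xml", slide_xml("Summary"))
    zf.writestr("ppt/slides/slide2.xml", slide_xml("Clustering", ""))
    zf.writestr("ppt/notesSlides/notesSlide1.xml", slide_xml("Speaker note"))
buf.seek(0)
for part, text in audit_pptx(buf):
    print(part, "|", text)
```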

Selecting a model

The default translation model is claude-haiku-4-5-20251001 (fast and cheap). Override for higher quality:

translate-lecture my_lecture.json --model claude-sonnet-4-6

Config reference

{
  "lecture": {
    "description": "machine learning and AI lecture",
    "source_language": "English",
    "target_language": "German"
  },
  "model": {
    "batch_size": 40
  },
  "files": [
    {"source": "01_Introduction.pptx", "target": "01_Introduction_DE.pptx"},
    {"source": "05_Clustering.pptx",   "target": "05_Clustering_DE.pptx"}
  ],
  "translation": {
    "skip_exact": [...],
    "keep_in_source_language": [...],
    "forced_translations": {...},
    "spacing_fix_prefixes": [...]
  }
}
Field                                Description
lecture.description                  Used in the system prompt to set the translation context
lecture.source_language              Source language name, e.g. "English"
lecture.target_language              Target language name, e.g. "German"
model.batch_size                     Texts per API call (default: 40)
files                                Source/target PPTX pairs, resolved relative to the config file's directory
translation.skip_exact               Strings never sent for translation: acronyms, author names, single characters, digits
translation.keep_in_source_language  Multi-word technical terms kept in the source language (listed in the system prompt)
translation.forced_translations      Exact source→target mappings included in the system prompt
translation.spacing_fix_prefixes     Source-language prefixes that get incorrectly merged with target-language words; a hyphen is inserted automatically

PPTX file paths are resolved relative to the config file's directory, so the config and slides can live anywhere on disk.

Example config

We recommend starting with the config generated by translate-lecture-init-config and adjusting the term lists as needed. Below is an example of what the generated config for a machine learning lecture might look like.

{
  "lecture": {
    "description": "machine learning and AI lecture",
    "source_language": "English",
    "target_language": "German"
  },
  "model": {
    "batch_size": 40,
    "retry_batch_size": 25
  },
  "files": [
    {"source": "01_Introduction.pptx",      "target": "01_Introduction_DE.pptx"},
    {"source": "02_Foundations-of-ML.pptx", "target": "02_Foundations-of-ML_DE.pptx"}
  ],
  "translation": {
    "skip_exact": [
      "Prof. Dr. Steffen Herbold",
      "DBSCAN", "k-means", "SVM", "CNN", "LSTM", "LLM",
      "k", "n", "d", "m",
      "+", "-", "0", "1", "2"
    ],
    "keep_in_source_language": [
      "Deep Learning", "Clustering", "Backpropagation",
      "Gradient Descent", "Overfitting", "Underfitting",
      "Random Forest", "Neural Network", "Decision Tree",
      "Training", "Test", "Validation", "Feature", "Label",
      "Precision", "Recall", "F1", "Bias", "Variance"
    ],
    "forced_translations": {
      "Artificial Intelligence": "Künstliche Intelligenz (KI)",
      "Classification":          "Klassifikation",
      "Introduction":            "Einführung",
      "Overview":                "Überblick",
      "Summary":                 "Zusammenfassung",
      "Algorithm":               "Algorithmus"
    },
    "spacing_fix_prefixes": [
      "Clustering", "Learning", "Training", "Boosting", "Bagging"
    ]
  }
}

skip_exact — strings passed through unchanged without any API call. Use this for acronyms, author names, single-letter variables, digits, and operators.

keep_in_source_language — multi-word technical terms that stay in English even inside German text. They are listed in the system prompt so the model knows not to translate them.

forced_translations — explicit source→target pairs injected into the system prompt. Use this when a term has one unambiguous well-known translation.
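To show how the two term lists could end up in front of the model, here is a hypothetical prompt builder. Its wording is invented for illustration and is not the package's actual system prompt; only the inputs (lecture.description, the two language names, keep_in_source_language, forced_translations) come from the config format above. The resulting string would be passed as the `system` parameter of an Anthropic Messages API request.

```python
def build_system_prompt(cfg: dict) -> str:
    """Assemble an illustrative system prompt from a config's lecture and translation sections."""
    lec = cfg["lecture"]
    tr = cfg.get("translation", {})
    src, dst = lec["source_language"], lec["target_language"]
    lines = [
        f"You translate slide text from a {lec['description']} from {src} to {dst}.",
        "Translate only the text; return the items in the same order.",
    ]
    keep = tr.get("keep_in_source_language", [])
    if keep:
        lines.append(f"Leave these terms in {src}: " + ", ".join(keep))
    forced = tr.get("forced_translations", {})
    if forced:
        lines.append("Always use these translations:")
        lines.extend(f"- {term} -> {translation}" for term, translation in forced.items())
    return "\n".join(lines)


example = {
    "lecture": {"description": "machine learning and AI lecture",
                "source_language": "English", "target_language": "German"},
    "translation": {"keep_in_source_language": ["Clustering", "Overfitting"],
                    "forced_translations": {"Summary": "Zusammenfassung"}},
}
print(build_system_prompt(example))
```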

spacing_fix_prefixes — English prefixes that the model sometimes merges with the following German word (e.g. ClusteringAlgorithmen). A hyphen is inserted automatically to produce Clustering-Algorithmen.
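A regular expression is one plausible way to implement this post-processing step. The sketch assumes a merge is only repaired when a listed prefix starts a word and is immediately followed by an uppercase letter (German nouns are capitalised), which leaves inflected forms such as Trainingsdaten alone; the function name and that rule are assumptions, not documented behaviour of the package.

```python
import re


def fix_spacing(text: str, prefixes: list[str]) -> str:
    """Insert a hyphen where a listed prefix runs straight into a capitalised word."""
    if not prefixes:
        return text
    # Longest first, so an entry that contains a shorter one is tried first.
    alternatives = "|".join(re.escape(p) for p in sorted(prefixes, key=len, reverse=True))
    # \b: the prefix must start a word; lookahead: the next character is uppercase.
    pattern = re.compile(rf"\b({alternatives})(?=[A-ZÄÖÜ])")
    return pattern.sub(r"\1-", text)


prefixes = ["Clustering", "Learning", "Training", "Boosting", "Bagging"]
print(fix_spacing("Zwei ClusteringAlgorithmen im Deep LearningKurs", prefixes))
# Zwei Clustering-Algorithmen im Deep Learning-Kurs
print(fix_spacing("Trainingsdaten und Clustering-Verfahren", prefixes))
# Trainingsdaten und Clustering-Verfahren
```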



Download files


Source Distribution

translate_lecture-0.1.0.tar.gz (19.1 kB)

Uploaded Source

Built Distribution


translate_lecture-0.1.0-py3-none-any.whl (18.6 kB)

Uploaded Python 3

File details

Details for the file translate_lecture-0.1.0.tar.gz.

File metadata

  • Download URL: translate_lecture-0.1.0.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for translate_lecture-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1360d37eda708a5cef912ce6202ee8ce5dcec6d5a733a191673719dd6d949d35
MD5 11e519b253f2396a2047ee06b6b85a98
BLAKE2b-256 f5ced80fa04bef62c8d10ec0d969a2fe4367537a285165c26c6d8e309cbae35a


File details

Details for the file translate_lecture-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for translate_lecture-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4a054919f3dbe8a4d97173c3cacfe93027cc431320c266ae8de79c5e11ef9f1c
MD5 fcb2374c6db2045e18c57e74dc519e72
BLAKE2b-256 4613e64f9bf1cc5822da8671655b829fb5f74d517adebd8836d3c550d72ed1b5

