Translate university lectures between languages using the Claude API
translate-lecture
Translates university lectures between languages using the Claude API. All visual formatting and styling is preserved exactly — only the text changes.
Note: The current release supports PowerPoint (.pptx) files only. Support for additional formats, including Jupyter notebooks and LaTeX files, is planned for future releases.
Features
- Translates every text element: slide bodies, tables, speaker notes, grouped shapes, and elements inside AlternateContent XML wrappers
- Fully config-driven: one JSON file per lecture controls the language pair, file list, and all terminology
- Auto-generates an initial config by analysing your PPTX files with Claude
- Respects domain-specific terminology: keep technical terms in the source language, force specific translations, never translate acronyms or author names
- Batched API calls with automatic retry and exponential backoff on rate-limit errors
- Post-processing fixes compound-word spacing artifacts (e.g. ClusteringAlgorithmen → Clustering-Algorithmen)
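For readers curious about the retry behaviour, here is a minimal sketch of retry with exponential backoff. It is not the package's actual code: `RateLimitError`, `call_with_backoff`, and the default delays are hypothetical names and values chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the API client's rate-limit exception."""


def call_with_backoff(func, *, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without actually waiting.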
Setup
Prerequisites: Python 3.10+ and an Anthropic API key.
Export your API key once per shell session before running any commands:
export ANTHROPIC_API_KEY=<your-key>
Option A — install from PyPI
pip install translate-lecture
Option B — clone and install locally
git clone https://github.com/...
cd translate-lecture
pip install -e .
Both options install dependencies and register two commands:
translate-lecture my_lecture.json
translate-lecture-init-config lecture1.pptx --target-language German
A virtual environment is recommended for either option:
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
Workflow
1. Generate an initial config
Point translate-lecture-init-config at your source PPTX files and specify the target language. Claude analyses
the slide text and produces a JSON config pre-populated with terminology for your domain.
translate-lecture-init-config \
lecture1.pptx lecture2.pptx \
--target-language German \
--output my_lecture.json
Config generation uses claude-sonnet-4-6 by default. Override with --model if needed.
2. Review and edit the config
Open the generated JSON and adjust the term lists to match your lecture's conventions. See Config reference below for what each field does.
3. Translate
translate-lecture my_lecture.json
Each source file is translated and saved to its configured target path. Slide counts are verified after saving.
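If you want to repeat the slide-count check yourself, the following sketch shows one way to do it with the standard library only. It assumes the conventional part naming `ppt/slides/slideN.xml` inside the .pptx ZIP archive; `count_slides` and `slide_counts_match` are hypothetical helpers, and the tool's own check may work differently.

```python
import re
import zipfile

# Conventional location of slide parts inside a .pptx archive (an assumption).
SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def count_slides(pptx_path):
    """Count slide parts in a .pptx file, which is a ZIP archive of XML parts."""
    with zipfile.ZipFile(pptx_path) as archive:
        return sum(1 for name in archive.namelist() if SLIDE_PART.fullmatch(name))


def slide_counts_match(source_path, target_path):
    """Return True when source and translated files contain the same number of slides."""
    return count_slides(source_path) == count_slides(target_path)
```

For example, `slide_counts_match("01_Introduction.pptx", "01_Introduction_DE.pptx")` compares a source file with its translated counterpart.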
Audit mode
Preview all extracted text without making any API calls:
translate-lecture my_lecture.json --audit
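To illustrate what extracting text without API calls can look like, here is a rough standard-library sketch that collects the DrawingML text runs (`a:t` elements) from each slide part. It is an assumption-laden illustration, not the audit implementation: `extract_slide_texts` is a hypothetical helper, it reads only `ppt/slides/slideN.xml` parts (not speaker notes), and part numbers need not match presentation order.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs in slide XML.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml")


def extract_slide_texts(pptx_path):
    """Return {slide part number: [text runs]} for every slide part in the archive."""
    texts = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            match = SLIDE_PART.fullmatch(name)
            if match is None:
                continue  # skip relationships, layouts, media, and other parts
            root = ET.fromstring(archive.read(name))
            texts[int(match.group(1))] = [
                node.text for node in root.iter(f"{{{A_NS}}}t") if node.text
            ]
    # Sort numerically so that slide 10 comes after slide 2.
    return dict(sorted(texts.items()))
```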
Selecting a model
The default translation model is claude-haiku-4-5-20251001 (fast and cheap). Override for
higher quality:
translate-lecture my_lecture.json --model claude-sonnet-4-6
Config reference
{
"lecture": {
"description": "machine learning and AI lecture",
"source_language": "English",
"target_language": "German"
},
"model": {
"batch_size": 40
},
"files": [
{"source": "01_Introduction.pptx", "target": "01_Introduction_DE.pptx"},
{"source": "05_Clustering.pptx", "target": "05_Clustering_DE.pptx"}
],
"translation": {
"skip_exact": [...],
"keep_in_source_language": [...],
"forced_translations": {...},
"spacing_fix_prefixes": [...]
}
}
| Field | Description |
|---|---|
| lecture.description | Used in the system prompt to set the translation context |
| lecture.source_language | Source language name, e.g. "English" |
| lecture.target_language | Target language name, e.g. "German" |
| model.batch_size | Texts per API call (default: 40) |
| files | Source/target PPTX pairs, resolved relative to the config file's directory |
| translation.skip_exact | Strings never sent for translation: acronyms, author names, single chars, digits |
| translation.keep_in_source_language | Multi-word technical terms kept in the source language (listed in the system prompt) |
| translation.forced_translations | Exact source→target mappings included in the system prompt |
| translation.spacing_fix_prefixes | Source-language prefixes that get incorrectly merged with target-language words; a hyphen is inserted automatically |
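As a rough illustration of what model.batch_size controls, the sketch below splits a list of extracted texts into chunks of at most `batch_size` items, one chunk per API call. `make_batches` is a hypothetical helper; how the tool actually groups texts is not documented here.

```python
def make_batches(texts, batch_size=40):
    """Split texts into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


# 95 extracted strings with the default batch size yield three batches.
batches = make_batches([f"text {i}" for i in range(95)])
print([len(batch) for batch in batches])  # [40, 40, 15]
```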
PPTX file paths are resolved relative to the config file's directory, so the config and slides can live anywhere on disk.
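The path rule can be sketched as follows, assuming the config layout shown above. `load_file_pairs` is a hypothetical helper written for illustration, not a function exported by the package.

```python
import json
from pathlib import Path


def load_file_pairs(config_path):
    """Return (source, target) paths resolved against the config file's directory."""
    config_path = Path(config_path).resolve()
    base_dir = config_path.parent  # paths in "files" are relative to this directory
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return [
        (base_dir / entry["source"], base_dir / entry["target"])
        for entry in config["files"]
    ]
```

Because the base directory comes from the config file rather than the current working directory, the command behaves the same wherever it is run from.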
Example config
We recommend starting with the generated config from translate-lecture-init-config and adjusting
the term lists as needed. Below, you can see how the generated file for a machine learning lecture might look.
{
"lecture": {
"description": "machine learning and AI lecture",
"source_language": "English",
"target_language": "German"
},
"model": {
"batch_size": 40,
"retry_batch_size": 25
},
"files": [
{"source": "01_Introduction.pptx", "target": "01_Introduction_DE.pptx"},
{"source": "02_Foundations-of-ML.pptx", "target": "02_Foundations-of-ML_DE.pptx"}
],
"translation": {
"skip_exact": [
"Prof. Dr. Steffen Herbold",
"DBSCAN", "k-means", "SVM", "CNN", "LSTM", "LLM",
"k", "n", "d", "m",
"+", "-", "0", "1", "2"
],
"keep_in_source_language": [
"Deep Learning", "Clustering", "Backpropagation",
"Gradient Descent", "Overfitting", "Underfitting",
"Random Forest", "Neural Network", "Decision Tree",
"Training", "Test", "Validation", "Feature", "Label",
"Precision", "Recall", "F1", "Bias", "Variance"
],
"forced_translations": {
"Artificial Intelligence": "Künstliche Intelligenz (KI)",
"Classification": "Klassifikation",
"Introduction": "Einführung",
"Overview": "Überblick",
"Summary": "Zusammenfassung",
"Algorithm": "Algorithmus"
},
"spacing_fix_prefixes": [
"Clustering", "Learning", "Training", "Boosting", "Bagging"
]
}
}
skip_exact — strings passed through unchanged without any API call. Use this for
acronyms, author names, single-letter variables, digits, and operators.
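A minimal sketch of the pass-through idea, assuming plain exact string matching: texts found in skip_exact are set aside, and only the rest would be sent to the API. `split_for_translation` is a hypothetical helper; the tool may normalise whitespace or match differently.

```python
def split_for_translation(texts, skip_exact):
    """Partition texts into (to_translate, passthrough), keeping original indices."""
    skip = set(skip_exact)
    to_translate = [(i, t) for i, t in enumerate(texts) if t not in skip]
    passthrough = [(i, t) for i, t in enumerate(texts) if t in skip]
    return to_translate, passthrough


to_translate, passthrough = split_for_translation(
    ["Introduction", "DBSCAN", "k", "Clustering algorithms"],
    skip_exact=["DBSCAN", "k"],
)
print(to_translate)  # [(0, 'Introduction'), (3, 'Clustering algorithms')]
print(passthrough)   # [(1, 'DBSCAN'), (2, 'k')]
```

Keeping the indices makes it straightforward to merge translated and untouched strings back into their original positions.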
keep_in_source_language — multi-word technical terms that stay in English even inside
German text. They are listed in the system prompt so the model knows not to translate them.
forced_translations — explicit source→target pairs injected into the system prompt.
Use this when a term has one unambiguous well-known translation.
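To make the role of these two lists concrete, here is a sketch of how a system prompt could be assembled from the config. The wording of the prompt and the helper name `build_system_prompt` are assumptions for illustration only; the prompt the tool really sends is not reproduced here.

```python
def build_system_prompt(lecture, translation):
    """Assemble an illustrative system prompt from the config's term lists."""
    lines = [
        f"You are translating a {lecture['description']} "
        f"from {lecture['source_language']} to {lecture['target_language']}."
    ]
    keep = translation.get("keep_in_source_language", [])
    if keep:
        # Terms the model should leave untranslated.
        lines.append(
            f"Keep these terms in {lecture['source_language']}: " + ", ".join(keep)
        )
    forced = translation.get("forced_translations", {})
    if forced:
        # One fixed rendering per source term.
        lines.append("Always use these translations:")
        lines.extend(f"- {source} -> {target}" for source, target in forced.items())
    return "\n".join(lines)
```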
spacing_fix_prefixes — English prefixes that the model sometimes merges with the
following German word (e.g. ClusteringAlgorithmen). A hyphen is inserted automatically
to produce Clustering-Algorithmen.
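The hyphen insertion can be sketched with a regular expression. This version assumes the merged German word starts with an uppercase letter (as German nouns do) and is only an illustration; `fix_prefix_spacing` is a hypothetical helper and the tool's actual rule may differ.

```python
import re


def fix_prefix_spacing(text, prefixes):
    """Insert a hyphen after a listed prefix that is fused to a capitalised word."""
    for prefix in prefixes:
        # Match the prefix at a word boundary, directly followed by an uppercase letter.
        pattern = re.compile(rf"\b{re.escape(prefix)}(?=[A-ZÄÖÜ])")
        text = pattern.sub(lambda m: m.group(0) + "-", text)
    return text


print(fix_prefix_spacing("ClusteringAlgorithmen", ["Clustering", "Learning"]))
# Clustering-Algorithmen
```

Text that already contains a space or hyphen after the prefix is left unchanged, because the lookahead only matches an immediately following uppercase letter.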