A tool to translate markdown files using GPT-4

These details have been verified by PyPI

Project links

homepage

GitHub Statistics

Maintainers

tcapelle

These details have not been verified by PyPI

Project description

gpt_translate: Translating MD files with GPT-4

This is a tool to translate Markdown files without breaking the structure of the document. It is powered by OpenAI models and has multiple parsing and formatting options. The provided default example is the one we use to translate our documentation website docs.wandb.ai to japanese and korean.

You can click here to see the output of the translation on the screenshot above.

Installation

We have a stable version on PyPI, so you can install it with pip:

$ pip install gpt-translate

or to get latest version from the repo:

$ cd gpt_translate
$ pip install .

Export your OpenAI API key:

export OPENAI_API_KEY=aa-proj-bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Usage

The library provides a set of commands that you can access as CLI. All the commands start by gpt_translate.:

gpt_translate.file: Translate a single file
gpt_translate.folder: Translate a folder recursively
gpt_translate.files: Translate a list of files, accepts .txt list of files as input.
gpt_translate.eval: Evaluate the quality of the translation

Litellm Integration

This project now uses litellm as the default interface for interacting with language models. Instead of calling the OpenAI API directly, all LLM interactions are performed using litellm.acompletion. Key features include:

Asynchronous LLM Calls: Efficient asynchronous completions via litellm.acompletion.
Pydantic Response Validation: Responses are automatically validated with Pydantic models using model_validate_json, ensuring that outputs conform to expected schemas.
Enhanced Recursive Handling: The longer_create function recursively handles token-limit scenarios by chaining completions.

These improvements simplify the translation pipeline while ensuring robust response validation and improved handling of long outputs.

We use GPT4 by default. You can change this on configs/config.yaml. The dafault values are:

# Logs:
debug: false  # Debug mode
weave_project: "gpt-translate"  # Weave project
silence_openai: true  # Silence OpenAI logger

# Translation:
language: "ja"  # Language to translate to
config_folder: "./configs"  # Config folder, where the prompts and dictionaries are
replace: true  # Replace existing file
remove_comments: true  # Remove comments
do_translate_header_description: true  # Translate the header description
max_concurrent_calls: 7  # Max number of concurrent calls to OpenAI

# Files:
input_file: "docs/intro.md"  # File to translate
out_file: " intro_ja.md"  # File to save the translated file to
input_folder: null  # Folder to translate
out_folder: null  # Folder to save the translated files to
limit: null  # Limit number of files to translate

# Model:
model: "gpt-4o"
temperature: 1.0
max_tokens: 16000

You can override the arguments at runtime or by creating another config.yaml file. You can also use the --config_path flag to specify a different config file.

The --config_folder argument is where the prompts and dictionaries are located, the actual config.yaml could be located somewhere else. Maybe I need a better naming here =P.
You can add new languages by providing the language translation dictionaries in configs/language_dicts

Examples

To translate a single file:

$ gpt_translate.file \
  --input_file README.md \
  --out_file README_es_.md \
  --language es
  --config_folder ./configs

Translate a list of files from list.txt:

$ gpt_translate.files \
  --input_file list.txt \
  --input_folder docs \ 
  --out_folder docs_ja \
  --language ja
  --config_folder ./configs

Note here that we need to pass and input and output folder. This is because we will be using the input folder to get the relative path and create the same folder structure in the output folder. This is tipically what you want for documentation websites that are organized in folders like ./docs.

Translate a full folder recursively:

$ gpt_translate.folder \
  --input_folder docs \
  --out_folder docs_ja \
  --language ja
  --config_folder ./configs

If you don't know what to do, you can always do --help on any of the commands:

$ gpt_translate.* --help

Weave Tracing

The library does a lot! keeping track of every piece of interaction is necessary. We added W&B Weave support to trace every call to the model and underlying processing bits.

You can pass a project name to the CLI to trace the calls:

$ gpt_translate.folder \
  --input_folder docs \
  --output_folder docs_ja \
  --language ja \
  --weave_project gpt-translate
  --config_folder ./configs

Weave Tracing

Evaluation

Once the translation is done, you can evaluate the quality of the translation by running:

$ gpt_translate.eval \
  --eval_dataset "Translation-ja:latest"

You can iterate on the translation prompts and dictionaries to improve the quality of the translation.

Weave Evaluation

The config for the evaluation shares many similarities with the translation config, which is stored in configs/eval_config.yaml. The configs/evaluation_prompt.txt file contains the prompt used by the LLM Judge to evaluate the translation quality. Feel free to play with it to find better ways to evaluate the quality of the translation according to your needs.

Whenever you run gpt_translate.files or gpt_translate.folder, it automatically creates a new Weave Dataset with the name in the format Translation-{language}:{timestamp}.

Weave Dataset

Github Action

We supply an action.yml file to use this library in a Github Action. It is not much tested, but it should work.

You will need to setup your Weights & Biases API key as a secret in your Github repository as WANDB_API_KEY.

An example workflow is shown in https://github.com/tcapelle/dummy_docs and the corresponding workflow file

TroubleShooting

If you have any issue, you can always pass the --debug flag to get more information about what is happening:

$ gpt_translate.folder ... --debug

this will get you a very verbose output (calls to models, inputs and outputs, etc.)

Project details

These details have been verified by PyPI

Project links

homepage

GitHub Statistics

Maintainers

tcapelle

These details have not been verified by PyPI

Release history Release notifications | RSS feed

6.0.0

Oct 8, 2025

This version

5.0.1

Feb 21, 2025

4.0.0

Oct 9, 2024

3.1.0

Jul 30, 2024

3.0.0

Jul 3, 2024

2.0.1

Mar 13, 2024

1.2

Jan 30, 2024

1.1

Jan 17, 2024

1.0

Jan 16, 2024

0.9.0

Jan 16, 2024

0.8.5

Jan 16, 2024

0.8.4

Jan 16, 2024

0.8.3

Jan 16, 2024

0.8.2

Jan 16, 2024

0.8.1

Jan 16, 2024

0.8.0

Jan 16, 2024

0.6.4

Jan 15, 2024

0.6.0

Dec 19, 2023

0.5.1

May 22, 2023

0.0.0

Feb 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gpt_translate-5.0.1.tar.gz (2.8 MB view details)

Uploaded Feb 21, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gpt_translate-5.0.1-py3-none-any.whl (19.2 kB view details)

Uploaded Feb 21, 2025 Python 3

File details

Details for the file gpt_translate-5.0.1.tar.gz.

File metadata

Download URL: gpt_translate-5.0.1.tar.gz
Upload date: Feb 21, 2025
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for gpt_translate-5.0.1.tar.gz
Algorithm	Hash digest
SHA256	`2029d3d7cf2ecd7f1423c2ac201c806e86859ae4b9b2051c5f90ff26a287811c`
MD5	`8a59bd01e4a4ab6388a9ed652bec7450`
BLAKE2b-256	`a07a292ddff30b7832f09685e2f6894b51856e62425cf9760a1fbd0afc4be93d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gpt_translate-5.0.1.tar.gz:

Publisher: pypi.yml on tcapelle/gpt_translate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gpt_translate-5.0.1.tar.gz
- Subject digest: 2029d3d7cf2ecd7f1423c2ac201c806e86859ae4b9b2051c5f90ff26a287811c
- Sigstore transparency entry: 173265176
- Sigstore integration time: Feb 21, 2025
Source repository:
- Permalink: tcapelle/gpt_translate@3eb6ae234cbe160ad0bd10f8ca94c26d080a2172
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tcapelle
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@3eb6ae234cbe160ad0bd10f8ca94c26d080a2172
- Trigger Event: workflow_dispatch

File details

Details for the file gpt_translate-5.0.1-py3-none-any.whl.

File metadata

Download URL: gpt_translate-5.0.1-py3-none-any.whl
Upload date: Feb 21, 2025
Size: 19.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for gpt_translate-5.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2df8aeeb2e5ca18dadc63ac81ab2fc3e493e329ab958c4747329dff7b7e1177c`
MD5	`814d34b709d3920ca27b9420d9a8ef4e`
BLAKE2b-256	`53009490d21007e4342f78ab34391ba0d51485d3ccf0ceb4f846536f5724ef00`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gpt_translate-5.0.1-py3-none-any.whl:

Publisher: pypi.yml on tcapelle/gpt_translate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gpt_translate-5.0.1-py3-none-any.whl
- Subject digest: 2df8aeeb2e5ca18dadc63ac81ab2fc3e493e329ab958c4747329dff7b7e1177c
- Sigstore transparency entry: 173265180
- Sigstore integration time: Feb 21, 2025
Source repository:
- Permalink: tcapelle/gpt_translate@3eb6ae234cbe160ad0bd10f8ca94c26d080a2172
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tcapelle
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@3eb6ae234cbe160ad0bd10f8ca94c26d080a2172
- Trigger Event: workflow_dispatch

gpt-translate 5.0.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

gpt_translate: Translating MD files with GPT-4

Installation

Usage

Litellm Integration

Examples

Weave Tracing

Evaluation

Github Action

TroubleShooting

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance