Convert PDFs with MinerU and translate the generated Markdown into Simplified Chinese.

MDTrans

Overview

Convert English PDF documents into Simplified Chinese Markdown.

MDTrans supports both digitally generated and scanned PDFs. It preserves the structure of the source PDF, including text, headings, lists, and tables, and translates the content into Simplified Chinese. Because MDTrans sends large portions of the document to a long-context LLM at once, it usually achieves better document-level consistency and overall translation quality than typical tools that translate short, isolated segments. The translated Markdown file is written beside the original Markdown with a .zh.md suffix.

How It Works

MDTrans works in two stages. First, it calls the official mineru CLI from an async Python subprocess to convert the PDF into Markdown. This covers the scanned-document workflows that MinerU supports. Second, it uses LangChain and an OpenAI-compatible chat model to translate the generated Markdown into Simplified Chinese. It sends large spans of the document to the model at once instead of translating sentence by sentence.
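The README does not say how the Markdown is divided before it is sent to the model. The sketch below assumes one plausible approach: it groups blank-line-separated Markdown blocks into chunks that stay under a token budget, such as the max_chunk_tokens setting shown under Configuration. The estimate_tokens heuristic (about four characters per token) is a hypothetical stand-in for a real tokenizer, and split_markdown is an illustrative name, not MDTrans's own function.

```python
import re


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token for English text.
    # A real implementation would use the model's tokenizer instead.
    return max(1, len(text) // 4)


def split_markdown(markdown: str, max_chunk_tokens: int) -> list[str]:
    """Group blank-line-separated Markdown blocks into chunks under a token budget."""
    blocks = [b for b in re.split(r"\n\s*\n", markdown.strip()) if b.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for block in blocks:
        block_tokens = estimate_tokens(block)
        # Close the current chunk when adding this block would exceed the budget.
        if current and current_tokens + block_tokens > max_chunk_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        # A single block larger than the budget still becomes its own chunk.
        current.append(block)
        current_tokens += block_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping whole blocks together means a table or list is never split across two model requests. Fenced code that contains blank lines would need extra handling under this assumption.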

Requirements

  • A GPU capable of running MinerU, ideally with at least 16 GB of VRAM

Installation

uv tool install mdtrans

Configuration

The configuration file is located at ~/.config/mdtrans/config.toml. If it does not exist, MDTrans creates a template for you on first run.

[llm]
base_url = "https://api.deepseek.com"
model = "deepseek-chat"
context_window = 64000
max_output_tokens = 8000
max_chunk_tokens = 5000

If you have access to it, Xiaomi's mimo-flash model is a good choice, offering a strong balance between translation quality and speed.

Usage

MDTrans reads the API key for its OpenAI-compatible endpoint from OPENAI_API_KEY. Export this variable before running the tool, even when base_url points to another provider such as DeepSeek:

export OPENAI_API_KEY="your-api-key"
mdtrans /path/to/input.pdf /path/to/output-dir

The tool runs in this order:

  1. Accept the source PDF path as the first positional argument
  2. Accept the output directory as the second positional argument
  3. Run mineru -p <selected-pdf> -o <output-dir> -b hybrid-auto-engine
  4. Discover the generated Markdown files under the chosen output directory
  5. Translate each Markdown file and write the Simplified Chinese result as *.zh.md beside the original
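Steps 3 to 5 can be sketched as follows, assuming asyncio's subprocess API for the MinerU call. The mineru command line is copied from step 3. The helper names, and the choice to skip existing .zh.md files during discovery, are assumptions for illustration.

```python
import asyncio
from pathlib import Path


def mineru_command(pdf: Path, output_dir: Path) -> list[str]:
    # Step 3: the MinerU invocation listed above.
    return ["mineru", "-p", str(pdf), "-o", str(output_dir), "-b", "hybrid-auto-engine"]


async def run_mineru(pdf: Path, output_dir: Path) -> None:
    # Run MinerU without blocking the event loop; raise if it exits non-zero.
    proc = await asyncio.create_subprocess_exec(*mineru_command(pdf, output_dir))
    returncode = await proc.wait()
    if returncode != 0:
        raise RuntimeError(f"mineru exited with status {returncode}")


def find_markdown(output_dir: Path) -> list[Path]:
    # Step 4: collect Markdown recursively, skipping translations from earlier runs.
    return sorted(p for p in output_dir.rglob("*.md") if not p.name.endswith(".zh.md"))


def translated_path(markdown: Path) -> Path:
    # Step 5: "paper.md" becomes "paper.zh.md" in the same directory.
    return markdown.with_name(markdown.stem + ".zh.md")
```

In a full run, asyncio.run(run_mineru(pdf, output_dir)) would execute first. Each path returned by find_markdown would then be translated and written to translated_path(path).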


Download files

Source Distribution

mdtrans-0.1.0.tar.gz (10.2 kB)

Uploaded Source

Built Distribution

mdtrans-0.1.0-py3-none-any.whl (12.3 kB)

Uploaded Python 3

File details

Details for the file mdtrans-0.1.0.tar.gz.

File metadata

  • Download URL: mdtrans-0.1.0.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for mdtrans-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d138aaad808a362df2dd63c485a3f57dd78c392cf5e7cc9b17db997ae30389d0
MD5 e279d662c6768d67378d63b05207ad46
BLAKE2b-256 0af8c93542855ab5f1d72cd37187c51c1a513d7db6f3c27aaee3235eea47ab9b

File details

Details for the file mdtrans-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mdtrans-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for mdtrans-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b55dd4486c9513fdb2b76fc69c821b483c685ed3a555d1dcb8359f3ec7aa6962
MD5 dfadbe85de8f3ddfdfd8875cb67ec979
BLAKE2b-256 525f032c955f53b67359c0a2fce95a4e031b3ab1037c4224c376c45214b0ec1e
