Convert PDFs with MinerU and translate the generated Markdown into Simplified Chinese.
MDTrans
Overview
Convert any English PDF document into Chinese Markdown.
MDTrans supports both digitally generated PDFs and scanned PDFs. It preserves the structure of the source PDF, including text, headings, lists, and tables, and translates the content into Simplified Chinese. Because MDTrans translates large portions of the document at once instead of isolated sentences, it usually achieves better document-level consistency and overall translation quality than short-context, sentence-by-sentence translation tools. The translated Markdown is written beside the original Markdown file with a .zh.md suffix.
How It Works
MDTrans first calls the official mineru CLI through an async Python subprocess to convert the PDF into Markdown; this includes the scanned-document workflows that MinerU supports. It then uses LangChain and an OpenAI-compatible chat model to translate the generated Markdown into Simplified Chinese, giving the model large spans of document context instead of translating fragmented sentences one at a time.
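The README does not say exactly how MDTrans splits a document for the model. As a hypothetical illustration of "larger document context", the sketch below packs blank-line-separated Markdown blocks into chunks that stay under a token budget such as the max_chunk_tokens setting. The 4-characters-per-token estimate and the function names are assumptions, not MDTrans's API.

```python
import re


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: about 4 characters per token.
    # A real implementation would count tokens with the model's tokenizer.
    return max(1, len(text) // 4)


def chunk_markdown(markdown: str, max_chunk_tokens: int) -> list[str]:
    """Pack blank-line-separated Markdown blocks into chunks under a token budget."""
    # Split on blank lines so paragraphs, tables, and tight lists stay whole.
    blocks = [b for b in re.split(r"\n\s*\n", markdown) if b.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for block in blocks:
        block_tokens = estimate_tokens(block)
        # Close the current chunk if this block would push it over the budget.
        if current and current_tokens + block_tokens > max_chunk_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(block)
        current_tokens += block_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on blank lines keeps tables and tight lists intact, but it can split a fenced code block that contains blank lines, a heading can end one chunk while its section starts the next, and a single block larger than the budget still becomes its own oversized chunk.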
Requirements
- A GPU capable of running MinerU, ideally with at least 16 GB of VRAM
Installation
uv tool install mdtrans
Configuration
The configuration file is located at ~/.config/mdtrans/config.toml. If it does not exist, MDTrans creates a template for you on first run.
[llm]
base_url = "https://api.deepseek.com"
model = "deepseek-chat"
context_window = 64000
max_output_tokens = 8000
max_chunk_tokens = 5000
If it is available to you, Xiaomi's mimo-flash model is a good choice, offering a strong balance between translation quality and speed.
Usage
MDTrans relies on an OpenAI-compatible API, so you must export OPENAI_API_KEY before running it:
export OPENAI_API_KEY="your-api-key"
mdtrans /path/to/input.pdf /path/to/output-dir
The tool runs in this order:
- Accept the source PDF path as the first positional argument
- Accept the output directory as the second positional argument
- Run `mineru -p <selected-pdf> -o <output-dir> -b hybrid-auto-engine`
- Discover the generated Markdown files under the chosen output directory
- Write translated Chinese copies as `*.zh.md` beside the original Markdown files
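The steps above can be sketched in Python with asyncio.create_subprocess_exec, matching the README's description of calling the mineru CLI from an async subprocess. This is a minimal sketch, not MDTrans's code: the helper names are hypothetical, MinerU's exact output layout is not documented here (so the sketch searches the output directory recursively), and the LangChain translation step is omitted.

```python
import asyncio
from pathlib import Path


def mineru_command(pdf: Path, output_dir: Path) -> list[str]:
    # The MinerU invocation listed in the steps above.
    return ["mineru", "-p", str(pdf), "-o", str(output_dir), "-b", "hybrid-auto-engine"]


def translated_path(md_path: Path) -> Path:
    # doc.md -> doc.zh.md, written next to the original file.
    return md_path.with_name(md_path.stem + ".zh.md")


def discover_markdown(output_dir: Path) -> list[Path]:
    # Search recursively, skipping translations left over from earlier runs.
    return sorted(p for p in output_dir.rglob("*.md") if not p.name.endswith(".zh.md"))


async def run_mineru(pdf: Path, output_dir: Path) -> None:
    # Launch MinerU without a shell and wait for it to finish.
    proc = await asyncio.create_subprocess_exec(*mineru_command(pdf, output_dir))
    returncode = await proc.wait()
    if returncode != 0:
        raise RuntimeError(f"mineru exited with status {returncode}")


async def plan_translations(pdf: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    # Convert the PDF, then pair each generated Markdown file with its .zh.md target.
    # The LangChain translation call itself is left out of this sketch.
    await run_mineru(pdf, output_dir)
    return [(md, translated_path(md)) for md in discover_markdown(output_dir)]
```

A call such as `asyncio.run(plan_translations(Path("/path/to/input.pdf"), Path("/path/to/output-dir")))` would return (source, target) pairs ready for translation. Passing arguments as a list instead of a shell string avoids quoting problems with unusual file names.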
Download files
File details
Details for the file mdtrans-0.1.0.tar.gz.
File metadata
- Download URL: mdtrans-0.1.0.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d138aaad808a362df2dd63c485a3f57dd78c392cf5e7cc9b17db997ae30389d0 |
| MD5 | e279d662c6768d67378d63b05207ad46 |
| BLAKE2b-256 | 0af8c93542855ab5f1d72cd37187c51c1a513d7db6f3c27aaee3235eea47ab9b |
File details
Details for the file mdtrans-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mdtrans-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b55dd4486c9513fdb2b76fc69c821b483c685ed3a555d1dcb8359f3ec7aa6962 |
| MD5 | dfadbe85de8f3ddfdfd8875cb67ec979 |
| BLAKE2b-256 | 525f032c955f53b67359c0a2fce95a4e031b3ab1037c4224c376c45214b0ec1e |
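To check a downloaded file against the SHA256 digests published in the file hashes tables, a minimal standard-library sketch follows; the helper names are illustrative. pip can also enforce digests at install time through its hash-checking mode.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes tables for release 0.1.0.
EXPECTED_SHA256 = {
    "mdtrans-0.1.0.tar.gz": "d138aaad808a362df2dd63c485a3f57dd78c392cf5e7cc9b17db997ae30389d0",
    "mdtrans-0.1.0-py3-none-any.whl": "b55dd4486c9513fdb2b76fc69c821b483c685ed3a555d1dcb8359f3ec7aa6962",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in fixed-size chunks so large files never need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    # True only if the file name is known and its digest matches the published one.
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```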