A simple tool to transform PDF and DOCX to Markdown using marker-pdf
Project description
NuoYi
A simple tool to transform PDF and DOCX to Markdown.
NuoYi uses marker-pdf for high-quality PDF conversion with OCR and layout detection. All processing is done fully offline after the initial model download.
Features
- 9 PDF Engines: marker, mineru, docling, pymupdf, pdfplumber, llamaparse, mathpix, mineru-cloud, doc2x
- PDF to Markdown: High-quality conversion with multiple engine options
- DOCX to Markdown: Native support for Microsoft Word documents
- Automatic GPU/CPU Selection: Detects available VRAM and falls back to CPU if needed
- Smart Engine Selection: Auto-selects the best engine based on available resources
- Batch Processing: Convert entire directories of documents
- GUI Interface: PySide6-based graphical interface for easy batch conversion
- Image Extraction: Automatically extracts and saves images from PDFs
- Multi-language Support: 10 languages including Chinese, English, Japanese, etc.
- Cloud Engines: LlamaParse, Mathpix, MinerU Cloud, Doc2x for zero-GPU environments
Installation
Requires Python 3.10 or higher (marker-pdf requires Python >= 3.10).
From PyPI
pip install nuoyi
With GUI support
pip install nuoyi[gui]
With NVIDIA CUDA support (IMPORTANT for GPU users)
If you encounter CUBLAS_STATUS_NOT_INITIALIZED errors when using GPU, install the CUDA libraries:
pip install nuoyi[cuda]
Or manually:
pip install nvidia-cublas nvidia-cuda-runtime nvidia-cufft nvidia-cusolver nvidia-cusparse nvidia-curand nvidia-cuda-nvrtc nvidia-nvtx
Why is this needed? PyTorch's CUDA packages sometimes don't include all required NVIDIA libraries. The nvidia-* packages ensure complete CUDA library installation for marker-pdf to work properly.
Full installation with all features
pip install nuoyi[all-cuda]
From source
git clone https://github.com/cycleuser/NuoYi.git
cd NuoYi
pip install -e .
macOS Installation Notes
marker-pdf fully supports macOS (both Intel and Apple Silicon). On macOS, PyTorch is installed automatically without CUDA. Apple Silicon Macs can use MPS acceleration via --device mps.
If you encounter torch installation issues on macOS, install the CPU-only version of PyTorch first:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install nuoyi
AMD ROCm GPU Setup (Linux)
NuoYi supports AMD Radeon GPUs (RX 5000/6000/7000 series) on Linux via ROCm.
Supported GPUs:
- RX 7900 XTX/XT, RX 7800/7700/7600 (RDNA 3)
- RX 6900/6800/6700/6600 (RDNA 2)
- RX 5700/5600/5500 (RDNA)
- ⚠️ RX 580/590 (Polaris) are NOT supported by ROCm
Step 1: Create a dedicated conda environment
conda create -n rocm python=3.12 -y
conda activate rocm
Step 2: Install ROCm PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
Verify: python -c "import torch; print(torch.version.hip)" should output 6.2.xxxxx
Step 3: Install NuoYi (without touching torch)
⚠️ IMPORTANT: Do NOT use pip install -e ".[dev]" as it will replace ROCm torch with CUDA version.
# Install NuoYi without dependencies
pip install --no-deps -e .
# Install marker-pdf without dependencies
pip install --no-deps marker-pdf
# Install remaining dependencies (no-deps to avoid torch replacement)
pip install pydantic python-docx PyMuPDF Pillow flask pytest ruff \
python-dotenv rapidfuzz "regex>=2024.4.28,<2025.0.0" \
"scikit-learn>=1.6.1,<2.0.0" tqdm "transformers>=4.45.2,<5.0.0" \
"Pillow>=10.1.0,<11.0.0" google-genai markdown2 markdownify \
"openai>=1.65.2,<2.0.0" pdftext pre-commit pydantic-settings \
surya-ocr "opencv-python-headless==4.11.0.86" --no-deps
Step 4: Run with ROCm
# Single file
nuoyi input.pdf --device rocm -o output.md
# Batch conversion
nuoyi ./papers --batch --device rocm --output ./output
NuoYi automatically configures ROCm environment variables (HSA_ENABLE_SDMA=0, TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1, and auto-detected HSA_OVERRIDE_GFX_VERSION).
For detailed troubleshooting, see AMD_ROCM_SETUP.md.
PDF Engines
NuoYi supports 9 PDF conversion engines:
Local Engines (Free, Offline)
| Engine | Install | GPU | OCR | Models | Best For |
|---|---|---|---|---|---|
| marker | pip install marker-pdf |
Recommended | Yes | ~3GB | Best quality overall |
| mineru | pip install magic-pdf[full] |
Optional | Yes | ~1.5GB | Chinese documents |
| docling | pip install docling |
Optional | Yes | ~1.5GB | Balanced quality |
| pymupdf | pip install pymupdf4llm |
No | No | None | Fastest, digital PDFs |
| pdfplumber | pip install pdfplumber |
No | No | None | Tables, lightweight |
Cloud Engines (API Key Required)
| Engine | Install | Best For | API Key |
|---|---|---|---|
| llamaparse | pip install llama-parse |
Excellent quality | LLAMA_CLOUD_API_KEY |
| mathpix | pip install requests |
Math/science documents | MATHPIX_APP_ID + MATHPIX_APP_KEY |
| mineru-cloud | pip install requests |
Chinese docs (online) | MINERU_API_KEY |
| doc2x | pip install requests |
Formulas, LaTeX | DOC2X_API_KEY |
Engine Selection
# Auto-select (default: best available engine)
nuoyi paper.pdf
# Use specific engine
nuoyi paper.pdf --engine mineru # Great for Chinese
nuoyi paper.pdf --engine docling # Balanced quality
nuoyi paper.pdf --engine pymupdf # Fastest, no GPU
nuoyi paper.pdf --engine doc2x # Cloud, best formulas
nuoyi paper.pdf --engine mineru-cloud # Cloud, Chinese docs
# No GPU? Use lightweight engines
nuoyi paper.pdf --engine pymupdf # Digital PDFs, fastest
nuoyi paper.pdf --engine pdfplumber # Tables, lightweight
nuoyi paper.pdf --engine doc2x # Cloud, no local models needed
Usage
Command Line Interface
# Convert a single PDF file
nuoyi paper.pdf
# Specify output file
nuoyi paper.pdf -o output/result.md
# Convert a DOCX file
nuoyi document.docx -o document.md
# Batch convert all files in a directory
nuoyi ./papers --batch
# Batch convert with custom output directory
nuoyi ./papers --batch -o ./output
# Force CPU mode (for low VRAM GPUs)
nuoyi paper.pdf --device cpu
# Force OCR even for digital PDFs
nuoyi paper.pdf --force-ocr
# Specify page range
nuoyi paper.pdf --page-range "0-5,10,15-20"
# Specify languages
nuoyi paper.pdf --langs "zh,en,ja"
# Disable OCR models for digital PDFs (saves ~1.5GB VRAM)
nuoyi paper.pdf --disable-ocr-models
# Low VRAM mode for 4-6GB GPUs
nuoyi paper.pdf --low-vram
GUI Mode
nuoyi --gui
The GUI provides:
- Directory selection for input/output
- File list with status tracking
- Device selection (auto/CPU/CUDA)
- Force OCR option
- Page range and language configuration
- Real-time progress and logging
Startup interface:
Select input directory:
Configure device and options:
Conversion result (viewed in VS Code):
Python API
from nuoyi import MarkerPDFConverter, DocxConverter
# Convert PDF (full models, ~3GB VRAM)
pdf_converter = MarkerPDFConverter(
force_ocr=False,
langs="zh,en",
device="auto" # or "cpu", "cuda", "mps"
)
markdown_text, images = pdf_converter.convert_file("input.pdf")
# Convert PDF (minimal models for digital PDFs, ~1.5GB VRAM)
pdf_converter_minimal = MarkerPDFConverter(
disable_ocr_models=True, # Saves ~1.5GB VRAM
langs="zh,en",
device="auto"
)
markdown_text, images = pdf_converter_minimal.convert_file("digital.pdf")
# Convert PDF (low VRAM mode)
pdf_converter_low_vram = MarkerPDFConverter(
low_vram=True,
langs="zh,en",
device="auto"
)
markdown_text, images = pdf_converter_low_vram.convert_file("input.pdf")
# Convert DOCX
docx_converter = DocxConverter()
markdown_text = docx_converter.convert_file("input.docx")
Supported Languages
| Code | Language |
|---|---|
zh |
Chinese / 中文 |
en |
English |
ja |
Japanese / 日本語 |
fr |
French / Français |
ru |
Russian / Русский |
de |
German / Deutsch |
es |
Spanish / Español |
pt |
Portuguese / Português |
it |
Italian / Italiano |
ko |
Korean / 한국어 |
Use nuoyi --list-langs to see the full list. Default: zh,en.
Command Line Options
| Option | Description |
|---|---|
input |
Input PDF/DOCX file or directory (with --batch) |
-o, --output |
Output file path (single file) or directory (batch mode) |
--force-ocr |
Force OCR even for digital PDFs with embedded text |
--page-range |
Page range to convert, e.g. '0-5,10,15-20' |
--langs |
Comma-separated languages (default: zh,en). See --list-langs |
--list-langs |
List all supported languages and exit |
--batch |
Process all PDF/DOCX files in the input directory |
--device |
Device for model inference: auto (default), cpu, cuda, or mps |
--low-vram |
Enable low VRAM mode for 4-6GB GPUs |
--disable-ocr-models |
Disable OCR models for digital PDFs (~1.5GB VRAM saved) |
--gui |
Launch PySide6 GUI mode |
-V, --version |
Show version and exit |
Cloud Engines
NuoYi supports 4 cloud-based PDF engines that require no local GPU or models:
# LlamaParse - LlamaIndex cloud service
export LLAMA_CLOUD_API_KEY=your_key
nuoyi paper.pdf --engine llamaparse
# Mathpix - Best for math/scientific documents
export MATHPIX_APP_ID=your_app_id
export MATHPIX_APP_KEY=your_app_key
nuoyi paper.pdf --engine mathpix
# MinerU Cloud - Excellent for Chinese documents
export MINERU_API_KEY=your_key
nuoyi paper.pdf --engine mineru-cloud
# Doc2x - Best for formulas, supports PDF/DOCX/PPTX
export DOC2X_API_KEY=your_key
nuoyi paper.pdf --engine doc2x
Large PDFs (>50 pages) are automatically split into chunks for cloud processing.
Memory Management
NuoYi automatically manages GPU memory:
- Auto mode (default): Detects available VRAM and uses GPU if sufficient (>6GB free)
- CPU mode: Forces CPU processing (slower but no VRAM limit)
- CUDA mode: Forces GPU processing (may OOM on large PDFs)
- MPS mode: For Apple Silicon Macs
Low VRAM Options
For GPUs with limited VRAM (4-6GB):
-
Use
--low-vramflag: Enables aggressive memory optimizationnuoyi paper.pdf --low-vram
-
Disable OCR models (for digital PDFs only): Saves ~1.5GB VRAM
nuoyi paper.pdf --disable-ocr-models
⚠️ Warning: This disables OCR features. Only suitable for:
- Digital PDFs with embedded text (not scanned documents)
- PDFs without complex tables requiring OCR
- PDFs without mathematical formulas requiring OCR
-
Use CPU mode: No VRAM limitation but slower
nuoyi paper.pdf --device cpu
-
Use pymupdf engine: Fast, no GPU required
nuoyi paper.pdf --engine pymupdf
If CUDA out of memory occurs during conversion, NuoYi automatically retries with aggressive memory cleanup.
Dependencies
Required
marker-pdf>=1.0.0- PDF conversion enginePyMuPDF>=1.23.0- PDF page countingpython-docx>=0.8.11- DOCX conversionPillow>=9.0.0- Image processing
Optional
PySide6>=6.5.0- GUI support (install withpip install nuoyi[gui])
Model Download
Download Location
Models are downloaded automatically on first run and stored in:
~/.cache/huggingface/hub/
The models are from Hugging Face and include:
vikp/surya_det- Layout detection modelvikp/surya_rec- Text recognition modelvikp/surya_order- Reading order model- Other marker-pdf related models
Total size: approximately 2-3 GB.
For Users in China
Hugging Face may be blocked or slow in mainland China due to GFW. You can use a mirror:
# Set Hugging Face mirror (add to ~/.bashrc or run before nuoyi)
export HF_ENDPOINT=https://hf-mirror.com
# Then run nuoyi normally
nuoyi paper.pdf
Alternatively, you can download models manually and place them in the cache directory.
Custom Model Path
The current version does not support custom model paths to keep the tool simple and avoid configuration complexity. Models are always stored in the default Hugging Face cache location.
Notes
- After initial model download, everything works fully offline
- Use
--device cpuif you encounter CUDA out of memory errors - Legacy
.docformat is not supported; convert to.docxfirst
Agent Integration (OpenAI Function Calling)
NuoYi exposes OpenAI-compatible tools for LLM agents:
from nuoyi.tools import TOOLS, dispatch
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOLS,
)
result = dispatch(
tool_call.function.name,
tool_call.function.arguments,
)
CLI Help
License
GPL-3.0 License - see LICENSE for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
- marker-pdf - The excellent PDF conversion engine
- surya - OCR and layout detection models
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nuoyi-0.4.17.tar.gz.
File metadata
- Download URL: nuoyi-0.4.17.tar.gz
- Upload date:
- Size: 66.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
55d3bb1410218bb98d72a71ab7f4d6e8a35bd67e335f0a3095f84bebd2eaa918
|
|
| MD5 |
13853dc86d571ec3478044ea38906e2c
|
|
| BLAKE2b-256 |
56106a20bd22556d4497c5d609b7c7d0b120bb8e8ff7c818bd77a5cf91d26039
|
File details
Details for the file nuoyi-0.4.17-py3-none-any.whl.
File metadata
- Download URL: nuoyi-0.4.17-py3-none-any.whl
- Upload date:
- Size: 70.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5cc2832b08586efdd8456a5bcd878a0409136b53ca4d1f3409f65b3113119d84
|
|
| MD5 |
bc9664ea7ac3663ac4e9b9fc803aaabe
|
|
| BLAKE2b-256 |
19a3a22e3b53e645ba724e6f5ac178625a4103ee4c3e811f871ae6c29a2244df
|