Fast Inference Architecture for MinerU

Flash-MinerU ⚡️📄

Accelerating the VLM Inference Pipeline of the open-source PDF parsing project MinerU with Ray

Flash-MinerU is a lightweight and low-intrusion acceleration project. Its goal is to leverage Ray’s parallel and distributed capabilities to parallelize and accelerate the most time-consuming stage in MinerU — the VLM (Vision-Language Model) inference stage — thereby significantly improving the overall throughput of PDF → Markdown processing.

This project is positioned as a parallelization and engineering accelerator, rather than a reimplementation of MinerU’s core algorithms. Its design goals include:

  • Minimal dependencies, lightweight installation
    • One-click install & run via pip install flash-mineru
    • Tested in Chinese domestic computing environments such as METAX
  • Maximum reuse of MinerU’s original logic and data structures
    • Preserving algorithmic behavior and output consistency
  • Multi-GPU / multi-process / multi-cluster friendly
    • Designed for large-scale batch PDF processing, easy to scale up

✨ Features

  • 🚀 Ray-based parallel inference
    PDF pages / images are sliced into batches and dispatched to multiple Ray actors for parallel execution

  • 🧠 VLM inference acceleration
    Focuses on the VLM inference stage in MinerU; currently defaults to vLLM for high-throughput inference

  • 🧩 Low-intrusion design
    Retains MinerU’s original intermediate structures (middle_json) and Markdown generation logic
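
The batch slicing and dispatch described in the first bullet can be sketched in plain Python. The helper names below are illustrative, not part of the flash_mineru API; in the real pipeline each batch is handed to a Ray actor rather than a dict:

```python
def make_batches(pdfs, batch_size):
    # Slice the input list into contiguous batches; each batch is one
    # unit of work for a model replica (a Ray actor in practice).
    return [pdfs[i:i + batch_size] for i in range(0, len(pdfs), batch_size)]

def assign_round_robin(batches, replicas):
    # Spread batches over replicas so every model instance stays busy.
    return {r: batches[r::replicas] for r in range(replicas)}

batches = make_batches(["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"], batch_size=2)
# batches == [["a.pdf", "b.pdf"], ["c.pdf", "d.pdf"], ["e.pdf"]]
plan = assign_round_robin(batches, replicas=2)
# plan == {0: [["a.pdf", "b.pdf"], ["e.pdf"]], 1: [["c.pdf", "d.pdf"]]}
```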


🎯 How pipeline parallelism helps

MinerU’s PDF→Markdown path is a multi-stage pipeline (e.g. page rendering → VLM → Markdown). If every batch must finish all stages before the next batch starts, workers and GPUs wait on each other—that shows up as idle gaps (“bubbles”) on a timeline and under-used accelerators. Flash-MinerU (default MineruEngine) overlaps several logical batches across those stages: while one batch sits in VLM, another can be rendering or writing Markdown, so compute stays busier end-to-end without changing MinerU’s operators.

Left — bubble schedule (before): per-batch serialization, visible GPU idle gaps.
[Timeline: pipeline bubbles, GPU not fully utilized]

Right — pipelined (Flash-MinerU): overlapped batches, GPUs keep working.
[Timeline: pipeline parallelism, better GPU utilization]
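
The effect can be reproduced with a toy three-stage pipeline. Stage names and timings below are illustrative, not Flash-MinerU internals: giving each stage its own worker lets batch N+1 render while batch N sits in VLM, so wall time drops from roughly batches × stages time slots to batches + stages − 1:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def render(b): time.sleep(0.05); return b      # stage 1: page rendering
def vlm(b):    time.sleep(0.05); return b      # stage 2: VLM inference
def to_md(b):  time.sleep(0.05); return b      # stage 3: Markdown writing

batches = list(range(6))

# Serialized: each batch finishes all three stages before the next starts.
t0 = time.perf_counter()
serial_out = [to_md(vlm(render(b))) for b in batches]
serial = time.perf_counter() - t0              # ~6 batches * 3 stages * 0.05 s

# Pipelined: one single-threaded worker per stage, so stages of
# different batches overlap while each stage still runs batches in order.
t0 = time.perf_counter()
with ThreadPoolExecutor(1) as s1, ThreadPoolExecutor(1) as s2, ThreadPoolExecutor(1) as s3:
    f1 = [s1.submit(render, b) for b in batches]
    f2 = [s2.submit(lambda f=f: vlm(f.result())) for f in f1]
    f3 = [s3.submit(lambda f=f: to_md(f.result())) for f in f2]
    piped_out = [f.result() for f in f3]
piped = time.perf_counter() - t0               # ~(6 + 3 - 1) * 0.05 s

assert piped_out == serial_out and piped < serial
```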

📦 Installation

Basic installation (lightweight mode)

Suitable if you have already installed the inference backend manually (e.g., vLLM), or are using an image with a prebuilt environment:

pip install flash-mineru

Install with vLLM backend enabled (optional)

If you want Flash-MinerU to install vLLM as the inference backend for you:

pip install "flash-mineru[vllm]"

🚀 Quickstart

Minimal Python API example

from flash_mineru import MineruEngine

# Path to PDFs
pdfs = [
    "resnet.pdf",
    "yolo.pdf",
    "text2sql.pdf",
]

engine = MineruEngine(
    model="<path_to_local>/MinerU2.5-2509-1.2B",
    # Model can be downloaded from https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
    batch_size=2,               # PDFs per logical batch
    replicas=3,                 # parallel vLLM / model instances
    num_gpus_per_replica=0.5,   # fraction of a GPU allocated per instance (bounds the vLLM KV cache)
    save_dir="outputs_mineru",  # output directory for parsed results
    inflight=4,                 # pipeline parallelism depth (v1.0.0 default path; try 8 on large hosts)
)

# Legacy v0.0.4 sequential batching (deprecated): from flash_mineru import MineruEngineLegacy

results = engine.run(pdfs)
print(results)  # list[list[str]]: output directory names, one list per batch

Output structure

  • Each PDF’s parsing results will be generated under:

    <save_dir>/<pdf_name>/
    
  • The Markdown file is located by default at:

    <save_dir>/<pdf_name>/vlm/<pdf_name>.md
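
Assuming the default layout above, the Markdown path for each input can be derived like this (markdown_path is a hypothetical helper for illustration, not part of the flash_mineru API):

```python
from pathlib import Path

def markdown_path(save_dir, pdf):
    # Default location of the generated Markdown for one input PDF:
    # <save_dir>/<pdf_name>/vlm/<pdf_name>.md
    name = Path(pdf).stem                      # "resnet.pdf" -> "resnet"
    return Path(save_dir) / name / "vlm" / f"{name}.md"

print(markdown_path("outputs_mineru", "resnet.pdf"))
# outputs_mineru/resnet/vlm/resnet.md  (on POSIX)
```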
    

📊 Benchmark

Scripts: English · Simplified Chinese

Results (368 PDFs, ~8× A100 class machine)

Method               | Inference configuration                                                  | Total time
Flash-MinerU v1.0.0  | MineruEngine, 8 replicas, inflight 8                                     | ~8.5 min
MinerU (vanilla)     | Eight hand-spawned mineru processes (this repo’s Benchmark-mineru.py parallel mode, one GPU per process, vlm-auto-engine) | ~14 min
Flash-MinerU v0.0.4  | MineruEngineLegacy, 8 replicas × 1 GPU, batch size 16                    | ~23 min
MinerU (vanilla)     | vLLM, single GPU                                                         | ~65 min

Commands: docs/BENCHMARK.md.

Summary

  • v1.0.0 is about ~1.7× faster wall time than the eight-process baseline (~8.5 min vs ~14 min)
  • v0.0.4 (MineruEngineLegacy) is slower than that baseline (~23 min), which highlights what pipeline parallelism adds versus “many full stacks in parallel”
  • ~65 min single-GPU is the same-corpus reference baseline

Experimental setup
  • Dataset: 23 paper PDFs (≈9–37 pages each) × 16 copies → 368 files; default folder test/sample_pdfs
  • Versions: MinerU v2.7.5; Flash-MinerU v0.0.4 = MineruEngineLegacy (sequential stages per batch); v1.0.0 = MineruEngine (pipeline parallelism, default API)
  • Hardware: single host, 8 × NVIDIA A100

Note: this comparison is throughput-focused; the output structure matches MinerU’s. Upstream does not ship a polished official multi-GPU “one click” path, so the eight-process row is our benchmark script sharding work across eight separate mineru runs.
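
As a sanity check on the table, the headline numbers imply the following rates (derived from the figures above, rounded):

```python
pdfs = 368
minutes = {"Flash-MinerU v1.0.0": 8.5, "8-process baseline": 14.0,
           "Flash-MinerU v0.0.4": 23.0, "MinerU single GPU": 65.0}

# PDFs processed per minute, and speedup relative to the single-GPU run.
throughput = {name: round(pdfs / m, 1) for name, m in minutes.items()}
speedup = {name: round(minutes["MinerU single GPU"] / m, 1) for name, m in minutes.items()}

print(throughput)  # {'Flash-MinerU v1.0.0': 43.3, '8-process baseline': 26.3, ...}
print(speedup)     # {'Flash-MinerU v1.0.0': 7.6, '8-process baseline': 4.6, ...}
```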


🗺️ Roadmap

  • Benchmark scripts & docs — docs/BENCHMARK.md
  • Support for more inference backends (e.g., sglang)
  • Service-oriented deployment (HTTP API / task queue)
  • Sample datasets and more comprehensive documentation

🤝 Acknowledgements

  • MinerU This project is built upon MinerU’s overall algorithm design and engineering practices, and parallelizes its VLM inference pipeline. The mineru_core/ directory contains code logic copied from and adapted to the MinerU project. We extend our sincere respect and gratitude to the original authors and all contributors of MinerU. 🔗 Official repository / homepage: https://github.com/opendatalab/MinerU

  • Ray Provides powerful abstractions for distributed and parallel computing, making multi-GPU and multi-process orchestration simpler and more reliable. 🔗 Official website: https://www.ray.io/ 🔗 Official GitHub: https://github.com/ray-project/ray

  • vLLM Provides a high-throughput, production-ready inference engine (currently the default backend). 🔗 Official website: https://vllm.ai/ 🔗 Official GitHub: https://github.com/vllm-project/vllm


📜 License

AGPL-3.0

Notes: The mineru_core/ directory in this project contains derivative code based on MinerU (AGPL-3.0). In accordance with the AGPL-3.0 license requirements, this repository as a whole is released under AGPL-3.0 as a derivative work. For details, please refer to the root LICENSE file and mineru_core/README.md.
