Utilities for MinerU Vision-Language models

mineru-vl-utils

A Python package for interacting with the MinerU Vision-Language Model.

It's a lightweight wrapper that simplifies sending requests to the MinerU Vision-Language Model and handling its responses.

About Backends

We provide 6 different backends (deployment modes):

http-client: An HTTP client for interacting with an OpenAI-compatible model server.
transformers: A backend for using HuggingFace Transformers models. (slow but simple to install)
mlx-engine: A backend for Apple Silicon devices running macOS.
lmdeploy-engine: A backend for using the LMDeploy engine.
vllm-engine: A backend for using the vLLM synchronous batching engine.
vllm-async-engine: A backend for using the vLLM asynchronous engine. (requires async programming)

About Output Format

The MinerU Vision-Language Model handles document layout detection and text/table/equation recognition in a single model.

The output of the model is a list of ContentBlock objects, each representing a detected block in the document with its content recognition results.

Each ContentBlock contains the following attributes:

type (str): The type of the block, e.g., 'text', 'image', 'table', 'equation'.
- For a complete list of supported block types, please refer to structs.py.
bbox (list of floats): The bounding box of the block in the format [xmin, ymin, xmax, ymax], with coordinates normalized to the range [0, 1].
angle (int or None): The rotation angle of the block, can be one of [0, 90, 180, 270].
- 0 means upward.
- 90 means rightward.
- 180 means upside down.
- 270 means leftward.
- None means the angle is not specified.
content (str or None): The recognized content of the block, if applicable.
- For 'text' blocks, this is the recognized text.
- For 'table' blocks, this is the recognized table in HTML format.
- For 'equation' blocks, this is the recognized LaTeX code.
- For 'image' blocks, this is None.
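
Because bbox coordinates are normalized to [0, 1], they have to be scaled by the image size before they can be used for cropping or drawing. The following is a minimal sketch of that conversion. The helper name `bbox_to_pixels` and the clamping behavior are illustrative assumptions, not part of the package.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale a normalized [xmin, ymin, xmax, ymax] box to integer pixel coordinates.

    Hypothetical helper: mineru-vl-utils may or may not ship an equivalent.
    """
    xmin, ymin, xmax, ymax = bbox

    def scale(value, size):
        # Clamp so that slightly out-of-range values never leave the image.
        return max(0, min(size, round(value * size)))

    return scale(xmin, width), scale(ymin, height), scale(xmax, width), scale(ymax, height)


# Example: a block covering the middle of a 1000x800 page.
pixel_box = bbox_to_pixels([0.1, 0.25, 0.5, 0.75], width=1000, height=800)
print(pixel_box)
```

The resulting tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so `image.crop(bbox_to_pixels(block.bbox, *image.size))` should cut out the region of a block.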

Installation

For the http-client backend, install the package via pip:

pip install -U mineru-vl-utils

For the transformers backend, install the package with the transformers extra:

pip install -U "mineru-vl-utils[transformers]"

For the vllm-engine and vllm-async-engine backends, install the package with the vllm extra:

pip install -U "mineru-vl-utils[vllm]"

For the mlx-engine backend, install the package with the mlx extra:

pip install -U "mineru-vl-utils[mlx]"

For the lmdeploy-engine backend, install the package with the lmdeploy extra:

pip install -U "mineru-vl-utils[lmdeploy]"

[!NOTE] To use the http-client backend, you still need a separate vllm (or other LLM deployment tool) environment that serves the model over HTTP.

Serving the Model (Optional)

This is only needed if you want to use the http-client backend.

You can use vllm or another LLM deployment tool to serve the model. Here we only demonstrate how to use vllm to serve the model.

With vllm>=0.10.1, you can use the following command to serve the model. The logits processor adds support for the no_repeat_ngram_size sampling parameter, which helps the model avoid generating repeated content.

vllm serve opendatalab/MinerU2.5-2509-1.2B --host 127.0.0.1 --port 8000 \
  --logits-processors mineru_vl_utils:MinerULogitsProcessor

If you are using vllm<0.10.1, the no_repeat_ngram_size sampling parameter is not supported. You can still serve the model without the logits processor:

vllm serve opendatalab/MinerU2.5-2509-1.2B --host 127.0.0.1 --port 8000
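
Before pointing a client at the server, it can help to confirm that the model is actually being served. The sketch below assumes the server exposes the OpenAI-style `GET /v1/models` endpoint, which OpenAI-compatible servers such as vllm normally provide. The function names are hypothetical, and the sample payload is only an illustration of the expected response shape.

```python
import json
import urllib.request


def models_url(server_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible server (assumed endpoint)."""
    return server_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(server_url: str, timeout: float = 5.0) -> list[str]:
    """Query the running server; raises urllib.error.URLError if it is unreachable."""
    with urllib.request.urlopen(models_url(server_url), timeout=timeout) as response:
        return parse_model_ids(json.load(response))


# Offline illustration of the parsing step with a hand-written sample payload.
sample_payload = {"object": "list", "data": [{"id": "opendatalab/MinerU2.5-2509-1.2B"}]}
print(models_url("http://127.0.0.1:8000/"))
print(parse_model_ids(sample_payload))
```

With the server from the commands above running, `list_served_models("http://127.0.0.1:8000")` should include the model name passed to `vllm serve`.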

Using `MinerUClient` by Code

Now you can use the MinerUClient class to interact with the model. The following examples show how to use each backend.

`http-client` Example

from PIL import Image
from mineru_vl_utils import MinerUClient

client = MinerUClient(
    backend="http-client",
    server_url="http://127.0.0.1:8000"
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)
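
The returned blocks can be post-processed using the attributes described in About Output Format. As a rough sketch, the function below joins block contents into a Markdown string. It assumes the list order is the intended reading order, which the source does not state, and it uses `SimpleNamespace` objects as stand-ins for real ContentBlock instances. The function name `blocks_to_markdown` is hypothetical.

```python
from types import SimpleNamespace


def blocks_to_markdown(blocks):
    """Join block contents into one Markdown string (illustrative, not a package API)."""
    parts = []
    for block in blocks:
        if block.content is None:
            # e.g. 'image' blocks carry no recognized content.
            continue
        if block.type == "equation":
            # Wrap LaTeX in display-math delimiters.
            parts.append(f"$$\n{block.content}\n$$")
        else:
            # Text is used as-is; table HTML can be embedded directly in Markdown.
            parts.append(block.content)
    return "\n\n".join(parts)


# Stand-in blocks that only mimic the documented `type` and `content` attributes.
sample_blocks = [
    SimpleNamespace(type="text", content="Title"),
    SimpleNamespace(type="image", content=None),
    SimpleNamespace(type="equation", content="E = mc^2"),
    SimpleNamespace(type="table", content="<table><tr><td>1</td></tr></table>"),
]
markdown = blocks_to_markdown(sample_blocks)
print(markdown)
```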

`transformers` Example

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
from mineru_vl_utils import MinerUClient

# for transformers>=4.56.0
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "opendatalab/MinerU2.5-2509-1.2B",
    dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "opendatalab/MinerU2.5-2509-1.2B",
    use_fast=True
)

client = MinerUClient(
    backend="transformers",
    model=model,
    processor=processor
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)

If you are using an older version of transformers (transformers<4.56.0), use torch_dtype instead of dtype:

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "opendatalab/MinerU2.5-2509-1.2B",
    torch_dtype="auto",
    device_map="auto"
)
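
If the same script has to run against both old and new transformers releases, the keyword name can be picked at runtime. This is a minimal sketch under the version rule stated above. The helpers `version_tuple` and `dtype_kwargs` are hypothetical names, and pre-release suffixes such as `.dev0` are simply ignored.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric part of a version string, padded to three components."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = tuple(int(p) for p in match.group().split(".")) if match else ()
    # Pad so that "4.56" compares equal to "4.56.0".
    return parts + (0,) * max(0, 3 - len(parts))


def dtype_kwargs(transformers_version: str, dtype="auto") -> dict:
    """Return {'dtype': ...} for transformers>=4.56.0, else {'torch_dtype': ...}."""
    name = "dtype" if version_tuple(transformers_version) >= (4, 56, 0) else "torch_dtype"
    return {name: dtype}


print(dtype_kwargs("4.56.0"))
print(dtype_kwargs("4.55.4"))
```

The result can then be unpacked into the call, for example `from_pretrained("opendatalab/MinerU2.5-2509-1.2B", device_map="auto", **dtype_kwargs(transformers.__version__))`.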

`mlx-engine` Example

from mlx_vlm import load as mlx_load
from PIL import Image
from mineru_vl_utils import MinerUClient

model, processor = mlx_load("opendatalab/MinerU2.5-2509-1.2B")

client = MinerUClient(
    backend="mlx-engine",
    model=model,
    processor=processor
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)

`lmdeploy-engine` Example

For the default inference engine (currently turbomind):

from lmdeploy.serve.vl_async_engine import VLAsyncEngine
from mineru_vl_utils import MinerUClient
from PIL import Image

if __name__ == "__main__":
    lmdeploy_engine = VLAsyncEngine("opendatalab/MinerU2.5-2509-1.2B")

    client = MinerUClient(
        backend="lmdeploy-engine",
        lmdeploy_engine=lmdeploy_engine,
    )

    image = Image.open("/path/to/the/test/image.png")
    extracted_blocks = client.two_step_extract(image)
    print(extracted_blocks)

For the pytorch inference engine with an Ascend accelerator:

from lmdeploy import PytorchEngineConfig
from lmdeploy.serve.vl_async_engine import VLAsyncEngine
from mineru_vl_utils import MinerUClient
from PIL import Image

if __name__ == "__main__":
    lmdeploy_engine = VLAsyncEngine(
        "opendatalab/MinerU2.5-2509-1.2B",
        backend="pytorch",
        backend_config=PytorchEngineConfig(
            device_type="ascend",
        ),
    )

    client = MinerUClient(
        backend="lmdeploy-engine",
        lmdeploy_engine=lmdeploy_engine,
    )

    image = Image.open("/path/to/the/test/image.png")
    extracted_blocks = client.two_step_extract(image)
    print(extracted_blocks)

`vllm-engine` Example

from vllm import LLM
from PIL import Image
from mineru_vl_utils import MinerUClient
from mineru_vl_utils import MinerULogitsProcessor  # if vllm>=0.10.1

llm = LLM(
    model="opendatalab/MinerU2.5-2509-1.2B",
    logits_processors=[MinerULogitsProcessor]  # if vllm>=0.10.1
)

client = MinerUClient(
    backend="vllm-engine",
    vllm_llm=llm
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)

`vllm-async-engine` Example

import io
import asyncio
import aiofiles

from vllm.v1.engine.async_llm import AsyncLLM
from vllm.engine.arg_utils import AsyncEngineArgs
from PIL import Image
from mineru_vl_utils import MinerUClient
from mineru_vl_utils import MinerULogitsProcessor  # if vllm>=0.10.1

async_llm = AsyncLLM.from_engine_args(
    AsyncEngineArgs(
        model="opendatalab/MinerU2.5-2509-1.2B",
        logits_processors=[MinerULogitsProcessor]  # if vllm>=0.10.1
    )
)

client = MinerUClient(
    backend="vllm-async-engine",
    vllm_async_llm=async_llm,
)

async def main():
    image_path = "/path/to/the/test/image.png"
    async with aiofiles.open(image_path, "rb") as f:
        image_data = await f.read()
    image = Image.open(io.BytesIO(image_data))
    extracted_blocks = await client.aio_two_step_extract(image)
    print(extracted_blocks)

asyncio.run(main())

async_llm.shutdown()
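
The async API makes it possible to keep several pages in flight at once. The sketch below caps the number of concurrent `aio_two_step_extract` calls with a semaphore. The helper `extract_many`, the `max_concurrency` parameter, and the `FakeClient` class are assumptions made so the example runs without a model. A real MinerUClient would take the place of `FakeClient`.

```python
import asyncio


async def extract_many(client, images, max_concurrency=4):
    """Run aio_two_step_extract on every image with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def extract_one(image):
        async with semaphore:
            return await client.aio_two_step_extract(image)

    # gather() returns results in the same order as the input images.
    return await asyncio.gather(*(extract_one(image) for image in images))


class FakeClient:
    """Stand-in that only mimics the async method named in this README."""

    async def aio_two_step_extract(self, image):
        await asyncio.sleep(0)  # Yield control like a real network or engine call would.
        return [f"blocks for {image}"]


results = asyncio.run(
    extract_many(FakeClient(), ["page-1", "page-2", "page-3"], max_concurrency=2)
)
print(results)
```

Whether this is preferable to `aio_batch_two_step_extract` depends on how that method schedules requests internally, which is not described here.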

Other APIs

Besides the two_step_extract method, MinerUClient provides other APIs for interacting with the model. The main APIs are:

class MinerUClient:

    def layout_detect(self, image: Image.Image) -> list[ContentBlock]:
        ...

    def batch_layout_detect(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

    async def aio_layout_detect(self, image: Image.Image) -> list[ContentBlock]:
        ...

    async def aio_batch_layout_detect(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

    def two_step_extract(self, image: Image.Image) -> list[ContentBlock]:
        ...

    def batch_two_step_extract(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

    async def aio_two_step_extract(self, image: Image.Image) -> list[ContentBlock]:
        ...

    async def aio_batch_two_step_extract(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...
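
For large image collections, one possible pattern is to feed the batch API in fixed-size chunks so that memory use stays bounded. This is only a sketch: `extract_in_chunks`, `chunked`, and the `chunk_size` default are hypothetical, and `FakeClient` stands in for a real MinerUClient so the example runs without a model.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_in_chunks(client, images, chunk_size=8):
    """Call batch_two_step_extract chunk by chunk and concatenate the per-image results."""
    results = []
    for chunk in chunked(images, chunk_size):
        results.extend(client.batch_two_step_extract(chunk))
    return results


class FakeClient:
    """Stand-in that records batch sizes and mimics the documented return shape."""

    def __init__(self):
        self.batch_sizes = []

    def batch_two_step_extract(self, images):
        self.batch_sizes.append(len(images))
        return [[f"blocks for {image}"] for image in images]


fake_client = FakeClient()
pages = [f"page-{i}" for i in range(5)]
all_blocks = extract_in_chunks(fake_client, pages, chunk_size=2)
print(fake_client.batch_sizes, len(all_blocks))
```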

Limitations

The transformers backend is slow and not suitable for production use.

MinerUClient only supports standalone images as input. There are no plans to support PDF or DOCX files, or cross-page and cross-document operations.

For production use cases, please use MinerU, which is a more complete toolkit for document analysis and data extraction.
