Utilities for MinerU Vision-Language models

mineru-vl-utils

A Python package for interacting with the MinerU Vision-Language Model.

It's a lightweight wrapper that simplifies sending requests to the MinerU Vision-Language Model and handling its responses.

About Backends

We provide four backends (deployment modes):

  1. http-client: An HTTP client for interacting with an OpenAI-compatible model server.
  2. transformers: A backend for using HuggingFace Transformers models. (slow but simple to install)
  3. vllm-engine: A backend for using the vLLM synchronous batching engine.
  4. vllm-async-engine: A backend for using the vLLM asynchronous engine. (requires async programming)

About Output Format

The MinerU Vision-Language Model handles document layout detection and text/table/equation recognition in a single model.

The output of the model is a list of ContentBlock objects, each representing a detected block in the document with its content recognition results.

Each ContentBlock contains the following attributes:

  • type (str): The type of the block, e.g., 'text', 'image', 'table', 'equation'.
    • For a complete list of supported block types, please refer to structs.py.
  • bbox (list of floats): The bounding box of the block in the format [xmin, ymin, xmax, ymax], with coordinates normalized to the range [0, 1].
  • angle (int or None): The rotation angle of the block; one of [0, 90, 180, 270], or None.
    • 0 means upward.
    • 90 means rightward.
    • 180 means upside down.
    • 270 means leftward.
    • None means the angle is not specified.
  • content (str or None): The recognized content of the block, if applicable.
    • For 'text' blocks, this is the recognized text.
    • For 'table' blocks, this is the recognized table in HTML format.
    • For 'equation' blocks, this is the recognized LaTeX code.
    • For 'image' blocks, this is None.
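As a rough illustration of how these attributes might be consumed, the sketch below converts a normalized bbox to pixel coordinates and turns a block's content into Markdown-friendly text. The `Block` dataclass is a hypothetical stand-in that mirrors the four attributes listed above so the example runs without a model; `to_pixel_bbox` and `render_block` are illustrative helper names, not part of mineru-vl-utils.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical stand-in mirroring the ContentBlock attributes described above."""
    type: str
    bbox: list[float]
    angle: Optional[int] = None
    content: Optional[str] = None


def to_pixel_bbox(bbox, width, height):
    """Scale a normalized [xmin, ymin, xmax, ymax] box to integer pixel coordinates."""
    xmin, ymin, xmax, ymax = bbox
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


def render_block(block):
    """Return a Markdown-friendly string for a block, or None if it has no content."""
    if block.content is None:  # e.g. 'image' blocks carry no recognized content
        return None
    if block.type == "equation":
        return f"$$\n{block.content}\n$$"  # content is LaTeX
    # 'text' content is plain text; 'table' content is already HTML,
    # which most Markdown renderers pass through unchanged.
    return block.content


blocks = [
    Block("text", [0.25, 0.0, 0.75, 0.25], 0, "Hello"),
    Block("equation", [0.25, 0.5, 0.75, 1.0], 0, "E = mc^2"),
    Block("image", [0.0, 0.0, 0.25, 0.25]),
]
page_width, page_height = 800, 600  # size of the source image in pixels
for block in blocks:
    print(block.type, to_pixel_bbox(block.bbox, page_width, page_height))
```

The tuple returned by `to_pixel_bbox` follows the (left, upper, right, lower) order that PIL's `Image.crop` expects, so it can be used to cut a detected block out of the source image.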

Installation

For the http-client backend, install the package via pip:

pip install mineru-vl-utils

For the transformers backend, install the package with the transformers extra (the quotes keep shells such as zsh from interpreting the brackets):

pip install "mineru-vl-utils[transformers]"

For the vllm-engine and vllm-async-engine backends, install the package with the vllm extra:

pip install "mineru-vl-utils[vllm]"

Notice:

  • To use the http-client backend, you still need a separate vllm (or other LLM deployment tool) environment to serve the model as an HTTP server.
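The backend-to-extra mapping above can be summarized in a small lookup. This is only a convenience sketch; `INSTALL_TARGETS` and `pip_command` are hypothetical names introduced here, and the package itself does not ship such a helper.

```python
# Install targets per backend, taken from the installation notes above.
INSTALL_TARGETS = {
    "http-client": "mineru-vl-utils",
    "transformers": "mineru-vl-utils[transformers]",
    "vllm-engine": "mineru-vl-utils[vllm]",
    "vllm-async-engine": "mineru-vl-utils[vllm]",
}


def pip_command(backend: str) -> str:
    """Return the pip command that installs the dependencies for a backend."""
    try:
        target = INSTALL_TARGETS[backend]
    except KeyError:
        known = ", ".join(sorted(INSTALL_TARGETS))
        raise ValueError(f"unknown backend {backend!r}; expected one of: {known}") from None
    # Quote the target so shells do not interpret the square brackets.
    return f'pip install "{target}"'


for name in INSTALL_TARGETS:
    print(f"{name}: {pip_command(name)}")
```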

Serving the Model (Optional)

This is only needed if you want to use the http-client backend.

You can use vllm or another LLM deployment tool to serve the model. Here we only demonstrate how to use vllm to serve the model.

vllm serve opendatalab/MinerU2.5-2509-1.2B --host 127.0.0.1 --port 8000
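Before pointing a client at the server, it can help to confirm that it is up and serving the expected model. The sketch below assumes the server exposes the OpenAI-style model listing at `/v1/models`, as vLLM's OpenAI-compatible server does; the helper names are illustrative and not part of mineru-vl-utils.

```python
import json
import urllib.request


def models_url(server_url: str) -> str:
    """Build the model-listing URL of an OpenAI-compatible server."""
    return server_url.rstrip("/") + "/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(server_url: str, timeout: float = 5.0) -> list[str]:
    """Query the server and return the ids of the models it serves."""
    with urllib.request.urlopen(models_url(server_url), timeout=timeout) as resp:
        return model_ids(json.load(resp))


# With the server from the command above running:
# print(fetch_model_ids("http://127.0.0.1:8000"))
```

If the call raises a connection error, the server is not reachable at that host and port yet.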

Using MinerUClient in Code

Now you can use the MinerUClient class to interact with the model. The following examples show how to use each backend.

http-client Example

from PIL import Image
from mineru_vl_utils import MinerUClient

client = MinerUClient(
    backend="http-client",
    server_url="http://127.0.0.1:8000"
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)

transformers Example

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
from mineru_vl_utils import MinerUClient

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "opendatalab/MinerU2.5-2509-1.2B",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "opendatalab/MinerU2.5-2509-1.2B",
    use_fast=True
)

client = MinerUClient(
    backend="transformers",
    model=model,
    processor=processor
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)

vllm-engine Example

from vllm import LLM
from PIL import Image
from mineru_vl_utils import MinerUClient

llm = LLM(model="opendatalab/MinerU2.5-2509-1.2B")

client = MinerUClient(
    backend="vllm-engine",
    vllm_llm=llm
)

image = Image.open("/path/to/the/test/image.png")
extracted_blocks = client.two_step_extract(image)
print(extracted_blocks)

vllm-async-engine Example

import io
import asyncio
import aiofiles

from vllm.v1.engine.async_llm import AsyncLLM
from vllm.engine.arg_utils import AsyncEngineArgs
from PIL import Image
from mineru_vl_utils import MinerUClient

async_llm = AsyncLLM.from_engine_args(
    AsyncEngineArgs("opendatalab/MinerU2.5-2509-1.2B")
)

client = MinerUClient(
    backend="vllm-async-engine",
    vllm_async_llm=async_llm,
)

async def main():
    image_path = "/path/to/the/test/image.png"
    async with aiofiles.open(image_path, "rb") as f:
        image_data = await f.read()
    image = Image.open(io.BytesIO(image_data))
    extracted_blocks = await client.aio_two_step_extract(image)
    print(extracted_blocks)

asyncio.run(main())

async_llm.shutdown()

Other APIs

Besides the two_step_extract method, MinerUClient provides other APIs for interacting with the model. The main ones are:

class MinerUClient:

    def layout_detect(self, image: Image.Image) -> list[ContentBlock]:
        ...

    def batch_layout_detect(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

    async def aio_layout_detect(self, image: Image.Image) -> list[ContentBlock]:
        ...

    async def aio_batch_layout_detect(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

    def two_step_extract(self, image: Image.Image) -> list[ContentBlock]:
        ...

    def batch_two_step_extract(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

    async def aio_two_step_extract(self, image: Image.Image) -> list[ContentBlock]:
        ...

    async def aio_batch_two_step_extract(self, images: list[Image.Image]) -> list[list[ContentBlock]]:
        ...

Limitations

The transformers backend is slow and not suitable for production use.

MinerUClient only accepts standalone images as input. Support for PDF and DOCX files is not planned, nor are cross-page or cross-document operations.

For production use cases, please use MinerU, a more complete toolkit for document analysis and data extraction.
