Practical Python client for MinerU Precision and Agent parsing APIs

Project description

MinerU Python client

A practical wrapper around MinerU's asynchronous parsing APIs, built for production-style usage.

What it handles for you:

  • local file upload via MinerU signed URLs
  • remote URL submission
  • async polling until completion
  • Agent Lightweight API and Precision API
  • HTML routing to MinerU-HTML for precision mode
  • optional markdown download for Agent tasks
  • precision result zip download + auto-unzip
  • easy access to the full.md, full.html, layout.json, and content/model JSON paths
  • callback checksum generation and verification helpers

Installation

pip install mineru-python-client

Or from source:

git clone https://github.com/JimEverest/mineru-python-client.git
cd mineru-python-client
pip install -e .

Files

  • mineru_client.py — main client implementation
  • run_mineru_demo.py — CLI-style example runner
  • tests/test_mineru_client.py — unit tests using a fake HTTP session

Quick start

from mineru_client import MinerUClient

client = MinerUClient(token='YOUR_TOKEN', poll_interval=5, timeout=600, request_timeout=60)
result = client.precision_parse_local_files(
    ['/path/to/document.pdf'],
    extra_formats=['html'],
)
print(result[0].full_zip_url)
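
The poll_interval and timeout arguments bound how long the client waits for a task to finish. The sketch below shows what such a polling loop might look like; it is not the package's actual implementation. The names poll_until_done and fetch_state and the state strings "done" and "failed" are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_state: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Call fetch_state() until it reports a terminal state or the timeout expires.

    fetch_state is a hypothetical callable that returns the task state as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in ("done", "failed"):  # assumed terminal state names
            return state
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"task still {state!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Usage with a fake task that finishes on the third poll.
states = iter(["pending", "running", "done"])
final_state = poll_until_done(lambda: next(states), poll_interval=0.01, timeout=1.0)
print(final_state)
```

Using a monotonic clock for the deadline keeps the loop unaffected by system clock adjustments while it waits.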

Production bundle example

This is the easiest production-style path for local files because it:

  • uploads
  • waits for completion
  • downloads the zip
  • extracts it
  • gives you direct file paths

from mineru_client import MinerUClient

client = MinerUClient(token='YOUR_TOKEN', poll_interval=5, timeout=600)
bundle = client.precision_parse_local_bundle(
    '/path/to/document.pdf',
    output_dir='./mineru_output',
    extra_formats=['html'],
)

print(bundle.zip_path)
print(bundle.extract_dir)
print(bundle.markdown_path)
print(bundle.html_path)
print(bundle.layout_path)
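
For readers who want to see what the extract step amounts to, the unzip-and-locate part can be approximated with the standard library. This sketch assumes the result zip is already on disk and that files named full.md, full.html, and layout.json may appear somewhere inside it. The function extract_bundle and the dictionary keys it returns are hypothetical and are not the package's API.

```python
import zipfile
from pathlib import Path
from typing import Dict, Optional


def extract_bundle(zip_path: str, extract_dir: str) -> Dict[str, Optional[Path]]:
    """Unzip a result archive and locate well-known files inside it."""
    target = Path(extract_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)

    def find(name: str) -> Optional[Path]:
        # Search recursively because the archive layout is an assumption here.
        return next(iter(sorted(target.rglob(name))), None)

    return {
        "markdown_path": find("full.md"),
        "html_path": find("full.html"),
        "layout_path": find("layout.json"),
    }
```

A missing file maps to None, so callers can tell that, for example, no HTML was produced without catching an exception.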

Callback signature verification

from mineru_client import build_callback_checksum, verify_callback_signature

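# Placeholder values; use your own uid and seed and the content received in the callback.
uid, seed, content = 'YOUR_UID', 'YOUR_SEED', 'CALLBACK_CONTENT'

checksum = build_callback_checksum(uid, seed, content)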
assert verify_callback_signature(uid, seed, content, checksum)
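
For orientation, a checksum of this kind is often a hash digest over the concatenated fields. The sketch below assumes a SHA-256 hex digest of uid + seed + content encoded as UTF-8, which this page does not confirm; check the MinerU callback documentation before relying on it. The names sketch_checksum and sketch_verify are hypothetical and are not the package's helpers.

```python
import hashlib
import hmac


def sketch_checksum(uid: str, seed: str, content: str) -> str:
    # Assumption: SHA-256 hex digest of the three fields concatenated in this order.
    return hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()


def sketch_verify(uid: str, seed: str, content: str, checksum: str) -> bool:
    expected = sketch_checksum(uid, seed, content)
    # compare_digest runs in constant time, so it does not reveal how many
    # leading characters of the received checksum were correct.
    return hmac.compare_digest(expected.encode("utf-8"), checksum.encode("utf-8"))
```

Verifying against the raw callback body, before any JSON parsing or re-serialization, avoids mismatches caused by whitespace or key-order changes.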

CLI examples

Precision local file:

MINERU_TOKEN=*** python3 run_mineru_demo.py \
  --mode precision-local \
  --input '/path/to/document.pdf' \
  --poll-interval 5 \
  --timeout 600 \
  --request-timeout 60 \
  --extra-format html

Precision local bundle download + unzip:

MINERU_TOKEN=*** python3 run_mineru_demo.py \
  --mode precision-local-bundle \
  --input '/path/to/document.pdf' \
  --bundle-output-dir './mineru_output' \
  --poll-interval 5 \
  --timeout 600 \
  --extra-format html

Agent local file:

python3 run_mineru_demo.py \
  --mode agent-local \
  --input '/path/to/small.pdf' \
  --download-markdown

API notes

  • Precision API requires a token.
  • Agent API does not require a token, but is limited to small single files and does not support HTML.
  • MinerU parsing is asynchronous; this wrapper uploads/submits first, then polls until done or failed.
  • Precision local uploads use /api/v4/file-urls/batch even for a single file.
  • Agent local uploads use /api/v1/agent/parse/file and then PUT the file to the returned signed URL.
  • The wrapper validates local files before creating remote tasks.
  • The wrapper requires HTTPS for signed upload URLs and result URLs.
  • Duplicate local basenames are automatically assigned unique data_id values.
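
Two of the notes above, the HTTPS requirement and the unique data_id assignment, can be illustrated in a few lines. This is a sketch under stated assumptions, not the package's code: the helper names require_https and unique_data_ids are hypothetical, and the "-2", "-3" suffix scheme is only one plausible way to disambiguate duplicate basenames.

```python
from pathlib import Path
from typing import List, Set
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return url unchanged, or raise if it is not an HTTPS URL with a host."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"refusing non-HTTPS URL: {url!r}")
    return url


def unique_data_ids(paths: List[str]) -> List[str]:
    """Derive one ID per path from its basename, suffixing repeats with -2, -3, ..."""
    used: Set[str] = set()
    ids: List[str] = []
    for path in paths:
        stem = Path(path).stem
        candidate, n = stem, 1
        # Keep bumping the suffix until the candidate has not been handed out yet.
        while candidate in used:
            n += 1
            candidate = f"{stem}-{n}"
        used.add(candidate)
        ids.append(candidate)
    return ids


print(unique_data_ids(["/a/report.pdf", "/b/report.pdf", "/c/notes.pdf"]))
# ['report', 'report-2', 'notes']
```

Tracking the set of IDs already handed out, rather than a per-basename counter, keeps the result unique even when a file is literally named report-2.pdf.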

Download files

Source Distribution

mineru_python_client-0.1.0.tar.gz (11.8 kB)

Uploaded Source

Built Distribution

mineru_python_client-0.1.0-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file mineru_python_client-0.1.0.tar.gz.

File metadata

  • Download URL: mineru_python_client-0.1.0.tar.gz
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mineru_python_client-0.1.0.tar.gz
Algorithm Hash digest
SHA256 72ea8b737b04f899d7810ff2b9968bef65a9b69952549bb34ba96c9ca2b31dba
MD5 7fc2168bab558d141aa5ce41bccb0d46
BLAKE2b-256 a7611dfc0cbd75cd128fa6b4e6ad638efbd9809589703c06923f4b77507a69c2

File details

Details for the file mineru_python_client-0.1.0-py3-none-any.whl.

File hashes

Hashes for mineru_python_client-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 164e46588480ec00dfada2190c5b97af26df63a8916e95e5a7c7e0d1ea44a982
MD5 c85f886341a6cd0ff24e5727d7e4e445
BLAKE2b-256 a4b5cd35423b8d549f55ca9fc68c2f1ed6e0c9f9ce7d0dedeb46f9b6c197bc9a
