Official Python SDK for the OCRQueen document extraction API

Maintainers

Milan_Khati

Project description

ocrqueen-python

Official Python SDK for the OCRQueen document and image extraction API.

🚧 Status: Pre-release. APIs and surface area will change before v1.0.0.

Installation

pip install ocrqueen

Requires Python 3.10 or newer.

Supported formats

Category	Formats
Documents	PDF
Presentations	PPTX, PPT (PowerPoint)
Images	PNG, JPEG, WebP, HEIC / HEIF (iPhone photos)

The API returns structured JSON and Markdown for every supported type: text, tables, and images, plus diagram graph extraction and image alt-text when extraction_profile="advanced" is set.
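A client-side check against the formats table can reject unsupported files before they are uploaded. The sketch below is illustrative only: the helper name `supported_category` and the extension spellings (for example `.jpg` for JPEG) are assumptions derived from the format names listed above, not constants exposed by the SDK.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from file extension to the categories in the formats table.
# The extension spellings are a guess based on the format names, not SDK values.
SUPPORTED_EXTENSIONS = {
    ".pdf": "Documents",
    ".pptx": "Presentations",
    ".ppt": "Presentations",
    ".png": "Images",
    ".jpg": "Images",
    ".jpeg": "Images",
    ".webp": "Images",
    ".heic": "Images",
    ".heif": "Images",
}


def supported_category(path: str) -> Optional[str]:
    """Return the format category for `path`, or None if it is not listed."""
    # Compare case-insensitively so "receipt.HEIC" matches ".heic".
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())
```

The check looks only at the file name, so it is a convenience filter rather than validation; the API remains the authority on what it accepts.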

Quickstart

from ocrqueen import OCRQueen

client = OCRQueen(api_key="pk_...")

with open("paper.pdf", "rb") as f:
    job = client.extract.create(file=f)

result = client.jobs.wait(job)
print(result.result["markdown"])

Get an API key from dashboard.ocrqueen.com.
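The Quickstart steps can be wrapped in small reusable helpers. This is a minimal sketch, not part of the SDK: `extract_markdown` and `save_markdown` are hypothetical names, and the only SDK calls used are the ones shown in the Quickstart (`client.extract.create`, `client.jobs.wait`, and `result.result["markdown"]`). The client is passed in as an argument so the helpers can be exercised with a stub.

```python
from pathlib import Path


def extract_markdown(client, path: str) -> str:
    """Submit one file, block until the job finishes, and return its Markdown."""
    # Open inside a context manager so the handle is closed after submission,
    # as in the Quickstart.
    with open(path, "rb") as f:
        job = client.extract.create(file=f)
    finished = client.jobs.wait(job)
    return finished.result["markdown"]


def save_markdown(client, path: str) -> Path:
    """Write the extracted Markdown next to the source file as <name>.md."""
    out_path = Path(path).with_suffix(".md")
    out_path.write_text(extract_markdown(client, path), encoding="utf-8")
    return out_path
```

With a real client this would be called as `save_markdown(client, "paper.pdf")`, which writes `paper.md` alongside the input.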

Other file types

# Slide decks: speaker notes are preserved
with open("pitch.pptx", "rb") as f:
    job = client.extract.create(file=f)

# iPhone photos: HEIC is handled natively, no conversion needed
with open("receipt.heic", "rb") as f:
    job = client.extract.create(file=f)

# Scanned document images
with open("invoice.png", "rb") as f:
    job = client.extract.create(file=f)

# Deeper extraction profile: diagrams, image alt-text, OCR on
# embedded text
with open("paper.pdf", "rb") as f:
    job = client.extract.create(file=f, profile="advanced")
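When several files need the same treatment, the calls above can be folded into a loop. The following is a sketch under stated assumptions: `submit_all` is a hypothetical helper, and it assumes (as the Quickstart implies by closing the file right after `create()`) that the upload is complete by the time `create()` returns.

```python
def submit_all(client, paths, profile=None):
    """Submit each path and return a {path: job} mapping."""
    jobs = {}
    for path in paths:
        # Pass `profile` only when one was requested, mirroring the calls above.
        kwargs = {"profile": profile} if profile is not None else {}
        # The context manager closes each handle once the file is submitted.
        with open(path, "rb") as f:
            jobs[str(path)] = client.extract.create(file=f, **kwargs)
    return jobs
```

Each returned job can then be passed to `client.jobs.wait(job)` as in the Quickstart.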

Patent extraction (`domain="patent"`)

Route a PPTX or PDF through the patent-specific pipeline: region classification (cover / abstract / drawings / claims / references), a Gemini cover parser, LibreOffice rasterisation for EMF/WMF figures, cross-figure numeral resolution, and a per-stage faithfulness_score. Patent jobs are billed at a flat $0.05/page regardless of profile.
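Because the patent rate is flat per page, a cost estimate is simple arithmetic. The sketch below is illustrative: `estimate_patent_cost` is a hypothetical helper, and how the service counts pages (for example, whether each slide of a PPTX is one page) is an assumption not confirmed here.

```python
from decimal import Decimal

# Flat patent-domain rate in USD per page, taken from the figure quoted above.
PATENT_RATE_PER_PAGE = Decimal("0.05")


def estimate_patent_cost(page_count: int) -> Decimal:
    """Return the estimated cost in USD for a patent job of `page_count` pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return PATENT_RATE_PER_PAGE * page_count
```

For example, `estimate_patent_cost(12)` gives `Decimal("0.60")`, that is, $0.60 for a 12-page disclosure.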

with open("invention-disclosure.pptx", "rb") as f:
    job = client.extract.create(file=f, options={"domain": "patent"})

# The response shape changes for patent jobs; the discriminator is `domain`.
patent = client.jobs.wait(job).result        # full PatentExtractionResponse
print(patent["source"]["input_kind"])        # "invention_disclosure" | "published_patent" | "unknown"
print(patent["extraction"]["faithfulness_score"])

# Figures carry a stable proxy URL that stays valid until the underlying
# object is purged by your retention window. fetch_image() follows the
# 302 redirect to signed storage for you and returns raw bytes.
for fig in patent["drawings"]:
    image_bytes = client.jobs.fetch_image(fig["image_url"])
    filename = f"{fig['figure_number'].replace(' ', '_')}.png"
    with open(filename, "wb") as out:
        out.write(image_bytes)

The same fetch_image() helper works for general-domain ImageBlock URLs (pages[].blocks[].url) — useful for snapshotting all figures from a job into your own pipeline.
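Snapshotting general-domain figures amounts to walking `pages[].blocks[].url` and fetching each URL. This is a minimal sketch, assuming the result is a dict-like object and that only image blocks carry a `url` key; the helper names `iter_image_urls` and `snapshot_images` are hypothetical, and `client.jobs.fetch_image` is the only SDK call used.

```python
def iter_image_urls(result):
    """Yield every block URL found under pages[].blocks[].url."""
    for page in result.get("pages", []):
        for block in page.get("blocks", []):
            url = block.get("url")
            # Assumption: blocks without a "url" key are not image blocks.
            if url:
                yield url


def snapshot_images(client, result):
    """Fetch raw bytes for every image URL in a general-domain result."""
    return [client.jobs.fetch_image(url) for url in iter_image_urls(result)]
```

The returned list preserves document order, so each entry can be written to disk or forwarded to another pipeline stage as needed.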

Documentation

Full API reference: https://ocrqueen.com/docs
Python SDK guide: https://ocrqueen.com/docs/sdks/python
Data retention & deletion: https://ocrqueen.com/docs/data-retention

License

MIT — see LICENSE.

Release history

Version	Released
0.5.0 (this version)	May 21, 2026
0.4.0	May 21, 2026
0.3.2	May 20, 2026
0.3.1	May 20, 2026
0.3.0	May 20, 2026
0.2.1	May 16, 2026
0.2.0	May 16, 2026
0.1.0	May 15, 2026
0.0.0	May 15, 2026

Download files

Source Distribution

ocrqueen-0.5.0.tar.gz (71.2 kB)

Uploaded May 21, 2026 Source

Built Distribution

ocrqueen-0.5.0-py3-none-any.whl (25.0 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file ocrqueen-0.5.0.tar.gz.

File metadata

Download URL: ocrqueen-0.5.0.tar.gz
Upload date: May 21, 2026
Size: 71.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ocrqueen-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`4af4a802014261b1b69232b6b9819be1e1ed9ec798a310bd08ec25fd9c33dd27`
MD5	`568fa78c635ca4e743b9103014963241`
BLAKE2b-256	`ced46a7b9ae532eb8c74e29cdba1962fee26d76adcf3ad3fb1846ebb14f9b77e`
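A downloaded archive can be checked against the digests in the table using only the standard library. The sketch below uses hypothetical helper names (`file_digests`, `matches_sha256`); it relies on `hashlib.sha256` and on `hashlib.blake2b` with `digest_size=32`, which yields the 256-bit BLAKE2b digest.

```python
import hashlib


def file_digests(path: str) -> dict:
    """Return hex SHA256 and BLAKE2b-256 digests of the file at `path`."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {
        "sha256": sha256.hexdigest(),
        "blake2b_256": blake2b_256.hexdigest(),
    }


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with an expected hex string."""
    return file_digests(path)["sha256"] == expected_hex.strip().lower()
```

Calling `matches_sha256("ocrqueen-0.5.0.tar.gz", "<SHA256 from the table>")` returns True only when the local file matches the published digest.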

Provenance

The following attestation bundles were made for ocrqueen-0.5.0.tar.gz:

Publisher: release.yml on ocrqueen/ocrqueen-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ocrqueen-0.5.0.tar.gz
- Subject digest: 4af4a802014261b1b69232b6b9819be1e1ed9ec798a310bd08ec25fd9c33dd27
- Sigstore transparency entry: 1593271308
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: ocrqueen/ocrqueen-python@d61b4c3b6a967aad9e9b306d14b504471b5d6dbf
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/ocrqueen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d61b4c3b6a967aad9e9b306d14b504471b5d6dbf
- Trigger Event: push

File details

Details for the file ocrqueen-0.5.0-py3-none-any.whl.

File metadata

Download URL: ocrqueen-0.5.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 25.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ocrqueen-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a69a587599b3eeca9eb8d6f7f960d7b7d06de3197755653bc04c92e78c0a4aff`
MD5	`0920f6df117146545998ba39b829e32b`
BLAKE2b-256	`5e2cf5fa12233648627d32f448f69f77ec5250435f13456167c70ff5b8e56f3c`

Provenance

The following attestation bundles were made for ocrqueen-0.5.0-py3-none-any.whl:

Publisher: release.yml on ocrqueen/ocrqueen-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ocrqueen-0.5.0-py3-none-any.whl
- Subject digest: a69a587599b3eeca9eb8d6f7f960d7b7d06de3197755653bc04c92e78c0a4aff
- Sigstore transparency entry: 1593271471
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: ocrqueen/ocrqueen-python@d61b4c3b6a967aad9e9b306d14b504471b5d6dbf
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/ocrqueen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d61b4c3b6a967aad9e9b306d14b504471b5d6dbf
- Trigger Event: push
