Official Python SDK for the Scan Hero document conversion API

scanhero-python

Convert PDF, Word, Excel, PowerPoint, images, audio, email, and 20+ other formats to Markdown (or DOCX, CSV, EPUB, and more) via a simple Python interface.

Install

pip install scanheroai

Python 3.9+ required. The only dependency is httpx. The distribution is published as scanheroai; the import name is scanhero.

Quick start

from scanhero import ScanHero

sh = ScanHero(api_key="sh_...")  # get your key at scanheroai.com/settings/api-keys

# Convert a PDF — sync for files ≤5 MB
task = sh.tasks.create("report.pdf")
print(task.output_markdown)

# Large files process asynchronously — wait until done
task = sh.tasks.create("recording.mp4")
task = sh.tasks.wait(task.task_id)   # polls every 2s, up to 5 minutes
print(task.output_markdown)

# Refine output with an LLM prompt
task = sh.tasks.adjust(task.task_id, "Summarise in bullet points")

# Download as DOCX
docx_bytes = sh.tasks.download(task.task_id, format="docx")
with open("output.docx", "wb") as f:
    f.write(docx_bytes)
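
The comment on tasks.wait says it polls every 2 seconds for up to 5 minutes. If you need custom behaviour (a different interval, logging between polls), a loop along these lines may work. This is only a sketch: the SDK's own implementation is not shown in this README, and wait_until_done, fetch_status, sleep, and clock are hypothetical names introduced for illustration. The status strings are the ones listed under Tasks.

```python
import time
from typing import Callable

# Statuses after which polling should stop (taken from the Tasks section).
TERMINAL_STATUSES = {"done", "failed"}


def wait_until_done(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Stop if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the SDK, this could plausibly be called as wait_until_done(lambda: sh.tasks.get(task.task_id).status). Injecting sleep and clock keeps the loop testable without real delays.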

Authentication

Generate an API key at scanheroai.com/settings/api-keys.

sh = ScanHero(api_key="sh_your_key_here")

Or set the SCANHERO_API_KEY environment variable and use:

import os
sh = ScanHero(api_key=os.environ["SCANHERO_API_KEY"])
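
Reading os.environ["SCANHERO_API_KEY"] directly raises a bare KeyError when the variable is unset. A small wrapper can turn that into a clearer message. The helper below is a sketch; load_api_key is a hypothetical name and is not part of the SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from SCANHERO_API_KEY, or raise with a readable message."""
    env = os.environ if env is None else env
    key = env.get("SCANHERO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCANHERO_API_KEY is not set; generate a key at "
            "scanheroai.com/settings/api-keys"
        )
    return key
```

The result can then be passed as ScanHero(api_key=load_api_key()).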

Tasks

# Upload from a path
task = sh.tasks.create("invoice.pdf")

# Upload from a file object
with open("invoice.pdf", "rb") as f:
    task = sh.tasks.create(f)

# Upload raw bytes
task = sh.tasks.create(pdf_bytes, filename="invoice.pdf")

# With options
from scanhero import ProcessingOptions

task = sh.tasks.create(
    "scan.jpg",
    options=ProcessingOptions(
        image_handling="describe",   # ask LLM to describe images
        output_language="pt",        # Portuguese output
        output_format="markdown",
    ),
)

# Check status
task = sh.tasks.get(task.task_id)
print(task.status)      # "pending" | "processing" | "done" | "failed"
print(task.credits_used)

# List recent tasks
tasks = sh.tasks.list()

# Estimate cost before uploading
estimate = sh.tasks.estimate_cost(size_bytes=5_000_000, format="application/pdf")
print(f"Will cost {estimate.credits} credits")

Batch jobs

job = sh.jobs.create(["file1.pdf", "file2.docx", "file3.xlsx"])
print(job.job_id, job.status)

# Check progress
job = sh.jobs.get(job.job_id)
for item in job.items:
    print(item.filename, item.status)
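
For larger batches, a per-status summary is often more useful than printing every item. The sketch below assumes only what the loop above shows, namely that each item has filename and status attributes. It also assumes that items use the same status strings as tasks, which the README does not confirm. summarize_items is a hypothetical helper, and the SimpleNamespace objects stand in for real job items.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_items(items):
    """Count job items by status and collect the filenames that failed."""
    counts = Counter(item.status for item in items)
    failed = [item.filename for item in items if item.status == "failed"]
    return counts, failed


# Stand-in objects shaped like job.items (filename and status attributes only).
example_items = [
    SimpleNamespace(filename="file1.pdf", status="done"),
    SimpleNamespace(filename="file2.docx", status="failed"),
    SimpleNamespace(filename="file3.xlsx", status="processing"),
]
counts, failed = summarize_items(example_items)
print(dict(counts), failed)
```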

Webhooks

# Register a webhook
wh = sh.webhooks.create(
    "https://your.app/hooks/scanhero",
    events=["task.completed", "task.failed"],
)
print(wh.webhook_id)

# In your web server, verify incoming payloads.
# `request` is your web framework's request object; pass the raw body bytes.
from scanhero.webhooks import WebhooksResource

is_valid = WebhooksResource.verify_signature(
    payload=request.body,
    signature_header=request.headers["X-Scan-Hero-Signature"],
    secret="your_webhook_secret",
)
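
The README does not say how the signature is computed. Many webhook providers sign the raw request body with HMAC-SHA256 and send the hex digest in a header, so the sketch below illustrates that general technique. It is an assumption, not Scan Hero's documented scheme. In production, use WebhooksResource.verify_signature; sign_payload and verify_payload are hypothetical names.

```python
import hashlib
import hmac


def sign_payload(payload: bytes, secret: str) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()


def verify_payload(payload: bytes, signature_header: str, secret: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(payload, secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

Whatever the scheme, verify against the raw bytes as received. Re-serialising parsed JSON can change whitespace or key order and break the comparison.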

Templates

from scanhero import ProcessingOptions

tmpl = sh.templates.create(
    "Legal doc pipeline",
    options=ProcessingOptions(output_language="en", image_handling="describe"),
    adjust_prompts=["Format citations as footnotes", "Add an executive summary"],
)

# Use template when creating tasks
task = sh.tasks.create("contract.pdf", template_id=tmpl.template_id)

Error handling

from scanhero import (
    ScanHeroError,
    InsufficientCreditsError,
    AuthenticationError,
    NotFoundError,
)

try:
    task = sh.tasks.create("huge_video.mp4")
except InsufficientCreditsError:
    print("Not enough credits — top up at scanheroai.com/pricing")
except AuthenticationError:
    print("Invalid API key")
except ScanHeroError as e:
    print(f"API error {e.status_code}: {e}")
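
The order of the except clauses above matters if the specific errors subclass ScanHeroError. The snippet suggests they do, but this README does not state it. The sketch below uses stand-in classes with the same names (the real ones come from scanhero) to show why the base class is handled last. describe is a hypothetical helper.

```python
# Stand-in classes for illustration only. The subclass relationship is an
# assumption about the SDK, not something the README confirms.
class ScanHeroError(Exception):
    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


class InsufficientCreditsError(ScanHeroError):
    pass


class AuthenticationError(ScanHeroError):
    pass


def describe(exc: Exception) -> str:
    """Map an exception to a user-facing message, most specific type first."""
    try:
        raise exc
    except InsufficientCreditsError:
        return "Not enough credits"
    except AuthenticationError:
        return "Invalid API key"
    except ScanHeroError as e:
        # Catch-all for any other API error; it must come after the subclasses.
        return f"API error {e.status_code}: {e}"
```

If the ScanHeroError clause came first, it would also catch the two subclasses and the specific messages would never be reached.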

Regenerating from the OpenAPI spec

A generated client can be rebuilt automatically from the live API spec:

# Install the generator
pip install openapi-python-client

# Regenerate (requires the API to be running)
openapi-python-client generate \
    --url https://api.scanheroai.com/openapi.json \
    --output-path sdk/python-generated

For the handcrafted SDK (this package), update sdk/python/ directly.

API reference

Full reference: scanheroai.com/docs
Interactive (OpenAPI): scanheroai.com/docs/reference

Related SDKs

Language                | Package                   | Docs
Python                  | pip install scanheroai    | This package (sdk/python/)
TypeScript / JavaScript | npm install @scanhero/sdk | sdk/typescript/ (generated from /openapi.json)

Both SDKs are documented together at scanheroai.com/docs.

License

MIT
