scanhero-python
Official Python SDK for the Scan Hero document conversion API.
Convert PDF, Word, Excel, PowerPoint, images, audio, email, and 20+ other formats to Markdown (or DOCX, CSV, EPUB, and more) via a simple Python interface.
Install
pip install scanheroai
Python 3.9+ required. The only dependency is httpx.
Quick start
from scanhero import ScanHero
sh = ScanHero(api_key="sh_...") # get your key at scanheroai.com/settings/api-keys
# Convert a PDF — sync for files ≤5 MB
task = sh.tasks.create("report.pdf")
print(task.output_markdown)
# Large files process asynchronously — wait until done
task = sh.tasks.create("recording.mp4")
task = sh.tasks.wait(task.task_id) # polls every 2s, up to 5 minutes
print(task.output_markdown)
# Refine output with an LLM prompt
task = sh.tasks.adjust(task.task_id, "Summarise in bullet points")
# Download as DOCX
docx_bytes = sh.tasks.download(task.task_id, format="docx")
with open("output.docx", "wb") as f:
    f.write(docx_bytes)
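The comment on `sh.tasks.wait` mentions polling every 2 seconds for up to 5 minutes. A minimal sketch of that kind of polling loop follows. It is not the SDK's implementation: `wait_for_status` and its parameters are hypothetical names, and the terminal statuses are assumed from the status values listed under Tasks.

```python
import time
from typing import Callable

# Assumption: "done" and "failed" are the only terminal statuses.
TERMINAL_STATUSES = {"done", "failed"}


def wait_for_status(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Call fetch_status() until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a real lookup such as `lambda: sh.tasks.get(task_id).status`.
_fake_statuses = iter(["pending", "processing", "done"])
final_status = wait_for_status(lambda: next(_fake_statuses), interval=0)
print(final_status)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted while waiting.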
Authentication
Generate an API key at scanheroai.com/settings/api-keys.
sh = ScanHero(api_key="sh_your_key_here")
Or set the SCANHERO_API_KEY environment variable and use:
import os
sh = ScanHero(api_key=os.environ["SCANHERO_API_KEY"])
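`os.environ["SCANHERO_API_KEY"]` raises a bare `KeyError` when the variable is missing. A small wrapper can turn that into a clearer message. This is only a sketch: `load_api_key` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the key from SCANHERO_API_KEY, or raise with a readable message."""
    key = env.get("SCANHERO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCANHERO_API_KEY is not set; generate a key at "
            "scanheroai.com/settings/api-keys"
        )
    return key


# Demonstrated with a plain dict so the example does not touch the real environment.
print(load_api_key({"SCANHERO_API_KEY": "sh_example"}))
```

With the SDK installed, this would be used as `sh = ScanHero(api_key=load_api_key())`.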
Tasks
# Upload from a path
task = sh.tasks.create("invoice.pdf")
# Upload from a file object
with open("invoice.pdf", "rb") as f:
    task = sh.tasks.create(f)
# Upload raw bytes
task = sh.tasks.create(pdf_bytes, filename="invoice.pdf")
# With options
from scanhero import ProcessingOptions
task = sh.tasks.create(
    "scan.jpg",
    options=ProcessingOptions(
        image_handling="describe",  # ask LLM to describe images
        output_language="pt",       # Portuguese output
        output_format="markdown",
    ),
)
# Check status
task = sh.tasks.get(task.task_id)
print(task.status) # "pending" | "processing" | "done" | "failed"
print(task.credits_used)
# List recent tasks
tasks = sh.tasks.list()
# Estimate cost before uploading
estimate = sh.tasks.estimate_cost(size_bytes=5_000_000, format="application/pdf")
print(f"Will cost {estimate.credits} credits")
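The `estimate_cost` call above takes a size in bytes and a MIME type. For a local file, both can be derived with the standard library. The sketch below assumes the API accepts the MIME types that `mimetypes` guesses, which this README does not confirm. `estimate_args` is a hypothetical helper.

```python
import mimetypes
import os
import tempfile


def estimate_args(path: str) -> dict:
    """Build the size_bytes/format keyword arguments for a local file."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot infer a MIME type for {path!r}")
    return {"size_bytes": os.path.getsize(path), "format": mime_type}


# Demonstrated on a throwaway file so the example runs without the SDK.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "invoice.pdf")
    with open(sample, "wb") as f:
        f.write(b"%PDF-1.7\n")
    args = estimate_args(sample)
    print(args)
```

With the SDK installed, the result could be passed on as `sh.tasks.estimate_cost(**estimate_args("invoice.pdf"))`.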
Batch jobs
job = sh.jobs.create(["file1.pdf", "file2.docx", "file3.xlsx"])
print(job.job_id, job.status)
# Check progress
job = sh.jobs.get(job.job_id)
for item in job.items:
    print(item.filename, item.status)
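For larger batches, a per-status count is often easier to read than one line per file. The sketch below uses stand-in objects in place of `job.items` and assumes that items report the same status values as tasks, which this README does not state. `summarize` and `is_finished` are hypothetical helpers.

```python
from collections import Counter
from types import SimpleNamespace

# Assumption: batch items use the same terminal statuses as single tasks.
TERMINAL_STATUSES = {"done", "failed"}


def summarize(items) -> Counter:
    """Count items per status."""
    return Counter(item.status for item in items)


def is_finished(items) -> bool:
    """True once every item has reached a terminal status."""
    return all(item.status in TERMINAL_STATUSES for item in items)


# Stand-ins for job.items; only .filename and .status are taken from this README.
items = [
    SimpleNamespace(filename="file1.pdf", status="done"),
    SimpleNamespace(filename="file2.docx", status="processing"),
    SimpleNamespace(filename="file3.xlsx", status="failed"),
]
summary = summarize(items)
print(dict(summary), is_finished(items))
```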
Webhooks
# Register a webhook
wh = sh.webhooks.create(
    "https://your.app/hooks/scanhero",
    events=["task.completed", "task.failed"],
)
print(wh.webhook_id)
# In your web server, verify incoming payloads:
from scanhero import ScanHero
from scanhero.webhooks import WebhooksResource
is_valid = WebhooksResource.verify_signature(
    payload=request.body,
    signature_header=request.headers["X-Scan-Hero-Signature"],
    secret="your_webhook_secret",
)
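This README does not say how the signature is computed. Purely as an illustration of what a check like `verify_signature` typically does, the sketch below assumes a hex-encoded HMAC-SHA256 of the raw request body. The actual scheme may differ, so production code should rely on `WebhooksResource.verify_signature`. `sign` and `verify` are hypothetical names.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: str) -> str:
    """Assumed scheme: hex-encoded HMAC-SHA256 over the raw body."""
    return hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature_header: str, secret: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    expected = sign(payload, secret)
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )


body = b'{"event": "task.completed"}'
header = sign(body, "your_webhook_secret")
print(verify(body, header, "your_webhook_secret"))
print(verify(body + b" ", header, "your_webhook_secret"))
```

Whatever the scheme, the check has to run on the raw request bytes, because re-serialising parsed JSON can change the byte sequence and invalidate the signature.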
Templates
from scanhero import ProcessingOptions
tmpl = sh.templates.create(
    "Legal doc pipeline",
    options=ProcessingOptions(output_language="en", image_handling="describe"),
    adjust_prompts=["Format citations as footnotes", "Add an executive summary"],
)
# Use template when creating tasks
task = sh.tasks.create("contract.pdf", template_id=tmpl.template_id)
Error handling
from scanhero import (
    ScanHeroError,
    InsufficientCreditsError,
    AuthenticationError,
    NotFoundError,
)
try:
    task = sh.tasks.create("huge_video.mp4")
except InsufficientCreditsError:
    print("Not enough credits. Top up at scanheroai.com/pricing")
except AuthenticationError:
    print("Invalid API key")
except ScanHeroError as e:
    # Catch-all for any other API error; keep it after the specific handlers
    print(f"API error {e.status_code}: {e}")
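The order of the `except` clauses above suggests that the specific errors subclass `ScanHeroError`, though this README does not say so explicitly. The sketch below uses stand-in classes, not the SDK's own, to show why the base class must come last under that assumption. The class bodies, the `describe` helper, and the status code are all hypothetical.

```python
from typing import Optional


class ScanHeroError(Exception):
    """Stand-in base class; only the status_code attribute comes from this README."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


class AuthenticationError(ScanHeroError):
    """Stand-in; assumed to subclass the base error."""


class InsufficientCreditsError(ScanHeroError):
    """Stand-in; assumed to subclass the base error."""


def describe(exc: ScanHeroError) -> str:
    """Map an error to a message, most specific handler first."""
    try:
        raise exc
    except InsufficientCreditsError:
        return "top up credits"
    except AuthenticationError:
        return "check the API key"
    except ScanHeroError as e:  # must come last, or it would shadow the subclasses
        return f"API error {e.status_code}: {e}"


print(describe(InsufficientCreditsError("no credits left")))
print(describe(ScanHeroError("server error", status_code=500)))
```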
Regenerating from the OpenAPI spec
A generated client can be rebuilt automatically from the live API spec:
# Install the generator
pip install openapi-python-client
# Regenerate (requires the API to be running)
openapi-python-client generate \
  --url https://api.scanheroai.com/openapi.json \
  --output-path sdk/python-generated
For the handcrafted SDK (this package), update sdk/python/ directly.
API reference
Full reference: scanheroai.com/docs
Interactive (OpenAPI): scanheroai.com/docs/reference
Related SDKs
| Language | Package | Docs |
|---|---|---|
| Python | pip install scanheroai | This package (sdk/python/) |
| TypeScript / JavaScript | npm install @scanhero/sdk | sdk/typescript/, generated from /openapi.json |
Both SDKs are documented together at scanheroai.com/docs.
License
MIT