Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown
markdownbridge
Installation
```bash
pip install markdownbridge
```
Quick Start
```python
from markdownbridge import MarkdownBridge

client = MarkdownBridge(api_key="ocrb_prd_xxx")

# One-liner: URL → Markdown
result = client.ocr("https://example.com/invoice.pdf")
print(result.markdown)

# One-liner: local file → Markdown
result = client.ocr("./receipt.png")
print(result.markdown)
```
Authentication
Pass your API key directly or set the MARKDOWNBRIDGE_API_KEY environment variable:
```bash
export MARKDOWNBRIDGE_API_KEY="ocrb_prd_xxx"
```

```python
from markdownbridge import MarkdownBridge

client = MarkdownBridge()  # reads the key from MARKDOWNBRIDGE_API_KEY
```
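The README does not say which source wins when both are present. A minimal sketch of the usual lookup order, assuming an explicit `api_key` argument takes precedence over the environment variable and that a missing key fails at construction time. `resolve_api_key` is a hypothetical helper, not part of the SDK, and the real client may raise `AuthenticationError` rather than the `ValueError` used here.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "MARKDOWNBRIDGE_API_KEY"


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: explicit argument first, then the environment."""
    # Assumption: an explicitly passed key wins over the environment variable.
    if api_key:
        return api_key
    key = environ.get(ENV_VAR)
    if not key:
        # Assumption: the real client may raise AuthenticationError instead.
        raise ValueError(f"No API key: pass api_key=... or set {ENV_VAR}")
    return key


print(resolve_api_key(environ={ENV_VAR: "ocrb_prd_env"}))  # ocrb_prd_env
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.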
Client Options
```python
client = MarkdownBridge(
    api_key="ocrb_prd_xxx",  # or env MARKDOWNBRIDGE_API_KEY
    base_url="https://api.markdownbridge.com",  # default
    timeout=30.0,  # request timeout in seconds
    max_retries=3,  # retry 5xx errors with backoff
)
```
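`max_retries` is documented only as retrying 5xx errors "with backoff"; the actual schedule is not specified. A minimal sketch, assuming exponential backoff (0.5 s, 1 s, 2 s, ...) and that `max_retries=3` means up to three retries after the first attempt. `send_with_retries` and the `send` callable are hypothetical stand-ins for the SDK's HTTP layer, and a response is modelled as a `(status_code, body)` tuple to keep the example self-contained.

```python
import time
from typing import Callable, Tuple

# Hypothetical response shape: (status_code, body).
Response = Tuple[int, str]


def send_with_retries(
    send: Callable[[], Response],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); retry only 5xx responses, doubling the delay each time."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        status, body = send()
        # Successes, 4xx errors, and the final attempt are returned as-is.
        if status < 500 or attempt >= max_retries:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * (2 ** attempt))
        attempt += 1


responses = iter([(503, "busy"), (502, "bad gateway"), (200, "ok")])
delays = []
print(send_with_retries(lambda: next(responses), sleep=delays.append))  # (200, 'ok')
print(delays)  # [0.5, 1.0]
```

Injecting `sleep` lets the example record the delays instead of actually waiting; production code would usually also add random jitter.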
API Reference
client.ocr(source, **opts)
The convenience method: pass a URL or a local file path and get back a ProcessingResult once the job has finished.
```python
result = client.ocr(
    "https://example.com/doc.pdf",
    language="en",
    output_format="markdown",
    enhance_quality=True,
    poll_interval=2.0,  # seconds between status checks
    poll_timeout=300.0,  # max total wait in seconds
)
print(result.markdown)
print(result.page_count)
```
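The `ocr()` options suggest it combines the lower-level calls documented below: submit with `process_url()` or `process_file()`, then block in `wait_for_completion()`. A minimal sketch of that composition, assuming http(s) strings are treated as URLs and everything else as a local path, and that extra options such as `language` are forwarded to the submit call. `ocr_sketch`, `is_url`, and `FakeClient` are hypothetical; the fake client stands in for `MarkdownBridge` so the example runs offline.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    # Assumption: only http(s) sources are passed to the API as URLs.
    return urlparse(source).scheme.lower() in ("http", "https")


def ocr_sketch(client, source: str, poll_interval: float = 2.0,
               poll_timeout: float = 300.0, **opts):
    """Hypothetical composition of the documented lower-level methods."""
    # Step 1: submit the job, uploading first if the source is a local file.
    if is_url(source):
        proc = client.process_url(source, **opts)
    else:
        proc = client.process_file(source, **opts)
    # Step 2: block until the job reaches a terminal state.
    return client.wait_for_completion(
        proc.process_id, poll_interval=poll_interval, poll_timeout=poll_timeout
    )


class FakeClient:
    """Records which methods were called; stands in for MarkdownBridge."""

    def __init__(self):
        self.calls = []

    def process_url(self, url, **opts):
        self.calls.append(("process_url", url))
        return SimpleNamespace(process_id="job-1")

    def process_file(self, path, **opts):
        self.calls.append(("process_file", path))
        return SimpleNamespace(process_id="job-2")

    def wait_for_completion(self, process_id, **opts):
        self.calls.append(("wait_for_completion", process_id))
        return SimpleNamespace(markdown=f"result of {process_id}")


fake = FakeClient()
print(ocr_sketch(fake, "./receipt.png", language="en").markdown)  # result of job-2
```

Reaching for `process_url()` plus `wait_for_completion()` directly is useful when you want to submit many jobs first and wait on them later.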
client.process_url(file_url, **opts)
Submit a URL for processing without waiting for completion.
```python
proc = client.process_url("https://example.com/doc.pdf")
print(proc.process_id)  # use with get_status() / wait_for_completion()
```
client.process_file(file_path, **opts)
Upload a local file and submit it for processing.
```python
proc = client.process_file("./invoice.pdf")
print(proc.process_id)
```
client.upload_file(file_path)
Upload a file without processing it.
```python
upload = client.upload_file("./photo.png")
print(upload.document_id)
```
client.get_status(process_id)
Check the current status of a processing job.
```python
status = client.get_status("uuid-here")
print(status.status)  # queued | processing | completed | failed
print(status.progress)  # 0–100
print(status.stage)  # queued | download | ocr | llm_improvement | completed | failed
```
client.wait_for_completion(process_id, **opts)
Poll until the job completes or fails.
```python
result = client.wait_for_completion(
    "uuid-here",
    poll_interval=2.0,
    poll_timeout=300.0,
    on_status_change=lambda s: print(f"Status: {s.status} ({s.stage})"),
)
```
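A minimal sketch of the polling loop that `poll_interval`, `poll_timeout`, and `on_status_change` suggest, assuming the job is finished once `status` is `completed` or `failed` and that the callback fires only when `(status, stage)` changes. `poll_until_done` and `Status` are hypothetical stand-ins; the real SDK raises its own `ProcessingError` and `TimeoutError`, whereas this sketch uses the builtins `RuntimeError` and `TimeoutError`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Status:
    # Hypothetical subset of the documented ProcessingStatus fields.
    status: str  # queued | processing | completed | failed
    stage: str
    progress: int


def poll_until_done(
    get_status: Callable[[], Status],
    poll_interval: float = 2.0,
    poll_timeout: float = 300.0,
    on_status_change: Optional[Callable[[Status], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Status:
    """Poll get_status() until the job completes, fails, or times out."""
    deadline = clock() + poll_timeout
    last: Optional[Tuple[str, str]] = None
    while True:
        current = get_status()
        key = (current.status, current.stage)
        # Fire the callback only when status or stage actually changes.
        if on_status_change is not None and key != last:
            on_status_change(current)
        last = key
        if current.status == "completed":
            return current
        if current.status == "failed":
            # The SDK documents ProcessingError for this case.
            raise RuntimeError(f"job failed at stage {current.stage!r}")
        if clock() >= deadline:
            # The SDK documents its own TimeoutError for this case.
            raise TimeoutError(f"job not finished after {poll_timeout}s")
        sleep(poll_interval)


states = iter([
    Status("queued", "queued", 0),
    Status("processing", "ocr", 40),
    Status("processing", "ocr", 60),
    Status("completed", "completed", 100),
])
seen = []
final = poll_until_done(lambda: next(states), poll_interval=0, on_status_change=seen.append)
print([s.stage for s in seen])  # ['queued', 'ocr', 'completed']
```

Using a monotonic clock for the deadline keeps the timeout correct even if the wall clock is adjusted while polling.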
client.list_results(**filters)
Fetch a single page of results, optionally filtered by status.
```python
page = client.list_results(limit=20, offset=0, status="completed")
for item in page.data:
    print(item.file_name, item.status)
print(f"Total: {page.pagination.total}")
```
client.iter_results(**filters)
Auto-paginating iterator over all results.
```python
for item in client.iter_results(status="completed"):
    print(item.file_name)
```
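`iter_results()` hides the `limit`/`offset` bookkeeping of `list_results()`. A minimal sketch of that loop, assuming it follows the `has_more` and `next_offset` fields of `Pagination` listed under Data Types. `iterate_pages` and `fake_list_results` are hypothetical, and items are plain strings here rather than `ResultItem` objects.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class Pagination:
    total: int
    limit: int
    offset: int
    has_more: bool
    next_offset: Optional[int]


@dataclass(frozen=True)
class ResultsPage:
    data: List[str]  # simplified: the SDK returns ResultItem objects
    pagination: Pagination


def iterate_pages(
    fetch_page: Callable[[int, int], ResultsPage], limit: int = 20
) -> Iterator[str]:
    """Yield every item, requesting pages until the server says stop."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page.data
        # Stop when there are no more pages or no offset to continue from.
        if not page.pagination.has_more or page.pagination.next_offset is None:
            return
        offset = page.pagination.next_offset


ITEMS = [f"file-{i}.pdf" for i in range(5)]


def fake_list_results(limit: int, offset: int) -> ResultsPage:
    """Hypothetical in-memory stand-in for client.list_results()."""
    chunk = ITEMS[offset:offset + limit]
    end = offset + len(chunk)
    has_more = end < len(ITEMS)
    return ResultsPage(
        chunk, Pagination(len(ITEMS), limit, offset, has_more, end if has_more else None)
    )


print(list(iterate_pages(fake_list_results, limit=2)))
```

Because it is a generator, pages are fetched lazily: breaking out of the loop early stops further requests.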
client.get_result(result_id)
Fetch a specific result by ID.
```python
result = client.get_result("uuid-here")
print(result.result.markdown)
```
client.info()
Get API version and status.
```python
info = client.info()
print(info.version, info.status)
```
Async Usage
Every method has an async equivalent via AsyncMarkdownBridge:
```python
import asyncio

from markdownbridge import AsyncMarkdownBridge


async def main():
    async with AsyncMarkdownBridge(api_key="ocrb_prd_xxx") as client:
        result = await client.ocr("https://example.com/invoice.pdf")
        print(result.markdown)

        # Auto-paginating async iteration
        async for item in client.iter_results():
            print(item.file_name)


asyncio.run(main())
```
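Because the async client's methods are coroutines, several documents can be converted concurrently. A minimal sketch, assuming you want to cap in-flight jobs with an `asyncio.Semaphore` (the default limit of 3 is arbitrary, not a documented API quota). `convert_all` is hypothetical, and `fake_ocr` stands in for `client.ocr` so the example runs without an API key; with the real client you would call `await convert_all(client.ocr, sources)` inside the `async with` block.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def convert_all(
    ocr: Callable[[str], Awaitable[T]],
    sources: List[str],
    max_concurrency: int = 3,
) -> List[T]:
    """Run ocr() over all sources with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(source: str) -> T:
        # Only max_concurrency coroutines can hold the semaphore at once.
        async with semaphore:
            return await ocr(source)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(one(s) for s in sources))


async def fake_ocr(source: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"markdown for {source}"


results = asyncio.run(convert_all(fake_ocr, ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]))
print(results[0])  # markdown for a.pdf
```

By default `gather()` propagates the first exception; pass `return_exceptions=True` if one failed document should not hide the others' results.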
Error Handling
All exceptions inherit from MarkdownBridgeError and include status_code, error_code, and correlation_id:
```python
from markdownbridge import (
    AuthenticationError,
    MarkdownBridge,
    MarkdownBridgeError,
    RateLimitError,
)

client = MarkdownBridge(api_key="ocrb_prd_xxx")

try:
    result = client.ocr("https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except MarkdownBridgeError as e:
    # Catch-all for every other SDK error; more specific subclasses come first.
    print(f"API error {e.status_code}: {e}")
```
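`RateLimitError` exposes `retry_after`, which makes a simple wait-and-retry loop possible. The README does not say whether the client retries 429 responses on its own (`max_retries` is documented only for 5xx errors), so this is a minimal sketch assuming `retry_after` is a number of seconds or `None`, falling back to a fixed delay in the latter case. The exception class is a local stand-in so the example runs without the SDK, and `call_with_rate_limit_retry` is hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for markdownbridge.RateLimitError (assumed shape)."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after


def call_with_rate_limit_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimitError, waiting retry_after seconds if given."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        try:
            return call()
        except RateLimitError as e:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the original error
            # Prefer the server's hint; fall back to a fixed delay.
            sleep(e.retry_after if e.retry_after is not None else default_delay)
            attempt += 1


outcomes = iter([RateLimitError("slow down", retry_after=2.5), "ok"])


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits = []
print(call_with_rate_limit_retry(flaky, sleep=waits.append))  # ok
print(waits)  # [2.5]
```

With the real SDK you would import `RateLimitError` from `markdownbridge` instead of defining it, and pass something like `lambda: client.ocr(url)` as `call`.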
Exception Hierarchy
| Exception | HTTP Status | When |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key |
| ValidationError | 400/422 | Invalid request parameters |
| NotFoundError | 404 | Resource not found |
| RateLimitError | 429 | Too many requests |
| InsufficientCreditsError | 402 | Account has no credits |
| ServerError | 5xx | Server-side failure |
| ProcessingError | n/a | OCR job failed |
| FileUploadError | n/a | Upload failed |
| TimeoutError | n/a | Polling exceeded timeout |
Data Types
All response types are frozen dataclasses:
- ProcessResponse: process_id, status, file_id, stage
- ProcessingStatus: process_id, status, progress, stage, result, error
- ProcessingResult: text, markdown, json, page_count, processing_time
- UploadResponse: file_key, public_url, document_id
- ResultItem: id, process_id, file_name, status, result
- ResultsPage: data, pagination
- Pagination: total, limit, offset, has_more, next_offset
- ApiInfo: version, status, endpoints
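Frozen dataclasses reject attribute assignment, so a modified copy has to be built with `dataclasses.replace`. A minimal sketch using a local `ProcessingResult` that mirrors the documented field names; the field types are assumptions, since the README lists names only, and the sample values are made up.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any


@dataclass(frozen=True)
class ProcessingResult:
    # Field names from the Data Types list; the types are assumptions.
    text: str
    markdown: str
    json: Any
    page_count: int
    processing_time: float


result = ProcessingResult(
    text="Total: 42.00",
    markdown="**Total:** 42.00",
    json=None,
    page_count=1,
    processing_time=1.8,
)

try:
    result.markdown = "edited"  # frozen: raises FrozenInstanceError
except FrozenInstanceError:
    print("ProcessingResult is read-only")

# replace() returns a new instance; the original is left untouched.
titled = replace(result, markdown="# Receipt\n\n" + result.markdown)
print(titled.markdown.startswith("# Receipt"), result.markdown)
```

Immutability means a result object can be shared between threads or cached without defensive copies.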
License
MIT
Download files
File details
Details for the file markdownbridge-0.1.0.tar.gz.
File metadata
- Download URL: markdownbridge-0.1.0.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f912bc8a5688b768d03977594fd83c88799bc4cc9b505a29c61b568a75ddcb6 |
| MD5 | 662d77c0945ea26416f2bc3f9ec9169c |
| BLAKE2b-256 | 04d23e6fe61681fa0a469eb71013b446cf419fbaf6ad3242dc210293d40b4c1f |
File details
Details for the file markdownbridge-0.1.0-py3-none-any.whl.
File metadata
- Download URL: markdownbridge-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4b62ae038080c1a2e0a38e74d0fcbde2539b9f3db1937c667048cc22667d8dc |
| MD5 | de6460cde512d2f80ea50bde9b12b4e7 |
| BLAKE2b-256 | 006900bdb22eb5f8900faae4a69fba3c2fcc995c62a3e8c90072b418a42c23d5 |
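The digests above let you check a downloaded archive before installing it (pip can also enforce this itself in `--require-hashes` mode). A minimal sketch using only the standard library; `sha256_of` and `verify` are hypothetical helper names, and the expected value is whichever SHA256 digest is listed for the file you downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalise the expected value too.
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage against the published digest for the source distribution:
# verify(Path("markdownbridge-0.1.0.tar.gz"),
#        "6f912bc8a5688b768d03977594fd83c88799bc4cc9b505a29c61b568a75ddcb6")
```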