markdownbridge

Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown.

Installation

pip install markdownbridge

Quick Start

from markdownbridge import MarkdownBridge

client = MarkdownBridge(api_key="ocrb_prd_xxx")

# One-liner: URL → Markdown
result = client.ocr("https://example.com/invoice.pdf")
print(result.markdown)

# One-liner: local file → Markdown
result = client.ocr("./receipt.png")
print(result.markdown)

Authentication

Pass your API key directly or set the MARKDOWNBRIDGE_API_KEY environment variable:

export MARKDOWNBRIDGE_API_KEY="ocrb_prd_xxx"

client = MarkdownBridge()  # reads from env
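
The two ways of supplying a key can be pictured with a small stand-in. This is a minimal sketch, not the SDK's code: `resolve_api_key` is a hypothetical helper, and the assumption that an explicit argument wins over the environment variable is not confirmed by the source.

```python
import os

ENV_VAR = "MARKDOWNBRIDGE_API_KEY"  # variable name taken from the text above


def resolve_api_key(explicit=None, environ=None):
    """Hypothetical helper: prefer an explicit key, else fall back to the env var."""
    env = os.environ if environ is None else environ
    key = explicit or env.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key: pass api_key=... or set {ENV_VAR}")
    return key


# A plain dict stands in for the process environment so the demo is repeatable.
print(resolve_api_key("ocrb_prd_xxx", environ={}))
print(resolve_api_key(environ={ENV_VAR: "ocrb_prd_from_env"}))
```

Keeping the key in the environment means it never has to appear in source control.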

Client Options

client = MarkdownBridge(
    api_key="ocrb_prd_xxx",                        # or env MARKDOWNBRIDGE_API_KEY
    base_url="https://api.markdownbridge.com",      # default
    timeout=30.0,                                    # request timeout in seconds
    max_retries=3,                                   # retry 5xx errors with backoff
)
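
To illustrate what "retry 5xx errors with backoff" can mean, here is a minimal sketch using exponential backoff. The source does not say which backoff schedule the SDK uses, so the doubling delay, the `base_delay` parameter, and the names `call_with_retries` and `FakeServerError` are all assumptions made for illustration.

```python
import time


class FakeServerError(Exception):
    """Stand-in for a 5xx response; not the SDK's ServerError."""


def call_with_retries(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on FakeServerError retry up to max_retries times,
    doubling the delay after each failed attempt."""
    attempt = 0
    while True:
        try:
            return fn()
        except FakeServerError:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeServerError("503")
    return "ok"


print(call_with_retries(flaky, sleep=delays.append), delays)
```

With `max_retries=3` a request is attempted at most four times in this sketch: the initial call plus three retries.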

API Reference

client.ocr(source, **opts)

Convenience method: pass a URL or a local file path and get back a ProcessingResult. The poll_interval and poll_timeout options control how it waits for the job to finish.

result = client.ocr(
    "https://example.com/doc.pdf",
    language="en",
    output_format="markdown",
    enhance_quality=True,
    poll_interval=2.0,     # seconds between status checks
    poll_timeout=300.0,    # max wait time
)
print(result.markdown)
print(result.page_count)
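
Because `client.ocr` accepts either a URL or a file path, something has to tell the two apart. The check below is one plausible way to do that, offered as a sketch only: `looks_like_url` is a hypothetical helper, and the SDK's actual rule is not documented in the source.

```python
from urllib.parse import urlparse


def looks_like_url(source):
    """Hypothetical check: treat http(s) strings as URLs, anything else as a path."""
    parsed = urlparse(str(source))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for src in ("https://example.com/doc.pdf", "./receipt.png", "C:\\scans\\a.pdf"):
    print(src, "->", "url" if looks_like_url(src) else "file")
```

Requiring both an http(s) scheme and a host keeps Windows drive letters such as `C:` from being mistaken for a URL scheme.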

client.process_url(file_url, **opts)

Submit a URL for processing without waiting for completion.

proc = client.process_url("https://example.com/doc.pdf")
print(proc.process_id)  # use with get_status() / wait_for_completion()

client.process_file(file_path, **opts)

Upload a local file and submit it for processing.

proc = client.process_file("./invoice.pdf")
print(proc.process_id)

client.upload_file(file_path)

Upload a file without processing it.

upload = client.upload_file("./photo.png")
print(upload.document_id)

client.get_status(process_id)

Check the current status of a processing job.

status = client.get_status("uuid-here")
print(status.status)   # queued | processing | completed | failed
print(status.progress)  # 0–100
print(status.stage)     # queued | download | ocr | llm_improvement | completed | failed

client.wait_for_completion(process_id, **opts)

Poll until the job completes or fails.

result = client.wait_for_completion(
    "uuid-here",
    poll_interval=2.0,
    poll_timeout=300.0,
    on_status_change=lambda s: print(f"Status: {s.status} ({s.stage})"),
)
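
The polling behaviour can be sketched without the SDK. The loop below is an illustration under stated assumptions: `wait_until_done` and `FakeStatus` are hypothetical, the status source is a scripted stub, and the sketch raises Python's built-in `TimeoutError` and simply returns a failed status, whereas the real client presumably raises its own `TimeoutError` and `ProcessingError` as listed in the exception table.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeStatus:
    """Stand-in with the status/stage fields named in the text."""
    status: str
    stage: str


def wait_until_done(get_status, poll_interval=2.0, poll_timeout=300.0,
                    on_status_change=None, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll get_status() until status is 'completed' or 'failed'."""
    deadline = clock() + poll_timeout
    last = None
    while True:
        current = get_status()
        if on_status_change and current != last:
            on_status_change(current)  # fire only when something changed
        last = current
        if current.status in ("completed", "failed"):
            return current
        if clock() >= deadline:
            raise TimeoutError("polling exceeded poll_timeout")
        sleep(poll_interval)


# Scripted statuses replace real API calls; sleeping is skipped for the demo.
script = iter([
    FakeStatus("queued", "queued"),
    FakeStatus("processing", "ocr"),
    FakeStatus("processing", "ocr"),
    FakeStatus("completed", "completed"),
])
seen = []
final = wait_until_done(lambda: next(script), on_status_change=seen.append,
                        sleep=lambda _: None)
print(final.status, [s.stage for s in seen])
```

The callback fires three times here, not four, because the repeated `processing`/`ocr` status is not a change.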

client.list_results(**filters)

Fetch paginated results.

page = client.list_results(limit=20, offset=0, status="completed")
for item in page.data:
    print(item.file_name, item.status)
print(f"Total: {page.pagination.total}")

client.iter_results(**filters)

Auto-paginating iterator over all results.

for item in client.iter_results(status="completed"):
    print(item.file_name)
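
An auto-paginating iterator can be built on top of a page-fetching call by following `next_offset` until `has_more` is false. The sketch below shows that pattern against an in-memory stub. `fake_list_results` and `iter_all` are hypothetical names, the field types are assumptions (the source lists field names only), and the rows are made-up data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pagination:
    total: int
    limit: int
    offset: int
    has_more: bool
    next_offset: Optional[int]


@dataclass(frozen=True)
class ResultsPage:
    data: list
    pagination: Pagination


FAKE_ROWS = [f"file-{i}.pdf" for i in range(5)]  # made-up data


def fake_list_results(limit=20, offset=0):
    """Stand-in for client.list_results(), serving slices of FAKE_ROWS."""
    chunk = FAKE_ROWS[offset:offset + limit]
    end = offset + len(chunk)
    more = end < len(FAKE_ROWS)
    return ResultsPage(chunk, Pagination(len(FAKE_ROWS), limit, offset, more,
                                         end if more else None))


def iter_all(list_results, limit=20):
    """Yield every item, following next_offset until has_more is False."""
    offset = 0
    while True:
        page = list_results(limit=limit, offset=offset)
        yield from page.data
        if not page.pagination.has_more:
            break
        offset = page.pagination.next_offset


print(list(iter_all(fake_list_results, limit=2)))
```

Because `iter_all` is a generator, pages are fetched lazily: breaking out of the loop early avoids requesting the remaining pages.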

client.get_result(result_id)

Fetch a specific result by ID.

item = client.get_result("uuid-here")
print(item.result.markdown)  # the processed output lives on the item's result field

client.info()

Get API version and status.

info = client.info()
print(info.version, info.status)

Async Usage

Every method has an async equivalent via AsyncMarkdownBridge:

import asyncio
from markdownbridge import AsyncMarkdownBridge

async def main():
    async with AsyncMarkdownBridge(api_key="ocrb_prd_xxx") as client:
        result = await client.ocr("https://example.com/invoice.pdf")
        print(result.markdown)

        # Auto-paginating async iteration
        async for item in client.iter_results():
            print(item.file_name)

asyncio.run(main())
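
One reason to reach for the async client is to process several documents concurrently. The sketch below shows the general asyncio pattern with a stub coroutine in place of `await client.ocr(...)`. `fake_ocr`, `ocr_many`, and the `limit` parameter are hypothetical, and the returned strings are made up.

```python
import asyncio


async def fake_ocr(source):
    """Stand-in for `await client.ocr(source)`; returns made-up Markdown."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"# Markdown for {source}"


async def ocr_many(sources, limit=2):
    """Run fake_ocr over all sources with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)

    async def one(src):
        async with gate:
            return await fake_ocr(src)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(one(s) for s in sources))


results = asyncio.run(ocr_many(["a.pdf", "b.png", "c.pdf"]))
print(results)
```

Capping concurrency with a semaphore is a simple way to stay under a rate limit while still overlapping requests.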

Error Handling

All exceptions inherit from MarkdownBridgeError and include status_code, error_code, and correlation_id:

from markdownbridge import (
    AuthenticationError,
    MarkdownBridge,
    MarkdownBridgeError,  # base class, needed for the final except clause
    RateLimitError,
)

client = MarkdownBridge(api_key="ocrb_prd_xxx")

try:
    result = client.ocr("https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except MarkdownBridgeError as e:
    print(f"API error {e.status_code}: {e}")

Exception Hierarchy

Exception                  HTTP Status   When
AuthenticationError        401           Invalid or missing API key
ValidationError            400/422       Invalid request parameters
NotFoundError              404           Resource not found
RateLimitError             429           Too many requests
InsufficientCreditsError   402           Account has no credits
ServerError                5xx           Server-side failure
ProcessingError            -             OCR job failed
FileUploadError            -             Upload failed
TimeoutError               -             Polling exceeded timeout
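
The HTTP rows of this table can be read as a lookup from status code to exception class. The sketch below defines local stand-in classes with the same names (it does not import the SDK), and `error_for_status` is a hypothetical helper. How the real client maps responses to exceptions is not shown in the source.

```python
class MarkdownBridgeError(Exception):
    """Local stand-in for the SDK base class, with the documented attributes."""

    def __init__(self, message, status_code=None, error_code=None,
                 correlation_id=None):
        super().__init__(message)
        self.status_code = status_code
        self.error_code = error_code
        self.correlation_id = correlation_id


class AuthenticationError(MarkdownBridgeError): pass
class ValidationError(MarkdownBridgeError): pass
class NotFoundError(MarkdownBridgeError): pass
class RateLimitError(MarkdownBridgeError): pass
class InsufficientCreditsError(MarkdownBridgeError): pass
class ServerError(MarkdownBridgeError): pass


# Status codes copied from the table above.
STATUS_TO_ERROR = {
    400: ValidationError,
    401: AuthenticationError,
    402: InsufficientCreditsError,
    404: NotFoundError,
    422: ValidationError,
    429: RateLimitError,
}


def error_for_status(status_code):
    """Hypothetical helper: pick the exception class for an HTTP status."""
    if status_code in STATUS_TO_ERROR:
        return STATUS_TO_ERROR[status_code]
    if 500 <= status_code <= 599:
        return ServerError
    return MarkdownBridgeError  # anything unlisted falls back to the base class


err = error_for_status(429)("Too many requests", status_code=429)
print(type(err).__name__, err.status_code)
```

Since every class derives from the base, a single `except MarkdownBridgeError` placed last catches anything the more specific handlers missed.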

Data Types

All response types are frozen dataclasses:

  • ProcessResponse — process_id, status, file_id, stage
  • ProcessingStatus — process_id, status, progress, stage, result, error
  • ProcessingResult — text, markdown, json, page_count, processing_time
  • UploadResponse — file_key, public_url, document_id
  • ResultItem — id, process_id, file_name, status, result
  • ResultsPage — data, pagination
  • Pagination — total, limit, offset, has_more, next_offset
  • ApiInfo — version, status, endpoints
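
In practice, "frozen" means that response objects cannot be modified after creation. The sketch below demonstrates this with a local stand-in for `ApiInfo`; the field names come from the list above, while the field types and sample values are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ApiInfo:
    """Stand-in with the fields listed above; the field types are assumptions."""
    version: str
    status: str
    endpoints: tuple


info = ApiInfo(version="0.0.0", status="ok", endpoints=("/example",))

try:
    info.status = "down"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("ApiInfo is read-only")

updated = replace(info, status="down")  # build a modified copy instead
print(info.status, updated.status)
```

`dataclasses.replace` is the standard way to derive a changed copy while leaving the original untouched.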

License

MIT
