Official Python SDK for the scan-forge OCR service


scanforge

Official Python SDK for the scan-forge OCR service — an on-premise, AI-powered drop-in replacement for ABBYY Recognition Server.

Installation

pip install scanforge

Requires Python 3.11+.

Quick Start

from scanforge import Client

client = Client(api_key="sf_live_...")

# Extract text from a PDF
result = client.ocr("invoice.pdf")
print(result.text)

# Detect barcodes
barcodes = client.barcodes("document.pdf")
for b in barcodes:
    print(b.value, b.type)

# Convert a scan to DOCX
client.convert("scan.png", output="result.docx")

API Reference

`Client(api_key, base_url=...)`

Creates a new client instance.

Parameter	Type	Required	Default
`api_key`	`str`	Yes	—
`base_url`	`str`	No	`https://api.scanforge.tech`

client = Client(
    api_key="sf_live_...",
    base_url="https://ocr.your-server.com",  # for self-hosted deployments
)

`client.ocr(file_path, *, language=None, page_number=None, page_range=None, separate_pages=False, poll_interval=1.5, timeout=600)`

Extracts text from a PDF or image file.

Internally, this submits an asynchronous OCR job and polls it until completion, so the call blocks until the result is ready or `timeout` seconds elapse. For full control over the job lifecycle, use the low-level `submit_ocr` and `get_ocr_job` methods instead.

Parameters

Parameter	Type	Default	Description
`file_path`	`str`	—	Path to input file (PDF, PNG, JPG, TIFF)
`language`	`str | None`	`None`	OCR language code; auto-detected server-side when omitted
`page_number`	`int | None`	`None`	Process a single page (0-indexed)
`page_range`	`str | None`	`None`	1-indexed inclusive page range, e.g. `"3"` or `"1-5"`. Takes precedence over `page_number`
`separate_pages`	`bool`	`False`	Return each page separated by form-feed in `text`
`poll_interval`	`float`	`1.5`	Seconds between job-status polls
`timeout`	`float`	`600`	Max seconds to wait for the job before raising `ScanForgeError`

Returns `OcrResult`

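from dataclasses import dataclass
from typing import Any


@dataclass
class OcrResult:
    text: str                  # extracted text
    pages: int                 # number of pages processed
    metadata: dict[str, Any]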

Example

result = client.ocr("invoice.pdf", language="eng")
print(result.text)    # extracted text
print(result.pages)   # number of pages processed
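When `separate_pages=True`, the table above says pages in `text` are separated by a form-feed. A minimal sketch of splitting such text back into pages follows. It assumes the separator is the single character `"\f"` (U+000C) with no trailing separator, which the source does not spell out; `split_pages` and the sample string are illustrative and not part of the SDK.

```python
def split_pages(text: str) -> list[str]:
    """Split OCR text on form-feed characters, one entry per page."""
    # Assumption: pages are delimited by "\f" (U+000C), with no trailing separator.
    return text.split("\f")


# Hypothetical text, shaped like what separate_pages=True might return.
sample = "Invoice 001\nTotal: 10.00\fTerms and conditions"
pages = split_pages(sample)
for number, page in enumerate(pages, start=1):
    print(f"page {number}: {len(page)} characters")
```

With a real result this would be called as `split_pages(result.text)`.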

`client.barcodes(file_path, *, page_number=0)`

Detects and decodes barcodes (1D and 2D) in a document.

Parameters

Parameter	Type	Default	Description
`file_path`	`str`	—	Path to input file
`page_number`	`int`	`0`	Page to scan (`0` = all pages)

Returns `list[BarcodeResult]`

from dataclasses import dataclass


@dataclass
class BarcodeResult:
    value: str   # decoded barcode content
    type: str    # symbology e.g. 'EAN-13', 'QR-Code', 'CODE-128'
    page: int    # 1-indexed page number

Example

barcodes = client.barcodes("shipment.pdf")
for b in barcodes:
    print(b.value, b.type, b.page)
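Because each result carries a `page` field, results from a multi-page scan can be grouped per page. The sketch below is one way to do that. `Barcode` is a stand-in dataclass with the same three fields as `BarcodeResult` so the example runs without a server, and `group_by_page` and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Barcode:
    """Stand-in with the same fields as BarcodeResult (not the SDK class)."""
    value: str
    type: str
    page: int


def group_by_page(barcodes):
    """Map each page number to the barcodes found on it, preserving input order."""
    grouped = defaultdict(list)
    for b in barcodes:
        grouped[b.page].append(b)
    return dict(grouped)


# Made-up sample data for illustration only.
found = [
    Barcode("5901234123457", "EAN-13", 1),
    Barcode("https://example.com", "QR-Code", 2),
    Barcode("ABC-123", "CODE-128", 1),
]
for page, items in sorted(group_by_page(found).items()):
    print(page, [b.value for b in items])
```

The same function should work on the list returned by `client.barcodes(...)`, since it only reads the `page` attribute.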

`client.convert(file_path, *, output)`

Converts a PDF or image to an editable document format. The output format is determined by the file extension of `output`: `.docx` produces DOCX and `.xlsx` produces XLSX.

Parameters

Parameter	Type	Default	Description
`file_path`	`str`	—	Path to input file
`output`	`str`	—	Destination path (`.docx` or `.xlsx`)

Returns `None`. The converted file is downloaded and written locally to the path given by `output`.

Example

# Convert to Word document
client.convert("scan.pdf", output="result.docx")

# Convert to Excel spreadsheet (preserves table structure)
client.convert("table.pdf", output="data.xlsx")
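Since the format is chosen by the extension of `output`, it can help to validate the path before uploading anything. The following is a sketch under assumptions: `output_format` is a hypothetical helper, not an SDK function, and the case-insensitive match is a choice made here, not documented behavior.

```python
from pathlib import Path

# Extension-to-format mapping as described for convert().
FORMATS = {".docx": "DOCX", ".xlsx": "XLSX"}


def output_format(output: str) -> str:
    """Return the conversion format implied by the output path's extension."""
    suffix = Path(output).suffix.lower()  # assumption: treat ".DOCX" like ".docx"
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"unsupported output extension {suffix!r}; expected .docx or .xlsx"
        ) from None


for name in ("result.docx", "data.xlsx"):
    print(name, "->", output_format(name))
```

Calling `output_format(path)` before `client.convert(..., output=path)` turns a typo in the extension into a local `ValueError` instead of a failed request.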

Low-level asynchronous API

`ocr()` and `convert()` run on top of the asynchronous OCR backend: they submit a job and poll until it finishes. To drive that lifecycle yourself (for example, to submit many files and poll later, or to integrate with your own task queue), use the two low-level methods directly.

`client.submit_ocr(file_path, *, fmt="TextUnicodeDefaults", language=None, page_number=None, page_range=None, separate_pages=False)`

Uploads the file and enqueues an OCR job. Returns the raw response dict `{"job_id": str, "status": "queued"}`. Pass `fmt="DOCX"` or `fmt="XLSX"` to submit a conversion job.

`client.get_ocr_job(job_id)`

Fetches the current job state. Returns the raw job document:

{
    "job_id": str,
    "status": "queued" | "running" | "succeeded" | "failed",
    "created_at": str,
    "updated_at": str,
    "result": {...},   # present only when status == "succeeded"
    "error": str,      # present only when status == "failed"
}

Example

job = client.submit_ocr("invoice.pdf", page_range="1-5")
print(job["job_id"], job["status"])  # e.g. a1b2c3 queued

# ...poll on your own schedule...
state = client.get_ocr_job(job["job_id"])
if state["status"] == "succeeded":
    print(state["result"]["text"])
elif state["status"] == "failed":
    print("failed:", state["error"])
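The example above checks the job state once. A polling loop built on the documented status values might look like the sketch below. `wait_for_job` is a hypothetical helper, not part of the SDK; it takes the fetch function as an argument so it can be tried without a server, and it raises the built-in `TimeoutError` instead of `ScanForgeError`.

```python
import time
from typing import Any, Callable


def wait_for_job(
    get_job: Callable[[str], dict[str, Any]],
    job_id: str,
    *,
    poll_interval: float = 1.5,
    timeout: float = 600.0,
) -> dict[str, Any]:
    """Poll get_job(job_id) until the job succeeds or fails, or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_job(job_id)
        # "succeeded" and "failed" are the two terminal states in the job document.
        if state["status"] in ("succeeded", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state['status']} after {timeout}s")
        time.sleep(poll_interval)


# Intended use with a real client (not executed here):
#   state = wait_for_job(client.get_ocr_job, job["job_id"])
```

The defaults mirror the `poll_interval` and `timeout` defaults of `client.ocr`; `time.monotonic()` is used so the deadline is unaffected by system clock changes.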

Error Handling

All methods raise `ScanForgeError` on failure.

from scanforge import Client, ScanForgeError

client = Client(api_key="sf_live_...")

try:
    result = client.ocr("document.pdf")
except ScanForgeError as e:
    print(e)              # human-readable message
    print(e.status_code)  # HTTP status code (int or None for network errors)
    print(e.body)         # raw response body from the server

Error condition	`status_code`
Invalid API key	`401`
Unsupported file type	`422`
Server error	`5xx`
Network / connection failure	`None`
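The status codes in the table suggest a simple split between errors worth retrying and errors that will not go away. The sketch below encodes one such policy. It is an assumption, not SDK behavior: `is_retryable` is a hypothetical helper, and treating every `5xx` and every network failure as transient is a choice made here.

```python
from typing import Optional


def is_retryable(status_code: Optional[int]) -> bool:
    """Return True for failures that may succeed on a later attempt."""
    if status_code is None:            # network / connection failure
        return True
    return 500 <= status_code <= 599   # server error


for code in (401, 422, 503, None):
    print(code, "retry" if is_retryable(code) else "give up")
```

Inside an `except ScanForgeError as e:` block this would be called as `is_retryable(e.status_code)`.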

Configuration

Self-hosted deployment

Point the client at your own scan-forge server:

client = Client(
    api_key="sf_live_...",
    base_url="https://ocr.internal.example.com",
)

Environment variables (recommended)

import os
from scanforge import Client

client = Client(
    api_key=os.environ["SCANFORGE_API_KEY"],
    base_url=os.environ.get("SCANFORGE_URL", "http://localhost:8000"),
)
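If the key is missing, `os.environ["SCANFORGE_API_KEY"]` fails with a bare `KeyError`. A small wrapper can give a clearer message. This is a sketch: `client_kwargs_from_env` is a hypothetical helper, and it falls back to the documented `base_url` default, not the `http://localhost:8000` used in the snippet above.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "https://api.scanforge.tech"  # documented default for base_url


def client_kwargs_from_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build keyword arguments for Client(...) from environment variables."""
    api_key = environ.get("SCANFORGE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SCANFORGE_API_KEY is not set")
    base_url = environ.get("SCANFORGE_URL", DEFAULT_BASE_URL)
    return {"api_key": api_key, "base_url": base_url}


# Intended use with the SDK (not executed here):
#   client = Client(**client_kwargs_from_env())
```

Passing the mapping as a parameter keeps the function easy to test with a plain dict.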

Requirements

Python 3.11+
A running scan-forge server — see deployment docs

License

MIT © Moonforge


Release history

1.1.0 (this version), Jun 11, 2026

1.0.0, Apr 27, 2026

Download files


Source Distribution

scanforge-1.1.0.tar.gz (9.0 kB)

Uploaded Jun 11, 2026 Source

Built Distribution


scanforge-1.1.0-py3-none-any.whl (8.2 kB)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file scanforge-1.1.0.tar.gz.

File metadata

Download URL: scanforge-1.1.0.tar.gz
Upload date: Jun 11, 2026
Size: 9.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for scanforge-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b8e01f33fd7e878c97d5595c84a51e64b7d8b4f0ccfcab288abd9f1b61594f0b`
MD5	`00b57aa981ba3c321496c30e6f9707ab`
BLAKE2b-256	`f213fdabced046ce8952badfb4391cf0bb2714efdd7e6e6238db7dbbac8c3354`


File details

Details for the file scanforge-1.1.0-py3-none-any.whl.

File metadata

Download URL: scanforge-1.1.0-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 8.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for scanforge-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f185a00257f8bf74435895bdbcc1be632d2e412fda30aab53917e5c9a2f34b5e`
MD5	`a57759aa018a0a76e156e27cb2d0c8af`
BLAKE2b-256	`7355b1edf209a9ba379960c2f9080495cc00fcd5e7e60f8b6ca86caf33679a7d`
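A downloaded file can be checked against the SHA256 digests listed above using only the standard library. The sketch below is one way to do it; `sha256_of` and `matches` are hypothetical helper names, and the file is assumed to already be on disk.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Intended use (not executed here), with the wheel digest from the table above:
#   matches("scanforge-1.1.0-py3-none-any.whl",
#           "f185a00257f8bf74435895bdbcc1be632d2e412fda30aab53917e5c9a2f34b5e")
```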

