scanforge
Official Python SDK for the scan-forge OCR service, an on-premises, AI-powered drop-in replacement for ABBYY Recognition Server.
Installation
pip install scanforge
Requires Python 3.11+.
Quick Start
from scanforge import Client
client = Client(api_key="sf_live_...")
# Extract text from a PDF
result = client.ocr("faktura.pdf")
print(result.text)
# Detect barcodes
barcodes = client.barcodes("dokument.pdf")
for b in barcodes:
    print(b.value, b.type)
# Convert a scan to DOCX
client.convert("skan.png", output="wynik.docx")
API Reference
Client(api_key, base_url=...)
Creates a new client instance.
| Parameter | Type | Required | Default |
|---|---|---|---|
| api_key | str | Yes | (none) |
| base_url | str | No | https://api.scanforge.tech |
client = Client(
    api_key="sf_live_...",
    base_url="https://ocr.your-server.com",  # for self-hosted deployments
)
client.ocr(file_path, *, language=None, page_number=None, separate_pages=False)
Extracts text from a PDF or image file.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path | str | (required) | Path to input file (PDF, PNG, JPG, TIFF) |
| language | str \| None | None | OCR language code; auto-detected server-side when omitted |
| page_number | int \| None | None | Process a single page (0-indexed) |
| separate_pages | bool | False | Separate pages in text with a form-feed character |
Returns OcrResult
@dataclass
class OcrResult:
    text: str
    pages: int
    metadata: dict[str, Any]
Example
result = client.ocr("invoice.pdf", language="eng")
print(result.text) # extracted text
print(result.pages) # number of pages processed
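When separate_pages=True is used, the combined text can be split back into pages. The sketch below assumes pages are joined with a single form-feed character ("\f") and uses a hard-coded string in place of result.text, so it runs without a server; split_pages is a hypothetical helper, not part of the SDK.

```python
def split_pages(text: str) -> list[str]:
    """Split OCR text into per-page strings on the form-feed character."""
    # Assumption: one "\f" between consecutive pages, none after the last page.
    return text.split("\f")


# Stand-in for result.text from client.ocr(..., separate_pages=True).
sample_text = "Invoice 001\fTerms and conditions\fSignature page"

pages = split_pages(sample_text)
for number, page in enumerate(pages, start=1):
    print(f"page {number}: {page}")
```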
client.barcodes(file_path, *, page_number=0)
Detects and decodes barcodes (1D and 2D) in a document.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path | str | (required) | Path to input file |
| page_number | int | 0 | Page to scan (0 = all pages) |
Returns list[BarcodeResult]
@dataclass
class BarcodeResult:
    value: str  # decoded barcode content
    type: str  # symbology e.g. 'EAN-13', 'QR-Code', 'CODE-128'
    page: int  # 1-indexed page number
Example
barcodes = client.barcodes("shipment.pdf")
for b in barcodes:
    print(b.value, b.type, b.page)
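Because each result carries its page number, results from a multi-page scan can be grouped per page. This is a minimal sketch: the BarcodeResult class here is a local stand-in that mirrors the documented fields, the sample values are made up, and group_by_page is a hypothetical helper rather than an SDK function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BarcodeResult:
    """Local stand-in with the documented fields (not the SDK class)."""
    value: str
    type: str
    page: int


def group_by_page(barcodes):
    """Return a dict mapping page number to the barcodes found on that page."""
    grouped = defaultdict(list)
    for b in barcodes:
        grouped[b.page].append(b)
    return dict(grouped)


# Made-up sample data standing in for client.barcodes("shipment.pdf").
sample = [
    BarcodeResult("5901234123457", "EAN-13", 1),
    BarcodeResult("https://example.com", "QR-Code", 2),
    BarcodeResult("ABC-123", "CODE-128", 1),
]

by_page = group_by_page(sample)
for page in sorted(by_page):
    print(page, [b.value for b in by_page[page]])
```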
client.convert(file_path, *, output)
Converts a PDF or image to an editable document format. The output format is determined by the extension of output (.docx → DOCX, .xlsx → XLSX).
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path | str | (required) | Path to input file |
| output | str | (required) | Destination path (.docx or .xlsx) |
Returns None — the converted file is downloaded and written to output locally.
Example
# Convert to Word document
client.convert("scan.pdf", output="result.docx")
# Convert to Excel spreadsheet (preserves table structure)
client.convert("table.pdf", output="data.xlsx")
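Since the output format follows from the file extension, it can be worth checking the extension locally before uploading anything. The helper below is a hypothetical sketch, not part of the SDK; treating the extension case-insensitively is an assumption about how the service behaves.

```python
from pathlib import Path

# The two extensions documented for client.convert().
SUPPORTED_OUTPUTS = {".docx": "DOCX", ".xlsx": "XLSX"}


def output_format(output: str) -> str:
    """Return the format name implied by the output path, or raise ValueError."""
    # Assumption: the extension is matched case-insensitively.
    suffix = Path(output).suffix.lower()
    if suffix not in SUPPORTED_OUTPUTS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return SUPPORTED_OUTPUTS[suffix]


print(output_format("result.docx"))
print(output_format("reports/data.xlsx"))
```

Calling output_format(output) before client.convert(...) turns a typo such as "result.doc" into a local ValueError instead of a failed request.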
Error Handling
All methods raise ScanForgeError on failure.
from scanforge import Client, ScanForgeError
client = Client(api_key="sf_live_...")
try:
    result = client.ocr("document.pdf")
except ScanForgeError as e:
    print(e)  # human-readable message
    print(e.status_code)  # HTTP status code (int or None for network errors)
    print(e.body)  # raw response body from the server
| Error condition | status_code |
|---|---|
| Invalid API key | 401 |
| Unsupported file type | 422 |
| Server error | 5xx |
| Network / connection failure | None |
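The status codes above suggest one possible retry policy: retry network failures and 5xx responses, and fail immediately on 401 and 422. The sketch below is an assumption about what is worth retrying, not documented SDK behavior. ScanForgeError is redefined locally as a stand-in so the example runs on its own, and call_with_retries and flaky_ocr are hypothetical names.

```python
import time


class ScanForgeError(Exception):
    """Local stand-in with the documented attributes (not the SDK class)."""

    def __init__(self, message, status_code=None, body=None):
        super().__init__(message)
        self.status_code = status_code
        self.body = body


def is_retryable(error):
    # Assumption: network failures (status_code is None) and 5xx responses
    # may be transient; 401 and 422 fail again until the request is fixed.
    return error.status_code is None or 500 <= error.status_code < 600


def call_with_retries(func, attempts=3, delay=0.5):
    """Call func(), retrying retryable failures with a linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ScanForgeError as error:
            if attempt == attempts or not is_retryable(error):
                raise
            time.sleep(delay * attempt)


# Demo: a fake operation that fails twice with 503, then succeeds.
calls = {"count": 0}


def flaky_ocr():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ScanForgeError("service unavailable", status_code=503)
    return "recognized text"


print(call_with_retries(flaky_ocr, delay=0))
```

With the real SDK, the wrapped call could be something like lambda: client.ocr("document.pdf").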
Configuration
Self-hosted deployment
Point the client at your own scan-forge server:
client = Client(
    api_key="sf_live_...",
    base_url="https://ocr.internal.example.com",
)
Environment variables (recommended)
import os
from scanforge import Client
client = Client(
    api_key=os.environ["SCANFORGE_API_KEY"],
    base_url=os.environ.get("SCANFORGE_URL", "http://localhost:8000"),
)
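A bare os.environ[...] lookup raises a KeyError that does not say what to fix. One option is a small loader that fails with a clearer message. This is a hypothetical helper, not part of the SDK; it reuses the two variable names shown above, and stripping a trailing slash from the URL is an assumption about what the client expects.

```python
import os


def load_settings(environ=os.environ):
    """Read scan-forge settings from a mapping (defaults to os.environ)."""
    api_key = environ.get("SCANFORGE_API_KEY")
    if not api_key:
        raise RuntimeError("SCANFORGE_API_KEY is not set")
    # Assumption: a trailing slash on the base URL is not wanted.
    base_url = environ.get("SCANFORGE_URL", "http://localhost:8000").rstrip("/")
    return {"api_key": api_key, "base_url": base_url}


# Demo with an explicit mapping so the example does not depend on the shell.
settings = load_settings({
    "SCANFORGE_API_KEY": "sf_live_example",
    "SCANFORGE_URL": "https://ocr.internal.example.com/",
})
print(settings["base_url"])
```

The resulting dict can then be passed on as Client(**settings).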
Requirements
- Python 3.11+
- A running scan-forge server — see deployment docs
License
MIT © Moonforge