Retab official Python library
Retab Python SDK
Official Python SDK for Retab document extraction.
Installation
```shell
pip install retab
```
The client reads RETAB_API_KEY from the environment by default.
Quick Start
```python
import os

from retab import Retab

client = Retab(api_key=os.environ["RETAB_API_KEY"])

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

result = client.documents.extract(
    json_schema=invoice_schema,
    document="invoice.pdf",
    model="retab-micro",
)

print(result.data)
print(result.text)
print(result.likelihoods)
print(result.extraction_id)
```
client.documents.extract(...) returns a RetabParsedChatCompletion:
- result.data is the parsed structured output
- result.text is the raw JSON string
- result.likelihoods mirrors the extracted structure with confidence signals
- result.extraction_id can be used with the extractions API later
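Because result.likelihoods mirrors the shape of result.data, a small recursive walk can list the fields that deserve a human review. The sketch below assumes the likelihoods are nested dicts and lists whose leaves are floats between 0 and 1; the iter_low_confidence helper and its 0.7 threshold are illustrative choices, not part of the SDK.

```python
from typing import Any, Iterator


def iter_low_confidence(
    likelihoods: Any, threshold: float = 0.7, path: str = ""
) -> Iterator[tuple[str, float]]:
    """Yield (field_path, score) for every numeric leaf below `threshold`.

    Assumption: likelihoods mirrors result.data as nested dicts/lists of floats.
    """
    if isinstance(likelihoods, dict):
        for key, value in likelihoods.items():
            child = f"{path}.{key}" if path else key
            yield from iter_low_confidence(value, threshold, child)
    elif isinstance(likelihoods, list):
        for index, value in enumerate(likelihoods):
            yield from iter_low_confidence(value, threshold, f"{path}[{index}]")
    elif isinstance(likelihoods, (int, float)) and not isinstance(likelihoods, bool):
        if likelihoods < threshold:
            yield path, float(likelihoods)


# Usage with a real result (not executed here):
# for field, score in iter_low_confidence(result.likelihoods):
#     print(f"review {field}: {score:.2f}")
```

Paths use dots for object keys and [i] for list items, so a flagged value such as lines[1].qty points straight back into result.data.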
What extract Accepts
json_schema can be:
- a Python dict
- a path to a JSON schema file

document can be:
- a local file path
- a file-like object
- a URL
- a MIMEData object
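A minimal sketch of the two json_schema forms: the hypothetical save_schema helper writes the Quick Start schema to a JSON file, and the commented calls show passing either the dict or the file path, plus a file-like object as document. Passing the path as a str is an assumption; check the API reference to see whether pathlib.Path objects are accepted too.

```python
import json
import tempfile
from pathlib import Path

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}


def save_schema(schema: dict, path: Path) -> Path:
    """Write a JSON schema to disk so it can be passed to extract() as a path."""
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


schema_path = save_schema(invoice_schema, Path(tempfile.gettempdir()) / "invoice_schema.json")

# Equivalent schema inputs (not executed here):
# client.documents.extract(json_schema=invoice_schema, document="invoice.pdf", model="retab-micro")
# client.documents.extract(json_schema=str(schema_path), document="invoice.pdf", model="retab-micro")
#
# A file-like object as the document:
# with open("invoice.pdf", "rb") as f:
#     client.documents.extract(json_schema=invoice_schema, document=f, model="retab-micro")
```

Keeping schemas in files makes them easy to version alongside the code that consumes the extracted data.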
Useful extraction options:
- n_consensus: run multiple passes and reconcile the results
- image_resolution_dpi: control image rendering quality for vision models
- metadata: attach your own tags for later filtering
- additional_messages: add extra instructions or context after the document content
Async Extraction
```python
import asyncio
import os

from retab import AsyncRetab


async def main() -> None:
    client = AsyncRetab(api_key=os.environ["RETAB_API_KEY"])
    async with client:
        result = await client.documents.extract(
            json_schema={
                "type": "object",
                "properties": {
                    "booking_reference": {"type": "string"},
                    "guest_name": {"type": "string"},
                },
            },
            document="booking-confirmation.pdf",
            model="retab-micro",
        )
        print(result.data)


asyncio.run(main())
```
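AsyncRetab pays off when several documents are processed at once. The sketch below relies only on the awaitable client.documents.extract call shown above; extract_many is a hypothetical helper that bounds concurrency with asyncio.Semaphore and gathers results in input order. The default of 4 concurrent requests is arbitrary, so tune it to your rate limits.

```python
import asyncio
from typing import Any, Sequence


async def extract_many(
    client: Any,
    documents: Sequence[str],
    json_schema: dict,
    model: str = "retab-micro",
    max_concurrency: int = 4,
) -> list[Any]:
    """Extract several documents concurrently with at most `max_concurrency`
    requests in flight. Results are returned in the same order as `documents`."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def extract_one(document: str) -> Any:
        async with semaphore:  # wait for a free slot before calling the API
            return await client.documents.extract(
                json_schema=json_schema, document=document, model=model
            )

    return await asyncio.gather(*(extract_one(doc) for doc in documents))


# Usage with the SDK (not executed here):
# client = AsyncRetab()
# async with client:
#     results = await extract_many(client, ["a.pdf", "b.pdf"], invoice_schema)
```

asyncio.gather preserves input order, so results[i] always belongs to documents[i] even when requests finish out of order.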
Streaming Extraction
extract_stream(...) yields partial RetabParsedChatCompletion objects as the JSON fills in.
```python
from retab import Retab

client = Retab()  # reads RETAB_API_KEY from the environment

with client.documents.extract_stream(
    json_schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
        },
    },
    document="invoice.pdf",
    model="retab-micro",
) as stream:
    for partial in stream:
        print(partial.data)
```
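For async code, use AsyncRetab:
```python
import asyncio

from retab import AsyncRetab


async def main() -> None:
    client = AsyncRetab()
    async with client.documents.extract_stream(
        json_schema=invoice_schema,  # the schema from the Quick Start
        document="invoice.pdf",
        model="retab-micro",
    ) as stream:
        async for partial in stream:
            print(partial.data)


asyncio.run(main())
```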
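Partial results arrive as the JSON is filled in, so a consumer typically reacts to new fields and keeps the last partial as the final answer. consume_stream is a hypothetical helper that assumes partial.data is a dict, or None before any field has been parsed; it works with the sync stream above, and for the async stream the same logic would sit inside an async for loop.

```python
from typing import Any, Iterable, Optional


def consume_stream(stream: Iterable[Any]) -> Optional[dict]:
    """Print each top-level field the first time it appears in a partial
    result, and return the data of the last partial (None if the stream is empty).

    Assumption: partial.data is a dict, or None before anything is parsed.
    """
    seen: set[str] = set()
    last: Optional[dict] = None
    for partial in stream:
        data = partial.data or {}
        for key in data:
            if key not in seen:
                seen.add(key)
                print(f"field arrived: {key}")
        last = partial.data
    return last


# Usage with the SDK (not executed here):
# with client.documents.extract_stream(json_schema=..., document="invoice.pdf", model="retab-micro") as stream:
#     final = consume_stream(stream)
```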
Adding Context with additional_messages
additional_messages takes chat-style messages, each a dict with a role and content. Plain text messages, system or developer guidance, and multipart content are all supported.
```python
# Reuses client and invoice_schema from the Quick Start.
result = client.documents.extract(
    json_schema=invoice_schema,
    document="invoice.pdf",
    model="retab-micro",
    additional_messages=[
        {
            "role": "developer",
            "content": "Extract values exactly as written. Do not normalize vendor names.",
        },
        {
            "role": "user",
            "content": "Focus on invoice number, invoice date, and total amount due.",
        },
    ],
)
```
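The example above uses plain string content only. For multipart content, a hedged sketch follows: the {"type": "text", "text": ...} part shape is an assumption borrowed from OpenAI-style chat messages, and build_additional_messages is a hypothetical helper, so confirm the accepted part types in the Retab docs before relying on it.

```python
from typing import Any


def build_additional_messages(rules: str, focus_fields: list[str]) -> list[dict[str, Any]]:
    """Build one developer instruction plus one multipart user message.

    Assumption: multipart content is a list of {"type": "text", "text": ...} parts.
    """
    return [
        {"role": "developer", "content": rules},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Focus on these fields:"},
                {"type": "text", "text": ", ".join(focus_fields)},
            ],
        },
    ]


messages = build_additional_messages(
    "Extract values exactly as written. Do not normalize vendor names.",
    ["invoice_number", "invoice_date", "total_amount"],
)
# client.documents.extract(..., additional_messages=messages)  # not executed here
```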
Working with Stored Extractions
Every extraction can be retrieved later through client.extractions.
```python
# Reuses client and invoice_schema from the Quick Start.
result = client.documents.extract(
    json_schema=invoice_schema,
    document="invoice.pdf",
    model="retab-micro",
    metadata={"batch_id": "march-2026"},
)

# Fetch the stored extraction and its page-level sources.
stored = client.extractions.get(result.extraction_id)
print(stored.predictions)

page_sources = client.extractions.sources(result.extraction_id)
print(page_sources.sources)

# List recent extractions filtered by the metadata attached above.
recent = client.extractions.list(limit=20, metadata={"batch_id": "march-2026"})
for item in recent.items:
    print(item.id, item.file.filename)
```
client.extractions.download(...) returns a pre-signed download URL for jsonl, csv, or xlsx exports.
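Once you have the pre-signed URL, any HTTP client can fetch the export. The sketch assumes download(...) gives you the URL as a plain string (your SDK version may wrap it in a response object), and save_export is a hypothetical helper built on urllib.request from the standard library.

```python
import shutil
import urllib.request
from pathlib import Path


def save_export(url: str, destination: Path) -> Path:
    """Stream the file behind a pre-signed URL to `destination`.

    Assumption: the URL from client.extractions.download(...) is a plain string.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all in memory
    return destination


# url = client.extractions.download(...)
# save_export(url, Path("exports/march-2026.csv"))
```

Pre-signed URLs usually expire, so download the export soon after requesting it.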
Workflows
The Python SDK also supports workflow discovery, execution, and step inspection.
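```python
from pathlib import Path

from retab import Retab

client = Retab()

# Fetch the workflow graph and pick the first document start node.
workflow = client.workflows.get_entities("wf_abc123")
document_start_id = workflow.start_nodes[0].id

# Start a run, mapping the start node ID to the file it should receive.
run = client.workflows.runs.create(
    workflow_id=workflow.workflow.id,
    documents={document_start_id: Path("invoice.pdf")},
)

# Poll once per second until the run finishes, then check its status.
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()
print(run.output)

# Inspect one node's output ("extract-node-id" is a placeholder).
step = client.workflows.runs.steps.get(run.id, "extract-node-id")
print(step.extracted_data)
```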
Useful workflow helpers:
- client.workflows.get_entities(workflow_id) returns the workflow graph and exposes .start_nodes and .start_json_nodes
- client.workflows.runs.wait_for_completion(run.id) polls until the run reaches completed, error, or cancelled
- client.workflows.runs.steps.get(run.id, node_id) returns typed handle inputs and outputs
- client.workflows.runs.steps.get_all(run) fetches step outputs for every node in one call
- client.workflows.blocks.* and client.workflows.edges.* let you create or update workflow graphs from code
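wait_for_completion hides a polling loop. For readers who want the same behavior around their own status checks, here is a generic sketch of that idea, not the SDK's implementation: get_status is any zero-argument callable returning a status string, the terminal set uses the three statuses listed above, and the timeout is an addition of this sketch.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    poll_interval_seconds: float = 1.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Call get_status() until it returns a terminal status, sleeping between polls.

    Raises TimeoutError if no terminal status is seen before the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_seconds}s")
        time.sleep(poll_interval_seconds)
```

For example, a callable that yields "queued", "running", then "completed" makes the function return "completed" on the third poll; the first two status names are made up for illustration.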
Notes
- n_consensus=1 is the fastest option
- higher n_consensus usually improves robustness on noisy or ambiguous documents
- if schema validation fails, result.choices[0].message.parsed may be None
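When parsed is None, you may still want to inspect what the model returned. A hedged sketch: the hypothetical parsed_or_none helper prefers the parsed object and otherwise tries json.loads on result.text, which the SDK documents as the raw JSON string. Anything recovered this way has not passed schema validation, so treat it as untrusted.

```python
import json
from typing import Any, Optional


def parsed_or_none(result: Any) -> Optional[Any]:
    """Return result.choices[0].message.parsed if set; otherwise try to decode
    result.text as JSON (unvalidated). Return None if neither works."""
    parsed = result.choices[0].message.parsed
    if parsed is not None:
        return parsed
    raw = getattr(result, "text", None)
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```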
Links
- Docs: https://docs.retab.com
- API reference: https://docs.retab.com/api-reference/introduction
- Repository: https://github.com/retab-dev/retab
Download files
- Source distribution: retab-0.0.110.tar.gz
- Built distribution: retab-0.0.110-py3-none-any.whl
File details
Details for the file retab-0.0.110.tar.gz.
File metadata
- Download URL: retab-0.0.110.tar.gz
- Upload date:
- Size: 135.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a0ef7b4a48d68f60249111ea374d0f3a726015689f927e737002e17f5fb5733a |
| MD5 | b52d038e5234976ae94f96089dfc2025 |
| BLAKE2b-256 | c2627907c63bb693b8976e84d83c231207d6c0836a8c2d29591968b905b47614 |
File details
Details for the file retab-0.0.110-py3-none-any.whl.
File metadata
- Download URL: retab-0.0.110-py3-none-any.whl
- Upload date:
- Size: 153.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2133f4af4b03ad43de89783e75e701a8b56db125183e3737adb55c87011e0ae4 |
| MD5 | 04291ac2a8b3fa2ef25bba1f8b9bc45a |
| BLAKE2b-256 | 7cf275f4101d58907dc1dfa7bdd58aeaaa9b8971e8ebf958c5bc815e89030bf7 |