Python SDK for the FlexOrch API — process documents, build LLM-ready datasets

flexorch-sdk

Python SDK for the FlexOrch API.

FlexOrch turns unstructured documents (PDF, DOCX, invoices, emails…) into clean, structured, LLM-ready datasets — with automatic PII detection and masking, quality scoring, and multiple export formats.

Install

pip install flexorch-sdk

Requires Python 3.10+. The only dependency is httpx.

Quick start

from flexorch_sdk import FlexOrchClient

client = FlexOrchClient("fx_your_key_here")

# Upload a document and wait for the pipeline to finish
job = client.process("contract.pdf", locale="tr").wait()

print(job.quality_grade)   # "A"
print(job.quality_score)   # 0.91

# Download the resulting dataset
dataset = job.dataset()
dataset.export("jsonl", path="output.jsonl")

Auth

Pass your API key directly or set the FLEXORCH_API_KEY environment variable:

export FLEXORCH_API_KEY=fx_...

from flexorch_sdk import FlexOrchClient

client = FlexOrchClient()   # reads FLEXORCH_API_KEY automatically

Get your API key from app.flexorch.com → Settings.
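
The two options above can be sketched as a small lookup. This is only an illustration: resolve_api_key is a hypothetical helper, not part of the SDK, and the assumption that an explicitly passed key wins over the environment variable is not confirmed here.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, else FLEXORCH_API_KEY from the environment.

    Hypothetical helper for illustration; the SDK's real lookup may differ.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("FLEXORCH_API_KEY")
    if not key:
        raise ValueError("No API key: pass one explicitly or set FLEXORCH_API_KEY")
    return key
```

Failing early with a clear message when neither source provides a key tends to be easier to debug than a 401 on the first request.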

Supported input formats

Category	Formats
Documents	PDF (text + scanned), DOCX, TXT
Spreadsheets	XLSX
Email	EML, MSG
E-invoices	XML/UBL (Peppol, GİB TR), FatturaPA (IT), XRechnung (DE), ZUGFeRD/Factur-X
Images	JPG, PNG, TIFF (OCR)
Web	HTML, HTM

Export formats

json · jsonl · csv · parquet · md · xml · xlsx · rag

dataset.export("jsonl", path="output.jsonl")   # write to file
raw = dataset.export("parquet")                # return bytes

The rag format produces LlamaIndex/LangChain-compatible chunks with metadata.
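
To inspect a jsonl export locally, a plain standard-library reader is usually enough. The sketch below assumes only that the file holds one JSON object per line, which is what the JSONL format implies; the fields inside each record are not documented here, so none are assumed. read_jsonl is an illustrative name, not an SDK function.

```python
import json


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

For example, records = list(read_jsonl("output.jsonl")) loads the whole file, while iterating the generator directly keeps memory use flat for large exports.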

Processing

Single file

job = client.process("invoice.pdf", locale="de").wait()

locale is an IETF language tag that selects which PII detectors run. Supported values are tr, de, en, fr, it, nl, es and pl; use und to enable all detectors.

Batch

jobs = client.process_many(["a.pdf", "b.pdf", "c.pdf"], locale="und")
for job in jobs:
    job.wait()
    print(job.quality_grade, job.quality_score)
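
In the loop above, one failing job aborts the rest of the batch. A possible way to keep going is to collect failures separately, as in this sketch. wait_all is a hypothetical helper, not part of the SDK; it relies only on each job having a wait() method, and it catches Exception broadly for brevity, which real code would likely narrow to the SDK's error classes.

```python
def wait_all(jobs):
    """Wait on every job; return (finished, failed) without stopping at the first error."""
    finished, failed = [], []
    for job in jobs:
        try:
            job.wait()
        except Exception as exc:  # assumption: narrow this to specific errors in real code
            failed.append((job, exc))
        else:
            finished.append(job)
    return finished, failed
```

A caller could then report quality scores for the finished list and log or resubmit the (job, exception) pairs in failed.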

From S3

# Register a connector once; store conn.id for reuse
conn = client.connectors.create(
    "Production S3", "s3",
    {
        "bucket": "my-bucket",
        "region": "eu-central-1",
        "access_key_id": "AKIA...",
        "secret_access_key": "...",
    },
)

# Verify connectivity
result = client.connectors.test(conn.id)
print(result.success, result.latency_ms)   # True 38

# Process files from S3
jobs = client.process_from_s3(conn.id, ["invoices/inv-001.pdf", "invoices/inv-002.pdf"])
for job in jobs:
    job.wait()

Job polling

Job.wait() blocks until the pipeline completes. It raises TimeoutError if the timeout elapses first and JobFailedError if the pipeline fails.

job = client.process("large-report.pdf").wait(
    timeout=600,       # seconds before TimeoutError (default: 300)
    poll_interval=5,   # polling interval in seconds (default: 2)
)

print(job.status)        # "completed"
print(job.quality_grade) # "A" | "B" | "C" | "D"
print(job.quality_score) # 0.0 – 1.0
print(job.has_dataset)   # True
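
For readers unfamiliar with the pattern, the timeout and poll_interval parameters correspond to a loop along these lines. This is a generic sketch, not the SDK's implementation: poll_until and its check callback are illustrative names, and it raises Python's built-in TimeoutError rather than the SDK's class. The defaults mirror the documented ones (300 and 2 seconds).

```python
import time


def poll_until(check, timeout=300.0, poll_interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a non-None value or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(poll_interval)
```

Using a monotonic clock keeps the deadline stable if the system time changes, and the injectable sleep and clock arguments make the loop easy to test without real delays.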

Dataset operations

ds = job.dataset()                      # fetch the dataset linked to this job
ds = client.datasets.get("dataset-id")  # or fetch any dataset by its ID

print(ds.name)              # "contract-2024-q1"
print(ds.row_count)         # 142
print(ds.available_formats) # ["json", "jsonl", "csv", "parquet"]

# Download locally
ds.export("jsonl", path="output.jsonl")

# Push directly to S3 (conn is the connector registered under "From S3")
push = ds.export_to_s3(conn.id, "jsonl", prefix="processed/datasets/")
print(push["s3_key"])       # "processed/datasets/contract-2024-q1.jsonl"
print(push["size_bytes"])   # 84320

# Semantic indexing (Pro+)
ds.index()
status = ds.index_status()  # {"status": "ready", "chunks_indexed": 48}

Semantic search (Pro+)

results = client.search(
    "payment terms net 30",
    top_k=10,
    filters={
        "document_type": "invoice",
        "language": "de",
        "quality_grade": "A",
        "pii_masked": True,
    },
)

for r in results:
    print(f"{r.score:.3f}  [{r.dataset_id}]  {r.text[:120]}")

Resources

# Jobs
jobs = client.jobs.list(page=1, page_size=20)
job  = client.jobs.get("job-id")

# Datasets
datasets = client.datasets.list()
ds       = client.datasets.get("dataset-id")

# Usage
usage = client.usage.current()
print(f"{usage.credits_used} / {usage.credits_limit} credits used")
print(f"Plan: {usage.plan}  —  resets {usage.reset_at}")

# Webhooks
client.webhooks.register("https://your-server.com/hook", events=["dataset.ready"])
client.webhooks.list()
client.webhooks.delete("webhook-id")

# Connectors
client.connectors.create("name", "s3", {...})
client.connectors.list()
client.connectors.get("connector-id")
client.connectors.test("connector-id")
client.connectors.delete("connector-id")

Error handling

from flexorch_sdk import (
    FlexOrchClient,
    AuthError,       # 401 — invalid or missing API key
    QuotaError,      # 402 — credit limit reached or trial expired
    RateLimitError,  # 429 — too many requests; has .retry_after (seconds)
    NotFoundError,   # 404
    ValidationError, # 422 — bad request parameters
    ServerError,     # 5xx
    JobFailedError,  # pipeline failed; has .job_id and .failure_reason
    TimeoutError,    # Job.wait() exceeded timeout; has .job_id (shadows the built-in TimeoutError once imported)
)

client = FlexOrchClient()   # reads FLEXORCH_API_KEY

try:
    job = client.process("doc.pdf").wait(timeout=120)
except AuthError:
    print("Invalid API key — check FLEXORCH_API_KEY")
except QuotaError as e:
    print(f"Out of credits — reset at {e.reset_at}")
except JobFailedError as e:
    print(f"Pipeline failed for job {e.job_id}: {e.failure_reason}")
except TimeoutError as e:
    print(f"Job {e.job_id} still running after timeout — poll manually")

The SDK automatically retries 429 and 5xx responses with exponential backoff (up to 3 attempts by default).
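
As a rough picture of what exponential backoff means, the waits between retries grow by a constant factor. The sketch below is generic: the base delay, factor and cap are assumptions chosen for illustration, not the SDK's actual schedule, and backoff_delays is not an SDK function. Real clients often add random jitter as well.

```python
def backoff_delays(max_retries=3, base=0.5, factor=2.0, cap=30.0):
    """Return the wait in seconds before each retry, growing by factor and capped at cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


print(backoff_delays())  # [0.5, 1.0, 2.0]
```

When a response carries an explicit hint such as RateLimitError.retry_after, honoring that value instead of the computed delay is the usual choice.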

Configuration

client = FlexOrchClient(
    api_key="fx_...",
    base_url="https://api.flexorch.com/v1",  # override for self-hosted
    timeout=60.0,       # HTTP timeout per request in seconds
    max_retries=5,      # retry attempts for transient errors
)

Context manager

with FlexOrchClient() as client:
    job = client.process("report.pdf").wait()
    job.dataset().export("jsonl", path="report.jsonl")
# HTTP connection pool released automatically

Examples

See examples/ for runnable scripts:

File	Description
`basic_process.py`	Process a single document and export as JSONL
`batch_process.py`	Process multiple files with error handling
`s3_import.py`	Import from S3, process, export results back to S3

Development

git clone https://github.com/flexorch/flexorch-sdk
cd flexorch-sdk
pip install -e ".[dev]"
pytest

Tests use respx to mock httpx — no network calls, no API key needed.

License

MIT

This version

0.1.0

May 24, 2026

Download files

Source Distribution

flexorch_sdk-0.1.0.tar.gz (17.8 kB view details)

Uploaded May 24, 2026 Source

Built Distribution

flexorch_sdk-0.1.0-py3-none-any.whl (18.6 kB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file flexorch_sdk-0.1.0.tar.gz.

File metadata

Download URL: flexorch_sdk-0.1.0.tar.gz
Upload date: May 24, 2026
Size: 17.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for flexorch_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f754fa86eb575d8b629d243728779a8772ac0f157bb0065de1550935726371ee`
MD5	`404f1d5d442807692f676b907a694737`
BLAKE2b-256	`2c9ad83a39b0c5b2810430fde9d1425c88e721690767c6911a6df7ea47bd9731`

File details

Details for the file flexorch_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: flexorch_sdk-0.1.0-py3-none-any.whl
Upload date: May 24, 2026
Size: 18.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for flexorch_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3928cc975ec72b3f583bec8204652c24765f61e5da5f5f325d51694c9fff1de4`
MD5	`4f61fa40fe50169feabaeafea6c5e1f1`
BLAKE2b-256	`86ed6ebfba81ab2787ed13801ccea33b8a75c8d506f4e94a28df8eca1970cf3f`
