Python SDK for the SymageDocs synthetic data API

SymageDocs Python SDK

Generate synthetic documents, identities, and tabular datasets for testing, ML training, and compliance.

Installation

pip install symagedocs

For progress bars during long jobs:

pip install "symagedocs[progress]"  # quotes keep shells such as zsh from expanding the brackets

Quick Start

from symagedocs import Client

client = Client(api_key="sk_live_...")

# List available forms
forms = client.forms.list()
for f in forms:
    print(f"{f.id}: {f.name} ({f.credit_cost} credits)")

# Generate 100 W-2 documents
job = client.generate.create(
    "irs_w2_2024",
    quantity=100,
    output_formats=["pdf_typed", "json"],
)
result = client.generate.wait(job.job_id)  # polls until complete
client.generate.download(job.job_id, "pdf_typed", "./w2_documents.zip")

# Batch generation with token budget
batch = client.batches.create(
    "Training Data",
    "irs_w2_2024",
    token_budget=5000,
    output_formats=["pdf_typed", "json"],
)
gen = client.batches.generate(batch.batch_id, quantity=10)
for item_id in gen.item_ids:
    files = client.batches.download_urls(batch.batch_id, item_id)
    for f in files:
        print(f"{f.filename}: {f.url}")  # presigned S3 URLs

# Generate tabular data from a description
schema = client.tabular.parse("name, age, SSN, city, state, annual income")
tab_job = client.tabular.generate(columns=schema.columns, quantity=5000)
client.tabular.wait(tab_job.job_id)
client.tabular.download(tab_job.job_id, "csv", "./dataset.csv")

# Check credit balance
balance = client.account.balance()
print(f"Credits used: {balance.credits_used}")

Authentication

Get your API key at symagedocs.ai/account?tab=api.

# Pass directly
client = Client(api_key="sk_live_...")

# Or set environment variable
# export SYMAGEDOCS_API_KEY=sk_live_...
client = Client()  # reads from env
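
If you want a clear error when no key is configured, a small helper can resolve the key before the client is built. This is a minimal sketch: resolve_api_key is a hypothetical helper, not part of the SDK, and the lookup order (explicit argument first, then SYMAGEDOCS_API_KEY) is an assumption about how you would want it to behave.

```python
import os

ENV_VAR = "SYMAGEDOCS_API_KEY"  # variable name taken from the section above


def resolve_api_key(explicit=None, environ=None):
    """Return the explicit key if given, else the value of SYMAGEDOCS_API_KEY.

    Hypothetical helper: the SDK's own lookup logic is not shown here.
    """
    environ = os.environ if environ is None else environ
    key = (explicit or environ.get(ENV_VAR, "")).strip()
    if not key:
        raise RuntimeError(f"No API key: pass api_key=... or set {ENV_VAR}")
    return key
```

The result can then be passed along as Client(api_key=resolve_api_key()).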

Async Support

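import asyncio

from symagedocs import AsyncClient

# "async with" is only valid inside a coroutine, so wrap the calls in main()
async def main():
    async with AsyncClient(api_key="sk_live_...") as client:
        forms = await client.forms.list()
        job = await client.generate.create("irs_w2_2024", quantity=10)
        result = await client.generate.wait(job.job_id)

asyncio.run(main())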
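
The async client is most useful when several jobs are started at once. The sketch below shows one way to do that with a concurrency cap. create_jobs is a hypothetical helper, and the cap of 3 is an arbitrary illustrative value rather than a documented rate limit. It only relies on the client.generate.create call shown above.

```python
import asyncio


async def create_jobs(client, form_ids, quantity=10, max_concurrency=3):
    """Create one generation job per form, at most max_concurrency at a time.

    `client` is any object exposing `await client.generate.create(form_id, quantity=...)`.
    The concurrency limit is an illustrative choice, not an SDK requirement.
    """
    limit = asyncio.Semaphore(max_concurrency)

    async def create_one(form_id):
        async with limit:
            return await client.generate.create(form_id, quantity=quantity)

    # gather preserves input order, so results line up with form_ids
    return await asyncio.gather(*(create_one(f) for f in form_ids))
```

Inside an async with AsyncClient(...) block this would be called as jobs = await create_jobs(client, form_ids).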

Configuration

client = Client(
    api_key="sk_live_...",
    base_url="https://symagedocs.ai",  # custom server
    timeout=30.0,                       # request timeout (seconds)
    max_retries=3,                      # retry on 429/5xx
)
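
To make the max_retries setting concrete, here is a rough sketch of what "retry on 429/5xx" usually means in practice. It is not the SDK's implementation: call_with_retries and its send callable are hypothetical, and the exponential backoff schedule is an assumption, since the SDK's actual delays are not documented here.

```python
import time

RETRYABLE_STATUS = {429} | set(range(500, 600))  # "retry on 429/5xx"


def call_with_retries(send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    Exponential backoff is an assumption, not the SDK's documented schedule.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            return status, body
        sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

With max_retries=3 a request is attempted at most four times: the initial call plus three retries.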

Method Reference

Forms

| Method | Description |
| --- | --- |
| forms.list(category=None) | List available forms, optionally filtered by category |
| forms.get(form_id) | Get detailed form info including field definitions |

Generation

| Method | Description |
| --- | --- |
| generate.create(form_id, quantity=1, output_formats=["pdf_typed"], config=None, seed=None) | Create an async generation job |
| generate.list_jobs(limit=50, cursor=None, status=None) | List generation jobs (cursor-paginated) |
| generate.get_job(job_id) | Get full job status and progress |
| generate.download(job_id, format, path) | Download job output to a local file |
| generate.wait(job_id, poll_interval=3.0) | Poll until job completes or fails |
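
generate.wait has no timeout parameter in the signature listed above, so a caller that needs an upper bound can poll generate.get_job directly. The following is a sketch under assumptions: wait_for_job is a hypothetical helper, and the status attribute and the strings "completed" and "failed" are guesses about the job object that should be checked against a real response.

```python
import time


def wait_for_job(get_job, job_id, poll_interval=3.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job(job_id) until the job reaches a terminal status.

    The `status` attribute and the strings "completed" and "failed" are
    assumptions; check a real job object before relying on them.
    """
    deadline = clock() + timeout
    while True:
        job = get_job(job_id)
        if job.status in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.status!r} after {timeout}s")
        sleep(poll_interval)
```

It would be called as wait_for_job(client.generate.get_job, job.job_id, timeout=300).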

Identities

| Method | Description |
| --- | --- |
| identities.generate(quantity=1, config=None, seed=None) | Generate raw synthetic identities as JSON |

Batches

| Method | Description |
| --- | --- |
| batches.create(name, form_id, token_budget=None, output_formats=["pdf_typed"], config=None, label_scheme=None) | Create a batch with optional token budget |
| batches.list(limit=50, cursor=None) | List batches (cursor-paginated) |
| batches.get(batch_id) | Get batch status and details |
| batches.generate(batch_id, quantity=1, seed=None, webhook_url=None) | Generate items within a batch |
| batches.list_items(batch_id, limit=50, cursor=None) | List batch items (cursor-paginated) |
| batches.download_urls(batch_id, item_id) | Get presigned S3 URLs for item files |
| batches.get_bio_labels(batch_id, item_id) | Get BIO-tagged token annotations (ML training) |
| batches.get_word_annotations(batch_id, item_id) | Get word-level spatial annotations (ML training) |
| batches.iter_training_examples(batch_id) | Iterate all items as training examples with images, BIO labels, and word annotations |
| batches.wait(batch_id, poll_interval=3.0) | Poll until batch is exhausted or revoked |
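
Several of the methods above are cursor-paginated. A generic loop for walking every page might look like the sketch below. iter_all is a hypothetical helper, and the page attributes items and next_cursor are assumed names, since the shape of a list response is not described here.

```python
def iter_all(list_page, limit=50):
    """Yield every item from a cursor-paginated list method.

    Assumes each page exposes `.items` and `.next_cursor` (None on the last
    page); both attribute names are hypothetical.
    """
    cursor = None
    while True:
        page = list_page(limit=limit, cursor=cursor)
        yield from page.items
        cursor = page.next_cursor
        if cursor is None:
            break
```

For example, for batch in iter_all(client.batches.list) walks all batches. For batches.list_items, bind the batch first with functools.partial(client.batches.list_items, batch_id).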

Tabular

| Method | Description |
| --- | --- |
| tabular.parse(prompt) | Convert natural language to a column schema (LLM-powered) |
| tabular.generate(columns, quantity=100, output_formats=["csv"], seed=None) | Create a tabular generation job |
| tabular.status(job_id) | Get tabular job progress and ETA |
| tabular.download(job_id, format, path) | Download tabular output to a local file |
| tabular.wait(job_id, poll_interval=2.0) | Poll until tabular job completes or fails |

Account

| Method | Description |
| --- | --- |
| account.balance() | Get credit balance (credits_used, credits_allocated) |
| account.usage(days=30) | Get usage summary for the specified period |
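
The balance fields can be combined with a form's credit_cost to check a job before submitting it. This is only a sketch: can_afford is a hypothetical helper, and it assumes that remaining credits equal credits_allocated minus credits_used and that credit_cost is charged once per generated document. Neither assumption is confirmed above.

```python
def can_afford(balance, credit_cost, quantity):
    """Return (ok, remaining) for a job of `quantity` documents.

    Assumes remaining credits = credits_allocated - credits_used and that
    `credit_cost` is charged once per generated document.
    """
    remaining = balance.credits_allocated - balance.credits_used
    return credit_cost * quantity <= remaining, remaining
```

A typical call would be ok, remaining = can_afford(client.account.balance(), form.credit_cost, 100), where form comes from client.forms.get.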

Error Handling

The SDK raises typed exceptions for API errors and automatically retries responses with status 429 or 5xx (see max_retries under Configuration):

from symagedocs import Client, AuthenticationError, RateLimitError, NotFoundError

client = Client()  # reads SYMAGEDOCS_API_KEY from the environment

try:
    forms = client.forms.list()
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Too many requests (the SDK retries automatically)")
except NotFoundError:
    print("Resource not found")

All error classes:

| Exception | HTTP Code | Description |
| --- | --- | --- |
| SymageDocsError | (none) | Base exception for all SDK errors |
| AuthenticationError | 401 | Invalid or revoked API key |
| PermissionDeniedError | 403 | Key missing required scope |
| NotFoundError | 404 | Resource not found |
| ValidationError | 400 | Invalid request parameters |
| InsufficientCreditsError | 402 | Not enough credits for the operation |
| ConflictError | 409 | Resource in unexpected state (e.g., downloading incomplete job) |
| RateLimitError | 429 | Rate limit exceeded (SDK retries automatically) |
| ServerError | 5xx | Server-side error (SDK retries automatically) |
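
The table can be read as a lookup from HTTP status to exception class. The sketch below mirrors that lookup with local stand-in classes so that it runs without the SDK installed. The class names come from the table, but error_for_status and the fallback to the base class for unlisted codes are assumptions, not the SDK's actual dispatch logic.

```python
class SymageDocsError(Exception):
    """Local stand-in for the SDK's base exception."""


def _make(name):
    # Build a stand-in subclass named after the SDK exception in the table
    return type(name, (SymageDocsError,), {})


STATUS_TO_ERROR = {
    400: _make("ValidationError"),
    401: _make("AuthenticationError"),
    402: _make("InsufficientCreditsError"),
    403: _make("PermissionDeniedError"),
    404: _make("NotFoundError"),
    409: _make("ConflictError"),
    429: _make("RateLimitError"),
}
SERVER_ERROR = _make("ServerError")


def error_for_status(status):
    """Return the exception class the table associates with an HTTP status."""
    if status in STATUS_TO_ERROR:
        return STATUS_TO_ERROR[status]
    if 500 <= status <= 599:
        return SERVER_ERROR
    return SymageDocsError  # assumed fallback for codes the table does not list
```

Since SymageDocsError is described as the base for all SDK errors, a final except SymageDocsError clause works as a catch-all after the specific handlers.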

Examples

See examples/ for complete working scripts:

  • list_forms.py — Browse available forms and credit costs
  • generate_w2s.py — Full pipeline: create job, wait, download PDF + JSON
  • tabular_dataset.py — Parse NL description, generate 5k rows, download CSV
  • train_kie_model.py — Create batch with NIST3 labels, iterate training examples with BIO labels and spatial annotations

License

MIT
