Python SDK for the SymageDocs synthetic data API
SymageDocs Python SDK
Generate synthetic documents, identities, and tabular datasets for testing, ML training, and compliance.
Installation
```bash
pip install symagedocs
```
For progress bars during long jobs:
```bash
pip install "symagedocs[progress]"  # quotes keep shells like zsh from expanding the brackets
```
Quick Start
```python
from symagedocs import Client

client = Client(api_key="sk_live_...")

# List available forms
forms = client.forms.list()
for f in forms:
    print(f"{f.id}: {f.name} ({f.credit_cost} credits)")

# Generate 100 W-2 documents
job = client.generate.create(
    "irs_w2_2024",
    quantity=100,
    output_formats=["pdf_typed", "json"],
)
result = client.generate.wait(job.job_id)  # polls until complete
client.generate.download(job.job_id, "pdf_typed", "./w2_documents.zip")

# Batch generation with token budget
batch = client.batches.create(
    "Training Data",
    "irs_w2_2024",
    token_budget=5000,
    output_formats=["pdf_typed", "json"],
)
gen = client.batches.generate(batch.batch_id, quantity=10)
for item_id in gen.item_ids:
    files = client.batches.download_urls(batch.batch_id, item_id)
    for f in files:
        print(f"{f.filename}: {f.url}")  # presigned S3 URLs

# Generate tabular data from a description
schema = client.tabular.parse("name, age, SSN, city, state, annual income")
tab_job = client.tabular.generate(columns=schema.columns, quantity=5000)
client.tabular.wait(tab_job.job_id)
client.tabular.download(tab_job.job_id, "csv", "./dataset.csv")

# Check credit balance
balance = client.account.balance()
print(f"Credits used: {balance.credits_used}")
```
Authentication
Get your API key at symagedocs.ai/account?tab=api.
```python
from symagedocs import Client

# Pass directly
client = Client(api_key="sk_live_...")

# Or set the environment variable first:
#   export SYMAGEDOCS_API_KEY=sk_live_...
client = Client()  # reads from env
```
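The precedence described above (an explicit `api_key` first, otherwise the `SYMAGEDOCS_API_KEY` environment variable) can be sketched in plain Python. `resolve_api_key` is a hypothetical helper, not part of the SDK, and raising `ValueError` when no key is found is an assumption; the real client may fail differently.

```python
import os
from typing import Optional

def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: explicit key wins, else fall back to the env var."""
    if api_key:
        return api_key
    env_key = os.environ.get("SYMAGEDOCS_API_KEY")
    if not env_key:
        # Assumption: the real SDK's behavior for a missing key is not documented here.
        raise ValueError("No API key: pass api_key=... or set SYMAGEDOCS_API_KEY")
    return env_key
```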
Async Support
```python
import asyncio
from symagedocs import AsyncClient

async def main():
    async with AsyncClient(api_key="sk_live_...") as client:
        forms = await client.forms.list()
        job = await client.generate.create("irs_w2_2024", quantity=10)
        result = await client.generate.wait(job.job_id)

asyncio.run(main())
```
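With an async client, several jobs can be awaited concurrently. The sketch below uses only the standard library: `fake_wait` is a hypothetical stand-in for `client.generate.wait`, and the concurrency limit of 4 is an arbitrary assumption rather than an SDK setting.

```python
import asyncio
from typing import List

async def fake_wait(job_id: str) -> str:
    """Hypothetical stand-in for `await client.generate.wait(job_id)`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{job_id}:completed"

async def wait_all(job_ids: List[str], limit: int = 4) -> List[str]:
    # The semaphore caps how many waits are in flight at once.
    sem = asyncio.Semaphore(limit)

    async def one(job_id: str) -> str:
        async with sem:
            return await fake_wait(job_id)

    # gather() returns results in the same order as job_ids.
    return await asyncio.gather(*(one(j) for j in job_ids))

results = asyncio.run(wait_all(["job_1", "job_2", "job_3"]))
print(results)
```

Bounding concurrency keeps a large fan-out from turning into a burst of requests that would mostly end in 429 retries.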
Configuration
```python
from symagedocs import Client

client = Client(
    api_key="sk_live_...",
    base_url="https://symagedocs.ai",  # custom server
    timeout=30.0,  # request timeout (seconds)
    max_retries=3,  # retry on 429/5xx
)
```
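`max_retries` controls automatic retries on 429 and 5xx responses. The SDK's exact backoff schedule is not documented here, so the following is only a sketch of a common approach: exponential backoff with an assumed 0.5 s base and 8 s cap. `send` is a hypothetical callable that returns an HTTP status code.

```python
import time
from typing import Callable, List

def is_retryable(status: int) -> bool:
    # 429 (rate limited) and any 5xx response are treated as transient.
    return status == 429 or 500 <= status <= 599

def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds.
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]

def call_with_retries(send: Callable[[], int], max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() once, then retry up to max_retries times while retryable."""
    status = send()
    for delay in backoff_delays(max_retries):
        if not is_retryable(status):
            break
        sleep(delay)
        status = send()
    return status
```

With `max_retries=3`, a request is attempted at most four times in total. Injecting `sleep` makes the loop testable without real delays.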
Method Reference
Forms
| Method | Description |
|---|---|
| `forms.list(category=None)` | List available forms, optionally filtered by category |
| `forms.get(form_id)` | Get detailed form info, including field definitions |
Generation
| Method | Description |
|---|---|
| `generate.create(form_id, quantity=1, output_formats=["pdf_typed"], config=None, seed=None)` | Create an async generation job |
| `generate.list_jobs(limit=50, cursor=None, status=None)` | List generation jobs (cursor-paginated) |
| `generate.get_job(job_id)` | Get full job status and progress |
| `generate.download(job_id, format, path)` | Download job output to a local file |
| `generate.wait(job_id, poll_interval=3.0)` | Poll until the job completes or fails |
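Cursor-paginated methods such as `generate.list_jobs`, `batches.list`, and `batches.list_items` return one page per call. A generic sketch for walking every page, assuming each call yields a page of items plus a next cursor that is empty on the last page; the actual response field names are not documented here, so `fetch_page` is a hypothetical adapter you would write around the SDK call.

```python
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical adapter type: takes a cursor, returns (items, next_cursor).
FetchPage = Callable[[Optional[str]], Tuple[Sequence[Any], Optional[str]]]

def iterate_all(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield every item across pages, starting from cursor=None."""
    cursor: Optional[str] = None
    while True:
        items, next_cursor = fetch_page(cursor)
        yield from items
        if not next_cursor:  # assumption: an empty/None cursor marks the last page
            break
        cursor = next_cursor
```

Because it is a generator, pages are only fetched as the caller consumes items.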
Identities
| Method | Description |
|---|---|
| `identities.generate(quantity=1, config=None, seed=None)` | Generate raw synthetic identities as JSON |
Batches
| Method | Description |
|---|---|
| `batches.create(name, form_id, token_budget=None, output_formats=["pdf_typed"], config=None, label_scheme=None)` | Create a batch with an optional token budget |
| `batches.list(limit=50, cursor=None)` | List batches (cursor-paginated) |
| `batches.get(batch_id)` | Get batch status and details |
| `batches.generate(batch_id, quantity=1, seed=None, webhook_url=None)` | Generate items within a batch |
| `batches.list_items(batch_id, limit=50, cursor=None)` | List batch items (cursor-paginated) |
| `batches.download_urls(batch_id, item_id)` | Get presigned S3 URLs for item files |
| `batches.get_bio_labels(batch_id, item_id)` | Get BIO-tagged token annotations (ML training) |
| `batches.get_word_annotations(batch_id, item_id)` | Get word-level spatial annotations (ML training) |
| `batches.iter_training_examples(batch_id)` | Iterate all items as training examples with images, BIO labels, and word annotations |
| `batches.wait(batch_id, poll_interval=3.0)` | Poll until the batch is exhausted or revoked |
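BIO tagging marks each token as `B-<label>` (beginning of an entity), `I-<label>` (inside it), or `O` (outside any entity). The sketch below decodes a tag sequence into entity spans using that standard convention. It assumes the labels are available as a plain list of tag strings; the exact structure returned by `get_bio_labels` and the label names (e.g. `NAME`, `SSN`) are illustrative assumptions.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start, end) spans; `end` is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close any open span, then open a new one.
            # A stray I- with no matching open span is treated like B-.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag == "O":
            if label is not None:
                spans.append((label, start, i))
            label = None
        # Otherwise it is I-<label> continuing the current span.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans
```

Paired with word-level annotations, spans like these map back to the words (and their positions) that make up each field.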
Tabular
| Method | Description |
|---|---|
| `tabular.parse(prompt)` | Convert a natural-language description to a column schema (LLM-powered) |
| `tabular.generate(columns, quantity=100, output_formats=["csv"], seed=None)` | Create a tabular generation job |
| `tabular.status(job_id)` | Get tabular job progress and ETA |
| `tabular.download(job_id, format, path)` | Download tabular output to a local file |
| `tabular.wait(job_id, poll_interval=2.0)` | Poll until the tabular job completes or fails |
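The `wait` methods poll until a job completes or fails. A minimal sketch of such a loop, assuming the job reports string statuses named `"completed"` and `"failed"` (the real status values are not documented here). `get_status` is a hypothetical callable, and the `timeout` parameter is an addition of this sketch, not a documented SDK argument.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # assumed terminal status names

def wait_until_done(get_status: Callable[[], str], poll_interval: float = 2.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status() every poll_interval seconds until a terminal status."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(poll_interval)
```

A monotonic clock keeps the deadline correct even if the system time changes while waiting.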
Account
| Method | Description |
|---|---|
| `account.balance()` | Get credit balance (`credits_used`, `credits_allocated`) |
| `account.usage(days=30)` | Get usage summary for the specified period |
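`account.balance()` exposes `credits_used` and `credits_allocated`, and `forms.list()` reports a `credit_cost` per form. A pre-flight check before a large job can help avoid `InsufficientCreditsError`. This sketch assumes the cost of a job scales linearly with `quantity` (not confirmed by this README), and `Balance` is a local stand-in dataclass, not the SDK's return type.

```python
from dataclasses import dataclass

@dataclass
class Balance:
    """Local stand-in mirroring the two documented balance fields."""
    credits_used: int
    credits_allocated: int

    @property
    def remaining(self) -> int:
        return self.credits_allocated - self.credits_used

def estimated_cost(credit_cost: int, quantity: int) -> int:
    # Assumption: total cost = per-document credit cost x quantity.
    return credit_cost * quantity

def can_afford(balance: Balance, credit_cost: int, quantity: int) -> bool:
    return estimated_cost(credit_cost, quantity) <= balance.remaining
```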
Error Handling
The SDK raises typed exceptions for API errors and retries automatically on 429 and 5xx:
```python
from symagedocs import Client, AuthenticationError, RateLimitError, NotFoundError

client = Client()  # reads SYMAGEDOCS_API_KEY from the environment

try:
    forms = client.forms.list()
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit still exceeded after automatic retries")
except NotFoundError:
    print("Resource not found")
```
All error classes:
| Exception | HTTP code | Description |
|---|---|---|
| `SymageDocsError` | n/a | Base exception for all SDK errors |
| `AuthenticationError` | 401 | Invalid or revoked API key |
| `PermissionDeniedError` | 403 | Key is missing a required scope |
| `NotFoundError` | 404 | Resource not found |
| `ValidationError` | 400 | Invalid request parameters |
| `InsufficientCreditsError` | 402 | Not enough credits for the operation |
| `ConflictError` | 409 | Resource in an unexpected state (e.g., downloading output from an incomplete job) |
| `RateLimitError` | 429 | Rate limit exceeded (SDK retries automatically) |
| `ServerError` | 5xx | Server-side error (SDK retries automatically) |
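The table implies a simple mapping from HTTP status to exception class, all sharing one base class so that `except SymageDocsError` catches everything. The sketch below mirrors that hierarchy with local stand-in classes (they are not imported from `symagedocs`); falling back to the base class for unlisted 4xx codes is an assumption.

```python
class SymageDocsError(Exception):
    """Local stand-in for the SDK's base exception."""

class ValidationError(SymageDocsError): pass          # 400
class AuthenticationError(SymageDocsError): pass      # 401
class InsufficientCreditsError(SymageDocsError): pass # 402
class PermissionDeniedError(SymageDocsError): pass    # 403
class NotFoundError(SymageDocsError): pass            # 404
class ConflictError(SymageDocsError): pass            # 409
class RateLimitError(SymageDocsError): pass           # 429
class ServerError(SymageDocsError): pass              # 5xx

_BY_STATUS = {
    400: ValidationError, 401: AuthenticationError, 402: InsufficientCreditsError,
    403: PermissionDeniedError, 404: NotFoundError, 409: ConflictError,
    429: RateLimitError,
}

def raise_for_status(status: int, message: str = "") -> None:
    """Raise the matching exception for an error status; do nothing for success."""
    if status < 400:
        return
    if 500 <= status <= 599:
        raise ServerError(message)
    # Assumption: unlisted error codes fall back to the base class.
    raise _BY_STATUS.get(status, SymageDocsError)(message)
```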
Examples
See `examples/` for complete working scripts:
- `list_forms.py`: Browse available forms and credit costs
- `generate_w2s.py`: Full pipeline (create job, wait, download PDF + JSON)
- `tabular_dataset.py`: Parse a natural-language description, generate 5k rows, download CSV
- `train_kie_model.py`: Create a batch with NIST3 labels, then iterate training examples with BIO labels and spatial annotations
Documentation
License
MIT
File details
Details for the file symagedocs-1.0.1.tar.gz.
File metadata
- Download URL: symagedocs-1.0.1.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `82e63792ba5765a0fa215e50f5fe9139962b9bc5b7eea557098b04a8abbfb986` |
| MD5 | `379d6be044dc4d2ca5659504221d3303` |
| BLAKE2b-256 | `cdb9d9ec51f8705a7109ad3e448cdda0d0f46bf6df2b07dab956e1341562fd33` |
File details
Details for the file symagedocs-1.0.1-py3-none-any.whl.
File metadata
- Download URL: symagedocs-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0c484a754e82763939aaa160130b38739a40aafeae33f3de729ce1a9f6301beb` |
| MD5 | `fef1601a10efae99d78ccd79378f2106` |
| BLAKE2b-256 | `70948239606c67819877c6d1b406538b850bd06be818326f673241252c0973cf` |
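A downloaded file can be checked against the published SHA256 digest with the standard library's `hashlib`; pip can also enforce this at install time through its hash-checking mode (`--require-hashes`). The helper names below are illustrative, not part of any tool.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value.
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("symagedocs-1.0.1.tar.gz", "82e63792ba5765a0fa215e50f5fe9139962b9bc5b7eea557098b04a8abbfb986")` should return `True` for an intact copy of the source distribution.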