A Python client library for the Synthetic Data Generation API
Synthetic Data Client
A Python client library for interacting with Synthgen, the Synthetic Data Generation API framework (https://github.com/nasirus/synthgen).
Installation
pip install synthgen-client
Features
- Async/await support
- Type hints and validation using Pydantic
- Comprehensive error handling
- Streaming support for large exports
- Batch operations support
- Rich CLI progress displays
- Token usage and cost tracking
Quick Start
from synthgen import SynthgenClient
from synthgen.models import Task
# Initialize the client
client = SynthgenClient(
    base_url="https://api.synthgen.example.com",
    api_key="your-api-key"
)
# Example of a task using a local LLM provider
provider = "http://host.docker.internal:11434/v1/chat/completions"
model = "qwen2.5:0.5b"
api_key = "api_key"
# Create a single task
task = Task(
    custom_id="test",
    method="POST",
    url=provider,
    api_key=api_key,
    body={
        "model": model,
        "messages": [{"role": "user", "content": "solve 2x + 4 = 10"}],
    },
)
# Create a batch of tasks
tasks = [task]
for i in range(1, 10):
    tasks.append(
        Task(
            # Zero-pad the counter so IDs stay uniform (task-002 ... task-010)
            custom_id=f"task-{i + 1:03d}",
            method="POST",
            url=provider,
            api_key=api_key,
            body={
                "model": model,
                "messages": [{"role": "user", "content": f"solve {i}x + 4 = 10"}],
            },
        )
    )
# Submit and monitor batch processing with cost tracking
results = client.monitor_batch(
    tasks=tasks,
    cost_by_1m_input_token=0.01,
    cost_by_1m_output_token=0.03
)
# Process results
for result in results:
    print(f"Task {result.message_id}: {result.status}")
    if result.body:
        print(f"Generated {len(result.body.get('data', []))} records")
Configuration
The client can be configured in multiple ways:
Environment Variables
# Set these environment variables
export SYNTHGEN_BASE_URL="http://localhost:8002"
export SYNTHGEN_API_KEY="your-api-key"
# Then initialize without parameters
client = SynthgenClient()
Direct Parameters
client = SynthgenClient(
    base_url="http://localhost:8002",
    api_key="your-api-key",
    timeout=3600  # Optional request timeout in seconds
)
Configuration File
You can use a JSON configuration file for easier configuration management:
# config.json
# {
# "base_url": "http://localhost:8002",
# "api_key": "your-api-key",
# "timeout": 3600
# }
client = SynthgenClient(config_file="config.json")
The configuration is loaded in the following order of precedence:
- Direct parameters passed to the constructor
- Environment variables
- Configuration file values
This allows for flexible configuration management across different environments.
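The precedence order above can be illustrated with a small standalone sketch. It is not the library's actual implementation: the function name resolve_settings and its env parameter are hypothetical, and only the two environment variable names come from this page.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_settings(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    config_file: Optional[str] = None,
    env: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: direct parameters beat environment variables,
    which in turn beat values read from a JSON configuration file."""
    env = dict(os.environ) if env is None else env

    # Lowest precedence: values from the JSON configuration file, if present.
    settings: Dict[str, Any] = {}
    if config_file and Path(config_file).is_file():
        settings.update(json.loads(Path(config_file).read_text()))

    # Middle precedence: environment variables override the file.
    for key, var in (("base_url", "SYNTHGEN_BASE_URL"), ("api_key", "SYNTHGEN_API_KEY")):
        if env.get(var):
            settings[key] = env[var]

    # Highest precedence: explicitly passed parameters override everything.
    for key, value in (("base_url", base_url), ("api_key", api_key)):
        if value is not None:
            settings[key] = value

    return settings
```

Passing env explicitly keeps the sketch easy to test without touching the real process environment.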
Batch Processing
Tasks can be submitted as a batch and monitored separately, or submitted and monitored in one step:
# Create a batch of tasks
tasks = [
    Task(
        custom_id="task-001",
        method="POST",
        url=provider,
        api_key=api_key,
        body={
            "model": model,
            "messages": [{"role": "user", "content": "solve 2x + 4 = 10"}],
        },
        dataset="customers",
        use_cache=True,
    ),
    # Add more tasks...
]
# Submit batch and get batch_id
response = client.create_batch(tasks)
batch_id = response.batch_id
# Monitor batch progress with rich UI
results = client.monitor_batch(batch_id=batch_id)
# Or submit and monitor in one step
results = client.monitor_batch(tasks=tasks)
Performance Optimization Options
Two key parameters optimize task execution:
Task(
    # Other parameters...
    use_cache=True,  # Use cached results when available (default: True)
    track_progress=True  # Enable detailed progress tracking (default: True)
)
Caching (use_cache)
Controls whether to use previously cached results:
- True: Reuses results for identical tasks, reducing API calls and costs
- False: Always executes fresh requests, ensuring up-to-date responses
# Check if results came from cache
if task_result.cached:
    print("Retrieved from cache")
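This page does not say how Synthgen decides that two tasks are identical. As an illustration only, one common approach is to hash a canonical form of the request, so that the same URL and body always map to the same key. The cache_key function below is hypothetical and is not part of the client.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(url: str, body: Dict[str, Any]) -> str:
    """Hypothetical cache key: SHA-256 of a canonical JSON encoding.

    sort_keys=True makes the key independent of dictionary ordering,
    so logically identical requests produce the same digest.
    """
    canonical = json.dumps({"url": url, "body": body}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, changing the model or any message changes the key, which would force a fresh request.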
Progress Tracking (track_progress)
Controls the level of execution monitoring:
- True: Provides detailed metrics (tokens, duration, status updates)
- False: Minimal tracking for improved performance
Usage Examples
# Optimize for speed with caching
task_cached = Task(custom_id="cached", use_cache=True, track_progress=False, ...)
# Ensure fresh results with metrics
task_fresh = Task(custom_id="fresh", use_cache=False, track_progress=True, ...)
# Mixed batch processing
results = client.monitor_batch(tasks=[task_cached, task_fresh])
Health Checks
# Check system health
health = client.check_health()
print(f"System status: {health.status}")
print(f"API: {health.services.api}")
print(f"RabbitMQ: {health.services.rabbitmq}")
print(f"Elasticsearch: {health.services.elasticsearch}")
print(f"Queue consumers: {health.services.queue_consumers}")
Task Management
# Get task by ID
task = client.get_task("task-message-id")
print(f"Task status: {task.status}")
print(f"Completion time: {task.completed_at}")
# Delete a task
client.delete_task("task-message-id")
Batch Management
# Get all batches
batches = client.get_batches()
print(f"Total batches: {batches.total}")
# Get specific batch
batch = client.get_batch("batch-id")
print(f"Completed tasks: {batch.completed_tasks}/{batch.total_tasks}")
print(f"Token usage: {batch.total_tokens}")
# Get all tasks in a batch
tasks = client.get_batch_tasks("batch-id")
# Get only failed tasks
from synthgen.models import TaskStatus
failed_tasks = client.get_batch_tasks("batch-id", task_status=TaskStatus.FAILED)
# Delete a batch
client.delete_batch("batch-id")
Time-Series Batch Statistics
The client provides detailed time-series statistics for monitoring batch performance over time:
# Assumption: CalendarInterval is importable from synthgen.models, like TaskStatus
from synthgen.models import CalendarInterval
# Get time-series statistics for a batch
stats = client.get_batch_stats(
    batch_id="batch-id",
    time_range="24h",  # Time range to analyze (e.g., "5m", "2h", "7d")
    interval=CalendarInterval.HOUR_SHORT  # Time bucket size
)
# Access time series data points
for point in stats.time_series:
    print(f"Timestamp: {point.timestamp}")
    print(f"Completed tasks: {point.completed_tasks}")
    print(f"Total tokens: {point.total_tokens}")
    print(f"Avg response time: {point.avg_duration_ms}ms")
    print(f"Throughput: {point.tokens_per_second} tokens/sec")
# Access summary statistics
summary = stats.summary
print(f"Total tasks: {summary.total_tasks}")
print(f"Cache hit rate: {summary.cache_hit_rate:.2%}")
print(f"Average response time: {summary.average_response_time}ms")
print(f"Overall throughput: {summary.tokens_per_second} tokens/sec")
The interval parameter supports various Elasticsearch calendar intervals:
- MINUTE_SHORT/"1m": One minute interval
- HOUR_SHORT/"1h": One hour interval
- DAY_SHORT/"1d": One day interval
- WEEK_SHORT/"1w": One week interval
- MONTH_SHORT/"1M": One month interval
This data is useful for:
- Monitoring system performance trends over time
- Analyzing throughput patterns
- Identifying processing bottlenecks
- Evaluating cache efficiency
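The time_range strings shown earlier ("5m", "2h", "7d") follow a number-plus-unit pattern. The sketch below shows one way to validate such a string on the caller's side before sending a request. The function parse_time_range is hypothetical, and it only covers the three units that appear in the examples on this page; the API itself may accept more.

```python
import re
from datetime import timedelta

# Units taken from the examples on this page: minutes, hours, days.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_time_range(value: str) -> timedelta:
    """Hypothetical helper: convert strings like "5m" or "24h" to a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if match is None:
        raise ValueError(f"Unsupported time range: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Rejecting malformed values early gives a clearer error than a failed API call.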
Context Manager Support
The client supports the context manager protocol for automatic resource cleanup:
with SynthgenClient() as client:
    health = client.check_health()
# Client will be automatically closed when exiting the with block
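For readers unfamiliar with the protocol, the following standalone sketch shows what context manager support generally involves. ManagedClient is a hypothetical stand-in, not the SynthgenClient class, and its close method only flips a flag where a real client would release its HTTP connections.

```python
class ManagedClient:
    """Hypothetical stand-in that demonstrates the context manager protocol."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real client would release network resources here.
        self.closed = True

    def __enter__(self) -> "ManagedClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Always clean up, then return False so exceptions still propagate.
        self.close()
        return False
```

Because __exit__ runs even when the body raises, the cleanup happens on both the success and the failure path.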
Error Handling
The client provides robust error handling with automatic retries:
from synthgen.exceptions import APIError
try:
    result = client.get_task("non-existent-id")
except APIError as e:
    print(f"API Error: {e.message}")
    print(f"Status code: {e.status_code}")
    if e.response:
        print(f"Response: {e.response.text}")
Monitoring Existing Batches
# Monitor an existing batch
results = client.monitor_batch(
    batch_id="existing-batch-id",
    cost_by_1m_input_token=0.01,
    cost_by_1m_output_token=0.03
)
Customizing Batch Creation
# Create batch with custom chunk size for large batches
response = client.create_batch(tasks, chunk_size=500)
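The page does not describe how chunk_size is applied internally. A plausible reading is that the task list is split into consecutive slices of at most chunk_size items, which the sketch below illustrates with a hypothetical chunked helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])
```

With 1,200 tasks and chunk_size=500, this produces three chunks, the last one holding the remaining 200 tasks.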
Token Usage Tracking and Cost Calculation
The client provides detailed token usage statistics and cost calculation capabilities for batches:
# Process a batch with cost tracking
results = client.monitor_batch(
    tasks=tasks,
    cost_by_1m_input_token=0.01,  # Cost per million input tokens
    cost_by_1m_output_token=0.03  # Cost per million output tokens
)
# Retrieve batch statistics (batch_id is the ID of a submitted batch, e.g. from create_batch)
batch = client.get_batch(batch_id)
print(f"Input tokens: {batch.prompt_tokens:,}")
print(f"Output tokens: {batch.completion_tokens:,}")
print(f"Total tokens: {batch.total_tokens:,}")
This allows for real-time cost estimation and budget tracking when using pay-per-token LLM services.
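Given the token counts and the per-million prices, the cost arithmetic is straightforward. The sketch below assumes the usual pay-per-token formula; estimate_cost is a hypothetical helper, and the client's own calculation may differ, for example in rounding.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    cost_by_1m_input_token: float,
    cost_by_1m_output_token: float,
) -> float:
    """Hypothetical helper: total cost from token counts and per-million prices."""
    input_cost = prompt_tokens / 1_000_000 * cost_by_1m_input_token
    output_cost = completion_tokens / 1_000_000 * cost_by_1m_output_token
    return input_cost + output_cost
```

For example, 2,000,000 input tokens at 0.01 and 500,000 output tokens at 0.03 come to 0.02 + 0.015 = 0.035 in whatever currency the prices are quoted.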
Resilient Error Handling and Auto-Retry
The client automatically retries requests that fail because of transient network issues:
# The client automatically handles retries with exponential backoff
# Max retries and other parameters are configurable
try:
    result = client.get_task("task-id")
except APIError as e:
    if e.status_code == 404:
        print("Task not found")
    elif e.status_code == 401:
        print("Authentication failed - check your API key")
    else:
        print(f"An error occurred: {str(e)}")
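To make the idea of retries with exponential backoff concrete, here is a generic standalone sketch. It does not reproduce the client's internal retry logic: call_with_retries, its parameter names, and its default values are all hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry func on transient errors, doubling the delay each time."""
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # Retries exhausted: surface the last error to the caller.
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            attempt += 1
```

Only exception types listed in retry_on are retried. Anything else, such as an authentication failure, propagates immediately, which matches the intent of retrying only transient problems.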
Requirements
- Python 3.8+
- httpx>=0.24.0
- pydantic>=2.0.0
- rich (for progress displays)
License
MIT
File details
Details for the file synthgen_client-0.0.7.tar.gz.
File metadata
- Download URL: synthgen_client-0.0.7.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dba5dca985d02a289e343f4439e25e5d66e9350f7f95369d02dac513e3069a02 |
| MD5 | e514e6a8b9c055d1ec4499861416368a |
| BLAKE2b-256 | e7812800f89b8072b0ab199b2833c39c712e7c2ec3c92d376ebaa9731a16859b |
File details
Details for the file synthgen_client-0.0.7-py3-none-any.whl.
File metadata
- Download URL: synthgen_client-0.0.7-py3-none-any.whl
- Upload date:
- Size: 15.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b085cbefdac734556b8b14a395afd57fee2731ca73da737fad31477efe3cc6d5 |
| MD5 | 84e6742371a9635dcf2106b3092c9b3c |
| BLAKE2b-256 | 3723739706db577e9a2d578a81f2900ee5121bd407b801d39855d80357ad6ee1 |