A Python client library for the Synthetic Data Generation API

Synthetic Data Client

A Python client library for interacting with Synthgen, the Synthetic Data Generation API framework (https://github.com/nasirus/synthgen).

Installation

pip install synthgen-client

Features

Async/await support
Type hints and validation using Pydantic
Comprehensive error handling
Streaming support for large exports
Batch operations support
Rich CLI progress displays
Token usage and cost tracking

Quick Start

from synthgen import SynthgenClient
from synthgen.models import Task

# Initialize the client
client = SynthgenClient(
    base_url="https://api.synthgen.example.com",
    api_key="your-api-key"
)


# Example of a task using a local LLM provider
provider = "http://host.docker.internal:11434/v1/chat/completions"
model = "qwen2.5:0.5b"
api_key = "api_key"

# Create a single task
task = Task(
    custom_id="test",
    method="POST",
    url=provider,
    api_key=api_key,
    body={
        "model": model,
        "messages": [{"role": "user", "content": "solve 2x + 4 = 10"}],
    },
)

# Create a batch of tasks
tasks = [task]
for i in range(1, 10):
    tasks.append(
        Task(
            custom_id=f"task-{i + 1:03d}",  # zero-padded IDs: task-002 ... task-010
            method="POST",
            url=provider,
            api_key=api_key,
            body={
                "model": model,
                "messages": [{"role": "user", "content": f"solve {i}x + 4 = 10"}],
            },
        )
    )

# Submit and monitor batch processing with cost tracking
results = client.monitor_batch(
    tasks=tasks,
    cost_by_1m_input_token=0.01,
    cost_by_1m_output_token=0.03
)

# Process results
for result in results:
    print(f"Task {result.message_id}: {result.status}")
    if result.body:
        print(f"Generated {len(result.body.get('data', []))} records")

Configuration

The client can be configured in multiple ways:

Environment Variables

# Set these environment variables in your shell
export SYNTHGEN_BASE_URL="http://localhost:8002"
export SYNTHGEN_API_KEY="your-api-key"

# Then, in Python, initialize the client without parameters
client = SynthgenClient()

Direct Parameters

client = SynthgenClient(
    base_url="http://localhost:8002",
    api_key="your-api-key",
    timeout=3600  # Optional request timeout in seconds
)

Configuration File

You can use a JSON configuration file for easier configuration management:

# config.json
# {
#   "base_url": "http://localhost:8002",
#   "api_key": "your-api-key",
#   "timeout": 3600
# }

client = SynthgenClient(config_file="config.json")

The configuration is loaded in the following order of precedence:

Direct parameters passed to the constructor
Environment variables
Configuration file values

This allows for flexible configuration management across different environments.
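The precedence order above can be illustrated with a small standalone sketch. The helper `resolve_config` is hypothetical and is not part of synthgen-client; it assumes only the environment variable names shown earlier (`SYNTHGEN_BASE_URL`, `SYNTHGEN_API_KEY`) and a JSON file with the same keys as the example `config.json`. How the library actually merges these sources may differ.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def resolve_config(
    direct: Optional[Dict[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
    config_file: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge settings so that direct > environment > file."""
    env = os.environ if env is None else env
    resolved: Dict[str, Any] = {}

    # Lowest precedence: values from the JSON configuration file.
    if config_file is not None:
        with open(config_file, "r", encoding="utf-8") as fh:
            resolved.update(json.load(fh))

    # Middle precedence: environment variables override file values.
    for key, var in (("base_url", "SYNTHGEN_BASE_URL"), ("api_key", "SYNTHGEN_API_KEY")):
        if var in env:
            resolved[key] = env[var]

    # Highest precedence: explicitly passed parameters (None means "not set").
    for key, value in (direct or {}).items():
        if value is not None:
            resolved[key] = value

    return resolved
```

With a file that sets all three keys, an environment that sets only the base URL, and a direct `api_key` argument, the result takes `api_key` from the argument, `base_url` from the environment, and `timeout` from the file.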

Batch Processing

Tasks can be submitted together as a batch and monitored until they complete:

# Create a batch of tasks
tasks = [
    Task(
        custom_id="task-001",
        method="POST",
        url=provider,
        api_key=api_key,
        body={
            "model": model,
            "messages": [{"role": "user", "content": "solve 2x + 4 = 10"}],
        },
        dataset="customers",
        use_cache=True,
    ),
    # Add more tasks...
]

# Submit batch and get batch_id
response = client.create_batch(tasks)
batch_id = response.batch_id

# Monitor batch progress with rich UI
results = client.monitor_batch(batch_id=batch_id)

# Or submit and monitor in one step
results = client.monitor_batch(tasks=tasks)

Performance Optimization Options

Two Task parameters control caching and progress tracking:

Task(
    # Other parameters...
    use_cache=True,     # Use cached results when available (default: True)
    track_progress=True # Enable detailed progress tracking (default: True)
)

Caching (`use_cache`)

Controls whether to use previously cached results:

True: Reuses results for identical tasks, reducing API calls and costs
False: Always executes fresh requests, ensuring up-to-date responses

# Check whether each result came from the cache
for task_result in results:
    if task_result.cached:
        print(f"Task {task_result.message_id}: retrieved from cache")
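The description above does not say how two tasks are judged "identical". One plausible approach, shown here purely as an illustration, is to hash the request URL together with a canonical form of the body. The function `task_fingerprint` is hypothetical; it is not part of synthgen-client and may not match what the server actually does.

```python
import hashlib
import json
from typing import Any, Dict


def task_fingerprint(url: str, body: Dict[str, Any]) -> str:
    """Hypothetical cache key: SHA-256 over the URL and a canonical JSON body."""
    # sort_keys makes the serialization independent of dict key order,
    # so logically equal bodies always produce the same fingerprint.
    canonical = json.dumps(
        {"url": url, "body": body}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, changing the model name or any message content yields a different key, so the task would be executed again rather than served from the cache.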

Progress Tracking (`track_progress`)

Controls the level of execution monitoring:

True: Provides detailed metrics (tokens, duration, status updates)
False: Minimal tracking for improved performance

Usage Examples

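# Request fields shared by both tasks (provider, api_key, model as in the Quick Start)
common = {
    "method": "POST",
    "url": provider,
    "api_key": api_key,
    "body": {"model": model, "messages": [{"role": "user", "content": "solve 2x + 4 = 10"}]},
}

# Optimize for speed with caching
task_cached = Task(custom_id="cached", use_cache=True, track_progress=False, **common)

# Ensure fresh results with metrics
task_fresh = Task(custom_id="fresh", use_cache=False, track_progress=True, **common)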

# Mixed batch processing
results = client.monitor_batch(tasks=[task_cached, task_fresh])

Health Checks

# Check system health
health = client.check_health()
print(f"System status: {health.status}")
print(f"API: {health.services.api}")
print(f"RabbitMQ: {health.services.rabbitmq}")
print(f"Elasticsearch: {health.services.elasticsearch}")
print(f"Queue consumers: {health.services.queue_consumers}")

Task Management

# Get task by ID
task = client.get_task("task-message-id")
print(f"Task status: {task.status}")
print(f"Completion time: {task.completed_at}")

# Delete a task
client.delete_task("task-message-id")

Batch Management

# Get all batches
batches = client.get_batches()
print(f"Total batches: {batches.total}")

# Get specific batch
batch = client.get_batch("batch-id")
print(f"Completed tasks: {batch.completed_tasks}/{batch.total_tasks}")
print(f"Token usage: {batch.total_tokens}")

# Get all tasks in a batch
tasks = client.get_batch_tasks("batch-id")

# Get only failed tasks
from synthgen.models import TaskStatus
failed_tasks = client.get_batch_tasks("batch-id", task_status=TaskStatus.FAILED)

# Delete a batch
client.delete_batch("batch-id")

Time-Series Batch Statistics

The client provides detailed time-series statistics for monitoring batch performance over time:

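# Assumption: CalendarInterval is importable from synthgen.models (like TaskStatus);
# the original snippet does not show its import
from synthgen.models import CalendarInterval

# Get time-series statistics for a batch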
stats = client.get_batch_stats(
    batch_id="batch-id",
    time_range="24h",            # Time range to analyze (e.g., "5m", "2h", "7d")
    interval=CalendarInterval.HOUR_SHORT  # Time bucket size
)

# Access time series data points
for point in stats.time_series:
    print(f"Timestamp: {point.timestamp}")
    print(f"Completed tasks: {point.completed_tasks}")
    print(f"Total tokens: {point.total_tokens}")
    print(f"Avg response time: {point.avg_duration_ms}ms")
    print(f"Throughput: {point.tokens_per_second} tokens/sec")

# Access summary statistics
summary = stats.summary
print(f"Total tasks: {summary.total_tasks}")
print(f"Cache hit rate: {summary.cache_hit_rate:.2%}")
print(f"Average response time: {summary.average_response_time}ms")
print(f"Overall throughput: {summary.tokens_per_second} tokens/sec")

The interval parameter supports various Elasticsearch calendar intervals:

MINUTE_SHORT / "1m": One minute interval
HOUR_SHORT / "1h": One hour interval
DAY_SHORT / "1d": One day interval
WEEK_SHORT / "1w": One week interval
MONTH_SHORT / "1M": One month interval

This data is useful for:

Monitoring system performance trends over time
Analyzing throughput patterns
Identifying processing bottlenecks
Evaluating cache efficiency
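The `time_range` argument is written as a number followed by a unit, as in the "5m", "2h", and "7d" examples above. The sketch below shows one way to validate such strings locally before sending a request. The helper `parse_time_range` is hypothetical, is not part of synthgen-client, and supports only the three unit suffixes that appear in the examples.

```python
import re
from datetime import timedelta

# Unit suffixes seen in the examples above; other suffixes are not assumed here.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_PATTERN = re.compile(r"^(\d+)([mhd])$")


def parse_time_range(value: str) -> timedelta:
    """Hypothetical helper: convert a string such as '24h' into a timedelta."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unsupported time range: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

A check like this can also help pick a sensible bucket size, for example an hourly interval for a range of a day or more.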

Context Manager Support

The client supports the context manager protocol for automatic resource cleanup:

with SynthgenClient() as client:
    health = client.check_health()
    # Client will be automatically closed when exiting the with block

Error Handling

The client provides robust error handling with automatic retries:

from synthgen.exceptions import APIError

try:
    result = client.get_task("non-existent-id")
except APIError as e:
    print(f"API Error: {e.message}")
    print(f"Status code: {e.status_code}")
    if e.response:
        print(f"Response: {e.response.text}")

Monitoring Existing Batches

# Monitor an existing batch
results = client.monitor_batch(
    batch_id="existing-batch-id",
    cost_by_1m_input_token=0.01,
    cost_by_1m_output_token=0.03
)

Customizing Batch Creation

# Create batch with custom chunk size for large batches
response = client.create_batch(tasks, chunk_size=500)
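To make the effect of `chunk_size` concrete, the sketch below splits a task list into consecutive chunks. The helper `chunked` is hypothetical and only illustrates the general technique; how `create_batch` actually divides and uploads tasks is not documented here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])
```

With 1,200 tasks and a chunk size of 500, this produces three chunks of 500, 500, and 200 tasks.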

Token Usage Tracking and Cost Calculation

The client provides detailed token usage statistics and cost calculation capabilities for batches:

# Process a batch with cost tracking
results = client.monitor_batch(
    tasks=tasks,
    cost_by_1m_input_token=0.01,  # Cost per million input tokens
    cost_by_1m_output_token=0.03  # Cost per million output tokens
)

# Retrieve batch statistics (batch_id as returned by client.create_batch(tasks).batch_id)
batch = client.get_batch(batch_id)
print(f"Input tokens: {batch.prompt_tokens:,}")
print(f"Output tokens: {batch.completion_tokens:,}")
print(f"Total tokens: {batch.total_tokens:,}")

This allows for real-time cost estimation and budget tracking when using pay-per-token LLM services.
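The token counts printed above can be turned into a cost estimate with simple arithmetic. The function `estimate_cost` below is a hypothetical helper, not part of synthgen-client; it assumes the two rates are prices per one million tokens, as the parameter names `cost_by_1m_input_token` and `cost_by_1m_output_token` suggest.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    cost_by_1m_input_token: float,
    cost_by_1m_output_token: float,
) -> float:
    """Hypothetical helper: estimated cost given per-million-token prices."""
    input_cost = prompt_tokens / 1_000_000 * cost_by_1m_input_token
    output_cost = completion_tokens / 1_000_000 * cost_by_1m_output_token
    return input_cost + output_cost
```

For example, it could be called as `estimate_cost(batch.prompt_tokens, batch.completion_tokens, 0.01, 0.03)` using the batch object retrieved above.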

Resilient Error Handling and Auto-Retry

The client automatically retries requests that fail because of transient network issues; errors that persist are raised as APIError:

from synthgen.exceptions import APIError

# The client automatically handles retries with exponential backoff
# Max retries and other parameters are configurable
try:
    result = client.get_task("task-id")
except APIError as e:
    if e.status_code == 404:
        print("Task not found")
    elif e.status_code == 401:
        print("Authentication failed - check your API key")
    else:
        print(f"An error occurred: {e}")
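For readers unfamiliar with exponential backoff, the sketch below shows the general technique: each retry waits longer than the previous one, up to a cap. The names `backoff_delays` and `call_with_retries`, the default values, and the choice of retried exceptions are all assumptions made for illustration; they do not describe the client's internal retry settings. Jitter is omitted for brevity.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(
    max_retries: int, base_delay: float = 0.5, factor: float = 2.0, max_delay: float = 30.0
) -> List[float]:
    """Delay before each retry: base_delay * factor**attempt, capped at max_delay."""
    return [min(base_delay * factor ** attempt, max_delay) for attempt in range(max_retries)]


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            # Out of retries: re-raise the last error to the caller.
            if attempt == max_retries:
                raise
            sleep(delays[attempt])
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable so the retry logic can be tested without real waiting.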

Requirements

Python 3.8+
httpx>=0.24.0
pydantic>=2.0.0
rich (for progress displays)

License

MIT


Version

0.0.7 (released Mar 12, 2025)
