GlassGen
GlassGen is a flexible synthetic data generation service that can generate data based on user-defined schemas and send it to various destinations.
Features
- Generate synthetic data based on custom schemas
- Multiple output sinks (CSV, Kafka, Webhook)
- Configurable generation rate
- Extensible sink architecture
- CLI and Python SDK interfaces
Installation
pip install glassgen
Local Development Installation
- Clone the repository:
git clone https://github.com/glassflow/glassgen.git
cd glassgen
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install the package in development mode:
pip install -e .
- Install development dependencies:
pip install -r requirements-dev.txt
- Run tests to verify installation:
pytest
Usage
Basic Usage
import glassgen
import json
# Load configuration from file
with open("config.json") as f:
    config = json.load(f)
# Start the generator
glassgen.generate(config=config)
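Since `json.load` returns a plain dictionary, the configuration can presumably also be assembled in Python and written to `config.json` before calling `glassgen.generate`. The sketch below shows one way to do that. `build_config` and `write_config` are hypothetical helper names used for illustration; they are not part of GlassGen.

```python
import json
from pathlib import Path


def build_config(schema, sink_type, sink_params, rps, num_records):
    """Assemble a config dict with the three top-level sections used in this README."""
    return {
        "schema": dict(schema),
        "sink": {"type": sink_type, "params": dict(sink_params)},
        "generator": {"rps": rps, "num_records": num_records},
    }


def write_config(config, path="config.json"):
    """Write the config as plain JSON (no comments) and return the file path."""
    target = Path(path)
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target
```

For example, `build_config({"name": "$name"}, "csv", {"path": "output.csv"}, 100, 1000)` yields a dictionary in the same shape as the file loaded above.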
Configuration File Format
{
  "schema": {
    "field1": "$generator_type",
    "field2": "$generator_type(param1, param2)"
  },
  "sink": {
    "type": "csv|kafka|webhook",
    "params": {
      // sink-specific parameters
    }
  },
  "generator": {
    "rps": 1000,          // records per second
    "num_records": 5000   // total number of records to generate
  }
}
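The `//` comments in the template are annotations only. Standard JSON has no comment syntax, and Python's `json` module rejects them, so they have to be removed before the file is loaded. If you prefer to keep annotated configs around, a small pre-processing step can strip them first. The following is a minimal sketch; `strip_line_comments` and `load_annotated_config` are hypothetical helpers, not GlassGen functions.

```python
import json


def strip_line_comments(text):
    """Remove // comments that sit outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_annotated_config(text):
    """Parse a config string that may contain // comments."""
    return json.loads(strip_line_comments(text))
```

Tracking whether the scanner is inside a string matters here: a naive search for `//` would also cut URLs such as `https://...` in the webhook sink parameters.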
Supported Sinks
CSV Sink
{
  "sink": {
    "type": "csv",
    "params": {
      "path": "output.csv"
    }
  }
}
Webhook Sink
{
  "sink": {
    "type": "webhook",
    "params": {
      "url": "https://your-webhook-url.com",
      "headers": {
        "Authorization": "Bearer your-token",
        "Custom-Header": "value"
      },
      "timeout": 30 // optional, defaults to 30 seconds
    }
  }
}
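To try the webhook sink without a real endpoint, a throwaway local receiver can be handy. The sketch below uses only the Python standard library. It assumes the sink delivers records as HTTP POST requests, which this README does not state explicitly, so treat that as an assumption to verify. `WebhookHandler`, `start_receiver` and `received` are illustrative names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # one entry per POST request seen by the receiver


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        received.append({
            "authorization": self.headers.get("Authorization"),
            "body": body.decode("utf-8"),
        })
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_receiver(port=0):
    """Start the receiver on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_receiver(8000)`, setting `"url": "http://127.0.0.1:8000"` in the sink parameters should make incoming requests show up in `received`, including the `Authorization` header configured above. Call `server.shutdown()` when done.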
Kafka Sink
GlassGen supports multiple Kafka sink types:
- Confluent Cloud
{
  "sink": {
    "type": "kafka.confluent",
    "params": {
      "bootstrap_servers": "your-confluent-bootstrap-server",
      "topic": "topic_name",
      "security_protocol": "SASL_SSL",
      "sasl_mechanism": "PLAIN",
      "sasl_plain_username": "your-api-key",
      "sasl_plain_password": "your-api-secret"
    }
  }
}
- Aiven Kafka
{
  "sink": {
    "type": "kafka.aiven",
    "params": {
      "bootstrap_servers": "your-aiven-bootstrap-server",
      "topic": "topic_name",
      "security_protocol": "SASL_SSL",
      "sasl.mechanisms": "SCRAM-SHA-256",
      "ssl_cafile": "path/to/ca.pem"
    }
  }
}
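Kafka credentials are better kept out of a committed config file. One option, sketched below, is to build the sink section in Python and read the secrets from environment variables. The parameter keys are the ones from the Confluent Cloud example above; the function name and the environment variable names (`KAFKA_BOOTSTRAP_SERVERS`, `KAFKA_API_KEY`, `KAFKA_API_SECRET`) are assumptions made for this example.

```python
import os


def confluent_sink_from_env(topic, env=os.environ):
    """Build a kafka.confluent sink section, reading credentials from the environment."""
    return {
        "type": "kafka.confluent",
        "params": {
            # A missing variable raises KeyError, which surfaces misconfiguration early.
            "bootstrap_servers": env["KAFKA_BOOTSTRAP_SERVERS"],
            "topic": topic,
            "security_protocol": "SASL_SSL",
            "sasl_mechanism": "PLAIN",
            "sasl_plain_username": env["KAFKA_API_KEY"],
            "sasl_plain_password": env["KAFKA_API_SECRET"],
        },
    }
```

The returned dictionary can be placed under the `"sink"` key of a config that is otherwise loaded from file.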
Supported Schema Generators
Basic Types
- $string: Random string
- $int: Random integer
- $intrange(min,max): Random integer within specified range (e.g., $intrange(1,100) for numbers between 1 and 100)
- $choice(value1,value2,...): Randomly picks one value from the provided list (e.g., $choice(red,blue,green) or $choice(1,2,3,4,5))
- $datetime: Current timestamp in ISO format (e.g., "2024-03-15T14:30:45.123456")
- $timestamp: Current Unix timestamp in seconds since epoch (e.g., 1710503445)
- $boolean: Random boolean value
- $uuid: Random UUID
- $uuid4: Random UUID4
Personal Information
- $name: Random full name
- $email: Random email address
- $company_email: Random company email
- $user_name: Random username
- $password: Random password
- $phone_number: Random phone number
- $ssn: Random Social Security Number
Location
- $country: Random country name
- $city: Random city name
- $address: Random street address
- $zipcode: Random zip code
Business
- $company: Random company name
- $job: Random job title
- $url: Random URL
Other
- $text: Random text paragraph
- $ipv4: Random IPv4 address
- $currency_name: Random currency name
- $color_name: Random color name
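The generator strings all follow the same shape: a `$`, a generator name, and an optional parenthesized, comma-separated parameter list. To make that syntax concrete, here is a minimal parsing sketch. It is an illustration of the notation only and not GlassGen's actual parser; `SPEC_PATTERN` and `parse_spec` are hypothetical names.

```python
import re

# "$name" optionally followed by "(arg1,arg2,...)"
SPEC_PATTERN = re.compile(
    r"\$(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:\((?P<args>.*)\))?"
)


def parse_spec(spec):
    """Split a generator string into (name, [args]); arguments stay as strings."""
    match = SPEC_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a generator spec: {spec!r}")
    raw_args = match.group("args")
    args = [a.strip() for a in raw_args.split(",")] if raw_args else []
    return match.group("name"), args
```

With this sketch, `$intrange(1,100)` splits into the name `intrange` with arguments `1` and `100`, while `$uuid4` has the name `uuid4` and no arguments. A check like this can catch typos in a schema before a long generation run starts.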
Example Configuration
{
  "schema": {
    "name": "$name",
    "email": "$email",
    "country": "$country",
    "id": "$uuid",
    "address": "$address",
    "phone": "$phone_number",
    "job": "$job",
    "company": "$company"
  },
  "sink": {
    "type": "webhook",
    "params": {
      "url": "https://api.example.com/webhook",
      "headers": {
        "Authorization": "Bearer your-token"
      }
    }
  },
  "generator": {
    "rps": 1500,
    "num_records": 5000
  }
}
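The `rps` and `num_records` settings together give a rough lower bound on how long a run takes. The sketch below does that arithmetic, assuming the generator sustains the configured rate; actual throughput will also depend on how fast the sink accepts records. `expected_duration_seconds` is a hypothetical helper.

```python
def expected_duration_seconds(rps, num_records):
    """Ideal run time if records are emitted at exactly `rps` per second."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    return num_records / rps
```

For the configuration above, 5000 records at 1500 records per second works out to roughly 3.3 seconds.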
Creating a New Release
To create a new release:
- Make sure you have the release script installed:
pip install -e .
- Run the release script with the new version:
./scripts/release.py release 0.1.1
This will:
- Update the version in pyproject.toml
- Create a git tag
- Push the changes
- Trigger the GitHub Actions workflow to:
  - Build the package
  - Publish to PyPI
  - Create a GitHub release
The version must follow semantic versioning (X.Y.Z format).
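A quick way to check a version string against the X.Y.Z format before running the release script is a regular expression. This is a sketch of such a check; the release script may validate differently, and `is_release_version` is a hypothetical name.

```python
import re

# Three dot-separated non-negative integers without leading zeros.
VERSION_PATTERN = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_version(version):
    """Return True if `version` has the plain X.Y.Z form."""
    return VERSION_PATTERN.fullmatch(version) is not None
```

This accepts `0.1.1` and rejects inputs such as `0.1`, `v0.1.1` or `1.2.3-rc1`.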