A flexible synthetic data generation service

GlassGen

GlassGen is a flexible synthetic data generation service that can generate data based on user-defined schemas and send it to various destinations.

Features

  • Generate synthetic data based on custom schemas
  • Multiple output formats (CSV, Kafka, Webhook)
  • Configurable generation rate
  • Extensible sink architecture
  • CLI and Python SDK interfaces

Installation

pip install glassgen

Local Development Installation

  1. Clone the repository:
git clone https://github.com/glassflow/glassgen.git
cd glassgen
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
  3. Install the package in development mode:
pip install -e .
  4. Install development dependencies:
pip install -r requirements-dev.txt
  5. Run tests to verify installation:
pytest

Usage

Basic Usage

import glassgen
import json

# Load configuration from file
with open("config.json") as f:
    config = json.load(f)

# Start the generator
glassgen.generate(config=config)

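Configuration File Format

A configuration has three sections: "schema" maps each output field to a generator, "sink" selects the destination and its parameters, and "generator" controls the rate and total volume. The // comments below are annotations only. JSON does not allow comments, so remove them before loading the file with json.load.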

{
    "schema": {
        "field1": "$generator_type",
        "field2": "$generator_type(param1, param2)"
    },
    "sink": {
        "type": "csv|webhook|kafka.confluent|kafka.aiven",
        "params": {
            // sink-specific parameters
        }
    },
    "generator": {
        "rps": 1000,  // records per second
        "num_records": 5000  // total number of records to generate
    }
}
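A schema value is either a bare generator name ("$uuid") or a name followed by comma-separated arguments ("$intrange(1,100)"). As an illustration of that syntax only, here is a minimal sketch that splits such a value into its name and argument list. The function parse_generator_spec and its regular expression are hypothetical and are not GlassGen's own parser; arguments are returned as strings.

```python
import re

# Hypothetical pattern for "$name" or "$name(arg1, arg2, ...)"; not GlassGen's own parser.
_SPEC = re.compile(r"^\$(?P<name>\w+)(?:\((?P<args>.*)\))?$")


def parse_generator_spec(spec: str) -> tuple[str, list[str]]:
    """Split a schema value such as "$intrange(1,100)" into (name, [args])."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"not a generator spec: {spec!r}")
    args = match.group("args")
    # Arguments are comma-separated; surrounding whitespace is ignored.
    params = [a.strip() for a in args.split(",")] if args else []
    return match.group("name"), params


schema = {
    "field1": "$uuid",
    "field2": "$intrange(1, 100)",
    "field3": "$choice(red,blue,green)",
}
for field, spec in schema.items():
    print(field, parse_generator_spec(spec))
```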

Supported Sinks

CSV Sink

{
    "sink": {
        "type": "csv",
        "params": {
            "path": "output.csv"
        }
    }
}
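The CSV sink writes generated records to the file given in "path". GlassGen's implementation is not shown in this README; as a rough picture of what that involves, the stdlib sketch below writes a header row from the record's field names followed by one row per record. The helper write_csv is hypothetical, and it writes to an in-memory buffer so the example runs without touching disk.

```python
import csv
import io


def write_csv(records: list[dict], stream) -> None:
    """Write records as CSV rows, using the first record's keys as the header."""
    if not records:
        return
    writer = csv.DictWriter(stream, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_csv([{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}], buffer)
print(buffer.getvalue())
```

For a real file, open it with open("output.csv", "w", newline="") so the csv module controls line endings.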

Webhook Sink

{
    "sink": {
        "type": "webhook",
        "params": {
            "url": "https://your-webhook-url.com",
            "headers": {
                "Authorization": "Bearer your-token",
                "Custom-Header": "value"
            },
            "timeout": 30  // optional, defaults to 30 seconds
        }
    }
}
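This README does not specify the exact HTTP request the webhook sink sends. Assuming each record is delivered as a JSON body in a POST request carrying the configured headers, the sketch below builds (but does not send) such a request with urllib.request. The function build_webhook_request and the JSON-over-POST format are assumptions for illustration.

```python
import json
import urllib.request


def build_webhook_request(url: str, headers: dict, record: dict) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request for one record (assumed format)."""
    body = json.dumps(record).encode("utf-8")
    # User-supplied headers are merged last, so they override the default Content-Type.
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


req = build_webhook_request(
    "https://your-webhook-url.com",
    {"Authorization": "Bearer your-token", "Custom-Header": "value"},
    {"name": "Ada Lovelace", "id": "1234"},
)
print(req.get_method(), req.full_url)
# Sending it would apply the configured timeout, e.g.:
# urllib.request.urlopen(req, timeout=30)
```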

Kafka Sink

GlassGen supports multiple Kafka sink types:

  1. Confluent Cloud
{
    "sink": {
        "type": "kafka.confluent",
        "params": {
            "bootstrap_servers": "your-confluent-bootstrap-server",
            "topic": "topic_name",
            "security_protocol": "SASL_SSL",
            "sasl_mechanism": "PLAIN",
            "sasl_plain_username": "your-api-key",
            "sasl_plain_password": "your-api-secret"
        }
    }
}
  2. Aiven Kafka
{
    "sink": {
        "type": "kafka.aiven",
        "params": {
            "bootstrap_servers": "your-aiven-bootstrap-server",
            "topic": "topic_name",
            "security_protocol": "SASL_SSL",
            "sasl.mechanisms": "SCRAM-SHA-256",
            "ssl_cafile": "path/to/ca.pem"
        }
    }
}

Supported Schema Generators

Basic Types

  • $string: Random string
  • $int: Random integer
  • $intrange(min,max): Random integer within specified range (e.g., $intrange(1,100) for numbers between 1 and 100)
  • $choice(value1,value2,...): Randomly picks one value from the provided list (e.g., $choice(red,blue,green) or $choice(1,2,3,4,5))
  • $datetime: Current timestamp in ISO format (e.g., "2024-03-15T14:30:45.123456")
  • $timestamp: Current Unix timestamp in seconds since epoch (e.g., 1710503445)
  • $boolean: Random boolean value
  • $uuid: Random UUID
  • $uuid4: Random UUID4
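Several of the basic types map directly onto Python's standard library. The sketch below approximates a few of them to show what each one produces; GlassGen's own implementations may differ. In particular, treating $intrange as inclusive of both bounds is an assumption, and generate is a hypothetical helper. Parameters arrive as strings, as they would when read from a schema.

```python
import random
import time
import uuid
from datetime import datetime

# Stdlib approximations of some basic generators; GlassGen's own code may differ.
BASIC_GENERATORS = {
    # Inclusive at both ends (an assumption about $intrange).
    "intrange": lambda lo, hi: random.randint(int(lo), int(hi)),
    "choice": lambda *values: random.choice(values),
    "datetime": lambda: datetime.now().isoformat(),  # e.g. 2024-03-15T14:30:45.123456
    "timestamp": lambda: int(time.time()),           # whole seconds since the epoch
    "boolean": lambda: random.choice([True, False]),
    "uuid4": lambda: str(uuid.uuid4()),
}


def generate(name: str, *params: str):
    """Look up a generator by name and call it with string parameters."""
    return BASIC_GENERATORS[name](*params)


print(generate("intrange", "1", "100"))
print(generate("choice", "red", "blue", "green"))
print(generate("datetime"), generate("timestamp"), generate("uuid4"))
```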

Personal Information

  • $name: Random full name
  • $email: Random email address
  • $company_email: Random company email
  • $user_name: Random username
  • $password: Random password
  • $phone_number: Random phone number
  • $ssn: Random Social Security Number

Location

  • $country: Random country name
  • $city: Random city name
  • $address: Random street address
  • $zipcode: Random zip code

Business

  • $company: Random company name
  • $job: Random job title
  • $url: Random URL

Other

  • $text: Random text paragraph
  • $ipv4: Random IPv4 address
  • $currency_name: Random currency name
  • $color_name: Random color name

Example Configuration

{
    "schema": {
        "name": "$name",
        "email": "$email",
        "country": "$country",
        "id": "$uuid",
        "address": "$address",
        "phone": "$phone_number",
        "job": "$job",
        "company": "$company"
    },
    "sink": {
        "type": "webhook",
        "params": {
            "url": "https://api.example.com/webhook",
            "headers": {
                "Authorization": "Bearer your-token"
            }
        }
    },
    "generator": {
        "rps": 1500,
        "num_records": 5000
    }
}
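With "rps" set to 1500 and "num_records" set to 5000, a run needs at least 5000 / 1500, about 3.3 seconds. One common way to hold a target rate is to schedule each record against a fixed start time instead of sleeping a fixed interval after each send, so per-record overhead does not accumulate. The pacing loop below is a hypothetical sketch of that technique, not GlassGen's scheduler; paced_records and make_record are illustrative names.

```python
import time


def paced_records(num_records: int, rps: float, make_record):
    """Yield num_records records, spacing them so the average rate stays near rps."""
    interval = 1.0 / rps
    start = time.monotonic()
    for i in range(num_records):
        # Record i is due at start + i * interval; sleep only if we are early.
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        yield make_record(i)


# Small numbers so the example finishes in about 0.1 s.
records = list(paced_records(20, 200, lambda i: {"seq": i}))
print(len(records))
```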

Creating a New Release

To create a new release:

  1. Make sure the project and its dependencies are installed in development mode:
pip install -e .
  2. Run the release script with the new version:
./scripts/release.py release 0.1.1

This will:

  • Update the version in pyproject.toml
  • Create a git tag
  • Push the changes
  • Trigger the GitHub Actions workflow to:
    • Build the package
    • Publish to PyPI
    • Create a GitHub release

The version must follow semantic versioning (X.Y.Z format).
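To catch a malformed version before running the script, a quick check like the one below can help. It accepts only plain X.Y.Z core versions with no leading zeros, following the Semantic Versioning 2.0.0 rule for numeric identifiers, and rejects prefixes such as "v" and pre-release suffixes. This is a hypothetical pre-check; release.py's own validation may be stricter or looser.

```python
import re

# X.Y.Z where each part is 0 or an integer without leading zeros (SemVer 2.0.0 core version).
_VERSION = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def is_release_version(version: str) -> bool:
    """Return True if version is a plain X.Y.Z semantic version."""
    return _VERSION.fullmatch(version) is not None


for candidate in ["0.1.1", "0.1", "v0.1.1", "01.1.1", "0.1.1-rc1"]:
    print(candidate, is_release_version(candidate))
```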

Download files

Source Distribution

glassgen-0.1.2.tar.gz (11.4 kB)

Built Distribution

glassgen-0.1.2-py3-none-any.whl (14.7 kB)

File details

Details for the file glassgen-0.1.2.tar.gz.

File metadata

  • Download URL: glassgen-0.1.2.tar.gz
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for glassgen-0.1.2.tar.gz
Algorithm Hash digest
SHA256 6dd1365c7ed9fd1ab2af6605e29ee414ac44c76bba901de563002ebd9ea902cb
MD5 d8d99b0f21eaa5867e085e6e5b838ef2
BLAKE2b-256 16fa30d0c99f0a14b4621d64b4017e68cee6a24f819100c28912144d02f29251

File details

Details for the file glassgen-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: glassgen-0.1.2-py3-none-any.whl
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for glassgen-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 24b9c304a946a772f7ddad4c6bc50729deff06ffed6af582ec213ce2f645af10
MD5 ca41a7213862fa1c0e441740639b0a40
BLAKE2b-256 5465d1dae71953143db4b208f74fa73676b21d9bde941b5bdc2e6509fceeed27
