Python wrapper for DataSynth synthetic data generation

datasynth-py

Python wrapper for the DataSynth synthetic data generator.

Installation

From PyPI

pip install "datasynth-py[all]"

Or install specific extras (the quotes stop shells such as zsh from treating the brackets as a glob pattern):

pip install datasynth-py               # Core only (no dependencies)
pip install "datasynth-py[cli]"        # CLI generation (PyYAML)
pip install "datasynth-py[memory]"     # In-memory tables (pandas)
pip install "datasynth-py[streaming]"  # Streaming (websockets)
pip install "datasynth-py[all]"        # All optional dependencies
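To check which extras are usable in the current environment, a minimal sketch can probe for the module each extra pulls in. It assumes the import names `yaml` (PyYAML), `pandas`, and `websockets` for the `cli`, `memory`, and `streaming` extras listed above; `available_extras` is a hypothetical helper, not part of datasynth-py.

```python
import importlib.util

# Hypothetical mapping from extra name to the import name of the
# dependency it installs (taken from the extras list above).
EXTRA_MODULES = {
    "cli": "yaml",          # PyYAML
    "memory": "pandas",
    "streaming": "websockets",
}


def available_extras(modules=EXTRA_MODULES):
    """Return {extra: bool}, True when that extra's module can be imported."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:10} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even when pandas is installed.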

From Source

cd python
pip install -e ".[all]"

Quick Start

from datasynth_py import DataSynth, CompanyConfig, Config, GlobalSettings, ChartOfAccountsSettings

config = Config(
    global_settings=GlobalSettings(
        industry="retail",
        start_date="2024-01-01",
        period_months=12,
    ),
    companies=[
        CompanyConfig(code="C001", name="Retail Corp", currency="USD", country="US"),
    ],
    chart_of_accounts=ChartOfAccountsSettings(complexity="small"),
)

synth = DataSynth()
result = synth.generate(config=config, output={"format": "csv", "sink": "temp_dir"})
print(result.output_dir)
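The README does not list which files DataSynth writes to result.output_dir. For a first look at CSV output, a standard-library sketch like the following can help; it assumes only that the directory contains `*.csv` files (possibly in subdirectories) whose first row is a header, and `summarize_csv_dir` is a hypothetical helper, not part of datasynth-py.

```python
import csv
from pathlib import Path


def summarize_csv_dir(output_dir):
    """Map each CSV file (relative path) to (column names, number of data rows)."""
    root = Path(output_dir)
    summary = {}
    for path in sorted(root.rglob("*.csv")):
        # Assumption: files are UTF-8 encoded.
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])           # first row holds the column names
            row_count = sum(1 for _ in reader)  # remaining rows are data
        summary[str(path.relative_to(root))] = (header, row_count)
    return summary
```

Usage after the Quick Start call: `for name, (cols, n) in summarize_csv_dir(result.output_dir).items(): print(name, n, cols[:5])`.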

Using Blueprints

from datasynth_py import DataSynth
from datasynth_py.config import blueprints

config = blueprints.retail_small(companies=4, transactions=10000)
synth = DataSynth()
result = synth.generate(config=config, output={"format": "parquet", "sink": "path", "path": "./output"})

Integration Features (v0.2.2+)

from datasynth_py import (
    Config,
    StreamingSettings,
    RateLimitSettings,
    TemporalAttributeSettings,
    RelationshipSettings,
    GraphExportSettings,
)

config = Config(
    # ... other settings (global_settings, companies, chart_of_accounts) as in Quick Start ...

    # Streaming output with backpressure
    streaming=StreamingSettings(
        enabled=True,
        buffer_size=1000,
        backpressure="block",  # one of: block, drop_oldest, drop_newest, buffer
    ),

    # Rate limiting for controlled throughput
    rate_limit=RateLimitSettings(
        enabled=True,
        entities_per_second=10000.0,
        burst_size=100,
    ),

    # Bi-temporal data support
    temporal_attributes=TemporalAttributeSettings(
        enabled=True,
        generate_version_chains=True,
        avg_versions_per_entity=1.5,
    ),

    # Relationship generation with cardinality rules
    relationships=RelationshipSettings(
        enabled=True,
        allow_orphans=True,
        orphan_probability=0.01,
    ),

    # Graph export including RustGraph format
    graph_export=GraphExportSettings(
        enabled=True,
        formats=["pytorch_geometric", "rustgraph"],
    ),
)
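The README does not spell out how entities_per_second and burst_size interact. They read like the two parameters of a classic token bucket, so the sketch below models that interpretation: tokens refill at `rate` per second up to a capacity of `burst`, and each generated entity consumes one. This is an assumption for illustration, not DataSynth's actual limiter, and `TokenBucket` is a hypothetical class.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/s, holds at most `burst` tokens."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock          # injectable clock makes the logic testable
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, n=1):
        """Consume n tokens if available; return False (without waiting) otherwise."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Under this reading, rate=10000.0 with burst=100 lets the first 100 entities through immediately, after which sustained throughput settles near 10,000 entities per second.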

Requirements

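The wrapper shells out to the datasynth-data CLI binary, so the binary must be built first. From the root of the DataSynth repository, run:

cargo build --release
export DATASYNTH_BINARY="$PWD/target/release/datasynth-data"   # absolute path, valid from any directory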

Or pass binary_path when creating the client:

synth = DataSynth(binary_path="/path/to/datasynth-data")
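Before constructing the client, it can help to confirm that the binary is reachable. The sketch below uses an explicit path if one is given, and otherwise tries DATASYNTH_BINARY and then a datasynth-data on PATH. That lookup order is an assumption for illustration, not necessarily the one DataSynth itself uses, and `resolve_datasynth_binary` is a hypothetical helper.

```python
import os
import shutil


def _is_executable(path):
    return bool(path) and os.path.isfile(path) and os.access(path, os.X_OK)


def resolve_datasynth_binary(binary_path=None):
    """Return an absolute path to an executable datasynth-data binary, or raise."""
    if binary_path is not None:
        candidates = [binary_path]  # an explicit path must work; no silent fallback
    else:
        candidates = [
            os.environ.get("DATASYNTH_BINARY"),  # environment variable first
            shutil.which("datasynth-data"),      # then anything on PATH
        ]
    for candidate in candidates:
        if _is_executable(candidate):
            return os.path.abspath(candidate)
    raise FileNotFoundError(
        "datasynth-data not found; build it with `cargo build --release` "
        "and set DATASYNTH_BINARY or pass binary_path"
    )
```

Usage: `synth = DataSynth(binary_path=resolve_datasynth_binary())` fails early with a clear message instead of at the first generate() call.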

Documentation

See the Python Wrapper Guide for complete documentation.

License

Apache License 2.0; see the LICENSE file in the main project.

Download files

Source Distribution

datasynth_py-0.2.3.tar.gz (22.7 kB)

Built Distribution

datasynth_py-0.2.3-py3-none-any.whl (22.2 kB)

File details

Details for the file datasynth_py-0.2.3.tar.gz.

File metadata

  • Download URL: datasynth_py-0.2.3.tar.gz
  • Upload date:
  • Size: 22.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datasynth_py-0.2.3.tar.gz

  • SHA256: 20fc9dcf4572dfb9177a70020b3adda2b0fcbe89c87fe7677694bd8b2a4987c9
  • MD5: be60628a83fb1e6c4577cd3cafa9e918
  • BLAKE2b-256: 107f13766c97396c7e4ff117d8f924a009747ac204f6c127e1471533bdd5608b

File details

Details for the file datasynth_py-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: datasynth_py-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 22.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for datasynth_py-0.2.3-py3-none-any.whl

  • SHA256: 13aae6abfde86f61fdb194318030bc0783810b9e182cbb71656c27d62275d3cb
  • MD5: 8f717b8c66fcf25d4b6e0fbe1f8b3b6f
  • BLAKE2b-256: 16df9e7e9e617f8c255ed2b4559de57813e81fa2e26618944ea535d39f4e9917