dgen-py

High-performance random data generation with NUMA optimization and zero-copy Python interface


Features

  • 🚀 Blazing Fast: 58+ GB/s streaming throughput on a 12-core system, matching Numba JIT performance
  • 🎯 Controllable Characteristics: Configurable deduplication and compression ratios
  • 🔄 Reproducible Data: Optional seed parameter for identical data generation across runs
  • 🔬 Multi-Process NUMA: One Python process per NUMA node for maximum throughput
  • 🐍 True Zero-Copy: Python buffer protocol with direct memory access (no data copying)
  • 📦 Streaming API: Generate terabytes of data with constant 32 MB memory usage
  • 🧵 Thread Pool Reuse: Created once, reused across all operations
  • 🛠️ Built with Rust: Memory-safe, production-quality implementation

Version 0.1.6 Highlights 🎉

NEW: Reproducible Data Generation

  • Optional seed parameter enables identical data generation across runs
  • Perfect for reproducible benchmarking, testing, and CI/CD workflows
  • Fully backward compatible: without a seed, generation stays non-deterministic (seeded from time + urandom), as before

import dgen_py

# Reproducible mode: the same seed produces identical data
gen = dgen_py.Generator(size=100 * 1024**3, seed=12345)

# Non-deterministic mode (default): different data on each run
gen = dgen_py.Generator(size=100 * 1024**3)  # seed=None

Use cases:

  • 🔬 Reproducible benchmarking: Compare storage systems with identical workloads
  • Consistent testing: Same test data across CI/CD pipeline runs
  • 🐛 Debugging: Regenerate exact data streams for issue investigation
  • 📊 Compliance: Verifiable, reproducible data generation for audits

See the Reproducible Data Generation section below for complete examples.


Performance

Version 0.1.5 Highlights

Significant Performance Improvements over v0.1.3:

  • UMA systems: ~50% improvement in per-core throughput (10.80 GB/s vs ~7 GB/s)
  • NUMA systems: Major improvements from bug fixes in multi-process architecture
  • 8-core system: 86.41 GB/s aggregate throughput (C4-16)
  • Maximum aggregate: 324.72 GB/s on 48-core dual-NUMA system (C4-96 with compress=2.0)

Streaming Benchmark (v0.1.5) - 100 GB Test

Comparison of streaming random data generation methods on a 12-core system:

| Method | Throughput | Speedup vs. Baseline | Memory Required |
|---|---|---|---|
| os.urandom() (baseline) | 0.34 GB/s | 1.0x | Minimal |
| NumPy (multi-threaded) | 1.06 GB/s | 3.1x | 100 GB RAM* |
| Numba JIT Xoshiro256++ (streaming) | 57.11 GB/s | 165.7x | 32 MB RAM |
| dgen-py v0.1.5 (streaming) | 58.46 GB/s | 169.6x | 32 MB RAM |

* NumPy holds the full dataset in memory: it was tested with 10 GB and would need 100 GB of RAM for the 100 GB dataset.

Key Findings:

  • dgen-py matches Numba's streaming performance (58.46 vs 57.11 GB/s)
  • 55x faster than NumPy while using about 3,000x less memory (32 MB vs. 100 GB)
  • Streaming architecture: can generate datasets of any size with only 32 MB of RAM
  • Per-core throughput: 4.87 GB/s (12 cores)
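The ratios quoted in these findings follow directly from the table. A quick arithmetic check, using only the published throughputs and treating 1 GB as 1024 MB for the memory ratio:

```python
# Throughputs from the streaming benchmark table above (GB/s)
numpy_mt = 1.06
numba_stream = 57.11
dgen_stream = 58.46

# Speedup of dgen-py over multi-threaded NumPy
speedup_vs_numpy = dgen_stream / numpy_mt               # ~55.2x

# Memory ratio: a 100 GB resident dataset vs. a 32 MB streaming buffer
memory_ratio = (100 * 1024) / 32                        # 3200x (1 GB = 1024 MB)

# Per-core throughput on the 12-core system
per_core = dgen_stream / 12                             # ~4.87 GB/s

# Relative difference between dgen-py and Numba streaming
vs_numba = (dgen_stream - numba_stream) / numba_stream  # ~2.4%

print(f"{speedup_vs_numpy:.1f}x faster than NumPy, {memory_ratio:.0f}x less memory")
print(f"{per_core:.2f} GB/s per core, {vs_numba:.1%} ahead of Numba")
```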

⚠️ Critical for Storage Testing: Of the methods compared above, ONLY dgen-py supports configurable deduplication and compression ratios. os.urandom, NumPy, and Numba all generate purely random data with maximum entropy, which makes them unsuitable for realistic storage system testing. Real-world storage workloads need controllable data characteristics to exercise deduplication engines, compression algorithms, and storage efficiency features, and dgen-py is the only one of these methods that provides them.

Multi-NUMA Benchmarks (v0.1.5) - GCP Emerald Rapids

Scalability testing on Google Cloud Platform Intel Emerald Rapids systems (1024 GB workload, compress=1.0):

| Instance | Physical Cores | NUMA Nodes | Aggregate Throughput | Per-Core Throughput | Scaling Efficiency |
|---|---|---|---|---|---|
| C4-8 | 4 | 1 (UMA) | 36.26 GB/s | 9.07 GB/s | Baseline |
| C4-16 | 8 | 1 (UMA) | 86.41 GB/s | 10.80 GB/s | 119% |
| C4-32 | 16 | 1 (UMA) | 162.78 GB/s | 10.17 GB/s | 112% |
| C4-96 | 48 | 2 (NUMA) | 248.53 GB/s | 5.18 GB/s | 51%* |

* NUMA penalty: per-core throughput on the dual-socket system is 49% lower than on the 16-core UMA system (5.18 vs. 10.17 GB/s), but the C4-96 still achieves the highest absolute throughput.
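The per-core and scaling-efficiency columns can be recomputed from aggregate throughput and physical core count. In this sketch, efficiency is taken to mean per-core throughput divided by a baseline's per-core throughput (an assumption about how the column is defined). Against the C4-8 baseline the C4-96 comes out at about 57%; the table's 51% matches a comparison against C4-32 instead:

```python
# (instance, physical cores, aggregate GB/s) from the table above
RESULTS = [
    ("C4-8", 4, 36.26),
    ("C4-16", 8, 86.41),
    ("C4-32", 16, 162.78),
    ("C4-96", 48, 248.53),
]

# Per-core throughput = aggregate throughput / physical cores
per_core = {name: agg / cores for name, cores, agg in RESULTS}


def efficiency(name: str, baseline: str = "C4-8") -> float:
    """Per-core throughput of `name` relative to `baseline`, in percent."""
    return 100.0 * per_core[name] / per_core[baseline]


for name, _, _ in RESULTS:
    print(f"{name:6s} {per_core[name]:6.2f} GB/s/core  "
          f"{efficiency(name):5.0f}% vs C4-8  "
          f"{efficiency(name, 'C4-32'):5.0f}% vs C4-32")
```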

Key Findings:

  • Excellent UMA scaling: 112-119% efficiency on single-NUMA systems (super-linear due to larger L3 cache)
  • Per-core performance: 10.80 GB/s on C4-16 (3.0x improvement vs dgen-py v0.1.3's 3.60 GB/s)
  • Compression tradeoff: compress_ratio=2.0 gives a 1.3-1.5x speedup but makes the data compressible (choose based on your test requirements, not on performance)
  • Storage headroom: Even modest 8-core systems exceed 86 GB/s (far beyond typical storage requirements)

See docs/BENCHMARK_RESULTS_V0.1.5.md for the complete analysis.

Installation

From PyPI (Recommended)

pip install dgen-py

System Requirements

For NUMA support (Linux only):

# Ubuntu/Debian
sudo apt-get install libudev-dev libhwloc-dev

# RHEL/CentOS/Fedora
sudo yum install systemd-devel hwloc-devel

Note: NUMA support is optional. Without these libraries, the package still works normally on single-NUMA systems (workstations, cloud VMs).

Quick Start

Basic Usage

import time

import dgen_py

SIZE = 100 * 1024**3  # 100 GB (1 GB = 1024**3 bytes here)

# Generate 100 GB of random data with configurable characteristics
gen = dgen_py.Generator(
    size=SIZE,
    dedup_ratio=1.0,         # No deduplication
    compress_ratio=1.0,      # Incompressible
    numa_mode="auto",        # Auto-detect NUMA topology
    max_threads=None         # Use all available cores
)

# Create the buffer once (uses the optimal chunk size automatically)
buffer = bytearray(gen.chunk_size)
view = memoryview(buffer)    # slicing a memoryview does not copy data

# Stream data in chunks (zero-copy, parallel generation)
total = 0
start = time.perf_counter()
while not gen.is_complete():
    nbytes = gen.fill_chunk(buffer)
    if nbytes == 0:
        break
    total += nbytes
    # Write to file/network here, e.g. f.write(view[:nbytes])

duration = time.perf_counter() - start
# Measure the bytes actually produced instead of assuming 100 GB
print(f"Throughput: {total / 1024**3 / duration:.2f} GB/s")

Example output (8-core system):

Throughput: 86.41 GB/s

Reproducible Data Generation (NEW in v0.1.6)

import dgen_py

# Generate reproducible data with a fixed seed
gen1 = dgen_py.Generator(
    size=10 * 1024**3,  # 10 GB
    seed=12345          # Optional: enables reproducibility
)

# Same seed produces identical data
gen2 = dgen_py.Generator(
    size=10 * 1024**3,
    seed=12345          # Same seed = identical data
)

# Without seed (default), data is non-deterministic
gen3 = dgen_py.Generator(
    size=10 * 1024**3   # seed=None (default)
)

Use cases for reproducible mode:

  • Reproducible benchmarking and testing
  • Consistent test data across CI/CD runs
  • Debugging with identical data streams
  • Verifiable data generation for compliance
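One way to confirm that two seeded generators really produce identical streams is to hash each stream chunk by chunk and compare digests. This is a sketch that assumes only the `chunk_size`, `is_complete()`, and `fill_chunk()` members used in the Quick Start. `stream_digest` and `SeededStandIn` are hypothetical helpers: the stand-in is a stdlib placeholder (not part of dgen-py) so the snippet runs without the package. With dgen-py installed you would pass `dgen_py.Generator(size=..., seed=12345)` objects instead.

```python
import hashlib
import random


def stream_digest(gen) -> str:
    """SHA-256 of everything a generator emits via fill_chunk()."""
    h = hashlib.sha256()
    buf = bytearray(gen.chunk_size)
    view = memoryview(buf)
    while not gen.is_complete():
        n = gen.fill_chunk(buf)
        if n == 0:
            break
        h.update(view[:n])  # hash only the bytes actually written
    return h.hexdigest()


class SeededStandIn:
    """Hypothetical placeholder mimicking the streaming interface.

    seed=None falls back to OS entropy, mirroring the non-deterministic default.
    """

    def __init__(self, size: int, seed=None, chunk_size: int = 64 * 1024):
        self.chunk_size = chunk_size
        self._remaining = size
        self._rng = random.Random(seed)

    def is_complete(self) -> bool:
        return self._remaining == 0

    def fill_chunk(self, buf) -> int:
        n = min(len(buf), self._remaining)
        buf[:n] = self._rng.randbytes(n)  # same-length slice assignment
        self._remaining -= n
        return n


a = stream_digest(SeededStandIn(1_000_000, seed=12345))
b = stream_digest(SeededStandIn(1_000_000, seed=12345))
c = stream_digest(SeededStandIn(1_000_000))
print("same seed identical:", a == b, "| unseeded identical:", a == c)
```

Storing the digest alongside a benchmark result gives a compact record of exactly which data stream was used.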

System Information

import dgen_py

info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")

Advanced Usage

Multi-Process NUMA (For Multi-NUMA Systems)

For maximum throughput on multi-socket systems, use one Python process per NUMA node with process affinity pinning.

See python/examples/benchmark_numa_multiprocess_v2.py for complete implementation.

Key architecture:

  • One Python process per NUMA node
  • Process pinning via os.sched_setaffinity() to local cores
  • Local memory allocation on each NUMA node
  • Synchronized start with multiprocessing.Barrier
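The architecture above can be sketched with the standard library plus the `Generator` calls shown elsewhere in this README. This is a minimal illustration, not the project's python/examples/benchmark_numa_multiprocess_v2.py. It assumes Linux (for `os.sched_setaffinity` and the `/sys/devices/system/node/node<N>/cpulist` files) and that `Generator` accepts the `numa_node` argument listed under NUMA Modes; `parse_cpulist`, `node_cpus`, `worker`, and `run_per_node` are hypothetical helper names.

```python
import multiprocessing as mp
import os
import time


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist such as '0-3,8-11' into a list of CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node: int) -> list[int]:
    """CPUs that belong to one NUMA node, read from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def worker(node, cpus, size, barrier, results):
    # Pin first so any threads started later inherit this CPU mask.
    os.sched_setaffinity(0, cpus)
    import dgen_py  # imported in the child, after pinning

    gen = dgen_py.Generator(size=size, numa_node=node)
    # Allocated after pinning, so Linux's default first-touch policy
    # places these pages on the local node.
    buf = bytearray(gen.chunk_size)
    barrier.wait()  # all nodes start generating at the same moment
    start = time.perf_counter()
    total = 0
    while not gen.is_complete():
        n = gen.fill_chunk(buf)
        if n == 0:
            break
        total += n
    results.put((node, total, time.perf_counter() - start))


def run_per_node(num_nodes: int, size_per_node: int):
    """Start one pinned process per NUMA node; return (node, bytes, seconds)."""
    barrier = mp.Barrier(num_nodes)
    results = mp.Queue()
    procs = [
        mp.Process(target=worker,
                   args=(n, node_cpus(n), size_per_node, barrier, results))
        for n in range(num_nodes)
    ]
    for p in procs:
        p.start()
    out = [results.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    return sorted(out)
```

On a two-node machine you would call `run_per_node(2, 512 * 1024**3)` from under an `if __name__ == "__main__":` guard and sum the per-node byte counts to get aggregate throughput.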

Results:

  • C4-96 (48 cores, 2 NUMA nodes): 248.53 GB/s aggregate
  • C4-32 (16 cores, 1 NUMA node): 162.78 GB/s with 112% scaling efficiency

Performance Notes

Chunk Size Optimization

The default chunk size is chosen automatically for your system. You can override it if needed:

gen = dgen_py.Generator(
    size=100 * 1024**3,
    chunk_size=64 * 1024**2  # Override to 64 MB
)

Newer CPUs with larger L3 caches (Emerald Rapids, Sapphire Rapids) benefit from 64 MB chunks.
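For a sense of scale, the chunk size determines how many `fill_chunk()` calls a run takes. Assuming each chunk is split into the 4 MiB internal blocks described under Architecture, it also sets how many blocks are spread across threads per call. Here is a worked example for the 100 GB run (1 GB = 1024**3 bytes, as in the Quick Start), with 32 MB (the streaming memory footprint quoted in the benchmarks) shown for comparison; `chunk_plan` is a hypothetical helper:

```python
MiB = 1024**2
GiB = 1024**3
INTERNAL_BLOCK = 4 * MiB  # internal block size listed under Architecture


def chunk_plan(total_size: int, chunk_size: int) -> tuple[int, int]:
    """Return (fill_chunk calls needed, internal blocks per full chunk)."""
    calls = -(-total_size // chunk_size)                # ceiling division
    blocks_per_chunk = -(-chunk_size // INTERNAL_BLOCK)
    return calls, blocks_per_chunk


for chunk in (32 * MiB, 64 * MiB):
    calls, blocks = chunk_plan(100 * GiB, chunk)
    print(f"{chunk // MiB} MB chunks: {calls} calls, {blocks} blocks per call")
```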

Deduplication and Compression Ratios

Performance vs Test Accuracy Tradeoff:

# FAST: Incompressible data (1.0x baseline)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=1.0    # Incompressible data
)

# FASTER: More compressible (1.3-1.5x speedup)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=2.0    # 2:1 compressible data
)

Important: Higher compress_ratio values improve generation performance (1.3-1.5x faster) BUT make the data more compressible, which may not represent your actual workload:

  • compress_ratio=1.0: Incompressible data (realistic for encrypted files, compressed archives)
  • compress_ratio=2.0: 2:1 compressible data (realistic for text, logs, uncompressed images)
  • compress_ratio=3.0+: Highly compressible data (may not be realistic)

Choose based on YOUR test requirements, not performance numbers. If testing storage with compression enabled, use compress_ratio=1.0 to avoid inflating storage efficiency metrics.

Note: dedup_ratio has negligible performance impact (< 1% variance).
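To check what a buffer actually looks like to a storage system, you can estimate both ratios directly. This sketch is an approximation under stated assumptions: it uses zlib as the compressor and fixed 4 KiB blocks for duplicate detection, neither of which is necessarily what dgen-py or your storage system uses, so treat the numbers as indicative. `estimate_ratios` is a hypothetical helper; you could pass it `view[:nbytes]` from the Quick Start loop.

```python
import hashlib
import os
import zlib


def estimate_ratios(data, block_size: int = 4096) -> tuple[float, float]:
    """Return (dedup_ratio, compress_ratio) estimates for a bytes-like object.

    dedup_ratio    = total fixed-size blocks / unique blocks (by SHA-256)
    compress_ratio = original size / zlib-compressed size
    """
    mv = memoryview(data)
    digests = set()
    blocks = 0
    for off in range(0, len(mv), block_size):
        digests.add(hashlib.sha256(mv[off:off + block_size]).digest())
        blocks += 1
    dedup = blocks / len(digests) if digests else 1.0
    compressed = zlib.compress(mv, level=6)
    return dedup, len(mv) / len(compressed)


random_data = os.urandom(1024 * 1024)  # like compress_ratio=1.0, dedup_ratio=1.0
zeros = bytes(1024 * 1024)             # extreme case: every block identical
print("random:", estimate_ratios(random_data))
print("zeros: ", estimate_ratios(zeros))
```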

NUMA Modes

# Auto-detect topology (recommended)
gen = dgen_py.Generator(..., numa_mode="auto")

# Force UMA (single-socket)
gen = dgen_py.Generator(..., numa_mode="uma")

# Manual NUMA node binding (multi-process only)
gen = dgen_py.Generator(..., numa_node=0)  # Bind to node 0

Architecture

Zero-Copy Implementation

Python buffer protocol with direct memory access:

  • No data copying between Rust and Python
  • GIL released during generation (true parallelism)
  • Memoryview creation < 0.001ms (verified zero-copy)
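On the Python side, staying zero-copy depends on how you slice the filled buffer. Slicing a `bytearray` creates a new object and copies the data, while slicing a `memoryview` only creates a view onto the same memory. A small stdlib demonstration:

```python
buf = bytearray(8)           # stands in for the buffer passed to fill_chunk()
view = memoryview(buf)

copied = buf[:4]             # bytearray slice: new object, data copied
shared = view[:4]            # memoryview slice: no copy, same memory

buf[0] = 0xFF                # simulate the generator writing into buf

print(copied[0], shared[0])  # 0 255  (the copy is stale, the view sees the write)
print(shared.obj is buf)     # True   (the view references the original buffer)
```

A related consequence: a `bytearray` with live memoryviews cannot be resized (Python raises `BufferError`), so allocate the buffer once at full chunk size and reuse it.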

Parallel Generation

  • 4 MiB internal blocks distributed across all cores
  • Thread pool created once, reused for all operations
  • Xoshiro256++ RNG (5-10x faster than ChaCha20)
  • Block sizing tuned for L3 cache performance
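The generation scheme above can be sketched in pure Python. A xoshiro256++ generator (seeded through SplitMix64, the seeding method its authors recommend) fills each fixed-size block independently, and a thread pool created once is reused for every chunk. This only mirrors the structure and is not dgen-py's Rust code. The per-block seed derivation (`seed + block index`) and the `BlockFiller` class are hypothetical, and pure Python threads hold the GIL, so they will not run in parallel the way GIL-free Rust threads do. The block size is a parameter so the sketch can be exercised with small values.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK = (1 << 64) - 1


def _rotl(x: int, k: int) -> int:
    return ((x << k) | (x >> (64 - k))) & MASK


def _splitmix64(state: int):
    """One SplitMix64 step: returns (new_state, output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return state, z ^ (z >> 31)


def xoshiro256pp_bytes(seed: int, nbytes: int) -> bytes:
    """nbytes of xoshiro256++ output (little-endian 64-bit words)."""
    sm, s = seed & MASK, []
    for _ in range(4):  # expand one seed into 256 bits of state
        sm, v = _splitmix64(sm)
        s.append(v)
    s0, s1, s2, s3 = s
    words = []
    for _ in range((nbytes + 7) // 8):
        words.append((_rotl((s0 + s3) & MASK, 23) + s0) & MASK)
        t = (s1 << 17) & MASK
        s2 ^= s0
        s3 ^= s1
        s1 ^= s2
        s0 ^= s3
        s2 ^= t
        s3 = _rotl(s3, 45)
    return struct.pack(f"<{len(words)}Q", *words)[:nbytes]


class BlockFiller:
    """Hypothetical: fills buffers block by block on a reused thread pool."""

    def __init__(self, seed: int, block_size: int = 4 * 1024**2, threads: int = 4):
        self.seed = seed
        self.block_size = block_size
        self.pool = ThreadPoolExecutor(max_workers=threads)  # created once
        self.next_block = 0  # global block index, so chunks continue the stream

    def _fill_block(self, view: memoryview, index: int) -> None:
        view[:] = xoshiro256pp_bytes(self.seed + index, len(view))

    def fill_chunk(self, buf: bytearray) -> int:
        view = memoryview(buf)
        jobs = [
            self.pool.submit(self._fill_block,
                             view[off:off + self.block_size],
                             self.next_block + i)
            for i, off in enumerate(range(0, len(buf), self.block_size))
        ]
        for job in jobs:
            job.result()  # wait for all blocks and re-raise any error
        self.next_block += len(jobs)
        return len(buf)

    def close(self) -> None:
        self.pool.shutdown()
```

Because each block is seeded from its global index, the output depends only on the seed and block position, not on which thread ran it. That is one way a parallel generator can stay reproducible.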

NUMA Optimization

  • Multi-process architecture (one process per NUMA node)
  • Local memory allocation on each node
  • Local core affinity (no cross-node traffic)
  • Automatic topology detection via hwloc

Use Cases

  • Storage benchmarking: Generate realistic test data at 40-188 GB/s
  • Network testing: High-throughput data sources
  • AI/ML profiling: Simulate data loading pipelines
  • Compression testing: Validate compressor behavior with controlled ratios
  • Deduplication testing: Test dedup systems with known ratios

License

Dual-licensed under MIT OR Apache-2.0


Download files

Download the file for your platform.

Source Distribution

dgen_py-0.1.6.tar.gz (176.6 kB)

Uploaded Source

Built Distributions

dgen_py-0.1.6-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl (573.3 kB)

Uploaded: PyPy, manylinux: glibc 2.28+ x86-64

dgen_py-0.1.6-cp314-cp314-manylinux_2_28_x86_64.whl (572.8 kB)

Uploaded: CPython 3.14, manylinux: glibc 2.28+ x86-64

dgen_py-0.1.6-cp313-cp313-manylinux_2_28_x86_64.whl (573.1 kB)

Uploaded: CPython 3.13, manylinux: glibc 2.28+ x86-64

dgen_py-0.1.6-cp312-cp312-manylinux_2_28_x86_64.whl (573.0 kB)

Uploaded: CPython 3.12, manylinux: glibc 2.28+ x86-64

dgen_py-0.1.6-cp311-cp311-manylinux_2_28_x86_64.whl (572.9 kB)

Uploaded: CPython 3.11, manylinux: glibc 2.28+ x86-64

dgen_py-0.1.6-cp310-cp310-manylinux_2_28_x86_64.whl (573.1 kB)

Uploaded: CPython 3.10, manylinux: glibc 2.28+ x86-64

File details

Details for the file dgen_py-0.1.6.tar.gz.

File metadata

  • Download URL: dgen_py-0.1.6.tar.gz
  • Upload date:
  • Size: 176.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for dgen_py-0.1.6.tar.gz
Algorithm Hash digest
SHA256 3806016ea55974155ae07ab4d03b18fc5be5685f1e59a778868da7c7a6552678
MD5 41e46465cd5ce920cccbd5a49c3736d7
BLAKE2b-256 af545d09ebbeb4b7c749c9992a2833c18fd652756c594fad61d7af81972539c2

File details

Details for the file dgen_py-0.1.6-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.6-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c67f99c3bd842b2fdb2c7d353246d5592a79b6afbef9d1247a7ccb5a26093fd8
MD5 848e64da4934bd66ea755c60d816b14b
BLAKE2b-256 8f233377659d6f005ace06e4d7c71633d978acdf1130ffa95b4a106823814d0c

File details

Details for the file dgen_py-0.1.6-cp314-cp314-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.6-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2bf22780ac70b7d199f095247a12e2d4ba799a6df5fb1571e8e05d7c97916663
MD5 f837569169c698ed96e4adc41559ef08
BLAKE2b-256 1c4562ecd02b260a2a4535fe4dd066b3d9912752639ee5081adf688b37e960c9

File details

Details for the file dgen_py-0.1.6-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.6-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 6d3ee5f41ecd0caea72795340c02595d5a7b6d5cd2dbbf2d499771747661288f
MD5 baeedbf0375a361aca5ab82617a6f063
BLAKE2b-256 7034059fad3502ef7ff166851a2b3e6d08b0cff4e85e9ed9cfc8dc6408620d6c

File details

Details for the file dgen_py-0.1.6-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.6-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 512e477802df8212657b889c1de6670acdfe8b14753bb988f30f9efecffd524e
MD5 33105baa6d83e81692786592d685077a
BLAKE2b-256 a4dbf6e047457c8f454ffd186874cffa3e07a7c560dcf6998f8f09ed850f51ca

File details

Details for the file dgen_py-0.1.6-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.6-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e2cc8519dc80a1000dc71204253eea3be2d476a62ed77c25470a4c584878e3da
MD5 444faae26c81fa6effb74edaf7dccc0e
BLAKE2b-256 7f3b3be9ecc49c4b1ea0980a30f71211868df1fd003014d3d1d8e90293f464ad

File details

Details for the file dgen_py-0.1.6-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.6-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 16f5db4772de62d98f766bc7043a0c7d7a267af8c6ead031017c09c97a8a2e89
MD5 ecb9c6755a2f1bed681b504729632a94
BLAKE2b-256 f31ec0492f480b95b8f4eff14d7003ee966ea13835e1655377af6e52b99c99cc
