Skip to main content

High-performance random data generation with NUMA optimization and zero-copy Python interface

Project description

dgen-py

High-performance random data generation with NUMA optimization and zero-copy Python interface

Version License: MIT OR Apache-2.0 PyPI Python Version

Features

  • 🚀 Blazing Fast: 58+ GB/s streaming throughput, matches Numba JIT performance
  • 🎯 Controllable Characteristics: Configurable deduplication and compression ratios
  • 🔬 Multi-Process NUMA: One Python process per NUMA node for maximum throughput
  • 🐍 True Zero-Copy: Python buffer protocol with direct memory access (no data copying)
  • 📦 Streaming API: Generate terabytes of data with constant 32 MB memory usage
  • 🧵 Thread Pool Reuse: Created once, reused across all operations
  • 🛠️ Built with Rust: Memory-safe, production-quality implementation

Performance

Version 0.1.5 Highlights 🎉

NEW: Significant Performance Improvements over v0.1.3:

  • UMA systems: ~50% improvement in per-core throughput (10.80 GB/s vs ~7 GB/s)
  • NUMA systems: Major improvements from bug fixes in multi-process architecture
  • 8-core system: 86.41 GB/s aggregate throughput (C4-16)
  • Maximum aggregate: 324.72 GB/s on 48-core dual-NUMA system (C4-96 with compress=2.0)

Streaming Benchmark (v0.1.5) - 100 GB Test

Comparison of streaming random data generation methods on a 12-core system:

Method Throughput Speedup vs Baseline Memory Required
os.urandom() (baseline) 0.34 GB/s 1.0x Minimal
NumPy Multi-Thread 1.06 GB/s 3.1x 100 GB RAM*
Numba JIT Xoshiro256++ (streaming) 57.11 GB/s 165.7x 32 MB RAM
dgen-py v0.1.5 (streaming) 58.46 GB/s 169.6x 32 MB RAM

* NumPy requires full dataset in memory (10 GB tested, would need 100 GB for 100 GB dataset)

Key Findings:

  • dgen-py matches Numba's streaming performance (58.46 vs 57.11 GB/s)
  • 55x faster than NumPy while using 3,000x less memory (32 MB vs 100 GB)
  • Streaming architecture: Can generate unlimited data with only 32 MB RAM
  • Per-core throughput: 4.87 GB/s (12 cores)

⚠️ Critical for Storage Testing: ONLY dgen-py supports configurable deduplication and compression ratios. All other methods (os.urandom, NumPy, Numba) generate purely random data with maximum entropy, making them unsuitable for realistic storage system testing. Real-world storage workloads require controllable data characteristics to test deduplication engines, compression algorithms, and storage efficiency—capabilities unique to dgen-py.

Multi-NUMA Benchmarks (v0.1.5) - GCP Emerald Rapid

Scalability testing on Google Cloud Platform Intel Emerald Rapid systems (1024 GB workload, compress=1.0):

Instance Physical Cores NUMA Nodes Aggregate Throughput Per-Core Scaling Efficiency
C4-8 4 1 (UMA) 36.26 GB/s 9.07 GB/s Baseline
C4-16 8 1 (UMA) 86.41 GB/s 10.80 GB/s 119%
C4-32 16 1 (UMA) 162.78 GB/s 10.17 GB/s 112%
C4-96 48 2 (NUMA) 248.53 GB/s 5.18 GB/s 51%*

* NUMA penalty: 49% per-core reduction on multi-socket systems, but still achieves highest absolute throughput

Key Findings:

  • Excellent UMA scaling: 112-119% efficiency on single-NUMA systems (super-linear due to larger L3 cache)
  • Per-core performance: 10.80 GB/s on C4-16 (3.0x improvement vs dgen-py v0.1.3's 3.60 GB/s)
  • Compression tradeoff: compress=2.0 provides 1.3-1.5x speedup, but makes data compressible (choose based on your test requirements, not performance)
  • Storage headroom: Even modest 8-core systems exceed 86 GB/s (far beyond typical storage requirements)

See docs/BENCHMARK_RESULTS_V0.1.5.md for complete analysis

Installation

From PyPI (Recommended)

pip install dgen-py

System Requirements

For NUMA support (Linux only):

# Ubuntu/Debian
sudo apt-get install libudev-dev libhwloc-dev

# RHEL/CentOS/Fedora
sudo yum install systemd-devel hwloc-devel

Note: NUMA support is optional. Without these libraries, the package works perfectly on single-NUMA systems (workstations, cloud VMs).

Quick Start

Basic Usage

import dgen_py
import time

# Generate 100 GB of random data with configurable characteristics
gen = dgen_py.Generator(
    size=100 * 1024**3,      # 100 GB
    dedup_ratio=1.0,         # No deduplication 
    compress_ratio=1.0,      # Incompressible 
    numa_mode="auto",        # Auto-detect NUMA topology
    max_threads=None         # Use all available cores
)

# Create buffer (uses optimal chunk size automatically)
buffer = bytearray(gen.chunk_size)

# Stream data in chunks (zero-copy, parallel generation)
start = time.perf_counter()
while not gen.is_complete():
    nbytes = gen.fill_chunk(buffer)
    if nbytes == 0:
        break
    # Write to file/network: buffer[:nbytes]

duration = time.perf_counter() - start
print(f"Throughput: {(100 / duration):.2f} GB/s")

Example output (8-core system):

Throughput: 86.41 GB/s

System Information

import dgen_py

info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")

Advanced Usage

Multi-Process NUMA (For Multi-NUMA Systems)

For maximum throughput on multi-socket systems, use one Python process per NUMA node with process affinity pinning.

See python/examples/benchmark_numa_multiprocess_v2.py for complete implementation.

Key architecture:

  • One Python process per NUMA node
  • Process pinning via os.sched_setaffinity() to local cores
  • Local memory allocation on each NUMA node
  • Synchronized start with multiprocessing.Barrier

Results:

  • C4-96 (48 cores, 2 NUMA nodes): 248.53 GB/s aggregate
  • C4-32 (16 cores, 1 NUMA node): 162.78 GB/s with 112% scaling efficiency

Performance Notes

Chunk Size Optimization

Default chunk size is automatically optimized for your system. You can override if needed:

gen = dgen_py.Generator(
    size=100 * 1024**3,
    chunk_size=64 * 1024**2  # Override to 64 MB
)

Newer CPUs (Emerald Rapid, Sapphire Rapids) with larger L3 cache benefit from 64 MB chunks.

Deduplication and Compression Ratios

Performance vs Test Accuracy Tradeoff:

# FAST: Incompressible data (1.0x baseline)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=1.0    # Incompressible data
)

# FASTER: More compressible (1.3-1.5x speedup)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=2.0    # 2:1 compressible data
)

Important: Higher compress_ratio values improve generation performance (1.3-1.5x faster) BUT make the data more compressible, which may not represent your actual workload:

  • compress_ratio=1.0: Incompressible data (realistic for encrypted files, compressed archives)
  • compress_ratio=2.0: 2:1 compressible data (realistic for text, logs, uncompressed images)
  • compress_ratio=3.0+: Highly compressible data (may not be realistic)

Choose based on YOUR test requirements, not performance numbers. If testing storage with compression enabled, use compress_ratio=1.0 to avoid inflating storage efficiency metrics.

Note: dedup_ratio has zero performance impact (< 1% variance)

NUMA Modes

# Auto-detect topology (recommended)
gen = dgen_py.Generator(..., numa_mode="auto")

# Force UMA (single-socket)
gen = dgen_py.Generator(..., numa_mode="uma")

# Manual NUMA node binding (multi-process only)
gen = dgen_py.Generator(..., numa_node=0)  # Bind to node 0

Architecture

Zero-Copy Implementation

Python buffer protocol with direct memory access:

  • No data copying between Rust and Python
  • GIL released during generation (true parallelism)
  • Memoryview creation < 0.001ms (verified zero-copy)

Parallel Generation

  • 4 MiB internal blocks distributed across all cores
  • Thread pool created once, reused for all operations
  • Xoshiro256++ RNG (5-10x faster than ChaCha20)
  • Optimal for L3 cache performance

NUMA Optimization

  • Multi-process architecture (one process per NUMA node)
  • Local memory allocation on each node
  • Local core affinity (no cross-node traffic)
  • Automatic topology detection via hwloc

Use Cases

  • Storage benchmarking: Generate realistic test data at 40-188 GB/s
  • Network testing: High-throughput data sources
  • AI/ML profiling: Simulate data loading pipelines
  • Compression testing: Validate compressor behavior with controlled ratios
  • Deduplication testing: Test dedup systems with known ratios

License

Dual-licensed under MIT OR Apache-2.0

Credits

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dgen_py-0.1.5.tar.gz (173.9 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dgen_py-0.1.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl (577.2 kB view details)

Uploaded PyPymanylinux: glibc 2.28+ x86-64

dgen_py-0.1.5-cp314-cp314-manylinux_2_28_x86_64.whl (576.9 kB view details)

Uploaded CPython 3.14manylinux: glibc 2.28+ x86-64

dgen_py-0.1.5-cp313-cp313-manylinux_2_28_x86_64.whl (576.9 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.28+ x86-64

dgen_py-0.1.5-cp312-cp312-manylinux_2_28_x86_64.whl (576.7 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.28+ x86-64

dgen_py-0.1.5-cp311-cp311-manylinux_2_28_x86_64.whl (576.9 kB view details)

Uploaded CPython 3.11manylinux: glibc 2.28+ x86-64

dgen_py-0.1.5-cp310-cp310-manylinux_2_28_x86_64.whl (577.0 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.28+ x86-64

File details

Details for the file dgen_py-0.1.5.tar.gz.

File metadata

  • Download URL: dgen_py-0.1.5.tar.gz
  • Upload date:
  • Size: 173.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for dgen_py-0.1.5.tar.gz
Algorithm Hash digest
SHA256 0d4e0240d085d6e12f290aae404e6788dc070a58997701a3ef41390e082abbc7
MD5 8ecb7a6669f82e1369a4d4c5541a2290
BLAKE2b-256 72813bc0e5932826ffec9c42a9ef254aaeeae194a224b996543f1f5792eaca38

See more details on using hashes here.

File details

Details for the file dgen_py-0.1.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for dgen_py-0.1.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ec0d62bcb55d13ff915131fdcb65c35f556a246db1e6a2b5322e79eeb8ad21f6
MD5 2a088b614a413d450755c31855261198
BLAKE2b-256 5bb1483f8fa338f183a4afc53c9a13380d59492493b11ff52cad8f92810b0433

See more details on using hashes here.

File details

Details for the file dgen_py-0.1.5-cp314-cp314-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for dgen_py-0.1.5-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 fa877390df7654203f7236606ce5c70c12d14ef96972dd8a4afbd8e8ab2ae710
MD5 79c4476983eff97af948c6352f645319
BLAKE2b-256 ab3e7312e505ef33c9d15fe1b350c025be83558b85a3bbba8127665762be986f

See more details on using hashes here.

File details

Details for the file dgen_py-0.1.5-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for dgen_py-0.1.5-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9dfa316e41d5bcff5faa93c029f0d6820dbde4b7fc9dc9378ba1f55781ab83f7
MD5 a951e50b92442ab45dfa3373ff418983
BLAKE2b-256 d3cfd1ecb6813559bad3e06f9bd915e8db398dc37530b0e5eab174f60520ac91

See more details on using hashes here.

File details

Details for the file dgen_py-0.1.5-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for dgen_py-0.1.5-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1c02f2dd2886ab61a71896a81452b005d6a93d48d35fefec69e426ce5b038529
MD5 6b78d23cff5de6083539225cb7251a0d
BLAKE2b-256 2652c85a7ab89329e7e00f967de40f64fc7bdc012724dba662d26ee14b794300

See more details on using hashes here.

File details

Details for the file dgen_py-0.1.5-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for dgen_py-0.1.5-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f6061e383bf660658cf65bbe6b76df909198c014658a315184086e2667f8c032
MD5 375fb4a9588822895f9aa0c907f589fb
BLAKE2b-256 fd7bd972c3bac72d19090ecbe71131f78ee1a6b35112b03fa5df3e79602cfc69

See more details on using hashes here.

File details

Details for the file dgen_py-0.1.5-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for dgen_py-0.1.5-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 3bf06a8939c805444db1650fff9b161815ae24c65b77ac59039e2ef4eadc5f70
MD5 8164f73f2e080b5507a61227a42ea47d
BLAKE2b-256 6b9a8b91975758cb7578f9e79d6121a6269a3876e7ce5150fdade0e4f3e635f6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page