dgen-py

High-performance random data generation with NUMA optimization and zero-copy Python interface

Features

🚀 Blazing Fast: 58+ GB/s streaming throughput, matches Numba JIT performance
🎯 Controllable Characteristics: Configurable deduplication and compression ratios
🔬 Multi-Process NUMA: One Python process per NUMA node for maximum throughput
🐍 True Zero-Copy: Python buffer protocol with direct memory access (no data copying)
📦 Streaming API: Generate terabytes of data with constant 32 MB memory usage
🧵 Thread Pool Reuse: Created once, reused across all operations
🛠️ Built with Rust: Memory-safe, production-quality implementation

Performance

Version 0.1.5 Highlights 🎉

NEW: Significant Performance Improvements over v0.1.3:

UMA systems: ~50% improvement in per-core throughput (10.80 GB/s vs ~7 GB/s)
NUMA systems: Major improvements from bug fixes in multi-process architecture
8-core system: 86.41 GB/s aggregate throughput (C4-16)
Maximum aggregate: 324.72 GB/s on 48-core dual-NUMA system (C4-96 with compress=2.0)

Streaming Benchmark (v0.1.5) - 100 GB Test

Comparison of streaming random data generation methods on a 12-core system:

Method	Throughput	Speedup vs Baseline	Memory Required
os.urandom() (baseline)	0.34 GB/s	1.0x	Minimal
NumPy Multi-Thread	1.06 GB/s	3.1x	100 GB RAM*
Numba JIT Xoshiro256++ (streaming)	57.11 GB/s	165.7x	32 MB RAM
dgen-py v0.1.5 (streaming)	58.46 GB/s	169.6x	32 MB RAM

* NumPy requires full dataset in memory (10 GB tested, would need 100 GB for 100 GB dataset)

Key Findings:

dgen-py matches Numba's streaming performance (58.46 vs 57.11 GB/s)
55x faster than NumPy (58.46 vs 1.06 GB/s) while using roughly 3,000x less memory (32 MB vs 100 GB)
Streaming architecture: Can generate unlimited data with only 32 MB RAM
Per-core throughput: 4.87 GB/s (12 cores)
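For context, a streaming throughput figure such as the os.urandom() baseline can be measured with a loop of the following shape. This is a minimal sketch, not the project's benchmark script: the function name `measure_throughput`, the 4 MiB chunk size, and the 64 MiB total are illustrative assumptions.

```python
import os
import time


def measure_throughput(total_bytes: int, chunk_size: int = 4 * 1024**2) -> float:
    """Return throughput (units of 1024**3 bytes per second) for os.urandom.

    Only one chunk is alive at a time, so memory use stays near chunk_size
    regardless of total_bytes.
    """
    produced = 0
    start = time.perf_counter()
    while produced < total_bytes:
        n = min(chunk_size, total_bytes - produced)
        chunk = os.urandom(n)  # a real test would write the chunk out here
        produced += len(chunk)
    elapsed = time.perf_counter() - start
    return produced / 1024**3 / elapsed


if __name__ == "__main__":
    print(f"os.urandom: {measure_throughput(64 * 1024**2):.2f} GB/s")
```

The absolute number depends heavily on the machine and kernel, so it is only meaningful when compared against other methods measured on the same system.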

⚠️ Critical for Storage Testing: Of the methods compared here, only dgen-py supports configurable deduplication and compression ratios. The others (os.urandom, NumPy, Numba) generate purely random, maximum-entropy data, which makes them unsuitable for realistic storage system testing. Real-world storage workloads require controllable data characteristics to exercise deduplication engines, compression algorithms, and storage efficiency.

Multi-NUMA Benchmarks (v0.1.5) - GCP Emerald Rapids

Scalability testing on Google Cloud Platform Intel Emerald Rapids systems (1024 GB workload, compress=1.0):

Instance	Physical Cores	NUMA Nodes	Aggregate Throughput	Per-Core	Scaling Efficiency
C4-8	4	1 (UMA)	36.26 GB/s	9.07 GB/s	Baseline
C4-16	8	1 (UMA)	86.41 GB/s	10.80 GB/s	119%
C4-32	16	1 (UMA)	162.78 GB/s	10.17 GB/s	112%
C4-96	48	2 (NUMA)	248.53 GB/s	5.18 GB/s	51%*

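* NUMA penalty: per-core throughput on the dual-socket C4-96 is 49% lower than on the C4-32 (5.18 vs 10.17 GB/s), so the 51% figure is relative to the C4-32 rather than the C4-8 baseline (against which it would be 57%). The C4-96 still achieves the highest absolute throughput.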
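The per-core and scaling-efficiency columns can be recomputed from the aggregate figures in the table. The sketch below does only that arithmetic; the helper names `per_core` and `efficiency` are illustrative, and the numbers are copied from the table rather than measured.

```python
# (instance, physical cores, aggregate GB/s) copied from the table above
RESULTS = [
    ("C4-8", 4, 36.26),
    ("C4-16", 8, 86.41),
    ("C4-32", 16, 162.78),
    ("C4-96", 48, 248.53),
]


def per_core(cores: int, aggregate: float) -> float:
    """Aggregate throughput divided evenly across physical cores."""
    return aggregate / cores


def efficiency(instance_rate: float, reference_rate: float) -> float:
    """Per-core throughput as a percentage of a reference per-core throughput."""
    return 100.0 * instance_rate / reference_rate


rates = {name: per_core(cores, agg) for name, cores, agg in RESULTS}
base = rates["C4-8"]

for name, rate in rates.items():
    print(f"{name}: {efficiency(rate, base):.0f}% of C4-8 per-core throughput")

# The C4-96 row is easier to interpret against the largest UMA instance.
print(f"C4-96 vs C4-32: {efficiency(rates['C4-96'], rates['C4-32']):.0f}%")
```

Computed this way, the UMA rows reproduce 119% and 112% against the C4-8, while the C4-96 comes out at about 51% of the C4-32 and about 57% of the C4-8.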

Key Findings:

Excellent UMA scaling: 112-119% efficiency on single-NUMA systems (super-linear due to larger L3 cache)
Per-core performance: 10.80 GB/s on C4-16 (3.0x improvement vs dgen-py v0.1.3's 3.60 GB/s)
Compression tradeoff: compress=2.0 provides 1.3-1.5x speedup, but makes data compressible (choose based on your test requirements, not performance)
Storage headroom: Even modest 8-core systems exceed 86 GB/s (far beyond typical storage requirements)

See docs/BENCHMARK_RESULTS_V0.1.5.md for complete analysis

Installation

From PyPI (Recommended)

pip install dgen-py

System Requirements

For NUMA support (Linux only):

# Ubuntu/Debian
sudo apt-get install libudev-dev libhwloc-dev

# RHEL/CentOS/Fedora
sudo yum install systemd-devel hwloc-devel

Note: NUMA support is optional. Without these libraries, the package works perfectly on single-NUMA systems (workstations, cloud VMs).

Quick Start

Basic Usage

import dgen_py
import time

# Generate 100 GB of random data with configurable characteristics
gen = dgen_py.Generator(
    size=100 * 1024**3,      # 100 GB
    dedup_ratio=1.0,         # No deduplication 
    compress_ratio=1.0,      # Incompressible 
    numa_mode="auto",        # Auto-detect NUMA topology
    max_threads=None         # Use all available cores
)

# Create buffer (uses optimal chunk size automatically)
buffer = bytearray(gen.chunk_size)

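# Stream data in chunks (zero-copy, parallel generation)
total_bytes = 0
start = time.perf_counter()
while not gen.is_complete():
    nbytes = gen.fill_chunk(buffer)
    if nbytes == 0:
        break
    total_bytes += nbytes
    # Write to file/network: memoryview(buffer)[:nbytes]
    # (slicing the bytearray directly, buffer[:nbytes], would copy the data)

# Report throughput from the bytes actually generated (1 GB = 1024**3 bytes)
duration = time.perf_counter() - start
print(f"Throughput: {total_bytes / 1024**3 / duration:.2f} GB/s")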

Example output (8-core system):

Throughput: 86.41 GB/s
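The streaming loop above can be wrapped into a helper that writes each chunk to a file without an intermediate copy. The sketch below assumes only the is_complete() / fill_chunk() / chunk_size interface shown in Basic Usage. `FakeGenerator` is a hypothetical stand-in (backed by os.urandom) so the example runs without dgen-py installed, and `stream_to_file` is an illustrative name, not part of the package.

```python
import os
import tempfile


class FakeGenerator:
    """Hypothetical stand-in with the is_complete()/fill_chunk() shape used above."""

    def __init__(self, size: int, chunk_size: int = 1024 * 1024):
        self.size = size
        self.chunk_size = chunk_size
        self._produced = 0

    def is_complete(self) -> bool:
        return self._produced >= self.size

    def fill_chunk(self, buffer) -> int:
        n = min(len(buffer), self.size - self._produced)
        buffer[:n] = os.urandom(n)  # stand-in for the real generator
        self._produced += n
        return n


def stream_to_file(gen, path: str) -> int:
    """Drain gen into path, reusing a single buffer; return bytes written."""
    buffer = bytearray(gen.chunk_size)
    view = memoryview(buffer)  # slicing a memoryview does not copy
    written = 0
    with open(path, "wb") as f:
        while not gen.is_complete():
            nbytes = gen.fill_chunk(buffer)
            if nbytes == 0:
                break
            f.write(view[:nbytes])
            written += nbytes
    return written


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "out.bin")
    print(stream_to_file(FakeGenerator(size=2_500_000), target), "bytes written")
```

Memory use stays at one chunk regardless of the total size, because the same bytearray is refilled on every iteration.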

System Information

import dgen_py

info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")
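To cross-check the reported topology on Linux without the package, the number of NUMA nodes can be read from sysfs. This is a rough standard-library sketch, not how dgen-py detects topology (the project states it uses hwloc). The function names, the `logical_cpus` key, and the "UMA"/"NUMA" labels are assumptions made for illustration.

```python
import os
import re
from pathlib import Path


def count_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count nodeN directories under sysfs; fall back to 1 if unavailable."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return 1  # non-Linux systems or containers without sysfs
    nodes = [p for p in root.iterdir() if re.fullmatch(r"node\d+", p.name)]
    return max(1, len(nodes))


def summarize() -> dict:
    """Return a small topology summary built from stdlib calls only."""
    nodes = count_numa_nodes()
    return {
        "num_nodes": nodes,
        "logical_cpus": os.cpu_count() or 1,  # logical, not physical, cores
        "deployment_type": "UMA" if nodes == 1 else "NUMA",
    }


if __name__ == "__main__":
    print(summarize())
```

Note that os.cpu_count() reports logical CPUs, so on systems with SMT it will be higher than the physical core count.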

Advanced Usage

Multi-Process NUMA (For Multi-NUMA Systems)

For maximum throughput on multi-socket systems, use one Python process per NUMA node with process affinity pinning.

See python/examples/benchmark_numa_multiprocess_v2.py for complete implementation.

Key architecture:

One Python process per NUMA node
Process pinning via os.sched_setaffinity() to local cores
Local memory allocation on each NUMA node
Synchronized start with multiprocessing.Barrier
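The architecture listed above can be sketched with the standard library alone. This is not the referenced benchmark script: `split_cpus`, `worker`, and `run` are hypothetical names, and splitting the allowed CPUs into contiguous groups is a simplifying assumption (a real implementation would take the node-to-core mapping from hwloc or sysfs).

```python
import multiprocessing as mp
import os


def split_cpus(cpus, num_groups):
    """Split CPU ids into num_groups contiguous, near-equal groups."""
    cpus = sorted(cpus)
    base, extra = divmod(len(cpus), num_groups)
    groups, start = [], 0
    for i in range(num_groups):
        end = start + base + (1 if i < extra else 0)
        groups.append(cpus[start:end])
        start = end
    return groups


def worker(node, cpus, barrier, results):
    # Pin this process to its node's cores (Linux only; skipped elsewhere).
    if cpus and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)
    barrier.wait()  # all workers begin generating at the same moment
    # ... the per-node generation loop would run here ...
    results.put((node, len(cpus)))


def run(num_nodes):
    """Launch one pinned worker per (assumed) NUMA node and collect results."""
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
    else:
        allowed = range(os.cpu_count() or 1)
    groups = split_cpus(allowed, num_nodes)
    barrier = mp.Barrier(num_nodes)
    results = mp.Queue()
    procs = [
        mp.Process(target=worker, args=(node, group, barrier, results))
        for node, group in enumerate(groups)
    ]
    for p in procs:
        p.start()
    collected = sorted(results.get() for _ in procs)
    for p in procs:
        p.join()
    return collected
```

When run as a script, `run(num_nodes=2)` should be called under an `if __name__ == "__main__":` guard so that it also works with the spawn and forkserver start methods.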

Results:

C4-96 (48 cores, 2 NUMA nodes): 248.53 GB/s aggregate
C4-32 (16 cores, 1 NUMA node): 162.78 GB/s with 112% scaling efficiency

Performance Notes

Chunk Size Optimization

The default chunk size is automatically optimized for your system. You can override it if needed:

gen = dgen_py.Generator(
    size=100 * 1024**3,
    chunk_size=64 * 1024**2  # Override to 64 MB
)

Newer CPUs (Emerald Rapids, Sapphire Rapids) with larger L3 caches benefit from 64 MB chunks.
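The chunk size determines both the buffer you allocate and the number of fill calls needed for a given total. The arithmetic can be sketched as follows; `chunk_plan` is an illustrative helper, not part of dgen-py.

```python
def chunk_plan(total_bytes: int, chunk_size: int) -> tuple[int, int]:
    """Return (number of fill calls, size in bytes of the final chunk)."""
    calls = (total_bytes + chunk_size - 1) // chunk_size  # integer ceiling
    last = total_bytes - (calls - 1) * chunk_size
    return calls, last


total = 100 * 1024**3  # the 100 GB size used in the examples above

for mib in (32, 64):
    calls, last = chunk_plan(total, mib * 1024**2)
    print(f"{mib} MB chunks: {calls} fill calls, final chunk {last} bytes")
```

For the 100 GB example this gives 3200 calls with 32 MB chunks and 1600 calls with 64 MB chunks, so doubling the chunk size halves the per-call overhead at the cost of a larger buffer.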

Deduplication and Compression Ratios

Performance vs Test Accuracy Tradeoff:

# FAST: Incompressible data (1.0x baseline)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=1.0    # Incompressible data
)

# FASTER: More compressible (1.3-1.5x speedup)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=2.0    # 2:1 compressible data
)

Important: Higher compress_ratio values improve generation performance (1.3-1.5x faster) BUT make the data more compressible, which may not represent your actual workload:

compress_ratio=1.0: Incompressible data (realistic for encrypted files, compressed archives)
compress_ratio=2.0: 2:1 compressible data (realistic for text, logs, uncompressed images)
compress_ratio=3.0+: Highly compressible data (may not be realistic)

Choose based on YOUR test requirements, not performance numbers. If testing storage with compression enabled, use compress_ratio=1.0 to avoid inflating storage efficiency metrics.

Note: dedup_ratio has negligible performance impact (< 1% variance)
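Before relying on a given ratio in a test, it can help to measure what a compressor and a block-level deduplicator actually see. The sketch below is not dgen-py's generation algorithm: `make_data` is a hypothetical toy generator (part random, part zeros, with repeated blocks), and the 4 KiB block size is an assumption. The two `measured_*` helpers use only zlib and hashlib and could be applied to any buffer.

```python
import hashlib
import os
import zlib

BLOCK = 4096  # illustrative block size, not dgen-py's internal one


def make_data(num_blocks: int, compress_ratio: float = 1.0,
              dedup_ratio: float = 1.0) -> bytes:
    """Toy data: each block is part random, part zeros; unique blocks repeat."""
    random_len = int(BLOCK / compress_ratio)
    unique = max(1, int(num_blocks / dedup_ratio))
    blocks = [
        os.urandom(random_len) + bytes(BLOCK - random_len)
        for _ in range(unique)
    ]
    return b"".join(blocks[i % unique] for i in range(num_blocks))


def measured_compress_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (level 6)."""
    return len(data) / len(zlib.compress(data, 6))


def measured_dedup_ratio(data: bytes) -> float:
    """Total blocks divided by distinct blocks, using SHA-256 fingerprints."""
    digests = {
        hashlib.sha256(data[i:i + BLOCK]).digest()
        for i in range(0, len(data), BLOCK)
    }
    return (len(data) // BLOCK) / len(digests)


if __name__ == "__main__":
    sample = make_data(256, compress_ratio=2.0)
    print(f"compress ratio: {measured_compress_ratio(sample):.2f}")
    sample = make_data(256, dedup_ratio=2.0)
    print(f"dedup ratio: {measured_dedup_ratio(sample):.2f}")
```

The measured compression ratio depends on the compressor, so a target of 2.0 will typically read slightly below 2.0 with zlib because of format overhead.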

NUMA Modes

# Auto-detect topology (recommended)
gen = dgen_py.Generator(..., numa_mode="auto")

# Force UMA (single-socket)
gen = dgen_py.Generator(..., numa_mode="uma")

# Manual NUMA node binding (multi-process only)
gen = dgen_py.Generator(..., numa_node=0)  # Bind to node 0

Architecture

Zero-Copy Implementation

Python buffer protocol with direct memory access:

No data copying between Rust and Python
GIL released during generation (true parallelism)
Memoryview creation < 0.001 ms (verified zero-copy)
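What "zero-copy" means at the Python level can be seen with the built-in buffer protocol types. The sketch below uses only a bytearray and a memoryview, with no dgen-py calls: a memoryview slice shares memory with the buffer, whereas a bytearray slice is an independent copy.

```python
buffer = bytearray(8)      # eight zero bytes
view = memoryview(buffer)  # exposes the same memory, no copy made

# A memoryview slice refers to the original memory.
head = view[:4]
head[0] = 0xFF             # visible through `buffer` as well

# A bytearray slice, by contrast, is a separate object with its own memory.
copied = buffer[:4]
copied[1] = 0xAA           # does not affect `buffer`

print(buffer[0], buffer[1])  # first byte changed, second byte untouched
print(head.obj is buffer)    # the view still points at the original object
```

This is why passing `memoryview(buffer)[:nbytes]` to a writer avoids the copy that `buffer[:nbytes]` would make.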

Parallel Generation

4 MiB internal blocks distributed across all cores
Thread pool created once, reused for all operations
Xoshiro256++ RNG (5-10x faster than ChaCha20)
Optimal for L3 cache performance
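For readers unfamiliar with Xoshiro256++, the state update is small enough to show in full. The following is a pure-Python sketch of the published algorithm, seeded with SplitMix64 as is commonly done. It is not the Rust code the package uses, the class and method names are illustrative, and it is far slower than a native implementation.

```python
MASK64 = (1 << 64) - 1


def rotl(x: int, k: int) -> int:
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def splitmix64(seed: int):
    """Yield SplitMix64 outputs, used here to expand one seed into 4 state words."""
    x = seed & MASK64
    while True:
        x = (x + 0x9E3779B97F4A7C15) & MASK64
        z = x
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        yield z ^ (z >> 31)


class Xoshiro256PlusPlus:
    def __init__(self, state):
        self.s = [v & MASK64 for v in state]
        if len(self.s) != 4 or not any(self.s):
            raise ValueError("state must be four 64-bit words, not all zero")

    @classmethod
    def from_seed(cls, seed: int):
        gen = splitmix64(seed)
        return cls([next(gen) for _ in range(4)])

    def next_u64(self) -> int:
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

    def fill(self, buffer) -> None:
        """Fill a writable buffer (length a multiple of 8) with output words."""
        view = memoryview(buffer)
        for off in range(0, len(view), 8):
            view[off:off + 8] = self.next_u64().to_bytes(8, "little")


if __name__ == "__main__":
    rng = Xoshiro256PlusPlus.from_seed(42)
    block = bytearray(32)
    rng.fill(block)
    print(block.hex())
```

Each output needs only a few adds, shifts, rotates, and XORs on four 64-bit words, which is what makes the generator cheap per byte. Giving each block its own independently seeded state is one way such a generator could be parallelized.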

NUMA Optimization

Multi-process architecture (one process per NUMA node)
Local memory allocation on each node
Local core affinity (no cross-node traffic)
Automatic topology detection via hwloc

Use Cases

Storage benchmarking: Generate realistic test data at 36-248 GB/s (the aggregate range in the v0.1.5 benchmarks above)
Network testing: High-throughput data sources
AI/ML profiling: Simulate data loading pipelines
Compression testing: Validate compressor behavior with controlled ratios
Deduplication testing: Test dedup systems with known ratios

License

Dual-licensed under MIT OR Apache-2.0

Credits

Built with PyO3 and Maturin
Uses hwlocality for NUMA topology detection
Xoshiro256++ RNG from rand crate

Release history

Version	Release date
0.2.4	May 5, 2026
0.2.3	Apr 16, 2026
0.2.2	Mar 27, 2026
0.2.1	Mar 27, 2026
0.2.0	Feb 12, 2026
0.1.7	Jan 25, 2026
0.1.6	Jan 23, 2026
0.1.5 (this version)	Jan 19, 2026
0.1.4	Jan 18, 2026
0.1.3	Jan 18, 2026
0.1.2	Jan 8, 2026
0.1.1	Jan 8, 2026
Download files

Source Distribution

File	Size	Uploaded
dgen_py-0.1.5.tar.gz	173.9 kB	Jan 19, 2026

Built Distributions

File	Size	Interpreter	Platform	Uploaded
dgen_py-0.1.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl	577.2 kB	PyPy	manylinux: glibc 2.28+ x86-64	Jan 19, 2026
dgen_py-0.1.5-cp314-cp314-manylinux_2_28_x86_64.whl	576.9 kB	CPython 3.14	manylinux: glibc 2.28+ x86-64	Jan 19, 2026
dgen_py-0.1.5-cp313-cp313-manylinux_2_28_x86_64.whl	576.9 kB	CPython 3.13	manylinux: glibc 2.28+ x86-64	Jan 19, 2026
dgen_py-0.1.5-cp312-cp312-manylinux_2_28_x86_64.whl	576.7 kB	CPython 3.12	manylinux: glibc 2.28+ x86-64	Jan 19, 2026
dgen_py-0.1.5-cp311-cp311-manylinux_2_28_x86_64.whl	576.9 kB	CPython 3.11	manylinux: glibc 2.28+ x86-64	Jan 19, 2026
dgen_py-0.1.5-cp310-cp310-manylinux_2_28_x86_64.whl	577.0 kB	CPython 3.10	manylinux: glibc 2.28+ x86-64	Jan 19, 2026

