High-performance random data generation with NUMA optimization and zero-copy Python interface
dgen-py
Features
- 🚀 Blazing Fast: 10 GB/s per core, up to 300 GB/s verified
- 🎯 Controllable Characteristics: Configurable deduplication and compression ratios
- 🔄 Reproducible Data: Optional seed parameter for identical data generation across runs
- 🔬 Multi-Process NUMA: One Python process per NUMA node for maximum throughput
- 🐍 True Zero-Copy: Python buffer protocol with direct memory access (no data copying)
- 📦 Streaming API: Generate terabytes of data with constant 32 MB memory usage
- 🧵 Thread Pool Reuse: Created once, reused across all operations
- 🛠️ Built with Rust: Memory-safe, production-quality implementation
Version 0.1.6 Highlights 🎉
NEW: Reproducible Data Generation
- Optional seed parameter enables identical data generation across runs
- Perfect for reproducible benchmarking, testing, and CI/CD workflows
- Fully backward compatible - defaults to non-deterministic (time + urandom)
import dgen_py

# Reproducible mode - same seed produces identical data
gen = dgen_py.Generator(size=100*1024**3, seed=12345)
# Non-deterministic mode (default) - different data each run
gen = dgen_py.Generator(size=100*1024**3)  # seed=None
Use cases:
- 🔬 Reproducible benchmarking: Compare storage systems with identical workloads
- ✅ Consistent testing: Same test data across CI/CD pipeline runs
- 🐛 Debugging: Regenerate exact data streams for issue investigation
- 📊 Compliance: Verifiable, reproducible data generation for audits
See Reproducible Data Generation section below for complete examples.
Performance
Version 0.1.5 Highlights
Significant Performance Improvements over v0.1.3:
- UMA systems: ~50% improvement in per-core throughput (10.80 GB/s vs ~7 GB/s)
- NUMA systems: Major improvements from bug fixes in multi-process architecture
- 8-core system: 86.41 GB/s aggregate throughput (C4-16)
- Maximum aggregate: 324.72 GB/s on 48-core dual-NUMA system (C4-96 with compress=2.0)
Streaming Benchmark (v0.1.5) - 100 GB Test
Comparison of streaming random data generation methods on a 12-core system:
| Method | Throughput | Speedup vs Baseline | Memory Required |
|---|---|---|---|
| os.urandom() (baseline) | 0.34 GB/s | 1.0x | Minimal |
| NumPy Multi-Thread | 1.06 GB/s | 3.1x | 100 GB RAM* |
| Numba JIT Xoshiro256++ (streaming) | 57.11 GB/s | 165.7x | 32 MB RAM |
| dgen-py v0.1.5 (streaming) | 58.46 GB/s | 169.6x | 32 MB RAM |
* NumPy requires full dataset in memory (10 GB tested, would need 100 GB for 100 GB dataset)
Key Findings:
- dgen-py matches Numba's streaming performance (58.46 vs 57.11 GB/s)
- 55x faster than NumPy while using 3,000x less memory (32 MB vs 100 GB)
- Streaming architecture: Can generate unlimited data with only 32 MB RAM
- Per-core throughput: 4.87 GB/s (12 cores)
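The Key Findings ratios can be re-derived from the table with a little arithmetic. This sketch only reuses the published numbers and measures nothing. Note that 100 GB / 32 MB is 3,200x if both are read as binary units (GiB/MiB) and 3,125x as decimal units, so "3,000x" is a rounded-down figure either way.

```python
# Re-derive the "Key Findings" ratios from the streaming benchmark table.
# All inputs are copied from the table; nothing here is newly measured.

GIB = 1024**3
MIB = 1024**2

throughput_gbps = {
    "numpy_mt": 1.06,
    "numba_xoshiro": 57.11,
    "dgen_py": 58.46,
}


def speedup(method: str, baseline: str) -> float:
    """Ratio of two throughputs taken from the table."""
    return throughput_gbps[method] / throughput_gbps[baseline]


# dgen-py vs NumPy multi-thread
vs_numpy = speedup("dgen_py", "numpy_mt")

# Full 100 GB dataset held in RAM (NumPy) vs a 32 MB streaming buffer,
# both interpreted as binary units here
memory_ratio = (100 * GIB) / (32 * MIB)

# Aggregate throughput spread over the 12-core test machine
per_core = throughput_gbps["dgen_py"] / 12

print(f"vs NumPy: {vs_numpy:.1f}x")
print(f"memory: {memory_ratio:,.0f}x less")
print(f"per core: {per_core:.2f} GB/s")
```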
⚠️ Critical for Storage Testing: Of the methods compared here, only dgen-py supports configurable deduplication and compression ratios. The others (os.urandom, NumPy, Numba) generate purely random data with maximum entropy, which makes them unsuitable for realistic storage system testing. Real-world storage workloads require controllable data characteristics to exercise deduplication engines, compression algorithms, and storage efficiency features, and none of the other compared methods provide them.
Multi-NUMA Benchmarks (v0.1.5) - GCP Emerald Rapids
Scalability testing on Google Cloud Platform Intel Emerald Rapids systems (1024 GB workload, compress=1.0):
| Instance | Physical Cores | NUMA Nodes | Aggregate Throughput | Per-Core | Scaling Efficiency |
|---|---|---|---|---|---|
| C4-8 | 4 | 1 (UMA) | 36.26 GB/s | 9.07 GB/s | Baseline |
| C4-16 | 8 | 1 (UMA) | 86.41 GB/s | 10.80 GB/s | 119% |
| C4-32 | 16 | 1 (UMA) | 162.78 GB/s | 10.17 GB/s | 112% |
| C4-96 | 48 | 2 (NUMA) | 248.53 GB/s | 5.18 GB/s | 51%* |
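* Computed relative to C4-32's per-core throughput (10.17 GB/s); relative to the C4-8 baseline used for the other rows, C4-96 reaches about 57%. NUMA penalty: roughly 49% lower per-core throughput than the single-node C4-32 on this multi-socket system, but C4-96 still achieves the highest absolute throughput.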
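Per-core throughput is aggregate throughput divided by physical cores, and scaling efficiency is that per-core rate relative to a baseline row. The sketch below recomputes both from the table values so the choice of baseline is explicit; it reports efficiency against both C4-8 and C4-32.

```python
# Recompute per-core throughput and scaling efficiency from the
# Multi-NUMA table (values copied from the table, not re-measured).
rows = {
    # instance: (physical_cores, aggregate_gbps)
    "C4-8": (4, 36.26),
    "C4-16": (8, 86.41),
    "C4-32": (16, 162.78),
    "C4-96": (48, 248.53),
}


def per_core(instance: str) -> float:
    """Aggregate throughput divided by physical cores, in GB/s per core."""
    cores, aggregate = rows[instance]
    return aggregate / cores


def efficiency(instance: str, baseline: str = "C4-8") -> float:
    """Per-core throughput relative to a baseline row (1.0 means linear scaling)."""
    return per_core(instance) / per_core(baseline)


for name in rows:
    print(
        f"{name}: {per_core(name):.2f} GB/s/core, "
        f"{efficiency(name):.0%} of C4-8, "
        f"{efficiency(name, 'C4-32'):.0%} of C4-32"
    )
```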
Key Findings:
- Excellent UMA scaling: 112-119% efficiency on single-NUMA systems (super-linear due to larger L3 cache)
- Per-core performance: 10.80 GB/s on C4-16 (3.0x improvement vs dgen-py v0.1.3's 3.60 GB/s)
- Compression tradeoff: compress=2.0 provides 1.3-1.5x speedup, but makes data compressible (choose based on your test requirements, not performance)
- Storage headroom: Even modest 8-core systems exceed 86 GB/s (far beyond typical storage requirements)
See docs/BENCHMARK_RESULTS_V0.1.5.md for complete analysis
Installation
From PyPI (Recommended)
pip install dgen-py
System Requirements
For NUMA support (Linux only):
# Ubuntu/Debian
sudo apt-get install libudev-dev libhwloc-dev
# RHEL/CentOS/Fedora
sudo yum install systemd-devel hwloc-devel
Note: NUMA support is optional. Without these libraries, the package still works on single-NUMA systems (workstations, cloud VMs).
Quick Start
Basic Usage
import dgen_py
import time

# Generate 100 GB of random data with configurable characteristics
gen = dgen_py.Generator(
    size=100 * 1024**3,   # 100 GB
    dedup_ratio=1.0,      # No deduplication
    compress_ratio=1.0,   # Incompressible
    numa_mode="auto",     # Auto-detect NUMA topology
    max_threads=None,     # Use all available cores
)

# Create buffer (uses optimal chunk size automatically)
buffer = bytearray(gen.chunk_size)

# Stream data in chunks (zero-copy, parallel generation)
total_bytes = 0
start = time.perf_counter()
while not gen.is_complete():
    nbytes = gen.fill_chunk(buffer)
    if nbytes == 0:
        break
    total_bytes += nbytes
    # Write to file/network: memoryview(buffer)[:nbytes] (a bytearray slice would copy)
duration = time.perf_counter() - start
print(f"Throughput: {total_bytes / 1024**3 / duration:.2f} GB/s")
Example output (8-core system):
Throughput: 86.41 GB/s
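The loop above leaves the write step as a comment. Here is a minimal sketch of streaming to a file object, assuming the Generator API shown (chunk_size, is_complete(), fill_chunk()); the helper name stream_to_file is hypothetical. It writes through a memoryview because slicing a bytearray (buffer[:nbytes]) copies the bytes, which would undo the zero-copy benefit.

```python
import time
from typing import BinaryIO


def stream_to_file(gen, out: BinaryIO) -> float:
    """Drain `gen` into `out` and return throughput in GiB/s.

    Hypothetical helper. Assumes the Generator API shown above:
    gen.chunk_size, gen.is_complete(), gen.fill_chunk(buffer) -> bytes filled.
    """
    buffer = bytearray(gen.chunk_size)
    view = memoryview(buffer)  # slicing a memoryview shares memory, no copy
    total = 0
    start = time.perf_counter()
    while not gen.is_complete():
        nbytes = gen.fill_chunk(buffer)
        if nbytes == 0:
            break
        out.write(view[:nbytes])  # write only the filled portion
        total += nbytes
    elapsed = time.perf_counter() - start
    return total / 1024**3 / elapsed if elapsed > 0 else float("inf")
```

For example, open a file with open("data.bin", "wb") and pass it together with a dgen_py.Generator(...) instance. The returned figure then includes the cost of the write itself, so expect it to be bounded by the target device rather than by generation.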
Reproducible Data Generation (NEW in v0.1.6)
import dgen_py

# Generate reproducible data with a fixed seed
gen1 = dgen_py.Generator(
    size=10 * 1024**3,  # 10 GB
    seed=12345,         # Optional: enables reproducibility
)

# Same seed produces identical data
gen2 = dgen_py.Generator(
    size=10 * 1024**3,
    seed=12345,         # Same seed = identical data
)

# Without seed (default), data is non-deterministic
gen3 = dgen_py.Generator(
    size=10 * 1024**3,  # seed=None (default)
)
Use cases for reproducible mode:
- Reproducible benchmarking and testing
- Consistent test data across CI/CD runs
- Debugging with identical data streams
- Verifiable data generation for compliance
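One way to confirm that two seeded generators really produce the same stream is to hash everything each one emits and compare digests. This is a minimal sketch, assuming the is_complete()/fill_chunk() API and gen.chunk_size shown above; stream_digest is a hypothetical helper, not part of dgen-py. Both generators are drained with the same buffer size, since this document does not say whether the output depends on it.

```python
import hashlib


def stream_digest(gen) -> str:
    """SHA-256 hex digest over the full stream produced by `gen`.

    Hypothetical helper; assumes gen.chunk_size, gen.is_complete()
    and gen.fill_chunk(buffer) -> bytes filled, as in the examples above.
    """
    h = hashlib.sha256()
    buffer = bytearray(gen.chunk_size)
    view = memoryview(buffer)
    while not gen.is_complete():
        nbytes = gen.fill_chunk(buffer)
        if nbytes == 0:
            break
        h.update(view[:nbytes])  # hash only the filled portion, without copying
    return h.hexdigest()
```

With the generators above, stream_digest(gen1) == stream_digest(gen2) would be expected to hold, and the comparison with gen3 would be expected to fail. Hashing consumes each generator, so create fresh ones for the actual workload.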
Dynamic Seed Changes (NEW in v0.1.7)
import dgen_py
gen = dgen_py.Generator(size=100 * 1024**3, seed=1111)
buffer = bytearray(10 * 1024**2)
# Generate data with seed A
gen.set_seed(1111)
gen.fill_chunk(buffer) # Pattern A
# Switch to seed B
gen.set_seed(2222)
gen.fill_chunk(buffer) # Pattern B
# Back to seed A - resets the stream!
gen.set_seed(1111)
gen.fill_chunk(buffer) # SAME as first chunk (pattern A)
Use cases:
- RAID stripe testing with alternating patterns
- Multi-phase AI/ML workloads (header/payload/footer)
- Complex reproducible data patterns
- Low-overhead stream reset
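The header/payload/footer pattern mentioned above can be expressed as a list of seeds, re-seeding before each phase. This sketch relies only on set_seed() and fill_chunk() as described in this section; fill_phases is a hypothetical helper name. It copies each chunk out with bytes(...) because the same buffer is reused for every phase.

```python
from typing import List, Sequence


def fill_phases(gen, buffer: bytearray, phase_seeds: Sequence[int]) -> List[bytes]:
    """Produce one chunk per phase, re-seeding before each phase.

    Hypothetical helper. Relies on the behaviour described above:
    set_seed(s) resets the stream, so a repeated seed repeats its pattern.
    """
    chunks = []
    for seed in phase_seeds:
        gen.set_seed(seed)
        nbytes = gen.fill_chunk(buffer)
        chunks.append(bytes(buffer[:nbytes]))  # copy: the buffer is reused next phase
    return chunks
```

For example, fill_phases(gen, bytearray(10 * 1024**2), [1111, 2222, 1111]) should return three chunks where the first and last are identical, per the stream-reset behaviour described above.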
System Information
import dgen_py

info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")
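get_system_info() is guarded with if info:, which suggests it may return nothing on some platforms. As a stdlib-only cross-check on Linux, NUMA nodes can be counted from sysfs, where each node appears as a nodeN directory under /sys/devices/system/node. count_numa_nodes is a hypothetical helper; it treats a missing sysfs tree as a single (UMA) node.

```python
import os
import re


def count_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count NUMA nodes from Linux sysfs (hypothetical helper).

    Returns 1 when the directory cannot be read (non-Linux, containers
    without sysfs), i.e. it assumes a single UMA node in that case.
    """
    try:
        entries = os.listdir(sysfs_root)
    except OSError:
        return 1
    nodes = [e for e in entries if re.fullmatch(r"node\d+", e)]
    return max(len(nodes), 1)
```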
Advanced Usage
Multi-Process NUMA (For Multi-NUMA Systems)
For maximum throughput on multi-socket systems, use one Python process per NUMA node with process affinity pinning.
See python/examples/benchmark_numa_multiprocess_v2.py for complete implementation.
Key architecture:
- One Python process per NUMA node
- Process pinning via os.sched_setaffinity() to local cores
- Local memory allocation on each NUMA node
- Synchronized start with multiprocessing.Barrier
Results:
- C4-96 (48 cores, 2 NUMA nodes): 248.53 GB/s aggregate
- C4-32 (16 cores, 1 NUMA node): 162.78 GB/s with 112% scaling efficiency
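The bundled example script is the reference implementation. As a rough, hypothetical sketch of the same architecture (not the contents of that script): each node gets its own process, which pins itself to that node's CPUs (read from /sys/devices/system/node/nodeN/cpulist) with os.sched_setaffinity(), creates a Generator with numa_node=N, and waits on a shared multiprocessing.Barrier before generating. It is Linux-only and assumes the numa_node parameter shown under NUMA Modes; all function names are hypothetical.

```python
import multiprocessing as mp
import os


def parse_cpulist(text: str) -> list:
    """Parse a Linux cpulist string such as '0-3,8,10-11' into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node: int) -> list:
    """CPUs local to `node`, read from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def worker(node: int, size: int, barrier) -> None:
    # Pin first, so threads created afterwards inherit the affinity mask.
    os.sched_setaffinity(0, node_cpus(node))
    import dgen_py  # imported inside the worker process

    gen = dgen_py.Generator(size=size, numa_node=node)  # numa_node: see NUMA Modes
    buffer = bytearray(gen.chunk_size)
    barrier.wait()  # synchronized start across all nodes
    while not gen.is_complete():
        if gen.fill_chunk(buffer) == 0:
            break


def run_per_node(num_nodes: int, size_per_node: int) -> None:
    """Start one pinned generator process per NUMA node and wait for all."""
    barrier = mp.Barrier(num_nodes)
    procs = [
        mp.Process(target=worker, args=(n, size_per_node, barrier))
        for n in range(num_nodes)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Call run_per_node(...) from inside an if __name__ == "__main__": block so it also works with the spawn start method. New Linux threads inherit their creator's affinity mask; whether dgen-py creates its thread pool after the pinning step is an assumption.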
Performance Notes
Chunk Size Optimization
Default chunk size is automatically optimized for your system. You can override if needed:
gen = dgen_py.Generator(
    size=100 * 1024**3,
    chunk_size=64 * 1024**2,  # Override to 64 MB
)
Newer CPUs (Emerald Rapids, Sapphire Rapids) with larger L3 cache benefit from 64 MB chunks.
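The note above ties the 64 MB chunk size to L3 cache size. On Linux you can check your L3 size from sysfs before overriding chunk_size; this sketch reads /sys/devices/system/cpu/cpuN/cache/indexM/level and size. The helper names are hypothetical, and this document gives no exact L3 threshold for switching to 64 MB, so the decision is left to you.

```python
import os


def parse_cache_size(text: str) -> int:
    """Convert a sysfs cache size string such as '32768K' to bytes."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def l3_cache_bytes(cpu: int = 0):
    """L3 size in bytes for `cpu` from Linux sysfs, or None if unavailable."""
    base = f"/sys/devices/system/cpu/cpu{cpu}/cache"
    try:
        entries = sorted(os.listdir(base))
    except OSError:
        return None
    for entry in entries:
        if not entry.startswith("index"):
            continue
        try:
            with open(os.path.join(base, entry, "level")) as f:
                if f.read().strip() != "3":
                    continue
            with open(os.path.join(base, entry, "size")) as f:
                return parse_cache_size(f.read())
        except OSError:
            continue  # entry without readable level/size files
    return None
```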
Deduplication and Compression Ratios
Performance vs Test Accuracy Tradeoff:
# FAST: Incompressible data (1.0x baseline)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=1.0,   # Incompressible data
)

# FASTER: More compressible (1.3-1.5x speedup)
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (no performance impact)
    compress_ratio=2.0,   # 2:1 compressible data
)
Important: Higher compress_ratio values improve generation performance (1.3-1.5x faster) BUT make the data more compressible, which may not represent your actual workload:
- compress_ratio=1.0: Incompressible data (realistic for encrypted files, compressed archives)
- compress_ratio=2.0: 2:1 compressible data (realistic for text, logs, uncompressed images)
- compress_ratio=3.0+: Highly compressible data (may not be realistic)
Choose based on YOUR test requirements, not performance numbers. If testing storage with compression enabled, use compress_ratio=1.0 to avoid inflating storage efficiency metrics.
Note: dedup_ratio has negligible performance impact (< 1% variance).
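To check what a given dedup_ratio/compress_ratio setting actually produces, you can measure a generated buffer. This sketch estimates dedup as total/unique fixed-size blocks (SHA-256 per block) and compression as original/zlib-compressed size. The 4 KiB block size and zlib are assumptions: dgen-py's own definition of the ratios, and the block size and compressor of the storage system under test, may differ, so treat the results as approximate.

```python
import hashlib
import zlib


def measure_ratios(data, block_size: int = 4096) -> tuple:
    """Estimate (dedup_ratio, compress_ratio) for a bytes-like buffer.

    dedup:    total fixed-size blocks / unique blocks (by SHA-256)
    compress: original size / zlib-compressed size (level 6)
    Block size and compressor are assumptions, not dgen-py's definitions.
    """
    blocks = [bytes(data[i:i + block_size]) for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    dedup = len(blocks) / len(unique) if unique else 1.0
    compressed = zlib.compress(data, 6)
    compress = len(data) / len(compressed)
    return dedup, compress
```

For example, pass memoryview(buffer)[:nbytes] after a fill_chunk() call; zlib and hashlib accept any bytes-like object.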
NUMA Modes
# Auto-detect topology (recommended)
gen = dgen_py.Generator(..., numa_mode="auto")
# Force UMA (single-socket)
gen = dgen_py.Generator(..., numa_mode="uma")
# Manual NUMA node binding (multi-process only)
gen = dgen_py.Generator(..., numa_node=0) # Bind to node 0
Architecture
Zero-Copy Implementation
Python buffer protocol with direct memory access:
- No data copying between Rust and Python
- GIL released during generation (true parallelism)
- Memoryview creation < 0.001ms (verified zero-copy)
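The zero-copy claim also depends on how the Python side slices the buffer. A bytearray slice such as buffer[:nbytes] allocates a copy, while a memoryview slice shares the underlying memory. This stdlib-only snippet demonstrates the difference; it does not touch dgen-py.

```python
buffer = bytearray(16)

# Slicing a bytearray allocates a new object and copies the bytes.
copied = buffer[:8]
copied[0] = 0xFF
print(buffer[0])  # original is untouched

# Slicing a memoryview shares the same memory: no copy is made.
view = memoryview(buffer)[:8]
view[0] = 0xFF
print(buffer[0])  # the write is visible through the original buffer
```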
Parallel Generation
- 4 MiB internal blocks distributed across all cores
- Thread pool created once, reused for all operations
- Xoshiro256++ RNG (5-10x faster than ChaCha20)
- Optimal for L3 cache performance
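For readers unfamiliar with Xoshiro256++, here is a pure-Python sketch of the algorithm as published by Blackman and Vigna, seeded with SplitMix64 (the expansion method its authors recommend for a 64-bit seed). It shows why the generator is cheap: each 64-bit output needs only additions, shifts, XORs and rotations. It is far slower than the Rust implementation and is not expected to reproduce dgen-py's output, since this document does not describe how dgen-py seeds or splits streams across its 4 MiB blocks.

```python
MASK64 = (1 << 64) - 1


def _rotl(x: int, k: int) -> int:
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def splitmix64(seed: int):
    """SplitMix64 stream, used here to expand one u64 seed into 4 state words."""
    x = seed & MASK64
    while True:
        x = (x + 0x9E3779B97F4A7C15) & MASK64
        z = x
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        yield z ^ (z >> 31)


class Xoshiro256PlusPlus:
    def __init__(self, seed: int):
        sm = splitmix64(seed)
        self.s = [next(sm) for _ in range(4)]  # 256-bit state

    def next_u64(self) -> int:
        s = self.s
        result = (_rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = _rotl(s[3], 45)
        return result

    def fill_bytes(self, n: int) -> bytes:
        """Little-endian bytes from successive 64-bit outputs."""
        out = bytearray()
        while len(out) < n:
            out += self.next_u64().to_bytes(8, "little")
        return bytes(out[:n])
```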
NUMA Optimization
- Multi-process architecture (one process per NUMA node)
- Local memory allocation on each node
- Local core affinity (no cross-node traffic)
- Automatic topology detection via hwloc
Use Cases
- Storage benchmarking: Generate realistic test data at roughly 36-325 GB/s, depending on core count and settings (see Performance)
- Network testing: High-throughput data sources
- AI/ML profiling: Simulate data loading pipelines
- Compression testing: Validate compressor behavior with controlled ratios
- Deduplication testing: Test dedup systems with known ratios
License
Dual-licensed under MIT OR Apache-2.0
Credits
- Built with PyO3 and Maturin
- Uses hwlocality for NUMA topology detection
- Xoshiro256++ RNG from rand crate
Download files
Source Distribution
Built Distributions
File details
Details for the file dgen_py-0.1.7.tar.gz.
File metadata
- Download URL: dgen_py-0.1.7.tar.gz
- Upload date:
- Size: 179.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 380ca20aff020e4fbcda05efaeb858d21d6de07265e811421f8a67b2a18d9d4b |
| MD5 | 071c9d94877253c56de1567325029404 |
| BLAKE2b-256 | a94ded419d30b37be17a99986a7f7222edd353e39e114ba76fcddd9c89e1b9d0 |
File details
Details for the file dgen_py-0.1.7-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: dgen_py-0.1.7-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 574.3 kB
- Tags: PyPy, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d2dc8db9ff01d380e31462ac3bc856683e0a805cd6fa8c29b667f389d67e258 |
| MD5 | 57567d15d5569a9c53e2fc2f2670f350 |
| BLAKE2b-256 | f0cfb644f53131a1c5536aee0013c024213ead4bfd1468f69e76d711907e6a58 |
File details
Details for the file dgen_py-0.1.7-cp314-cp314-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: dgen_py-0.1.7-cp314-cp314-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 573.9 kB
- Tags: CPython 3.14, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24d6d06b1382054753dae1a18a775a4d95dcefd44cebb789e9870bbda58f1bcd |
| MD5 | 971eee351fd62287e3a85d24cea269b1 |
| BLAKE2b-256 | fa278941f9f50fda320bc3976066bdaffa5d8ac1d048360bf4b0829c3bcfd99c |
File details
Details for the file dgen_py-0.1.7-cp313-cp313-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: dgen_py-0.1.7-cp313-cp313-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 574.2 kB
- Tags: CPython 3.13, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d9a1d08953bdfcb987a25fb8db03a3ae89e749964953f2258c847a71f0631ea |
| MD5 | 56906f51e84375083ce57aeebf887ff1 |
| BLAKE2b-256 | b34e3f719912c1189f7c32ae85aa23d092d87e8fbfb5128aad37a1c0fcb85696 |
File details
Details for the file dgen_py-0.1.7-cp312-cp312-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: dgen_py-0.1.7-cp312-cp312-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 574.2 kB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d254051613852ff6e11afe33977e21e0e9224853783048ac04a5dd00b933e8ee |
| MD5 | 5c8ad11f5cb8a7b0eaaa45300e30c046 |
| BLAKE2b-256 | 7a68b067e9a452f4edf57628470ad5a6fe5fd53680599c5ae015253e53880ee0 |
File details
Details for the file dgen_py-0.1.7-cp311-cp311-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: dgen_py-0.1.7-cp311-cp311-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 574.0 kB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80399879091ee50dbca45ce6a47dc5f917ee8668fa6564320dc2c5605d9434bb |
| MD5 | 54661b545a327f80a627e12511daaf27 |
| BLAKE2b-256 | 87d7ea81c3c1409b72473e7ff325a78f383fb106cd0694a8db2f4e0b60be9c3d |
File details
Details for the file dgen_py-0.1.7-cp310-cp310-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: dgen_py-0.1.7-cp310-cp310-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 574.2 kB
- Tags: CPython 3.10, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02f9b822123a9d33095e9cd18c2c0f4c939da57d63712a6e98068f66ebf71f2d |
| MD5 | 5454eac4451445408d2eb95c37b75c8b |
| BLAKE2b-256 | 3a887a4302080b3d753deed3ab4bef5f0420f4900d73d68ab7d6162fdb34398c |