High-performance random data generation with NUMA optimization and zero-copy Python interface
dgen-rs / dgen-py
High-performance random data generation with controllable deduplication, compression, and NUMA optimization
Features
- 🚀 Blazing Fast: 40-50 GB/s on 12 cores (3.5-4 GB/s per core), projected to scale linearly to 1,500+ GB/s on 384 cores
- 🎯 Controllable Characteristics:
- Deduplication ratios (1:1 to N:1)
- Compression ratios (1:1 to N:1)
- 🔬 NUMA-Aware: Automatic topology detection and optimization on multi-socket systems
- 🐍 True Zero-Copy Python API: Direct buffer writes with GIL release for maximum performance
- 📦 Both One-Shot and Streaming: Single-call or incremental generation with parallel execution
- 🧵 Thread Pool Reuse: Created once, reused for all operations (eliminates overhead)
- 🛠️ Built with Rust: Memory-safe, production-quality code
Performance
Development System (12 cores):
- Python: 43.25 GB/s (3.60 GB/s per core)
- Native Rust: 47.18 GB/s (3.93 GB/s per core)
HPC System (384 cores, projected):
- Expected throughput: 1,384-1,500 GB/s
- Perfect for high-speed storage testing (easily exceeds 80 GB/s targets)
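The 384-core figures are an extrapolation from the measured 12-core rates. The short sketch below reproduces that arithmetic, assuming perfectly linear scaling (real multi-socket systems may fall short of it); the helper name `project` is illustrative only.

```python
# Measured aggregate throughput on the 12-core development system (GB/s).
measured = {"Python": 43.25, "Native Rust": 47.18}
DEV_CORES = 12
HPC_CORES = 384


def project(total_gbps: float, cores: int, target_cores: int) -> tuple[float, float]:
    """Return (per-core rate, linearly projected rate on target_cores)."""
    per_core = total_gbps / cores
    return per_core, per_core * target_cores


for name, gbps in measured.items():
    per_core, projected = project(gbps, DEV_CORES, HPC_CORES)
    print(f"{name}: {per_core:.2f} GB/s per core -> {projected:,.0f} GB/s on {HPC_CORES} cores")
```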
System Requirements
Runtime Dependencies
No runtime dependencies for basic UMA (non-NUMA) usage.
NUMA Support (Optional)
For NUMA-aware allocation and optimization, the following system libraries are required:
Ubuntu/Debian:
sudo apt-get install libudev-dev libhwloc-dev
RHEL/CentOS/Fedora:
sudo yum install systemd-devel hwloc-devel
macOS:
brew install hwloc
Note: Without these libraries, the NUMA feature will not compile. The library will fall back to UMA (uniform memory access) mode, which still provides excellent performance on single-socket systems.
Build Dependencies
- Rust: 1.90 or later
- Python: 3.10 or later (for Python bindings)
- maturin: `pip install maturin` (for building Python wheels)
Quick Start
Python Installation
# Install from PyPI (when published)
pip install dgen-py
# Or build from source
cd dgen-rs
./build_pyo3.sh
pip install ./target/wheels/*.whl
Python Usage
Simple API (generate all at once):
import dgen_py
# Generate 100 MiB incompressible data
data = dgen_py.generate_buffer(100 * 1024 * 1024)
print(f"Generated {len(data)} bytes")
# Generate with 2:1 dedup and 3:1 compression
data = dgen_py.generate_buffer(
size=100 * 1024 * 1024,
dedup_ratio=2.0,
compress_ratio=3.0,
numa_mode="auto",
max_threads=None # Use all cores
)
Zero-Copy API (write into existing buffer):
import dgen_py
# Pre-allocate buffer (32 MB is optimal)
buf = bytearray(32 * 1024 * 1024) # 32 MB
# Generate directly into buffer (TRUE zero-copy!)
nbytes = dgen_py.generate_into_buffer(
buf,
dedup_ratio=1.0,
compress_ratio=1.0,
numa_mode="auto",
max_threads=None
)
print(f"Wrote {nbytes} bytes")
Streaming API (incremental generation with parallel execution):
import dgen_py
# Create generator for 1 TB
gen = dgen_py.Generator(
size=1024**4, # 1 TB
dedup_ratio=1.0,
compress_ratio=1.0,
numa_mode="auto", # Auto-detect NUMA topology
max_threads=None # Use all cores
)
# Optimal chunk size: 32 MB (default, empirically tested)
# Can override with chunk_size parameter if needed
buf = bytearray(gen.chunk_size) # Uses recommended 32 MB
while not gen.is_complete():
    nbytes = gen.fill_chunk(buf) # Zero-copy parallel generation
    if nbytes == 0:
        break
    # Write to storage (buf[:nbytes])
    # file.write(buf[:nbytes])
# Expected performance: 40-50 GB/s on 12 cores, 1,500+ GB/s on 384 cores
Key Performance Tips:
- Default 32 MB chunks provide optimal performance (16% faster than 64 MB)
- Can override with `chunk_size` parameter: `Generator(..., chunk_size=64*1024*1024)`
- Chunks < 8 MB fall back to sequential generation (much slower)
- `numa_mode="auto"` optimizes for multi-socket systems
- Thread pool is reused across all `fill_chunk()` calls (zero overhead)
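One detail in the streaming loop above is worth spelling out: slicing a `bytearray` (`buf[:nbytes]`) copies the data, whereas slicing a `memoryview` of it does not. The sketch below shows the write loop using a `memoryview`. `FakeGenerator` and `stream_to` are hypothetical stand-ins that only mimic the `is_complete()` and `fill_chunk()` calls shown above so the example runs without dgen-py; they are not the library's implementation, and the tiny sizes are for illustration only.

```python
import io
import os


class FakeGenerator:
    """Hypothetical stand-in mimicking the streaming calls shown above."""

    def __init__(self, size: int, chunk_size: int = 1024):
        self.size = size
        self.chunk_size = chunk_size
        self._produced = 0

    def is_complete(self) -> bool:
        return self._produced >= self.size

    def fill_chunk(self, buf) -> int:
        # Fill at most len(buf) bytes, or whatever remains of the total size.
        n = min(len(buf), self.size - self._produced)
        buf[:n] = os.urandom(n)
        self._produced += n
        return n


def stream_to(out, gen) -> int:
    """Drain gen into the binary file-like object out; return bytes written."""
    buf = bytearray(gen.chunk_size)
    view = memoryview(buf)  # slicing a memoryview does not copy
    total = 0
    while not gen.is_complete():
        nbytes = gen.fill_chunk(buf)
        if nbytes == 0:
            break
        out.write(view[:nbytes])  # buf[:nbytes] would copy the chunk first
        total += nbytes
    return total


sink = io.BytesIO()
written = stream_to(sink, FakeGenerator(size=2500, chunk_size=1024))
print(f"Wrote {written} bytes")
```

With a real file opened in binary mode, `out.write(view[:nbytes])` hands the buffer to the write call without an intermediate Python-level copy.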
**NUMA Information**:
```python
import dgen_py
info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")
```
Rust Usage
use dgen_rs::{generate_data_simple, GeneratorConfig, DataGenerator};
// Simple API
let data = generate_data_simple(100 * 1024 * 1024, 1, 1);
// Full configuration
let config = GeneratorConfig {
    size: 100 * 1024 * 1024,
    dedup_factor: 2,
    compress_factor: 3,
    numa_aware: true,
};
// Clone so `config` can be reused below (assumes GeneratorConfig implements Clone)
let data = dgen_rs::generate_data(config.clone());
// Streaming
let mut gen = DataGenerator::new(config);
let mut chunk = vec![0u8; 8192];
while !gen.is_complete() {
    let written = gen.fill_chunk(&mut chunk);
    if written == 0 {
        break;
    }
    // Process chunk...
}
How It Works
Deduplication
Deduplication ratio N means:
- Generate `total_blocks / N` unique blocks
- Reuse blocks in round-robin fashion
- Example: 100 blocks, dedup=2 → 50 unique blocks, repeated 2x each
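The round-robin reuse described above can be sketched in a few lines. This is an illustration of the idea, not the library's code: `block_layout` is a hypothetical helper, and the rounding used when the block count is not divisible by the ratio is an assumption.

```python
def block_layout(total_blocks: int, dedup_ratio: int) -> list[int]:
    """Map each output block to the index of the unique block it reuses."""
    # Assumption: round down, but always keep at least one unique block.
    unique_blocks = max(1, total_blocks // dedup_ratio)
    # Round-robin: output block i reuses unique block i mod unique_blocks.
    return [i % unique_blocks for i in range(total_blocks)]


layout = block_layout(total_blocks=100, dedup_ratio=2)
unique = set(layout)
print(f"{len(unique)} unique blocks, each used {layout.count(0)} times")
# → 50 unique blocks, each used 2 times
```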
Compression
Compression ratio N means:
- Fill block with high-entropy Xoshiro256++ keystream
- Add local back-references to achieve N:1 compressibility
- Example: compress=3 → zstd will compress to ~33% of original size
compress=1: Truly incompressible (zstd ratio ~1.00-1.02)
compress>1: Target ratio via local back-refs, evenly distributed
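To make the idea of back-references concrete, here is a simplified sketch that builds a block in which one segment out of every N is random and the others repeat recent data, then measures the result. It is an approximation for illustration: it uses Python's `random` module and `zlib` from the standard library rather than Xoshiro256++ and zstd, and `make_block`, `measured_ratio`, and the 1 KiB segment size are assumptions, not the library's layout.

```python
import random
import zlib

SEGMENT = 1024  # bytes per segment; repeats stay well inside zlib's 32 KiB window


def make_block(size: int, compress_ratio: int, seed: int = 0) -> bytes:
    """Build a block where 1 in `compress_ratio` segments is random and the
    rest repeat the most recent random segment (a nearby back-reference)."""
    rng = random.Random(seed)
    out = bytearray()
    index = 0
    last_random = b""
    while len(out) < size:
        n = min(SEGMENT, size - len(out))
        if index % compress_ratio == 0:
            last_random = rng.randbytes(n)  # fresh high-entropy data
            out += last_random
        else:
            out += last_random[:n]  # repeat of recent data
        index += 1
    return bytes(out)


def measured_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


for target in (1, 3):
    block = make_block(64 * 1024, target)
    print(f"target {target}:1 -> measured {measured_ratio(block):.2f}:1")
```

The measured ratio lands a little under the target because the compressor still has to encode each back-reference and cannot shrink the random segments.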
NUMA Optimization
On multi-socket systems (NUMA nodes > 1):
- Detects topology via `/sys/devices/system/node` (Linux)
- Can pin rayon threads to specific NUMA nodes (optional)
- Ensures memory locality for maximum bandwidth
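As a rough illustration of sysfs-based detection, the sketch below counts the `nodeN` directories under `/sys/devices/system/node` and treats a missing directory as a single-node (UMA) system. `count_numa_nodes` is a hypothetical helper written for this example; the library's own detection is implemented in Rust and may differ.

```python
import re
import tempfile
from pathlib import Path


def count_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count nodeN directories under sysfs_root; report 1 (UMA) if none exist."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return 1  # non-Linux or sysfs unavailable: treat as a single node
    nodes = [
        p for p in root.iterdir()
        if p.is_dir() and re.fullmatch(r"node\d+", p.name)
    ]
    return max(1, len(nodes))


print(f"NUMA nodes on this machine: {count_numa_nodes()}")

# Simulate a two-socket layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("node0", "node1", "power"):
        (Path(tmp) / name).mkdir()
    print(f"Simulated layout: {count_numa_nodes(tmp)} nodes")
```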
Performance
Typical throughput on modern CPUs:
- Incompressible (compress=1): 5-15 GB/s per core
- Compressible (compress=3): 1-4 GB/s per core
- Multi-core: Near-linear scaling with rayon
Benchmark on AMD EPYC 7742 (64 cores):
Incompressible: ~500 GB/s (all cores)
Compress 3:1: ~150 GB/s (all cores)
Algorithm Details
Based on s3dlio's data_gen_alt.rs:
- Block-level generation: 4 MiB blocks processed in parallel
- Xoshiro256++: 5-10x faster than ChaCha20; statistically strong, but not a cryptographically secure generator
- Integer error accumulation: Even compression distribution
- No cross-block compression: Realistic compressor behavior
- Per-call entropy: Unique data across distributed nodes
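"Integer error accumulation" is not spelled out here. One common reading is a Bresenham-style scheme that spreads a total across slots using only integer math, so no slot differs from another by more than one unit. The sketch below shows that technique under this assumption; `spread_evenly` is a hypothetical helper, not a function from dgen-rs or s3dlio.

```python
def spread_evenly(total: int, parts: int) -> list[int]:
    """Split `total` units across `parts` slots using only integer math.

    Each slot gets the base share; the remainder accumulates as an error term
    and an extra unit is emitted whenever it reaches `parts`.
    """
    base, remainder = divmod(total, parts)
    shares = []
    error = 0
    for _ in range(parts):
        share = base
        error += remainder
        if error >= parts:
            error -= parts
            share += 1
        shares.append(share)
    return shares


shares = spread_evenly(total=10, parts=4)
print(shares, sum(shares))
# → [2, 3, 2, 3] 10
```

Because the error term is carried forward rather than rounded away, the shares always sum to the exact total.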
Use Cases
- Storage benchmarking: Generate realistic test data
- Network testing: High-throughput data sources
- AI/ML profiling: Simulate data loading pipelines
- Compression testing: Validate compressor behavior
- Deduplication testing: Test dedup ratios
Building from Source
# Clone repository
git clone https://github.com/russfellows/dgen-rs.git
cd dgen-rs
# Build Rust library
cargo build --release
# Build Python wheel
maturin build --release
# Install locally
maturin develop --release
# Run tests
cargo test
python -m pytest python/tests/
Requirements
- Rust: 1.90+ (edition 2021)
- Python: 3.10+ (for Python bindings)
- Platform: Linux (required for NUMA detection)
License
Dual-licensed under MIT OR Apache-2.0
Credits
See Also
- s3dlio: High-performance multi-protocol storage I/O
- sai3-bench: Multi-protocol I/O benchmarking suite
- kv-cache-bench: LLM KV cache storage benchmarking