High-performance random data generation with NUMA optimization and zero-copy Python interface

dgen-rs / dgen-py

High-performance random data generation with controllable deduplication, compression, and NUMA optimization

License: MIT OR Apache-2.0

Features

  • 🚀 Blazing Fast: 40-50 GB/s on 12 cores (3.5-4 GB/s per core), projected to reach ~1,400-1,500 GB/s on 384 cores if scaling stays linear
  • 🎯 Controllable Characteristics:
    • Deduplication ratios (1:1 to N:1)
    • Compression ratios (1:1 to N:1)
  • 🔬 NUMA-Aware: Automatic topology detection and optimization on multi-socket systems
  • 🐍 True Zero-Copy Python API: Direct buffer writes with GIL release for maximum performance
  • 📦 Both One-Shot and Streaming: Single-call or incremental generation with parallel execution
  • 🧵 Thread Pool Reuse: Created once and reused for all operations (no per-call thread-creation overhead)
  • 🛠️ Built with Rust: Memory-safe, production-quality code

Performance

Development System (12 cores):

  • Python: 43.25 GB/s (3.60 GB/s per core)
  • Native Rust: 47.18 GB/s (3.93 GB/s per core)

HPC System (384 cores, projected):

  • Projected throughput: 1,384-1,500 GB/s (linear extrapolation of the 12-core per-core rates)
  • Comfortably exceeds 80 GB/s targets for high-speed storage testing
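The 384-core figures follow from multiplying the measured per-core rates by the core count. A small sketch of that arithmetic, assuming perfectly linear scaling (memory bandwidth and cross-socket traffic on a real 384-core machine may reduce it); the `project` helper is hypothetical:

```python
def project(measured_gbps: float, measured_cores: int, target_cores: int) -> float:
    """Linearly extrapolate throughput from one core count to another."""
    per_core = measured_gbps / measured_cores
    return per_core * target_cores


if __name__ == "__main__":
    # Measured on the 12-core development system
    for name, gbps in [("Python", 43.25), ("Native Rust", 47.18)]:
        print(f"{name}: {gbps / 12:.2f} GB/s per core, "
              f"~{project(gbps, 12, 384):,.0f} GB/s on 384 cores")
```

This reproduces the 3.60 and 3.93 GB/s per-core figures and projects about 1,384 and 1,510 GB/s; the upper value is rounded to 1,500 above.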

System Requirements

Runtime Dependencies

No runtime dependencies for basic UMA (non-NUMA) usage.

NUMA Support (Optional)

For NUMA-aware allocation and optimization, the following system libraries are required:

Ubuntu/Debian:

sudo apt-get install libudev-dev libhwloc-dev

RHEL/CentOS/Fedora:

sudo yum install systemd-devel hwloc-devel

macOS:

brew install hwloc

Note: Without these libraries, the optional NUMA feature cannot be compiled. Builds without it run in UMA (uniform memory access) mode, which still provides excellent performance on single-socket systems.

Build Dependencies

  • Rust: 1.90 or later
  • Python: 3.10 or later (for Python bindings)
  • maturin: pip install maturin (for building Python wheels)

Quick Start

Python Installation

# Install from PyPI
pip install dgen-py

# Or build from source
cd dgen-rs
./build_pyo3.sh
pip install ./target/wheels/*.whl

Python Usage

Simple API (generate all at once):

import dgen_py

# Generate 100 MiB incompressible data
data = dgen_py.generate_buffer(100 * 1024 * 1024)
print(f"Generated {len(data)} bytes")

# Generate with 2:1 dedup and 3:1 compression
data = dgen_py.generate_buffer(
    size=100 * 1024 * 1024,
    dedup_ratio=2.0,
    compress_ratio=3.0,
    numa_mode="auto",
    max_threads=None  # Use all cores
)

Zero-Copy API (write into existing buffer):

import dgen_py

# Pre-allocate buffer (32 MB is optimal)
buf = bytearray(32 * 1024 * 1024)  # 32 MB

# Generate directly into the buffer (zero-copy: no intermediate bytes object)
nbytes = dgen_py.generate_into_buffer(
    buf, 
    dedup_ratio=1.0,
    compress_ratio=1.0,
    numa_mode="auto",
    max_threads=None
)
print(f"Wrote {nbytes} bytes")

Streaming API (incremental generation with parallel execution):

import dgen_py

# Create generator for 1 TiB
gen = dgen_py.Generator(
    size=1024**4,  # 1 TiB
    dedup_ratio=1.0,
    compress_ratio=1.0,
    numa_mode="auto",  # Auto-detect NUMA topology
    max_threads=None   # Use all cores
)

# Optimal chunk size: 32 MB (default, empirically tested)
# Can override with chunk_size parameter if needed
buf = bytearray(gen.chunk_size)  # Uses recommended 32 MB

while not gen.is_complete():
    nbytes = gen.fill_chunk(buf)  # Zero-copy parallel generation
    if nbytes == 0:
        break
    
    # Write only the filled part to storage; slicing a memoryview avoids a copy:
    # f.write(memoryview(buf)[:nbytes])

# Expected performance: 40-50 GB/s on 12 cores; ~1,400-1,500 GB/s projected on 384 cores
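The streaming loop relies on a simple contract: each `fill_chunk()` call returns how many bytes it wrote, the last chunk may be partial, and only `buf[:nbytes]` should be written out. The sketch below exercises that contract without dgen-py installed. `FakeGenerator` and `stream_to` are hypothetical; the stand-in is backed by `os.urandom`, which (unlike dgen-py as described above) allocates a temporary on every call.

```python
import io
import os


class FakeGenerator:
    """Hypothetical stand-in with the same is_complete()/fill_chunk()/chunk_size
    shape as dgen_py.Generator, for illustrating the loop only."""

    def __init__(self, size: int, chunk_size: int = 32 * 1024 * 1024):
        self.size = size
        self.chunk_size = chunk_size
        self.produced = 0

    def is_complete(self) -> bool:
        return self.produced >= self.size

    def fill_chunk(self, buf: bytearray) -> int:
        n = min(len(buf), self.size - self.produced)
        buf[:n] = os.urandom(n)  # stand-in: allocates a temporary, then copies into buf
        self.produced += n
        return n


def stream_to(gen, out) -> int:
    """Drain a generator into a writable binary stream, one chunk at a time."""
    buf = bytearray(gen.chunk_size)
    view = memoryview(buf)
    total = 0
    while not gen.is_complete():
        nbytes = gen.fill_chunk(buf)
        if nbytes == 0:
            break
        out.write(view[:nbytes])  # memoryview slice: the chunk is not copied first
        total += nbytes
    return total


if __name__ == "__main__":
    sink = io.BytesIO()
    written = stream_to(FakeGenerator(10 * 1024 * 1024 + 123, chunk_size=1 << 20), sink)
    print(f"streamed {written} bytes")
```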

Key Performance Tips:

  • Default 32 MB chunks provide optimal performance (16% faster than 64 MB)
  • Can override with chunk_size parameter: Generator(..., chunk_size=64*1024*1024)
  • Chunks < 8 MB fall back to sequential generation (much slower)
  • numa_mode="auto" optimizes for multi-socket systems
  • Thread pool is reused across all fill_chunk() calls (no per-call thread creation)
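The tips above describe one thread pool, created once and reused, that fills each chunk in parallel. The sketch below shows that structure in pure Python under stated assumptions, not dgen-py's implementation: the `ChunkFiller` class and its per-block seeding scheme are hypothetical, Python's `random` module stands in for Xoshiro256++, and the 4 MiB block size is taken from the Algorithm Details section. Because CPython's GIL serializes the RNG work here, it demonstrates the layout rather than the speed.

```python
import random
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, the block size given under Algorithm Details


class ChunkFiller:
    """Hypothetical sketch: fill each chunk block-by-block on a reusable pool."""

    def __init__(self, seed, max_workers=None):
        self.seed = seed
        self.pool = ThreadPoolExecutor(max_workers=max_workers)  # created once
        self.next_block = 0  # block counter that continues across calls

    def _fill_block(self, view, block_index):
        # Derive the RNG from (seed, block index) so the bytes do not depend on
        # which worker runs the block or in what order blocks finish.
        rng = random.Random(f"{self.seed}:{block_index}")
        view[:] = rng.randbytes(len(view))

    def fill_chunk(self, buf):
        view = memoryview(buf)
        futures = []
        for offset in range(0, len(buf), BLOCK_SIZE):
            block = view[offset:offset + BLOCK_SIZE]
            futures.append(self.pool.submit(self._fill_block, block, self.next_block))
            self.next_block += 1
        for fut in futures:
            fut.result()  # wait for the block and re-raise any worker exception
        return len(buf)

    def close(self):
        self.pool.shutdown()
```

Because each block's RNG depends only on the seed and block index, the output is identical for any worker count, and the same pool serves every `fill_chunk()` call.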

NUMA Information:

import dgen_py

info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")

Rust Usage

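use dgen_rs::{generate_data_simple, DataGenerator, GeneratorConfig};

fn main() {
    // Simple API: 100 MiB (the trailing 1, 1 appear to be the dedup and compress factors)
    let _simple = generate_data_simple(100 * 1024 * 1024, 1, 1);

    // Full configuration. A closure builds a fresh config for each use, so the
    // example compiles whether or not GeneratorConfig implements Clone.
    let make_config = || GeneratorConfig {
        size: 100 * 1024 * 1024,
        dedup_factor: 2,
        compress_factor: 3,
        numa_aware: true,
    };
    let _data = dgen_rs::generate_data(make_config());

    // Streaming. The Python tips above note that chunks under 8 MB are
    // generated sequentially, so use a 32 MiB chunk rather than a tiny one.
    let mut generator = DataGenerator::new(make_config());
    let mut chunk = vec![0u8; 32 * 1024 * 1024];
    while !generator.is_complete() {
        let written = generator.fill_chunk(&mut chunk);
        if written == 0 {
            break;
        }
        // Process &chunk[..written] ...
    }
}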

How It Works

Deduplication

Deduplication ratio N means:

  • Generate total_blocks / N unique blocks
  • Reuse blocks in round-robin fashion
  • Example: 100 blocks, dedup=2 → 50 unique blocks, repeated 2x each
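The round-robin mapping can be sketched in a few lines of Python. This is an illustration, not dgen-rs code: the function names are hypothetical, `os.urandom` stands in for the Xoshiro256++ generator, and 4 KiB blocks replace 4 MiB blocks to keep it small. A dedup tool that fingerprints fixed-size blocks should then see the requested ratio, provided its block size and alignment match the generator's.

```python
import hashlib
import os


def dedup_layout(total_blocks, dedup_ratio):
    """Return, for each output block, the index of the unique block it reuses."""
    unique = max(1, total_blocks // dedup_ratio)
    return [i % unique for i in range(total_blocks)]  # round-robin reuse


def build_data(total_blocks, dedup_ratio, block_size=4096):
    layout = dedup_layout(total_blocks, dedup_ratio)
    uniques = [os.urandom(block_size) for _ in range(max(layout) + 1)]
    return b"".join(uniques[u] for u in layout)


def measured_dedup_ratio(data, block_size=4096):
    """Total blocks divided by distinct blocks, using SHA-256 fingerprints."""
    digests = [hashlib.sha256(data[i:i + block_size]).digest()
               for i in range(0, len(data), block_size)]
    return len(digests) / len(set(digests))


if __name__ == "__main__":
    print(measured_dedup_ratio(build_data(100, 2)))  # 2.0
```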

Compression

Compression ratio N means:

  • Fill block with high-entropy Xoshiro256++ keystream
  • Add local back-references to achieve N:1 compressibility
  • Example: compress=3 → zstd will compress to ~33% of original size

compress=1: Truly incompressible (zstd ratio ~1.00-1.02)
compress>1: Target ratio via local back-refs, evenly distributed
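A pure-Python sketch of reaching an N:1 ratio with local back-references and an integer accumulator follows. It makes assumptions that differ from dgen-rs: Python's `random` stands in for Xoshiro256++, `zlib` (DEFLATE, 32 KiB window) stands in for zstd, and the function name, the 256-byte run length, and the 16 KiB look-back are hypothetical. Roughly 1/N of the block is random literals; the remaining bytes copy recent data, which the compressor encodes as short matches.

```python
import random
import zlib


def make_compressible_block(size, compress_ratio, seed, run=256, window=16 * 1024):
    """Sketch: ~1/compress_ratio of the block is random literals; the rest are
    copies of recent bytes (local back-references)."""
    if compress_ratio < 1:
        raise ValueError("compress_ratio must be >= 1")
    rng = random.Random(seed)
    out = bytearray()
    owed = 0  # integer accumulator: copied bytes still owed to hit the target
    while len(out) < size:
        lit = rng.randbytes(min(run, size - len(out)))
        out += lit
        # each literal byte "earns" (compress_ratio - 1) copied bytes
        owed += len(lit) * (compress_ratio - 1)
        while owed >= run and len(out) < size:
            n = min(run, size - len(out))
            lo = max(0, len(out) - window)  # keep the source within the look-back window
            start = rng.randrange(lo, len(out) - n + 1)
            out += out[start:start + n]
            owed -= n
    return bytes(out)


def zlib_ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


if __name__ == "__main__":
    for n in (1, 3):
        block = make_compressible_block(1 << 20, n, seed=1)
        print(f"target {n}:1 -> zlib ratio {zlib_ratio(block):.2f}")
```

The accumulator spreads copies evenly (here: 256 literal bytes, then 512 copied bytes for 3:1), so any sizeable slice of the block compresses at about the same ratio. Match overhead makes the achieved ratio land slightly below the target.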

NUMA Optimization

On multi-socket systems (NUMA nodes > 1):

  • Detects topology via /sys/devices/system/node (Linux)
  • Can pin rayon threads to specific NUMA nodes (optional)
  • Ensures memory locality for maximum bandwidth
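On Linux, each NUMA node appears as a `nodeN` directory under /sys/devices/system/node with a `cpulist` file such as `0-11,24-35`. A minimal sketch of reading that topology follows. The function names are hypothetical and this is not dgen-rs's detection code. A process could then restrict itself to one node's CPUs with `os.sched_setaffinity(0, cpus)`, a Linux-only call in Python's standard library.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8,10-11' into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids; empty dict if sysfs is unavailable."""
    nodes = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if match is None:
            continue
        with open(os.path.join(path, "cpulist")) as f:
            nodes[int(match.group(1))] = parse_cpulist(f.read())
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node {node}: {len(cpus)} CPUs")
```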

Performance

Estimated throughput on modern CPUs (the measured 12-core results at the top of this page work out to about 3.6-3.9 GB/s per core):

  • Incompressible (compress=1): 5-15 GB/s per core
  • Compressible (compress=3): 1-4 GB/s per core
  • Multi-core: Near-linear scaling with rayon

Benchmark on AMD EPYC 7742 (64 cores):

Incompressible:  ~500 GB/s (all cores)
Compress 3:1:    ~150 GB/s (all cores)

Algorithm Details

Based on s3dlio's data_gen_alt.rs:

  1. Block-level generation: 4 MiB blocks processed in parallel
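  2. Xoshiro256++: 5-10x faster than ChaCha20; statistically high quality but not cryptographically secure (test data does not need to be)
  3. Integer error accumulation: Back-references are spread evenly, so compressibility is uniform across each block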
  4. No cross-block compression: Realistic compressor behavior
  5. Per-call entropy: Unique data across distributed nodes
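For reference, the Xoshiro256++ step published by Blackman and Vigna is short enough to write out. This pure-Python version is an illustration only: it is far slower than a Rust implementation and is not taken from s3dlio. Seeding the 256-bit state from one 64-bit value with SplitMix64 follows the algorithm authors' recommendation; the class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl(x, k):
    """Rotate a 64-bit integer left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def splitmix64(seed):
    """Yield SplitMix64 outputs, used to expand a 64-bit seed into a full state."""
    x = seed & MASK64
    while True:
        x = (x + 0x9E3779B97F4A7C15) & MASK64
        z = x
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        yield z ^ (z >> 31)


class Xoshiro256PlusPlus:
    def __init__(self, state):
        self.s = [w & MASK64 for w in state]
        if len(self.s) != 4 or not any(self.s):
            raise ValueError("state must be four 64-bit words, not all zero")

    @classmethod
    def from_seed(cls, seed):
        sm = splitmix64(seed)
        return cls([next(sm) for _ in range(4)])

    def next_u64(self):
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

    def fill_bytes(self, n):
        """Return n bytes of keystream (little-endian 64-bit words)."""
        out = bytearray()
        while len(out) < n:
            out += self.next_u64().to_bytes(8, "little")
        return bytes(out[:n])
```

With state `[1, 2, 3, 4]` the first output is `rotl(1 + 4, 23) + 1 = 41943041`, a quick hand check of the output function.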

Use Cases

  • Storage benchmarking: Generate realistic test data
  • Network testing: High-throughput data sources
  • AI/ML profiling: Simulate data loading pipelines
  • Compression testing: Validate compressor behavior
  • Deduplication testing: Test dedup ratios

Building from Source

# Clone repository
git clone https://github.com/russfellows/dgen-rs.git
cd dgen-rs

# Build Rust library
cargo build --release

# Build Python wheel
maturin build --release

# Install locally
maturin develop --release

# Run tests
cargo test
python -m pytest python/tests/

Requirements

  • Rust: 1.90+ (edition 2021)
  • Python: 3.10+ (for Python bindings)
  • Platform: Linux (NUMA topology detection uses Linux sysfs; published wheels are manylinux x86-64)

License

Dual-licensed under MIT OR Apache-2.0

Credits

See Also

  • s3dlio: High-performance multi-protocol storage I/O
  • sai3-bench: Multi-protocol I/O benchmarking suite
  • kv-cache-bench: LLM KV cache storage benchmarking


Download files

Download the file for your platform.

Source Distribution

dgen_py-0.1.3.tar.gz (152.2 kB)

Uploaded: Source

Built Distributions

dgen_py-0.1.3-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl (558.1 kB)

Uploaded: PyPy, manylinux (glibc 2.28+), x86-64

dgen_py-0.1.3-cp314-cp314-manylinux_2_28_x86_64.whl (558.0 kB)

Uploaded: CPython 3.14, manylinux (glibc 2.28+), x86-64

dgen_py-0.1.3-cp313-cp313-manylinux_2_28_x86_64.whl (558.2 kB)

Uploaded: CPython 3.13, manylinux (glibc 2.28+), x86-64

dgen_py-0.1.3-cp312-cp312-manylinux_2_28_x86_64.whl (558.0 kB)

Uploaded: CPython 3.12, manylinux (glibc 2.28+), x86-64

dgen_py-0.1.3-cp311-cp311-manylinux_2_28_x86_64.whl (558.0 kB)

Uploaded: CPython 3.11, manylinux (glibc 2.28+), x86-64

dgen_py-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl (558.1 kB)

Uploaded: CPython 3.10, manylinux (glibc 2.28+), x86-64

File details

Details for the file dgen_py-0.1.3.tar.gz.

File metadata

  • Download URL: dgen_py-0.1.3.tar.gz
  • Upload date:
  • Size: 152.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for dgen_py-0.1.3.tar.gz
Algorithm Hash digest
SHA256 8c50df4f6fb4be43fbc73f90ceb1b6d5958df949b7b1b054e59407b63168e29a
MD5 4e7914755d12d4dc8beb30fa4e121414
BLAKE2b-256 18ff645e7dba2d462ea0adfce197e1b45b686238e511f37e631ebbfb7cd22109

File details

Details for the file dgen_py-0.1.3-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.3-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1e0574d98c953f984e8bb03d67c0f41dc703f62b97cc9b32cb98184e6f75f87f
MD5 000fe7c46a0584ddd95934c6439c480e
BLAKE2b-256 8fe74f1d39addcd07067ebeafacac0ad0d6349607a0ae50992526b7dbf07d780

File details

Details for the file dgen_py-0.1.3-cp314-cp314-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.3-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1bf75767faf2e9444a8565715e902f1fa3853426cf8e9ac4fbc1c6be01328670
MD5 f6e3d6c1014f0e8773ba10d5a962adc6
BLAKE2b-256 f08ccdb0273059c0439fedeb942268bd505169a9f7f4daa71450d085e716a9ab

File details

Details for the file dgen_py-0.1.3-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.3-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 920a2e792e02ff35a47aaf68af23478d83f426fb8f2cb082bda83176a0afcd78
MD5 847b8aafb29966f61dddb29a7d828d5b
BLAKE2b-256 070d22e860864de94b2d1f57946ef34844002eb6561b2a5d547e1267bf5f0ff9

File details

Details for the file dgen_py-0.1.3-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.3-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2c11465b108e2eb2496fa09dd363533ab841bdb767190415b22bd2f1f0663348
MD5 46c506579fb8b2dc2c21e6e680cf5cdf
BLAKE2b-256 ad337ce3b56e0f775cd0a09b6b573c4b767b9c4c54f04fad42ef4e7f3fb2329e

File details

Details for the file dgen_py-0.1.3-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.3-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ed57b1a2e0ceec0d62d3cd871144e5cc3a22d1aedebc3c0a6619f3a3478778e1
MD5 6a75a8509246fa9becafe864a08e8b70
BLAKE2b-256 f872a96244adc879ca1716958f4af916b9bbebefabac8f7fa564dcae07b481d7

File details

Details for the file dgen_py-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dgen_py-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5cc1b84584dea16731873d1f0f2d9113bdcf6d6617cb3c3eb9f24e4a04440731
MD5 6b4899e8070de04bb7bdb8ae1e041f1c
BLAKE2b-256 d684daad7faa79a9b616c1737c7d520d0acc84f688f00a921f04f46bd89d809d
