dgen-py

High-performance random data generation with NUMA optimization and zero-copy Python interface

Features

🚀 Blazing Fast: 40+ GB/s on 12 cores, 126 GB/s on 96 cores, 188 GB/s on 368 cores
🎯 Controllable Characteristics: Configurable deduplication and compression ratios
🔬 Multi-Process NUMA: One Python process per NUMA node for maximum throughput
🐍 True Zero-Copy: Python buffer protocol with direct memory access (no data copying)
📦 Streaming API: Generate terabytes of data with constant memory usage
🧵 Thread Pool Reuse: Created once, reused across all operations
🛠️ Built with Rust: Memory-safe, production-quality implementation

Performance

Real-World Benchmarks (v0.1.3)

Multi-NUMA Systems (one Python process per NUMA node):

System	Cores	NUMA Nodes	Throughput	Per-Core	Efficiency
GCP C4-16	16	1 (UMA)	39.87 GB/s	2.49 GB/s	100% (baseline)
GCP C4-96	96	4	126.96 GB/s	1.32 GB/s	53%
Azure HBv5	368	16	188.24 GB/s	0.51 GB/s	20%

Single-NUMA Systems (one Python process):

System	Cores	Throughput	Per-Core	Notes
Workstation	12	41.23 GB/s	3.44 GB/s	Development system, UMA

Key Findings:

Sub-linear scaling is expected for memory-intensive workloads (memory bandwidth becomes the bottleneck)
The multi-NUMA systems (C4-96 and HBv5) exceed the 80 GB/s storage testing requirement
Maximum throughput: 188 GB/s on the 368-core HBv5 system
Strong single-node performance: about 40 GB/s on a 12-core workstation and on a 16-core cloud VM
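
The per-core and efficiency columns follow from simple arithmetic on the measured throughput. A minimal sketch that recomputes them from the table values (the function and variable names are illustrative and not part of dgen-py):

```python
# Recompute per-core throughput and scaling efficiency from the table above.
# Efficiency = per-core throughput relative to the 16-core UMA baseline.
BENCHMARKS = [
    ("GCP C4-16", 16, 39.87),
    ("GCP C4-96", 96, 126.96),
    ("Azure HBv5", 368, 188.24),
]

def scaling_rows(rows):
    """Return (system, per_core_gbps, efficiency) using the first row as baseline."""
    baseline = rows[0][2] / rows[0][1]
    result = []
    for system, cores, gbps in rows:
        per_core = gbps / cores
        result.append((system, per_core, per_core / baseline))
    return result

for system, per_core, eff in scaling_rows(BENCHMARKS):
    print(f"{system:<12} {per_core:5.2f} GB/s per core  {eff:6.1%} of baseline")
```

Computed from the unrounded throughput values, the HBv5 efficiency comes out at about 20.5%, which the table reports as 20%.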

Installation

From PyPI (Recommended)

pip install dgen-py

System Requirements

For NUMA support (Linux only):

# Ubuntu/Debian
sudo apt-get install libudev-dev libhwloc-dev

# RHEL/CentOS/Fedora
sudo yum install systemd-devel hwloc-devel

Note: NUMA support is optional. Without these libraries, the package works perfectly on single-NUMA systems (workstations, cloud VMs).

Quick Start

Basic Usage (Fastest - No Dedup/Compression)

import dgen_py

# Generate 100 GB of random data (incompressible, no dedup)
gen = dgen_py.Generator(
    size=100 * 1024**3,      # 100 GB
    dedup_ratio=1.0,         # No deduplication (fastest)
    compress_ratio=1.0,      # Incompressible (fastest)
    numa_mode="auto",        # Auto-detect NUMA topology
    max_threads=None         # Use all available cores
)

# Create buffer (uses optimal 32 MB chunk size)
buffer = bytearray(gen.chunk_size)

# Stream data in chunks (zero-copy, parallel generation)
while not gen.is_complete():
    nbytes = gen.fill_chunk(buffer)
    if nbytes == 0:
        break
    # Write to file/network: buffer[:nbytes]
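
One way to consume the chunks is to write them straight to a file or socket. The sketch below assumes only the generator interface shown above (`chunk_size`, `is_complete()`, `fill_chunk()`); `stream_to` is a hypothetical helper, not part of dgen-py. It writes through a `memoryview` because slicing a `bytearray` directly (`buffer[:nbytes]`) creates a copy of the slice.

```python
def stream_to(gen, sink):
    """Write everything `gen` produces to `sink`; return total bytes written.

    `gen` is any object exposing chunk_size, is_complete() and
    fill_chunk(buffer) -> int, as in the loop above.
    `sink` is any buffered binary file-like object with a write() method.
    """
    buffer = bytearray(gen.chunk_size)
    view = memoryview(buffer)      # slices of a memoryview do not copy
    total = 0
    while not gen.is_complete():
        nbytes = gen.fill_chunk(buffer)
        if nbytes == 0:
            break
        sink.write(view[:nbytes])  # only the valid part of the buffer
        total += nbytes
    return total
```

Typical use would be `with open("test.bin", "wb") as f: stream_to(gen, f)`, where `test.bin` is a placeholder path.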

Performance Example (Actual Results)

import dgen_py
import time

# 100 GB incompressible test
TEST_SIZE = 100 * 1024**3

gen = dgen_py.Generator(
    size=TEST_SIZE,
    dedup_ratio=1.0,         # No deduplication
    compress_ratio=1.0,      # Incompressible
    numa_mode="auto",
    max_threads=None
)

buffer = bytearray(gen.chunk_size)
start = time.perf_counter()

while not gen.is_complete():
    nbytes = gen.fill_chunk(buffer)
    if nbytes == 0:
        break

duration = time.perf_counter() - start
throughput = (TEST_SIZE / 1024**3) / duration

print(f"Duration: {duration:.2f} seconds")
print(f"Throughput: {throughput:.2f} GB/s")

Complete benchmark output (12-core workstation):

NUMA nodes: 1
Physical cores: 12
Deployment: UMA (single NUMA node - cloud VM or workstation)

Starting Benchmark: 3 runs of 100 GB each
Using ZERO-COPY PARALLEL STREAMING

============================================================
TEST 1: DEFAULT CHUNK SIZE (should use optimal 32 MB)
============================================================
Using chunk size: 32 MB
------------------------------------------------------------
Run 01: 3.0401 seconds | 32.89 GB/s
Run 02: 2.1536 seconds | 46.43 GB/s
Run 03: 2.0826 seconds | 48.02 GB/s
------------------------------------------------------------
AVERAGE DURATION:   2.4254 seconds
AVERAGE THROUGHPUT: 41.23 GB/s
PER-CORE THROUGHPUT: 3.44 GB/s

============================================================
TEST 2: OVERRIDE CHUNK SIZE TO 64 MB
============================================================
Using chunk size: 64 MB
------------------------------------------------------------
Run 01: 2.2696 seconds | 44.06 GB/s
Run 02: 2.2647 seconds | 44.16 GB/s
Run 03: 2.2709 seconds | 44.04 GB/s
------------------------------------------------------------
AVERAGE DURATION:   2.2684 seconds
AVERAGE THROUGHPUT: 44.08 GB/s
PER-CORE THROUGHPUT: 3.67 GB/s

============================================================
COMPARISON
============================================================
32 MB (default): 41.23 GB/s
64 MB (override): 44.08 GB/s
64 MB is 6.5% faster than 32 MB

OPTIMIZATION NOTES:
  - Thread pool created ONCE and reused
  - ZERO-COPY: Generates directly into output buffer
  - Internal parallelization: 4 MiB blocks (optimal for L3 cache)
  - Parallel generation distributes blocks across all available cores
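
The averages in this output appear to be derived from the mean run duration rather than from the mean of the per-run rates, and the final percentage appears to be taken relative to the 64 MB result. A small sketch reproducing the arithmetic from the logged durations (helper names are illustrative):

```python
from statistics import mean

SIZE_GB = 100  # each run generates 100 GB

def summarize(durations, size_gb=SIZE_GB):
    """Return (average duration, throughput based on the average duration)."""
    avg = mean(durations)
    return avg, size_gb / avg

runs_32mb = [3.0401, 2.1536, 2.0826]
runs_64mb = [2.2696, 2.2647, 2.2709]

avg32, gbps32 = summarize(runs_32mb)
avg64, gbps64 = summarize(runs_64mb)

# Mean of the per-run rates, for comparison with the duration-based figure
mean_of_rates_32 = mean(SIZE_GB / d for d in runs_32mb)

# The same gap expressed against either baseline
gain_vs_32mb = (gbps64 - gbps32) / gbps32
gain_vs_64mb = (gbps64 - gbps32) / gbps64

print(f"32 MB: {avg32:.4f} s, {gbps32:.2f} GB/s (mean of rates: {mean_of_rates_32:.2f})")
print(f"64 MB: {avg64:.4f} s, {gbps64:.2f} GB/s")
print(f"Gap: {gain_vs_32mb:.1%} of the 32 MB rate, {gain_vs_64mb:.1%} of the 64 MB rate")
```

The first 32 MB run is noticeably slower than the other two, which pulls the 32 MB average down; the 64 MB runs are much more consistent.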

System Information

import dgen_py

info = dgen_py.get_system_info()
if info:
    print(f"NUMA nodes: {info['num_nodes']}")
    print(f"Physical cores: {info['physical_cores']}")
    print(f"Deployment: {info['deployment_type']}")

# Example output (12-core workstation):
# NUMA nodes: 1
# Physical cores: 12
# Deployment: UMA (single NUMA node - cloud VM or workstation)
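
On Linux, the node count can be cross-checked against sysfs without any extra libraries. This is a sketch using only the standard library; `count_numa_nodes` is a hypothetical helper and is not how dgen-py detects topology (the Credits section attributes that to hwloc).

```python
import os
import re
from pathlib import Path

def count_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Count nodeN directories under sysfs; fall back to 1 if unavailable."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return 1  # non-Linux or sysfs not mounted: treat as UMA
    nodes = [p for p in root.iterdir() if re.fullmatch(r"node\d+", p.name)]
    return max(len(nodes), 1)

print(f"NUMA nodes (sysfs): {count_numa_nodes()}")
print(f"Logical CPUs: {os.cpu_count()}")
```

Note that `os.cpu_count()` reports logical CPUs, which can be higher than the physical core count when SMT is enabled.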

Advanced Usage

Multi-Process NUMA (For Multi-NUMA Systems)

For maximum throughput on multi-socket systems, use one Python process per NUMA node:

import time
from multiprocessing import Process, Queue, Barrier
import dgen_py

def worker_process(numa_node: int, barrier: Barrier, result_queue: Queue):
    """One process per NUMA node for maximum performance"""
    gen = dgen_py.Generator(
        size=100 * 1024**3,      # 100 GB per process
        dedup_ratio=1.0,         # No deduplication
        compress_ratio=1.0,      # Incompressible
        numa_node=numa_node,     # Bind to specific NUMA node
        max_threads=None
    )
    
    buffer = bytearray(gen.chunk_size)
    barrier.wait()  # Synchronized start
    
    start = time.perf_counter()
    while not gen.is_complete():
        nbytes = gen.fill_chunk(buffer)
        if nbytes == 0:
            break
        # Write buffer[:nbytes] to storage
    
    duration = time.perf_counter() - start
    result_queue.put({'numa_node': numa_node, 'duration': duration})

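# The guard is required when the start method is not "fork"
if __name__ == "__main__":
    # Detect NUMA topology
    num_numa_nodes = dgen_py.detect_numa_nodes()

    # Spawn one process per NUMA node
    barrier = Barrier(num_numa_nodes)
    result_queue = Queue()

    processes = [
        Process(target=worker_process, args=(i, barrier, result_queue))
        for i in range(num_numa_nodes)
    ]

    for p in processes:
        p.start()

    # Collect one result per worker before joining so the queue is drained
    results = [result_queue.get() for _ in processes]

    for p in processes:
        p.join()

    # Reported aggregate throughput:
    # On C4-96 (4 NUMA nodes): 126.96 GB/s aggregate
    # On HBv5 (16 NUMA nodes): 188.24 GB/s aggregate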
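
The workers above report only their own duration. One plausible way to turn those into an aggregate figure, assuming every worker generates the same number of bytes and all start together at the barrier, is to divide the total bytes by the slowest worker's duration. `aggregate_throughput` is an illustrative helper, not part of dgen-py, and the durations in the example are made up.

```python
def aggregate_throughput(results, bytes_per_worker):
    """Aggregate GB/s across workers that started at the same time.

    `results` is a list of dicts with a 'duration' key (seconds), as put on
    the queue by worker_process. Wall-clock time is the slowest worker.
    """
    if not results:
        raise ValueError("no results to aggregate")
    wall_clock = max(r["duration"] for r in results)
    total_bytes = bytes_per_worker * len(results)
    return total_bytes / 1024**3 / wall_clock

# Example with made-up durations for four workers of 100 GB each
sample = [{"numa_node": i, "duration": d} for i, d in enumerate([3.1, 3.2, 3.0, 3.2])]
print(f"{aggregate_throughput(sample, 100 * 1024**3):.2f} GB/s")
```

Summing per-worker rates instead would overstate the result whenever workers finish at different times.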

Performance Notes

Chunk Size Optimization

32 MB is the default chunk size and a good general choice, but you can override it:

gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,
    compress_ratio=1.0,
    chunk_size=64 * 1024**2  # Override to 64 MB
)

Benchmark results (12-core workstation, 100 GB test):

32 MB chunks: 41.23 GB/s (3.44 GB/s per core)
64 MB chunks: 44.08 GB/s (3.67 GB/s per core)
Difference: 64 MB is roughly 6.5-7% faster on this system (6.5% measured against the 64 MB rate, 6.9% against the 32 MB rate)

Deduplication and Compression

For maximum performance, use dedup_ratio=1.0 and compress_ratio=1.0:

# FASTEST: No deduplication, incompressible
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=1.0,      # No dedup (fastest)
    compress_ratio=1.0    # Incompressible (fastest)
)

Higher ratios reduce throughput:

# SLOWER: With dedup and compression
gen = dgen_py.Generator(
    size=100 * 1024**3,
    dedup_ratio=2.0,      # 2:1 deduplication
    compress_ratio=3.0    # 3:1 compression
)
# Throughput will be lower due to processing overhead
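
To check that generated data has the characteristics you asked for, you can measure it independently. The sketch below estimates a compression ratio with `zlib` and a deduplication ratio by hashing fixed-size blocks. The 4 KiB block size and both helper names are assumptions for illustration; this README does not state which block size or compressor the ratios are defined against, so measured values may differ from the configured ones.

```python
import hashlib
import os
import zlib

def compression_ratio(data, level=6):
    """Uncompressed size divided by zlib-compressed size (data must be non-empty)."""
    return len(data) / len(zlib.compress(data, level))

def dedup_ratio(data, block_size=4096):
    """Total blocks divided by unique blocks, compared by SHA-256 digest."""
    view = memoryview(data)
    digests = [
        hashlib.sha256(view[off:off + block_size]).digest()
        for off in range(0, len(view), block_size)
    ]
    return len(digests) / len(set(digests))

# os.urandom stands in for generator output here
random_part = os.urandom(256 * 1024)
print(f"random data:      {compression_ratio(random_part):.2f}:1")
print(f"half zero-filled: {compression_ratio(random_part + bytes(len(random_part))):.2f}:1")
print(f"repeated twice:   {dedup_ratio(random_part * 2):.1f}:1 dedup")
```

In practice you would pass a chunk filled by `fill_chunk` instead of `os.urandom` output.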

NUMA Modes

# Auto-detect topology (recommended)
gen = dgen_py.Generator(..., numa_mode="auto")

# Force UMA (single-socket)
gen = dgen_py.Generator(..., numa_mode="uma")

# Manual NUMA node binding (multi-process only)
gen = dgen_py.Generator(..., numa_node=0)  # Bind to node 0

Architecture

Zero-Copy Implementation

Python buffer protocol with direct memory access:

No data copying between Rust and Python
GIL released during generation (true parallelism)
Memoryview creation takes < 0.001 ms (verified zero-copy)
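
The buffer protocol is what lets native code write into memory that Python already owns. The standard-library sketch below illustrates the idea with `memoryview`; it does not use dgen-py itself, and `fill_pattern` is a hypothetical stand-in for a native generator.

```python
def fill_pattern(target, value):
    """Stand-in for a native fill: writes into `target` in place."""
    view = memoryview(target)
    view[:] = bytes([value]) * len(view)
    return len(view)

buffer = bytearray(16)
window = memoryview(buffer)[4:8]   # a view onto bytes 4..7, not a copy

fill_pattern(window, 0xFF)         # writes land in the original bytearray

copied = buffer[4:8]               # slicing a bytearray DOES copy
copied[0] = 0x00                   # so this leaves `buffer` unchanged

print(buffer.hex())                # prints 00000000ffffffff0000000000000000
```

This is why consumers that want to avoid copies should slice a `memoryview` of the buffer rather than the `bytearray` itself.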

Parallel Generation

4 MiB internal blocks distributed across all cores
Thread pool created once, reused for all operations
Xoshiro256++ RNG (5-10x faster than ChaCha20)
Optimal for L3 cache performance
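
The block-splitting idea can be illustrated in plain Python. This is a sketch of the technique only, not the Rust implementation: `os.urandom` stands in for the per-block RNG, a `ThreadPoolExecutor` stands in for the native thread pool, and the helper names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024**2  # 4 MiB, the internal block size quoted above

def block_ranges(total, block_size=BLOCK_SIZE):
    """Split `total` bytes into (offset, length) blocks; the last may be short."""
    return [(off, min(block_size, total - off)) for off in range(0, total, block_size)]

def fill_parallel(buffer, pool, block_size=BLOCK_SIZE):
    """Fill `buffer` in place, one task per block, on an existing pool."""
    view = memoryview(buffer)

    def fill(offset, length):
        # Each task writes only its own disjoint slice of the buffer
        view[offset:offset + length] = os.urandom(length)

    futures = [pool.submit(fill, off, n) for off, n in block_ranges(len(view), block_size)]
    for fut in futures:
        fut.result()  # propagate any worker exception
    return len(view)

# The pool is created once and reused for every chunk
pool = ThreadPoolExecutor(max_workers=os.cpu_count())
chunk = bytearray(32 * 1024**2)
for _ in range(2):
    fill_parallel(chunk, pool)
pool.shutdown()
print(len(block_ranges(len(chunk))), "blocks per 32 MiB chunk")
```

With 4 MiB blocks, a 32 MiB chunk yields 8 independent work items and a 64 MiB chunk yields 16.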

NUMA Optimization

Multi-process architecture (one process per NUMA node)
Local memory allocation on each node
Local core affinity (no cross-node traffic)
Automatic topology detection via hwloc
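
For readers who want to see what per-node core affinity looks like at the OS level, here is a Linux-only sketch using the standard library. It is not how dgen-py binds threads internally; `parse_cpulist` and `pin_to_node` are hypothetical helpers that read the kernel's per-node `cpulist` file from sysfs.

```python
import os

def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def pin_to_node(node, sysfs_root="/sys/devices/system/node"):
    """Restrict the current process to the CPUs of one NUMA node (Linux only)."""
    path = os.path.join(sysfs_root, f"node{node}", "cpulist")
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)   # pid 0 means the calling process
    return cpus
```

Affinity only controls where threads run. Memory placement then typically follows Linux's default local-allocation policy, where pages are allocated on the node of the CPU that first touches them.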

Use Cases

Storage benchmarking: Generate realistic test data at 40-188 GB/s
Network testing: High-throughput data sources
AI/ML profiling: Simulate data loading pipelines
Compression testing: Validate compressor behavior with controlled ratios
Deduplication testing: Test dedup systems with known ratios

License

Dual-licensed under MIT OR Apache-2.0

Credits

Built with PyO3 and Maturin
Uses hwlocality for NUMA topology detection
Xoshiro256++ RNG from rand crate

