Python client for gpuemu GPU-less validation

gpuemu

Catch silently-wrong GPU kernels from Python — before they reach production.

This package is the Python client for gpuemu, a GPU-less correctness oracle for deep-learning kernels. It plugs into PyTorch, JAX, and TensorFlow and validates your CUDA/Triton kernels against a high-precision fp64 reference, using op-schema-aware adversarial inputs to find the silent numerical bugs that torch.allclose misses.

The problem

The industry-standard correctness check for a GPU kernel is one line:

torch.allclose(my_kernel(x), reference(x), atol=1e-5, rtol=1e-2)

One shape, one dtype, one seed. In a measured 26-op corpus that oracle accepts 9/9 LLM-style buggy kernels — tail-mask leaks, accumulator-scale bugs, missing normalisation, online-softmax rescale errors — as "correct". Those kernels then ship and run at scale: GPU-hours wasted on broken work, quality regressions that survive months of green CI.
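
To make the failure mode concrete, here is a minimal sketch in plain Python (no GPU, no PyTorch). `buggy_blocked_sum` is a hypothetical stand-in for a kernel with a tail-handling bug, and `allclose` re-implements the scalar form of the `torch.allclose` rule, |a - b| <= atol + rtol * |b|. None of these names come from gpuemu; the point is only that a single, conveniently sized shape can hide the bug.

```python
import math
import random

BLOCK = 4  # tile size of the hypothetical kernel


def reference_sum(xs):
    """High-precision reference: exactly rounded float summation."""
    return math.fsum(xs)


def buggy_blocked_sum(xs):
    """Sums full tiles of BLOCK elements but silently drops a partial tail tile."""
    full = len(xs) - len(xs) % BLOCK
    total = 0.0
    for start in range(0, full, BLOCK):
        total += sum(xs[start:start + BLOCK])
    return total


def allclose(a, b, atol=1e-5, rtol=1e-2):
    """Scalar version of the torch.allclose acceptance rule."""
    return abs(a - b) <= atol + rtol * abs(b)


def check(n, seed):
    """Run the buggy kernel on n seeded inputs and compare with the reference."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 2.0) for _ in range(n)]
    return allclose(buggy_blocked_sum(xs), reference_sum(xs))


# One shape that happens to be a multiple of BLOCK: the bug is invisible.
single_shape_ok = check(8, seed=0)

# Sweeping a few neighbouring shapes exposes it immediately.
swept = {n: check(n, seed=0) for n in (8, 9, 10, 11, 12)}

print("single shape passes:", single_shape_ok)
print("shape sweep:", swept)
```

The single-shape check passes because 8 is a multiple of the tile size; lengths 9, 10, and 11 each lose at least one element and fail.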

gpuemu replaces that one-line check with an operator-aware regime that, in study P1, caught all 9 of those bugs across 5 GPU classes with zero false positives on controls.

Install

pip install gpuemu                 # core client
pip install "gpuemu[torch]"        # + PyTorch adapter
pip install "gpuemu[jax]"          # + JAX adapter
pip install "gpuemu[tensorflow]"   # + TensorFlow adapter
pip install "gpuemu[all]"          # everything

The client talks to the gpuemu daemon over IPC and will start one on demand. To run the daemon yourself, install the CLI: cargo install gpuemu.

Quick start

from gpuemu import Client

client = Client()

# my_flash_attn is a placeholder for the kernel under test; import or define it first.
# Fuzz with op-schema-aware inputs and an fp64 reference oracle.
results = client.fuzz_op_client_side(
    "flash_attention",
    run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
    iterations=100,
    value_distribution="adversarial",   # the P3 default (99% bug recall)
)

print(f"Passed: {results.passed}/{results.total}")

A failure reports the seed, dtype, shape, and a base64 snapshot of the failing input — re-run it byte-for-byte from any machine, with or without a GPU. The client's SeededRng is bit-identical to the Rust daemon, so reproduction is exact across languages.
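
The idea of replaying a failure from a seed or from a base64 snapshot can be sketched with the standard library alone. This is an illustration under stated assumptions: `random.Random` stands in for the generator (it is not gpuemu's `SeededRng` and will not match the daemon's stream), and the snapshot layout (little-endian float64 values) is made up for the example, not gpuemu's wire format.

```python
import base64
import random
import struct


def generate_inputs(seed, count):
    """Deterministically derive a test input from a seed (stand-in generator)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


def to_snapshot(values):
    """Pack values as little-endian float64 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def from_snapshot(snapshot):
    """Decode a snapshot back into the exact float64 values."""
    raw = base64.b64decode(snapshot)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


seed = 1234
original = generate_inputs(seed, 16)
snapshot = to_snapshot(original)

# Two independent ways to get the failing input back, bit for bit.
replayed_from_seed = generate_inputs(seed, 16)
replayed_from_snapshot = from_snapshot(snapshot)

print(replayed_from_seed == original, replayed_from_snapshot == original)
```

Carrying both the seed and the snapshot is redundant on purpose: the seed is short enough to paste into a bug report, while the snapshot still works if the generator ever changes.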

Execution modes

# 1. Client-side (recommended): your code runs the GPU op; gpuemu validates.
results = client.fuzz_op_client_side(
    "matmul",
    run_op=lambda i: torch.matmul(i["a"], i["b"]),
    iterations=100,
)

# 2. Daemon-orchestrated: fetch cases, run them yourself, submit outputs.
for case in client.get_test_batch("my_op", count=50):
    out = my_gpu_op(case["inputs"])
    client.submit_output("my_op", case["inputs"], out, case["seed"])

# 3. Reproduce / minimise a known failure; `seed` is the value from the failure report.
repro = client.reproduce(seed)
small = client.minimize(seed)
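
The snippet above does not say how a failure gets minimised. As a rough illustration of the general idea, here is a generic greedy shrink over a single size parameter. It is not gpuemu's algorithm; `shrink` and the `fails` predicates are hypothetical names introduced for this sketch.

```python
def shrink(size, fails):
    """Greedily reduce `size` while the failure predicate still holds.

    Tries halving first (fast progress), then decrementing (fine progress),
    and stops at a size where neither smaller candidate still fails.
    Assumes fails(size) is True for the starting size.
    """
    current = size
    improved = True
    while improved:
        improved = False
        for candidate in (current // 2, current - 1):
            if 1 <= candidate < current and fails(candidate):
                current = candidate
                improved = True
                break
    return current


# A kernel that breaks whenever the length is not a multiple of 4.
tail_bug = lambda n: n % 4 != 0
# A kernel that breaks only once the input reaches 7 elements.
threshold_bug = lambda n: n >= 7

smallest_tail = shrink(10, tail_bug)
smallest_threshold = shrink(100, threshold_bug)

print(smallest_tail, smallest_threshold)
```

A greedy shrink finds a local minimum, not necessarily the global one, which is usually enough to turn a large failing case into something readable.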

What you get

Feature	What it does
fp64 reference oracle	Validates kernel output against a high-precision CPU reference per dtype
Op-schema-aware fuzzing	Boundary + regular + adversarial input distributions, per op
Calibrated tolerances	`calibrate_tolerance()` / `get_recommended_tolerance()` — p95-of-controls × 1.5 envelope (P2: 65% → 82% recall)
Deterministic RNG	`SeededRng` reproduces failures byte-for-byte, identical to the Rust daemon
Framework adapters	PyTorch, JAX, TensorFlow — `from gpuemu.frameworks.pytorch import validate_pytorch`
Static lint	`client.lint_kernel(...)` surfaces PTX/SASS register pressure and spills
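
The "p95-of-controls × 1.5" envelope in the table can be read as a simple formula: take the 95th percentile of the errors observed on known-good control kernels and multiply by a safety factor of 1.5. The sketch below shows that arithmetic with the standard library; `envelope_from_controls` is a hypothetical helper, not the implementation behind `calibrate_tolerance()`, and the control errors are synthetic.

```python
import statistics


def envelope_from_controls(control_errors, safety_factor=1.5):
    """Return p95 of the control errors multiplied by a safety factor."""
    # With n=100, quantiles() returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(control_errors, n=100, method="inclusive")
    p95 = cuts[94]
    return p95 * safety_factor


def within_envelope(error, tolerance):
    """A kernel passes when its observed error stays inside the envelope."""
    return error <= tolerance


# Synthetic control errors in arbitrary units: 0.0, 1.0, ..., 100.0.
control_errors = [float(i) for i in range(101)]
tolerance = envelope_from_controls(control_errors)

print("tolerance:", tolerance)                     # 95.0 * 1.5 = 142.5
print("error 120 ok:", within_envelope(120.0, tolerance))
print("error 150 ok:", within_envelope(150.0, tolerance))
```

Deriving the tolerance from controls rather than hand-picking `atol`/`rtol` keeps the threshold just above the noise that correct kernels actually produce.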

The research backing (P1–P4)

Each default is anchored to a measured study: fp64 oracle (P1: 9/9 bugs caught, 0 false positives), calibrated tolerances (P2: 65% → 82% recall, +17 pp), adversarial fuzzing (P3: 99% recall), and PTX lint (P4). See The Evidence.
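
Why an adversarial input distribution matters can be shown without a GPU. In this sketch, `one_pass_variance` is a hypothetical stand-in for a numerically fragile kernel (the textbook E[x²] - E[x]² formula), and the two input generators are illustrative guesses at what "regular" and "adversarial" might mean; they are not gpuemu's distributions.

```python
import random
import statistics


def one_pass_variance(xs):
    """E[x^2] - E[x]^2: cheap, but cancels catastrophically for large offsets."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def reference_variance(xs):
    """Reference population variance from the standard library."""
    return statistics.pvariance(xs)


def allclose(a, b, atol=1e-5, rtol=1e-2):
    """Scalar version of the torch.allclose acceptance rule."""
    return abs(a - b) <= atol + rtol * abs(b)


def regular_inputs(rng, n):
    """Well-scaled values around zero."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def adversarial_inputs(rng, n):
    """Same spread, but shifted by a huge offset to provoke cancellation."""
    return [1e9 + rng.uniform(-1.0, 1.0) for _ in range(n)]


def passes(make_inputs, seed=0, n=1000):
    xs = make_inputs(random.Random(seed), n)
    return allclose(one_pass_variance(xs), reference_variance(xs))


regular_ok = passes(regular_inputs)
adversarial_ok = passes(adversarial_inputs)

print("regular inputs pass:", regular_ok)
print("adversarial inputs pass:", adversarial_ok)
```

With well-scaled inputs the fragile formula agrees with the reference; with the shifted inputs the two squared terms sit near 1e18, where adjacent doubles are 128 apart, so the subtraction cannot recover a variance of roughly 0.33.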

Documentation

Quick start: 5-minute first validation
Project docs: docs.skelfresearch.com/gpuemu
Source & issues: github.com/Skelf-Research/gpuemu

Development

pip install -e ".[dev]"
pytest -v          # 11 tests, +7 daemon-live tests

License

Dual-licensed under MIT or Apache 2.0 at your option.

Project details

This version: 0.1.0 (Jun 18, 2026)

