
Python client for gpuemu GPU-less validation


gpuemu

Catch silently-wrong GPU kernels from Python — before they reach production.


gpuemu is the Python client for gpuemu, a GPU-less correctness oracle for deep-learning kernels. It plugs into PyTorch, JAX, and TensorFlow and validates your CUDA/Triton kernels against a high-precision fp64 reference with op-schema-aware, adversarial inputs — finding the silent numerical bugs that torch.allclose misses.
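The idea behind an fp64 reference oracle can be shown without a GPU. The sketch below is a stdlib-only illustration, not gpuemu's implementation. It emulates an fp32 "kernel" by rounding every intermediate through struct, computes a correctly rounded fp64 reference with math.fsum, and applies an allclose-style test. The names to_fp32, kernel_sum_fp32, reference_sum_fp64, and within_tolerance are hypothetical.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (IEEE fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def kernel_sum_fp32(xs: list[float]) -> float:
    """Hypothetical stand-in for a GPU kernel: sequential fp32 accumulation.

    Each add is computed in fp64 and then rounded to fp32; for one add of two
    fp32 values this matches native fp32 addition.
    """
    acc = 0.0
    for x in xs:
        acc = to_fp32(acc + to_fp32(x))
    return acc


def reference_sum_fp64(xs: list[float]) -> float:
    """High-precision reference: correctly rounded fp64 sum of the fp32 inputs."""
    return math.fsum(to_fp32(x) for x in xs)


def within_tolerance(out: float, ref: float, atol: float, rtol: float) -> bool:
    """allclose-style check: |out - ref| <= atol + rtol * |ref|."""
    return abs(out - ref) <= atol + rtol * abs(ref)


if __name__ == "__main__":
    xs = [0.1] * 10_000
    ref = reference_sum_fp64(xs)
    out = kernel_sum_fp32(xs)
    print(f"reference={ref!r} kernel={out!r} abs_err={abs(out - ref):.3g}")
```

Over 10,000 adds the fp32 accumulation drifts measurably from the fp64 reference, which is why a useful tolerance has to depend on the dtype and the op rather than being a single fixed pair of numbers.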


The problem

The industry-standard correctness check for a GPU kernel is one line:

torch.allclose(my_kernel(x), reference(x), atol=1e-5, rtol=1e-2)

One shape, one dtype, one seed. In a measured 26-op corpus, that oracle accepts 9/9 LLM-style buggy kernels (tail-mask leaks, accumulator-scale bugs, missing normalisation, online-softmax rescale errors) as "correct". Those kernels then ship and run at scale: GPU-hours wasted on broken work, quality regressions that survive months of green CI.
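To see why a single shape is not enough, here is a pure-Python sketch (no GPU, hypothetical names throughout) of a tiled row sum with a tail-mask bug: lanes past the end of the row contribute a stale value instead of zero. The BLOCK width, the stale value of 1.0, and the allclose helper are illustrative assumptions; the comparison follows the same |a - b| <= atol + rtol * |b| rule as torch.allclose, with the atol and rtol from the one-liner above.

```python
import random

BLOCK = 4  # hypothetical tile width of the toy "kernel"


def row_sum(row: list[float], tail_fill: float) -> float:
    """Tiled row sum; tail_fill is what out-of-range lanes contribute."""
    total = 0.0
    for start in range(0, len(row), BLOCK):
        tile = row[start:start + BLOCK]
        tile = tile + [tail_fill] * (BLOCK - len(tile))
        total += sum(tile)
    return total


def correct_kernel(row: list[float]) -> float:
    return row_sum(row, tail_fill=0.0)  # masked lanes contribute zero


def buggy_kernel(row: list[float]) -> float:
    return row_sum(row, tail_fill=1.0)  # BUG: tail lanes leak a stale value


def allclose(a: float, b: float, atol: float = 1e-5, rtol: float = 1e-2) -> bool:
    return abs(a - b) <= atol + rtol * abs(b)


def failing_shapes(shapes, seed: int = 0) -> list[int]:
    """Row lengths for which buggy_kernel disagrees with correct_kernel."""
    rng = random.Random(seed)
    bad = []
    for n in shapes:
        row = [rng.uniform(0.0, 1.0) for _ in range(n)]
        if not allclose(buggy_kernel(row), correct_kernel(row)):
            bad.append(n)
    return bad


if __name__ == "__main__":
    print(failing_shapes([64]))          # -> [] (the single "CI" shape passes)
    print(failing_shapes(range(1, 17)))  # -> [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15]
```

With only the 64-wide shape the bug is invisible, because 64 is a multiple of the tile width and no tail lanes exist; sweeping shapes that are not multiples of BLOCK exposes it immediately.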

gpuemu replaces that one-line check with an operator-aware regime that caught 100% of those bugs across 5 GPU classes with zero false positives on controls (P1).

Install

pip install gpuemu            # core client
pip install gpuemu[torch]     # + PyTorch adapter
pip install gpuemu[jax]       # + JAX adapter
pip install gpuemu[tensorflow] # + TensorFlow adapter
pip install gpuemu[all]       # everything

The client talks to the gpuemu daemon over IPC and will start one on demand. To run the daemon yourself, install the CLI: cargo install gpuemu.

Quick start

from gpuemu import Client

client = Client()

# Fuzz with op-schema-aware inputs and an fp64 reference oracle.
results = client.fuzz_op_client_side(
    "flash_attention",
    # my_flash_attn is your own kernel under test
    run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
    iterations=100,
    value_distribution="adversarial",   # default backed by P3: 99% bug recall
)

print(f"Passed: {results.passed}/{results.total}")

A failure reports the seed, dtype, shape, and a base64 snapshot of the failing input, so you can re-run it byte-for-byte on any machine, with or without a GPU. The client's SeededRng is bit-identical to the Rust daemon's, so reproduction is exact across languages.
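Reproduce-from-seed rests on two properties: the same seed must regenerate the same input, and the snapshot must round-trip exactly. A minimal sketch of both follows. It uses Python's random.Random (Mersenne Twister) as a stand-in, which is not gpuemu's SeededRng and will not match the Rust daemon's output; make_case, snapshot, restore, and the little-endian fp32 snapshot layout are all hypothetical.

```python
import base64
import random
import struct


def make_case(seed: int, n: int) -> list[float]:
    """Deterministically derive an n-element fuzz input from a seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def snapshot(values: list[float]) -> str:
    """Encode the input as little-endian fp32 bytes, then base64."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def restore(blob: str) -> list[float]:
    """Decode a snapshot back into fp32 values (held as Python floats)."""
    raw = base64.b64decode(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


if __name__ == "__main__":
    blob = snapshot(make_case(seed=1234, n=8))
    assert blob == snapshot(make_case(seed=1234, n=8))  # same seed, same bytes
    assert snapshot(restore(blob)) == blob              # exact round-trip
    print(blob)
```

Storing the snapshot as raw fp32 bytes rather than decimal text is what makes the round-trip exact: no float-to-string conversion sits between the failing run and the replay.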

Execution modes

import torch

# 1. Client-side (recommended): your code runs the GPU op; gpuemu validates.
results = client.fuzz_op_client_side(
    "matmul",
    run_op=lambda i: torch.matmul(i["a"], i["b"]),
    iterations=100,
)

# 2. Daemon-orchestrated: fetch cases, run them yourself, submit outputs.
for case in client.get_test_batch("my_op", count=50):
    out = my_gpu_op(case["inputs"])
    client.submit_output("my_op", case["inputs"], out, case["seed"])

# 3. Reproduce / minimise a known failure from its seed
#    (seed is the value reported by a failing run).
repro = client.reproduce(seed)
small = client.minimize(seed)
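client.minimize(seed) shrinks a failure to a smaller reproducer; the algorithm it uses is not described on this page. As an illustration of the general idea only, the sketch below is a greedy, delta-debugging-style shrinker over an explicit input list. shrink_failing_case and the fails predicate are hypothetical and work on inputs, not seeds.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def shrink_failing_case(case: list[T], fails: Callable[[list[T]], bool]) -> list[T]:
    """Greedily delete chunks of `case` while `fails` keeps returning True.

    The chunk size starts at half the input and halves whenever a full pass
    removes nothing. The result is 1-minimal: removing any one element
    (without emptying the list) makes the failure disappear.
    """
    if not fails(case):
        raise ValueError("shrink_failing_case expects a failing input")
    chunk = max(len(case) // 2, 1)
    while chunk >= 1:
        removed = False
        i = 0
        while i < len(case):
            candidate = case[:i] + case[i + chunk:]
            if candidate and fails(candidate):
                case = candidate  # still fails without this chunk: drop it
                removed = True
            else:
                i += chunk  # chunk is needed (or would empty the case): keep it
        if not removed:
            chunk //= 2
    return case


if __name__ == "__main__":
    # Toy failure that only needs two specific elements to be present.
    print(shrink_failing_case(list(range(100)), lambda c: 7 in c and 42 in c))
```

Running the toy example prints [7, 42]: every other element is irrelevant to the failure and is deleted, which is the kind of small reproducer a minimiser is meant to hand back.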

What you get

| Feature | What it does |
| --- | --- |
| fp64 reference oracle | Validates kernel output against a high-precision CPU reference, per dtype |
| Op-schema-aware fuzzing | Boundary + regular + adversarial input distributions, per op |
| Calibrated tolerances | calibrate_tolerance() / get_recommended_tolerance(): p95-of-controls × 1.5 envelope (P2: 65% → 82% recall) |
| Deterministic RNG | SeededRng reproduces failures byte-for-byte, identical to the Rust daemon |
| Framework adapters | PyTorch, JAX, TensorFlow: from gpuemu.frameworks.pytorch import validate_pytorch |
| Static lint | client.lint_kernel(...) surfaces PTX/SASS register pressure and spills |
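The calibrated-tolerance rule in the table (95th percentile of errors on known-good controls, times 1.5) is simple enough to sketch. This illustrates the rule only; it is not the code behind calibrate_tolerance() or get_recommended_tolerance(), whose signatures are not shown here. The nearest-rank percentile method and the helper names are assumptions of this sketch.

```python
import math


def nearest_rank_percentile(values: list[float], q: float) -> float:
    """q-th percentile (0 < q <= 100) by the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def calibrate_envelope(control_errors: list[float], q: float = 95.0,
                       margin: float = 1.5) -> float:
    """Tolerance = q-th percentile of control errors, scaled by margin."""
    return nearest_rank_percentile(control_errors, q) * margin


def flag(errors: list[float], tolerance: float) -> list[int]:
    """Indices of candidate outputs whose error exceeds the envelope."""
    return [i for i, e in enumerate(errors) if e > tolerance]


if __name__ == "__main__":
    controls = [k * 1e-6 for k in range(1, 101)]  # errors of 100 known-good runs
    tol = calibrate_envelope(controls)
    print(tol, flag([5e-5, 1.2e-4, 3e-4], tol))
```

Taking a high percentile rather than the maximum keeps one noisy control from inflating the envelope, and the 1.5 margin leaves headroom so that healthy kernels on new inputs do not trip false positives.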

The research backing (P1–P4)

Each default is anchored to a measured study: fp64 oracle (P1: 9/9 bugs caught, 0 false positives), calibrated tolerances (P2: recall 65% → 82%), adversarial fuzzing (P3: 99% recall), and PTX lint (P4). See The Evidence.
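The three input distributions behind the fuzzing defaults (boundary, regular, adversarial) can be sketched for a softmax-like op. The concrete choices below are illustrative assumptions, not gpuemu's op schemas: boundary draws from fp32 edge values, regular from a unit normal, and adversarial from logits of magnitude 90 to 120, where exp(x) overflows fp32 (above ln(FP32_MAX), about 88.72) unless the kernel subtracts the row max first. sample and fp32_exp_overflows are hypothetical names.

```python
import math
import random

FP32_MAX = (2.0 - 2.0**-23) * 2.0**127  # largest finite fp32
FP32_MIN_NORMAL = 2.0**-126
FP32_MIN_SUBNORMAL = 2.0**-149

BOUNDARY_VALUES = (
    0.0, -0.0, 1.0, -1.0,
    FP32_MIN_SUBNORMAL, FP32_MIN_NORMAL, FP32_MAX, -FP32_MAX,
)


def sample(distribution: str, n: int, seed: int) -> list[float]:
    """Draw n inputs from one of three hypothetical distributions."""
    rng = random.Random(seed)
    if distribution == "boundary":
        return [rng.choice(BOUNDARY_VALUES) for _ in range(n)]
    if distribution == "regular":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if distribution == "adversarial":
        # Large logits of either sign: every positive one makes a naive
        # exp(x) overflow fp32 unless the row max is subtracted first.
        return [rng.choice((1.0, -1.0)) * rng.uniform(90.0, 120.0) for _ in range(n)]
    raise ValueError(f"unknown distribution: {distribution!r}")


def fp32_exp_overflows(x: float) -> bool:
    """True when exp(x) exceeds the largest finite fp32 value."""
    return x > math.log(FP32_MAX)


if __name__ == "__main__":
    for dist in ("boundary", "regular", "adversarial"):
        print(dist, sample(dist, 4, seed=0))
```

Regular inputs rarely hit either failure mode, which is the intuition for why an adversarial distribution raises recall: it deliberately steers inputs toward the ranges where common kernel shortcuts break.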

Documentation

Development

pip install -e .[dev]
pytest -v          # 11 tests, +7 daemon-live tests

License

Dual-licensed under MIT or Apache 2.0 at your option.



Download files


Source Distribution

gpuemu-0.1.0.tar.gz (44.3 kB)

Built Distribution


gpuemu-0.1.0-py3-none-any.whl (41.2 kB)

File details

Details for the file gpuemu-0.1.0.tar.gz.

File metadata

  • Download URL: gpuemu-0.1.0.tar.gz
  • Size: 44.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.14 (Ubuntu 24.04)

File hashes

Hashes for gpuemu-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7d2fcb9902e632eb1083a5b6a8840c46bf351cdfe53bb6130e00025165170fd0 |
| MD5 | 224df166a03ada20bf66e93167ec182e |
| BLAKE2b-256 | e193869d3eb3e1483c460a3a51db01e4789052c4cf6189aa0694a2c9a84fea70 |

File details

Details for the file gpuemu-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: gpuemu-0.1.0-py3-none-any.whl
  • Size: 41.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.14 (Ubuntu 24.04)

File hashes

Hashes for gpuemu-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8a0a353b2b10d5e3849ecec1ce69ecf97a9a6dc69a1ed75e1d4f21ee82f80e9e |
| MD5 | b4d2165ac00479953aaf31f45a9aea05 |
| BLAKE2b-256 | 395ee7f5b1e3e43ad949eb11992f14489e246a066817b5e883503317fe0d94c9 |
