PTX Inject and Stack PTX for Python

mm-ptx (Python)

PTX Inject and Stack PTX with Python bindings.

This package ships two small, header-only C libraries plus Python wrappers:

  • PTX Inject: find marked sites in PTX and inject your own PTX at those sites.
  • Stack PTX: generate PTX stubs you can inject at those sites.

PTX Inject: what you write

Mark a site in CUDA with macros:

#include <ptx_inject.h>

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    PTX_INJECT("func",
        PTX_IN (F32, x, x),
        PTX_MOD(F32, y, y),
        PTX_OUT(F32, z, z)
    );
    out[0] = z;
}

Compile the CUDA to PTX (nvcc or cuda.core), then build and inject a stub in Python:

from mm_ptx.ptx_inject import PTXInject

annotated_ptx = "..."  # PTX from nvcc/cuda.core
inject = PTXInject(annotated_ptx)

func = inject["func"]
stub = (
    f"\tadd.ftz.f32 %{func['y'].reg}, %{func['x'].reg}, %{func['y'].reg};\n"
    f"\tadd.ftz.f32 %{func['z'].reg}, %{func['x'].reg}, %{func['y'].reg};"
)

final_ptx = inject.render_ptx({"func": stub})

This is equivalent to writing the following CUDA kernel directly, but it avoids the CUDA-to-PTX compilation overhead:

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    y = x + y;
    z = x + y;
    out[0] = z;
}
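For the CUDA-to-PTX compile step mentioned above, one possible way to drive nvcc from Python is sketched below. The file names, the `include_dir` location of `ptx_inject.h`, and the `sm_80` target are assumptions made for illustration; only the standard nvcc flags `-ptx`, `-arch`, `-I`, and `-o` are relied on.

```python
import shutil
import subprocess
from pathlib import Path


def nvcc_ptx_command(source, output, include_dir, arch="sm_80"):
    """Build an nvcc command line that stops after PTX generation."""
    return [
        "nvcc",
        "-ptx",                  # emit PTX instead of a linked binary
        f"-arch={arch}",         # assumed target architecture
        "-I", str(include_dir),  # hypothetical directory holding ptx_inject.h
        str(source),
        "-o", str(output),
    ]


def compile_to_ptx(source, output, include_dir, arch="sm_80"):
    """Run nvcc and return the generated PTX text."""
    if shutil.which("nvcc") is None:
        raise RuntimeError("nvcc was not found on PATH")
    subprocess.run(nvcc_ptx_command(source, output, include_dir, arch), check=True)
    return Path(output).read_text()


# Building the command does not require nvcc to be installed.
cmd = nvcc_ptx_command("kernel.cu", "kernel.ptx", "include")
print(" ".join(cmd))
```

The text returned by `compile_to_ptx` is what would be passed to `PTXInject` as `annotated_ptx`.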

Stack PTX: stack-based instruction compiler

If you do not want to hand-write PTX, you can use Stack PTX to generate the stub:

from mm_ptx.stack_ptx import RegisterRegistry
from stack_ptx_default_types import Stack, PtxInstruction, compiler

# Set up naming associations (func is the inject["func"] site from the PTX Inject example)
registry = RegisterRegistry()
registry.add(func["x"].reg, Stack.f32, name="x")
registry.add(func["y"].reg, Stack.f32, name="y")
registry.add(func["z"].reg, Stack.f32, name="z")
registry.freeze()

# Instructions to run
instructions = [
    registry.x,                     # Push 'x'
    registry.y,                     # Push 'y'
    PtxInstruction.add_ftz_f32,     # Pop 'x', Pop 'y', Push ('x' + 'y')
    registry.x,                     # Push 'x'
    PtxInstruction.add_ftz_f32      # Pop 'x', Pop ('x' + 'y'), Push ('x' + ('x' + 'y')) 
]

# Create the PTX stub
ptx_stub = compiler.compile(
    registry=registry,
    instructions=instructions,
    requests=[registry.z],
    ...
)

# Inject the PTX stub into the PTX Inject site(s)
final_ptx = inject.render_ptx({"func": ptx_stub})

Printing ptx_stub gives:

    {
    .reg .f32 %_a<2>;
    add.ftz.f32 %_a0, %_x0, %_x2;
    add.ftz.f32 %_a1, %_x2, %_a0;
    mov.f32 %_x1, %_a1;
    }

This is equivalent to writing the following CUDA kernel directly, but it avoids the CUDA-to-PTX compilation overhead:

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    z = x + (x + y);
    out[0] = z;
}
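To make the stack-based idea concrete, here is a toy model in plain Python of how a push/pop instruction list can be lowered to a PTX-like stub. It is only a sketch and not the Stack PTX implementation: the `ARITY` table, the `compile_stack` function, the `%t` temporaries, and the operand order are all assumptions chosen for illustration.

```python
# Toy instruction table: mnemonic -> number of operands it pops.
# Hypothetical; this is not the Stack PTX instruction description format.
ARITY = {"add.ftz.f32": 2}


def compile_stack(program, inputs, output_reg):
    """Lower a stack program to a PTX-like stub (toy model)."""
    stack = []    # holds register names, not runtime values
    body = []     # emitted instruction lines
    n_temps = 0
    for item in program:
        if item in inputs:
            # A named input: push its register onto the stack.
            stack.append(inputs[item])
            continue
        # An instruction: pop its operands and push a fresh temporary.
        operands = [stack.pop() for _ in range(ARITY[item])]
        operands.reverse()  # restore push order (a choice of this sketch)
        dst = f"%t{n_temps}"
        n_temps += 1
        body.append(f"{item} {dst}, {', '.join(operands)};")
        stack.append(dst)
    # The requested output takes whatever is left on top of the stack.
    body.append(f"mov.f32 {output_reg}, {stack.pop()};")
    return "\n".join(["{", f".reg .f32 %t<{n_temps}>;", *body, "}"])


stub = compile_stack(
    ["x", "y", "add.ftz.f32", "x", "add.ftz.f32"],
    inputs={"x": "%x", "y": "%y"},
    output_reg="%z",
)
print(stub)
```

The printed stub has the same shape as the one above: a scoped block, a declaration for two temporaries, two adds, and a final move into the output register.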

Stack PTX instruction descriptions

The instruction descriptions are supplied by the user and are not part of the core Stack PTX system. This lets you tailor the set of described instructions to your own needs.

Install

pip install mm-ptx

Requires Python 3.9+.

Tests

python -m pip install -e .
python -m unittest discover -s tests

CUDA integration tests are skipped by default. To run them (requires cuda.core, cuda.bindings, and a CUDA-capable GPU):

MM_PTX_RUN_CUDA_TESTS=1 python -m unittest discover -s tests
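As an illustration of how an environment variable can gate tests like this, the sketch below uses the standard `unittest.skipUnless` decorator. It is a generic pattern, not necessarily how the mm-ptx test suite is written; the class and test names are hypothetical.

```python
import os
import unittest

# True only when the variable is set to exactly "1".
RUN_CUDA = os.environ.get("MM_PTX_RUN_CUDA_TESTS") == "1"


@unittest.skipUnless(RUN_CUDA, "set MM_PTX_RUN_CUDA_TESTS=1 to run CUDA tests")
class CudaIntegrationTest(unittest.TestCase):  # hypothetical test case
    def test_placeholder(self):
        # A real test would compile and launch a kernel here.
        self.assertTrue(True)


def run_suite():
    """Run the gated test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CudaIntegrationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print("skipped:", len(result.skipped))
```

With the variable unset, the test is reported as skipped rather than failed, so the default test run stays green on machines without a GPU.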

Examples

More details

For the C/C++ headers and deeper implementation notes, see the mm-ptx repo:

License

MIT. See LICENSE.

Citation

If you use this software in your work, please cite it using the following BibTeX entry (generated from CITATION.cff):

@software{Durham_mm-ptx_2025,
  author       = {Durham, Charlie},
  title        = {mm-ptx: PTX Inject and Stack PTX for Python},
  version      = {1.0.0},
  date-released = {2025-10-19},
  url          = {https://github.com/MetaMachines/mm-ptx-py}
}
