PTX Inject and Stack PTX for Python

mm-ptx (Python)

PTX Inject and Stack PTX with Python bindings.

This package ships two small, header-only C libraries plus Python wrappers:

  • PTX Inject: find marked sites in PTX and inject your own PTX at those sites.
  • Stack PTX: generate PTX stubs you can inject at those sites.

PTX Inject: what you write

Mark a site in CUDA with macros:

#include <ptx_inject.h>

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    PTX_INJECT("func",
        PTX_IN (F32, x, x),
        PTX_MOD(F32, y, y),
        PTX_OUT(F32, z, z)
    );
    out[0] = z;
}

Compile the CUDA source to PTX (with nvcc or cuda.core), then build a stub and inject it in Python:

from mm_ptx.ptx_inject import PTXInject

annotated_ptx = "..."  # PTX from nvcc/cuda.core
inject = PTXInject(annotated_ptx)

func = inject["func"]
stub = (
    f"\tadd.ftz.f32 %{func['y'].reg}, %{func['x'].reg}, %{func['y'].reg};\n"
    f"\tadd.ftz.f32 %{func['z'].reg}, %{func['x'].reg}, %{func['y'].reg};"
)

final_ptx = inject.render_ptx({"func": stub})

The rendered PTX is equivalent to what you would get by writing the following CUDA kernel directly, but without the CUDA-to-PTX compilation overhead:

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    y = x + y;
    z = x + y;
    out[0] = z;
}
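
The compile step mentioned above (CUDA source to annotated PTX) can be driven from Python. The following is a minimal sketch that shells out to nvcc with its `-ptx` option. The helper names, the `kernel.cu` file name, the `include` directory holding `ptx_inject.h`, and the `sm_80` target are all assumptions chosen for illustration; they are not part of mm_ptx.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_nvcc_ptx_command(
    source: str,
    output: str,
    include_dirs: Sequence[str] = (),
    arch: str = "sm_80",  # assumed target; pick the one matching your GPU
) -> List[str]:
    """Build the nvcc argument list that emits PTX instead of a binary."""
    cmd = ["nvcc", "-ptx", f"-arch={arch}"]
    for directory in include_dirs:
        # Directory that contains ptx_inject.h (its location is an assumption).
        cmd += ["-I", str(directory)]
    cmd += [str(source), "-o", str(output)]
    return cmd


def compile_to_ptx(source: str, include_dirs: Sequence[str] = ()) -> str:
    """Run nvcc and return the generated PTX text (requires nvcc on PATH)."""
    output = Path(source).with_suffix(".ptx")
    subprocess.run(
        build_nvcc_ptx_command(source, str(output), include_dirs), check=True
    )
    return output.read_text()


# Show the command without running it, so no CUDA toolkit is needed here.
cmd = build_nvcc_ptx_command("kernel.cu", "kernel.ptx", include_dirs=["include"])
print(shlex.join(cmd))
```

The string returned by `compile_to_ptx` is what would be passed to `PTXInject(...)` as `annotated_ptx`.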

Stack PTX: stack-based instruction compiler

If you do not want to hand-write PTX, you can use Stack PTX to generate the stub:

from mm_ptx.stack_ptx import RegisterRegistry
from stack_ptx_default_types import Stack, PtxInstruction, compiler

# Set up naming associations ('func' is the inject site from the PTX Inject example)
registry = RegisterRegistry()
registry.add(func["x"].reg, Stack.f32, name="x")
registry.add(func["y"].reg, Stack.f32, name="y")
registry.add(func["z"].reg, Stack.f32, name="z")
registry.freeze()

# Instructions to run
instructions = [
    registry.x,                     # Push 'x'
    registry.y,                     # Push 'y'
    PtxInstruction.add_ftz_f32,     # Pop 'x', Pop 'y', Push ('x' + 'y')
    registry.x,                     # Push 'x'
    PtxInstruction.add_ftz_f32      # Pop 'x', Pop ('x' + 'y'), Push ('x' + ('x' + 'y')) 
]

# Create the PTX stub
ptx_stub = compiler.compile(
    registry=registry,
    instructions=instructions,
    requests=[registry.z],
    # ... (remaining arguments omitted)
)

# Inject the PTX stub into the PTX Inject site(s)
final_ptx = inject.render_ptx({"func": ptx_stub})

Printing ptx_stub gives:

    {
    .reg .f32 %_a<2>;
    add.ftz.f32 %_a0, %_x0, %_x2;
    add.ftz.f32 %_a1, %_x2, %_a0;
    mov.f32 %_x1, %_a1;
    }

This is equivalent to writing the following CUDA kernel directly, but without the CUDA-to-PTX compilation overhead:

extern "C"
__global__
void kernel(float* out) {
    float x = 5.0f;
    float y = 3.0f;
    float z = 0.0f;
    z = x + (x + y);
    out[0] = z;
}
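
To make the stack semantics of the instruction list above concrete, here is a toy evaluator in plain Python. It is only a model of the push/pop behaviour described in the comments; it is not mm_ptx's compiler and emits no PTX. The names `run_stack_program` and `add` are hypothetical.

```python
from typing import Callable, Dict, List, Union

# A toy instruction is either a variable name to push or a binary operation.
Instruction = Union[str, Callable[[float, float], float]]


def add(a: float, b: float) -> float:
    """Stand-in for an add instruction such as add.ftz.f32."""
    return a + b


def run_stack_program(program: List[Instruction], env: Dict[str, float]) -> float:
    """Evaluate a stack program and return the value left on top."""
    stack: List[float] = []
    for instruction in program:
        if isinstance(instruction, str):
            # Push the current value of a named register.
            stack.append(env[instruction])
        else:
            # Pop two operands (top of stack first) and push the result.
            first = stack.pop()
            second = stack.pop()
            stack.append(instruction(first, second))
    return stack.pop()


# Same shape as the Stack PTX example: x, y, add, x, add.
program = ["x", "y", add, "x", add]
result = run_stack_program(program, {"x": 5.0, "y": 3.0})
print(result)  # 13.0, i.e. x + (x + y)
```

With x = 5.0 and y = 3.0 the first add leaves 8.0 on the stack and the second leaves 13.0, matching `z = x + (x + y)` in the kernel above.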

Stack PTX instruction descriptions

The instruction definitions are supplied by the user and are not part of the core Stack PTX system. This lets you tailor the set of described instructions to your own needs.

Install

pip install mm-ptx

Requires Python 3.9+.

Tests

python -m pip install -e .
python -m unittest discover -s tests

CUDA integration tests are skipped by default. To run them (requires cuda.core, cuda.bindings, and a CUDA-capable GPU):

MM_PTX_RUN_CUDA_TESTS=1 python -m unittest discover -s tests
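
An opt-in gate of this kind is commonly written with `unittest.skipUnless`. The sketch below shows one way it could look; it is an assumption about the general pattern, not a copy of the project's test code, and the helper and class names are hypothetical.

```python
import os
import unittest
from typing import Mapping


def cuda_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the opt-in variable is set to "1"."""
    return environ.get("MM_PTX_RUN_CUDA_TESTS") == "1"


class CudaIntegrationTest(unittest.TestCase):
    @unittest.skipUnless(
        cuda_tests_enabled(), "set MM_PTX_RUN_CUDA_TESTS=1 to run CUDA tests"
    )
    def test_kernel_runs(self) -> None:
        # A real test would compile, inject, and launch a kernel here.
        self.assertTrue(cuda_tests_enabled())
```

Run under `python -m unittest`, this test is reported as skipped unless the variable is exported as shown above.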

Examples

More details

For the C/C++ headers and deeper implementation notes, see the mm-ptx repo:

License

MIT. See LICENSE.

Citation

If you use this software in your work, please cite it using the following BibTeX entry (generated from CITATION.cff):

@software{Durham_mm-ptx_2025,
  author       = {Durham, Charlie},
  title        = {mm-ptx: PTX Inject and Stack PTX for Python},
  version      = {1.0.1},
  date-released = {2025-10-19},
  url          = {https://github.com/MetaMachines/mm-ptx-py}
}
