PTX Inject and Stack PTX for Python

PTX Inject and Stack PTX are lightweight, Python-friendly tools for advanced GPU kernel manipulation and generation using NVIDIA's PTX intermediate language. Designed for developers in high-performance computing, machine learning, and scientific simulations, these tools enable dynamic kernel optimizations, rapid experimentation, and automated code generation without the slowdowns of traditional compilation pipelines.

  • PTX Inject: Dynamically inject custom PTX code into annotated CUDA kernels for ultra-fast variations and tuning.
  • Stack PTX: Generate valid PTX code sequences using a stack-based machine, inspired by evolutionary programming languages like Push, for flexible and error-resilient code construction.

Both tools are built on efficient, header-only C libraries with Python bindings for seamless integration into your workflows. They support high-throughput operations, making them ideal for algorithmic exploration or performance benchmarking on GPUs.

Explore working examples:

Most of the functionality is implemented in the C header files ptx_inject.h and stack_ptx.h. If you are interested in running these with lower overhead in C/C++, or with parallel compilation, see the examples in mm-ptx.

mm-kermac-py uses PTX Inject and Stack PTX to let users dynamically create custom semiring and semiring-gradient PyTorch Tensor kernels with an arbitrary number of hyperparameters. Recompiling its custom CuTe CUDA kernels can take ~3 seconds, whereas recompiling the CuTe kernel from PTX to SASS with injected PTX code takes as little as 60 ms.

Installation

mm-ptx


To install mm-ptx, use:

pip install mm-ptx

This package has no dependency on the NVIDIA CUDA Toolkit or any other tools beyond nanobind. Stack PTX and PTX Inject are pure header-only C libraries that rely only on the C standard library.

For the dependencies needed to run the mm-ptx examples, see examples/README.md.

PTX Inject

PTX Inject is a lightweight tool that enables dynamic modification of compiled GPU kernels by injecting custom low-level code (PTX) at user-specified points in annotated CUDA source. This allows for ultra-fast kernel variations and optimizations—ideal for algorithmic tuning, performance testing, or machine-driven experiments—without the overhead of full recompilation using tools like nvcc or nvrtc.

By processing annotated kernels, it extracts register mappings and prepares templates for injection, achieving preparation in milliseconds and supporting tens of thousands of injections per second per CPU core. The result is efficient, parallelizable compilation to executable GPU code (SASS) using ptxas or nvPtxCompiler, making it suitable for high-throughput workflows in compute-intensive applications like machine learning or scientific simulations.
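The prepare-once, inject-many pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `// INJECT_SITE <name>` marker format and the `prepare_template` and `render` helpers are hypothetical, and they are not the mm-ptx API or the markers it actually emits.

```python
import re

# Hypothetical marker format, used only for this sketch.
SITE_RE = re.compile(r"^[ \t]*// INJECT_SITE (\w+)[ \t]*$", re.MULTILINE)


def prepare_template(ptx: str):
    """Split PTX text once into literal chunks and site names.

    Returns (chunks, sites) with len(chunks) == len(sites) + 1.
    """
    chunks, sites, pos = [], [], 0
    for m in SITE_RE.finditer(ptx):
        chunks.append(ptx[pos:m.start()])
        sites.append(m.group(1))
        pos = m.end()
    chunks.append(ptx[pos:])
    return chunks, sites


def render(template, stubs: dict) -> str:
    """Interleave the literal chunks with one stub per site."""
    chunks, sites = template
    out = [chunks[0]]
    for name, chunk in zip(sites, chunks[1:]):
        out.append(stubs[name])
        out.append(chunk)
    return "".join(out)


ptx = """.visible .entry kernel()
{
    // INJECT_SITE func
    ret;
}
"""

# The (comparatively) expensive scan happens once...
template = prepare_template(ptx)
# ...and each variant is then a cheap string join.
variant_a = render(template, {"func": "    add.ftz.f32 %f3, %f1, %f2;"})
variant_b = render(template, {"func": "    mul.ftz.f32 %f3, %f1, %f2;"})
print(variant_a)
```

Because rendering is just a join over pre-split chunks, producing many variants of one kernel costs far less than re-scanning the PTX each time, which is the idea behind the throughput figures quoted above.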

Key features:

  • Annotation-Based Injection: Mark sites in CUDA kernels with simple comments

    #include <cstdio>

    extern "C"
    __global__
    void kernel() {
        // PTX Inject reports the PTX register name(s) for these variables
        float x = 3.0f;
        float y = 4.0f;
        float z;
        /* PTX_INJECT func
            in f32 x
            in f32 y
            out f32 z
        */
        printf("z: %f\n", z);
    }
    
  • Register Mapping Extraction: Automatically processes annotations to map CUDA variables to PTX registers, handling multiple site inlining and loop unrolling.

  • High Performance: Prepares templates in ~4 ms and supports ~10,000 injections per second per CPU core.

  • Parallel Compilation: Outputs PTX ready for fast compilation to SASS using the PTX Compiler API, with loading times under 1 ms.

  • Customizable Data Types: Using Python you can describe the names and types used in the PTX_INJECT annotation. See ptx_inject_default_types.py.

A simple full working example can be found here.
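To make the annotation format concrete, here is a small pure-Python sketch that parses the comment block from the kernel above. The grammar is inferred only from that single example (a site name after `PTX_INJECT`, followed by lines of the form `in|out <type> <name>`); the real parser in ptx_inject.h may accept more than this, and the `Operand` and `parse_sites` names are hypothetical.

```python
import re
from typing import NamedTuple


class Operand(NamedTuple):
    direction: str  # "in" or "out"
    dtype: str      # e.g. "f32"
    name: str       # CUDA variable name


# Grammar assumed from the single example in this README.
BLOCK_RE = re.compile(r"/\*\s*PTX_INJECT\s+(\w+)(.*?)\*/", re.DOTALL)
OPERAND_RE = re.compile(r"^\s*(in|out)\s+(\w+)\s+(\w+)\s*$", re.MULTILINE)


def parse_sites(cuda_source: str) -> dict:
    """Map each site name to its list of declared operands."""
    sites = {}
    for block in BLOCK_RE.finditer(cuda_source):
        name, body = block.group(1), block.group(2)
        sites[name] = [Operand(*m.groups()) for m in OPERAND_RE.finditer(body)]
    return sites


source = '''
float z;
/* PTX_INJECT func
    in f32 x
    in f32 y
    out f32 z
*/
'''

sites = parse_sites(source)
for op in sites["func"]:
    print(op.direction, op.dtype, op.name)
```

Each operand pairs a CUDA variable with a type and a direction; PTX Inject's job is then to resolve those variable names to the PTX registers the compiler actually assigned.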

Stack PTX

Stack PTX provides a stack-based interface for generating valid PTX code sequences, making it easy to create, modify, and evolve GPU instructions programmatically. Inspired by the Push language for genetic programming, it treats PTX operations as stack manipulations, ensuring code remains valid even after insertions, deletions, or rearrangements. Stack PTX handles register declarations and dead code elimination.

Stack PTX can write PTX stubs containing hundreds of instructions in single-digit microseconds.

  • Stack Machine Model: Push constants and instructions onto a stack; operations pop operands and push results as abstract syntax trees (ASTs).

    For example:

        instructions = [
            registry.x,
            registry.y,
            PtxInstruction.add_ftz_f32
        ]
        requests = [registry.z]
    

    This takes the register names for the variables 'x' and 'y', adds them with the add.ftz.f32 PTX instruction, and assigns the result to the register name for 'z', creating a stub like:

        {
        .reg .f32 %_c<1>;
        add.ftz.f32 %_c0, %z1, %z2;
        mov.f32 %z0, %_c0;
        }
    
  • Dead Code Elimination: Automatically optimizes by removing irrelevant operations from the final PTX output.

  • Customizable Instructions: Using Python you can describe the names and types used in the Stack PTX compiler. See examples/stack_ptx_default_types.py or look at mm-kermac-py for more complete definitions.

  • No Dependencies: Pure C99 implementation with Python bindings for easy use.

See a simple Stack PTX example here.
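The stack-machine model and dead code elimination described above can be illustrated with a toy generator in pure Python. This is a simplified sketch, not the algorithm in stack_ptx.h: it assumes a single f32 stack, skips instructions that lack operands (in the spirit of Push), and uses made-up names throughout (`Reg`, `Op`, `run`, `emit`, the `%_t` temporaries, and the `%x`, `%y`, `%z` registers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    name: str    # an existing PTX register, e.g. "%x"


@dataclass(frozen=True)
class Op:
    opcode: str  # e.g. "add.ftz.f32"
    args: tuple  # operand nodes (Reg or Op)


# Number of operands each toy instruction pops.
ARITY = {"add.ftz.f32": 2, "mul.ftz.f32": 2, "neg.ftz.f32": 1}


def run(program):
    """Execute the program on one f32 stack and return the final stack."""
    stack = []
    for item in program:
        if isinstance(item, Reg):
            stack.append(item)
        elif len(stack) >= ARITY[item]:
            n = ARITY[item]
            args = tuple(stack[-n:])
            del stack[-n:]
            stack.append(Op(item, args))
        # Otherwise there are too few operands: the instruction is a no-op.
    return stack


def emit(stack, outputs):
    """Emit PTX only for the values the requested outputs depend on."""
    lines, names = [], {}

    def visit(node):
        if isinstance(node, Reg):
            return node.name
        if node not in names:  # structurally equal subtrees share a temp
            operands = ", ".join(visit(a) for a in node.args)
            names[node] = f"%_t{len(names)}"
            lines.append(f"{node.opcode} {names[node]}, {operands};")
        return names[node]

    # Outputs are served from the top of the stack downwards.
    moves = [f"mov.f32 {out}, {visit(node)};"
             for out, node in zip(outputs, reversed(stack))]
    decl = [f".reg .f32 %_t<{len(names)}>;"] if names else []
    return "\n".join(["{"] + decl + lines + moves + ["}"])


program = [
    Reg("%x"), Reg("%x"), "mul.ftz.f32",  # dead: never reaches the output
    Reg("%x"), Reg("%y"), "add.ftz.f32",  # live: ends up on top of the stack
]
stub = emit(run(program), ["%z"])
print(stub)
```

Only the `add.ftz.f32` tree is reachable from the requested output, so the `mul.ftz.f32` never appears in the emitted stub. Because an instruction without enough operands is skipped rather than raising an error, any reordering of the program list still yields a valid stub, which is the property that makes this style of generator convenient for evolutionary search.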

Stack PTX Inject

Both systems are meant to be used together to dynamically create new and potentially novel CUDA kernels extremely quickly and safely.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this software in your work, please cite it using the following BibTeX entry (generated from the CITATION.cff file):

@software{Durham_mm-ptx_2025,
  author       = {Durham, Charlie},
  title        = {mm-ptx: PTX Inject and Stack PTX for Python},
  version      = {0.1.0},
  date-released = {2025-10-19},
  url          = {https://github.com/MetaMachines/mm-ptx-py}
}
