A lightweight GPU runtime for Python with NVRTC JIT compilation and NumPy-like API

PyGPUkit — Lightweight GPU Runtime for Python

A minimal, modular GPU runtime with NVRTC JIT compilation, GPU scheduling, and a clean NumPy-like API.


🚀 Overview

PyGPUkit is a lightweight GPU runtime for Python that provides:

  • NVRTC-based JIT kernel compilation
  • A NumPy-like GPUArray type
  • A Kubernetes-inspired GPU scheduler (bandwidth + memory guarantees)
  • An extensible operator set (add/mul/matmul, custom kernels)
  • Minimal dependencies and an embeddable runtime

PyGPUkit aims to be the “micro-runtime for GPU computing”: small, fast, and ideal for research, inference tooling, DSP, and real-time systems.


✨ Features

  • Lightweight — no PyTorch/CuPy overhead
  • 🧩 Modular — runtime / memory / scheduler / JIT / ops
  • 📦 GPUArray with NumPy interop
  • 🛠 NVRTC JIT for CUDA kernels
  • 🎼 Advanced Scheduler with memory & bandwidth guarantees
  • 🔌 Optional Triton backend (planned)
  • 🧪 Test-friendly runtime

🔧 Installation

(Available after first PyPI release)

pip install pygpukit

From source:

git clone https://github.com/m96-chan/PyGPUkit
cd PyGPUkit
pip install -e .

Requirements:

  • Python 3.9+
  • CUDA 11+
  • NVRTC available
  • NVIDIA GPU

🧭 Project Goals

  1. Provide the smallest usable GPU runtime for Python
  2. Expose GPU scheduling (bandwidth, memory, partitioning)
  3. Make writing custom GPU kernels easy
  4. Serve as a building block for inference engines, DSP systems, and real-time workloads

📚 Usage Examples

Allocate Arrays

import pygpukit as gp

x = gp.zeros((1024, 1024), dtype="float32")
y = gp.ones((1024, 1024), dtype="float32")

Basic Operations

z = gp.add(x, y)
w = gp.matmul(x, y)

CPU ↔ GPU Transfer

arr = z.to_numpy()
garr = gp.from_numpy(arr)

Custom NVRTC Kernel

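# CUDA source is passed to the JIT as a Python string
src = r'''
extern "C" __global__
void scale(float* x, float factor, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) x[idx] *= factor;
}
'''

# Compile with NVRTC, then launch on the GPUArray x
kernel = gp.jit(src, func="scale")
kernel(x, factor=0.5, n=x.size)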
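
The `idx < n` guard in the kernel matters because a launch grid is normally rounded up to a whole number of blocks, so some threads fall past the end of the array. The pure-Python sketch below emulates that indexing on the CPU. `launch_config`, `emulate_scale`, and the block size of 256 are illustrative assumptions, not part of PyGPUkit's API; how `gp.jit` actually chooses its launch configuration is not specified here.

```python
def launch_config(n: int, block: int = 256) -> tuple[int, int]:
    """Return (grid, block) such that grid * block >= n (ceiling division)."""
    if n < 0 or block <= 0:
        raise ValueError("n must be >= 0 and block must be > 0")
    grid = (n + block - 1) // block
    return grid, block


def emulate_scale(x: list[float], factor: float, block: int = 256) -> list[float]:
    """CPU emulation of the scale kernel: one inner-loop iteration per GPU thread."""
    n = len(x)
    out = list(x)
    grid, block = launch_config(n, block)
    for block_idx in range(grid):
        for thread_idx in range(block):
            idx = block_idx * block + thread_idx
            if idx < n:  # same guard as the CUDA kernel; padding threads do nothing
                out[idx] *= factor
    return out
```

With `n = 1000` and a block size of 256, the grid has 4 blocks (1024 threads), so the last 24 threads are filtered out by the guard.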

🎼 Scheduler — Kubernetes‑Inspired GPU Orchestration

PyGPUkit includes an experimental scheduler that treats a single GPU as a multi-tenant compute node, much as Kubernetes schedules containerized workloads onto a node. The goal is to provide resource isolation, guarantees, and fair sharing across multiple GPU tasks.

Core Capabilities


1. GPU Memory Reservation

Tasks may request a guaranteed block of GPU memory.

  • Hard guarantees → task is rejected if memory cannot be allocated
  • Soft guarantees → best‑effort allocation
  • Overcommit strategies (evict to host when pressure is high)
  • Reclaim policies (LRU GPUArray eviction)

Example:

task = scheduler.submit(
    fn,
    memory="512MB",
)
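
To make the difference between hard and soft guarantees concrete, here is a minimal accounting sketch. `parse_size` and `MemoryPool` are hypothetical names for illustration only, and treating "MB" as 1024² bytes is an assumption; the README does not say how PyGPUkit interprets size strings. No GPU memory is allocated.

```python
import re

# Assumption: binary units (1 MB = 1024**2 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Parse a size string such as "512MB" into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


class MemoryPool:
    """Toy accounting model of a GPU memory pool."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.reserved = 0

    def reserve(self, size: int, hard: bool = True) -> int:
        """Return the number of bytes granted.

        Hard: all-or-nothing; raises MemoryError if the request does not fit.
        Soft: best effort; grants whatever is still free (possibly 0).
        """
        free = self.capacity - self.reserved
        if size > free:
            if hard:
                raise MemoryError(f"need {size} bytes, only {free} free")
            size = free
        self.reserved += size
        return size
```

In this model a soft request for 768MB against a pool with 512MB free is granted 512MB, while the same request made as a hard guarantee is refused outright.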

2. GPU Bandwidth Guarantees / Throttling

Tasks may request a specific share of GPU compute bandwidth, given as a fraction between 0 and 1 (for example, 0.20 for 20%).

Bandwidth control is implemented via:

  • Stream priority
  • Kernel pacing (launch intervals)
  • Micro‑slicing large kernels
  • Cooperative time‑quantized scheduling
  • Persistent dispatcher kernels (planned)

Example:

task = scheduler.submit(
    fn,
    bandwidth=0.20,   # 20% GPU compute share
)
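
Two of the mechanisms listed above, kernel pacing and micro-slicing, come down to simple arithmetic. The sketch below shows one way to compute them; `pacing_delay` and `micro_slices` are hypothetical helpers, and the duty-cycle model (idle time after each kernel so that busy time equals the requested share) is an assumption rather than PyGPUkit's documented behavior.

```python
from typing import Iterator


def pacing_delay(kernel_seconds: float, share: float) -> float:
    """Idle time to insert after a kernel so that busy / (busy + idle) == share."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return kernel_seconds * (1.0 - share) / share


def micro_slices(n: int, chunk: int) -> Iterator[tuple[int, int]]:
    """Split n elements into (offset, count) pieces of at most `chunk` elements,
    so that one large launch becomes several short ones."""
    if chunk <= 0:
        raise ValueError("chunk must be > 0")
    for offset in range(0, n, chunk):
        yield offset, min(chunk, n - offset)
```

Under this model, a task limited to a 20% share whose kernel runs for 2 ms would wait about 8 ms before its next launch.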

3. Logical GPU Partitioning

PyGPUkit implements software‑defined GPU slicing, similar in spirit to Kubernetes device plugin resource partitioning.

Slices may define:

  • Memory quota
  • Bandwidth share
  • Stream priority band
  • Isolation level

Useful for:

  • Multi‑tenant inference servers
  • Real‑time audio/DSP workloads
  • Background/foreground GPU task separation
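
As a rough illustration, a slice can be modeled as a small record plus a check that the slices on one GPU do not oversubscribe it. `GpuSlice`, its field names, and `validate_slices` are hypothetical and only mirror the four attributes listed above; they are not PyGPUkit's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSlice:
    name: str
    memory_bytes: int        # memory quota
    bandwidth: float         # share of GPU compute, 0 < bandwidth <= 1
    priority: int = 0        # stream priority band (meaning is an assumption)
    isolation: str = "soft"  # isolation level label


def validate_slices(slices: list[GpuSlice], total_memory: int) -> None:
    """Raise ValueError if the slices together exceed the GPU's resources."""
    memory = sum(s.memory_bytes for s in slices)
    bandwidth = sum(s.bandwidth for s in slices)
    if memory > total_memory:
        raise ValueError(f"memory oversubscribed: {memory} > {total_memory}")
    if bandwidth > 1.0 + 1e-9:  # small tolerance for float rounding
        raise ValueError(f"bandwidth oversubscribed: {bandwidth:.2f} > 1.00")
```

For example, a foreground slice with 60% bandwidth and a background slice with 30% pass validation; adding a third slice at 20% does not.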

4. Scheduling Policies

The scheduler supports multiple policies:

  • Guaranteed — exclusive reservation, strict QoS
  • Burstable — partial guarantees, opportunistic bandwidth
  • BestEffort — uses leftover GPU cycles
  • Priority scheduling
  • Deadline scheduling (planned)
  • Weighted fair sharing

Example:

task = scheduler.submit(
    fn,
    policy="guaranteed",
    memory="1GB",
    bandwidth=0.10,
)
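
One way to picture how guaranteed reservations and weighted fair sharing could combine: guaranteed tasks receive exactly the bandwidth they reserved, and the remainder is divided among the other tasks in proportion to their weights. This is an interpretation for illustration, not PyGPUkit's documented algorithm, and `fair_shares` is a hypothetical helper.

```python
def fair_shares(
    guaranteed: dict[str, float], weights: dict[str, float]
) -> dict[str, float]:
    """Return a bandwidth share per task.

    guaranteed: task name -> reserved share (kept as-is)
    weights:    task name -> relative weight for the leftover bandwidth
    """
    reserved = sum(guaranteed.values())
    if reserved > 1.0:
        raise ValueError("guaranteed shares exceed 100% of the GPU")
    leftover = 1.0 - reserved
    total_weight = sum(weights.values())
    shares = dict(guaranteed)
    for name, weight in weights.items():
        # Proportional split of whatever the guaranteed tasks did not reserve.
        shares[name] = leftover * weight / total_weight if total_weight else 0.0
    return shares
```

With one guaranteed task at 0.5 and two other tasks weighted 3 and 1, the leftover half of the GPU splits into 0.375 and 0.125.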

5. Admission Control

Before executing a task, the scheduler performs:

  • Resource validation
  • Quota check
  • QoS matching
  • Scheduling feasibility

The outcome is one of:

  • admitted
  • queued
  • rejected
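
The three outcomes can be sketched as a small decision function over a memory request. The rules below (reject anything larger than the device, reject hard requests that do not fit right now, queue the rest) are one plausible reading of the checks listed above; `Admission` and `admit` are hypothetical names, not PyGPUkit's API.

```python
from enum import Enum


class Admission(Enum):
    ADMITTED = "admitted"
    QUEUED = "queued"
    REJECTED = "rejected"


def admit(request: int, free: int, capacity: int, hard: bool = True) -> Admission:
    """Decide what to do with a task requesting `request` bytes of GPU memory."""
    if request > capacity:
        return Admission.REJECTED   # can never fit on this device
    if request <= free:
        return Admission.ADMITTED   # fits right now
    if hard:
        return Admission.REJECTED   # hard guarantee cannot be met at submit time
    return Admission.QUEUED         # soft request waits for memory to free up
```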

6. Monitoring & Introspection

PyGPUkit exposes live metrics:

  • Memory usage per task
  • SM occupancy and GPU utilization
  • Throttling / pacing logs
  • Queue position / execution state
  • Reclaim/eviction count

Example:

stats = scheduler.stats(task_id)

7. Soft Isolation Model

Although this is not OS‑level isolation, each GPU task is given:

  • Dedicated stream groups
  • Guaranteed memory pools
  • Kernel pacing to enforce bandwidth
  • Optional sandboxed GPUArray region

This provides practical multi‑tenant safety without MIG/MPS.


🏗 Proposed Directory Structure

PyGPUkit/
  core/         # NVRTC wrapper, device info
  memory/       # GPUArray, allocators
  scheduler/    # orchestration, partitioning, throttling
  ops/          # built-in kernels
  jit/          # JIT compiler + cache
  python/       # high-level Python API
  examples/
  tests/

🧪 Roadmap

v0.1 (MVP)

  • GPUArray
  • NVRTC JIT
  • add/mul/matmul ops
  • Basic stream manager
  • Packaging + wheels

v0.2

  • Scheduler (memory + bandwidth guarantees)
  • Kernel cache
  • NumPy interop
  • Benchmarks

v0.3

  • Optional Triton backend
  • Advanced ops (softmax, layernorm)
  • Inference‑oriented plugin system

🤝 Contributing

Contributions and discussions are welcome!
Please open Issues for feature requests, bugs, or design proposals.


📄 License

MIT License


⭐ Acknowledgements

Inspired by:

  • CUDA Runtime
  • NVRTC
  • PyCUDA
  • CuPy
  • Triton

PyGPUkit aims to fill the gap for a tiny, embeddable GPU runtime for Python.
