A lightweight GPU runtime for Python with a Rust-powered scheduler, NVRTC JIT compilation, and a NumPy-like API

Maintainers

m96-chan


PyGPUkit — Lightweight GPU Runtime for Python

A minimal, modular GPU runtime with a Rust-powered scheduler, NVRTC JIT compilation, and a clean NumPy-like API.

Overview

PyGPUkit is a lightweight GPU runtime for Python that provides:

Rust-powered scheduler with admission control, QoS, and resource partitioning
NVRTC-based JIT kernel compilation
A NumPy-like GPUArray type
Kubernetes-inspired GPU scheduling (bandwidth + memory guarantees)
Extensible operator set (add/mul/matmul, custom kernels)
Minimal dependencies and embeddable runtime

PyGPUkit aims to be the "micro-runtime for GPU computing": small, fast, and ideal for research, inference tooling, DSP, and real-time systems.

v0.2.2 Features (NEW)

Ampere-Optimized SGEMM

Feature	Description
cp.async Pipeline	4-stage software pipeline with async memory transfers
Vectorized Loads	float4 (16-byte) loads for A and B matrices
Shared Memory Tiling	BM=128, BN=128, BK=16 with 8x8 thread tiles
SM 80+ Required	Ampere architecture (RTX 30XX+) required

Performance (RTX 3090 Ti)

Matrix size	TFLOPS (FP32)	Efficiency (% of FP32 peak)	Speedup vs NumPy
2048x2048	7.6	19%	10x
4096x4096	13.2	33%	16x
8192x8192	18.2	46%	22x
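The TFLOPS and efficiency columns follow from the standard FLOP count for a GEMM, 2·M·N·K (one multiply and one add per inner-product term). The sketch below assumes that convention and an FP32 peak of about 40 TFLOPS for the RTX 3090 Ti; the table states neither value, so treat both as assumptions that happen to reproduce its efficiency column.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k) and B (k x n): one multiply + one add per term."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Sustained TFLOPS given the measured wall time of one GEMM."""
    return gemm_flops(m, n, k) / seconds / 1e12


def gemm_seconds(m: int, n: int, k: int, tflops: float) -> float:
    """Time one GEMM would take at a sustained TFLOPS rate."""
    return gemm_flops(m, n, k) / (tflops * 1e12)


ASSUMED_PEAK_TFLOPS = 40.0  # assumption: FP32 peak of an RTX 3090 Ti


if __name__ == "__main__":
    for size, tflops in [(2048, 7.6), (4096, 13.2), (8192, 18.2)]:
        ms = gemm_seconds(size, size, size, tflops) * 1e3
        share = tflops / ASSUMED_PEAK_TFLOPS
        print(f"{size}x{size}: {ms:.1f} ms per GEMM, {share:.1%} of assumed peak")
```

For example, an 8192x8192 GEMM is about 1.1 × 10¹² FLOPs, so at 18.2 TFLOPS it takes roughly 60 ms.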

Core Infrastructure (Rust)

Feature	Description
Memory Pool	LRU eviction, size-class free lists
Scheduler	Priority queue, memory reservation
Transfer Engine	Separate H2D/D2H streams, priority
Kernel Dispatch	Per-stream limits, lifecycle tracking
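One way to combine size-class free lists with LRU eviction is sketched below as a toy Python model, not the Rust implementation. The class name `SizeClassPool`, the power-of-two size classes, and the integer handles standing in for device pointers are assumptions; here "eviction" gives back the least-recently-freed cached blocks, which may differ from what the Rust pool actually evicts.

```python
from collections import OrderedDict, defaultdict


class SizeClassPool:
    """Toy device-memory pool: size-class free lists plus LRU eviction of cached blocks."""

    MIN_CLASS = 256  # assumed smallest size class, in bytes

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.reserved = 0  # bytes held from the "device": in-use plus cached free blocks
        self.in_use: dict[int, int] = {}  # handle -> size class
        self.free_lists: defaultdict[int, list[int]] = defaultdict(list)
        self.lru: OrderedDict[int, int] = OrderedDict()  # cached handle -> size class, oldest first
        self._next_handle = 0

    @classmethod
    def size_class(cls, nbytes: int) -> int:
        """Round a request up to the next power of two (at least MIN_CLASS)."""
        c = cls.MIN_CLASS
        while c < nbytes:
            c *= 2
        return c

    def allocate(self, nbytes: int) -> int:
        c = self.size_class(nbytes)
        # Fast path: reuse a cached block from the same size class.
        if self.free_lists[c]:
            h = self.free_lists[c].pop()
            del self.lru[h]
            self.in_use[h] = c
            return h
        # Release least-recently-freed cached blocks until the new block fits the quota.
        while self.reserved + c > self.quota and self.lru:
            old, old_c = self.lru.popitem(last=False)
            self.free_lists[old_c].remove(old)
            self.reserved -= old_c
        if self.reserved + c > self.quota:
            raise MemoryError(f"quota exceeded: need {c} bytes, {self.quota - self.reserved} available")
        h = self._next_handle
        self._next_handle += 1
        self.reserved += c
        self.in_use[h] = c
        return h

    def free(self, handle: int) -> None:
        """Return a block to its free list instead of releasing it to the device."""
        c = self.in_use.pop(handle)
        self.free_lists[c].append(handle)
        self.lru[handle] = c
```

Caching freed blocks avoids repeated `cudaMalloc`/`cudaFree` round trips, which are comparatively expensive; the LRU order only decides which cached blocks are given back first when the quota is tight.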

Advanced Features (Rust)

Feature	Description
Admission Control	Deterministic admission, quota enforcement
QoS Policy	Guaranteed/Burstable/BestEffort tiers
Kernel Pacing	Bandwidth-based throttling per stream
Micro-Slicing	Kernel splitting, round-robin fairness
Pinned Memory	Page-locked host memory with pooling
Kernel Cache	PTX caching, LRU eviction, TTL
GPU Partitioning	Resource isolation, multi-tenant support
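Micro-slicing can be pictured as cutting each large 1-D launch into fixed-size chunks of thread blocks and interleaving those chunks across tasks in round-robin order, so one long kernel cannot monopolize the GPU. The sketch below only computes such a launch schedule; the chunk size, task names, and the `(task, first_block, num_blocks)` tuple format are illustrative assumptions, not PyGPUkit's API.

```python
from collections import deque


def slice_grid(total_blocks: int, slice_blocks: int) -> list[tuple[int, int]]:
    """Split a 1-D grid into (first_block, num_blocks) slices of at most slice_blocks each."""
    return [
        (start, min(slice_blocks, total_blocks - start))
        for start in range(0, total_blocks, slice_blocks)
    ]


def round_robin(kernels: dict[str, int], slice_blocks: int) -> list[tuple[str, int, int]]:
    """Interleave slices of several kernels: one slice per task per round."""
    queues = deque(
        (name, deque(slice_grid(blocks, slice_blocks))) for name, blocks in kernels.items()
    )
    schedule = []
    while queues:
        name, slices = queues.popleft()
        first, count = slices.popleft()
        schedule.append((name, first, count))
        if slices:  # task still has work left: move it to the back of the line
            queues.append((name, slices))
    return schedule
```

Because CUDA numbers blocks from zero in every launch, a sliced kernel would need to add `first_block` to `blockIdx.x` itself, for example through an extra offset argument.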

Features

Lightweight — no PyTorch/CuPy overhead
Modular — runtime / memory / scheduler / JIT / ops
Rust Backend — memory pool, scheduler, dispatch in Rust
GPUArray with NumPy interop
NVRTC JIT for CUDA kernels
Advanced Scheduler with memory & bandwidth guarantees
106 Rust tests for core components

Installation

pip install pygpukit

From source:

git clone https://github.com/m96-chan/PyGPUkit
cd PyGPUkit
pip install -e .

Requirements:

Python 3.10+
CUDA 11+
NVRTC available
NVIDIA GPU

Supported GPUs:

RTX 30XX series (Ampere) and above
Performance tuning targets GPUs with a large L2 cache (6 MB+)
Older GPUs (RTX 20XX, GTX 10XX, etc.) are NOT tuned and may have suboptimal performance

Project Goals

Provide the smallest usable GPU runtime for Python
Expose GPU scheduling (bandwidth, memory, partitioning)
Make writing custom GPU kernels easy
Serve as a building block for inference engines, DSP systems, and real-time workloads

Usage Examples

Allocate Arrays

import pygpukit as gp

x = gp.zeros((1024, 1024), dtype="float32")
y = gp.ones((1024, 1024), dtype="float32")

Basic Operations

z = gp.add(x, y)
w = gp.matmul(x, y)

CPU <-> GPU Transfer

arr = z.to_numpy()
garr = gp.from_numpy(arr)

Custom NVRTC Kernel

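import pygpukit as gp

# CUDA C source, compiled at runtime by NVRTC
src = r"""
extern "C" __global__
void scale(float* x, float factor, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) x[idx] *= factor;
}
"""

# Compile, look up the kernel by name, and launch it on the float32 GPUArray x from above
kernel = gp.jit(src, func="scale")
kernel(x, factor=0.5, n=x.size)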
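The kernel runs over a 1-D grid, so the launcher has to pick enough blocks to cover all `n` elements, and the `if (idx < n)` guard discards the surplus threads in the last block. The pure-Python sketch below emulates that indexing on the CPU to show why the guard matters. The block size of 256 is an assumption; the example does not say how `gp.jit` chooses launch dimensions.

```python
def grid_size(n: int, block_dim: int = 256) -> int:
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_dim - 1) // block_dim


def emulate_scale(x: list[float], factor: float, block_dim: int = 256) -> list[float]:
    """Execute the body of the 'scale' kernel once per (block, thread) pair, sequentially."""
    n = len(x)
    out = list(x)
    for block_idx in range(grid_size(n, block_dim)):
        for thread_idx in range(block_dim):
            idx = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
            if idx < n:  # same bounds guard as the CUDA source
                out[idx] *= factor
    return out
```

For the 1024x1024 array above (1,048,576 elements), a block size of 256 gives a grid of 4,096 blocks with no leftover threads; for sizes that are not a multiple of the block size, the guard prevents out-of-bounds writes.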

Rust Scheduler (v0.2)

import _pygpukit_rust as rust

# Memory Pool with LRU eviction
pool = rust.MemoryPool(quota=100 * 1024 * 1024, enable_eviction=True)
block = pool.allocate(4096)

# QoS-aware task scheduling
evaluator = rust.QosPolicyEvaluator(total_memory=8*1024**3, total_bandwidth=1.0)
task = rust.QosTaskMeta.guaranteed("task-1", "Critical Task", 256*1024*1024)
result = evaluator.evaluate(task)

# GPU Partitioning
manager = rust.PartitionManager(rust.PartitionConfig(total_memory=8*1024**3))
manager.create_partition("inference", "Inference",
    rust.PartitionLimits().memory(4*1024**3).compute(0.5))

Scheduler — Kubernetes-Inspired GPU Orchestration

PyGPUkit includes an experimental scheduler that treats a single GPU as a multi-tenant compute node, similar to how Kubernetes orchestrates CPU workloads. The goal is to provide resource isolation, guarantees, and fair sharing across multiple GPU tasks.

Core Capabilities

1. GPU Memory Reservation

Tasks may request a guaranteed block of GPU memory.

Hard guarantees -> task is rejected if memory cannot be allocated
Soft guarantees -> best-effort allocation
Overcommit strategies (evict to host when pressure is high)
Reclaim policies (LRU GPUArray eviction)

Example:

task = scheduler.submit(
    fn,
    memory="512MB",
)
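The hard/soft distinction above can be modeled as follows. This is a hypothetical sketch, not PyGPUkit's scheduler: `parse_size`, `reserve`, the binary reading of "MB"/"GB" (1 MB = 1024² bytes), and treating a soft guarantee as a partial grant are all assumptions made for illustration.

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}  # assumed binary units


def parse_size(text: str) -> int:
    """Convert strings such as '512MB' or '1GB' to a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if m is None:
        raise ValueError(f"unrecognized memory size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def reserve(requested: str, free_bytes: int, hard: bool) -> int:
    """Return the bytes granted: hard requests are all-or-nothing, soft ones take what is free."""
    need = parse_size(requested)
    if need <= free_bytes:
        return need
    if hard:
        # Hard guarantee: reject the task rather than hand out less than requested.
        raise MemoryError(f"hard guarantee of {requested} cannot be met ({free_bytes} bytes free)")
    return free_bytes  # soft guarantee: best-effort partial grant
```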

2. GPU Bandwidth Guarantees / Throttling

Tasks may request a specific share of GPU compute bandwidth, given as a fraction (0.20 means 20%).

Bandwidth control is implemented via:

Stream priority
Kernel pacing (launch intervals)
Micro-slicing large kernels
Cooperative time-quantized scheduling
Persistent dispatcher kernels (planned)

Example:

task = scheduler.submit(
    fn,
    bandwidth=0.20,   # 20% GPU compute share
)
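Kernel pacing for a bandwidth share reduces to simple arithmetic: if each kernel runs for `d` seconds and the task is entitled to a fraction `s` of the GPU, launch starts must be at least `d / s` apart, which leaves the remaining `d / s - d` of each period to other tasks. The sketch below applies that rule on the host side. The `Pacer` class, and the assumption that the caller supplies an expected kernel duration, are illustrative and not the library's pacing engine.

```python
import time


class Pacer:
    """Delay launches so a task's kernels occupy at most `share` of wall-clock time."""

    def __init__(self, share: float) -> None:
        if not 0.0 < share <= 1.0:
            raise ValueError("share must be in (0, 1]")
        self.share = share
        self.next_launch = 0.0  # earliest time.monotonic() value for the next launch

    def min_interval(self, kernel_seconds: float) -> float:
        """Minimum spacing between launch starts for a kernel of the given duration."""
        return kernel_seconds / self.share

    def wait_and_record(self, kernel_seconds: float) -> float:
        """Sleep until the next launch slot, then book the following one. Returns seconds slept."""
        now = time.monotonic()
        delay = max(0.0, self.next_launch - now)
        if delay:
            time.sleep(delay)
        start = now + delay
        self.next_launch = start + self.min_interval(kernel_seconds)
        return delay
```

With `bandwidth=0.20` and a 2 ms kernel, for example, launches would be spaced about 10 ms apart.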

3. Logical GPU Partitioning

PyGPUkit implements software-defined GPU slicing, similar in spirit to Kubernetes device plugin resource partitioning.

Slices may define:

Memory quota
Bandwidth share
Stream priority band
Isolation level

Useful for:

Multi-tenant inference servers
Real-time audio/DSP workloads
Background/foreground GPU task separation
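Before slices are created, a partition manager has to make sure their quotas fit the physical device. A minimal feasibility check over the slice attributes listed above might look like this; the `Slice` dataclass, its field names, and `check_partitions` are hypothetical and only loosely mirror the memory quota, bandwidth share, and priority band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    memory_bytes: int   # memory quota
    bandwidth: float    # fraction of GPU compute, 0..1
    priority_band: int  # lower number = higher priority, as with CUDA stream priorities


def check_partitions(slices: list[Slice], total_memory: int) -> None:
    """Raise ValueError if the slices over-subscribe memory or bandwidth."""
    names = [s.name for s in slices]
    if len(names) != len(set(names)):
        raise ValueError("slice names must be unique")
    mem = sum(s.memory_bytes for s in slices)
    if mem > total_memory:
        raise ValueError(f"memory over-subscribed: {mem} > {total_memory} bytes")
    bw = sum(s.bandwidth for s in slices)
    if bw > 1.0 + 1e-9:  # small tolerance for float rounding
        raise ValueError(f"bandwidth over-subscribed: {bw:.2f} > 1.00")
    # priority_band is not checked here; it would only order streams, not reserve capacity.
```

Usage, mirroring the "inference" partition from the Rust example (4 GiB, half the compute on an 8 GiB device): `check_partitions([Slice("inference", 4 * 1024**3, 0.5, 0), Slice("background", 2 * 1024**3, 0.3, 2)], total_memory=8 * 1024**3)` passes, while adding a third slice with a 0.3 share would raise.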

4. Scheduling Policies

The scheduler supports multiple policies:

Guaranteed — exclusive reservation, strict QoS
Burstable — partial guarantees, opportunistic bandwidth
BestEffort — uses leftover GPU cycles
Priority scheduling
Deadline scheduling (planned)
Weighted fair sharing

Example:

task = scheduler.submit(
    fn,
    policy="guaranteed",
    memory="1GB",
    bandwidth=0.10,
)
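Weighted fair sharing can be illustrated with a virtual-time picker: each task accumulates the GPU time it has received divided by its weight, and the task with the smallest accumulated value runs next. Using the `bandwidth` shares as weights is an assumption about how such a policy might work; `fair_schedule` and its arguments are hypothetical.

```python
import heapq


def fair_schedule(weights: dict[str, float], quantum: float, rounds: int) -> list[str]:
    """Order in which tasks receive `rounds` equal time quanta under weighted fair sharing."""
    # Heap entries are (virtual time, task name); ties break alphabetically for determinism.
    heap = [(0.0, name) for name in weights]
    heapq.heapify(heap)
    order = []
    for _ in range(rounds):
        vtime, name = heapq.heappop(heap)
        order.append(name)
        # A heavier weight advances virtual time more slowly, so the task is picked more often.
        heapq.heappush(heap, (vtime + quantum / weights[name], name))
    return order
```

With weights 2.0 and 1.0, task `a` receives four of every six quanta and task `b` two, matching the 2:1 ratio.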

5. Admission Control

Before executing a task, the scheduler performs:

Resource validation
Quota check
QoS matching
Scheduling feasibility

Admission ends in one of three outcomes:

admitted
queued
rejected
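The three outcomes can be captured in a small decision function. This is one plausible reading of the checks listed above, written as a hypothetical sketch: a request that fits the currently free resources is admitted, one that could run on an otherwise idle GPU within the tenant's quota is queued, and anything else is rejected. `Request`, `Decision`, and `admit` are illustrative names, and QoS matching is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADMITTED = "admitted"
    QUEUED = "queued"
    REJECTED = "rejected"


@dataclass
class Request:
    memory: int       # bytes
    bandwidth: float  # fraction of GPU compute, 0..1


def admit(req: Request, free_memory: int, free_bandwidth: float,
          total_memory: int, quota_memory: int) -> Decision:
    # Resource validation: malformed requests are rejected outright.
    if req.memory <= 0 or not 0.0 < req.bandwidth <= 1.0:
        return Decision.REJECTED
    # Quota check and feasibility: could this ever run, even on an idle GPU?
    if req.memory > min(quota_memory, total_memory):
        return Decision.REJECTED
    # Fits what is free right now: run immediately; otherwise wait for resources.
    if req.memory <= free_memory and req.bandwidth <= free_bandwidth:
        return Decision.ADMITTED
    return Decision.QUEUED
```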

6. Monitoring & Introspection

PyGPUkit exposes live metrics:

Memory usage per task
SM occupancy and GPU utilization
Throttling / pacing logs
Queue position / execution state
Reclaim/eviction count

Example:

stats = scheduler.stats(task_id)

7. Soft Isolation Model

This is not OS-level isolation, but each GPU task is provided with:

Dedicated stream groups
Guaranteed memory pools
Kernel pacing to enforce bandwidth
Optional sandboxed GPUArray region

This provides practical multi-tenant safety without MIG/MPS.

Project Structure

PyGPUkit/
  src/pygpukit/    # Python API (NumPy-compatible)
  native/          # C++ backend (CUDA Driver/Runtime/NVRTC)
  rust/            # Rust backend (memory pool, scheduler, dispatch)
    pygpukit-core/   # Pure Rust core logic
    pygpukit-python/ # PyO3 bindings
  examples/        # Demo scripts
  tests/           # Test suite

Roadmap

v0.1 (Released)

GPUArray
NVRTC JIT
add/mul/matmul ops
Basic stream manager
Packaging + wheels

v0.2.0 (Released)

Rust Memory Pool (LRU, size-class)
Rust Scheduler (priority, memory reservation)
Rust Transfer Engine (async H2D/D2H)
Rust Kernel Dispatch Controller
Admission Control
QoS Policy Framework (Guaranteed/Burstable/BestEffort)
Kernel Pacing Engine
Micro-Slicing Framework
Pinned Memory Support
Kernel Cache (PTX caching)
GPU Partitioning
Tiled Matmul (shared memory)
106 Rust tests

v0.2.1 — Stabilization Phase (Released)

Admission / QoS spec finalization
Python API inconsistency fixes
Rust error propagation unification

v0.2.2 — Performance Phase (Released)

Ampere-optimized SGEMM with cp.async pipeline
4-stage software pipelining for latency hiding
float4 vectorized memory loads
18.2 TFLOPS on RTX 3090 Ti (46% efficiency)
SM 80+ (Ampere) architecture requirement

v0.2.3 — Reliability Phase

Kernel cache LRU completion
Driver-only mode stabilization
Windows/Linux full support
Large GPU memory test (16GB continuous alloc/free)

v0.2.4 — Distributed Phase

Multi-GPU Detection
NCCL / peer-to-peer preliminary support
Scheduler multi-device support

v0.2.5 — Pre-v0.3 Finalization

Full API review
Backward compatibility policy
JIT build options, safety measures, env vars cleanup
Documentation

v0.3 (Planned)

Triton optional backend
Advanced ops (softmax, layernorm)
Inference-oriented plugin system
MPS/MIG integration

Contributing

Contributions and discussions are welcome! Please open Issues for feature requests, bugs, or design proposals.

License

MIT License

Acknowledgements

Inspired by:

CUDA Runtime
NVRTC
PyCUDA
CuPy
Triton

PyGPUkit aims to fill the gap for a tiny, embeddable GPU runtime for Python.

Release history

Version	Release date
0.2.19	Jan 1, 2026
0.2.18	Dec 30, 2025
0.2.17	Dec 28, 2025
0.2.16	Dec 28, 2025
0.2.15	Dec 26, 2025
0.2.14	Dec 23, 2025
0.2.13	Dec 23, 2025
0.2.12	Dec 22, 2025
0.2.11	Dec 22, 2025
0.2.10	Dec 18, 2025
0.2.9	Dec 16, 2025
0.2.8	Dec 15, 2025
0.2.7	Dec 15, 2025
0.2.6	Dec 15, 2025
0.2.5	Dec 15, 2025
0.2.4	Dec 14, 2025
0.2.3	Dec 14, 2025
0.2.2 (this version)	Dec 13, 2025
0.2.0	Dec 12, 2025
0.1.3	Dec 12, 2025
0.1.1	Dec 12, 2025
0.1.0	Dec 12, 2025

Download files

Source Distribution

pygpukit-0.2.2.tar.gz (169.8 kB)

Uploaded Dec 13, 2025, source

Built Distributions

pygpukit-0.2.2-cp312-cp312-win_amd64.whl (799.8 kB)

Uploaded Dec 13, 2025, CPython 3.12, Windows x86-64

pygpukit-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (830.2 kB)

Uploaded Dec 13, 2025, CPython 3.12, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64

File details

Details for the file pygpukit-0.2.2.tar.gz.

File metadata

Download URL: pygpukit-0.2.2.tar.gz
Upload date: Dec 13, 2025
Size: 169.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pygpukit-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`b541fda1510d9e5be3de867b0c3547038e728b448e4a42267fc768fc857454e8`
MD5	`56c11ba67f1ab885041ecf320da1cdf8`
BLAKE2b-256	`307cc924986da1d67045ea7d7e5e502dc9ef8e3e15645c4b91dc0845c94a965a`

Provenance

The following attestation bundles were made for pygpukit-0.2.2.tar.gz:

Publisher: release.yml on m96-chan/PyGPUkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pygpukit-0.2.2.tar.gz
- Subject digest: b541fda1510d9e5be3de867b0c3547038e728b448e4a42267fc768fc857454e8
- Sigstore transparency entry: 763109509
- Sigstore integration time: Dec 13, 2025
Source repository:
- Permalink: m96-chan/PyGPUkit@412b5507b4d7676a1516c4fc0516e192c22913cb
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/m96-chan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@412b5507b4d7676a1516c4fc0516e192c22913cb
- Trigger Event: push

File details

Details for the file pygpukit-0.2.2-cp312-cp312-win_amd64.whl.

File metadata

Download URL: pygpukit-0.2.2-cp312-cp312-win_amd64.whl
Upload date: Dec 13, 2025
Size: 799.8 kB
Tags: CPython 3.12, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pygpukit-0.2.2-cp312-cp312-win_amd64.whl
Algorithm	Hash digest
SHA256	`a9d5f56ea7094aa75077ab9eca516b5d8252f31c846974466fb8030327428f5c`
MD5	`81f94f2130c2a9363118568770620581`
BLAKE2b-256	`12173badb1dbf43e6c29a00faf1dddf4aee043dff8fd57323b66af3d5c936831`

Provenance

The following attestation bundles were made for pygpukit-0.2.2-cp312-cp312-win_amd64.whl:

Publisher: release.yml on m96-chan/PyGPUkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pygpukit-0.2.2-cp312-cp312-win_amd64.whl
- Subject digest: a9d5f56ea7094aa75077ab9eca516b5d8252f31c846974466fb8030327428f5c
- Sigstore transparency entry: 763109511
- Sigstore integration time: Dec 13, 2025
Source repository:
- Permalink: m96-chan/PyGPUkit@412b5507b4d7676a1516c4fc0516e192c22913cb
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/m96-chan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@412b5507b4d7676a1516c4fc0516e192c22913cb
- Trigger Event: push

File details

Details for the file pygpukit-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File metadata

Download URL: pygpukit-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Upload date: Dec 13, 2025
Size: 830.2 kB
Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pygpukit-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm	Hash digest
SHA256	`92d300a251266924141a8d6c7362a7ed0e6166381577c1dc808622e9c7311020`
MD5	`b82fe28cc5d20340f290b2109603762a`
BLAKE2b-256	`9d6d2a3a108c55765e743297d4a7ccfb1fc0b69b59129c11e4386e6ce024617b`

Provenance

The following attestation bundles were made for pygpukit-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: release.yml on m96-chan/PyGPUkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pygpukit-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
- Subject digest: 92d300a251266924141a8d6c7362a7ed0e6166381577c1dc808622e9c7311020
- Sigstore transparency entry: 763109514
- Sigstore integration time: Dec 13, 2025
Source repository:
- Permalink: m96-chan/PyGPUkit@412b5507b4d7676a1516c4fc0516e192c22913cb
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/m96-chan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@412b5507b4d7676a1516c4fc0516e192c22913cb
- Trigger Event: push
