Predict GPU execution time & memory for PyTorch models without running them.

Blink 🔭

Python 3.9+ | License: MIT

GPU Performance Predictor for Deep Learning Models

Blink predicts the execution time and peak memory usage of PyTorch neural networks on GPU hardware before you actually run or deploy them.

It combines classical ML (XGBoost, Random Forest) with a Graph Neural Network (GNN) that encodes the computational graph of any model architecture, acting as a "virtual profiler."


⚡ Quick Start

Installation

Blink is published on PyPI. You can install the core API, or install with optional dependency groups:

# Core prediction API only
pip install blink-gpu

# Include Streamlit Dashboard, SHAP explainability, and Plotly
pip install "blink-gpu[full]"

# Include FastAPI REST Server
pip install "blink-gpu[api]"

# Install everything
pip install "blink-gpu[all]"

Note: PyTorch (torch, torchvision) is not installed with Blink. Install it separately, choosing the build that matches your CUDA version.
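
Since PyTorch has to be installed separately, it can help to confirm that it is importable before using Blink. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of Blink.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# torch and torchvision are the two packages the note above asks for.
missing = missing_packages(["torch", "torchvision"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("torch and torchvision are importable.")
```

This only checks that the packages can be found; it does not verify that the installed build matches the local CUDA version.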

Python Usage

import torchvision.models as tv
from blink import BlinkPredictor, BlinkAnalyzer

# 1. Analyze any PyTorch model architecture
model = tv.resnet18(weights=None)
print(BlinkAnalyzer().summary(model))
# ➔ Parameters: 11,689,512 | FLOPs: 1,814 M | Conv layers: 20 | Size: 44.59 MB

# 2. Predict execution time and memory for a batch size
predictor = BlinkPredictor()
result = predictor.predict(model, batch_size=32)

print(f"Exec time: {result['exec_time_ms']:.1f} ms")
print(f"Memory   : {result['memory_mb']:.1f} MB")
# ➔ Exec time: 18.3 ms | Memory: 184.3 MB

# 3. Sweep multiple batch sizes
sweep = predictor.predict_batch("resnet50", batch_sizes=[1, 16, 32, 64])
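
The example figures in the comments above can be cross-checked with a little arithmetic. The sketch below assumes float32 weights (4 bytes per parameter) and that "MB" means 2^20 bytes, which reproduces the 44.59 MB size shown for ResNet-18. Both helper names are illustrative and not part of Blink's API.

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Weight storage in MB (2**20 bytes), assuming a fixed size per parameter."""
    return num_params * bytes_per_param / 2**20


def throughput_samples_per_s(batch_size, exec_time_ms):
    """Samples processed per second implied by a per-batch execution time."""
    return batch_size / (exec_time_ms / 1000.0)


# ResNet-18 parameter count from the summary line above.
print(f"{model_size_mb(11_689_512):.2f} MB")
# Batch size 32 at the example prediction of 18.3 ms per batch.
print(f"{throughput_samples_per_s(32, 18.3):.0f} samples/s")
```

Converting a predicted per-batch time into samples per second is often a more convenient way to compare batch sizes than raw latency.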

💻 Command Line Interface (CLI)

Blink comes with a built-in CLI for quick profiling without writing scripts:

# Predict via CLI
$ blink predict resnet50 --batch-size 32
🔮 Blink prediction for 'resnet50'
 Batch   Exec (ms)   Memory (MB)  CI-Exec (80%)
------------------------------------------------------------
    32       28.45         294.5  [22.1 - 36.6]

# Launch the Streamlit Dashboard
$ blink dashboard --port 8501

# Launch the FastAPI REST Server
$ blink server --host 0.0.0.0 --port 8000

📊 Streamlit Dashboard & Explainability

Blink includes a rich, interactive web dashboard. Run blink dashboard to access:

  • Live Predictions: Instantly predict performance for custom PyTorch code or TorchVision models.
  • 🔍 SHAP Explainability ("Why this prediction?"): Interactive waterfall charts showing which architectural features (e.g., FLOPs, Conv layers, Model Depth) drove the predicted execution time and memory footprint up or down.

  • Batch Size Optimizer: Find the maximum batch size that fits within your specific GPU memory budget (e.g., 8GB, 16GB, 24GB).
  • Compare Architectures: Side-by-side performance comparison of different models.
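
The batch size search can be pictured as a binary search over a memory predictor. The sketch below is not Blink's implementation: `predict_memory_mb` stands in for any callable that maps a batch size to predicted memory, and the linear toy model is made up for illustration. It assumes predicted memory never decreases as the batch size grows.

```python
def max_batch_size(predict_memory_mb, budget_mb, upper=4096):
    """Largest batch size in [1, upper] whose predicted memory fits in budget_mb.

    Returns 0 if even batch size 1 does not fit. Assumes predicted memory
    is non-decreasing in the batch size.
    """
    lo, hi, best = 1, upper, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if predict_memory_mb(mid) <= budget_mb:
            best = mid      # mid fits, so try larger batches
            lo = mid + 1
        else:
            hi = mid - 1    # mid is too large, so try smaller batches
    return best


def toy_memory_model(batch_size):
    """Hypothetical predictor: 100 MB fixed cost plus 6 MB per sample."""
    return 100.0 + 6.0 * batch_size


# Memory budget of 8 GB expressed in MB.
print(max_batch_size(toy_memory_model, budget_mb=8 * 1024))
```

Binary search needs only about log2(upper) predictor calls, which matters if each prediction is relatively expensive.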

🌐 REST API & Docker Deployment

Blink can be deployed as a microservice to provide GPU cost estimates to other applications.

Docker Compose (Recommended)

You can start both the Streamlit Dashboard and the FastAPI backend with a single Docker Compose command.

git clone https://github.com/Aniketxmishra/Blink_Main.git
cd Blink_Main
docker compose up -d

  • Dashboard: http://localhost:8501
  • REST API: http://localhost:8000/docs (Swagger UI)

REST API Example

curl -X POST "http://localhost:8000/api/v2/predict" \
     -H "Content-Type: application/json" \
     -d '{"model_name": "resnet50", "batch_size": 32}'

# Response:
# {
#   "model_name": "resnet50",
#   "batch_size": 32,
#   "predictions": {
#     "exec_time_ms": 28.45,
#     "exec_time_bounds": [22.1, 36.6],
#     "memory_usage_mb": 294.5,
#     ...
#   }
# }
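
The same request can be made from Python. This is a minimal sketch using only the standard library, built from the endpoint, headers, and payload in the curl example above. The function names and the `timeout` default are illustrative choices, and the server is assumed to be running at the given base URL.

```python
import json
import urllib.request


def build_predict_request(model_name, batch_size, base_url="http://localhost:8000"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"model_name": model_name, "batch_size": batch_size})
    return urllib.request.Request(
        url=f"{base_url}/api/v2/predict",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_remote(model_name, batch_size, base_url="http://localhost:8000", timeout=10):
    """Send the request to a running Blink server and return the decoded JSON."""
    request = build_predict_request(model_name, batch_size, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Going by the response shape shown above, `predict_remote("resnet50", 32)["predictions"]["exec_time_ms"]` would return the predicted execution time once the server is up.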

🧠 How it Works (Architecture)

PyTorch Model
      │
      ▼
┌─────────────────────┐
│  Feature Extractor  │  ← layer counts, FLOPs, params, depth, width, skip connections
│  + GNN Extractor    │  ← graph-based architecture encoding (ArchitectureGNN)
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Prediction Models  │
│  ─────────────────  │
│  · XGBoost (tuned)  │  ← main predictor (best MAPE) + SHAP Explainer
│  · Random Forest    │  ← latency confidence intervals (Quantile Regression)
│  · GNN Predictor    │  ← graph-native, generalizes across architectures
└─────────┬───────────┘
          │
          ▼
   Predicted: exec_time_ms, memory_mb
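
The diagram does not spell out how the Random Forest produces its latency interval. One common approach, sketched here as an assumption rather than as Blink's actual method, is to take empirical quantiles of the per-tree predictions: the 10th and 90th percentiles bound a central 80% interval. The per-tree values below are made up.

```python
import statistics


def central_interval_80(per_tree_predictions):
    """Return (10th percentile, 90th percentile) of the given predictions."""
    # n=10 yields the nine decile cut points; the first and last of them
    # are the 10th and 90th percentiles.
    deciles = statistics.quantiles(per_tree_predictions, n=10, method="inclusive")
    return deciles[0], deciles[-1]


# Hypothetical latency predictions (ms) from 11 individual trees.
per_tree_ms = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0]
low, high = central_interval_80(per_tree_ms)
print(f"80% interval: [{low:.1f} - {high:.1f}] ms")
```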

Model Performance on Held-out Data:

  • Execution Time (XGBoost): ~8% MAPE
  • Memory Usage (XGBoost): ~6% MAPE
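
MAPE (mean absolute percentage error) averages the relative error |actual - predicted| / |actual| over the held-out samples and expresses it as a percentage. A minimal sketch of the metric follows; the measured and predicted values are made up for illustration and are not Blink's evaluation data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up measured vs. predicted execution times (ms).
measured = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 42.0]
print(f"MAPE: {mape(measured, predicted):.1f}%")
```

A MAPE of about 8% therefore means predictions are off by roughly 8% of the measured value on average, in either direction.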

🔬 Development & Paper Reproducibility

Blink was developed alongside a research study evaluating the efficacy of static and graph-based features for GPU performance prediction.

To reproduce the paper's figures and ablation study:

git clone https://github.com/Aniketxmishra/Blink_Main.git
cd Blink_Main
pip install -e ".[full]"

python scripts/ablation_study.py
python scripts/generate_paper_figures.py

Outputs will be saved to the results/ directory.


📄 License

MIT License. See LICENSE for details. Made by Aniket Mishra.

Download files

Source Distribution

blink_gpu-0.1.7.tar.gz (329.0 kB)

Built Distribution

blink_gpu-0.1.7-py3-none-any.whl (345.7 kB, Python 3)
