PyTorch 2.10 with native SM 12.0 (Blackwell) support for NVIDIA RTX 50-series GPUs
PyTorch 2.10.0a0 with SM 12.0 Support for RTX 50-series GPUs
🚀 Complete Optimization Stack for NVIDIA Blackwell Architecture 🚀
Available ahead of the official PyTorch 2.10 release.
Native Blackwell architecture support for NVIDIA GeForce RTX 5090, 5080, 5070 Ti, and 5070 GPUs.
📖 Quick Start Guide | 🔧 Triton Build Guide | 📋 Complete Feature List | 🐍 Python Versions | 🐳 Docker Install
Overview
This is a custom-built PyTorch 2.10.0a0 wheel compiled with native SM 12.0 (Blackwell) support. Unlike PyTorch nightlies, which provide only PTX backward compatibility (roughly 70-80% of native performance), this build includes optimized CUDA kernels compiled specifically for RTX 50-series GPUs.
Why This Build?
Official PyTorch releases and nightlies currently support only up to SM 8.9 (Ada Lovelace/RTX 40-series). On RTX 50-series GPUs they fall back to PTX compatibility mode, which:
- Reduces performance by 20-30%
- Increases JIT compilation overhead
- Lacks Blackwell-specific optimizations
This build solves that problem by compiling PyTorch from source with TORCH_CUDA_ARCH_LIST=12.0, enabling full native performance.
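One way to tell whether an installed wheel ships native kernels for your GPU or will rely on PTX JIT compilation is to compare the architectures compiled into the wheel with the device's compute capability. The sketch below is a simplified illustration: `classify_arch_support` is a hypothetical helper (not part of PyTorch or stone-linux), and it ignores binary compatibility between GPUs of the same major version. Its inputs come from the PyTorch calls `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`.

```python
import re


def classify_arch_support(arch_list, capability):
    """Classify how a wheel's compiled architectures cover a device.

    arch_list  -- entries such as "sm_90" (binary kernels) or "compute_90" (PTX)
    capability -- (major, minor) tuple, e.g. (12, 0) for RTX 50-series
    """
    major, minor = capability
    target = major * 10 + minor  # (12, 0) -> 120, (8, 9) -> 89

    # An exact sm_XY entry means kernels were compiled for this GPU.
    if f"sm_{target}" in arch_list:
        return "native"

    # Plain compute_XY PTX can be JIT-compiled for any device at or above XY.
    for arch in arch_list:
        match = re.fullmatch(r"compute_(\d+)", arch)
        if match and int(match.group(1)) <= target:
            return "ptx-fallback"

    return "unsupported"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        archs = torch.cuda.get_arch_list()
        cap = torch.cuda.get_device_capability(0)
        print(f"{archs} on {cap}: {classify_arch_support(archs, cap)}")
    else:
        print("PyTorch with CUDA is not available; nothing to classify.")
```

On an RTX 50-series GPU, a wheel built with TORCH_CUDA_ARCH_LIST=12.0 should list `sm_120` and be classified as "native" by this helper.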
Specifications
- PyTorch Version: 2.10.0a0+gitc5d91d9
- CUDA Version: 13.0.1 (compatible with all CUDA 12.x and 13.x)
- Python Versions: 3.12 (stable), 3.13 (available), 3.14 (coming soon) - See guide
- Platform: Linux x86_64
- Architecture: SM 12.0 (arch=compute_120, code=sm_120)
- Build Date: November 12, 2025
- Wheel Size: 180 MB
Features Included
This build includes all PyTorch 2.7-2.10 features and Blackwell-specific optimizations:
Core Features
- ✅ Native SM 12.0 Support - Full Blackwell architecture support
- ✅ CUDA 12.x and 13.x Compatible - Works with CUDA 12.0 through 13.0+
- ✅ cuDNN 9.7.0+ - Latest cuDNN with Blackwell optimizations
- ✅ NCCL 2.25.1 - Multi-GPU communication library
- ✅ CUTLASS 3.8.0 - CUDA template library for linear algebra
- ✅ 128-bit Vectorization - Enhanced memory bandwidth utilization
- ✅ 5th Gen Tensor Cores - Native Blackwell Tensor Core support
Compiler Features
- ✅ torch.compile - PyTorch 2.x compilation (Triton required for full optimization)
- ✅ Torch Function Modes - Custom operation overriding
- ✅ Mega Cache - Improved compilation caching
Known Limitations
⚠️ Triton Not Included - Triton 3.3+ with SM 12.0 support requires separate compilation due to CUDA 13.0 PTXAS dependencies. See TRITON_BUILD_GUIDE.md for build instructions.
Impact of Missing Triton:
- torch.compile works but with reduced optimization (~30-40% slower on attention-heavy workloads)
- FlexAttention limited functionality
- No custom Triton kernels
Want full torch.compile performance? Build Triton separately using our guide!
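Because Triton is not bundled, it can be useful to check whether a Triton package is importable before relying on torch.compile. The following is a minimal sketch; `triton_available` is a hypothetical helper name, and finding the package does not by itself confirm that the installed Triton supports SM 12.0.

```python
import importlib.util


def triton_available():
    """Return True if a 'triton' package can be imported in this environment."""
    return importlib.util.find_spec("triton") is not None


if __name__ == "__main__":
    if triton_available():
        print("Triton package found.")
    else:
        print("Triton package not found; see TRITON_BUILD_GUIDE.md to build it.")
```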
Supported GPUs
- NVIDIA GeForce RTX 5090
- NVIDIA GeForce RTX 5080
- NVIDIA GeForce RTX 5070 Ti
- NVIDIA GeForce RTX 5070
All GPUs with Blackwell architecture (SM 12.0 / Compute Capability 12.0)
Requirements
System Requirements
- Linux x86_64 (Ubuntu 22.04+ recommended)
- Python 3.12
- NVIDIA Driver 570.00 or newer
- CUDA 13.0+ compatible driver
Python Dependencies
- numpy >= 2.3.0
- packaging >= 25.0
- PyYAML >= 6.0
- typing-extensions >= 4.15.0
All dependencies are listed in requirements.txt and will be installed automatically.
Installation
Recommended: Install via PyPI
The easiest way to install PyTorch with RTX 50-series support:
# Install the stone-linux package
pip install stone-linux
# Run the installer (downloads and installs the appropriate PyTorch wheel)
stone-install
# Verify installation
stone-verify
The installer will automatically:
- Detect your Python version
- Download the correct PyTorch wheel from GitHub releases
- Install all dependencies
- Verify GPU compatibility
Alternative: Direct Wheel Installation
Download and install the wheel directly for your Python version:
# Python 3.10
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp310-cp310-linux_x86_64.whl
# Python 3.11
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp311-cp311-linux_x86_64.whl
# Python 3.12
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp312-cp312-linux_x86_64.whl
# Python 3.13
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp313-cp313-linux_x86_64.whl
# Python 3.14
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp314-cp314-linux_x86_64.whl
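The wheel URLs above follow a single pattern that differs only in the CPython tag. As a sketch under that assumption, the URL for the running interpreter can be derived as shown below. `wheel_url` and `LISTED_VERSIONS` are hypothetical names for illustration, and this is not necessarily how stone-install selects a wheel.

```python
import sys

# URL pattern copied from the direct-install commands above.
RELEASE_BASE = (
    "https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-"
    "/releases/download/v2.10.0a0"
)

# Python versions for which a wheel URL is listed in this README.
LISTED_VERSIONS = {(3, 10), (3, 11), (3, 12), (3, 13), (3, 14)}


def wheel_url(python_version):
    """Build the wheel URL for a (major, minor, ...) Python version tuple."""
    major, minor = python_version[:2]
    if (major, minor) not in LISTED_VERSIONS:
        raise ValueError(f"No wheel listed for Python {major}.{minor}")
    tag = f"cp{major}{minor}"
    return f"{RELEASE_BASE}/torch-2.10.0a0-{tag}-{tag}-linux_x86_64.whl"


if __name__ == "__main__":
    try:
        print(wheel_url(sys.version_info))
    except ValueError as error:
        print(error)
```

The printed URL can be passed straight to pip install. Whether a given file actually exists on the releases page should still be checked there.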
Alternative: Clone and Install Script
# Clone this repository
git clone https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-.git
cd PyTorch-2.10.0a0-for-Linux-
# Run the installation script
chmod +x install.sh
./install.sh
Windows (WSL2 Required)
This is a Linux wheel. Windows users need WSL2 with Ubuntu:
# In WSL2 Ubuntu terminal
pip install stone-linux
stone-install
Verification
Quick Verification
If you installed via stone-linux:
stone-verify
Python Verification
Verify PyTorch is working correctly:
import torch
import stone_linux
# Quick verification
stone_linux.verify_installation()
# Or manually check
print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
print(f"GPU Name: {torch.cuda.get_device_name(0)}")
print(f"Compute Capability: {torch.cuda.get_device_capability(0)}")
# Test GPU operation
x = torch.rand(5, 3).cuda()
print(f"Tensor device: {x.device}")
Expected output:
PyTorch Version: 2.10.0a0+gitc5d91d9
CUDA Available: True
CUDA Version: 13.0
GPU Name: NVIDIA GeForce RTX 5080
Compute Capability: (12, 0)
Tensor device: cuda:0
Performance
Compared to PyTorch nightlies on RTX 50-series:
- 20-30% faster training and inference
- No JIT overhead from PTX compilation
- Native Blackwell optimizations
- Full Tensor Core utilization
See the performance benchmarks notebook for detailed metrics.
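To check the speedup on your own hardware, you can time the same workload under this build and under a nightly and compare the results. The sketch below is an illustrative micro-benchmark, not the project's benchmarking script: `median_seconds` is a hypothetical helper, and the 4096x4096 FP16 matrix multiply is an arbitrary workload chosen for the example.

```python
import statistics
import time


def median_seconds(fn, sync=lambda: None, warmup=3, iters=10):
    """Return the median wall-clock time of fn() over `iters` timed runs.

    sync is called after each run so that asynchronous GPU work has
    finished before the clock is read (pass torch.cuda.synchronize).
    """
    for _ in range(warmup):  # warm-up runs absorb one-time JIT and cache costs
        fn()
    sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        n = 4096
        a = torch.randn(n, n, device="cuda", dtype=torch.float16)
        b = torch.randn(n, n, device="cuda", dtype=torch.float16)
        seconds = median_seconds(lambda: a @ b, sync=torch.cuda.synchronize)
        tflops = 2 * n**3 / seconds / 1e12  # a matmul performs about 2*n^3 FLOPs
        print(f"matmul {n}x{n}: {seconds * 1e3:.2f} ms, {tflops:.1f} TFLOPS")
    else:
        print("PyTorch with CUDA is not available; skipping the benchmark.")
```

Run the script once per environment and compare the reported times. A single matrix multiply will not reflect every workload, so treat the numbers as a rough indication.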
Examples and Tutorials
Jupyter Notebooks
- Getting Started - Basic PyTorch operations, neural networks, and mixed precision training
- Performance Benchmarks - Comprehensive performance analysis and optimization tips
Integration Examples
- vLLM Integration - High-performance LLM inference with vLLM
- LangChain Integration - LLM applications with LangChain
- Benchmarking Script - Performance benchmarking utilities
Run examples:
# Install with examples
pip install 'stone-linux[examples,vllm,langchain]'
# Run vLLM example
python -m stone_linux.examples.vllm_example
# Run LangChain example
python -m stone_linux.examples.langchain_example
# Run benchmarks
python -m stone_linux.examples.benchmark --output results.json
Troubleshooting
"CUDA not available" after installation
- Verify NVIDIA driver version:
nvidia-smi
Should show driver >= 570.00
- Check GPU compute capability:
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
Should show 12.0
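Both checks can be combined into one script. This is a minimal sketch that assumes nvidia-smi is on the PATH and accepts the `driver_version` and `compute_cap` query fields. `parse_version` and `gpu_is_supported` are hypothetical helpers, and the thresholds are the ones stated in this README.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a dotted string such as '570.86.16' into (570, 86, 16)."""
    return tuple(int(part) for part in text.strip().split("."))


def gpu_is_supported(driver_version, compute_cap,
                     min_driver=(570, 0), required_cap=(12, 0)):
    """Check the driver floor and the exact compute capability."""
    driver_ok = parse_version(driver_version) >= min_driver
    cap_ok = parse_version(compute_cap) == required_cap
    return driver_ok and cap_ok


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version,compute_cap",
             "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            print(f"nvidia-smi failed: {result.stderr.strip()}")
        else:
            # One output line per GPU, e.g. "570.86.16, 12.0"
            for line in result.stdout.strip().splitlines():
                driver, cap = [field.strip() for field in line.split(",")]
                status = "OK" if gpu_is_supported(driver, cap) else "NOT SUPPORTED"
                print(f"driver {driver}, compute capability {cap}: {status}")
```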
Python version mismatch
This wheel requires Python 3.12. Create a virtual environment:
python3.12 -m venv pytorch-env
source pytorch-env/bin/activate
pip install torch_sm120.whl
Building From Source
PyTorch Only
See the included Dockerfile.pytorch-builder for PyTorch-only build instructions.
docker build -f Dockerfile.pytorch-builder -t pytorch-sm120-builder .
PyTorch + Triton (Recommended for Full Performance)
For the complete build with Triton 3.3+ support:
chmod +x build-pytorch-triton.sh
./build-pytorch-triton.sh
This will create both torch_sm120.whl and triton_sm120.whl.
See TRITON_BUILD_GUIDE.md for detailed Triton build instructions and troubleshooting.
Build Time
- PyTorch only: 1-2 hours
- PyTorch + Triton: 1.5-3 hours
License
PyTorch is released under the BSD-3-Clause license. This wheel is compiled from the official PyTorch source code with no modifications except for the architecture target.
Changelog
v2.10.0a0+gitc5d91d9 (November 12, 2025)
- Initial release
- Built from PyTorch main branch (commit c5d91d9)
- Native SM 12.0 support for RTX 50-series
- CUDA 13.0.1 compatibility
- Python 3.12 support
Built with care for the RTX 50-series community.
Download
The PyTorch wheel file (180 MB) is too large to store in the Git repository itself, so it is distributed through GitHub Releases.
Download from GitHub Releases: https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/latest
Or use this direct link:
wget https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch_sm120.whl
Then follow the installation instructions above.
File details
Details for the file stone_linux-2.10.0a0.tar.gz.
File metadata
- Download URL: stone_linux-2.10.0a0.tar.gz
- Upload date:
- Size: 36.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c23b05aed021239989bea415c0601560a56276fef6d691cef6d110758bb26c42 |
| MD5 | 5d072d30a79bc1c63acbbf91b3658ae2 |
| BLAKE2b-256 | 00589bbb532bf3f6f9669ba599a83160eecceb44e78e83b0b2c113dfff75ed9c |
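A downloaded file can be checked against the published SHA256 digest before installing it. The sketch below uses only the standard library. `sha256_of` and `matches_expected` are hypothetical helper names, and the expected digest is the one listed above for stone_linux-2.10.0a0.tar.gz.

```python
import hashlib

# SHA256 digest published for stone_linux-2.10.0a0.tar.gz
EXPECTED_SHA256 = "c23b05aed021239989bea415c0601560a56276fef6d691cef6d110758bb26c42"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Example (assumes the archive is in the current directory):
# matches_expected("stone_linux-2.10.0a0.tar.gz")
```

The same helper can check the other published file by passing its digest as `expected`.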
Provenance
The following attestation bundles were made for stone_linux-2.10.0a0.tar.gz:
Publisher: publish-pypi.yml on kentstone84/PyTorch-2.10.0a0-for-Linux-
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stone_linux-2.10.0a0.tar.gz
- Subject digest: c23b05aed021239989bea415c0601560a56276fef6d691cef6d110758bb26c42
- Sigstore transparency entry: 704646574
- Sigstore integration time:
- Permalink: kentstone84/PyTorch-2.10.0a0-for-Linux-@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/kentstone84
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Trigger Event: workflow_dispatch
File details
Details for the file stone_linux-2.10.0a0-py3-none-any.whl.
File metadata
- Download URL: stone_linux-2.10.0a0-py3-none-any.whl
- Upload date:
- Size: 22.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a74818174688fca7fb9fa7ba419c4d06e418c1271d7123d704e58d3d1378d7ce |
| MD5 | 00a1990e02dfff2e1ead780995846f8d |
| BLAKE2b-256 | 2f29bf9f107a2faa380745bfd995444183cae2354553ce10560b2f6adf4e4f48 |
Provenance
The following attestation bundles were made for stone_linux-2.10.0a0-py3-none-any.whl:
Publisher: publish-pypi.yml on kentstone84/PyTorch-2.10.0a0-for-Linux-
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stone_linux-2.10.0a0-py3-none-any.whl
- Subject digest: a74818174688fca7fb9fa7ba419c4d06e418c1271d7123d704e58d3d1378d7ce
- Sigstore transparency entry: 704646583
- Sigstore integration time:
- Permalink: kentstone84/PyTorch-2.10.0a0-for-Linux-@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/kentstone84
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Trigger Event: workflow_dispatch