PyTorch 2.10 with native SM 12.0 (Blackwell) support for NVIDIA RTX 50-series GPUs

PyTorch 2.10.0a0 with SM 12.0 Support for RTX 50-series GPUs

🚀 Complete Optimization Stack for NVIDIA Blackwell Architecture 🚀

Available ahead of the official PyTorch 2.10 release.

Native Blackwell architecture support for NVIDIA GeForce RTX 5090, 5080, 5070 Ti, and 5070 GPUs.

📖 Quick Start Guide | 🔧 Triton Build Guide | 📋 Complete Feature List | 🐍 Python Versions | 🐳 Docker Install

Overview

This is a custom-built PyTorch 2.10.0a0 wheel compiled with native SM 12.0 (Blackwell) support. Unlike PyTorch nightlies, which provide only PTX backward compatibility (roughly 70-80% of native performance), this build includes CUDA kernels compiled specifically for RTX 50-series GPUs.

Why This Build?

Official PyTorch releases and nightlies currently support only up to SM 8.9 (Ada Lovelace/RTX 40-series). When running on RTX 50-series GPUs, they fall back to PTX compatibility mode, which:

Reduces performance by 20-30%
Increases JIT compilation overhead
Lacks Blackwell-specific optimizations

This build solves that problem by compiling PyTorch from source with TORCH_CUDA_ARCH_LIST=12.0, enabling full native performance.
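
One way to see which of the two paths an installed wheel takes is to compare the architectures compiled into it with the GPU's compute capability. The sketch below is illustrative and not part of this package: `classify_support` is a hypothetical helper, and it assumes `torch.cuda.get_arch_list()` reports entries such as `sm_120` for native kernels and `compute_90` for embedded PTX. It also simplifies by treating only an exact SM match as native.

```python
import re


def classify_support(arch_list, capability):
    """Classify how a build would run on a GPU with the given capability.

    arch_list:  strings such as ["sm_89", "compute_90"] (assumed format)
    capability: (major, minor) tuple, e.g. (12, 0) for RTX 50-series
    """
    major, minor = capability
    target = major * 10 + minor

    # Native path: a kernel image compiled for exactly this SM
    # (simplification: same-major binary compatibility is ignored).
    if f"sm_{target}" in arch_list:
        return "native"

    # Fallback path: embedded PTX for an older virtual architecture can be
    # JIT-compiled by the driver for a newer GPU.
    for arch in arch_list:
        match = re.fullmatch(r"compute_(\d+)", arch)
        if match and int(match.group(1)) <= target:
            return "ptx-fallback"

    return "unsupported"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        archs = torch.cuda.get_arch_list()
        cap = torch.cuda.get_device_capability(0)
        print(f"{archs} on {cap}: {classify_support(archs, cap)}")
    else:
        # No GPU available here, so show the two cases described above.
        print(classify_support(["sm_120"], (12, 0)))
        print(classify_support(["sm_89", "compute_90"], (12, 0)))
```

A build made with TORCH_CUDA_ARCH_LIST=12.0 would be expected to land in the "native" case on an RTX 50-series card.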

Specifications

PyTorch Version: 2.10.0a0+gitc5d91d9
CUDA Version: 13.0.1 (compatible with all CUDA 12.x and 13.x)
Python Versions: 3.12 (stable), 3.13 (available), 3.14 (coming soon) - See guide
Platform: Linux x86_64
Architecture: SM 12.0 (arch=compute_120, code=sm_120)
Build Date: November 12, 2025
Wheel Size: 180 MB

Features Included

This build includes all PyTorch 2.7-2.10 features and Blackwell-specific optimizations:

Core Features

✅ Native SM 12.0 Support - Full Blackwell architecture support
✅ CUDA 12.x and 13.x Compatible - Works with CUDA 12.0 through 13.0+
✅ cuDNN 9.7.0+ - Latest cuDNN with Blackwell optimizations
✅ NCCL 2.25.1 - Multi-GPU communication library
✅ CUTLASS 3.8.0 - CUDA template library for linear algebra
✅ 128-bit Vectorization - Enhanced memory bandwidth utilization
✅ 5th Gen Tensor Cores - Native Blackwell Tensor Core support

Compiler Features

✅ torch.compile - PyTorch 2.x compilation (Triton required for full optimization)
✅ Torch Function Modes - Custom operation overriding
✅ Mega Cache - Improved compilation caching

Known Limitations

⚠️ Triton Not Included - Triton 3.3+ with SM 12.0 support requires separate compilation due to CUDA 13.0 PTXAS dependencies. See TRITON_BUILD_GUIDE.md for build instructions.

Impact of Missing Triton:

torch.compile works but with reduced optimization (~30-40% slower on attention-heavy workloads)
FlexAttention limited functionality
No custom Triton kernels

Want full torch.compile performance? Build Triton separately using our guide!
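
Because Triton ships separately, a script may want to check for it before deciding how to call torch.compile. The following is a minimal sketch under stated assumptions: `triton_available` and `pick_compile_backend` are hypothetical helper names, and falling back to the `aot_eager` backend when Triton is absent is one possible policy, not something this package prescribes.

```python
import importlib.util


def triton_available():
    """Return True if a `triton` package is importable in this environment."""
    return importlib.util.find_spec("triton") is not None


def pick_compile_backend(has_triton):
    """Choose a torch.compile backend name.

    Assumption: "inductor" (the default backend) is only worth selecting when
    Triton is present; "aot_eager" traces the model but generates no kernels.
    """
    return "inductor" if has_triton else "aot_eager"


if __name__ == "__main__":
    backend = pick_compile_backend(triton_available())
    print(f"torch.compile backend: {backend}")
    # With PyTorch installed, the choice would be applied as:
    #   compiled_model = torch.compile(model, backend=backend)
```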

Supported GPUs

NVIDIA GeForce RTX 5090
NVIDIA GeForce RTX 5080
NVIDIA GeForce RTX 5070 Ti
NVIDIA GeForce RTX 5070

All GPUs with Blackwell architecture (SM 12.0 / Compute Capability 12.0)

Requirements

System Requirements

Linux x86_64 (Ubuntu 22.04+ recommended)
Python 3.12
NVIDIA Driver 570.00 or newer
CUDA 13.0+ compatible driver
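
The driver requirement can be checked from a script as well as by eye. This sketch assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=driver_version` field; the helper names are hypothetical, and the minimum comes from the list above.

```python
import shutil
import subprocess

MIN_DRIVER = (570, 0)  # minimum driver from the requirement list above


def parse_driver_version(text):
    """Turn a string such as "570.86.10" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(text, minimum=MIN_DRIVER):
    """Compare a driver version string against the minimum, padding with zeros."""
    found = parse_driver_version(text)
    found = found + (0,) * (len(minimum) - len(found))
    return found >= minimum


def installed_driver_versions():
    """Ask nvidia-smi for each GPU's driver version (empty list if unavailable)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for version in installed_driver_versions():
        status = "OK" if driver_meets_minimum(version) else "too old"
        print(f"driver {version}: {status}")
```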

Python Dependencies

numpy >= 2.3.0
packaging >= 25.0
PyYAML >= 6.0
typing-extensions >= 4.15.0

All dependencies are listed in requirements.txt and will be installed automatically.
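
To confirm that an existing environment already satisfies these minimums, the installed versions can be read with the standard library. This is a rough sketch, not the installer's own logic: `check_dependency` is a hypothetical helper, and it compares only the leading numeric part of each version, so a pre-release such as 2.3.0rc1 is treated like 2.3.0.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above.
MINIMUMS = {
    "numpy": (2, 3, 0),
    "packaging": (25, 0),
    "PyYAML": (6, 0),
    "typing-extensions": (4, 15, 0),
}


def release_tuple(version):
    """Leading numeric components of a version string, e.g. "2.3.0rc1"."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(parts, width):
    """Right-pad a version tuple with zeros so tuples compare fairly."""
    return parts + (0,) * (width - len(parts))


def check_dependency(name, minimum, installed=None):
    """Return "missing", "too old" or "ok" for one package."""
    if installed is None:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            return "missing"
    found = release_tuple(installed)
    width = max(len(found), len(minimum))
    return "ok" if _pad(found, width) >= _pad(minimum, width) else "too old"


if __name__ == "__main__":
    for package, minimum in MINIMUMS.items():
        print(f"{package}: {check_dependency(package, minimum)}")
```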

Installation

Recommended: Install via PyPI

The easiest way to install PyTorch with RTX 50-series support:

# Install the stone-linux package
pip install stone-linux

# Run the installer (downloads and installs the appropriate PyTorch wheel)
stone-install

# Verify installation
stone-verify

The installer will automatically:

Detect your Python version
Download the correct PyTorch wheel from GitHub releases
Install all dependencies
Verify GPU compatibility
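
The first two steps can be pictured as mapping the running interpreter to a wheel URL. The sketch below is an assumption about how such a mapping might look, not the actual `stone-install` code; it reuses the release URL pattern listed under Direct Wheel Installation, and `wheel_url` is a hypothetical name.

```python
import sys

BASE_URL = (
    "https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-"
    "/releases/download/v2.10.0a0"
)
# Interpreter versions for which this page lists a wheel URL.
LISTED_VERSIONS = {(3, 10), (3, 11), (3, 12), (3, 13), (3, 14)}


def wheel_url(major, minor):
    """Build the release URL of the wheel for one CPython version."""
    if (major, minor) not in LISTED_VERSIONS:
        raise ValueError(f"no wheel listed for Python {major}.{minor}")
    tag = f"cp{major}{minor}"
    return f"{BASE_URL}/torch-2.10.0a0-{tag}-{tag}-linux_x86_64.whl"


if __name__ == "__main__":
    try:
        print(wheel_url(sys.version_info.major, sys.version_info.minor))
    except ValueError as exc:
        print(exc)
```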

Alternative: Direct Wheel Installation

Download and install the wheel directly for your Python version:

# Python 3.10
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp310-cp310-linux_x86_64.whl

# Python 3.11
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp311-cp311-linux_x86_64.whl

# Python 3.12
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp312-cp312-linux_x86_64.whl

# Python 3.13
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp313-cp313-linux_x86_64.whl

# Python 3.14
pip install https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch-2.10.0a0-cp314-cp314-linux_x86_64.whl

Alternative: Clone and Install Script

# Clone this repository
git clone https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-.git
cd PyTorch-2.10.0a0-for-Linux-

# Run the installation script
chmod +x install.sh
./install.sh

Windows (WSL2 Required)

This is a Linux wheel. Windows users need WSL2 with Ubuntu:

# In WSL2 Ubuntu terminal
pip install stone-linux
stone-install

Verification

Quick Verification

If you installed via stone-linux:

stone-verify

Python Verification

Verify PyTorch is working correctly:

import torch
import stone_linux

# Quick verification
stone_linux.verify_installation()

# Or manually check
print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
print(f"GPU Name: {torch.cuda.get_device_name(0)}")
print(f"Compute Capability: {torch.cuda.get_device_capability(0)}")

# Test GPU operation
x = torch.rand(5, 3).cuda()
print(f"Tensor device: {x.device}")

Expected output:

PyTorch Version: 2.10.0a0+gitc5d91d9
CUDA Available: True
CUDA Version: 13.0
GPU Name: NVIDIA GeForce RTX 5080
Compute Capability: (12, 0)
Tensor device: cuda:0

Performance

Compared to PyTorch nightlies on RTX 50-series:

20-30% faster training and inference
No JIT overhead from PTX compilation
Native Blackwell optimizations
Full Tensor Core utilization

See the performance benchmarks notebook for detailed metrics.
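
To compare two builds on your own hardware, a small timing harness is enough. The sketch below is not the project's benchmark script: `mean_time_ms` and `percent_faster` are hypothetical helpers, and "percent faster" is assumed here to mean the baseline time divided by the new time, minus one. GPU work has to be synchronized before the clock is read because CUDA kernels launch asynchronously.

```python
import time


def mean_time_ms(fn, warmup=3, iters=10, sync=None):
    """Average wall-clock time of fn() in milliseconds.

    sync: optional callable run before reading the clock; for GPU work this
    would be torch.cuda.synchronize.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters


def percent_faster(baseline_ms, candidate_ms):
    """How much faster the candidate is, relative to its own running time."""
    return (baseline_ms / candidate_ms - 1.0) * 100.0


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        a = torch.rand(4096, 4096, device="cuda")
        elapsed = mean_time_ms(lambda: a @ a, sync=torch.cuda.synchronize)
        print(f"4096x4096 matmul: {elapsed:.2f} ms")
    else:
        print("CUDA not available; skipping GPU timing")
```

Running the same script once in an environment with a nightly build and once with this wheel gives the two numbers to pass to `percent_faster`.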

Examples and Tutorials

Jupyter Notebooks

Getting Started - Basic PyTorch operations, neural networks, and mixed precision training
Performance Benchmarks - Comprehensive performance analysis and optimization tips

Integration Examples

vLLM Integration - High-performance LLM inference with vLLM
LangChain Integration - LLM applications with LangChain
Benchmarking Script - Performance benchmarking utilities

Run examples:

# Install with examples
pip install 'stone-linux[examples,vllm,langchain]'

# Run vLLM example
python -m stone_linux.examples.vllm_example

# Run LangChain example
python -m stone_linux.examples.langchain_example

# Run benchmarks
python -m stone_linux.examples.benchmark --output results.json

Troubleshooting

"CUDA not available" after installation

Verify NVIDIA driver version:
```
nvidia-smi
```
Should show driver >= 570.00

Check GPU compute capability:

nvidia-smi --query-gpu=compute_cap --format=csv,noheader

Should show 12.0
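
On a machine with several GPUs, the same query prints one line per device. A short sketch of interpreting that output follows; the helper names are hypothetical, and the sample string stands in for real `nvidia-smi` output.

```python
def parse_compute_caps(output):
    """Parse `nvidia-smi --query-gpu=compute_cap` output, one GPU per line."""
    caps = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        major, minor = line.split(".")
        caps.append((int(major), int(minor)))
    return caps


def unsupported_gpus(caps, required=(12, 0)):
    """Indices of GPUs whose capability differs from the one this wheel targets."""
    return [index for index, cap in enumerate(caps) if cap != required]


if __name__ == "__main__":
    sample = "12.0\n8.9\n"  # illustrative output for a mixed two-GPU machine
    caps = parse_compute_caps(sample)
    print(f"capabilities: {caps}")
    print(f"GPUs not at SM 12.0: {unsupported_gpus(caps)}")
```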

Python version mismatch

This wheel requires Python 3.12. Create a virtual environment:

python3.12 -m venv pytorch-env
source pytorch-env/bin/activate
pip install torch_sm120.whl

Building From Source

PyTorch Only

See the included Dockerfile.pytorch-builder for PyTorch-only build instructions.

docker build -f Dockerfile.pytorch-builder -t pytorch-sm120-builder .

PyTorch + Triton (Recommended for Full Performance)

For the complete build with Triton 3.3+ support:

chmod +x build-pytorch-triton.sh
./build-pytorch-triton.sh

This will create both torch_sm120.whl and triton_sm120.whl.

See TRITON_BUILD_GUIDE.md for detailed Triton build instructions and troubleshooting.

Build Time

PyTorch only: 1-2 hours
PyTorch + Triton: 1.5-3 hours

License

PyTorch is released under the BSD-3-Clause license. This wheel is compiled from the official PyTorch source code with no modifications except for the architecture target.

Changelog

v2.10.0a0+gitc5d91d9 (November 12, 2025)

Initial release
Built from PyTorch main branch (commit c5d91d9)
Native SM 12.0 support for RTX 50-series
CUDA 13.0.1 compatibility
Python 3.12 support

Built with care for the RTX 50-series community.

Download

The PyTorch wheel file (180 MB) is too large for direct GitHub hosting.

Download from GitHub Releases: https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/latest

Or use this direct link:

wget https://github.com/kentstone84/PyTorch-2.10.0a0-for-Linux-/releases/download/v2.10.0a0/torch_sm120.whl

Then follow the installation instructions above.

Project details

Release history

This version

2.10.0a0 pre-release

Nov 17, 2025

Download files

Source Distribution

stone_linux-2.10.0a0.tar.gz (36.9 kB)

Uploaded Nov 17, 2025 Source

Built Distribution

stone_linux-2.10.0a0-py3-none-any.whl (22.6 kB)

Uploaded Nov 17, 2025 Python 3

File details

Details for the file stone_linux-2.10.0a0.tar.gz.

File metadata

Download URL: stone_linux-2.10.0a0.tar.gz
Upload date: Nov 17, 2025
Size: 36.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stone_linux-2.10.0a0.tar.gz
Algorithm	Hash digest
SHA256	`c23b05aed021239989bea415c0601560a56276fef6d691cef6d110758bb26c42`
MD5	`5d072d30a79bc1c63acbbf91b3658ae2`
BLAKE2b-256	`00589bbb532bf3f6f9669ba599a83160eecceb44e78e83b0b2c113dfff75ed9c`
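
A downloaded file can be checked against the SHA256 digest above with the standard library. This is a minimal sketch: the function names are hypothetical, and the constant is the SHA256 value listed in the table for the source distribution.

```python
import hashlib

# SHA256 listed above for stone_linux-2.10.0a0.tar.gz
EXPECTED_SDIST_SHA256 = (
    "c23b05aed021239989bea415c0601560a56276fef6d691cef6d110758bb26c42"
)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("stone_linux-2.10.0a0.tar.gz", EXPECTED_SDIST_SHA256)` should return True for an intact download of the source distribution.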

Provenance

The following attestation bundles were made for stone_linux-2.10.0a0.tar.gz:

Publisher: publish-pypi.yml on kentstone84/PyTorch-2.10.0a0-for-Linux-

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stone_linux-2.10.0a0.tar.gz
- Subject digest: c23b05aed021239989bea415c0601560a56276fef6d691cef6d110758bb26c42
- Sigstore transparency entry: 704646574
- Sigstore integration time: Nov 17, 2025
Source repository:
- Permalink: kentstone84/PyTorch-2.10.0a0-for-Linux-@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/kentstone84
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Trigger Event: workflow_dispatch

File details

Details for the file stone_linux-2.10.0a0-py3-none-any.whl.

File metadata

Download URL: stone_linux-2.10.0a0-py3-none-any.whl
Upload date: Nov 17, 2025
Size: 22.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stone_linux-2.10.0a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a74818174688fca7fb9fa7ba419c4d06e418c1271d7123d704e58d3d1378d7ce`
MD5	`00a1990e02dfff2e1ead780995846f8d`
BLAKE2b-256	`2f29bf9f107a2faa380745bfd995444183cae2354553ce10560b2f6adf4e4f48`

Provenance

The following attestation bundles were made for stone_linux-2.10.0a0-py3-none-any.whl:

Publisher: publish-pypi.yml on kentstone84/PyTorch-2.10.0a0-for-Linux-

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stone_linux-2.10.0a0-py3-none-any.whl
- Subject digest: a74818174688fca7fb9fa7ba419c4d06e418c1271d7123d704e58d3d1378d7ce
- Sigstore transparency entry: 704646583
- Sigstore integration time: Nov 17, 2025
Source repository:
- Permalink: kentstone84/PyTorch-2.10.0a0-for-Linux-@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Branch / Tag: refs/heads/main
- Owner: https://github.com/kentstone84
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@8939373fac21c36b8c0a74c6e6c78e1f1cd80b5d
- Trigger Event: workflow_dispatch
