High-performance SDC detection and neural healing for billion-scale tensors.
TorchQuery 🛡️
TorchQuery is a high-performance reliability engine for PyTorch. It provides a "Neural Shield" against Silent Data Corruption (SDC), hardware bit-flips, and numerical instability in massive Deep Learning models.
🚀 Key Features
- Billion-Scale Protection: Optimized streaming logic designed to handle tensors with $10^9$ elements without crashing.
- Neural Healing: Automatically detects and repairs corrupted weights or neurons using statistical outlier detection ($\sigma$-clamping).
- Distributed SyncBatch: Cluster-aware protection using `All-Reduce` to ensure safety across multi-GPU and multi-server environments.
- Zero-Invasive: Simply wrap your existing tensors or model parameters; no architecture changes required.
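As a rough illustration of how an All-Reduce can make a corruption check cluster-aware, the sketch below reduces a per-rank "corrupted" flag with `MAX`, so every rank learns whether any rank saw a non-finite value. It uses only standard `torch.distributed` calls. The function name `any_rank_corrupted` is hypothetical and is not taken from TorchQuery's API, which is not shown here.

```python
import torch
import torch.distributed as dist


def any_rank_corrupted(tensor: torch.Tensor) -> bool:
    """Return True if any participating rank holds a NaN or Inf in `tensor`.

    Hypothetical helper for illustration; not part of TorchQuery's API.
    """
    # 1.0 if this rank sees a non-finite value, else 0.0
    flag = (~torch.isfinite(tensor)).any().to(torch.float32).reshape(1)
    # Reduce only when a process group exists; otherwise the check stays local.
    if dist.is_available() and dist.is_initialized():
        # MAX over ranks: the result is 1.0 if at least one rank set its flag
        dist.all_reduce(flag, op=dist.ReduceOp.MAX)
    return bool(flag.item() > 0)
```

Without an initialized process group the function degrades to a local check, so the same code path can be used in single-GPU runs.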
Visualizing Silent Data Corruption (SDC)
Hardware faults, such as those induced by cosmic rays or VRAM overclocking, can cause random bit-flips. These create massive statistical outliers or NaNs in your tensor data.
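To see why a single flipped bit matters, the following standard-library sketch flips one bit in the IEEE 754 float32 encoding of a value. The helper `flip_bit` is hypothetical and only illustrates the failure mode; it is not part of TorchQuery.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of a float32 and return the result."""
    # Reinterpret the float32 as a 32-bit unsigned integer
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles the chosen bit; reinterpret the result as float32 again
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5
print(flip_bit(weight, 30))  # top exponent bit: 0.5 becomes 2**127, about 1.7e38
print(flip_bit(weight, 0))   # lowest mantissa bit: change of about 6e-8
print(flip_bit(1.0, 30))     # exponent becomes all ones: inf
```

A flip in a high exponent bit turns an ordinary weight into a value near the float32 limit or into `inf`, while a flip in a low mantissa bit is nearly invisible.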
[Image Link to Image_5.png]
TorchQuery acts as a Neural Shield that sweeps your multidimensional arrays. It identifies values that can lead to exploding gradients (such as 3e38, close to the float32 maximum of about 3.4e38) or numerical instability (NaN), "healing" them before they propagate.
Pre-Sweep State:
- `NaN` (Not a Number): Corrupts the entire model during backpropagation.
- `3e38`: Causes exploding gradients, destroying training stability.
Post-Sweep State:
- Invalid values are repaired (see the $\sigma$-clamping under Key Features), leaving only validated tensor values.
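The pre-sweep and post-sweep states above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not TorchQuery's implementation: the name `sweep_`, the parameters `n_sigma` and `chunk_size`, and the choice of a per-chunk median with a MAD-based $\sigma$ estimate are all hypothetical. A robust estimate is used because an outlier like 3e38 would dominate an ordinary mean and standard deviation.

```python
import torch


@torch.no_grad()
def sweep_(tensor: torch.Tensor, n_sigma: float = 6.0, chunk_size: int = 1 << 20) -> int:
    """Heal a contiguous float tensor in place, chunk by chunk.

    Returns the number of values flagged as corrupted.
    Hypothetical sketch; not part of TorchQuery's API.
    """
    flagged = 0
    # view(-1) requires a contiguous tensor; split() yields views, so edits are in place
    for part in tensor.view(-1).split(chunk_size):
        finite_mask = torch.isfinite(part)
        finite = part[finite_mask]
        if finite.numel() == 0:
            # Nothing trustworthy in this chunk: fall back to zeros
            flagged += part.numel()
            part.zero_()
            continue
        center = finite.median().item()
        # MAD scaled by 1.4826 approximates the standard deviation for normal data
        sigma = 1.4826 * (finite - center).abs().median().item()
        bad = ~finite_mask
        if sigma > 0:
            lo, hi = center - n_sigma * sigma, center + n_sigma * sigma
            bad = bad | (part < lo) | (part > hi)
        flagged += int(bad.sum())
        # NaN/Inf are replaced by the chunk median; finite outliers are clamped
        part[~finite_mask] = center
        if sigma > 0:
            part.clamp_(min=lo, max=hi)
    return flagged


weights = torch.tensor([0.1, -0.2, 0.3, float("nan"), 3e38, -0.1, 0.2, 0.0])
print(sweep_(weights))  # 2 values flagged: the NaN and 3e38
print(torch.isfinite(weights).all())
```

Processing fixed-size chunks keeps the temporary memory bounded regardless of tensor size, at the cost of using local rather than global statistics. When the MAD is zero (for example, a mostly constant chunk), the sketch skips clamping and repairs only non-finite values.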
File details
Details for the file torchquery-2.2.0.tar.gz.
File metadata
- Download URL: torchquery-2.2.0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff4a01fe9210a980e63bdbd219657c81c6be6bcaecffe82e0edca9ad8f0e91e0 |
| MD5 | 7277601b17199e076dd74c50d43cecc4 |
| BLAKE2b-256 | 63b0e60638e4489101056d7ccffd2f6b52d232fb982b216c98ec97cf6d47b551 |
File details
Details for the file torchquery-2.2.0-py3-none-any.whl.
File metadata
- Download URL: torchquery-2.2.0-py3-none-any.whl
- Upload date:
- Size: 3.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eeb8e211bfa0f1dea52eccbb4968a28fa2db1cdd3369a6bc457f8c6551e5edd2 |
| MD5 | ca9b0c98089a23e4729c2a2e28d56549 |
| BLAKE2b-256 | 8760f25f6f1e042d9dc4e814b05037a907e92d30bc56fd83016c4d5f75faf5f5 |