
High-performance SDC detection and neural healing for billion-scale tensors.


# TorchQuery 🛡️



License: MIT

TorchQuery is a high-performance reliability engine for PyTorch. It provides a "Neural Shield" against Silent Data Corruption (SDC), hardware bit-flips, and numerical instability in large deep learning models.

## 🚀 Key Features

  • Billion-Scale Protection: Optimized streaming logic designed to handle tensors with $10^9$ elements without crashing.
  • Neural Healing: Automatically detects and repairs corrupted weights or neurons using statistical outlier detection ($\sigma$-clamping).
  • Distributed SyncBatch: Cluster-aware protection using All-Reduce to ensure safety across multi-GPU and multi-server environments.
  • Zero-Invasive: Simply wrap your existing tensors or model parameters; no architecture changes required.
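The first three features can be illustrated with a minimal sketch in plain PyTorch. This is not TorchQuery's API, which this page does not document. `sigma_bounds` and `sigma_clamp_` are hypothetical names, and the sketch assumes that σ-clamping means clipping values to mean ± k·σ. Statistics are accumulated in fixed-size chunks, so only one chunk is upcast to float64 at a time. If a `torch.distributed` process group is active, the per-rank count, sum, and sum of squares are combined with `all_reduce`, so every rank clamps against the same global bounds. A single `3e38` outlier would inflate σ enough to survive a one-pass clamp. For that reason the sketch re-estimates the statistics a few times, each time excluding values outside the previous bounds. This technique is iterative sigma-clipping, swapped in here as an assumption.

```python
import math

import torch
import torch.distributed as dist


def sigma_bounds(t: torch.Tensor, k: float = 3.0, chunk_size: int = 1 << 20,
                 iters: int = 3) -> tuple[float, float]:
    """Estimate clamp bounds mean +/- k*std over the finite entries of `t`,
    streaming over fixed-size chunks (iterative sigma-clipping)."""
    flat = t.detach().reshape(-1)
    lo, hi = -math.inf, math.inf
    for _ in range(iters):
        # Running [count, sum, sum of squares], kept in float64 for accuracy.
        stats = torch.zeros(3, dtype=torch.float64, device=flat.device)
        for start in range(0, flat.numel(), chunk_size):
            c = flat[start:start + chunk_size].double()
            # Skip NaN/Inf and anything outside the previous pass's bounds.
            c = c[torch.isfinite(c) & (c >= lo) & (c <= hi)]
            stats[0] += c.numel()
            stats[1] += c.sum()
            stats[2] += (c * c).sum()
        # Hypothetical "SyncBatch" step: sum the moments across all ranks.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(stats, op=dist.ReduceOp.SUM)
        n, s, ss = stats.tolist()
        if n == 0:
            raise ValueError("no finite values to estimate statistics from")
        mean = s / n
        std = math.sqrt(max(ss / n - mean * mean, 0.0))
        lo, hi = mean - k * std, mean + k * std
    return lo, hi


@torch.no_grad()
def sigma_clamp_(t: torch.Tensor, k: float = 3.0, chunk_size: int = 1 << 20,
                 iters: int = 3) -> torch.Tensor:
    """Heal `t` in place: NaN -> midpoint of the bounds, +/-Inf and finite
    outliers -> clamped into [lo, hi]. `t` must be contiguous."""
    if not t.is_contiguous():
        raise ValueError("sigma_clamp_ expects a contiguous tensor")
    lo, hi = sigma_bounds(t, k, chunk_size, iters)
    mid = 0.5 * (lo + hi)
    flat = t.view(-1)
    for start in range(0, flat.numel(), chunk_size):
        c = flat[start:start + chunk_size]  # a view, so edits reach `t`
        c.nan_to_num_(nan=mid, posinf=hi, neginf=lo)
        c.clamp_(min=lo, max=hi)
    return t
```

Chunking keeps the extra memory to a few chunk-sized temporaries. The cost is `iters + 1` passes over the data. On a sharded tensor, every rank must call `sigma_clamp_` with the same arguments, because `all_reduce` is a collective operation.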

## 📦 Installation

```bash
pip install torchquery
```

<p align="center">
  <img src="https://raw.githubusercontent.com/powerofaisinstudy-debug/torchquery/main/chl.png" width="600">
</p>

### Visualizing Silent Data Corruption (SDC)

Hardware glitches, such as cosmic-ray strikes or unstable overclocked VRAM, can cause random bit-flips. These can create extreme statistical outliers or `NaN` values in your tensor data.
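The following sketch shows why a single flipped bit is enough. It uses plain Python `struct`, not TorchQuery code, and `flip_bit` is a hypothetical helper. The helper flips one bit of a value stored as float32. Flipping the most significant exponent bit turns an ordinary weight into a value near the float32 maximum (about 3.4e38), into infinity, or into NaN, depending on the starting value.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x stored as float32 with one bit flipped (0 = lowest mantissa bit, 31 = sign)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


print(flip_bit(0.75, 30))  # top exponent bit: a huge finite outlier, about 2.55e38
print(flip_bit(1.0, 30))   # inf
print(flip_bit(1.5, 30))   # nan
print(flip_bit(0.75, 0))   # lowest mantissa bit: changes 0.75 by only about 6e-8
```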


**TorchQuery** acts as a "Neural Shield" that sweeps your multidimensional tensors. It identifies values that can lead to exploding gradients (for example `3e38`, close to the float32 maximum of about `3.4e38`) or numerical instability (`NaN`), and "heals" them before they propagate.

**Pre-Sweep State:**
* `NaN` (Not a Number): Spreads through backpropagation and can corrupt the entire model.
* `3e38`: Causes exploding gradients, destroying training stability.

**Post-Sweep State:**
* Invalid values are healed, leaving behind only validated tensor values.
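A simple sweep from the pre-sweep state to the post-sweep state can be sketched in plain PyTorch. `sweep_parameters` and its `max_abs` cutoff are hypothetical and not part of TorchQuery. This sketch uses the crude repair of replacing NaN, ±Inf, and implausibly large entries with zero, then reports how many entries it repaired. It does not modify the model's architecture.

```python
import torch
from torch import nn


@torch.no_grad()
def sweep_parameters(model: nn.Module, max_abs: float = 1e4) -> int:
    """Replace non-finite or implausibly large parameter entries with 0.

    `max_abs` is an arbitrary illustrative cutoff; returns the number of
    entries repaired across all parameters."""
    repaired = 0
    for p in model.parameters():
        # NaN and +/-Inf fail isfinite; huge finite values fail the magnitude test.
        bad = ~torch.isfinite(p) | (p.abs() > max_abs)
        n_bad = int(bad.sum())
        if n_bad:
            p.masked_fill_(bad, 0.0)
            repaired += n_bad
    return repaired


# Usage: inject two corrupted weights into a small layer and sweep them.
model = nn.Linear(4, 2)
with torch.no_grad():
    model.weight[0, 0] = float("nan")
    model.weight[1, 3] = 3e38
print(sweep_parameters(model))  # 2
```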



Download files

Download the file for your platform.

Source Distribution

torchquery-2.1.2.tar.gz (3.6 kB)

Uploaded Source

Built Distribution


torchquery-2.1.2-py3-none-any.whl (3.2 kB)

Uploaded Python 3

File details

Details for the file torchquery-2.1.2.tar.gz.

File metadata

  • Download URL: torchquery-2.1.2.tar.gz
  • Upload date:
  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for torchquery-2.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03540862980d5f63b685f41840174d1d0131a1216ff508a0546297ddd6959c84 |
| MD5 | c2f003efd67e91b3be1fb74a04300d85 |
| BLAKE2b-256 | f04d1c5bac19795c1b8a4374f7e6642f01f5cca1b9d945ada3bb67e21434a7ac |


File details

Details for the file torchquery-2.1.2-py3-none-any.whl.

File metadata

  • Download URL: torchquery-2.1.2-py3-none-any.whl
  • Upload date:
  • Size: 3.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for torchquery-2.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75ef916f90a036cc779652c3d45f1fec9b1ebed1f3ef529543d9f1f66e401da9 |
| MD5 | 5d61a838f466daa0786cc70679ed5e23 |
| BLAKE2b-256 | 49edb70fbaf7f001e4d75e038e6da77f03c33ab05d4aa231d26a2e6796566045 |
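The published digests let you confirm that a downloaded archive is intact before you install it. Below is a minimal sketch using Python's standard `hashlib`. `sha256_of` and `verify_download` are hypothetical helpers. The expected values are the SHA256 digests listed above for version 2.1.2.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for torchquery 2.1.2.
EXPECTED_SHA256 = {
    "torchquery-2.1.2.tar.gz":
        "03540862980d5f63b685f41840174d1d0131a1216ff508a0546297ddd6959c84",
    "torchquery-2.1.2-py3-none-any.whl":
        "75ef916f90a036cc779652c3d45f1fec9b1ebed1f3ef529543d9f1f66e401da9",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so it is never read fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    if expected is None:
        raise KeyError(f"no published digest for {p.name}")
    return sha256_of(p) == expected
```

pip can enforce the same digests at install time. A requirements line such as `torchquery==2.1.2 --hash=sha256:<digest>` puts pip into hash-checking mode, in which it rejects any archive that does not match.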

