High-performance SDC detection and neural healing for billion-scale tensors.

TorchQuery 🛡️

PyPI version License: MIT

TorchQuery is a high-performance reliability engine for PyTorch. It provides a "Neural Shield" against Silent Data Corruption (SDC), hardware bit-flips, and numerical instability in large deep learning models.

🚀 Key Features

  • Billion-Scale Protection: Streaming logic designed to process tensors with $10^9$ elements without exhausting memory.
  • Neural Healing: Automatically detects and repairs corrupted weights or neurons using statistical outlier detection ($\sigma$-clamping).
  • Distributed SyncBatch: Cluster-aware protection using All-Reduce to ensure safety across multi-GPU and multi-server environments.
  • Non-Invasive: Wrap your existing tensors or model parameters; no architecture changes are required.
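
The $\sigma$-clamping idea behind Neural Healing can be illustrated with a short sketch. This is not TorchQuery's implementation or API: `heal` and `n_sigma` are hypothetical names, and the sketch works on plain Python lists rather than tensors so that it runs without dependencies. It also swaps in a robust estimate of $\sigma$ based on the median absolute deviation (MAD), because a single 3e38 outlier would otherwise inflate the very mean and standard deviation used to detect it.

```python
import math
import statistics


def heal(values, n_sigma=6.0):
    """Return a healed copy of `values` (hypothetical helper, not TorchQuery's API).

    NaNs are replaced by the median of the finite values; everything else is
    clamped to median +/- n_sigma * sigma, where sigma is estimated from the
    median absolute deviation (an assumption of this sketch).
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("no finite values to estimate statistics from")

    center = statistics.median(finite)
    mad = statistics.median(abs(v - center) for v in finite)
    sigma = 1.4826 * mad  # MAD-to-sigma factor for normally distributed data
    lo, hi = center - n_sigma * sigma, center + n_sigma * sigma

    healed = []
    for v in values:
        if math.isnan(v):
            healed.append(center)            # NaN carries no usable magnitude
        else:
            healed.append(min(max(v, lo), hi))  # also clamps +/-inf into range
    return healed


weights = [0.1, -0.2, 0.05, 0.3, -0.1, float("nan"), 3e38, 0.0]
healed = heal(weights)
print(healed)
```

In this example the ordinary weights pass through unchanged, the NaN becomes the median, and the 3e38 entry is pulled back to the upper clamp bound. If more than half of the finite values are identical, the MAD is zero and every value collapses to the median, so a real implementation would need a fallback for that case.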

Visualizing Silent Data Corruption (SDC)

Hardware faults, such as cosmic-ray strikes or unstable VRAM overclocks, can cause random bit-flips. A flip in the exponent bits of a floating-point value can turn an ordinary number into a massive statistical outlier or a NaN in your tensor data.
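
To make the effect of a single bit-flip concrete, the following sketch flips one bit in the IEEE 754 float32 encoding of a value using only the Python standard library. `flip_bit` is a hypothetical helper written for this illustration and is not part of TorchQuery.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of x's float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


# Flipping the top exponent bit of 0.5 yields 2**127, roughly 1.7e38.
print(flip_bit(0.5, 30))

# The same flip applied to 1.5 sets every exponent bit with a non-zero
# mantissa, which is the encoding of NaN.
print(math.isnan(flip_bit(1.5, 30)))

# Flipping the lowest mantissa bit barely changes the value.
print(flip_bit(0.5, 0))
```

The position of the flipped bit determines the damage: low mantissa bits produce errors too small to notice, while high exponent bits produce the extreme values that a sweep is meant to catch.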

[Image Link to Image_5.png]

TorchQuery acts as a Neural Shield that sweeps your multidimensional arrays. It identifies values that destabilize training, such as magnitudes near the float32 maximum (around 3e38), which lead to exploding gradients, and NaN, which causes numerical instability, and "heals" them before they propagate.
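
A sweep of this kind can be sketched as a single streaming pass that reads the data in fixed-size chunks, so memory use stays bounded regardless of how many elements there are. The function name `sweep`, the `limit` threshold, and the `chunk_size` parameter are assumptions made for this illustration; the sketch uses plain Python iterables and does not reflect TorchQuery's actual interface.

```python
import math
from itertools import islice


def sweep(values, limit=1e30, chunk_size=4):
    """Scan an iterable chunk by chunk and count suspicious values.

    Counts NaNs and values whose magnitude exceeds `limit` (including +/-inf).
    Only one chunk is held in memory at a time.
    """
    it = iter(values)
    report = {"nan": 0, "too_large": 0, "scanned": 0}
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        report["scanned"] += len(chunk)
        report["nan"] += sum(math.isnan(v) for v in chunk)
        report["too_large"] += sum(
            abs(v) > limit for v in chunk if not math.isnan(v)
        )
    return report


data = [0.1, float("nan"), 3e38, -0.2, float("inf"), 0.0]
print(sweep(data))
```

Because the function only consumes an iterator, the same loop works on a generator that yields values from disk or from slices of a larger buffer.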

Pre-Sweep State:

  • NaN (Not a Number): Propagates through every operation it touches, so a single NaN can corrupt the entire model during backpropagation.
  • 3e38: Causes exploding gradients, destroying training stability.

Post-Sweep State:

  • Invalid values have been healed, leaving only validated tensor values.

Download files

Source Distribution

torchquery-2.2.0.tar.gz (3.5 kB view details)

Uploaded Source

Built Distribution

torchquery-2.2.0-py3-none-any.whl (3.2 kB view details)

Uploaded Python 3

File details

Details for the file torchquery-2.2.0.tar.gz.

File metadata

  • Download URL: torchquery-2.2.0.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for torchquery-2.2.0.tar.gz
Algorithm Hash digest
SHA256 ff4a01fe9210a980e63bdbd219657c81c6be6bcaecffe82e0edca9ad8f0e91e0
MD5 7277601b17199e076dd74c50d43cecc4
BLAKE2b-256 63b0e60638e4489101056d7ccffd2f6b52d232fb982b216c98ec97cf6d47b551

File details

Details for the file torchquery-2.2.0-py3-none-any.whl.

File metadata

  • Download URL: torchquery-2.2.0-py3-none-any.whl
  • Upload date:
  • Size: 3.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for torchquery-2.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eeb8e211bfa0f1dea52eccbb4968a28fa2db1cdd3369a6bc457f8c6551e5edd2
MD5 ca9b0c98089a23e4729c2a2e28d56549
BLAKE2b-256 8760f25f6f1e042d9dc4e814b05037a907e92d30bc56fd83016c4d5f75faf5f5
