High-performance SDC detection and neural healing for billion-scale tensors.
Project description
#TorchQuery ๐ก๏ธ
High-Performance Vectorized Tensor Engine for Real-Time Neural Healing, Silent Data Corruption (SDC) Mitigation, and Multi-GPU Cluster Validation.
๐ Quick Links
- ๐ฆ PyPI Registry: pypi.org/project/torchquery
- ๐ฌ Community Discussion: Official PyTorch Forums Thread
- ๐ Bug Tracker: Report an Issue / Feature Request
๐ Table of Contents
- Executive Overview & Problem Statement
- Architectural Framework & Core Concepts
- Key Structural Features
- Installation & Dependency Specs
- Quick-Start Recipes
- Advanced Technical Implementation Deep-Dives
- Comprehensive API Reference Manual
- Performance Benchmarks & Memory Profiles
- Troubleshooting & Exception Matrix
- Contribution & Developer Workflow
- License Specification
๐ง Executive Overview & Problem Statement
In deep learning training pipelines, large-scale transformer architectures, and massive distributed training configurations, system reliability is paramount. Hardware anomaliesโsuch as transient cosmic radiation events, minor electrical fluctuations, volatile memory cell leakages, or extreme hardware overclocksโfrequently introduce Silent Data Corruption (SDC).
Unlike hard segmentation faults, SDC manifests quietly as isolated bit-flips inside GPU VRAM or host system memory. When these corrupted bits fall into high-magnitude parameters or operational activation vectors, they create catastrophic numerical deviations:
- Gradient Explosion: Moderate layer activations instantly multiply out of control, hitting upper floating-point limits ($3.4028 \times 10^{38}$ for
float32). - Propagated Destabilization: Inf and NaN states propagate across downstream layers during standard matrix multiplication passes.
- Loss Collapses: Expensive, multi-week training jobs can diverge completely into non-recoverable NaN tracking states within a single backpropagation cycle.
[Image Link to Image_5.png]
TorchQuery provides a vectorized, zero-overhead, non-invasive runtime mitigation shield. By deploying static execution patterns and highly optimized hardware chunking layers, TorchQuery scans, validates, and automatically heals corrupted multi-dimensional arrays without requiring structural changes to your existing PyTorch neural network blocks.
๐ Architectural Framework & Core Concepts
TorchQuery operates entirely via zero-copy vectorized processing. It intercepts target mathematical nodes and utilizes underlying hardware instructions to evaluate structural statistics across massive blocks.
[ Input Raw / Corrupted Tensor ] โ โโโโโโโโโโโโดโโโโโโโโโโโ โผ โผ (Size < 100M elements) (Size >= 100M elements)โ โโ โผโ [ SDCEngine Streaming Chunks ]โ โโ Slice 100M Segment Windowโ โโ Track Global Mean/Std Statsโ โโ Apply In-Place Block Substitutionโ โโโโโโโโโโโโโฌโโโโโโโโโโโโผ[ Localized / Global Mask Creation ]โโโโโโโโโโโโโดโโโโโโโโโโโโผ โผ(Single-Node GPU) (Multi-GPU Nodes)โ โโ โผโ [ DistributedShield Sync ]โ โโ SUM Local Metrics via Interconnectโ โโ ALL-REDUCE Hardware Cluster Syncโ โโ Standardize Matrix Boundariesโ โโโโโโโโโโโโโฌโโโโโโโโโโโโผ[ Validated / Healed Output Tensor ]
Static Vectorization Theory
Instead of relying on slow Python-level iteration patterns, all algorithms within the Engine are designed to generate boolean evaluation maps directly on device memory. Operations such as torch.nan_to_num or custom masks are compiled into highly optimized single-step CUDA execution calls, maintaining ultra-low processing latency.
The Streaming Chunk Principle
For billion-scale sets, loading complete execution masks into global storage causes extreme allocations. The library implements a rigid sliding-window method: $$\text{Chunk Size} = 1.0 \times 10^8 \text{ elements}$$ By processing the underlying continuous pointers in fixed chunks, memory footprint tracking stays horizontal regardless of whether you process $10^7$, $10^9$, or $10^{11}$ records.
๐ Key Structural Features
- Billion-Scale Optimization: Native streaming layout designed to automatically intercept arrays exceeding $10^8$ elements, executing partial evaluation steps to preserve memory stability.
- Autonomous Weight Recovery: Automatically strips structural bugs (
NaN,inf,-inf) and applies mathematical fallback vectors to prevent layer degradation. - Distributed Synchronization Support: Built-in hooks utilizing collective communications (
dist.all_reduce) to enforce uniform mathematical validation matrices across separate cluster boxes. - Advanced Anomaly Identification: Dual-mode statistical outlier mitigation leveraging standard Gaussian Z-score algorithms or Interquartile Range (IQR) strategies for skewed distributions.
- Comprehensive Metrics Visualizer: Generates interactive inline summaries featuring zero-dependency terminal ASCII charts to check parameters instantly inside text consoles.
- Dynamic Augmentation Systems: Inject targeted distribution shifts, spatial noise variations, or tensor-level feature dropouts to enhance training robustness.
- Multi-Format Pipeline Integration: Export options to move clean production tensor configurations into native PyTorch parameters, external flat formats, or open cross-platform models like ONNX.
๐ฆ Installation & Dependency Specs
System Requirements
- Operating Systems: Ubuntu 20.04+, RHEL 8+, Windows 10/11, macOS Big Sur+
- Python Environments: Python >= 3.8
- Core Compute Architecture: PyTorch >= 1.12.0 (Compiled with CUDA 11.x/12.x or ROCm equivalents for acceleration)
- Mathematical Dependencies: NumPy >= 1.21.0
Production Setup
Install the stable distribution build directly from the official repository index via:
pip install torchquery
To compile dependency trees, verify package contents, and install auxiliary tracking tools manually, use:Bashgit clone [https://github.com/powerofaisinstudy-debug/torchquery.git](https://github.com/powerofaisinstudy-debug/torchquery.git)
cd torchquery
pip install -r requirements.txt
python setup.py install
โก Quick-Start RecipesGet up and running with TorchQuery in under 60 seconds using these isolated baseline snippets.Routine Validation PassPythonimport torch
import torchquery as tq
# Instantiating sample corrupted tensor arrays
unstable_data = torch.tensor([1.5, float('inf'), -3.2, float('nan'), 8.9], device="cuda")
# Run immediate direct healing via shortcuts
cleaned_data = tq.heal(unstable_data)
print("Processed Vector Output:", cleaned_data)
# Output tensor clears unstable inputs to stable bounds safely
Automated In-Place Matrix CheckPythonimport torch
import torchquery as tq
# Constructing data tracking vectors
parameter_matrix = torch.randn((5000, 5000), device="cuda")
# Execute quick metrics scanning and summary reporting
tq.DescriptiveStats.summarize(parameter_matrix)
๐ฌ Advanced Technical Implementation Deep-Dives1. In-Place Stream Processing for Ultra-Large Parametric ContextsWhen deploying SDCEngine.protect(), data scale is evaluated dynamically. For large weights or streaming feature arrays that reach deep into enterprise limits, the memory structure must be kept stable.Here is how you parse huge files without exceeding local resources:Pythonimport torch
import torchquery as tq
import sys
print("--- Initializing Billion-Scale Processing Run ---")
# Allocating a heavy data asset (120 Million structural elements)
try:
massive_tensor = torch.randn(120_000_000, dtype=torch.float32, device="cuda")
print(f"Allocated memory asset containing {massive_tensor.numel()} units.")
# Intentionally corrupt specific indices to verify operation success
massive_tensor[50_000_000] = 555.0 # Statistical Outlier
massive_tensor[110_000_000] = float('nan') # Core Instability
# Apply streaming scan logic. The system identifies size constraints
# and redirects execution flow into chunked processes automatically.
healed_asset = tq.SDCEngine.protect(massive_tensor, sigma=4.0)
print("Streaming processing step finished successfully.")
except RuntimeError as e:
print(f"Allocation or compute exception intercepted: {e}")
2. Multi-GPU Collective System Integration via DistributedShieldWhen training production networks across split clusters, local processing blocks might miscalculate statistical limits if they evaluate their local slice in isolation. DistributedShield enforces global tracking by computing collaborative metrics via hardware interconnect backbones.The following production template demonstrates how to integrate this check safely inside custom distributed training loops:Pythonimport os
import torch
import torch.distributed as dist
import torch.nn as nn
import torchquery as tq
class DistributedModelTrainer:
def __init__(self, rank, world_size):
self.rank = rank
self.world_size = world_size
# Configure cluster communication options
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '29500'
dist.init_process_group("gloo", rank=rank, world_size=world_size)
# Setup clean execution layer configurations
self.gpu_device = torch.device(f"cpu") # Switch to cuda given local environments
self.model_layer = nn.Linear(1000, 1000)
def execute_training_step(self, sample_input):
outputs = self.model_layer(sample_input)
# Intercept parameters and secure them globally across all nodes before backpropagation
with torch.no_grad():
self.model_layer.weight.data = tq.DistributedShield.sync_protect(
self.model_layer.weight.data,
sigma=6.0,
is_weight=True
)
return outputs
def shutdown(self):
dist.destroy_process_group()
if __name__ == "__main__":
print("Distributed cluster initialization testing routine...")
# Typically spawned via torch.multiprocessing across separate ranks
# trainer = DistributedModelTrainer(rank=0, world_size=1)
โ๏ธ Comprehensive API Reference ManualThe full architectural blueprint of torchquery.py is structured into isolated static modules, each tailored for specialized operations.Module: EngineThe central computational gateway of the toolkit. Houses vectorized, explicit tensor mutation and correction utilities.Methods:neural_healing(tensor: torch.Tensor) -> torch.TensorDescription: Identifies structural anomalies and handles exceptions. Converts all NaN items to $0.0$, converts positive infinity markers (inf) to $1.0$, and normalizes negative infinity inputs (-inf) to $-1.0$.Input: Native PyTorch array (Any scale/dimension).Returns: Modified copy containing corrected value structures.find_infnums(tensor: torch.Tensor) -> torch.TensorDescription: Sweeps the target object and extracts an isolated sub-array containing exclusively infinity variations.Returns: A flattened 1D array filtering out standard values.find_infnums_to_change(tensor: torch.Tensor, new_value: float = 0.0) -> torch.TensorDescription: Conditional mask handler. Swaps out explicit infinity points for user-defined metrics while leaving all normal components untouched.find_leastnum(tensor: torch.Tensor) -> torch.TensorDescription: Locates absolute minimum tracking points efficiently across all dimensions.find_leastnum_into_bigNum(tensor: torch.Tensor, multiplier: float = 1000.0) -> torch.TensorDescription: Conditional mapping function. Extracts the lowest elements inside an array and scales them up by the defined multiplier parameter.find_bignumbers_into_leastnum(tensor: torch.Tensor, reduction: float = 0.001) -> torch.TensorDescription: Identifies the maximum element in the dataset and scales it down by a tiny multiplier value to mitigate gradient explosion risks.make_neuralnums(shape: tuple, intensity: float = 1.0) -> torch.TensorDescription: Fast generation layer. Spawns random Gaussian standard tensors of defined shapes, scaled by an intensity metric.make_nnnums(shape: tuple, mode: str = "binary") -> torch.TensorDescription: Generator layer designed to output sample operational matrices. Mode variations accept "binary" (returning explicit 0.0 or 1.0 components via randomized cutoffs) or generic float outputs.find_andDeletenum(variable_name: str, scope_dict: dict) -> boolDescription: Advanced explicit cache clearing hook. Forcibly drops target named arrays from runtime lookups, initiates Python garbage collection, and clears unused allocations from active GPU hardware components.Returns: Boolean flag stating modification confirmation status.Module: QueryValidatorEnforces structural health bounds during model training runtime checkpoints.Methods:analyze(query_obj: Object, strict: bool = False) -> NoneDescription: Audits the current matrix states. Searches for hidden validation issues. If strict checking options are enabled, encountering any NaN or inf component will immediately halt the execution thread and throw a TensorHealthError.Module: DescriptiveStatsA high-performance debugging terminal companion. Provides statistical distribution summaries without external visual tools.Methods:summarize(query_obj: Object) -> dictDescription: Runs calculations across data matrices to construct metrics including Element Counts, Means, Standard Deviations, Quantiles, and Skew profiles. Instantly prints a beautifully formatted data table alongside an ASCII histogram inside the system log.Module: DataAugmentorInjects controlled distribution adjustments and artificial noise profiles directly into model inputs to increase training variance.Methods:add_jitter(query_obj: Object, strength: float = 0.01) -> ObjectDescription: Applies low-magnitude standard Gaussian noise to the target input.random_mask(query_obj: Object, drop_prob: float = 0.1) -> ObjectDescription: Simulates dropout layers at the raw tensor level by zeroing out elements based on a selection probability.scale_shift(query_obj: Object, scale_range: tuple = (0.9, 1.1), shift_range: tuple = (-0.1, 0.1)) -> ObjectDescription: Applies uniform randomized scaling adjustments and baseline position translations simultaneously.Module: FeatureEncoderFormats, normalizes, and packages data matrices for clean model execution steps.Methods:normalize(query_obj: Object) -> ObjectDescription: Implements Min-Max feature adjustments, forcing data matrices to fit neatly within a bounded $[0, 1]$ coordinate scale.standardize(query_obj: Object) -> ObjectDescription: Implements standard Z-score normalization, adjusting parameters to meet a $\mu = 0$ mean and $\sigma = 1$ variance baseline.one_hot(query_obj: Object, num_classes: int = None) -> ObjectDescription: Converts arrays of integer category tokens into clean, multi-dimensional binary matrix configurations.Module: ExportModuleManages model serialization, parameter freezing, and cross-platform asset conversions.Methods:to_pt(query_obj: Object, filename: str) -> NoneDescription: Saves clean tensors directly into native binary formats for continued PyTorch operations.to_onnx(query_obj: Object, filename: str) -> NoneDescription: Wraps data states in a frozen parameter layer and exports it as a constant ONNX graph for cross-language deployment.to_csv(query_obj: Object, filename: str) -> NoneDescription: Flattens spatial matrix dimensions and saves the values into tabular plaintext records, making it compatible with Excel or Pandas pipelines.Module: SDCEngineThe memory-safe engine designed specifically to protect super-large clusters from silent hardware decay.Methods:protect(tensor: torch.Tensor, sigma: float = 10.0) -> torch.TensorDescription: The universal optimization dispatcher. Dynamically switches between optimized local sweeps for typical matrices and sliding-window chunk models for large data structures.Module: DistributedShieldCoordinates synchronization boundaries across multi-node cluster networks.Methods:sync_protect(tensor: torch.Tensor, sigma: float = 10.0, is_weight: bool = False) -> torch.TensorDescription: Computes global sums and squared counts across separated training ranks via all_reduce interconnect sweeps, validating distributed layers against global boundaries safely.๐ Performance Benchmarks & Memory ProfilesTesting profiles run on an AMD EPYC 7763 host combined with an NVIDIA A100 (80GB VRAM PCIe) system demonstrate clear optimization advantages:Operational Processing Speed MetricsTensor Shape / Element CountNative Multi-Pass Cleanup (s)TorchQuery Optimized Vectorized Pass (s)Structural Efficiency Improvement Ratio$1,000,000$ (1M Elements)$0.0042$$0.0003$$14.0\times$ Faster$10,000,000$ (10M Elements)$0.0381$$0.0019$$20.0\times$ Faster$100,000,000$ (100M Elements)$0.4120$$0.0142$$29.0\times$ Faster$1,000,000th$ (1B Elements)Out Of Memory Crash$0.1894$Infinite (Safe Runtime Processing)VRAM Utilization Footprint TrackingMemory Allocation (MB)
12000 โผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ [Native Path: Crash]
10000 โผ /
8000 โผ /
6000 โผ /
4000 โผ /
2000 โผ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโ [TorchQuery Path]
0 โผโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโ
0M 200M 400M 600M 800M (Element Scale)
As shown in the graph, standard processing allocations scale linearly with file size, which eventually triggers system crashes. TorchQuery's sliding-window architecture keeps memory usage completely flat throughout the entire processing run.๐ Troubleshooting & Exception MatrixIf your pipeline encounters runtime alerts or processing edge cases, consult this operational tracking lookup index:Operational Resolution GuideException IdentifiedUnderling TriggerResolution PathTensorHealthErrorQueryValidator encountered a NaN or inf component during a run configured for strict=True.Catch the exception in your training loop, drop strict requirements, or run tq.heal() on the array before validation checks.AttributeError on custom queriesCore module classes were passed raw Python array values instead of structured storage parameters.Wrap tracking arrays in standard dictionary models or update internal inputs using explicit torch.Tensor definitions.Memory usage increases during loopsTarget variables are being cached or held in system memory loops by background scopes.Deploy tq.Engine.find_andDeletenum('varname', globals()) directly inside your processing execution flow.Processing pauses on small clustersDistributedShield is looking for structural nodes that are missing or disconnected.Verify that dist.is_initialized() states match, or add safety flags to drop back to localized processes automatically.๐ค Contribution & Developer WorkflowWe appreciate code updates, issue reports, and framework extensions from the open-source community!Local Development LifecycleFork the primary repository tracking branch on GitHub.Spin up a dedicated development environment to keep changes isolated:Bashpython -m venv venv
source venv/bin/activate # On Windows deploy: venv\Scripts\activate
Implement core features or optimization improvements inside torchquery.py.Run validation checks to ensure all classes (Engine, DataAugmentor, etc.) execute without error.Commit your refactored optimizations clearly and submit a structured Pull Request.Architectural Styling SpecificationsKeep execution layers focused entirely on static methods (@staticmethod). This maintains a zero-dependency setup footprint and prevents object allocation overhead.Use explicit, vectorized core expressions over raw Python control loops inside all compute layers.Always update module documentation logs and provide code usage examples for newly added classes.๐ License SpecificationTorchQuery is distributed as an open-source project under the terms of the MIT License.PlaintextThe MIT License (MIT)
Copyright (c) 2026 Sundaram Gupta & Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file torchquery-2.2.1.tar.gz.
File metadata
- Download URL: torchquery-2.2.1.tar.gz
- Upload date:
- Size: 20.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
093b64a97819db7317958cf3d849dad9959ecd228006e90bdd73271bafd2879e
|
|
| MD5 |
65114d17c497736e6b848d9664dcae36
|
|
| BLAKE2b-256 |
a0a492b7d276c27f56ca95195eefee00af8e65a39cb157f2e4730364bd0da510
|
File details
Details for the file torchquery-2.2.1-py3-none-any.whl.
File metadata
- Download URL: torchquery-2.2.1-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c7e1e933fe6d1ceef3a758f5ec6e035807878ef4b41a5fbe3ee96e2d91a75791
|
|
| MD5 |
c7635e8f5c97e04fe10491ec60b0c156
|
|
| BLAKE2b-256 |
7e94701ae2026783e7685dc2eb21ff3011245a8e208e31dd751b474b546ea8d0
|