An instance-level noise reduction framework based on Deep Learning gradients, agnostic to the network architecture.
DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement
DenoGrad is a novel, model-agnostic framework designed to reduce noise in both input features and target variables by leveraging the gradients of a pre-trained Deep Learning model.
In the Data-Centric AI paradigm, traditional denoising often compromises data integrity by aggressively smoothing features. DenoGrad resolves this by leveraging the semantic spectral bias of neural networks. Instead of requiring clean ground truth data, it freezes the weights of your predictive backbone and iteratively backpropagates error corrections into the input space, effectively shifting noisy instances toward the learned data manifold.
Key Capabilities
- Model-Agnostic: Works with any differentiable PyTorch model (MLP, LSTM, CNN, Transformers, TabPFN, etc.).
- No Clean Ground Truth Required: Operates via self-supervised input optimization on the noisy dataset itself.
- Dual Domain Support: Specialized handling for both Static Tabular data and Time-Series (via a Consensus Strategy).
- Manifold Preservation: Achieves state-of-the-art error reduction while maintaining high structural fidelity (minimal $D_{KL}$ and high feature correlation).
📦 Installation
DenoGrad is available on PyPI and can be installed via pip:
```bash
pip install denograd
```
Alternatively, you can install the latest version from the source:
```bash
git clone https://github.com/JJavier98/DenoGrad.git
cd DenoGrad
pip install -r requirements.txt
```
Requirements:
- Python >= 3.8
- PyTorch
- NumPy
- tqdm
🚀 Quick Start
DenoGrad integrates seamlessly into existing PyTorch pipelines. You simply need your noisy data and a model that has been trained (or partially trained) on it.
1. Static Tabular Data Example
```python
import torch
import torch.nn as nn
from denograd import DenoGrad

# 1. Define your model and data
# The model should be pre-trained on the noisy data (or a similar distribution)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 1)
)
criterion = nn.MSELoss()

# Assume X_noisy and y_noisy are your numpy arrays
# model.load_state_dict(...)

# 2. Initialize DenoGrad
denoiser = DenoGrad(model=model, criterion=criterion, device=torch.device('cuda'))

# 3. Fit and Transform
# nrr: Noise Reduction Rate (learning rate for the input)
# nr_threshold: Gating mechanism (don't correct if error < threshold)
X_clean, y_clean, grad_x, grad_y = denoiser.fit_transform(
    X=X_noisy,
    y=y_noisy,
    nrr=0.05,
    nr_threshold=0.01,
    max_epochs=100
)

print("Denoising complete!")
```
2. Time-Series Example (Consensus Strategy)
For time-series data, DenoGrad employs a Consensus Strategy. Since a single time step $t$ appears in multiple sliding windows, DenoGrad accumulates gradients from all contexts and averages them to ensure temporal consistency.
```python
# 1. Initialize DenoGrad with a recurrent model (e.g., LSTM)
denoiser = DenoGrad(model=lstm_model, criterion=criterion)

# 2. Fit and Transform with Time-Series parameters
X_clean, y_clean, _, _ = denoiser.fit_transform(
    X=X_ts_noisy,
    y=y_ts_noisy,
    is_ts=True,        # Enable Time-Series mode
    window_size=24,    # Size of the look-back window used by the model
    stride=1,
    future=1,          # Steps ahead the model predicts
    nrr=0.01,
    max_epochs=50
)
```
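The averaging described above can be sketched in a few lines. This is an illustrative standalone example, not the library's internal code; the names `grad_sum`, `grad_cnt`, and `window_grads` are hypothetical, and the random values stand in for per-window gradients.

```python
import numpy as np

# Sketch of the Consensus Strategy: each sliding window proposes a gradient
# for every time step it covers; the final correction for step t averages
# all proposals touching t.
T, window_size, stride = 10, 4, 1
grad_sum = np.zeros(T)
grad_cnt = np.zeros(T)

# Suppose window_grads[k] holds dL/dx for the window starting at position k
rng = np.random.default_rng(0)
starts = range(0, T - window_size + 1, stride)
window_grads = [rng.normal(size=window_size) for _ in starts]

for k, g in zip(starts, window_grads):
    grad_sum[k:k + window_size] += g   # accumulate proposals per time step
    grad_cnt[k:k + window_size] += 1   # count how many windows cover each step

consensus = grad_sum / grad_cnt        # averaged update direction per step
```

Interior steps are covered by up to `window_size` windows, while boundary steps appear in fewer, so the per-step count (rather than a constant) is what keeps the average consistent across the series.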
🧠 How It Works
Traditional training updates weights ($\theta$) to minimize loss. DenoGrad inverts this process: it freezes $\theta$ and updates the input ($x$).
$$x_{new} \leftarrow x - \eta \cdot \nabla_x \mathcal{L}(f_\theta(x), y)$$
- Input Optimization: The framework calculates the gradient of the loss with respect to the input features and targets.
- Gating Mechanism: To prevent over-smoothing, DenoGrad only updates instances where the prediction error exceeds a user-defined threshold $\tau$ (aleatoric margin).
- Joint Normalization: Gradients for features and targets are normalized jointly to ensure balanced corrections across dimensions.
- Consensus Strategy (Time-Series): For sequential data, gradients are accumulated across all sliding windows covering a time step $t$, and the final update is the average "consensus" direction.
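The update rule above can be reproduced in a toy setting where the input gradient is analytic. This is a minimal sketch under stated assumptions (a frozen linear model with squared error), not DenoGrad's implementation; all variable names here are illustrative.

```python
import numpy as np

# Toy version of x <- x - eta * grad_x L with frozen weights.
# For f(x) = x @ w and L = (f(x) - y)^2, the input gradient is analytic:
# grad_x L = 2 * (f(x) - y) * w.
rng = np.random.default_rng(42)
w = rng.normal(size=3)
w /= np.linalg.norm(w)            # unit norm keeps the toy steps stable
X = rng.normal(size=(5, 3))       # noisy inputs
y = X @ w + rng.normal(scale=0.5, size=5)  # noisy targets

eta, tau = 0.05, 0.01             # analogues of nrr and nr_threshold
for _ in range(100):
    err = X @ w - y                        # per-instance residual
    grad_x = 2 * err[:, None] * w[None, :] # gradient w.r.t. the inputs
    gate = (np.abs(err) > tau)[:, None]    # only correct high-error rows
    X = X - eta * gate * grad_x            # inputs move; w never changes
```

Note how the weights `w` are never updated: the residual of every gated instance shrinks geometrically until it falls within the tolerance $\tau$, at which point the gate freezes it, which is exactly the over-smoothing protection described above.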
🔧 API Reference
DenoGrad Class
__init__(model, criterion, device=None)
- model: The pre-trained PyTorch model (nn.Module).
- criterion: The loss function (e.g., nn.MSELoss).
- device: Computing device ('cpu' or 'cuda').
fit_transform(X, y, ...)
Configures the dataset strategy and executes the denoising loop.
General Parameters:
- X, y: Input data (NumPy array, Torch Tensor, or Pandas DataFrame).
- nrr (float, default=0.05): Noise Reduction Rate. Controls the step size of the correction ($\eta$).
- nr_threshold (float, default=0.01): Noise tolerance. Corrections are zeroed out if $|y_{pred} - y_{true}| \le \tau$.
- max_epochs (int): Maximum number of optimization iterations.
- denoise_y (bool, default=True): Whether to also refine the target variable.
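The interaction between a proposed correction and `nr_threshold` can be shown in isolation. This is a hypothetical standalone sketch with made-up values, not library code:

```python
import numpy as np

# The nr_threshold gate: a proposed correction is zeroed wherever the
# absolute prediction error is within tolerance tau.
tau = 0.01
y_pred = np.array([0.50, 0.505, 0.90])
y_true = np.array([0.50, 0.500, 0.70])
correction = np.array([0.1, 0.1, 0.1])   # proposed gradient steps

gate = np.abs(y_pred - y_true) > tau     # True only where error exceeds tau
applied = np.where(gate, correction, 0.0)
# Only the third instance (error 0.2 > tau) is corrected.
```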
Time-Series Specific Parameters:
- is_ts (bool): Set to True for sequence data.
- window_size (int): The input sequence length expected by the model.
- future (int): The forecasting horizon (default 1).
- flattening (bool): If True, flattens windows (useful for MLP backbones on TS data).
📄 Citation
If you use DenoGrad in your research, please cite our paper:
Paper under revision; citation details will be added upon publication.
👥 Acknowledgments
This work was supported by the University of Granada and the Andalusian Institute of Data Science and Computational Intelligence (DaSCI). It is part of the Project "Ethical, Responsible and General Purpose Artificial Intelligence" (IAFER) funded by the European Union Next Generation EU.
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.