DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement
DenoGrad is a novel, model-agnostic framework for gradient-based data refinement that leverages the representational knowledge and spectral bias of deep neural networks to correct corrupted observations. It operates within the Data-Centric AI paradigm, where the focus shifts from improving models to improving data.
Unlike supervised denoising approaches that require clean ground truth, DenoGrad performs input optimization: it freezes the weights of a pre-trained backbone model and iteratively backpropagates error corrections directly into the input space, guiding noisy samples toward regions consistent with the learned data manifold.
Paper: "DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement"
Authors: J. Javier Alonso-Ramos, Ignacio Aguilera-Martos, Andrés Herrera-Poyatos, Francisco Herrera (University of Granada & DaSCI Institute)
Key Features
- Model-Agnostic: Works with any differentiable PyTorch backbone (MLP, LSTM, xLSTM, CNN, CNN-LSTM, Transformers, TabPFN, DLinear, etc.).
- No Clean Ground Truth Required: Self-supervised input optimization on the noisy dataset itself.
- Dual Domain Support: Specialized handling for both Static Tabular data and Time-Series forecasting (via a Consensus Strategy).
- Joint Feature-Target Optimization: Simultaneously refines input features $X$ and continuous targets $Y$ using jointly normalized gradients.
- Manifold Preservation: Achieves state-of-the-art error reduction while maintaining the highest structural fidelity, evidenced by minimal Sliced Wasserstein Distance (SWD) and maximal feature correlation consistency ($\bar{\rho}$).
- Dataset-Level Regularizer: Yields predictive improvements even on nominally clean datasets by mitigating latent aleatory noise.
Installation
DenoGrad is available on PyPI:
```bash
pip install denograd
```
Or install the latest version from source:
```bash
git clone https://github.com/ari-dasci/S-noise-gradient.git
cd S-noise-gradient
pip install .
```
Requirements: Python >= 3.6, PyTorch, NumPy, tqdm
Quick Start
DenoGrad integrates seamlessly into existing PyTorch pipelines. You need your (noisy) data and a model that has been trained on it.
Static Tabular Data
```python
import torch
import torch.nn as nn
from denograd import DenoGrad

# 1. Define and train your model on the noisy data
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1)
)
criterion = nn.MSELoss()
# ... train the model on X_noisy, y_noisy ...

# 2. Initialize DenoGrad (reuses the trained backbone)
denoiser = DenoGrad(model=model, criterion=criterion, device=torch.device('cuda'))

# 3. Fit and Transform
X_clean, y_clean, grad_x, grad_y = denoiser.fit_transform(
    X=X_noisy,          # numpy array (n_samples, n_features)
    y=y_noisy,          # numpy array (n_samples,) or (n_samples, n_targets)
    nrr=0.05,           # Noise Reduction Rate (η)
    nr_threshold=0.01,  # Gating threshold (τ)
    max_epochs=200
)
```
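The training step elided in step 1 can be any standard supervised loop. A minimal sketch, assuming `X_noisy` and `y_noisy` are NumPy arrays with a single continuous target; the optimizer choice and epoch count here are illustrative, not prescribed by DenoGrad:

```python
# Illustrative training loop (assumption: any standard supervised loop works;
# Adam and 100 epochs are arbitrary choices, not DenoGrad requirements).
X_t = torch.as_tensor(X_noisy, dtype=torch.float32)
y_t = torch.as_tensor(y_noisy, dtype=torch.float32).reshape(-1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X_t), y_t)
    loss.backward()
    optimizer.step()
```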
Time-Series Forecasting (Consensus Strategy)
For time-series data, DenoGrad employs a Consensus Strategy. Since a single time step $t$ participates in multiple overlapping sliding windows, DenoGrad accumulates the gradient from every window context and averages them to produce a single, temporally consistent update.
```python
# 1. Initialize DenoGrad with a sequential model (e.g., LSTM)
denoiser = DenoGrad(model=lstm_model, criterion=nn.MSELoss())

# 2. Fit and Transform in Time-Series mode
X_clean, y_clean, _, _ = denoiser.fit_transform(
    X=X_ts_noisy,     # numpy array (total_timesteps, n_features)
    y=y_ts_noisy,     # numpy array (total_timesteps,)
    is_ts=True,       # Enable Time-Series mode
    window_size=24,   # Sliding window size (look-back period)
    future=1,         # Steps ahead the model predicts
    stride=1,         # Window stride
    nrr=0.01,
    nr_threshold=0.1,
    max_epochs=200
)
```
Pandas DataFrame Support
```python
import pandas as pd

df = pd.DataFrame({"feat1": [...], "feat2": [...], "target": [...]})

X_clean, y_clean, _, _ = denoiser.fit_transform(
    X=df,
    y="target",  # Column name(s) to use as target
    nrr=0.05,
    max_epochs=100
)
```
How It Works
In standard training, gradients update model weights $\theta$ to minimize loss. DenoGrad inverts this: it freezes $\theta$ and treats the data instances themselves as the trainable parameters.
Core Update Rule
$$x' = x - \eta \cdot \frac{g_x}{\|[g_x, g_y]\|_2} \cdot \mathbb{I}_{\text{noisy}}, \qquad y' = y - \eta \cdot \frac{g_y}{\|[g_x, g_y]\|_2} \cdot \mathbb{I}_{\text{noisy}}$$
where $g_x = \nabla_x \mathcal{L}(f_\theta(x), y)$, $g_y = \nabla_y \mathcal{L}(f_\theta(x), y)$, and $\mathbb{I}_{\text{noisy}}$ is a binary gating mask.
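To make the rule concrete, a single refinement step can be sketched in a few lines of PyTorch. This is a minimal illustration of the formulas above, not the package's internals; it assumes a single continuous target shaped `(n_samples, 1)` and a scalar-reducing criterion, and the names `denoise_step`, `eta`, and `tau` are our own:

```python
import torch

def denoise_step(model, criterion, x, y, eta=0.05, tau=0.01):
    """One gradient-based refinement step on a batch (illustrative sketch)."""
    x = x.clone().requires_grad_(True)
    y = y.clone().requires_grad_(True)
    pred = model(x)                          # theta stays frozen throughout
    loss = criterion(pred, y)
    g_x, g_y = torch.autograd.grad(loss, [x, y])
    # Joint normalization: one L2 norm over the concatenated [g_x, g_y]
    joint_norm = torch.linalg.vector_norm(torch.cat([g_x.flatten(), g_y.flatten()]))
    # Gating mask I_noisy: only instances with error > tau receive a correction
    mask = ((pred - y).abs() > tau).float()
    x_new = x.detach() - eta * (g_x / joint_norm) * mask
    y_new = y.detach() - eta * (g_y / joint_norm) * mask
    return x_new, y_new
```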
Algorithm Components
- Input Optimization: Compute the gradient of the loss $\mathcal{L}$ with respect to the input features $X$ and targets $Y$ via backpropagation through the frozen model.
- Gating Mechanism: A threshold $\tau$ controls noise tolerance. Gradients are zeroed for any instance where $|f_\theta(x) - y| \leq \tau$, preserving high-confidence samples and preventing over-smoothing. This retained stochasticity acts as implicit regularization.
- Joint Normalization: Gradients for $X$ and $Y$ are concatenated and normalized by their joint $L_2$ norm. This ensures balanced corrections across all dimensions regardless of their scale.
- Consensus Strategy (Time-Series): For sequential data, gradient contributions from all overlapping windows covering time step $t$ are accumulated into global buffers $G_t$ with visit counters $C_t$. The final update is the averaged consensus direction (sketched below):
$$x_t^{\text{new}} = x_t^{\text{old}} - \eta \cdot \frac{G_t}{C_t}$$
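A minimal NumPy sketch of this accumulation (illustrative, not the package's internals): `window_grads` holds the loss gradient with respect to each window's inputs, shaped `(window_size, n_features)`, and `window_starts` the corresponding start indices.

```python
import numpy as np

def consensus_update(x_series, window_grads, window_starts, window_size, eta=0.01):
    """Average overlapping per-window gradients into one update per time step."""
    G = np.zeros_like(x_series)        # accumulated gradient buffer G_t
    C = np.zeros((len(x_series), 1))   # visit counters C_t
    for g_w, start in zip(window_grads, window_starts):
        G[start:start + window_size] += g_w   # every covering window contributes
        C[start:start + window_size] += 1
    return x_series - eta * G / np.maximum(C, 1)  # averaged consensus direction
```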
Theoretical Foundation: Spectral Bias
DenoGrad exploits the well-documented spectral bias of neural networks: DNNs inherently prioritize learning low-frequency patterns (the true signal) over high-frequency variations (noise) during SGD training. Even when trained on noisy data, a sufficiently regularized model captures the underlying data manifold. The gradients derived from this model therefore direct noisy instances toward this learned manifold.
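The effect is easy to observe on synthetic data. The following is an illustrative experiment (not from the paper): a small MLP trained briefly on a noisy sine ends up closer to the clean low-frequency signal than the noisy targets it was fit to.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(0, 2 * torch.pi, 200).unsqueeze(1)
signal = torch.sin(x)                        # low-frequency ground truth
y = signal + 0.3 * torch.randn_like(x)       # high-frequency corruption

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(300):                         # stop early: noise not yet memorized
    opt.zero_grad()
    nn.functional.mse_loss(net(x), y).backward()
    opt.step()

with torch.no_grad():
    # Predictions track the clean sine more closely than the noisy targets do
    print(f"MSE(pred, signal) = {nn.functional.mse_loss(net(x), signal).item():.4f}")
    print(f"MSE(y,    signal) = {nn.functional.mse_loss(y, signal).item():.4f}")
```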
API Reference
DenoGrad(model, criterion, device=None)
| Parameter | Type | Description |
|---|---|---|
| `model` | `nn.Module` | Pre-trained PyTorch model (weights will be frozen). |
| `criterion` | `nn.modules.loss._Loss` | Loss function (e.g., `nn.MSELoss()`). |
| `device` | `torch.device`, optional | Compute device. Auto-detects CUDA if available. |
The constructor automatically detects recurrent modules (RNN/LSTM/GRU) and switches to the appropriate mode; it also identifies CNN architectures for dimension handling.
.fit(X, y, is_ts=False, window_size=None, future=1, stride=1, flattening=False)
Configures the internal dataset strategy without running the denoising loop.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `X` | array / Tensor / DataFrame | — | Input features. |
| `y` | array / Tensor / str / list | — | Targets. If `X` is a DataFrame, can be column name(s). |
| `is_ts` | `bool` | `False` | Enable Time-Series mode. |
| `window_size` | `int` | `None` | Sliding window size (required if `is_ts=True`). |
| `future` | `int` | `1` | Forecasting horizon (steps ahead). |
| `stride` | `int` | `1` | Stride between consecutive windows. |
| `flattening` | `bool` | `False` | Flatten windows into 1D vectors (useful for MLPs on TS data). |
Returns self for method chaining.
.transform(nrr=0.05, nr_threshold=0.01, max_epochs=100, denoise_y=True, batch_size=1000, save_gradients=True)
Executes the denoising optimization loop.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `nrr` | `float` | `0.05` | Noise Reduction Rate ($\eta$). Step size for input corrections. |
| `nr_threshold` | `float` | `0.01` | Gating Threshold ($\tau$). Instances with error $\leq \tau$ are skipped. |
| `max_epochs` | `int` | `100` | Maximum optimization iterations. |
| `denoise_y` | `bool` | `True` | Whether to also refine the target variable $Y$. |
| `batch_size` | `int` | `1000` | Mini-batch size for the DataLoader. |
| `save_gradients` | `bool` | `True` | Store per-epoch gradients for analysis. |
Returns (X_denoised, y_denoised, grad_x_list, grad_y_list).
.fit_transform(X, y, ..., nrr=0.05, nr_threshold=0.01, max_epochs=100, ...)
Convenience method combining .fit() and .transform(). Accepts all parameters from both methods.
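Since `.fit()` returns `self`, the one-shot convenience call and explicit chaining are interchangeable; a quick sketch (with `X`, `y` as in the Quick Start):

```python
# These two forms are equivalent
X_clean, y_clean, gx, gy = denoiser.fit_transform(X, y, nrr=0.05, max_epochs=100)
X_clean, y_clean, gx, gy = denoiser.fit(X, y).transform(nrr=0.05, max_epochs=100)
```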
Hyperparameter Guidelines
Based on the empirical analysis in the paper:
| Parameter | Recommended Range | Notes |
|---|---|---|
| `nrr` ($\eta$) | 0.01 – 0.1 | Higher rates converge faster; peak performance within ~200 iterations. |
| `nr_threshold` ($\tau$) | 0.1 | Robust baseline. Can be increased for larger aleatory margins. |
| `max_epochs` | 100 – 500 | Conservative rates (e.g., 0.001) require roughly 10× more iterations without matching performance. |
Experimental Results
DenoGrad was evaluated on 10 real-world datasets (5 tabular, 5 time-series) against 7 state-of-the-art denoising baselines (DAE, DN-ResNet, PCA, WTD, EMD, KF, MA) using diverse downstream regressors (Ridge, kNN, XGBoost, DNN, TabPFN, LSTM, xLSTM, CNN-LSTM, DLinear).
Key Results (Friedman + Nemenyi test, $\alpha = 0.05$)
| Metric | DenoGrad Avg. Rank | Best Competitor |
|---|---|---|
| Predictive Improvement (Imp%) | 3.10 | KF (1.50) — but with severe manifold distortion |
| Sliced Wasserstein Distance (SWD ↓) | 1.70 | PCA (2.30) |
| Feature Correlation ($\bar{\rho}$ ↑) | 2.10 | DN-ResNet (1.90) |
DenoGrad uniquely occupies the optimal Pareto front: it achieves top-tier predictive gains while strictly preserving the topological integrity of the data. Methods that score higher in raw Imp% (e.g., KF at 98%+) do so at the cost of massive distributional distortion (SWD > 0.5, $\bar{\rho}$ < 0.3).
Highlights
- ECL dataset: 98.4% average improvement across all downstream models.
- Microsoft Stock: 97.6% improvement.
- Time-Series: The only method maintaining >90% improvement consistently across LSTM, xLSTM, CNN-LSTM, DLinear, and XGBoost.
Datasets Used
| Dataset | Type | Instances | Features |
|---|---|---|---|
| House Prices | Tabular | 21,436 | 19 |
| Lattice Physics | Tabular | 24,000 | 40 |
| Parkinsons | Tabular | 5,875 | 20 |
| RT-IoT 2022 | Tabular | 117,915 | 82 |
| Support2 | Tabular | 8,579 | 33 |
| Daily Climate | Time-Series | 1,576 | 4 |
| ECL | Time-Series | 6,000 | 320 |
| ETT | Time-Series | 17,420 | 7 |
| Microsoft Stock | Time-Series | 2,192 | 5 |
| WTH | Time-Series | 35,064 | 12 |
Citation
If you use DenoGrad in your research, please cite our paper:
```bibtex
@article{alonso2025denograd,
  title={DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement},
  author={Alonso-Ramos, J. Javier and Aguilera-Martos, Ignacio and Herrera-Poyatos, Andr{\'e}s and Herrera, Francisco},
  year={2025}
}
```
Acknowledgments
This work was supported by the University of Granada and the Andalusian Institute of Data Science and Computational Intelligence (DaSCI). It is part of the project "Ethical, Responsible and General Purpose Artificial Intelligence" (IAFER), funded by the European Union NextGenerationEU.
License
This project is licensed under the GNU Affero General Public License v3 — see the LICENSE file for details.