
An instance-level noise-reduction framework based on the gradients of a Deep Learning model, agnostic to the network architecture.

Project description

DenoGrad: A Model-Agnostic Framework for Gradient-Based Data Refinement


DenoGrad is a novel, model-agnostic framework designed to reduce noise in both input features and target variables by leveraging the gradients of a pre-trained Deep Learning model.

In the Data-Centric AI paradigm, traditional denoising often compromises data integrity by aggressively smoothing features. DenoGrad avoids this by exploiting the semantic spectral bias of neural networks: instead of requiring clean ground-truth data, it freezes the weights of your predictive backbone and iteratively backpropagates error corrections into the input space, shifting noisy instances toward the learned data manifold.

Key Capabilities

  • Model-Agnostic: Works with any differentiable PyTorch model (MLP, LSTM, CNN, Transformers, TabPFN, etc.).
  • No Clean Ground Truth Required: Operates via self-supervised input optimization on the noisy dataset itself.
  • Dual Domain Support: Specialized handling for both Static Tabular data and Time-Series (via a Consensus Strategy).
  • Manifold Preservation: Achieves state-of-the-art error reduction while maintaining high structural fidelity (minimal $D_{KL}$ and high feature correlation).

📦 Installation

DenoGrad is available on PyPI and can be installed via pip:

pip install denograd

Alternatively, you can install the latest version from the source:

git clone https://github.com/JJavier98/DenoGrad.git
cd DenoGrad
pip install -r requirements.txt

Requirements:

  • Python >= 3.8
  • PyTorch
  • NumPy
  • tqdm

🚀 Quick Start

DenoGrad integrates seamlessly into existing PyTorch pipelines. You simply need your noisy data and a model that has been trained (or partially trained) on it.

1. Static Tabular Data Example

import torch
import torch.nn as nn
from denograd import DenoGrad

# 1. Define your model and data
# The model should be pre-trained on the noisy data (or a similar distribution)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 1)
)
criterion = nn.MSELoss()

# Assume X_noisy and y_noisy are your numpy arrays
# model.load_state_dict(...) 

# 2. Initialize DenoGrad
denoiser = DenoGrad(model=model, criterion=criterion, device=torch.device('cuda'))

# 3. Fit and Transform
# nrr: Noise Reduction Rate (learning rate for the input)
# nr_threshold: Gating mechanism (don't correct if error < threshold)
X_clean, y_clean, grad_x, grad_y = denoiser.fit_transform(
    X=X_noisy, 
    y=y_noisy,
    nrr=0.05,           
    nr_threshold=0.01,  
    max_epochs=100
)

print("Denoising complete!")

2. Time-Series Example (Consensus Strategy)

For time-series data, DenoGrad employs a Consensus Strategy. Since a single time step $t$ appears in multiple sliding windows, DenoGrad accumulates gradients from all contexts and averages them to ensure temporal consistency.

# 1. Initialize DenoGrad with a recurrent model (e.g., LSTM)
denoiser = DenoGrad(model=lstm_model, criterion=criterion)

# 2. Fit and Transform with Time-Series parameters
X_clean, y_clean, _, _ = denoiser.fit_transform(
    X=X_ts_noisy, 
    y=y_ts_noisy,
    is_ts=True,          # Enable Time-Series mode
    window_size=24,      # Size of the look-back window used by the model
    stride=1,
    future=1,            # Steps ahead the model predicts
    nrr=0.01,
    max_epochs=50
)
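The consensus averaging behind `is_ts=True` can be sketched in a few lines of plain PyTorch. This is an illustrative reconstruction, not DenoGrad's internals: the toy linear forecaster, the stride of 1, and the update step are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A noisy series of length T and a toy one-step-ahead forecaster
# (stand-in for a pre-trained backbone).
T, window_size = 30, 5
series = torch.sin(torch.linspace(0, 6, T)) + 0.3 * torch.randn(T)
model = nn.Linear(window_size, 1)
criterion = nn.MSELoss()

# Accumulate input gradients from every sliding window covering each step t.
grad_sum = torch.zeros(T)
counts = torch.zeros(T)
for start in range(T - window_size):  # stride = 1
    x = series[start:start + window_size].clone().requires_grad_(True)
    target = series[start + window_size].unsqueeze(0)  # future = 1
    loss = criterion(model(x), target)
    loss.backward()
    grad_sum[start:start + window_size] += x.grad
    counts[start:start + window_size] += 1

# Consensus direction: average the gradients over all covering windows,
# then take one correction step (nrr = 0.05).
consensus = grad_sum / counts.clamp(min=1)
denoised = series - 0.05 * consensus

print(denoised.shape)  # torch.Size([30])
```

Because interior time steps are covered by up to `window_size` windows while boundary steps are covered by fewer, averaging by the per-step count keeps the correction magnitude comparable across the sequence.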

🧠 How It Works

Traditional training updates weights ($\theta$) to minimize loss. DenoGrad inverts this process: it freezes $\theta$ and updates the input ($x$).

$$x_{new} \leftarrow x - \eta \cdot \nabla_x \mathcal{L}(f_\theta(x), y)$$

  1. Input Optimization: The framework calculates the gradient of the loss with respect to the input features and targets.

  2. Gating Mechanism: To prevent over-smoothing, DenoGrad only updates instances where the prediction error exceeds a user-defined threshold $\tau$ (an aleatoric noise margin).

  3. Joint Normalization: Gradients for features and targets are normalized jointly to ensure balanced corrections across dimensions.

  4. Consensus Strategy (Time-Series): For sequential data, gradients are accumulated across all sliding windows covering a time step $t$, and the final update is the average "consensus" direction.
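Steps 1 and 2 above can be sketched in plain PyTorch. This is a minimal illustration under assumed shapes and hyperparameters, not DenoGrad's implementation; the joint normalization of step 3 is omitted for brevity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen pre-trained backbone: the weights (theta) stay fixed.
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
for p in model.parameters():
    p.requires_grad_(False)
criterion = nn.MSELoss(reduction="none")

x = torch.randn(16, 3, requires_grad=True)  # noisy inputs
y = torch.randn(16, 1)                      # noisy targets
nrr, tau = 0.05, 0.01  # step size (eta) and gating threshold

# Step 1: gradient of the loss with respect to the input.
pred = model(x)
per_instance_loss = criterion(pred, y).squeeze(1)
per_instance_loss.sum().backward()

# Step 2: gating -- only instances whose error exceeds tau are corrected.
gate = (torch.abs(pred - y).squeeze(1) > tau).float().unsqueeze(1)
with torch.no_grad():
    x_new = x - nrr * gate * x.grad  # x_new <- x - eta * grad_x L

print(x_new.shape)  # torch.Size([16, 3])
```

Repeating this update for several epochs moves the gated instances along the loss surface of the frozen model, which is what shifts them toward the learned manifold.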


🔧 API Reference

DenoGrad Class

__init__(model, criterion, device=None)

  • model: The pre-trained PyTorch model (nn.Module).
  • criterion: The loss function (e.g., nn.MSELoss).
  • device: The computing device ('cpu' or 'cuda').

fit_transform(X, y, ...)

Configures the dataset strategy and executes the denoising loop.

General Parameters:

  • X, y: Input data (Numpy array, Torch Tensor, or Pandas DataFrame).

  • nrr (float, default=0.05): Noise Reduction Rate. Controls the step size of the correction ($\eta$).

  • nr_threshold (float, default=0.01): Noise Tolerance. Corrections are zeroed out if $|y_{pred} - y_{true}| \le \tau$.

  • max_epochs (int): Maximum number of optimization iterations.

  • denoise_y (bool, default=True): Whether to also refine the target variable.

Time-Series Specific Parameters:

  • is_ts (bool): Set to True for sequence data.
  • window_size (int): The input sequence length expected by the model.
  • future (int): The forecasting horizon (default 1).
  • flattening (bool): If True, flattens each window into a single vector (useful for MLP backbones on time-series data).

📄 Citation

If you use DenoGrad in your research, please cite our paper:

Citation pending: the paper is currently under revision.

👥 Acknowledgments

This work was supported by the University of Granada and the Andalusian Institute of Data Science and Computational Intelligence (DaSCI). It is part of the Project "Ethical, Responsible and General Purpose Artificial Intelligence" (IAFER) funded by the European Union Next Generation EU.


📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


Source Distribution

denograd-1.0.2.tar.gz (22.4 kB)

Built Distribution

denograd-1.0.2-py3-none-any.whl (22.1 kB)

