SVD-based time series imputation with uncertainty estimation

SVD Time Series Imputer

A Python package for time series imputation using Singular Value Decomposition (SVD), with automatic rank estimation, uncertainty quantification, and a scikit-learn-style API.

Installation

Install from source (development version):

pip install -e .

Install with development dependencies:

pip install -e ".[dev]"

Quick Start

import pandas as pd
import numpy as np
from svd_imputer import Imputer

# Load your time series data (with datetime index)
df = pd.read_csv("your_data.csv", index_col=0, parse_dates=True)

# Simple imputation with automatic rank estimation
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# With uncertainty estimation  
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")

Note: The Imputer class uses a data-centric design where data is provided at initialization and preprocessed once. This ensures consistency across all analyses and eliminates redundant preprocessing operations.

Usage

from svd_imputer import Imputer

# Basic imputation (automatic rank estimation)
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# Cross-validation optimization
imputer = Imputer(data=df, rank="auto")
imputer.fit()
print(f"Optimized rank: {imputer.rank_}")

# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")

# Advanced: model diagnostics
residuals, stats = imputer.calculate_reconstruction_residuals(return_stats=True)
print(f"Reconstruction R²: {stats['r_squared']:.3f}")
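calculate_reconstruction_residuals() is listed only as a model diagnostic, and how it defines residuals and R² is not documented here. The following minimal sketch assumes residuals are observed minus reconstructed values on non-missing entries, and that R² is 1 - SS_res/SS_tot pooled over all of those entries. reconstruction_r_squared is a hypothetical helper, not part of svd_imputer.

```python
import numpy as np

def reconstruction_r_squared(observed, reconstructed):
    """Hypothetical helper: residuals and pooled R² of a reconstruction on observed entries."""
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    mask = ~np.isnan(observed)                      # score only entries that were actually observed
    residuals = observed[mask] - reconstructed[mask]
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((observed[mask] - observed[mask].mean()) ** 2)  # total sum of squares
    return residuals, 1.0 - ss_res / ss_tot

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
X_hat = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
residuals, r2 = reconstruction_r_squared(X, X_hat)
print(r2)  # 1.0: the reconstruction matches every observed entry
```

Pooling over all columns is a design choice of this sketch; a per-column R² averaged across series would be an equally plausible definition.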

Configuration

imputer = Imputer(
    data=df,                    # Input DataFrame (required)
    variance_threshold=0.95,    # Variance threshold for automatic rank estimation (rank=None)
    rank=None,                  # None (variance threshold), int (fixed), or "auto" (cross-validation)
    max_iters=500,              # Maximum SVD iterations
    tol=1e-4,                   # Convergence tolerance
    verbose=True,               # Enable logging output
)
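When rank=None, the rank is estimated from variance_threshold. The exact rule is not spelled out here; a common choice, assumed in this sketch, is the smallest rank whose squared singular values account for at least variance_threshold of their total. rank_from_variance is a hypothetical helper and expects a complete, standardized matrix.

```python
import numpy as np

def rank_from_variance(X, variance_threshold=0.95):
    """Hypothetical helper: smallest rank reaching `variance_threshold` of the variance.

    Assumes X is complete (no NaNs) and already standardized.
    """
    s = np.linalg.svd(X, compute_uv=False)          # singular values, largest first
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative fraction of variance
    # first index where the cumulative fraction reaches the threshold, as a 1-based rank
    rank = int(np.searchsorted(explained, variance_threshold)) + 1
    return min(rank, len(s))                        # guard against floating-point shortfall

print(rank_from_variance(np.diag([3.0, 2.0, 1.0]), 0.95))  # 3 (cumulative: 0.64, 0.93, 1.0)
```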

Examples

Complete examples are available in the examples/ directory:

  • basic_example.ipynb - Basic usage and quick start tutorial
  • augmented_example.ipynb - Extended examples with data augmentation features

How It Works

The algorithm performs iterative SVD imputation with automatic rank estimation:

  1. Preprocessing: Data validation, standardization, and missing value handling
  2. Rank Estimation: Variance threshold (rank=None), cross-validation (rank="auto"), or a fixed integer rank
  3. SVD Imputation: Iterative low-rank approximation until convergence
  4. Uncertainty Estimation: Monte Carlo validation with temporal or random masking
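Steps 1 and 3 can be sketched in plain NumPy as a "hard impute" loop. This is an illustration written under assumptions, not svd_imputer's actual implementation: the standardization scheme, the initial mean fill, and the relative-change convergence test on the imputed entries are all assumed. Rank estimation (step 2) and uncertainty estimation (step 4) are left out, so the rank is passed in directly; iterative_svd_impute is a hypothetical name.

```python
import numpy as np

def iterative_svd_impute(X, rank, max_iters=500, tol=1e-4):
    """Hard-impute style sketch: fill gaps, fit a rank-`rank` SVD, refill, repeat.

    Illustrative only; assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Step 1 (assumed scheme): standardize each column with its observed mean and std
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0               # constant columns: avoid division by zero
    Z = (X - mean) / std
    Z[missing] = 0.0                  # initial fill = column mean (0 after standardizing)

    # Step 3: alternate low-rank approximation and refilling of the missing entries
    for _ in range(max_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Assumed convergence test: relative change of the imputed values
        prev = Z[missing]
        change = np.linalg.norm(Z_low[missing] - prev) / max(np.linalg.norm(prev), 1e-12)
        Z[missing] = Z_low[missing]   # observed entries are never overwritten
        if change < tol:
            break

    return Z * std + mean             # undo the standardization

X = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
X[1, 2] = np.nan
X[3, 0] = np.nan
X_filled = iterative_svd_impute(X, rank=2)
```

Only the missing cells are updated on each pass, so the observed data always anchors the low-rank fit.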

API Reference

Main Class

Imputer(data, variance_threshold=0.95, rank=None, max_iters=500, tol=1e-4, verbose=True)

Key Methods

  • fit() / transform() / fit_transform(): sklearn-style interface; because data is supplied at initialization, fit() and fit_transform() take no data argument
  • estimate_uncertainty(): Monte Carlo validation
  • calculate_reconstruction_residuals(): Model diagnostics
  • project_data() / reconstruct_data(): SVD subspace operations
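estimate_uncertainty() is described as Monte Carlo validation. One way such a procedure can work, sketched here under assumptions: repeatedly hide a random subset of observed entries, impute them, and report the mean and standard deviation of the RMSE on the hidden values (compare the rmse and rmse_std keys in the Usage example). The function names, mask_fraction, and n_trials are hypothetical. Only random masking is shown; temporal masking, which the package also mentions, would instead hide contiguous runs of time steps. A column-mean imputer stands in for the SVD imputer so the block runs on its own.

```python
import numpy as np

def monte_carlo_rmse(X, impute_fn, mask_fraction=0.1, n_trials=20, seed=0):
    """Hypothetical sketch: hide random observed entries, impute, and score the error.

    Returns (mean RMSE, std of RMSE) across trials. Assumes masking never empties a column.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed_idx = np.argwhere(~np.isnan(X))          # (row, col) pairs we can score
    n_mask = max(1, int(mask_fraction * len(observed_idx)))
    rmses = []
    for _ in range(n_trials):
        pick = observed_idx[rng.choice(len(observed_idx), n_mask, replace=False)]
        rows, cols = pick[:, 0], pick[:, 1]
        X_masked = X.copy()
        X_masked[rows, cols] = np.nan                 # hide known values
        X_filled = impute_fn(X_masked)
        err = X_filled[rows, cols] - X[rows, cols]    # compare against the hidden truth
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))

def column_mean_impute(X):
    """Stand-in imputer so the example runs on its own."""
    filled = X.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

X_demo = np.tile([1.0, 2.0, 3.0], (20, 1))
print(monte_carlo_rmse(X_demo, column_mean_impute))  # (0.0, 0.0): constant columns are recovered exactly
```

Passing the imputer as a callable keeps the validation loop independent of the imputation method, so the same loop could score an SVD imputer or a baseline.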

Requirements

  • Python >= 3.8
  • numpy >= 1.20.0
  • pandas >= 1.3.0
  • scikit-learn >= 1.0.0

Performance Notes

  • Memory: O(n × m) for data size n×m, plus O(min(n,m)²) for SVD decomposition
  • Time Complexity: O(k × n × m × min(n,m)), where k is the number of SVD iterations, since each thin SVD of an n×m matrix costs O(n × m × min(n,m))
  • Recommended Scale: Efficient for datasets up to ~10,000 time steps × 100 series
  • Optimization: SVD components are cached for efficient reuse across operations
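To judge whether a dataset falls within this range on a given machine, one option is to time a single thin SVD of the same shape with NumPy. This sketch is independent of svd_imputer and assumes one full SVD per iteration, so multiplying by max_iters gives only a rough worst-case figure for the imputation loop; cross-validation and Monte Carlo repeats would add to it. time_one_svd is a hypothetical helper.

```python
import time
import numpy as np

def time_one_svd(n_rows, n_cols, seed=0):
    """Hypothetical helper: wall-clock seconds for one thin SVD of a random matrix."""
    X = np.random.default_rng(seed).standard_normal((n_rows, n_cols))
    start = time.perf_counter()
    np.linalg.svd(X, full_matrices=False)           # thin SVD, as an iterative imputer might use
    return time.perf_counter() - start

per_iter = time_one_svd(10_000, 100)
# Rough worst case for the imputation loop alone, assuming one SVD per iteration
print(f"one SVD: {per_iter:.3f} s; 500 iterations: about {500 * per_iter:.1f} s")
```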

Development Status

This package is currently in beta. User beware!

Disclaimer

IMPORTANT: This software is provided "as is" without warranty of any kind. The authors and contributors make no representations or warranties regarding the accuracy, completeness, or validity of the code or its results. Users are solely responsible for validating the appropriateness and correctness of this software for their specific use cases. The authors assume no responsibility or liability for any errors, omissions, or damages arising from the use of this software.

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use this package in your research, please cite:

@software{svd_time_series_imputer,
  title={SVD Time Series Imputer: A Python Package for Missing Data Imputation},
  author={Rui Hugman},
  year={2025},
  url={https://github.com/rhugman/svd_imputer}
}

Download files

Source Distribution

svd_imputer-0.1.0.tar.gz (62.5 kB)

Built Distribution

svd_imputer-0.1.0-py3-none-any.whl (21.6 kB)

File details

Details for the file svd_imputer-0.1.0.tar.gz.

File metadata

  • Download URL: svd_imputer-0.1.0.tar.gz
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for svd_imputer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c5fb9c47568d2ee0534aca2fc2862f82fb4261c5569edb66e12def2b9437c6a0
MD5 b33dbda10bc7d2f53ef8bbb4206040c6
BLAKE2b-256 c6b7ff2712d8f9285b5dbcf303268345a68d2b409d47d04a8cd302a05116b568

File details

Details for the file svd_imputer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: svd_imputer-0.1.0-py3-none-any.whl
  • Size: 21.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for svd_imputer-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2eeb60574114f2174a7331aab124de9403330dad49c2766a4a2395477078e7cf
MD5 cfa9475558ab8a824f2df6f2ed9dcbf6
BLAKE2b-256 e33b6790c38f78efee8c6f8a67ffe547ad0bd89e494b98abed406840c56daad1
