SVD-based time series imputation with uncertainty estimation

SVD Time Series Imputer

A Python package for time series imputation using Singular Value Decomposition (SVD), with automatic rank estimation, uncertainty quantification, and a scikit-learn compatible API.

📦 Now available on PyPI: pip install svd-imputer

Installation

PyPI (Recommended):

pip install svd-imputer

From Source (development version):

git clone https://github.com/rhugman/svd_imputer.git
cd svd_imputer
pip install -e .

With Development Dependencies:

pip install -e ".[dev]"

Quick Start

import pandas as pd
import numpy as np
from svd_imputer import Imputer

# Load your time series data (with datetime index)
df = pd.read_csv("your_data.csv", index_col=0, parse_dates=True)

# Simple imputation with automatic rank estimation
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# With uncertainty estimation  
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")

Note: The Imputer class uses a data-centric design where data is provided at initialization and preprocessed once. This ensures consistency across all analyses and eliminates redundant preprocessing operations.

Usage

from svd_imputer import Imputer

# Basic imputation (automatic rank estimation)
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()

# Rank optimized by cross-validation (rank="auto")
imputer = Imputer(data=df, rank="auto")
imputer.fit()
print(f"Optimized rank: {imputer.rank_}")

# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")

# Advanced: model diagnostics
residuals, stats = imputer.calculate_reconstruction_residuals(return_stats=True)
print(f"Reconstruction R²: {stats['r_squared']:.3f}")
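
The RMSE ± standard deviation printed above is the kind of summary a Monte Carlo hold-out procedure produces. The sketch below is not the package's implementation: `column_mean_impute` and `monte_carlo_rmse` are hypothetical helpers, the imputer is a simple column-mean stand-in, and only random masking is shown (not temporal masking). The returned keys mirror the `rmse` and `rmse_std` names used in this README.

```python
import numpy as np

def column_mean_impute(X):
    """Stand-in imputer (assumption): fill NaNs with the observed column mean."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled

def monte_carlo_rmse(X, impute_fn, n_repeats=20, mask_fraction=0.1, seed=0):
    """Hide random observed entries, impute them, and score against the truth."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))          # (row, col) of known values
    n_hide = max(1, int(mask_fraction * len(observed)))
    scores = []
    for _ in range(n_repeats):
        picked = observed[rng.choice(len(observed), size=n_hide, replace=False)]
        rows, cols = picked[:, 0], picked[:, 1]
        X_masked = X.copy()
        X_masked[rows, cols] = np.nan             # pretend these are missing
        X_hat = impute_fn(X_masked)
        err = X_hat[rows, cols] - X[rows, cols]
        scores.append(np.sqrt(np.mean(err ** 2)))
    return {"rmse": float(np.mean(scores)), "rmse_std": float(np.std(scores))}
```

Any function that takes an array with NaNs and returns a filled array can be passed as `impute_fn`, so the same loop can compare different imputation strategies on equal terms.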

Configuration

imputer = Imputer(
    data=df,                    # Input DataFrame (required)
    variance_threshold=0.95,    # Variance threshold for auto rank estimation
    rank=None,                  # None (auto-estimate), int (fixed), or "auto" (optimize)
    max_iters=500,              # Maximum SVD iterations
    tol=1e-4,                   # Convergence tolerance
    verbose=True                # Enable logging output
)
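
To give a feel for what `variance_threshold` controls, here is a minimal sketch of one common way to pick a rank from singular values: keep the smallest number of components whose cumulative share of squared singular values reaches the threshold. Whether the package computes it exactly this way is an assumption, and `rank_from_variance` is a hypothetical helper, not part of the package API.

```python
import numpy as np

def rank_from_variance(singular_values, variance_threshold=0.95):
    """Smallest rank whose components explain at least the threshold share of variance."""
    s = np.asarray(singular_values, dtype=float)
    explained = s ** 2 / np.sum(s ** 2)       # variance share per component
    cumulative = np.cumsum(explained)
    # First index where the cumulative share reaches the threshold
    idx = np.searchsorted(cumulative, variance_threshold)
    # Clamp in case round-off leaves the final cumulative value just below 1.0
    return int(min(idx + 1, s.size))

s = np.array([3.0, 2.0, 1.0])                 # shares: 9/14, 4/14, 1/14
print(rank_from_variance(s, 0.90))            # 2
print(rank_from_variance(s, 0.95))            # 3
```

A higher threshold retains more components, which fits the data more closely but also passes more noise into the imputed values.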

Examples

Complete examples are available in the examples/ directory:

  • basic_example.ipynb - Basic usage and quick start tutorial
  • augmented_example.ipynb - Extended examples with data augmentation features

How It Works

The algorithm performs iterative SVD imputation with automatic rank estimation:

  1. Preprocessing: Data validation, standardization, and missing value handling
  2. Rank Estimation: Variance threshold, cross-validation, or fixed rank
  3. SVD Imputation: Iterative low-rank approximation until convergence
  4. Uncertainty Estimation: Monte Carlo validation with temporal or random masking
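
The preprocessing and imputation steps above can be sketched in plain NumPy. This is an illustrative sketch of generic iterative SVD imputation, not the package's code: `iterative_svd_impute` is a hypothetical function, the zero initial fill and the relative-change stopping rule are assumptions, and the `max_iters` and `tol` defaults are simply borrowed from the Configuration section.

```python
import numpy as np

def iterative_svd_impute(X, rank, max_iters=500, tol=1e-4):
    """Fill NaNs in a 2-D array by repeated rank-`rank` SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Standardize each column using observed values only
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std

    # Initial guess: zero, i.e. the column mean in standardized units
    Z[missing] = 0.0

    for _ in range(max_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Only missing entries are updated; observed values stay fixed
        change = np.linalg.norm(low_rank[missing] - Z[missing])
        scale = np.linalg.norm(Z[missing]) + 1e-12
        Z[missing] = low_rank[missing]
        if change / scale < tol:
            break

    # Undo the standardization before returning
    return Z * std + mean
```

Because observed entries are never overwritten, the result always agrees with the input wherever data was present; only the gaps are replaced by the low-rank estimate.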

API Reference

Main Class

Imputer(data, variance_threshold=0.95, rank=None, max_iters=500, tol=1e-4, verbose=True)

Key Methods

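  • fit() / transform() / fit_transform(): scikit-learn-style interface (data is supplied at construction, so fit() and fit_transform() take no data argument in the examples above)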
  • estimate_uncertainty(): Monte Carlo validation
  • calculate_reconstruction_residuals(): Model diagnostics
  • project_data() / reconstruct_data(): SVD subspace operations
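
As a rough picture of what "SVD subspace operations" usually means, the sketch below projects rows onto the leading right singular vectors and maps them back. The signatures of `project_data()` and `reconstruct_data()` are not documented here, so `fit_basis`, `project`, and `reconstruct` are hypothetical stand-ins that only illustrate the underlying linear algebra.

```python
import numpy as np

def fit_basis(Z, rank):
    """Return the top-`rank` right singular vectors of a complete matrix Z."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:rank]                      # shape (rank, n_columns)

def project(Z, basis):
    """Coordinates of each row in the rank-dimensional subspace."""
    return Z @ basis.T                    # shape (n_rows, rank)

def reconstruct(coords, basis):
    """Map subspace coordinates back to the original column space."""
    return coords @ basis                 # shape (n_rows, n_columns)

rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 5))   # exactly rank 2
basis = fit_basis(Z, rank=2)
coords = project(Z, basis)
Z_back = reconstruct(coords, basis)       # matches Z up to round-off
```

When the data has higher rank than the basis, the round trip returns the best approximation within that subspace rather than the original values.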

Requirements

  • Python >= 3.8
  • numpy >= 1.20.0
  • pandas >= 1.3.0
  • scikit-learn >= 1.0.0

Performance Notes

  • Memory: O(n × m) for data of size n×m, plus O(min(n,m)²) for the SVD
  • Time Complexity: O(k × n × m × min(n,m)) for a dense SVD, where k is the number of SVD iterations
  • Recommended Scale: Efficient for datasets up to about 10,000 × 100 (n × m)
  • Optimization: SVD components are cached for efficient reuse across operations

Package Status

Current Status: Published on PyPI 🎉

This package is currently in Beta - the core functionality is stable and tested (86 tests passing), but the API may evolve. Suitable for research and development use.

Disclaimer

IMPORTANT: This software is provided "as is" without warranty of any kind. The authors and contributors make no representations or warranties regarding the accuracy, completeness, or validity of the code or its results. Users are solely responsible for validating the appropriateness and correctness of this software for their specific use cases. The authors assume no responsibility or liability for any errors, omissions, or damages arising from the use of this software.

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use this package in your research, please cite:

@software{svd_time_series_imputer,
  title={SVD Time Series Imputer: A Python Package for Missing Data Imputation},
  author={Rui Hugman},
  year={2025},
  url={https://github.com/rhugman/svd_imputer},
  note={Available on PyPI: https://pypi.org/project/svd-imputer/}
}
