SVD-based time series imputation with uncertainty estimation
SVD Time Series Imputer
A Python package for time series imputation using Singular Value Decomposition (SVD) with automatic rank estimation, uncertainty quantification, and a scikit-learn compatible API.
📦 Now available on PyPI: pip install svd-imputer
Installation
PyPI (Recommended):
pip install svd-imputer
From Source (development version):
git clone https://github.com/rhugman/svd_imputer.git
cd svd_imputer
pip install -e .
With Development Dependencies:
pip install -e ".[dev]"
Quick Start
import pandas as pd
import numpy as np
from svd_imputer import Imputer
# Load your time series data (with datetime index)
df = pd.read_csv("your_data.csv", index_col=0, parse_dates=True)
# Simple imputation with automatic rank estimation
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()
# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")
Note: The Imputer class uses a data-centric design where data is provided at initialization and preprocessed once. This ensures consistency across all analyses and eliminates redundant preprocessing operations.
Usage
from svd_imputer import Imputer
# Basic imputation (automatic rank estimation)
imputer = Imputer(data=df, variance_threshold=0.95)
df_imputed = imputer.fit_transform()
# Cross-validation optimization
imputer = Imputer(data=df, rank="auto")
imputer.fit()
print(f"Optimized rank: {imputer.rank_}")
# With uncertainty estimation
df_imputed, uncertainty = imputer.fit_transform(return_uncertainty=True)
print(f"RMSE: {uncertainty['rmse']:.3f} ± {uncertainty['rmse_std']:.3f}")
# Advanced: model diagnostics
residuals, stats = imputer.calculate_reconstruction_residuals(return_stats=True)
print(f"Reconstruction R²: {stats['r_squared']:.3f}")
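The README does not say how `r_squared` is defined, so the following is only a sketch of one common definition: the share of (column-centered) variance captured by a rank-r SVD reconstruction. The function name `reconstruction_r2` is hypothetical and is not part of the package.

```python
import numpy as np


def reconstruction_r2(X, rank):
    """Residuals and R² of a rank-`rank` reconstruction of a complete matrix.

    Assumption: R² = 1 - SS_res / SS_tot on column-centered data. The package
    may use a different definition.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)

    # Thin SVD, then keep only the leading `rank` components.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    residuals = centered - approx
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum(centered**2)
    return residuals, 1.0 - ss_res / ss_tot
```

Under this definition R² equals the sum of the retained squared singular values divided by the sum of all squared singular values, so it rises monotonically with rank and reaches 1 at full rank.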
Configuration
imputer = Imputer(
    data=df,                  # Input DataFrame (required)
    variance_threshold=0.95,  # Variance threshold for auto rank estimation
    rank=None,                # None (auto-estimate), int (fixed), or "auto" (optimize)
    max_iters=500,            # Maximum SVD iterations
    tol=1e-4,                 # Convergence tolerance
    verbose=True,             # Enable logging output
)
Examples
Complete examples are available in the examples/ directory:
- basic_example.ipynb - Basic usage and quick start tutorial
- augmented_example.ipynb - Extended examples with data augmentation features
How It Works
The algorithm performs iterative SVD imputation with automatic rank estimation:
- Preprocessing: Data validation, standardization, and missing value handling
- Rank Estimation: Variance threshold, cross-validation, or fixed rank
- SVD Imputation: Iterative low-rank approximation until convergence
- Uncertainty Estimation: Monte Carlo validation with temporal or random masking
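The first three steps above can be sketched in plain NumPy. This is a minimal illustration of generic iterative SVD imputation under stated assumptions, not the package's actual implementation: the function names `rank_from_variance` and `iterative_svd_impute`, the zero initial fill, and the relative-change stopping rule are all assumptions made for the example. The parameter names mirror the `Imputer` arguments only for readability.

```python
import numpy as np


def rank_from_variance(singular_values, variance_threshold=0.95):
    """Smallest rank whose components reach the cumulative variance threshold."""
    energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    rank = int(np.searchsorted(energy, variance_threshold)) + 1
    return min(rank, len(singular_values))


def iterative_svd_impute(X, variance_threshold=0.95, max_iters=500, tol=1e-4):
    """Fill NaNs in X by repeated low-rank reconstruction (illustrative only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Preprocessing: standardize each column using observed values only.
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std

    # Initial fill: zero, which is the column mean in standardized units.
    Z[missing] = 0.0

    # Rank estimation (assumption: done once, from the initial fill).
    s = np.linalg.svd(Z, compute_uv=False)
    rank = rank_from_variance(s, variance_threshold)

    for _ in range(max_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Relative change of the imputed entries between iterations.
        change = np.linalg.norm(low_rank[missing] - Z[missing])
        scale = np.linalg.norm(Z[missing]) + 1e-12

        # Only missing entries are updated; observed values stay fixed.
        Z[missing] = low_rank[missing]
        if change / scale < tol:
            break

    # Undo the standardization before returning.
    return Z * std + mean, rank
```

Keeping the observed entries fixed and updating only the gaps is what makes the loop converge toward a low-rank matrix that agrees with the data wherever data exists.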
API Reference
Main Class
Imputer(data, variance_threshold=0.95, rank=None, max_iters=500, tol=1e-4, verbose=True)
Key Methods
- fit()/transform()/fit_transform(): Standard sklearn interface
- estimate_uncertainty(): Monte Carlo validation
- calculate_reconstruction_residuals(): Model diagnostics
- project_data()/reconstruct_data(): SVD subspace operations
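To make the idea behind Monte Carlo validation concrete, here is a rough sketch: hide a subset of known values, impute them, and summarize the error over many repeats. It is not the code behind `estimate_uncertainty()`. The names `monte_carlo_rmse`, `column_mean_impute`, `mask_fraction`, `block_length`, and `n_repeats` are hypothetical, and the column-mean imputer is only a stand-in so the example runs on its own. The returned keys mirror the `rmse` and `rmse_std` entries shown in the Quick Start.

```python
import numpy as np


def column_mean_impute(X):
    """Stand-in imputer: replace each NaN with its column mean."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def monte_carlo_rmse(X, impute, n_repeats=20, mask_fraction=0.1,
                     block_length=None, seed=0):
    """Estimate imputation error by repeatedly hiding observed values."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = np.argwhere(~np.isnan(X))
    n_hide = max(1, int(mask_fraction * len(observed)))
    rmses = []

    for _ in range(n_repeats):
        hidden = np.zeros(X.shape, dtype=bool)
        if block_length is None:
            # Random masking: hide scattered observed entries.
            pick = rng.choice(len(observed), size=n_hide, replace=False)
            hidden[observed[pick, 0], observed[pick, 1]] = True
        else:
            # Temporal masking: hide one contiguous run of rows per column.
            for col in range(X.shape[1]):
                start = rng.integers(0, X.shape[0] - block_length + 1)
                hidden[start:start + block_length, col] = True
            hidden &= ~np.isnan(X)
        if not hidden.any():
            continue

        X_masked = X.copy()
        X_masked[hidden] = np.nan
        X_hat = impute(X_masked)

        err = X_hat[hidden] - X[hidden]
        rmses.append(np.sqrt(np.mean(err**2)))

    return {"rmse": float(np.mean(rmses)), "rmse_std": float(np.std(rmses))}
```

Temporal masking hides contiguous runs, which mimics sensor outages and usually gives a more pessimistic error estimate than random masking on autocorrelated series.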
Requirements
- Python >= 3.8
- numpy >= 1.20.0
- pandas >= 1.3.0
- scikit-learn >= 1.0.0
Performance Notes
- Memory: O(n × m) for data size n×m, plus O(min(n,m)²) for SVD decomposition
- Time Complexity: O(k × n × m × min(n,m)) for a dense SVD, where k is the number of SVD iterations
- Recommended Scale: Efficient for datasets up to ~10,000 × 100 dimensions
- Optimization: SVD components are cached for efficient reuse across operations
Package Status
Current Status: Published on PyPI 🎉
This package is currently in beta: the core functionality is stable and tested (86 tests passing), but the API may evolve. It is suitable for research and development use.
Disclaimer
IMPORTANT: This software is provided "as is" without warranty of any kind. The authors and contributors make no representations or warranties regarding the accuracy, completeness, or validity of the code or its results. Users are solely responsible for validating the appropriateness and correctness of this software for their specific use cases. The authors assume no responsibility or liability for any errors, omissions, or damages arising from the use of this software.
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Links
- PyPI Package: https://pypi.org/project/svd-imputer/
- Source Code: https://github.com/rhugman/svd_imputer
- Issues: https://github.com/rhugman/svd_imputer/issues
Citation
If you use this package in your research, please cite:
@software{svd_time_series_imputer,
title={SVD Time Series Imputer: A Python Package for Missing Data Imputation},
author={Rui Hugman},
year={2025},
url={https://github.com/rhugman/svd_imputer},
note={Available on PyPI: https://pypi.org/project/svd-imputer/}
}
Download files
File details
Details for the file svd_imputer-0.1.1.tar.gz.
File metadata
- Download URL: svd_imputer-0.1.1.tar.gz
- Upload date:
- Size: 60.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da1c6a49378cc5e820b11a1a2b7ebfbc4f2fb3b4b5e8170d7ee8255de38bc601 |
| MD5 | 7d9217364567f8248eed4958e2c8d4bf |
| BLAKE2b-256 | 0d903bef521cf695e2a452a035f1cb2a7edee8def038c99803b64cdeff966ff7 |
File details
Details for the file svd_imputer-0.1.1-py3-none-any.whl.
File metadata
- Download URL: svd_imputer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 22.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 752f0c46f78b20d3f302e7ec1b415c21e4a76a2a62eae950829910ab52ad5dad |
| MD5 | b20eeb3dbeb01481656d7cb3af4ad58d |
| BLAKE2b-256 | 08a243a1eeb53493128befd01ebf05b8434633d92b9be2a3936154946dab66a1 |