Time-aware missing data imputation for irregular time series
Time-Aware Missing Data Imputer
A Python library for intelligent time-series imputation with irregular intervals. Unlike traditional imputation methods that treat all gaps equally, this library understands that a 1-hour gap is fundamentally different from a 1-minute gap.
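To make the distinction concrete, the sketch below uses plain pandas (not this library's API) to compare index-based and time-weighted linear interpolation on irregularly spaced timestamps. The data is invented for illustration.

```python
import numpy as np
import pandas as pd

# The missing reading sits 1 minute after the first observation
# and 59 minutes before the next one.
idx = pd.to_datetime([
    "2024-01-01 00:00:00",
    "2024-01-01 00:01:00",
    "2024-01-01 01:00:00",
])
s = pd.Series([0.0, np.nan, 60.0], index=idx)

# method="linear" ignores the index and treats rows as equally spaced.
index_based = s.interpolate(method="linear")
# method="time" weights the neighbours by elapsed time.
time_based = s.interpolate(method="time")

print(index_based.iloc[1])  # 30.0, the midpoint of the neighbours
print(time_based.iloc[1])   # approximately 1.0, i.e. 1/60 of the way
```

The index-based result lands halfway between the neighbours even though the missing reading is almost adjacent in time to the first observation, which is the kind of error a time-aware method avoids.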
Features
- Time-Aware Imputation: Respects temporal structure and irregular time intervals
- Multiple Interpolation Methods: Linear, cubic, quadratic, PCHIP, and Akima splines
- Gap Analysis Tools: Comprehensive diagnostics for understanding missing data patterns
- Scikit-learn Compatible: Familiar fit/transform API that works with sklearn pipelines
- Production-Ready: Fully tested and type-annotated; formatted with black and checked with mypy and flake8
Installation
pip install time-aware-imputer
For development:
git clone https://github.com/ontedduabhishakereddy/time-aware-imputer.git
cd time-aware-imputer
pip install -e ".[dev]"
Quick Start
import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer
# Create sample data with missing values
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='h'),
    'temperature': np.random.randn(100)
})
df.loc[10:15, 'temperature'] = np.nan
df.loc[50:52, 'temperature'] = np.nan
# Analyze gaps
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
print(f"Found {stats['temperature']['n_gaps']} gaps")
print(f"Missing: {stats['temperature']['missing_percentage']:.1f}%")
# Visualize gaps
fig = analyzer.plot_gaps(df)
# Impute missing values
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)
# Check which values were imputed
imputed_mask = imputer.get_imputed_mask()
Core Modules
1. TimeAwareImputer (Base Class)
Foundation for all imputation strategies with sklearn-compatible API.
from time_aware_imputer import TimeAwareImputer
# All imputers inherit from this base class
# Provides common functionality:
# - Timestamp validation and parsing
# - Automatic column detection
# - Imputation tracking
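As a rough illustration of what timestamp validation and automatic column detection can involve, here is a minimal sketch in plain pandas. The helper name `prepare_frame` and its exact behaviour are assumptions made for this example; the library's base class may work differently.

```python
import pandas as pd

def prepare_frame(df, time_column="timestamp", value_columns=None):
    """Hypothetical helper: parse timestamps and pick the columns to impute."""
    if time_column not in df.columns:
        raise ValueError(f"missing time column: {time_column!r}")
    out = df.copy()
    out[time_column] = pd.to_datetime(out[time_column])
    if out[time_column].isna().any():
        raise ValueError("timestamps must not contain missing values")
    # Interpolation over time needs chronologically ordered rows.
    out = out.sort_values(time_column).reset_index(drop=True)
    if value_columns is None:
        # Assumed default: every numeric column except the time column.
        numeric = out.select_dtypes(include="number").columns
        value_columns = [c for c in numeric if c != time_column]
    return out, list(value_columns)

raw = pd.DataFrame({
    "timestamp": ["2024-01-01 01:00", "2024-01-01 00:00"],
    "temperature": [21.0, 20.0],
    "site": ["a", "b"],  # non-numeric, so it is not selected
})
clean, cols = prepare_frame(raw)
print(cols)  # ['temperature']
```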
2. SplineImputer
Time-aware spline interpolation for smooth, trend-preserving imputation.
from time_aware_imputer import SplineImputer
# Linear interpolation (fast, simple)
imputer = SplineImputer(method='linear')
# Cubic spline (smooth, default)
imputer = SplineImputer(method='cubic')
# PCHIP (monotonicity-preserving)
imputer = SplineImputer(method='pchip')
# Akima (local interpolation, less oscillation)
imputer = SplineImputer(method='akima')
# Fit and transform
df_imputed = imputer.fit_transform(df)
Parameters:
- method: Interpolation method ('linear', 'cubic', 'quadratic', 'slinear', 'pchip', 'akima')
- fill_value: How to handle extrapolation ('extrapolate' or a float)
- time_column: Name of timestamp column (default: 'timestamp')
- value_columns: List of columns to impute (default: all numeric columns)
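For intuition about how these methods can respect irregular spacing, the sketch below interpolates over elapsed seconds using SciPy directly. It is not the library's implementation; the function name `fill_over_elapsed_time` is hypothetical, and only three of the methods are covered.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import Akima1DInterpolator, CubicSpline, PchipInterpolator

def fill_over_elapsed_time(df, time_col, value_col, kind="pchip"):
    """Hypothetical helper: fill NaNs by interpolating over elapsed seconds."""
    # Use real elapsed time as the x-axis instead of the row position.
    t = (df[time_col] - df[time_col].iloc[0]).dt.total_seconds().to_numpy()
    y = df[value_col].to_numpy(dtype=float)
    known = ~np.isnan(y)
    builders = {
        "cubic": CubicSpline,
        "pchip": PchipInterpolator,
        "akima": Akima1DInterpolator,
    }
    interpolator = builders[kind](t[known], y[known])
    filled = y.copy()
    filled[~known] = interpolator(t[~known])
    return filled

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:00", "2024-01-01 00:15:00", "2024-01-01 00:30:00",
        "2024-01-01 01:00:00", "2024-01-01 01:15:00", "2024-01-01 02:00:00",
        "2024-01-01 02:15:00",
    ]),
    "temperature": [20.1, 20.3, np.nan, 21.5, np.nan, 22.0, 22.1],
})
filled = fill_over_elapsed_time(df, "timestamp", "temperature", kind="pchip")
print(filled)
```

Because the observed values here are increasing, PCHIP keeps each filled value between its observed neighbours, whereas a cubic spline is free to overshoot.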
3. GapAnalyzer
Comprehensive gap analysis and visualization tools.
from time_aware_imputer import GapAnalyzer
analyzer = GapAnalyzer()
# Analyze gaps
stats = analyzer.analyze(df)
print(stats['temperature'])
# {
# 'n_gaps': 2,
# 'total_missing': 9,
# 'missing_percentage': 9.0,
# 'mean_gap_duration': 14400.0, # seconds
# 'max_gap_duration': 18000.0,
# 'min_gap_duration': 10800.0
# }
# Get summary table
summary = analyzer.get_summary()
# Visualize gaps
fig = analyzer.plot_gaps(df, column='temperature')
# Missing data heatmap (for multivariate data)
fig = analyzer.plot_missing_heatmap(df)
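The counting statistics above can be reproduced with plain pandas by grouping consecutive missing values into runs. The sketch below is an illustration, not GapAnalyzer's code; the helper name `gap_runs` is hypothetical, and gap durations are left out because this README does not state how the library measures them.

```python
import numpy as np
import pandas as pd

def gap_runs(df, time_col, value_col):
    """Hypothetical helper: one row per run of consecutive NaNs."""
    missing = df[value_col].isna()
    # A new run id starts whenever the missing flag changes.
    run_id = (missing != missing.shift()).cumsum()
    rows = []
    for _, block in df[missing].groupby(run_id[missing]):
        rows.append({
            "first_missing": block[time_col].iloc[0],
            "last_missing": block[time_col].iloc[-1],
            "n_missing": len(block),
        })
    return pd.DataFrame(rows)

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="h"),
    "temperature": np.random.randn(100),
})
df.loc[10:15, "temperature"] = np.nan  # label slicing is inclusive: 6 values
df.loc[50:52, "temperature"] = np.nan  # 3 values

runs = gap_runs(df, "timestamp", "temperature")
n_gaps = len(runs)
total_missing = int(runs["n_missing"].sum())
missing_percentage = 100.0 * total_missing / len(df)
print(n_gaps, total_missing, missing_percentage)  # 2 9 9.0
```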
Examples
Example 1: IoT Sensor Data
import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer
# Simulate IoT sensor data with irregular timestamps and gaps
timestamps = pd.to_datetime([
    '2024-01-01 00:00:00',
    '2024-01-01 00:15:00',
    '2024-01-01 00:30:00',
    '2024-01-01 01:00:00',  # 30-min gap
    '2024-01-01 01:15:00',
    '2024-01-01 02:00:00',  # 45-min gap
    '2024-01-01 02:15:00',
])
temperature = [20.1, 20.3, np.nan, 21.5, np.nan, 22.0, 22.1]
df = pd.DataFrame({
    'timestamp': timestamps,
    'temperature': temperature
})
# Analyze gaps
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
print(f"Gaps: {stats['temperature']['n_gaps']}")
print(f"Mean gap duration: {stats['temperature']['mean_gap_duration']/60:.1f} minutes")
# Impute with cubic spline
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)
print(df_imputed)
Example 2: Multiple Sensors
# Multiple correlated sensors
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='10min'),
    'temperature': np.random.randn(100) * 5 + 20,
    'humidity': np.random.randn(100) * 10 + 60,
    'pressure': np.random.randn(100) * 2 + 1013
})
# Introduce gaps
df.loc[20:25, 'temperature'] = np.nan
df.loc[40:43, 'humidity'] = np.nan
df.loc[70:72, 'pressure'] = np.nan
# Analyze all columns
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
summary = analyzer.get_summary()
print(summary)
# Impute all columns
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)
Example 3: Integration with Sklearn Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from time_aware_imputer import SplineImputer
# Create pipeline
pipeline = Pipeline([
    ('imputer', SplineImputer(method='cubic')),
    # Note: StandardScaler will work on all numeric columns
    # Need to handle timestamp column separately or drop it
])
# Use in ML workflow
# (Typically you'd separate timestamp column from features first)
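One way to keep the timestamp out of the scaler is to impute first and pass only the numeric values downstream. The sketch below is self-contained and uses a pandas time-weighted interpolation wrapped in a FunctionTransformer as a stand-in for SplineImputer; the function name `impute_then_drop_time` is hypothetical, and the real imputer's output format may differ.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def impute_then_drop_time(df):
    # Stand-in for SplineImputer: time-weighted linear interpolation.
    # Moving the timestamp into the index removes it from the features.
    filled = df.set_index("timestamp").interpolate(method="time")
    return filled.to_numpy()

pipeline = Pipeline([
    ("impute", FunctionTransformer(impute_then_drop_time)),
    ("scale", StandardScaler()),
])

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=6, freq="10min"),
    "temperature": [20.0, np.nan, 21.0, 21.5, np.nan, 22.5],
    "humidity": [60.0, 61.0, np.nan, 63.0, 64.0, 65.0],
})
X = pipeline.fit_transform(df)
print(X.shape)  # (6, 2)
```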
API Reference
TimeAwareImputer
Methods:
- fit(X, y=None): Fit the imputer on training data
- transform(X): Transform data by imputing missing values
- fit_transform(X, y=None): Fit and transform in one step
- get_imputed_mask(): Get boolean mask of imputed values
- get_feature_names_out(): Get output feature names
Attributes:
- is_fitted_: Whether the imputer has been fitted
- feature_names_in_: Names of features seen during fit
- n_features_in_: Number of features seen during fit
- imputed_mask_: Boolean mask of imputed values
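To show what a boolean mask of imputed values is useful for, here is a minimal sketch in plain pandas. It does not call the library; it assumes the mask simply marks the cells that were missing before imputation, which may differ from how imputed_mask_ is actually stored.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="h"),
    "temperature": [20.0, np.nan, np.nan, 23.0, 24.0],
})
value_columns = ["temperature"]

# Record which cells are missing before they are filled.
imputed_mask = df[value_columns].isna()

# Fill the gaps with a time-weighted linear interpolation.
filled = (
    df.set_index("timestamp")[value_columns]
    .interpolate(method="time")
    .reset_index()
)

# The mask selects exactly the values that were filled in.
imputed_values = filled.loc[imputed_mask["temperature"], "temperature"]
print(imputed_values.tolist())  # approximately [21.0, 22.0]
```

Keeping the mask alongside the data lets downstream code down-weight or flag imputed points instead of treating them as measurements.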
SplineImputer
Inherits all methods and attributes from TimeAwareImputer.
Additional Attributes:
interpolators_: Dictionary of fitted interpolator objects per column
GapAnalyzer
Methods:
- analyze(data): Analyze gaps and return statistics
- get_summary(): Get summary DataFrame of gap statistics
- plot_gaps(data, column=None): Visualize gaps in time series
- plot_missing_heatmap(data): Create heatmap of missing patterns
Attributes:
gap_stats_: Dictionary of gap statistics per column
Development
Setup Development Environment
# Clone repository
git clone https://github.com/ontedduabhishakereddy/time-aware-imputer.git
cd time-aware-imputer
# Install with dev dependencies
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=time_aware_imputer --cov-report=html
# Run specific test file
pytest tests/test_spline.py
# Run specific test
pytest tests/test_spline.py::TestSplineImputer::test_fit_transform_linear
Code Quality
# Format code with black
black time_aware_imputer tests
# Sort imports
isort time_aware_imputer tests
# Type checking with mypy
mypy time_aware_imputer
# Linting with flake8
flake8 time_aware_imputer tests
Running All Quality Checks
# Format
black time_aware_imputer tests
isort time_aware_imputer tests
# Check
mypy time_aware_imputer
flake8 time_aware_imputer tests
# Test
pytest
Requirements
- Python >= 3.8
- numpy >= 1.21.0
- pandas >= 1.3.0
- scipy >= 1.7.0
- scikit-learn >= 1.0.0
- matplotlib >= 3.4.0
Use Cases
IoT & Industrial Monitoring
- Sensor networks with irregular data collection
- Network failures causing data gaps
- Equipment downtime
Medical Devices
- Continuous glucose monitoring
- Heart rate monitors
- Patient activity trackers
Financial Markets
- High-frequency trading data
- Tick data with irregular timestamps
- Market data feed interruptions
Environmental Monitoring
- Weather stations
- Air quality sensors
- Hydrological measurements
Roadmap
Future enhancements planned:
- Gaussian Process imputation with uncertainty quantification
- Adaptive imputation strategy selection
- Multivariate imputation using correlations
- Seasonal pattern detection and imputation
- Real-time streaming imputation
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this library in your research, please cite:
@software{time_aware_imputer,
  title = {Time-Aware Missing Data Imputer},
  author = {Abhishake Reddy O},
  year = {2026},
  url = {https://github.com/ontedduabhishakereddy/time-aware-imputer}
}
Acknowledgments
- Inspired by the gap between simple imputation (scikit-learn) and complex deep learning approaches
- Built on top of NumPy, SciPy, and scikit-learn
- Designed for real-world time-series challenges
Contact
- Author: Abhishake Reddy O
- Email: ontedduabhishakereddy@gmail.com
- GitHub: @ontedduabhishakereddy