Time-aware missing data imputation for irregular time series

Time-Aware Missing Data Imputer

A Python library for imputing missing values in time series sampled at irregular intervals. Unlike traditional imputation methods that treat all gaps equally, it accounts for the elapsed time between observations: a 1-hour gap is fundamentally different from a 1-minute gap.
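
To make the difference concrete, the following is a minimal sketch using plain NumPy rather than this library. It fills one missing value twice: once treating the samples as equally spaced, and once weighting the neighbours by their real distance in time. The data and variable names are made up for illustration.

```python
import numpy as np

# Observation times in minutes since the start; the spacing is irregular.
t = np.array([0.0, 1.0, 2.0, 62.0])       # the last step is a 60-minute gap
y = np.array([10.0, 11.0, np.nan, 72.0])  # the value at t=2 is missing

known = ~np.isnan(y)

# Index-based: pretends the samples sit at positions 0, 1, 2, 3.
idx = np.arange(len(y), dtype=float)
by_index = np.interp(idx[~known], idx[known], y[known])

# Time-aware: weights neighbours by their actual distance in time.
by_time = np.interp(t[~known], t[known], y[known])

print(by_index)  # [41.5]
print(by_time)   # [12.]
```

The index-based estimate lands halfway between the neighbours, even though the missing sample is one minute from the left neighbour and an hour from the right one.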

Features

  • Time-Aware Imputation: Respects temporal structure and irregular time intervals
  • Multiple Interpolation Methods: Linear, cubic, quadratic, PCHIP, and Akima splines
  • Gap Analysis Tools: Comprehensive diagnostics for understanding missing data patterns
  • Scikit-learn Compatible: Familiar fit/transform API that works with sklearn pipelines
  • Production-Ready: Fully tested and type-annotated; formatted with black, type-checked with mypy, and linted with flake8

Installation

pip install time-aware-imputer

For development:

git clone https://github.com/ontedduabhishakereddy/time-aware-imputer.git
cd time-aware-imputer
pip install -e ".[dev]"

Quick Start

import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer

# Create sample data with missing values
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='h'),
    'temperature': np.random.randn(100)
})
# Note: .loc slicing is inclusive, so this blanks rows 10-15 (6 values) and 50-52 (3 values)
df.loc[10:15, 'temperature'] = np.nan
df.loc[50:52, 'temperature'] = np.nan

# Analyze gaps
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
print(f"Found {stats['temperature']['n_gaps']} gaps")
print(f"Missing: {stats['temperature']['missing_percentage']:.1f}%")

# Visualize gaps
fig = analyzer.plot_gaps(df)

# Impute missing values
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)

# Check which values were imputed
imputed_mask = imputer.get_imputed_mask()

Core Modules

1. TimeAwareImputer (Base Class)

Foundation for all imputation strategies with sklearn-compatible API.

from time_aware_imputer import TimeAwareImputer

# All imputers inherit from this base class
# Provides common functionality:
# - Timestamp validation and parsing
# - Automatic column detection
# - Imputation tracking
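
As a rough illustration of what timestamp validation and automatic column detection can involve, here is a sketch in plain pandas. The `prepare` function and its behaviour are hypothetical and are not taken from the library's source; only the parameter names `time_column` and `value_columns` come from this README.

```python
import pandas as pd

def prepare(df, time_column="timestamp", value_columns=None):
    """Hypothetical pre-processing step, not the library's implementation."""
    if time_column not in df.columns:
        raise ValueError(f"missing time column {time_column!r}")
    out = df.copy()
    # Parse strings (or other datetime-likes) into pandas timestamps.
    out[time_column] = pd.to_datetime(out[time_column])
    # Interpolation over time needs chronologically ordered rows.
    if not out[time_column].is_monotonic_increasing:
        out = out.sort_values(time_column).reset_index(drop=True)
    if value_columns is None:
        # Automatic detection: every numeric column except the time column.
        numeric = out.select_dtypes(include="number").columns
        value_columns = numeric.drop(time_column, errors="ignore").tolist()
    return out, value_columns

raw = pd.DataFrame({
    "timestamp": ["2024-01-01 01:00", "2024-01-01 00:00"],  # out of order
    "temperature": [21.0, 20.0],
    "site": ["a", "a"],                                      # non-numeric
})
clean, cols = prepare(raw)
print(cols)  # ['temperature']
```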

2. SplineImputer

Time-aware spline interpolation for smooth, trend-preserving imputation.

from time_aware_imputer import SplineImputer

# Linear interpolation (fast, simple)
imputer = SplineImputer(method='linear')

# Cubic spline (smooth, default)
imputer = SplineImputer(method='cubic')

# PCHIP (monotonicity-preserving)
imputer = SplineImputer(method='pchip')

# Akima (local interpolation, less oscillation)
imputer = SplineImputer(method='akima')

# Fit and transform
df_imputed = imputer.fit_transform(df)
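
The trade-off between the cubic and PCHIP methods can be seen directly with SciPy's own interpolators. The sketch below uses `scipy.interpolate.CubicSpline` and `PchipInterpolator` on a step-like signal; whether this library wraps these exact classes is an assumption, so treat it as an illustration of the methods rather than of the library's internals.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# A step-like, monotonically non-decreasing signal.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Evaluate both interpolants on a fine grid between the samples.
t_fine = np.linspace(0.0, 5.0, 501)
cubic = CubicSpline(t, y)(t_fine)
pchip = PchipInterpolator(t, y)(t_fine)

# The cubic spline oscillates around the step and leaves the data range;
# PCHIP preserves monotonicity and stays within [0, 1].
print(f"cubic range: [{cubic.min():.3f}, {cubic.max():.3f}]")
print(f"pchip range: [{pchip.min():.3f}, {pchip.max():.3f}]")
```

For bounded quantities such as humidity or a fill level, an overshoot like the cubic one can produce physically impossible imputed values, which is one reason to prefer a shape-preserving method there.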

Parameters:

  • method: Interpolation method ('linear', 'cubic', 'quadratic', 'slinear', 'pchip', 'akima')
  • fill_value: How to handle extrapolation ('extrapolate' or a float)
  • time_column: Name of timestamp column (default: 'timestamp')
  • value_columns: List of columns to impute (default: all numeric columns)
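
The `method` and `fill_value` options resemble the `kind` and `fill_value` arguments of `scipy.interpolate.interp1d`. Assuming similar semantics (this is not confirmed by the library's source), the sketch below shows how the two `fill_value` choices behave for a missing value that lies after the last observation. Timestamps are converted to elapsed seconds so that spacing reflects real time.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

ts = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:10",
    "2024-01-01 00:40", "2024-01-01 01:00",
])
values = np.array([1.0, 2.0, np.nan, np.nan])  # trailing NaNs need extrapolation

# Elapsed seconds since the first timestamp: 0, 600, 2400, 3600.
seconds = (ts - ts[0]).total_seconds().to_numpy()
known = ~np.isnan(values)

# fill_value="extrapolate" continues the fitted line past the last observation.
f_extrap = interp1d(seconds[known], values[known], kind="linear",
                    fill_value="extrapolate")

# A float fill_value is returned outside the observed range instead
# (SciPy requires bounds_error=False for this).
f_const = interp1d(seconds[known], values[known], kind="linear",
                   bounds_error=False, fill_value=0.0)

print(f_extrap(seconds[~known]))  # [5. 7.]
print(f_const(seconds[~known]))   # [0. 0.]
```

Extrapolated values grow without bound as the distance from the last observation increases, so a constant fill can be the safer choice for long trailing gaps.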

3. GapAnalyzer

Comprehensive gap analysis and visualization tools.

from time_aware_imputer import GapAnalyzer

analyzer = GapAnalyzer()

# Analyze gaps
stats = analyzer.analyze(df)
print(stats['temperature'])
# Example output (exact values depend on the data):
# {
#     'n_gaps': 2,
#     'total_missing': 9,
#     'missing_percentage': 9.0,
#     'mean_gap_duration': 14400.0,  # seconds
#     'max_gap_duration': 18000.0,
#     'min_gap_duration': 10800.0
# }

# Get summary table
summary = analyzer.get_summary()

# Visualize gaps
fig = analyzer.plot_gaps(df, column='temperature')

# Missing data heatmap (for multivariate data)
fig = analyzer.plot_missing_heatmap(df)
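
For readers who want to see how statistics of this kind can be derived, here is a sketch in plain pandas and NumPy. The `gap_stats` helper is hypothetical. It measures a gap's duration as the time between the observed samples on either side of a run of missing values, which may differ from the definition `GapAnalyzer` uses, so its durations need not match the sample output above.

```python
import numpy as np
import pandas as pd

def gap_stats(ts: pd.Series, values: pd.Series) -> dict:
    """Hypothetical helper: summarise runs of consecutive NaNs in `values`."""
    missing = values.isna().to_numpy()
    # Give each run of consecutive missing samples its own id (1, 2, ...).
    starts = np.diff(np.concatenate(([0], missing.astype(int)))) == 1
    run_id = np.cumsum(starts)
    run_ids = np.unique(run_id[missing])
    durations = []
    for rid in run_ids:
        idx = np.flatnonzero(missing & (run_id == rid))
        lo, hi = idx[0] - 1, idx[-1] + 1
        # Gaps touching either edge have no bounding observation; skip them.
        if lo >= 0 and hi < len(ts):
            durations.append((ts.iloc[hi] - ts.iloc[lo]).total_seconds())
    return {
        "n_gaps": len(run_ids),
        "total_missing": int(missing.sum()),
        "missing_percentage": 100.0 * float(missing.mean()),
        "gap_durations_s": durations,
    }

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="h"),
    "temperature": np.arange(100, dtype=float),
})
df.loc[10:15, "temperature"] = np.nan  # inclusive: 6 values
df.loc[50:52, "temperature"] = np.nan  # inclusive: 3 values

stats = gap_stats(df["timestamp"], df["temperature"])
print(stats["n_gaps"], stats["total_missing"], stats["gap_durations_s"])
# 2 9 [25200.0, 14400.0]
```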

Examples

Example 1: IoT Sensor Data

import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer

# Simulate IoT sensor data with irregular timestamps and gaps
timestamps = pd.to_datetime([
    '2024-01-01 00:00:00',
    '2024-01-01 00:15:00',
    '2024-01-01 00:30:00',
    '2024-01-01 01:00:00',  # 30-min gap
    '2024-01-01 01:15:00',
    '2024-01-01 02:00:00',  # 45-min gap
    '2024-01-01 02:15:00',
])

temperature = [20.1, 20.3, np.nan, 21.5, np.nan, 22.0, 22.1]

df = pd.DataFrame({
    'timestamp': timestamps,
    'temperature': temperature
})

# Analyze gaps
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
print(f"Gaps: {stats['temperature']['n_gaps']}")
print(f"Mean gap duration: {stats['temperature']['mean_gap_duration']/60:.1f} minutes")

# Impute with cubic spline
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)
print(df_imputed)

Example 2: Multiple Sensors

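import pandas as pd
import numpy as np
from time_aware_imputer import SplineImputer, GapAnalyzer

# Multiple sensors on a shared 10-minute grid (independent synthetic data)
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=100, freq='10min'),
    'temperature': np.random.randn(100) * 5 + 20,
    'humidity': np.random.randn(100) * 10 + 60,
    'pressure': np.random.randn(100) * 2 + 1013
})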

# Introduce gaps
df.loc[20:25, 'temperature'] = np.nan
df.loc[40:43, 'humidity'] = np.nan
df.loc[70:72, 'pressure'] = np.nan

# Analyze all columns
analyzer = GapAnalyzer()
stats = analyzer.analyze(df)
summary = analyzer.get_summary()
print(summary)

# Impute all columns
imputer = SplineImputer(method='cubic')
df_imputed = imputer.fit_transform(df)

Example 3: Integration with Sklearn Pipeline

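from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from time_aware_imputer import SplineImputer

# Create pipeline: impute first, then drop the timestamp column before scaling.
# Assumption: SplineImputer returns a DataFrame that still contains 'timestamp'.
pipeline = Pipeline([
    ('imputer', SplineImputer(method='cubic')),
    ('drop_time', FunctionTransformer(lambda X: X.drop(columns=['timestamp']))),
    ('scaler', StandardScaler()),  # scales the remaining numeric columns
])

# Use in ML workflow (df is a DataFrame like the one in Example 2)
X_scaled = pipeline.fit_transform(df)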

API Reference

TimeAwareImputer

Methods:

  • fit(X, y=None): Fit the imputer on training data
  • transform(X): Transform data by imputing missing values
  • fit_transform(X, y=None): Fit and transform in one step
  • get_imputed_mask(): Get boolean mask of imputed values
  • get_feature_names_out(): Get output feature names

Attributes:

  • is_fitted_: Whether the imputer has been fitted
  • feature_names_in_: Names of features seen during fit
  • n_features_in_: Number of features seen during fit
  • imputed_mask_: Boolean mask of imputed values
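
The idea behind an imputed-value mask can be shown without the library: record which cells are missing before filling them, then use that record to select the filled values afterwards. The sketch below uses pandas' built-in time-weighted interpolation as a stand-in for the imputer; it does not reproduce how `imputed_mask_` is built internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="h"),
    "temperature": [20.0, np.nan, 22.0, np.nan, 24.0],
})

# Record which cells are missing before filling them.
mask = df[["temperature"]].isna()

# Stand-in for the imputer: time-weighted linear interpolation from pandas.
filled = df.set_index("timestamp")["temperature"].interpolate(method="time")
df_imputed = df.assign(temperature=filled.to_numpy())

# The mask selects exactly the values that were filled in.
print(df_imputed.loc[mask["temperature"], "temperature"].tolist())  # [21.0, 23.0]
```

Keeping the mask alongside the imputed data lets downstream code down-weight or exclude imputed points, for example when computing evaluation metrics.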

SplineImputer

Inherits all methods and attributes from TimeAwareImputer.

Additional Attributes:

  • interpolators_: Dictionary of fitted interpolator objects per column

GapAnalyzer

Methods:

  • analyze(data): Analyze gaps and return statistics
  • get_summary(): Get summary DataFrame of gap statistics
  • plot_gaps(data, column=None): Visualize gaps in time series
  • plot_missing_heatmap(data): Create heatmap of missing patterns

Attributes:

  • gap_stats_: Dictionary of gap statistics per column

Development

Setup Development Environment

# Clone repository
git clone https://github.com/ontedduabhishakereddy/time-aware-imputer.git
cd time-aware-imputer

# Install with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=time_aware_imputer --cov-report=html

# Run specific test file
pytest tests/test_spline.py

# Run specific test
pytest tests/test_spline.py::TestSplineImputer::test_fit_transform_linear

Code Quality

# Format code with black
black time_aware_imputer tests

# Sort imports
isort time_aware_imputer tests

# Type checking with mypy
mypy time_aware_imputer

# Linting with flake8
flake8 time_aware_imputer tests

Running All Quality Checks

# Format
black time_aware_imputer tests
isort time_aware_imputer tests

# Check
mypy time_aware_imputer
flake8 time_aware_imputer tests

# Test
pytest

Requirements

  • Python >= 3.8
  • numpy >= 1.21.0
  • pandas >= 1.3.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • matplotlib >= 3.4.0

Use Cases

IoT & Industrial Monitoring

  • Sensor networks with irregular data collection
  • Network failures causing data gaps
  • Equipment downtime

Medical Devices

  • Continuous glucose monitoring
  • Heart rate monitors
  • Patient activity trackers

Financial Markets

  • High-frequency trading data
  • Tick data with irregular timestamps
  • Market data feed interruptions

Environmental Monitoring

  • Weather stations
  • Air quality sensors
  • Hydrological measurements

Roadmap

Future enhancements planned:

  • Gaussian Process imputation with uncertainty quantification
  • Adaptive imputation strategy selection
  • Multivariate imputation using correlations
  • Seasonal pattern detection and imputation
  • Real-time streaming imputation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{time_aware_imputer,
  title = {Time-Aware Missing Data Imputer},
  author = {Abhishake Reddy O},
  year = {2026},
  url = {https://github.com/ontedduabhishakereddy/time-aware-imputer}
}

Acknowledgments

  • Inspired by the gap between simple imputation (scikit-learn) and complex deep learning approaches
  • Built on top of NumPy, SciPy, and scikit-learn
  • Designed for real-world time-series challenges

Contact
