A framework to analyze imperfections (missingness, noise) in time-series datasets.

Imperfekt - Understanding Data Imperfections in Time-Series

License: MIT | Python 3.10+

A comprehensive analysis toolkit for studying "imperfect" data patterns in time-series datasets. Imperfection refers to missingness, noise, and other data quality issues that can be indicated using a binary mask.

Overview

This library provides tools to analyze data quality issues in time-series data, including:

  • Intravariable analysis of imperfection patterns for individual variables
  • Intervariable analysis of co-occurring imperfections across multiple variables
  • Feature generation based on missingness patterns for downstream ML tasks

Installation

Install the library using pip:

pip install imperfekt

Note: If you export Plotly figures as static images (save_results=True), some environments may raise a plotly_get_chrome/Kaleido error at runtime. This happens because Kaleido needs a Chrome/Chromium binary. Install Chrome manually, or run:

plotly_get_chrome

Quick Start

import polars as pl
from imperfekt import Imperfekt, FeatureGenerator

# Load your time-series data
df = pl.read_parquet("your_data.parquet")

# Configure the analyzer
analyzer = Imperfekt(
    df=df,
    id_col="id",           # Unique identifier column
    clock_col="clock",     # Timestamp column
    cols=["var1", "var2"], # Variables to analyze
    save_path="./results"
)

# Simple intravariable missingness stats
analyzer.intravariable.column_statistics(save_results=True)
print(analyzer.intravariable.results.cs_overall_statistics)
print(analyzer.intravariable.results.cs_case_level_statistics)

# Run full imperfection analysis (preliminary correlations, intra- and intervariable analyses)
results = analyzer.run()

# Or generate missingness-aware features for ML
fg = FeatureGenerator(
    df=df,
    id_col="id",
    clock_col="clock",
    variable_cols=["var1", "var2"]
)
features_df = fg.add_binary_masks().add_temporal_features().df

# Or restrict individual steps to a subset of variables
features_df = (
    fg.add_binary_masks(cols=["var1"])
    .add_temporal_features(cols=["var1"])
    .add_window_features(rolling_window_sizes=[2], ewma_alphas=[0.3], cols=["var1", "var2"])
    .df
)

Library Structure

imperfekt/
├── analysis/
│   ├── preliminary/     # Basic data exploration
│   ├── intravariable/   # Single-variable analysis
│   ├── intervariable/   # Multi-variable patterns
│   └── utils/           # Shared utilities
├── features/            # Feature engineering
│   ├── core.py          # FeatureGenerator class
│   ├── temporal.py      # Time-based features
│   └── interaction.py   # Variable interactions
└── config/              # Default settings

Data Format

The library expects time-series data with the following structure:

Column            Description
id                Unique identifier for each time-series (e.g., patient, sensor)
clock             Timestamp for each observation
var1, var2, ...   Variables to analyze

Key Dependencies

  • polars: High-performance data processing
  • plotly: Interactive visualizations
  • scipy: Statistical computations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
