A framework to analyze imperfections (missingness, noise) in time-series datasets.

Imperfekt - Understanding Data Imperfections in Time-Series

License: MIT | Python 3.10+

A comprehensive analysis toolkit for studying "imperfect" data patterns in time-series datasets. Imperfection refers to missingness, noise, and other data quality issues that can be indicated using a binary mask.
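To make the idea of a binary mask concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper names `missing_mask` and `noise_mask`, and the plausibility-range rule used to flag noise, are assumptions made purely for illustration.

```python
from typing import Optional, Sequence


def missing_mask(values: Sequence[Optional[float]]) -> list[int]:
    """Return 1 where a value is missing (None), else 0."""
    return [1 if v is None else 0 for v in values]


def noise_mask(values: Sequence[Optional[float]], low: float, high: float) -> list[int]:
    """Return 1 where an observed value lies outside [low, high], else 0.

    The range check is a hypothetical noise rule; missing values are not flagged here.
    """
    return [1 if v is not None and not (low <= v <= high) else 0 for v in values]


# Hypothetical heart-rate readings with two gaps and one implausible spike
heart_rate = [72.0, None, 75.0, 400.0, None, 71.0]

print(missing_mask(heart_rate))                      # [0, 1, 0, 0, 1, 0]
print(noise_mask(heart_rate, low=20.0, high=250.0))  # [0, 0, 0, 1, 0, 0]
```

Both kinds of imperfection end up in the same 0/1 representation, which is what allows them to be analyzed with the same tools.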

Overview

This library provides tools to analyze data quality issues in time-series data, including:

  • Intravariable analysis of imperfection patterns for individual variables
  • Intervariable analysis of co-occurring imperfections across multiple parameters
  • Feature generation based on missingness patterns for downstream ML tasks
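The difference between intravariable and intervariable analysis can be sketched on two binary masks. The functions below (`missing_rate`, `gap_lengths`, `co_occurrence`) are hypothetical helpers written for this example and are not part of the imperfekt API; the statistics the library actually reports may differ.

```python
from itertools import groupby


def missing_rate(mask: list[int]) -> float:
    """Intravariable: share of time steps flagged as imperfect."""
    return sum(mask) / len(mask)


def gap_lengths(mask: list[int]) -> list[int]:
    """Intravariable: lengths of consecutive runs of flagged time steps."""
    return [len(list(run)) for flag, run in groupby(mask) if flag == 1]


def co_occurrence(mask_a: list[int], mask_b: list[int]) -> float:
    """Intervariable: share of time steps at which both variables are flagged."""
    both = sum(a & b for a, b in zip(mask_a, mask_b))
    return both / len(mask_a)


# Hypothetical masks for two variables observed at the same six time steps
var1 = [0, 1, 1, 0, 0, 1]
var2 = [0, 1, 0, 0, 1, 1]

print(missing_rate(var1))         # 0.5
print(gap_lengths(var1))          # [2, 1]
print(co_occurrence(var1, var2))  # both flagged at 2 of 6 time steps
```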

Installation

Install the library using pip:

pip install imperfekt

Note: If you export Plotly figures as static images (save_results=True), some environments may raise a plotly_get_chrome/Kaleido error at runtime. This happens because Kaleido needs a Chrome/Chromium binary. Install Chrome manually, or run:

plotly_get_chrome

Quick Start

import polars as pl
from imperfekt import Imperfekt, FeatureGenerator

# Load your time-series data
df = pl.read_parquet("your_data.parquet")

# Configure the analyzer
analyzer = Imperfekt(
    df=df,
    id_col="id",           # Unique identifier column
    clock_col="clock",     # Timestamp column
    cols=["var1", "var2"], # Variables to analyze
    save_path="./results"
)

# Simple intravariable missingness stats
analyzer.intravariable.column_statistics(save_results=True)
print(analyzer.intravariable.results.cs_overall_statistics)
print(analyzer.intravariable.results.cs_case_level_statistics)

# Run full imperfection analysis (preliminary correlations, intra- and intervariable analyses)
results = analyzer.run()

# Or generate missingness-aware features for ML
fg = FeatureGenerator(
    df=df,
    id_col="id",
    clock_col="clock",
    variable_cols=["var1", "var2"]
)
features_df = fg.add_binary_masks().add_temporal_features().df
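The exact columns produced by `add_binary_masks()` and `add_temporal_features()` are not listed here. As an illustration of what a missingness-aware temporal feature can look like, the sketch below computes the time elapsed since a variable was last observed. The function `seconds_since_last_observed` is a hypothetical helper in plain Python, not the library's implementation.

```python
from datetime import datetime, timedelta
from typing import Optional


def seconds_since_last_observed(
    clock: list[datetime], values: list[Optional[float]]
) -> list[Optional[float]]:
    """For each time step, return the seconds since the last non-missing value.

    Returns None for time steps before the first observation.
    """
    last_seen: Optional[datetime] = None
    out: list[Optional[float]] = []
    for t, v in zip(clock, values):
        if v is not None:
            last_seen = t  # an observation resets the timer
        out.append(None if last_seen is None else (t - last_seen).total_seconds())
    return out


# Hypothetical series sampled every 5 minutes
start = datetime(2024, 1, 1)
clock = [start + timedelta(minutes=5 * i) for i in range(5)]
values = [1.0, None, None, 2.0, None]

print(seconds_since_last_observed(clock, values))
# [0.0, 300.0, 600.0, 0.0, 300.0]
```

A feature like this lets a downstream model distinguish a fresh measurement from one that would have to be carried forward over a long gap.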

Library Structure

imperfekt/
├── analysis/
│   ├── preliminary/     # Basic data exploration
│   ├── intravariable/   # Single-variable analysis
│   ├── intervariable/   # Multi-variable patterns
│   └── utils/           # Shared utilities
├── features/            # Feature engineering
│   ├── core.py          # FeatureGenerator class
│   ├── temporal.py      # Time-based features
│   └── interaction.py   # Variable interactions
└── config/              # Default settings

Data Format

The library expects time-series data with the following structure:

| Column | Description |
|---|---|
| id | Unique identifier for each time-series (e.g., patient, sensor) |
| clock | Timestamp for each observation |
| var1, var2, ... | Variables to analyze |

Key Dependencies

  • polars: High-performance data processing
  • plotly: Interactive visualizations
  • scipy: Statistical computations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
