A framework to analyze imperfections (missingness, noise) in time-series datasets.

Imperfekt - Understanding Data Imperfections in Time-Series

License: MIT | Python 3.10+

A comprehensive analysis toolkit for studying "imperfect" data patterns in time-series datasets. Here, imperfection means missingness, noise, or any other data quality issue that can be flagged with a binary mask.

Overview

This library provides tools to analyze data quality issues in time-series data, including:

  • Intravariable analysis of imperfection patterns for individual variables
  • Intervariable analysis of co-occurring imperfections across multiple parameters
  • Feature generation based on missingness patterns for downstream ML tasks

Installation

Install the library using pip:

pip install imperfekt
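One way to confirm that the install succeeded is to ask the standard library for the installed version. The helper name `installed_version` is an illustrative choice and not part of imperfekt.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("imperfekt"))
```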

Quick Start

import polars as pl
from imperfekt import Imperfekt, FeatureGenerator

# Load your time-series data
df = pl.read_parquet("your_data.parquet")

# Configure the analyzer
analyzer = Imperfekt(
    df=df,
    id_col="id",           # Unique identifier column
    clock_col="clock",     # Timestamp column
    cols=["var1", "var2"], # Variables to analyze
    save_path="./results"
)

# Simple intravariable missingness stats
analyzer.intravariable.column_statistics(save_results=True)
print(analyzer.intravariable.results.cs_overall_statistics)
print(analyzer.intravariable.results.cs_case_level_statistics)

# Run full imperfection analysis (preliminary correlations, intra- and intervariable analyses)
results = analyzer.run()

# Or generate missingness-aware features for ML
fg = FeatureGenerator(
    df=df,
    id_col="id",
    clock_col="clock",
    variable_cols=["var1", "var2"]
)
features_df = fg.add_binary_masks().add_temporal_features().df

Library Structure

imperfekt/
├── analysis/
│   ├── preliminary/     # Basic data exploration
│   ├── intravariable/   # Single-variable analysis
│   ├── intervariable/   # Multi-variable patterns
│   └── utils/           # Shared utilities
├── features/            # Feature engineering
│   ├── core.py          # FeatureGenerator class
│   ├── temporal.py      # Time-based features
│   └── interaction.py   # Variable interactions
└── config/              # Default settings

Data Format

The library expects time-series data with the following structure:

Column            Description
id                Unique identifier for each time series (e.g., patient, sensor)
clock             Timestamp for each observation
var1, var2, ...   Variables to analyze

Key Dependencies

  • polars: High-performance data processing
  • plotly: Interactive visualizations
  • scipy: Statistical computations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
