A framework to analyze imperfections (missingness, noise) in time-series datasets.

Imperfekt - Understanding Data Imperfections in Time-Series

License: MIT | Python 3.10+

Imperfekt is an analysis toolkit for studying "imperfect" data patterns in time-series datasets. Here, imperfection means missingness, noise, or any other data quality issue that can be marked with a binary mask.
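
To make the binary-mask idea concrete, here is a minimal, dependency-free sketch that flags missing values in one series. The helper name `imperfection_mask` and the sample data are illustrative assumptions, not part of the imperfekt API, and the convention that 1 marks an imperfect point is also an assumption.

```python
import math


def imperfection_mask(values):
    """Return 1 where a value is missing (None or NaN), else 0.

    Hypothetical helper for illustration; imperfekt may build its masks differently.
    """
    return [
        1 if v is None or (isinstance(v, float) and math.isnan(v)) else 0
        for v in values
    ]


# Made-up readings with two missing points
heart_rate = [72.0, None, 75.0, float("nan"), 71.0]
mask = imperfection_mask(heart_rate)
print(mask)  # [0, 1, 0, 1, 0]
```

The same mask shape works for noise or any other issue, as long as each observation can be labeled imperfect or not.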

Overview

This library provides tools to analyze data quality issues in time-series data, including:

  • Intravariable analysis of imperfection patterns for individual variables
  • Intervariable analysis of co-occurring imperfections across multiple variables
  • Feature generation based on missingness patterns for downstream ML tasks
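
As a rough illustration of the first two kinds of analysis, the sketch below computes gap lengths for a single mask (an intravariable view) and the co-occurrence rate of two masks (an intervariable view). The function names and the choice of statistics are assumptions made for this example; the library's own analyses may use different measures.

```python
from itertools import groupby


def gap_lengths(mask):
    """Intravariable view: lengths of consecutive runs of imperfect points (1s)."""
    return [sum(1 for _ in run) for flag, run in groupby(mask) if flag == 1]


def co_occurrence_rate(mask_a, mask_b):
    """Intervariable view: share of time steps where both variables are imperfect."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    if not mask_a:
        return 0.0
    both = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    return both / len(mask_a)


# Made-up masks for two variables observed at the same six time steps
var1_mask = [0, 1, 1, 0, 1, 0]
var2_mask = [0, 1, 0, 0, 1, 1]

print(gap_lengths(var1_mask))  # [2, 1]
print(round(co_occurrence_rate(var1_mask, var2_mask), 3))  # 0.333
```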

Installation

Install the library using pip:

pip install imperfekt

For additional features:

# Quote the extras so shells such as zsh do not expand the brackets

# With visualization export support
pip install "imperfekt[viz]"

# With statistical tests
pip install "imperfekt[stats]"

# Full installation
pip install "imperfekt[full]"

Quick Start

import polars as pl
from imperfekt import Imperfekt, FeatureGenerator

# Load your time-series data
df = pl.read_parquet("your_data.parquet")

# Run full imperfection analysis
analyzer = Imperfekt(
    df=df,
    id_col="id",           # Unique identifier column
    clock_col="clock",     # Timestamp column
    cols=["var1", "var2"], # Variables to analyze
    save_path="./results"
)
results = analyzer.run()

# Or generate missingness-aware features for ML
fg = FeatureGenerator(
    df=df,
    id_col="id",
    clock_col="clock",
    variable_cols=["var1", "var2"]
)
features_df = fg.add_binary_masks().add_temporal_features().df
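
This README does not describe what `add_temporal_features()` computes. As a hedged illustration of what a missingness-aware temporal feature can look like, the sketch below derives "seconds since the last observed value" for one variable of one time series. The function `time_since_last_observed` is hypothetical and is not taken from imperfekt.

```python
from datetime import datetime, timedelta


def time_since_last_observed(clocks, values):
    """Seconds since the most recent non-missing value, or None if none seen yet.

    Hypothetical feature for illustration; not the library's implementation.
    """
    out = []
    last_seen = None
    for t, v in zip(clocks, values):
        if v is not None:
            last_seen = t
        out.append(None if last_seen is None else (t - last_seen).total_seconds())
    return out


# One made-up series sampled every five minutes
start = datetime(2024, 1, 1, 8, 0)
clocks = [start + timedelta(minutes=5 * i) for i in range(5)]
var1 = [None, 1.2, None, None, 1.5]

feature = time_since_last_observed(clocks, var1)
print(feature)  # [None, 0.0, 300.0, 600.0, 0.0]
```

A feature like this lets a downstream model tell a fresh value from one that has not been updated for a while.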

Library Structure

imperfekt/
├── analysis/
│   ├── preliminary/     # Basic data exploration
│   ├── intravariable/   # Single variable analysis
│   ├── intervariable/   # Multi-variable patterns
│   └── utils/           # Shared utilities
├── features/            # Feature engineering
│   ├── core.py          # FeatureGenerator class
│   ├── temporal.py      # Time-based features
│   └── interaction.py   # Variable interactions
└── config/              # Default settings

Data Format

The library expects time-series data with the following structure:

Column             Description
id                 Unique identifier for each time-series (e.g., patient, sensor)
clock              Timestamp for each observation
var1, var2, ...    Variables to analyze
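
The layout above is a long-format table with one row per (id, clock) observation. The sketch below uses plain Python dictionaries to stand in for a DataFrame and checks that each row carries the expected columns. The helper `check_rows`, the sample values, and the rule that `id` and `clock` must not be null are assumptions for illustration, not documented requirements of the library.

```python
from datetime import datetime


def check_rows(rows, id_col="id", clock_col="clock", variable_cols=("var1", "var2")):
    """Return a list of problems found; an empty list means the layout matches.

    Hypothetical validator; imperfekt may perform different checks or none.
    """
    problems = []
    required = {id_col, clock_col, *variable_cols}
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        elif row[id_col] is None or row[clock_col] is None:
            # Assumed rule: identifier and timestamp are always present
            problems.append(f"row {i}: {id_col} and {clock_col} must not be null")
    return problems


# Made-up data: two sensors, with None marking a missing measurement
rows = [
    {"id": "sensor_a", "clock": datetime(2024, 1, 1, 8, 0), "var1": 0.4, "var2": None},
    {"id": "sensor_a", "clock": datetime(2024, 1, 1, 8, 5), "var1": None, "var2": 3.1},
    {"id": "sensor_b", "clock": datetime(2024, 1, 1, 8, 0), "var1": 0.7, "var2": 2.9},
]

print(check_rows(rows))  # []
```

Missing measurements stay in the table as nulls in the variable columns, so the rows themselves remain available for building masks.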

Key Dependencies

  • polars: High-performance data processing
  • plotly: Interactive visualizations
  • scipy: Statistical computations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
