
A framework to analyze imperfections (missingness, noise) in time-series datasets.

Imperfekt - Understanding Data Imperfections in Time-Series

License: MIT | Python 3.10+

A comprehensive analysis toolkit for studying "imperfect" data patterns in time-series datasets. Imperfection refers to missingness, noise, and other data quality issues that can be indicated using a binary mask.

Overview

This library provides tools to analyze data quality issues in time-series data, including:

  • Intravariable analysis of imperfection patterns for individual variables
  • Intervariable analysis of co-occurring imperfections across multiple variables
  • Feature generation based on missingness patterns for downstream ML tasks

Installation

Install the library using pip:

pip install imperfekt

Quick Start

import polars as pl
from imperfekt import Imperfekt, FeatureGenerator

# Load your time-series data
df = pl.read_parquet("your_data.parquet")

# Configure the analyzer
analyzer = Imperfekt(
    df=df,
    id_col="id",           # Unique identifier column
    clock_col="clock",     # Timestamp column
    cols=["var1", "var2"], # Variables to analyze
    save_path="./results"
)

# Simple intravariable missingness stats
analyzer.intravariable.column_statistics(save_results=True)
print(analyzer.intravariable.results.cs_overall_statistics)
print(analyzer.intravariable.results.cs_case_level_statistics)

# Run full imperfection analysis (preliminary correlations, intra- and intervariable analyses)
results = analyzer.run()

# Or generate missingness-aware features for ML
fg = FeatureGenerator(
    df=df,
    id_col="id",
    clock_col="clock",
    variable_cols=["var1", "var2"]
)
features_df = fg.add_binary_masks().add_temporal_features().df

Library Structure

imperfekt/
├── analysis/
│   ├── preliminary/     # Basic data exploration
│   ├── intravariable/   # Single variable analysis
│   ├── intervariable/   # Multi-variable patterns
│   └── utils/           # Shared utilities
├── features/            # Feature engineering
│   ├── core.py          # FeatureGenerator class
│   ├── temporal.py      # Time-based features
│   └── interaction.py   # Variable interactions
└── config/              # Default settings

Data Format

The library expects time-series data with the following structure:

Column           Description
id               Unique identifier for each time-series (e.g., patient, sensor)
clock            Timestamp for each observation
var1, var2, ...  Variables to analyze

Key Dependencies

  • polars: High-performance data processing
  • plotly: Interactive visualizations
  • scipy: Statistical computations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
