A framework to analyze imperfections (missingness, noise) in time-series datasets.
Imperfekt - Understanding Data Imperfections in Time-Series
A comprehensive analysis toolkit for studying "imperfect" data patterns in time-series datasets. Imperfection refers to missingness, noise, and other data quality issues that can be indicated using a binary mask.
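To make the idea of a binary mask concrete, here is a minimal sketch in plain Python. It does not use or reproduce imperfekt's internals. The helper names (`is_missing`, `missing_mask`, `out_of_range_mask`) and the plausibility range are hypothetical and chosen only for illustration.

```python
import math


def is_missing(v):
    """True if a value is None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def missing_mask(values):
    """Return 1 where a value is missing, else 0."""
    return [1 if is_missing(v) else 0 for v in values]


def out_of_range_mask(values, low, high):
    """Return 1 where an observed value lies outside [low, high], else 0.

    Missing values are not flagged here; they belong to the missingness mask.
    """
    return [0 if is_missing(v) or low <= v <= high else 1 for v in values]


# Hypothetical heart-rate readings with two gaps and one implausible value.
heart_rate = [72.0, None, 75.0, 300.0, float("nan"), 71.0]

missing = missing_mask(heart_rate)
noisy = out_of_range_mask(heart_rate, low=20.0, high=250.0)
# Either kind of imperfection can be combined into a single mask.
imperfect = [m | n for m, n in zip(missing, noisy)]

print(missing)    # [0, 1, 0, 0, 1, 0]
print(noisy)      # [0, 0, 0, 1, 0, 0]
print(imperfect)  # [0, 1, 0, 1, 1, 0]
```

Any data quality issue that can be reduced to such a 0/1 sequence per variable can, in principle, be studied with the same tools.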
Overview
This library provides tools to analyze data quality issues in time-series data, including:
- Intravariable analysis of imperfection patterns for individual variables
- Intervariable analysis of co-occurring imperfections across multiple variables
- Feature generation based on missingness patterns for downstream ML tasks
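The difference between the first two kinds of analysis can be sketched with two small functions over binary masks. This is an illustrative example in plain Python, not the library's implementation. `gap_lengths` and `co_occurrence_rate` are hypothetical names, and the statistics imperfekt actually reports may differ.

```python
from itertools import groupby


def gap_lengths(mask):
    """Intravariable view: lengths of consecutive runs of 1s in one mask."""
    return [sum(1 for _ in run) for flag, run in groupby(mask) if flag == 1]


def co_occurrence_rate(mask_a, mask_b):
    """Intervariable view: fraction of time steps where both masks are 1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    if not mask_a:
        return 0.0
    both = sum(a & b for a, b in zip(mask_a, mask_b))
    return both / len(mask_a)


# Hypothetical masks for two variables of the same time series.
var1_mask = [0, 1, 1, 0, 0, 1, 0, 0]
var2_mask = [0, 1, 0, 0, 0, 1, 1, 0]

print(gap_lengths(var1_mask))                    # [2, 1]
print(gap_lengths(var2_mask))                    # [1, 2]
print(co_occurrence_rate(var1_mask, var2_mask))  # 0.25
```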
Installation
Install the library using pip:
pip install imperfekt
Note: If you export Plotly figures as static images (save_results=True), some environments may raise a plotly_get_chrome/Kaleido error at runtime. This happens because Kaleido needs a Chrome/Chromium binary. Install Chrome manually, or run:
plotly_get_chrome
Quick Start
import polars as pl
from imperfekt import Imperfekt, FeatureGenerator
# Load your time-series data
df = pl.read_parquet("your_data.parquet")
# Configure the analyzer
analyzer = Imperfekt(
    df=df,
    id_col="id",  # Unique identifier column
    clock_col="clock",  # Timestamp column
    cols=["var1", "var2"],  # Variables to analyze
    save_path="./results",
)
# Simple intravariable missingness stats
analyzer.intravariable.column_statistics(save_results=True)
print(analyzer.intravariable.results.cs_overall_statistics)
print(analyzer.intravariable.results.cs_case_level_statistics)
# Run full imperfection analysis (preliminary correlations, intra- and intervariable analyses)
results = analyzer.run()
# Or generate missingness-aware features for ML
fg = FeatureGenerator(
    df=df,
    id_col="id",
    clock_col="clock",
    variable_cols=["var1", "var2"],
)
features_df = fg.add_binary_masks().add_temporal_features().df
# Or restrict individual steps to a subset of variables
features_df = (
    fg.add_binary_masks(cols=["var1"])
    .add_temporal_features(cols=["var1"])
    .add_window_features(rolling_window_sizes=[2], ewma_alphas=[0.3], cols=["var1", "var2"])
    .df
)
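The README does not spell out what the temporal and window features contain. As a rough intuition only, features of this kind are often derived from the binary mask along the following lines. The function names below are hypothetical, and this plain-Python sketch is an assumption about the general technique, not a description of what `add_temporal_features` or `add_window_features` compute.

```python
def steps_since_last_observed(mask):
    """Steps elapsed since the last observed value (mask == 0).

    Positions before the first observed value get None.
    """
    out, last_seen = [], None
    for i, flag in enumerate(mask):
        if flag == 0:
            last_seen = i
        out.append(None if last_seen is None else i - last_seen)
    return out


def rolling_mean(mask, window):
    """Trailing mean of the mask over up to `window` steps (shorter at the start)."""
    out = []
    for i in range(len(mask)):
        chunk = mask[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def ewma(mask, alpha):
    """Exponentially weighted average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out, state = [], None
    for x in mask:
        state = float(x) if state is None else alpha * x + (1 - alpha) * state
        out.append(state)
    return out


mask = [0, 1, 1, 0, 1]  # 1 marks a missing observation

print(steps_since_last_observed(mask))  # [0, 1, 2, 0, 1]
print(rolling_mean(mask, window=2))     # [0.0, 0.5, 1.0, 0.5, 0.5]
print(ewma(mask, alpha=0.5))            # [0.0, 0.5, 0.75, 0.375, 0.6875]
```

In this sketch, a larger `alpha` makes the weighted average react faster to recent gaps, while a larger rolling window smooths the local missingness rate over more steps.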
Library Structure
imperfekt/
├── analysis/
│ ├── preliminary/ # Basic data exploration
│ ├── intravariable/ # Single variable analysis
│ ├── intervariable/ # Multi-variable patterns
│ └── utils/ # Shared utilities
├── features/ # Feature engineering
│ ├── core.py # FeatureGenerator class
│ ├── temporal.py # Time-based features
│ └── interaction.py # Variable interactions
└── config/ # Default settings
Data Format
The library expects time-series data with the following structure:
| Column | Description |
|---|---|
| id | Unique identifier for each time-series (e.g., patient, sensor) |
| clock | Timestamp for each observation |
| var1, var2, ... | Variables to analyze |
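As an illustration of this layout, the sketch below builds a few rows in long format, one row per series and timestamp, with None marking a missing value. The sample values and the `missing_columns` helper are hypothetical. The check shown is a simple sanity test and not something the library is known to run.

```python
from datetime import datetime

# One row per (id, clock) pair; None marks a missing measurement.
rows = [
    {"id": "sensor_a", "clock": datetime(2024, 1, 1, 0, 0), "var1": 1.2, "var2": None},
    {"id": "sensor_a", "clock": datetime(2024, 1, 1, 0, 5), "var1": None, "var2": 0.7},
    {"id": "sensor_b", "clock": datetime(2024, 1, 1, 0, 0), "var1": 3.4, "var2": 0.9},
]


def missing_columns(rows, required):
    """Return the sorted required column names absent from at least one row."""
    absent = set()
    for row in rows:
        absent |= set(required) - set(row)
    return sorted(absent)


required = ["id", "clock", "var1", "var2"]
print(missing_columns(rows, required))  # []
```

Rows shaped like this can be turned into a polars DataFrame with `pl.DataFrame(rows)` before being passed to the analyzer.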
Key Dependencies
- polars: High-performance data processing
- plotly: Interactive visualizations
- scipy: Statistical computations
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file imperfekt-0.3.0.tar.gz.
File metadata
- Download URL: imperfekt-0.3.0.tar.gz
- Upload date:
- Size: 327.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aefadfc385c845ae7f8c1018f9bb3776c0b9428ff629440dfc5ecf5d68e9a5cc |
| MD5 | 42db480f8601f0aee5fece95031ec75e |
| BLAKE2b-256 | f6871c2feba9e945c640dde88d7a56190213227c695eb39e3a5c79a5d2ee32ca |
File details
Details for the file imperfekt-0.3.0-py3-none-any.whl.
File metadata
- Download URL: imperfekt-0.3.0-py3-none-any.whl
- Upload date:
- Size: 352.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1dd6ba101d9176082794dca732b4cc6e3eef05bf4a801ba0f9f41ddea13351b2 |
| MD5 | 1f26b1064ce31183d9daee7175652f77 |
| BLAKE2b-256 | 689bf1d0051cf41dd86b15cccec0835f32639ab5e1813bb4ce4391ecea6ee675 |