A Python toolkit for introducing missing values into datasets

Welcome to PyGrinder

a Python toolkit for grinding data beans into the incomplete

PyGrinder is a part of PyPOTS (a Python toolbox for data mining on Partially-Observed Time Series). Formerly called PyCorruptor, it was separated from PyPOTS to decouple the missingness-creating functionality from the learning algorithms.

In data analysis and modeling, we sometimes need to corrupt the original data on purpose, for instance to evaluate a model's ability to reconstruct corrupted data or to assess its performance on partially-observed data. PyGrinder is a tool for exactly this: it provides several missingness patterns for creating missing values in the given data.
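To make the evaluation use case concrete, here is a minimal NumPy-only sketch of the workflow: hide some known values, reconstruct them, and score the reconstruction only on the hidden positions. It does not call PyGrinder. The random mask is a hypothetical stand-in for a corruption tool, and the per-feature mean imputer is a deliberately naive placeholder for a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 128 samples, 10 time steps, 36 features (fully observed).
X_full = rng.standard_normal((128, 10, 36))

# Hypothetical stand-in for a corruption tool: hide about 10% of values at random.
holdout_mask = rng.random(X_full.shape) < 0.1
X_corrupted = X_full.copy()
X_corrupted[holdout_mask] = np.nan

# Naive placeholder "model": fill each feature with its mean over observed values.
feature_means = np.nanmean(X_corrupted, axis=(0, 1))  # shape (36,)
X_imputed = np.where(np.isnan(X_corrupted), feature_means, X_corrupted)

# Score the reconstruction only on the values that were hidden on purpose.
mae = np.abs(X_imputed[holdout_mask] - X_full[holdout_mask]).mean()
print(f"MAE on held-out values: {mae:.3f}")
```

Because the ground truth for the hidden entries is known, the error is measured where the model had to guess, which is what artificially introduced missingness makes possible.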

❖ Usage Examples

PyGrinder is now available on conda-forge ❗️

Install it with conda install pygrinder; you may need to specify the channel with the option -c conda-forge,

or install via PyPI:

pip install pygrinder

or install from source code:

pip install https://github.com/WenjieDu/PyGrinder/archive/main.zip
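After installing, one quick way to confirm that the package is visible to the current interpreter is to query its distribution metadata. The sketch below uses only the standard library. The helper name installed_version is hypothetical and is not part of PyGrinder.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print("pygrinder version:", installed_version("pygrinder"))
```

If this prints None, the package was likely installed into a different environment than the one running the script.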

import numpy as np

from pygrinder import (
    mcar,
    mar_logistic,
    mnar_x,
    mnar_t,
    mnar_nonuniform,
    rdo,
    seq_missing,
    block_missing,
    calc_missing_rate
)

# given a time-series dataset with 128 samples, each sample with 10 time steps and 36 data features
ts_dataset = np.random.randn(128, 10, 36)

# grind the dataset with MCAR pattern and 10% missing probability
X_with_mcar_data = mcar(ts_dataset, p=0.1)

# grind the dataset with MAR pattern
X_with_mar_data = mar_logistic(ts_dataset[:, 0, :], obs_rate=0.1, missing_rate=0.1)

# grind the dataset with MNAR pattern
X_with_mnar_x_data = mnar_x(ts_dataset, offset=0.1)
X_with_mnar_t_data = mnar_t(ts_dataset, cycle=20, pos=10, scale=3)
X_with_mnar_nonuniform_data = mnar_nonuniform(ts_dataset, p=0.5, increase_factor=0.5)

# grind the dataset with RDO pattern
X_with_rdo_data = rdo(ts_dataset, p=0.1)

# grind the dataset with Sequence-Missing pattern
X_with_seq_missing_data = seq_missing(ts_dataset, p=0.1, seq_len=5)

# grind the dataset with Block-Missing pattern
X_with_block_missing_data = block_missing(ts_dataset, factor=0.1, block_width=3, block_len=3)

# calculate the missing rate of the dataset
missing_rate = calc_missing_rate(X_with_mcar_data)
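For readers who want intuition for what the MCAR (missing completely at random) pattern means, the following NumPy-only sketch masks each value independently with probability p and then measures the resulting missing rate. It is an illustration under stated assumptions, not PyGrinder's implementation: the function names mcar_sketch and missing_rate_sketch are hypothetical, and the sketch assumes missing values are marked with NaN.

```python
import numpy as np


def mcar_sketch(X, p, rng=None):
    """Illustrative MCAR: drop each value independently with probability p.

    Hypothetical helper, not the pygrinder.mcar implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    mask = rng.random(X.shape) < p  # True where a value is removed
    X_out = X.copy()                # leave the caller's array untouched
    X_out[mask] = np.nan            # assumption: NaN marks a missing value
    return X_out


def missing_rate_sketch(X):
    """Fraction of entries that are NaN (hypothetical helper)."""
    return float(np.isnan(X).mean())


rng = np.random.default_rng(42)
X = rng.standard_normal((128, 10, 36))
X_missing = mcar_sketch(X, p=0.1, rng=rng)
rate = missing_rate_sketch(X_missing)
print(f"empirical missing rate: {rate:.3f}")
```

The empirical rate lands close to p but rarely matches it exactly, because every entry is dropped by an independent random draw rather than by selecting a fixed number of entries.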

❖ Citing PyGrinder/PyPOTS

The paper introducing PyPOTS is available on arXiv, and a short version of it was accepted by the 9th SIGKDD International Workshop on Mining and Learning from Time Series (MiLeTS'23). Additionally, PyPOTS has been included as a PyTorch Ecosystem project. We are working to publish it in prestigious academic venues, e.g. JMLR (track for Machine Learning Open Source Software). If you use PyPOTS in your work, please cite it as below and 🌟star this repository so that others notice this library. 🤗

Scientific research projects already use PyPOTS and reference it in their papers. Here is an incomplete list of them.

@article{du2023pypots,
title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
author={Wenjie Du},
journal={arXiv preprint arXiv:2305.18811},
year={2023},
}

or

Wenjie Du. PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series. arXiv, abs/2305.18811, 2023.
