Hampel Filter with numba JIT parallelization

Hampel Filter

This project provides an implementation of the Hampel filter for outlier detection that uses Numba's JIT compilation for speed. The median and median absolute deviation (MAD) calculations can run in either serial or parallel mode, which makes the filter suitable for large datasets.

Installation

Make sure to have Numba, NumPy, and Pandas installed:

pip install numba numpy pandas

Then install the package:

pip install hampel_filter

Usage

This package provides functions for detecting outliers in time-series or sequence data with the Hampel filter. The main function, hampel(), computes the median and the median absolute deviation (MAD) within a moving window around each point, then flags the point as an outlier when its deviation from the window median exceeds n times the MAD.
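The rule described above can be sketched in plain NumPy. This is an illustrative reference only, not the package's Numba code: the function name hampel_reference, the truncation of windows at the array edges, and the absence of any MAD scale factor are assumptions made for this sketch.

```python
import numpy as np

def hampel_reference(arr, window_size=5, n=3):
    """Return a boolean outlier mask (hypothetical helper, not part of the package)."""
    x = np.asarray(arr, dtype=float)
    mask = np.zeros(x.shape[0], dtype=bool)
    for i in range(x.shape[0]):
        # Assumption: window_size is a half-width and windows are cut off at the edges.
        lo = max(0, i - window_size)
        hi = min(x.shape[0], i + window_size + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = np.median(np.abs(window - med))
        # Flag the point when it lies more than n MADs from the window median.
        mask[i] = np.abs(x[i] - med) > n * mad
    return mask

data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])
mask = hampel_reference(data, window_size=2, n=3)
print(np.where(mask))  # (array([4]),)
```

Only the value 100 at index 4 is flagged here. Note that when more than half of a window holds the same value the MAD is zero, so any point in that window that differs from the median is flagged regardless of n.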

Function: hampel

def hampel(arr, window_size=5, n=3, parallel=False, return_indices=True)

Parameters:

  • arr (np.ndarray, pd.Series, or pd.DataFrame): The input data array for outlier detection.
  • window_size (int, default=5): The half-size of the moving window for median calculation.
  • n (int, default=3): The threshold factor; a value is flagged as an outlier when it deviates from the window median by more than n times the MAD.
  • parallel (bool, default=False): Whether to use parallel computation. When True, it leverages multi-core processing.
  • return_indices (bool, default=True): If True, returns the indices of outliers; otherwise, returns a boolean array indicating outliers.

Returns:

  • If return_indices is True, returns a tuple of arrays with outlier indices.
  • If return_indices is False, returns a boolean array where True indicates outliers.
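The two return forms carry the same information. As a sketch, assuming the "tuple of arrays" follows the convention of NumPy's np.where (the README does not say which call the package uses), a boolean mask converts to indices like this:

```python
import numpy as np

# A boolean mask of the kind described for return_indices=False (made-up values).
mask = np.array([False, False, True, False, True])

# np.where on a mask returns a tuple holding one index array per dimension.
indices = np.where(mask)
print(indices[0])  # [2 4]

# For 2-D input the tuple has two arrays: row indices and column indices.
mask2d = np.array([[False, True], [True, False]])
rows, cols = np.where(mask2d)
print(rows, cols)  # [0 1] [1 0]
```

Going the other way, the mask can be used directly for boolean indexing, for example data[~mask] to keep only the non-outlier values.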

Example

import numpy as np
from hampel_filter import hampel  # module provided by the hampel_filter package

# Sample data with outliers
data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])

# Detect outliers
outlier_indices = hampel(data, window_size=2, n=3, parallel=True)

print("Outlier indices:", outlier_indices)

Detailed Functions

The implementation includes the following helper functions:

  • calc_medians(window_size, arr, medians): Calculates the moving median over a specified window.
  • calc_medians_std(window_size, arr, medians, medians_diff): Computes the median absolute deviation (MAD) for outlier detection.
  • calc_medians_parallel and calc_medians_std_parallel: Parallel versions of the above functions, which use Numba's prange to distribute loop iterations across CPU cores.
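To show the general prange pattern these helpers rely on, here is a minimal moving-median sketch. It is not the package's source: the function name moving_median_parallel, its argument order, and the edge handling are assumptions, and the fallback branch exists only so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback for this sketch only: plain Python loop, no compilation.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@njit(parallel=True)
def moving_median_parallel(arr, half_window, medians):
    """Write the windowed median of arr into medians (hypothetical helper)."""
    n = arr.shape[0]
    for i in prange(n):
        # Each iteration writes to its own slot, so iterations are independent.
        ii = np.int64(i)  # keep the index arithmetic signed
        lo = max(0, ii - half_window)
        hi = min(n, ii + half_window + 1)
        medians[i] = np.median(arr[lo:hi])

arr = np.array([1.0, 1.0, 2.0, 2.0, 100.0, 2.0, 2.0, 1.0, 1.0])
medians = np.empty_like(arr)
moving_median_parallel(arr, 2, medians)
print(medians)
```

The output array is preallocated and passed in, mirroring the medians argument in the helper signatures listed above. Because no iteration reads a value written by another, the loop can be split across threads safely.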

Performance Notes

Setting parallel=True runs the median and MAD calculations through Numba's parallel execution, which can speed up processing of large arrays considerably. The actual gain depends on the number of available CPU cores and the current system load.

License

This project is licensed under the MIT License.
