Hampel Filter with Numba JIT parallelization

Hampel Filter

This project implements the Hampel filter for outlier detection and uses Numba's JIT compilation to improve performance. The median and median absolute deviation (MAD) calculations can run in either serial or parallel mode, which makes the filter suitable for large datasets.

Installation

Make sure Numba, NumPy, and pandas are installed:

pip install numba numpy pandas

Usage

This package contains functions for outlier detection in time-series or sequence data using the Hampel filter. The main function, hampel(), computes the median and the median absolute deviation (MAD) within a moving window around each point. It flags a point as an outlier when the point's deviation from the window median exceeds a threshold of n times the MAD.
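
To make the rule above concrete, here is a minimal plain-NumPy sketch of a Hampel-style detector. It is not the package's code: the name hampel_reference and its parameters are hypothetical, and it assumes that windows are truncated at the array edges, that the MAD is used without a scale factor, and that the comparison is strict. The package may differ on any of these points.

```python
import numpy as np

def hampel_reference(values, half_window=2, n_sigmas=3.0):
    """Illustrative Hampel-style detector; returns a boolean outlier mask.

    Assumptions (not taken from the package): windows shrink at the
    edges, the raw MAD is used, and the comparison is strict (>).
    """
    values = np.asarray(values, dtype=float)
    size = values.shape[0]
    mask = np.zeros(size, dtype=bool)
    for i in range(size):
        lo = max(0, i - half_window)
        hi = min(size, i + half_window + 1)
        window = values[lo:hi]
        med = np.median(window)                # local median
        mad = np.median(np.abs(window - med))  # local median absolute deviation
        mask[i] = abs(values[i] - med) > n_sigmas * mad
    return mask

data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])
mask = hampel_reference(data, half_window=2, n_sigmas=3)
print(np.nonzero(mask)[0])  # [4]
```

Under these assumptions, a window whose MAD is zero flags any point that differs from the window median at all, so real implementations often add a scale factor or a minimum threshold.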

Function: hampel

def hampel(arr, window_size=5, n=3, parallel=False, return_indices=True)

Parameters:

  • arr (np.ndarray, pd.Series, or pd.DataFrame): The input data array for outlier detection.
  • window_size (int, default=5): The half-size of the moving window for median calculation.
  • n (int, default=3): The threshold factor; a value is an outlier when its deviation from the window median exceeds n times the MAD.
  • parallel (bool, default=False): Whether to use parallel computation. When True, it leverages multi-core processing.
  • return_indices (bool, default=True): If True, returns the indices of outliers; otherwise, returns a boolean array indicating outliers.

Returns:

  • If return_indices is True, returns a tuple of arrays with outlier indices.
  • If return_indices is False, returns a boolean array where True indicates outliers.
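
The two return forms carry the same information. The sketch below shows how to convert between them with NumPy, assuming the index tuple has the same layout as the result of np.nonzero (one index array per dimension); the source only says "a tuple of arrays". The mask is written by hand as a stand-in for a hampel() result.

```python
import numpy as np

data = np.array([1.0, 1.0, 2.0, 2.0, 100.0, 2.0, 2.0, 1.0, 1.0])

# Hand-written mask standing in for hampel(data, return_indices=False)
mask = np.zeros(data.shape, dtype=bool)
mask[4] = True

# Boolean mask -> tuple of index arrays (one array per dimension)
indices = np.nonzero(mask)

# Tuple of index arrays -> boolean mask
rebuilt = np.zeros(data.shape, dtype=bool)
rebuilt[indices] = True

# Either form can be used to blank out the flagged values
cleaned = data.copy()
cleaned[indices] = np.nan

print(indices)
print(cleaned)
```

The boolean form is convenient for masking and for combining several criteria; the index form is more compact when outliers are rare.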

Example

import numpy as np
from hampel_filter import hampel  # Assuming the script is named hampel_filter.py

# Sample data with outliers
data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])

# Detect outliers
outlier_indices = hampel(data, window_size=2, n=3, parallel=True)

print("Outlier indices:", outlier_indices)

Detailed Functions

The implementation includes the following helper functions:

  • calc_medians(window_size, arr, medians): Calculates the moving median over a specified window.
  • calc_medians_std(window_size, arr, medians, medians_diff): Computes the median absolute deviation (MAD) for outlier detection.
  • calc_medians_parallel and calc_medians_std_parallel: Parallel versions of the above functions using prange for improved performance.
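
As an illustration of the prange pattern, the following sketch computes a moving median in parallel. It is not the package's source: the function name moving_median_parallel is hypothetical, the argument order only mirrors the helper signatures listed above, and truncating the window at the array edges is an assumption. If Numba is not installed, the sketch falls back to plain Python so that it still runs.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@njit(parallel=True)
def moving_median_parallel(window_size, arr, medians):
    """Write the median of each window into the preallocated `medians`.

    `window_size` is the half-size of the window; windows are
    truncated at the array edges (an assumption of this sketch).
    """
    n = arr.shape[0]
    for i in prange(n):  # iterations are independent, so they can run in parallel
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        medians[i] = np.median(arr[lo:hi])

arr = np.array([1.0, 1.0, 2.0, 2.0, 100.0, 2.0, 2.0, 1.0, 1.0])
medians = np.empty_like(arr)
moving_median_parallel(2, arr, medians)
print(medians)
```

Each iteration writes only to its own element of medians, so the loop needs no synchronization between threads.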

Performance Notes

Setting parallel=True makes the Hampel filter use Numba's parallel execution, which can speed up computation on large arrays. The actual gain depends on the number of available CPU cores and on system load.
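
To check whether the parallel mode pays off on a given machine, the two modes can be timed directly. The helper below is a generic sketch: time_call is a hypothetical name, and np.sort is used only as a stand-in workload so that the example runs without the package installed. The warm-up call matters for Numba functions because the first call includes compilation time.

```python
import time
import numpy as np

def time_call(func, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    func(*args, **kwargs)  # warm-up; triggers JIT compilation for Numba functions
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)

# Stand-in workload; replace with the function under test
print(f"np.sort: {time_call(np.sort, data):.6f} s")
```

With the package installed, the same helper could compare time_call(hampel, data, parallel=False) against time_call(hampel, data, parallel=True).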

License

This project is licensed under the MIT License.
