
Hampel Filter with Numba JIT parallelization

Project description

Hampel Filter

This project provides an efficient implementation of the Hampel filter for outlier detection, using Numba's JIT compilation to speed up computation. The median and median absolute deviation (MAD) calculations can run in serial or parallel mode; the parallel mode is aimed at large datasets.

Installation

Make sure Numba, NumPy, and pandas are installed:

pip install numba numpy pandas

Then install the package:

pip install hampel_filter

Usage

This package provides functions for outlier detection in time-series or other sequence data using the Hampel filter. The main function, hampel(), computes the median and the median absolute deviation (MAD) within a moving window around each point, and flags the point as an outlier when its absolute deviation from that median exceeds a threshold of n times the MAD.
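Written out, the rule reads as follows. This is a sketch under assumptions the README does not confirm: windows are truncated at the array ends, the comparison is strict, and the MAD is unscaled (many Hampel implementations multiply it by about 1.4826 so that it estimates the standard deviation of Gaussian data; whether hampel() does so is not stated). Here k is window_size, n is the threshold factor, and N is the array length.

```latex
\begin{aligned}
J_i &= \{\, j : \max(0,\, i-k) \le j \le \min(N-1,\, i+k) \,\} \\
m_i &= \operatorname{median}_{j \in J_i} x_j \\
\mathrm{MAD}_i &= \operatorname{median}_{j \in J_i} \lvert x_j - m_i \rvert \\
x_i \text{ is an outlier} &\iff \lvert x_i - m_i \rvert > n \cdot \mathrm{MAD}_i
\end{aligned}
```

One consequence of this form: in a window where more than half of the values are identical, MAD_i is 0, so any value that differs from m_i is flagged.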

Function: hampel

def hampel(arr, window_size=5, n=3, parallel=False, return_indices=True)

Parameters:

  • arr (np.ndarray, pd.Series, or pd.DataFrame): The input data array for outlier detection.
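  • window_size (int, default=5): The half-size of the moving window used for the median calculation; a full window spans up to 2 * window_size + 1 points centered on the current value.
  • n (int, default=3): The threshold factor; a value is flagged as an outlier when its absolute deviation from the window median exceeds n times the window's MAD.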
  • parallel (bool, default=False): Whether to use parallel computation. When True, it leverages multi-core processing.
  • return_indices (bool, default=True): If True, returns the indices of outliers; otherwise, returns a boolean array indicating outliers.

Returns:

  • If return_indices is True, returns a tuple of arrays with outlier indices.
  • If return_indices is False, returns a boolean array where True indicates outliers.
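Assuming the index tuple follows the same convention as np.nonzero (one index array per dimension, which this README does not confirm), the two return forms carry the same information and can be converted with plain NumPy. The mask below is a hypothetical hampel(..., return_indices=False) result, not real package output:

```python
import numpy as np

# Hypothetical boolean result: True marks an outlier at position 4
mask = np.array([False, False, False, False, True, False, False, False, False])

# np.nonzero turns a boolean mask into a tuple of index arrays (one per dimension)
indices = np.nonzero(mask)
print(indices)  # (array([4]),)

# Going back: build a mask of the same shape from the index tuple
restored = np.zeros(mask.shape, dtype=bool)
restored[indices] = True
print(np.array_equal(restored, mask))  # True
```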

Example

import numpy as np
from hampel_filter import hampel  # Provided by the hampel_filter package installed above

# Sample data with outliers
data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])

# Detect outliers
outlier_indices = hampel(data, window_size=2, n=3, parallel=True)

print("Outlier indices:", outlier_indices)

Detailed Functions

The implementation includes the following helper functions:

  • calc_medians(window_size, arr, medians): Calculates the moving median over a specified window.
  • calc_medians_std(window_size, arr, medians, medians_diff): Computes the median absolute deviation (MAD) for outlier detection.
  • calc_medians_parallel and calc_medians_std_parallel: Parallel versions of the above functions using prange for improved performance.
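The helpers can be sketched as plain loops. This is not the package's source: the argument roles (medians as the output of calc_medians, then as an input alongside the output array medians_diff in calc_medians_std), the truncation of windows at the array ends, and the hampel_sketch driver are all assumptions. The Numba decorators are left out so the sketch runs with NumPy alone; according to this README, the package compiles such loops with Numba, and the parallel variants iterate with prange instead of range.

```python
import numpy as np


def calc_medians(window_size, arr, medians):
    """Sketch (not the package's code): write the median of the window around arr[i] into medians[i]."""
    size = arr.shape[0]
    for i in range(size):  # a parallel variant would iterate with numba.prange here
        lo = max(0, i - window_size)          # truncate the window at the start
        hi = min(size, i + window_size + 1)   # ...and at the end of the array
        medians[i] = np.median(arr[lo:hi])


def calc_medians_std(window_size, arr, medians, medians_diff):
    """Sketch: write the MAD of each window into medians_diff[i], using precomputed medians."""
    size = arr.shape[0]
    for i in range(size):
        lo = max(0, i - window_size)
        hi = min(size, i + window_size + 1)
        medians_diff[i] = np.median(np.abs(arr[lo:hi] - medians[i]))


def hampel_sketch(arr, window_size=5, n=3):
    """Hypothetical driver: boolean mask of points more than n * MAD from their window median."""
    x = np.asarray(arr, dtype=np.float64)
    medians = np.empty_like(x)
    medians_diff = np.empty_like(x)
    calc_medians(window_size, x, medians)
    calc_medians_std(window_size, x, medians, medians_diff)
    return np.abs(x - medians) > n * medians_diff


data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])
print(np.nonzero(hampel_sketch(data, window_size=2, n=3)))  # (array([4]),)
```

Each loop iteration reads only arr and writes only its own output slot, which is why the loops can be parallelized without locks.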

Performance Notes

Setting parallel=True runs the median and MAD calculations with Numba's parallel processing, which can substantially speed up computation on large arrays. The gain depends on the number of CPU cores and the current system load; on small arrays, threading overhead may outweigh the benefit.
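To check the speedup on a given machine, a small timing helper can compare the two modes. time_call is a hypothetical name, not part of the package. It assumes the package's Numba functions compile lazily on their first call (Numba's default when no explicit signatures are given), so it discards one warm-up run before measuring.

```python
import time
from typing import Any, Callable


def time_call(func: Callable[..., Any], *args: Any, repeats: int = 5, **kwargs: Any) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls, after one warm-up call."""
    func(*args, **kwargs)  # warm-up: absorbs one-time costs such as lazy JIT compilation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, with data = np.random.default_rng(0).normal(size=1_000_000), compare time_call(hampel, data, parallel=False) against time_call(hampel, data, parallel=True). Taking the minimum of several runs reduces noise from other processes.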

License

This project is licensed under the MIT License.


Download files


Source Distribution

hampel_filter-0.0.2.tar.gz (3.2 kB)

Built Distribution


hampel_filter-0.0.2-py3-none-any.whl (4.0 kB)

File details

Details for the file hampel_filter-0.0.2.tar.gz.

File metadata

  • Download URL: hampel_filter-0.0.2.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for hampel_filter-0.0.2.tar.gz
Algorithm Hash digest
SHA256 d8fc6a8e56e35f447f2dd8825f07e50604956a581156e808dff3a794fbd4ea9c
MD5 cce7db13f936b3f48bbaebff9d39e57b
BLAKE2b-256 0a6a5687948c4d1f9cabef302b849719cab77c30e530893a38ff303ac13a2d84


File details

Details for the file hampel_filter-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: hampel_filter-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for hampel_filter-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 dd16a6f17fa50dee1e5f31235a666929804fc127774a81da47140c84949cbaa4
MD5 f863979b6955089819d0edda9bf0be8d
BLAKE2b-256 b7ffb2b3a7495579511de7f52f03b389c09e540e06d7ccf58ae4deefd659f584

