Hampel Filter with numba JIT parallelization
Hampel Filter
This project implements the Hampel filter for outlier detection, using Numba's JIT compilation to improve performance. The median and median absolute deviation (MAD) calculations can run in either serial or parallel mode, which is intended to help when processing large datasets.
Installation
Make sure to have Numba, NumPy, and Pandas installed:
pip install numba numpy pandas
Usage
This package contains functions for outlier detection in time-series or sequence data using the Hampel filter. The main function hampel() computes the median and the median absolute deviation (MAD) within a moving window around each point, then flags the point as an outlier when its distance from the window median exceeds n times the MAD.
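The test described above can be sketched in plain NumPy. This is a reference illustration rather than the package's code: the function name hampel_reference is hypothetical, and the edge handling (windows truncated at the array boundaries) and the use of the raw MAD without a scale factor are assumptions.

```python
import numpy as np


def hampel_reference(arr, window_size=5, n=3):
    """Return a boolean mask marking points flagged by a basic Hampel test.

    Assumptions (not taken from the package): the window spans
    `window_size` points on each side, is truncated at the array edges,
    and the raw MAD is used without a scale factor.
    """
    x = np.asarray(arr, dtype=float)
    mask = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        lo = max(0, i - window_size)
        hi = min(x.size, i + window_size + 1)
        window = x[lo:hi]
        med = np.median(window)                # local median
        mad = np.median(np.abs(window - med))  # local median absolute deviation
        mask[i] = np.abs(x[i] - med) > n * mad
    return mask


data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])
print(np.flatnonzero(hampel_reference(data, window_size=2, n=3)))  # [4]
```

Because the comparison is strict, a point equal to its window median is never flagged, even when the local MAD is zero.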
Function: hampel
def hampel(arr, window_size=5, n=3, parallel=False, return_indices=True)
Parameters:
- arr (np.ndarray, pd.Series, or pd.DataFrame): The input data array for outlier detection.
- window_size (int, default=5): The half-size of the moving window for median calculation.
- n (int, default=3): The threshold factor; outliers are values beyond n times the MAD.
- parallel (bool, default=False): Whether to use parallel computation. When True, it leverages multi-core processing.
- return_indices (bool, default=True): If True, returns the indices of outliers; otherwise, returns a boolean array indicating outliers.
Returns:
- If return_indices is True, returns a tuple of arrays with outlier indices.
- If return_indices is False, returns a boolean array where True indicates outliers.
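The two return forms carry the same information. A tuple of index arrays is the shape that np.nonzero produces, so the sketch below assumes the package follows that convention (this is not confirmed by the description) and shows how to convert between the two forms. The mask values are made up for illustration.

```python
import numpy as np

# Hypothetical boolean result, as if return_indices=False had been used
mask = np.array([False, False, False, False, True, False, False, False, False])

# Boolean mask -> tuple of index arrays (one array per dimension)
indices = np.nonzero(mask)
print(indices[0])  # [4]

# Tuple of index arrays -> boolean mask of the same shape
restored = np.zeros(mask.shape, dtype=bool)
restored[indices] = True
```

The index form is convenient for reporting, while the mask form can be used directly for boolean indexing into the original array.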
Example
import numpy as np
from hampel_filter import hampel # Assuming the script is named hampel_filter.py
# Sample data with outliers
data = np.array([1, 1, 2, 2, 100, 2, 2, 1, 1])
# Detect outliers
outlier_indices = hampel(data, window_size=2, n=3, parallel=True)
print("Outlier indices:", outlier_indices)
Detailed Functions
The implementation includes the following helper functions:
- calc_medians(window_size, arr, medians): Calculates the moving median over a specified window.
- calc_medians_std(window_size, arr, medians, medians_diff): Computes the median absolute deviation (MAD) for outlier detection.
- calc_medians_parallel and calc_medians_std_parallel: Parallel versions of the above functions using prange for improved performance.
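To illustrate how a prange-based helper of this kind might look, here is a minimal sketch of a parallel moving median that writes into a preallocated output array. The name moving_median_parallel is hypothetical, the truncated-window edge handling is an assumption, and the package's actual helpers may differ. The import fallback exists only so the sketch still runs, serially, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without Numba
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def moving_median_parallel(window_size, arr, medians):
    """Fill `medians` with the median of each window (truncated at the edges)."""
    n = arr.shape[0]
    # Each iteration writes only medians[i], so iterations are independent
    for i in prange(n):
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        medians[i] = np.median(arr[lo:hi])


arr = np.array([1.0, 1.0, 2.0, 2.0, 100.0, 2.0, 2.0, 1.0, 1.0])
medians = np.empty_like(arr)
moving_median_parallel(2, arr, medians)
print(medians)
```

Writing into a caller-supplied array mirrors the calc_medians(window_size, arr, medians) signature listed above and avoids allocating the result inside the compiled function.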
Performance Notes
Setting parallel=True makes the Hampel filter calculation use Numba's parallel processing capabilities, which can speed up computation on large arrays. The actual gain depends on the number of CPU cores available and on the current system load, so it is worth measuring on your own data.
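One way to check whether parallel mode helps on a given machine is to time both modes. The helper below is a generic sketch (best_time is a hypothetical name, not part of the package). It makes one warm-up call first, because the first call to a Numba-compiled function includes JIT compilation time.

```python
import time


def best_time(func, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    func(*args, **kwargs)  # warm-up; for a Numba function this triggers compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Demonstrated on a built-in so the sketch runs without the package installed
elapsed = best_time(sorted, list(range(10_000, 0, -1)))
print(f"best of 5: {elapsed:.6f} s")
```

With the package installed, the same helper could be called as best_time(hampel, data, parallel=False) and best_time(hampel, data, parallel=True) to compare the two modes on a large array.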
License
This project is licensed under the MIT License.
File details
Details for the file hampel_filter-0.0.1.tar.gz.
File metadata
- Download URL: hampel_filter-0.0.1.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0108a3aa57624c5eff4924d0f416ae18ccde7eb2494e16315d39c8b668ac24c |
| MD5 | e41a7f606a5ae38087695f28747248ba |
| BLAKE2b-256 | f51548b99e49965eb0a7f2f371ac6de91fafddb20aeec2ed57adfbbc0cd0c8ac |
File details
Details for the file hampel_filter-0.0.1-py3-none-any.whl.
File metadata
- Download URL: hampel_filter-0.0.1-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e518f8f5139d5ba9b7a54ea70ea2feda2c0c7437fc216aa9fcacd7c841fe2ea4 |
| MD5 | 12922ea501e0ddd482ed8d8e7d4d9d41 |
| BLAKE2b-256 | 8816ecd6d10d9540fa4853e327035f326f1a00530113fd653d787bce5fafb530 |