
Python implementation of the Hampel Filter


Hampel

Warning: Version 1.0.1 introduced breaking changes to this library. Please review the Usage section and update your code accordingly.

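The Hampel filter is generally used to detect anomalies in time-series data. It slides a window of configurable size over the series and, for each window, computes the median and the Median Absolute Deviation (MAD) of the observations inside it. An observation is considered an outlier when its absolute deviation from the window median exceeds n times the MAD (the parameter n is also configurable). In the standard formulation, the MAD is first multiplied by a constant of about 1.4826 so that it estimates the standard deviation of normally distributed data, which is why n is expressed as a number of standard deviations.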
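The procedure above can be sketched in plain Python. This is a minimal illustration, not the hampel package's implementation: the helper name `hampel_sketch` is hypothetical, and it assumes an odd `window_size` centered on each point, the conventional scale factor k = 1.4826, replacement of outliers by the window median, and that the first and last `window_size // 2` points are left unchecked because their window is incomplete.

```python
from statistics import median

# Assumed scale factor: K * MAD estimates the standard deviation of Gaussian data
K = 1.4826


def hampel_sketch(data, window_size=5, n_sigma=3.0):
    """Hypothetical Hampel filter sketch (not the hampel package's code).

    Returns (filtered values, indices of detected outliers).
    """
    half = window_size // 2
    filtered = list(data)
    outliers = []
    # Only points with a complete centered window are checked
    for i in range(half, len(data) - half):
        window = data[i - half:i + half + 1]
        med = median(window)
        # MAD: median of the absolute deviations from the window median
        mad = median(abs(x - med) for x in window)
        threshold = n_sigma * K * mad
        if abs(data[i] - med) > threshold:
            filtered[i] = med  # replace the outlier with the window median
            outliers.append(i)
    return filtered, outliers


print(hampel_sketch([1.0, 2.0, 3.0, 100.0, 4.0, 5.0, 6.0], window_size=3))
# ([1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0], [3])
```

On the sample series from the Usage section with `window_size=3`, the sketch flags only index 3 and replaces 100.0 with the window median 4.0.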

For more details, see the Related Links section.

[Figure: usage visualization of the Hampel filter]


Installation

To use the Hampel filter in your Python project, you can install it via pip:

pip install hampel

Usage

Here's a simple example of how to use the Hampel filter:

import pandas as pd
from hampel import hampel

# Sample data as a pandas.Series
data = pd.Series([1.0, 2.0, 3.0, 100.0, 4.0, 5.0, 6.0])

# Apply the Hampel filter
result = hampel(data, window_size=3, n_sigma=3.0)

print(result.filtered_data)

When you apply the Hampel filter, it returns a Result object with the following attributes:

  • filtered_data: The data with outliers replaced.

  • outlier_indices: Indices of the detected outliers.

  • medians: Median values within the sliding window.

  • median_absolute_deviations: Median Absolute Deviation (MAD) values within the sliding window.

  • thresholds: Threshold values for outlier detection.

You can access these attributes as follows:

result = hampel(data, window_size=3, n_sigma=3.0)

filtered_data = result.filtered_data
outlier_indices = result.outlier_indices
medians = result.medians
mad_values = result.median_absolute_deviations
thresholds = result.thresholds
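As a worked example of the standard Hampel rule, take the sample series above at zero-based index 3 (the value 100.0). This assumes a centered window of three points, [3.0, 100.0, 4.0], and the conventional scale factor k ≈ 1.4826; how exactly the hampel package aligns its window, and whether it stores the raw or the scaled MAD, is not stated here. The quantities m, MAD and τ play the roles described for the `medians`, `median_absolute_deviations` and `thresholds` attributes.

```latex
\begin{aligned}
m_3 &= \operatorname{median}(3,\ 100,\ 4) = 4 \\
\mathrm{MAD}_3 &= \operatorname{median}(|3-4|,\ |100-4|,\ |4-4|) = \operatorname{median}(1,\ 96,\ 0) = 1 \\
\tau_3 &= n_\sigma \cdot k \cdot \mathrm{MAD}_3 = 3 \times 1.4826 \times 1 \approx 4.45 \\
|x_3 - m_3| &= |100 - 4| = 96 > \tau_3 \;\Rightarrow\; x_3 \text{ is flagged and replaced by } m_3 = 4
\end{aligned}
```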

To apply the Hampel filter directly to multiple columns of a pandas.DataFrame, use DataFrame.apply:

import pandas as pd
from hampel import hampel

df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 100.0, 4.0, 5.0, 6.0],
    'B': [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
})

# Keep only the filtered data returned for each column
filtered_df = df.apply(lambda x: hampel(x).filtered_data, axis=0)

print(filtered_df)
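If you would rather not call hampel once per column, the same rule can be vectorized with pandas' own rolling windows. This is a swapped-in technique, not part of the hampel package: `hampel_frame_sketch` is a hypothetical helper, and it assumes a centered window, the scale factor k = 1.4826 and median replacement. It leaves edge points whose window is incomplete untouched, so its output near the edges may differ from hampel's.

```python
import numpy as np
import pandas as pd

K = 1.4826  # assumed scale factor: K * MAD estimates the std. dev. of Gaussian data


def hampel_frame_sketch(df: pd.DataFrame, window_size: int = 5, n_sigma: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: apply a Hampel-style rule to every column at once."""
    rolling = df.rolling(window_size, center=True)
    medians = rolling.median()
    # Rolling MAD: median absolute deviation of each window from its own median
    mads = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    thresholds = n_sigma * K * mads
    # Incomplete edge windows give NaN medians/thresholds; NaN comparisons
    # evaluate to False, so edge points are never flagged
    is_outlier = (df - medians).abs() > thresholds
    # Replace flagged values with the corresponding window median
    return df.mask(is_outlier, medians)


df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 100.0, 4.0, 5.0, 6.0],
    'B': [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
})
print(hampel_frame_sketch(df, window_size=3))
```

Because `Rolling.apply` calls a Python function for every window, this is a convenience for whole-frame processing rather than a speed optimization on very long series.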

Parameters

  • data: The input 1-dimensional data to be filtered (pandas.Series or numpy.ndarray).
  • window_size (optional): The size of the moving window for outlier detection (default is 5).
  • n_sigma (optional): The number of standard deviations for outlier detection (default is 3.0).
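The n_sigma parameter is phrased in standard deviations, while the filter measures spread with the MAD. In the standard Hampel filter the two are linked by k = 1/Φ⁻¹(3/4) ≈ 1.4826, where Φ⁻¹ is the standard normal quantile function: for Gaussian data, k · MAD estimates σ. The standard-library sketch below derives k and checks it on simulated data; it assumes, rather than confirms, that hampel uses this constant.

```python
import random
from statistics import NormalDist, median

# For Gaussian data the MAD converges to sigma * Phi^{-1}(3/4),
# so k = 1 / Phi^{-1}(3/4) rescales the MAD into a standard-deviation estimate
k = 1 / NormalDist().inv_cdf(0.75)
print(f"k = {k:.4f}")  # k = 1.4826

# Empirical check on simulated normal data with a known sigma
random.seed(0)
sigma = 2.0
samples = [random.gauss(0.0, sigma) for _ in range(50_000)]
m = median(samples)
mad = median(abs(x - m) for x in samples)
print(f"k * MAD = {k * mad:.3f}, true sigma = {sigma}")

# A threshold of n_sigma standard deviations therefore becomes n_sigma * k * MAD
n_sigma = 3.0
print(f"threshold = {n_sigma * k * mad:.3f}")
```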

Testing

To run the tests, simply run:

make test

License

See the LICENSE file for details.

Contributing

Contributions are welcome! Feel free to open issues, submit pull requests, or suggest improvements.

Related Links

https://medium.com/wwblog/clean-up-your-time-series-data-with-a-hampel-filter-58b0bb3ebb04

https://en.wikipedia.org/wiki/Median_absolute_deviation
