A Python package to detect anomalies using Z-Scores

Project description

ZScore Anomaly Detector

**ZScore Anomaly Detector** is a Python package that detects anomalies in numerical data using Z-Score analysis. It calculates the Z-Score of each value in every numerical column and flags values whose Z-Score exceeds a threshold, that is, values that lie unusually far from the column mean. The package accepts mixed datasets containing numerical, categorical, and object columns.
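As a rough illustration of the idea (not the package's actual implementation), a Z-Score is `(x - mean) / standard deviation`, and a value counts as an anomaly when its absolute Z-Score exceeds the threshold. The sketch below uses only the standard library. The function names `z_scores` and `flag_anomalies` are hypothetical, and the choice of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return the Z-Score of each value: (x - mean) / standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


def flag_anomalies(values, threshold=3.0):
    """Return True for each value whose absolute Z-Score exceeds the threshold."""
    return [abs(z) > threshold for z in z_scores(values)]


ages = [25, 32, 47, 51, 62, 35, 27, 100, 29, 38]
flags = flag_anomalies(ages, threshold=2)
print(flags)  # only the entry for the value 100 is True
```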

Installation

You can install the package using pip. Run the following command:

pip install zscore-anomaly-detector
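To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="zscore-anomaly-detector"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "zscore-anomaly-detector is not installed")
```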


## Usage
Here is an example of how to use the ZScore Anomaly Detector package with a realistic dataset containing both numerical and categorical columns:

import pandas as pd

from zscore_anomaly.zscore_anomaly_detector import ZScoreAnomalyDetector

## Sample Dataset

data = pd.DataFrame({
    'Age': [25, 32, 47, 51, 62, 35, 27, 100, 29, 38],  # Numeric
    'Salary': [50000, 54000, 58000, 62000, 65000, 52000, 51000, 200000, 53000, 56000],  # Numeric
    'Department': ['HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR'],  # Categorical
    'Has_Debt': [True, False, True, True, False, False, True, True, False, True],  # Boolean
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco', 'Los Angeles', 'New York'],  # Object
})


#### Initialize the ZScoreAnomalyDetector

detector = ZScoreAnomalyDetector(threshold=2)  # The user can specify the threshold value for detecting anomalies. By default, the threshold is set to 3 if not provided.



#### Create a DataFrame with Anomalies Marked

df_with_anomalies = detector.create_dataframe_with_anomalies(data)


#### Style the DataFrame to Highlight Anomalies in Red

styled_df = detector.style_dataframe(df_with_anomalies)


#### Display the Styled DataFrame

styled_df


Explanation
Age and Salary are numeric columns where Z-Scores will be calculated to detect anomalies.

Department is a categorical column, and City is an object column. These will not be included in Z-Score calculations, but they remain in the dataset.

Has_Debt is a boolean column. Like the other non-numeric columns, it remains in the dataset unchanged.

This example shows how to detect anomalies in the numeric columns (Age and Salary) while leaving the non-numeric columns intact.
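The column handling described above can be approximated with plain pandas. The following is a minimal sketch under stated assumptions, not the package's own code: the helper `numeric_anomaly_mask` is hypothetical, and it assumes the population standard deviation (`ddof=0`). `select_dtypes(include="number")` keeps integer and float columns and skips object and boolean ones.

```python
import pandas as pd


def numeric_anomaly_mask(df, threshold=3.0):
    """Return a boolean DataFrame marking values whose |Z-Score| exceeds threshold."""
    # Only numeric columns take part; object and boolean columns are skipped.
    numeric = df.select_dtypes(include="number")
    std = numeric.std(ddof=0)  # population standard deviation (an assumption)
    # A constant column has std 0, which yields NaN here; NaN never exceeds
    # the threshold, so such a column is simply never flagged.
    z = (numeric - numeric.mean()) / std
    return z.abs() > threshold


data = pd.DataFrame({
    "Age": [25, 32, 47, 51, 62, 35, 27, 100, 29, 38],
    "Salary": [50000, 54000, 58000, 62000, 65000, 52000, 51000, 200000, 53000, 56000],
    "Department": ["HR", "IT", "Finance", "HR", "IT", "Finance", "HR", "IT", "Finance", "HR"],
    "Has_Debt": [True, False, True, True, False, False, True, True, False, True],
})

mask = numeric_anomaly_mask(data, threshold=2)
flagged_rows = data[mask.any(axis=1)]  # rows with at least one flagged value
print(flagged_rows)
```

With a threshold of 2, the only row flagged in this sample is the one with Age 100 and Salary 200000; every original column, including the non-numeric ones, is still present in `flagged_rows`.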

Output

After running the above code, the DataFrame will display anomalies detected in the numeric columns. Anomalies will be highlighted in red if used in a Jupyter notebook or a similar environment that supports DataFrame styling.
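For readers curious how such highlighting can be produced, here is a hedged sketch built on the pandas `Styler` API. It does not describe how `style_dataframe` works internally. The helpers `anomaly_css` and `style_with_mask` are hypothetical, and rendering red as the text color (`color: red`) is an assumption. Note that `DataFrame.style` needs the optional `jinja2` dependency.

```python
import pandas as pd


def anomaly_css(mask, css="color: red"):
    """Map a boolean mask to CSS strings: css where True, empty string elsewhere."""
    return mask.apply(lambda col: col.map(lambda hit: css if hit else ""))


def style_with_mask(df, mask):
    """Return a pandas Styler that colors the cells selected by mask."""
    # Columns absent from the mask (for example non-numeric ones) get no style.
    full_mask = mask.reindex(index=df.index, columns=df.columns, fill_value=False)
    # With axis=None the function receives the whole DataFrame and must
    # return a same-shaped DataFrame of CSS strings.
    return df.style.apply(lambda _: anomaly_css(full_mask), axis=None)


df = pd.DataFrame({"Age": [25, 100], "City": ["New York", "Los Angeles"]})
mask = pd.DataFrame({"Age": [False, True]})

# The CSS table can be inspected without rendering anything.
css = anomaly_css(mask.reindex(columns=df.columns, fill_value=False))
print(css)
```

In a Jupyter notebook, evaluating `style_with_mask(df, mask)` as the last expression of a cell would render the table with the flagged cell in red.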


Download files

Source Distribution

zscore_anomaly_detector-0.1.2.tar.gz (3.2 kB view details)

Uploaded Source

Built Distribution

zscore_anomaly_detector-0.1.2-py3-none-any.whl (3.6 kB view details)

Uploaded Python 3

File details

Details for the file zscore_anomaly_detector-0.1.2.tar.gz.

File metadata

  • Download URL: zscore_anomaly_detector-0.1.2.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for zscore_anomaly_detector-0.1.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3aad3ab42d9eb3a4f3133d4bfc2db6d137c7d805a4b055816cc81b206139d3f4 |
| MD5 | 84db3f8dcffd261a80400f828e439421 |
| BLAKE2b-256 | 664a329d6ff798a7df4571a158432106688fa1fabdf31a15baada5b5259c2c43 |

File details

Details for the file zscore_anomaly_detector-0.1.2-py3-none-any.whl.

File hashes

Hashes for zscore_anomaly_detector-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 54c5c6af162fbf286d63c2ca8aba70d32ab0a23f4e5ea9f7b50999c68e1e61a9 |
| MD5 | 57b6fa66cd946b65ffa544ff84203a73 |
| BLAKE2b-256 | 4e96daea2f36ea165f332add7daedc9ecce8975ed86b403e533b9794b5a13d74 |
