A Python package to detect anomalies using Z-Scores
# Zscore Anomaly Detector
**Zscore Anomaly Detector** is a Python package that detects anomalies in numerical datasets using Z-Score analysis. It calculates the Z-Score of every value in each numerical column and flags values that lie unusually far from the column mean, that is, values whose Z-Score exceeds a configurable threshold (3 by default). The package handles mixed datasets containing numerical, categorical, and object columns; non-numeric columns are kept in the data but are not scored.
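For reference, the textbook Z-Score of a value $x_i$ in a column of $n$ values, and the resulting anomaly rule for a threshold $t$, are written out below. This is the standard definition, not a transcription of the package's code: the README does not say whether $\sigma$ is the population or the sample standard deviation, or whether the comparison with $t$ is strict, so the population form and a strict comparison are assumed here.

```latex
\mu = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_j - \mu)^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}, \qquad
x_i \text{ is an anomaly} \iff |z_i| > t
```

With the default $t = 3$, a value is flagged only if it lies more than three standard deviations from its column mean.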
## Installation

Install the package with pip:

```bash
pip install zscore-anomaly-detector
```
## Usage
The following example runs the ZScore Anomaly Detector on a small dataset that mixes numerical, categorical, boolean, and object columns:
```python
import pandas as pd
from zscore_anomaly.zscore_anomaly_detector import ZScoreAnomalyDetector

# Sample dataset
data = pd.DataFrame({
    'Age': [25, 32, 47, 51, 62, 35, 27, 100, 29, 38],  # Numeric
    'Salary': [50000, 54000, 58000, 62000, 65000, 52000, 51000, 200000, 53000, 56000],  # Numeric
    'Department': ['HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR', 'IT', 'Finance', 'HR'],  # Categorical
    'Has_Debt': [True, False, True, True, False, False, True, True, False, True],  # Boolean
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco',
             'Los Angeles', 'New York', 'San Francisco', 'Los Angeles', 'New York'],  # Object
})

# Initialize the ZScoreAnomalyDetector.
# The threshold for detecting anomalies can be customized; it defaults to 3 if not provided.
detector = ZScoreAnomalyDetector(threshold=2)

# Create a DataFrame that includes the detected anomalies
df_with_anomalies = detector.create_dataframe_with_anomalies(data)

# Style the DataFrame to highlight anomalies in red
styled_df = detector.style_dataframe(df_with_anomalies)

# Display the styled DataFrame (renders when it is the last expression in a Jupyter cell)
styled_df
```
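To see why `threshold=2` matters for this sample, the Z-Scores of the two numeric columns can be reproduced with plain pandas. This is a sketch, not the package's implementation: `zscores` and `flag_anomalies` are hypothetical helpers, and both the choice of standard deviation (`ddof`) and the strict `>` comparison are assumptions.

```python
import pandas as pd

# Numeric columns of the sample dataset above.
data = pd.DataFrame({
    "Age": [25, 32, 47, 51, 62, 35, 27, 100, 29, 38],
    "Salary": [50000, 54000, 58000, 62000, 65000, 52000, 51000, 200000, 53000, 56000],
})


def zscores(df: pd.DataFrame, ddof: int = 0) -> pd.DataFrame:
    """Z-Score of every value in each numeric column.

    ddof=0 uses the population standard deviation, ddof=1 the sample one.
    Which of the two the package uses is an assumption left open here.
    """
    numeric = df.select_dtypes(include="number")
    return (numeric - numeric.mean()) / numeric.std(ddof=ddof)


def flag_anomalies(df: pd.DataFrame, threshold: float = 3.0, ddof: int = 0) -> pd.DataFrame:
    """Boolean frame: True where |z| is strictly greater than the threshold (assumed)."""
    return zscores(df, ddof=ddof).abs() > threshold


for t in (2, 3):
    flags = flag_anomalies(data, threshold=t)
    # Row labels flagged in at least one numeric column.
    print(t, flags.index[flags.any(axis=1)].tolist())
```

This prints `2 [7]` followed by `3 []`: only row 7 (Age 100, Salary 200000) exceeds |z| = 2, in both columns, and nothing exceeds 3. That outcome holds with either `ddof` value, because with only 10 rows the largest attainable |z| is bounded (3 with the population standard deviation, about 2.85 with the sample one), so a threshold of 3 cannot flag anything in a dataset this small.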
### Explanation

- `Age` and `Salary` are numeric columns; Z-Scores are calculated for them to detect anomalies.
- `Department` is a categorical column and `City` is an object column. They are not included in the Z-Score calculations, but they remain in the dataset.
- `Has_Debt` is a boolean column.

This example shows how to detect anomalies in the numeric columns (`Age` and `Salary`) while leaving the non-numeric columns intact.
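One common way to split a mixed DataFrame like the sample into Z-Score-eligible and pass-through columns is pandas' `DataFrame.select_dtypes`. This sketch is an assumption about how such a split can be done, not a description of the package's internals. In particular, the README does not say whether `Has_Debt` is scored, and `select_dtypes(include="number")` excludes boolean columns. Note also that `Department` is stored as plain text, not pandas' `category` dtype, unless it is converted with `astype("category")`.

```python
import pandas as pd

# Mixed-type frame mirroring the sample dataset (first three rows only).
data = pd.DataFrame({
    "Age": [25, 32, 47],
    "Salary": [50000, 54000, 58000],
    "Department": ["HR", "IT", "Finance"],
    "Has_Debt": [True, False, True],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

# Text columns show up as object (or a string dtype in newer pandas); Has_Debt is bool.
print(data.dtypes)

# include="number" keeps integer and float columns; pandas leaves bool columns out.
numeric_cols = data.select_dtypes(include="number").columns.tolist()
other_cols = [c for c in data.columns if c not in numeric_cols]
print("Z-Score columns:", numeric_cols)
print("Passed through:", other_cols)
```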
### Output

After running the code above, `df_with_anomalies` holds the dataset with the anomalies detected in the numeric columns marked, and `styled_df` highlights those anomalies in red. The highlighting is only visible in a Jupyter notebook or another environment that renders styled pandas DataFrames.
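In pandas, red highlighting of this kind is typically produced with `Styler.apply`. The sketch below is a hypothetical stand-in for `style_dataframe`, not its actual code: `anomaly_css` and `style_anomalies` are illustrative names, and the sketch recomputes Z-Scores itself (population standard deviation and a strict `>` assumed) instead of reading the output of `create_dataframe_with_anomalies`, whose columns the README does not describe.

```python
import pandas as pd


def anomaly_css(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Same-shaped frame of CSS strings: red text where |z| > threshold.

    Non-numeric columns get an empty style. Population std (ddof=0) is assumed.
    """
    numeric = df.select_dtypes(include="number")
    flags = ((numeric - numeric.mean()) / numeric.std(ddof=0)).abs() > threshold
    css = pd.DataFrame("", index=df.index, columns=df.columns)
    for col in flags.columns:
        css[col] = flags[col].map({True: "color: red", False: ""})
    return css


def style_anomalies(df: pd.DataFrame, threshold: float = 3.0):
    # With axis=None, Styler.apply passes the whole DataFrame to the function,
    # which must return CSS strings with the same index and columns.
    # Accessing df.style requires the optional jinja2 dependency.
    return df.style.apply(anomaly_css, axis=None, threshold=threshold)
```

In a notebook, `style_anomalies(data, threshold=2)` as the last expression in a cell renders the highlighted table; outside a notebook, calling `.to_html()` on the returned `Styler` produces the HTML.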
## Download files
### Hashes for zscore_anomaly_detector-0.1.2.tar.gz (source distribution)

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3aad3ab42d9eb3a4f3133d4bfc2db6d137c7d805a4b055816cc81b206139d3f4 |
| MD5 | 84db3f8dcffd261a80400f828e439421 |
| BLAKE2b-256 | 664a329d6ff798a7df4571a158432106688fa1fabdf31a15baada5b5259c2c43 |
### Hashes for zscore_anomaly_detector-0.1.2-py3-none-any.whl (built distribution)

| Algorithm | Hash digest |
|---|---|
| SHA256 | 54c5c6af162fbf286d63c2ca8aba70d32ab0a23f4e5ea9f7b50999c68e1e61a9 |
| MD5 | 57b6fa66cd946b65ffa544ff84203a73 |
| BLAKE2b-256 | 4e96daea2f36ea165f332add7daedc9ecce8975ed86b403e533b9794b5a13d74 |
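A downloaded file can be checked against the published SHA256 digests with the standard library's `hashlib`. This is a minimal sketch; `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, not part of the package.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests copied from the hash tables for release 0.1.2.
EXPECTED_SHA256 = {
    "zscore_anomaly_detector-0.1.2.tar.gz":
        "3aad3ab42d9eb3a4f3133d4bfc2db6d137c7d805a4b055816cc81b206139d3f4",
    "zscore_anomaly_detector-0.1.2-py3-none-any.whl":
        "54c5c6af162fbf286d63c2ca8aba70d32ab0a23f4e5ea9f7b50999c68e1e61a9",
}


def verify(path: str | Path) -> bool:
    """True if the file's SHA256 matches the published digest for its filename."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

Calling `verify("zscore_anomaly_detector-0.1.2.tar.gz")` in the download directory returns `True` only if the file is byte-for-byte the published artifact.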