Anomaly Detection
Project description
DATASOL-package
This package was made for Seagate. It allows the user to detect anomalies in their data.
To install the package:
pip install datasol
The Creating_Dataframe function cleans the data by dropping unneeded columns. It can also convert categorical columns to numeric, fill NA values with a KNN imputer, and save the result to a file.
input - a pandas data frame, the columns that you want to keep, and the date column. The remaining arguments are optional and are enabled by setting them to True:
categorical_to_numeric converts categorical columns to numeric.
fillna fills the NA values with a KNN imputer.
save saves the result to a CSV file.
output - None if save is set to True, otherwise the data frame.
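The cleaning steps described above can be sketched with pandas and scikit-learn. This is only an illustration of the technique, not the datasol implementation: the helper name `clean_frame`, its signature, the sample data, and the choice of two neighbours are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def clean_frame(df, keep_columns, date_column,
                categorical_to_numeric=False, fillna=False):
    """Hypothetical helper that mirrors the described steps."""
    # Keep only the date column and the requested columns.
    out = df[[date_column] + list(keep_columns)].copy()
    out[date_column] = pd.to_datetime(out[date_column])

    if categorical_to_numeric:
        for col in keep_columns:
            if not pd.api.types.is_numeric_dtype(out[col]):
                # Integer code per category; -1 marks a missing value.
                codes = out[col].astype("category").cat.codes.astype("float64")
                out[col] = codes.where(codes >= 0)

    if fillna:
        # Fill each missing value from the nearest rows (assumed k = 2).
        numeric = out[list(keep_columns)].select_dtypes(include="number").columns
        out[numeric] = KNNImputer(n_neighbors=2).fit_transform(out[numeric])
    return out


# Made-up sample data for illustration only.
raw = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
    "temp": [1.0, 2.0, np.nan, 4.0],
    "state": ["ok", "bad", "ok", "ok"],
    "unused": [0, 0, 0, 0],
})
cleaned = clean_frame(raw, ["temp", "state"], "date",
                      categorical_to_numeric=True, fillna=True)
print(cleaned)
```

Converting categories before imputing lets the imputer use every kept column when it measures how close two rows are.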
The Load_Dataframe function reads the CSV file and returns the data frame after cleaning.
input - the file directory, the columns that you want to keep, and the date column.
output - the data frame.
The Anomaly_Detetion function creates, for each column, a list of the anomalous data.
input - a data frame; it works better if you use the output data frame from the previous functions.
time_step should be calculated by the user. Example: prepare the training data by getting the data values from the training time series data file and normalizing them. Suppose we have a value every 5 minutes for 14 days:
24 * 60 / 5 = 288 timesteps per day
288 * 14 = 4032 data points in total
*The right number of time steps will provide better results for anomaly detection.*
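The arithmetic above can be checked with a short sketch. It also shows one common way a time_step value is used with sequence models: cutting the normalized series into overlapping windows. Whether datasol windows the data this way is an assumption; the function names are hypothetical and the series is made up.

```python
from statistics import mean, pstdev


def timesteps_per_day(sample_minutes):
    """Number of samples in one day at the given sampling interval."""
    return 24 * 60 // sample_minutes


def normalize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def make_windows(values, time_step):
    """Overlapping windows of length time_step, one per start index."""
    return [values[i:i + time_step]
            for i in range(len(values) - time_step + 1)]


time_step = timesteps_per_day(5)   # one value every 5 minutes
total_points = time_step * 14      # 14 days of data

# Made-up series standing in for the real training values.
series = normalize([float(i % time_step) for i in range(total_points)])
windows = make_windows(series, time_step)
print(time_step, total_points, len(windows))
```

With these numbers there are 4032 - 288 + 1 windows, each covering one full day of samples.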
model settings - epochs, batch_size, patience, layer1-4
epochs - the number of complete passes over the training data. One epoch means that every training sample has been used once to train the machine learning model.
batch_size - the number of samples passed into the training loop at each iteration.
patience - the number of epochs without improvement that the model is allowed before early stopping ends the training.
layers 1-4 - the base model has 4 layers; you can change how many filters each layer has to improve performance.
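How epochs, batch_size, and patience interact can be illustrated without a deep learning framework. The sketch below is a simplified model of early stopping, not datasol's training loop; the function names and the loss values are invented for the example.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Batches needed to see every training sample once."""
    return math.ceil(n_samples / batch_size)


def epochs_run(val_losses, patience):
    """Return how many epochs run before early stopping triggers.

    Training stops once `patience` consecutive epochs fail to
    improve on the best loss seen so far.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Invented validation losses, one per epoch.
losses = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]
stopped_early = epochs_run(losses, patience=3)
ran_to_end = epochs_run(losses, patience=5)
batches = iterations_per_epoch(1000, 128)
print(stopped_early, ran_to_end, batches)
```

In this made-up run a patience of 3 stops training before the late improvement in the last epoch, while a patience of 5 reaches it, which is why a larger patience can help when the loss is noisy.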
The threshold is set to 99%. In other words, the most problematic 1% of the data is treated as having a high chance of being an anomaly.
visual - if set to True, the function returns a plot of the data and the anomalies.
The function returns the anomaly list.
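A 99% threshold can be sketched as a percentile cut over per-point anomaly scores. What datasol scores internally is not stated here, so the scores below are invented and the function names are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated percentile, with q between 0 and 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def flag_anomalies(scores, q=99):
    """Indices whose score lies above the q-th percentile."""
    threshold = percentile(scores, q)
    return [i for i, s in enumerate(scores) if s > threshold]


# Invented scores: 200 points with two obvious spikes.
scores = [float(i % 10) for i in range(200)]
scores[50] = 100.0
scores[120] = 80.0

anomalies = flag_anomalies(scores)
print(anomalies)
```

Because the cut is relative, roughly 1% of the points are flagged even when the data contains no real anomaly, so the flagged points are candidates to inspect rather than confirmed faults.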
Original data frame:
After Creating_Dataframe:
Output of Anomaly_Detetion:
File details
Details for the file datasol-0.0.101.tar.gz.
File metadata
- Download URL: datasol-0.0.101.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10dabfff7ee5cd22a58dd04735d91f702b7e77dc69e3b004e1350a0d381e53bc |
| MD5 | 1b6b2d58ec2d62da7e6b242482de183a |
| BLAKE2b-256 | 11f75fee87cd2b80faec5eb4c00fdc19b04908daac3178429548aeffbedfe0f5 |
File details
Details for the file datasol-0.0.101-py3-none-any.whl.
File metadata
- Download URL: datasol-0.0.101-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa8f8b28be52cf5de4a2db93681aa6b3a8618a1df007391a37353f3984eec867 |
| MD5 | be95cf6beb57d75646ec99f61029ded4 |
| BLAKE2b-256 | d4f588afd6fc7082909fae689a9e8e23f3d4f5a7e2fb2fb2fa7b1f4e28cef7e8 |