
Anomaly Detection

Project description

DATASOL-package

This package was made for Seagate. It lets the user detect anomalies in data.

To install the package: pip install datasol

The Creating_Dataframe function removes unneeded columns from the data. It can also convert categorical columns to numeric values, fill NAs with a KNN imputer, and save the result to a file.

Input: a pandas DataFrame, the columns you want to keep, and the date column. The remaining parameters are optional; set any of them to True to enable it:

categorical_to_numeric converts categorical columns to numeric values.

fillna fills NA values using a KNN imputer.

save saves the result to a CSV file.

Output: None if save is True; otherwise, the cleaned DataFrame.
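The cleaning steps described for Creating_Dataframe can be sketched with pandas and scikit-learn's KNNImputer. This is not the datasol source code: the function name creating_dataframe_sketch, the path and n_neighbors parameters, and the choice to encode categories as integer codes are assumptions made for illustration.

```python
import pandas as pd
from sklearn.impute import KNNImputer


def creating_dataframe_sketch(df, keep_columns, date_column,
                              categorical_to_numeric=False, fillna=False,
                              save=False, path="clean_data.csv", n_neighbors=2):
    # Hypothetical sketch of the Creating_Dataframe steps, not datasol's code.
    # Keep the date column plus the requested columns; drop everything else.
    columns = [date_column] + [c for c in keep_columns if c != date_column]
    out = df[columns].copy()
    out[date_column] = pd.to_datetime(out[date_column])
    value_cols = columns[1:]

    if categorical_to_numeric:
        for col in value_cols:
            if not pd.api.types.is_numeric_dtype(out[col]):
                # Encode categories as integer codes (sorted category order);
                # pandas uses -1 for missing values, so map those back to NaN.
                codes = out[col].astype("category").cat.codes
                out[col] = codes.where(codes >= 0).astype(float)

    if fillna:
        # Fill NAs in numeric columns from the n_neighbors most similar rows.
        numeric_cols = out[value_cols].select_dtypes("number").columns
        if len(numeric_cols) > 0:
            imputer = KNNImputer(n_neighbors=n_neighbors)
            out[numeric_cols] = imputer.fit_transform(out[numeric_cols])

    if save:
        out.to_csv(path, index=False)
        return None
    return out


raw = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
    "temp": [1.0, None, 3.0, 4.0],
    "state": ["ok", "fail", "ok", "ok"],
    "junk": [9, 9, 9, 9],
})
clean = creating_dataframe_sketch(raw, ["temp", "state"], "date",
                                  categorical_to_numeric=True, fillna=True)
print(clean)
```

Running categorical encoding before imputation means the imputer can use the encoded categories as features when looking for similar rows.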

The Load_Dataframe function reads a CSV file and returns the cleaned DataFrame.

Input: the file path, the columns you want to keep, and the date column.

Output: a DataFrame.
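A minimal sketch of what Load_Dataframe might do, assuming that "cleaning" means keeping only the listed columns, parsing the date column, and sorting by date. The name load_dataframe_sketch is hypothetical, and datasol's actual cleaning may differ.

```python
import io

import pandas as pd


def load_dataframe_sketch(path_or_buffer, keep_columns, date_column):
    # Hypothetical sketch, not datasol's Load_Dataframe implementation.
    usecols = [date_column] + [c for c in keep_columns if c != date_column]
    # usecols skips unlisted columns at read time; parse_dates converts the date column.
    df = pd.read_csv(path_or_buffer, usecols=usecols, parse_dates=[date_column])
    # read_csv ignores the order of usecols, so restore it, then sort by time.
    return df[usecols].sort_values(date_column).reset_index(drop=True)


csv_text = "date,temp,junk\n2023-01-02,5,x\n2023-01-01,4,y\n"
df = load_dataframe_sketch(io.StringIO(csv_text), ["temp"], "date")
print(df)
```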

The Anomaly_Detetion function returns, for each column, a list of the anomalous data points.

Input: a DataFrame (this works best with the DataFrame returned by the previous functions) and time_step, which you must calculate yourself. For example, when preparing training data, you read the values from the training time-series file and normalize them. If there is one value every 5 minutes for 14 days:

24 * 60 / 5 = 288 time steps per day

288 * 14 = 4032 data points in total
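The arithmetic above, plus one common way to turn the normalized series into overlapping windows of time_step values, is sketched below. The z-score normalization and the one-sample sliding window resemble the Keras "Timeseries anomaly detection using an Autoencoder" example; whether datasol windows the data exactly this way is an assumption, and normalize and create_sequences are hypothetical helper names.

```python
import numpy as np

SAMPLE_INTERVAL_MIN = 5   # one reading every 5 minutes
DAYS = 14                 # length of the training period

steps_per_day = 24 * 60 // SAMPLE_INTERVAL_MIN   # 288
total_points = steps_per_day * DAYS              # 4032


def normalize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def create_sequences(values, time_steps):
    """Stack overlapping windows of length time_steps, sliding one sample at a time."""
    values = np.asarray(values)
    return np.stack([values[i:i + time_steps]
                     for i in range(len(values) - time_steps + 1)])


# Synthetic stand-in for the real training series (assumption for illustration).
raw = np.random.default_rng(0).normal(loc=20.0, scale=3.0, size=total_points)
windows = create_sequences(normalize(raw), time_steps=steps_per_day)
print(steps_per_day, total_points, windows.shape)  # 288 4032 (3745, 288)
```

With 4032 points and a window of 288, there are 4032 - 288 + 1 = 3745 windows.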

*Choosing the right number of time steps gives better anomaly-detection results.*

Model settings: epochs, batch_size, patience, and layer1-4.

epochs: the number of complete passes the model makes over the entire training dataset. One epoch means every training sample has been used once.

batch_size: the number of samples passed through the model at each training iteration.

patience: the number of consecutive epochs without improvement that are allowed before training stops early.
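The three training settings can be illustrated without a deep-learning framework. The sketch below counts how many batches make up one epoch and replays the patience rule on a list of validation losses. It mirrors the behavior of the Keras EarlyStopping callback with its default min_delta of 0, but whether datasol uses Keras, and which loss it monitors, are assumptions; the helper names are hypothetical.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Training steps needed to see every sample once (one epoch)."""
    return math.ceil(n_samples / batch_size)


def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop early.

    Training stops once `patience` consecutive epochs pass without the
    validation loss beating the best value seen so far.
    """
    best = math.inf
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # all epochs ran without stopping early


print(iterations_per_epoch(4032, 128))  # 32
print(early_stopping_epoch([0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.3, 0.2], patience=3))  # 6
```

In the second call, the loss stops improving after epoch 3, so training halts at epoch 6 and never reaches the lower losses at epochs 7 and 8. This is the trade-off patience controls.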

layer1-4: the base model has four layers, and you can change how many filters each layer has to improve performance.

The threshold is set at 99%; in other words, the 1% of data points most likely to be anomalies are flagged.

visual: if set to True, the function plots the data together with the detected anomalies.

The function returns the anomaly list.
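The 99% threshold and the per-column anomaly lists can be sketched as a quantile cut on per-column anomaly scores. The source does not say what datasol uses as the score (a model's reconstruction error is one common choice), so the scores DataFrame and the anomaly_lists helper are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def anomaly_lists(scores: pd.DataFrame, quantile: float = 0.99) -> dict:
    """Flag the top (1 - quantile) share of scores in each column.

    Hypothetical sketch: `scores` holds one anomaly score per row and column,
    where higher means more unusual. Returns a dict mapping each column name
    to the index labels whose score exceeds that column's quantile threshold.
    """
    result = {}
    for col in scores.columns:
        threshold = scores[col].quantile(quantile)  # linear interpolation by default
        mask = (scores[col] > threshold).to_numpy()
        result[col] = scores.index[mask].tolist()
    return result


scores = pd.DataFrame({"temp": np.arange(100, dtype=float),
                       "pressure": np.zeros(100)})
scores.loc[42, "pressure"] = 10.0  # a single spike
print(anomaly_lists(scores))  # {'temp': [99], 'pressure': [42]}
```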

[Image: original data frame]

[Image: data frame after Creating_Dataframe]

[Image: output of Anomaly_Detetion]

Download files


Source Distribution

datasol-0.0.101.tar.gz (5.5 kB)

Uploaded Source

Built Distribution


datasol-0.0.101-py3-none-any.whl (5.5 kB)

Uploaded Python 3

File details

Details for the file datasol-0.0.101.tar.gz.

File metadata

  • Download URL: datasol-0.0.101.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.2

File hashes

Hashes for datasol-0.0.101.tar.gz
Algorithm Hash digest
SHA256 10dabfff7ee5cd22a58dd04735d91f702b7e77dc69e3b004e1350a0d381e53bc
MD5 1b6b2d58ec2d62da7e6b242482de183a
BLAKE2b-256 11f75fee87cd2b80faec5eb4c00fdc19b04908daac3178429548aeffbedfe0f5


File details

Details for the file datasol-0.0.101-py3-none-any.whl.

File metadata

  • Download URL: datasol-0.0.101-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.2

File hashes

Hashes for datasol-0.0.101-py3-none-any.whl
Algorithm Hash digest
SHA256 fa8f8b28be52cf5de4a2db93681aa6b3a8618a1df007391a37353f3984eec867
MD5 be95cf6beb57d75646ec99f61029ded4
BLAKE2b-256 d4f588afd6fc7082909fae689a9e8e23f3d4f5a7e2fb2fb2fa7b1f4e28cef7e8

