
For clustering, plotting and performance metrics



Efficient Method for Optimizing Anomaly Detection with Clustering Algorithms and for Unifying in a Package

This package creates a common platform for anomaly detection with several popular clustering algorithms. It offers data analysts an easy way to verify the same data with other clustering algorithms.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

The world of data is growing very fast, and handling such massive amounts of data is a new challenge that calls for new methods of data analysis. Large datasets contain many hidden factors that need to be identified and used by different algorithms. Clustering is one of the significant parts of data mining; the term comes from the idea of grouping unlabeled (unsupervised) data. Many clustering algorithms have been implemented, but all of them have limitations, which creates an opportunity to develop new ones. Clustering methods can be divided into six categories: partitioning, hierarchical, density-based, grid-based, model-based and constraint-based.

The aim of this package is to implement various clustering algorithms and to help determine which one is more accurate at detecting impure data in a large dataset. To create a common platform, some popular anomaly detection algorithms were implemented and combined into a single package (AnDe): K-means, DBSCAN, HDBSCAN, Isolation Forest, Local Outlier Factor and Agglomerative Hierarchical Clustering. The package saves time by removing the implementation hurdles of each algorithm. It also makes the anomaly detection procedure more robust by visualizing the results precisely, along with a visual comparison of the performance (accuracy, runtime and memory consumption) of the implemented algorithms.
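The runtime and memory comparison described above can be approximated with Python's standard-library `time` and `tracemalloc` modules, which the Prerequisites mention. The following is a minimal sketch, not the package's own implementation; `profile_call` and `toy_detector` are hypothetical names used only for illustration, and the toy detector (a simple z-score rule) stands in for any of the real algorithms.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes).

    Hypothetical helper: times the call with a monotonic clock and records
    the peak memory allocated by Python objects while the call runs.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
    return result, elapsed, peak


def toy_detector(values, threshold=3.0):
    """Hypothetical stand-in detector: flag values more than
    `threshold` standard deviations away from the mean (1 = anomaly)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [1 if std and abs(v - mean) > threshold * std else 0 for v in values]


if __name__ == "__main__":
    data = [1.0] * 99 + [100.0]
    labels, seconds, peak = profile_call(toy_detector, data)
    print(f"anomalies={sum(labels)} runtime={seconds:.6f}s peak={peak} bytes")
```

Wrapping each algorithm in the same helper keeps the measurements comparable; note that `tracemalloc` only sees memory allocated through Python's allocator, so memory used internally by some native extensions may not be counted.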

Built With

To use this package, several popular packages need to be installed in the working environment.


Getting Started

This section shows how to set up the package and use it in your script.

Prerequisites

First, install the following dependencies in your working environment.

Python 3.8 is required. The time, os and tracemalloc modules are part of the Python standard library, so they do not need to be installed with pip.

pip install numpy
pip install pandas
pip install matplotlib
pip install scikit-learn
pip install hdbscan
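Before running the package, a quick way to confirm which third-party dependencies are importable is `importlib.util.find_spec`, which locates a module without importing it. This is a small illustrative check, not part of the package; `THIRD_PARTY`, `PIP_NAMES` and `missing_modules` are hypothetical names. Note that the pip name and the import name differ for scikit-learn, which is imported as `sklearn`.

```python
import importlib.util

# Import names of the third-party dependencies.
THIRD_PARTY = ["numpy", "pandas", "matplotlib", "sklearn", "hdbscan"]

# pip names that differ from the import names above.
PIP_NAMES = {"sklearn": "scikit-learn"}


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY)
    if missing:
        # Print one pip command covering everything still to be installed.
        print("pip install " + " ".join(PIP_NAMES.get(m, m) for m in missing))
    else:
        print("All third-party dependencies are available.")
```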

Installation

  1. Download the package from https://github.com/cbiswascse/AUnifiedPackageForAnomalyDetection
  2. Install the package in your environment.
    pip install ande

  3. Import the package in your script.
    from ande import ande



Usage

  1. Import the package and call the ClusterView function.
    from ande import ande
    ande.ClusterView()

  2. Enter the location (file path) of the CSV file when prompted.
    Please, Input the Location of CSV:

  3. Enter yes (y) if your dataset contains categorical data.
    Do you want to include Catagorical data [y/n]:

  4. Enter yes (y) if you want to scale your dataset with MinMaxScaler.
    Scaling data with MinMaxScaler [y/n]:

  5. Choose one of the available algorithms: Kmeans, Dbscan, Isolation Forest, Local Outlier Factor, Hdbscan or Agglomerative.
    Choose your Algorithm:
    
  6. Kmeans Clustering: enter the number of clusters.
    How many clusters you want?:

  7. Select one of the averaging methods for the performance metrics.
    weighted,micro,macro,binary

  8. Dbscan: enter an epsilon value (the neighborhood radius, as a decimal number).
    epsilon in Decimal:

  9. Enter a min samples value (the minimum number of points in a neighborhood, including the point itself, for that point to count as a core point; an integer).
    Min Samples In Integer:

  10. Select one of the averaging methods for the performance metrics.
    weighted,micro,macro,binary

  11. Hdbscan: enter the minimum cluster size (the smallest group of points treated as a cluster).
    Minimun size of clusters you want?:

  12. Select one of the averaging methods for the performance metrics.
    weighted,micro,macro,binary

  13. Isolation Forest: enter a contamination value (the expected proportion of anomalies in the dataset).
    Contamination value between [0,0.5]:

  14. Select one of the averaging methods for the performance metrics.
    weighted,micro,macro,binary

  15. Local Outlier Factor: enter a contamination value (the expected proportion of anomalies in the dataset).
    Contamination value between [0,0.5]:

  16. Select one of the averaging methods for the performance metrics.
    weighted,micro,macro,binary

  17. Agglomerative: enter the number of clusters.
    How many clusters you want?:

  18. Select one of the averaging methods for the performance metrics.
    weighted,micro,macro,binary
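Each algorithm ends by asking for an averaging method. The four names match the `average` options of scikit-learn's `precision_score`, `recall_score` and `f1_score`. As a hedged, dependency-free sketch (not the package's code), precision under each option can be computed as below; `precision` is a hypothetical helper, and a label that is never predicted is scored as 0, which mirrors scikit-learn's default zero-division result.

```python
from collections import Counter


def precision(y_true, y_pred, average="binary", pos_label=1):
    """Precision under scikit-learn-style averaging (illustrative sketch).

    binary   - precision of pos_label only
    micro    - pool true positives and predictions over all labels
    macro    - unweighted mean of per-label precision
    weighted - per-label precision weighted by its support in y_true
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    support = Counter(y_true)

    def per_label(label):
        # A label that was never predicted has undefined precision; use 0.
        return tp[label] / predicted[label] if predicted[label] else 0.0

    if average == "binary":
        return per_label(pos_label)
    if average == "micro":
        return sum(tp.values()) / len(y_pred)
    if average == "macro":
        return sum(per_label(lbl) for lbl in labels) / len(labels)
    if average == "weighted":
        return sum(per_label(lbl) * support[lbl] for lbl in labels) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1]  # 1 = anomaly
    y_pred = [0, 0, 1, 0, 1, 0]
    for avg in ("binary", "micro", "macro", "weighted"):
        print(avg, round(precision(y_true, y_pred, average=avg), 3))
    # binary 0.5, micro 0.667, macro 0.625, weighted 0.667
```

With imbalanced anomaly data, the choice matters: `binary` scores only the anomaly label, while `micro` and `weighted` are dominated by the majority (normal) label. Some detectors label anomalies as -1 rather than 1, in which case `pos_label` must be set accordingly.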


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Chandrima Biswas - cbiswascse26@gmail.com

Project Link: https://github.com/cbiswascse/AUnifiedPackageForAnomalyDetection


Acknowledgments

I would like to convey my heartfelt appreciation to my supervisor, Prof. Dr. Doina Logofatu, for all her feedback, guidance and evaluations during this work. Without her unique ideas and her unwavering support and encouragement, I would never have been able to complete this project. In spite of her hectic schedule, she listened to my problems and gave me appropriate advice.
Furthermore, I express my profound gratitude to Prof. Dr. Peter Nauth for being the second supervisor of this work.



Download files


Source Distribution

ande-0.0.1.tar.gz (4.6 kB)

Built Distribution

ande-0.0.1-py3-none-any.whl (4.6 kB)
