For clustering, plotting, and performance metrics

Project description


Efficient Method for Optimizing Anomaly Detection with Clustering Algorithms
and for Unifying in a Package

This project provides a common platform for anomaly detection built on several popular clustering algorithms, so that analysts can easily check the results on their data against other clustering algorithms.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

The world of data is growing very fast, and handling this massive amount of data is a new challenge for data analysis. Large data sets contain many hidden factors that need to be identified and used by different algorithms. Clustering is one of the significant parts of data mining; the term comes from the idea of grouping data without supervision. Many clustering algorithms have already been implemented, but each of them has limitations, which creates an opportunity to develop new ones. Clustering methods can be separated into six categories: partitioning, hierarchical, density-based, grid-based, model-based, and constraint-based.

The aim of the package is to implement various types of clustering algorithms and to help determine which one is more accurate at detecting impure data in a large data set. Some popular algorithms for anomaly detection are implemented and combined into a single package (AnDe): K-means, DBSCAN, HDBSCAN, Isolation Forest, Local Outlier Factor, and Agglomerative Hierarchical Clustering. The package saves time by removing the implementation hurdles of each algorithm. It also makes the anomaly detection procedure more robust by visualizing the results precisely and by visualizing a comparison of the performance (accuracy, runtime, and memory consumption) of the implemented algorithms.
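The runtime and memory comparison described above can be sketched with the standard-library modules time and tracemalloc. This is a minimal illustration and not the package's actual code: `zscore_outliers` is a hypothetical placeholder detector and `profile` is a hypothetical helper, both introduced only for this example. Any of the listed algorithms could be passed to `profile` in the same way.

```python
import time
import tracemalloc
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Hypothetical placeholder detector: flag values far from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no outliers by this rule.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def profile(detector, values):
    """Run detector(values) and return (result, seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = detector(values)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak


data = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0]
flags, seconds, peak_bytes = profile(zscore_outliers, data)
print(flags, f"{seconds:.6f} s", f"{peak_bytes} bytes")
```

Measuring every algorithm through the same wrapper keeps the numbers comparable, since each run includes the same bookkeeping overhead.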

Built With

To use this package, some popular packages need to be installed in the working environment.


Getting Started

This is an example of how to set up this package and use it in your script.

Prerequisites

First, install the required packages in your working environment.

# Use Python 3.8. Python itself and the standard-library modules
# time, os, and tracemalloc are not installed with pip.
pip install numpy
pip install pandas
pip install matplotlib
pip install scikit-learn
pip install hdbscan
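Before running the package, it can help to confirm that the dependencies are importable. The sketch below uses only `importlib.util.find_spec` from the standard library. The helper name `missing_modules` is hypothetical. Import names are used rather than pip names, so scikit-learn appears as `sklearn`; time, os, and tracemalloc ship with Python and should never be reported as missing.

```python
import importlib.util

# Import names of the third-party prerequisites (pip name scikit-learn -> sklearn).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "hdbscan"]
# Standard-library modules that are available without pip.
STDLIB = ["time", "os", "tracemalloc"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing third-party modules:", missing_modules(REQUIRED))
print("Missing standard-library modules:", missing_modules(STDLIB))
```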

Installation

  1. Download the package from https://github.com/cbiswascse/AUnifiedPackageForAnomalyDetection
  2. Install the package in your environment.
    pip install ande
    
  3. Import the package in your script.
    from ande import ande
    


Usage

  1. Call the cluster function.
    from ande import ande
    ande.ClusterView()
    
  2. Input the location of the CSV file.
    Please, Input the Location of CSV:
    
  3. Select yes (y) if you have categorical data in your dataset.
    Do you want to include Catagorical data [y/n]:
    
  4. Select yes (y) if you want to scale your dataset with MinMaxScaler.
    Scaling data with MinMaxScaler [y/n]:
    
  5. Choose one of the available clustering algorithms: Kmeans, Dbscan, Isolation Forest, Local Outlier Factor, Hdbscan, Agglomerative.
    Choose your Algorithm:
    
  6. Kmeans: input the number of clusters.
    How many clusters you want?:
    
  7. Select one of the average methods for the performance metrics.
    weighted,micro,macro,binary
    
  8. Dbscan: input an epsilon value.
    epsilon in Decimal:
    
  9. Input a min samples value.
    Min Samples In Integer:
    
  10. Select one of the average methods for the performance metrics.
    weighted,micro,macro,binary
    
  11. Hdbscan: input the minimum cluster size.
    Minimun size of clusters you want?:
    
  12. Select one of the average methods for the performance metrics.
    weighted,micro,macro,binary
    
  13. Isolation Forest: input a contamination value.
    Contamination value between [0,0.5]:
    
  14. Select one of the average methods for the performance metrics.
    weighted,micro,macro,binary
    
  15. Local Outlier Factor: input a contamination value.
    Contamination value between [0,0.5]:
    
  16. Select one of the average methods for the performance metrics.
    weighted,micro,macro,binary
    
  17. Agglomerative: input the number of clusters.
    How many clusters you want?:
    
  18. Select one of the average methods for the performance metrics.
    weighted,micro,macro,binary
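Two of the prompts above are easier to answer with a small example in mind. The sketch below is a pure-Python illustration, not the package's code: `min_max_scale` shows the rescaling to [0, 1] that MinMaxScaler performs by default, and `precision` shows how the weighted, micro, macro, and binary options combine per-class scores, following the usual scikit-learn definitions. Both function names and the sample labels are hypothetical.

```python
from collections import Counter


def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def precision(y_true, y_pred, average="binary", pos_label=1):
    """Precision under the four averaging options named in the prompts."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter()         # correct predictions per class
    predicted = Counter(y_pred)  # how often each class was predicted
    support = Counter(y_true)    # how often each class truly occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[p] += 1

    def per_class(label):
        return true_pos[label] / predicted[label] if predicted[label] else 0.0

    if average == "binary":    # score of the positive class only
        return per_class(pos_label)
    if average == "micro":     # pool all predictions, then score once
        return sum(true_pos.values()) / len(y_pred)
    if average == "macro":     # unweighted mean of per-class scores
        return sum(per_class(c) for c in labels) / len(labels)
    if average == "weighted":  # mean weighted by true class frequency
        return sum(per_class(c) * support[c] for c in labels) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


# Hypothetical labels: 1 marks an anomaly, 0 a normal point.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
print(min_max_scale([2, 4, 6]))
for method in ("weighted", "micro", "macro", "binary"):
    print(method, round(precision(y_true, y_pred, method), 3))
```

Anomalies are usually rare, so the choice matters: binary reports only the anomaly class, while weighted and micro are dominated by the much larger normal class.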


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Chandrima Biswas - cbiswascse26@gmail.com

Project Link: https://github.com/cbiswascse/AUnifiedPackageForAnomalyDetection


Acknowledgments

I would like to convey my heartfelt appreciation to my supervisor, Prof. Dr. Doina Logofatu, for all her feedback, guidance, and evaluations during the work. Without her unique ideas, as well as her unwavering support and encouragement, I would never have been able to complete this project. In spite of her hectic schedule, she listened to my problems and gave the appropriate advice.
Furthermore, I express my profound gratitude to Prof. Dr. Peter Nauth for being the second supervisor of this work.
