FraudTransactionDetector

Scalable Fraud Transaction Identifier using Clustering, Anomaly Detection and Classification ML Algorithms

Project description

The general objective of this project is to identify clusters in the data and then find the anomalies (outliers) within each cluster. This labels every data point as either an anomaly or a genuine one. With these labels, we can train a classification model that separates, say, fraudulent transactions from genuine ones. The approach can be applied to many use cases, such as:

Fraudulent Medical Claim detection
Fraudulent Credit Card Transactions
Early detection of insider trading
System Security
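
The cluster-then-flag idea described above can be sketched in a few lines of plain Python. This is only an illustration and not the package's implementation: it uses a small hand-written k-means instead of Spark MLlib, and it swaps Isolation Forest for a much simpler rule (a point is flagged when its distance to its cluster centroid exceeds a multiple of that cluster's median distance). The function names, the `factor` parameter and the sample points are all assumptions made for this example.

```python
import math
import statistics


def farthest_point_init(points, k):
    """Pick k starting centroids deterministically: the first point, then
    repeatedly the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=50):
    """Plain k-means; returns (centroids, cluster index per point)."""
    centroids = farthest_point_init(points, k)
    assignments = []
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, assignments


def flag_outliers(points, k, factor=3.0):
    """Return one boolean per point: True if the point lies more than
    `factor` times the median centroid distance of its own cluster."""
    centroids, assignments = kmeans(points, k)
    distances = [math.dist(p, centroids[a]) for p, a in zip(points, assignments)]
    flags = [False] * len(points)
    for c in range(k):
        idx = [i for i, a in enumerate(assignments) if a == c]
        if not idx:
            continue
        limit = factor * statistics.median(distances[i] for i in idx)
        for i in idx:
            flags[i] = distances[i] > limit
    return flags


# Hypothetical two-feature records: two tight groups plus one stray point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (0.8, 0.9),
    (10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.2), (9.8, 9.9),
    (4.0, 4.0),
]
flags = flag_outliers(points, k=2)
```

Here only the last point, `(4.0, 4.0)`, is flagged. The resulting booleans play the role of the anomaly/genuine labels that the classification step is later trained on.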

Technologies used

As the package needs to be scalable and handle big data involving hundreds of millions of records, I have chosen to use:

Apache Spark
H2O

My Approach

Below is the approach taken and algorithms used to solve the problem at hand:

K-Means Clustering from Apache Spark MLlib to identify clusters
Isolation Forest from H2O to detect the anomalies
PCA to visualize the data in 3D by reducing the number of dimensions
Gradient Boosted Classification Trees from Spark MLlib to create the classification model
Model optimization using Apache Spark MLlib Cross Validator
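
To make the last step more concrete, here is a minimal plain-Python sketch of k-fold cross-validation used to choose a hyperparameter, which is the general idea behind a cross validator. It does not use Spark, and it substitutes a tiny nearest-neighbour classifier for gradient boosted trees so that the example stays short; the function names, the one-feature rows and the candidate values are assumptions made for illustration.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train, x, n_neighbors):
    """Majority label among the n_neighbors training rows closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(rows, candidates, k=3):
    """Score each candidate on every held-out fold and return
    (best_candidate, mean_accuracy_by_candidate)."""
    folds = k_fold_indices(len(rows), k)
    mean_accuracy = {}
    for n_neighbors in candidates:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            # Fit on everything outside the fold, evaluate on the fold.
            train = [r for i, r in enumerate(rows) if i not in held]
            test = [rows[i] for i in held_out]
            hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
            fold_scores.append(hits / len(test))
        mean_accuracy[n_neighbors] = sum(fold_scores) / len(fold_scores)
    best = max(candidates, key=lambda c: mean_accuracy[c])
    return best, mean_accuracy


# Hypothetical (feature, label) rows; label 1 stands for "fraud".
rows = [
    (0.1, 0), (0.9, 1), (0.2, 0), (0.8, 1), (0.3, 0),
    (0.7, 1), (0.15, 0), (0.85, 1), (0.25, 0),
]
# Odd neighbour counts avoid tied votes between the two labels.
best, mean_accuracy = cross_validate(rows, candidates=[1, 3, 5], k=3)
```

Each candidate is judged only on rows it was not fitted on, so the chosen value reflects held-out performance rather than fit to the training data.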

How to import and use the package?

Project details

Release history

0.1.4 (Apr 27, 2020)
0.1.3.dev0, pre-release (Apr 27, 2020)
0.1.1.dev0, pre-release (Apr 26, 2020), this version
0.1.0.dev0, pre-release (Apr 26, 2020)

Download files

Source Distribution

FraudTransactionDetector-0.1.1.dev0.tar.gz (2.0 kB)

Uploaded Apr 26, 2020 Source

Hashes for FraudTransactionDetector-0.1.1.dev0.tar.gz

Algorithm	Hash digest
SHA256	`b2b906c1e48a5fed109475c9632359fe4cca662db19732b01301b5a2edc19509`
MD5	`f80248d3e05402265d344125af611029`
BLAKE2b-256	`7c88f2ae273f0d12ce6e77416ab5c613b98607766fa61309d3d9a605b9f14682`