FraudTransactionDetector

Scalable Fraud Transaction Identifier using Clustering, Anomaly Detection and Classification ML Algorithms

Project description

The objective of this project is to build a classification machine learning model that separates anomalous records from genuine ones, given unclassified/unlabeled data as input. This generic objective applies to many domains, such as healthcare, stock trading, banking, and system security. A few of the use cases are:

Fraudulent Medical Claim detection
Fraudulent Credit Card Transactions
Early detection of insider trading
Intrusion detection

Technologies used

As the module needs to be scalable and handle Big Data involving hundreds of millions of records, I have chosen to use:

Apache Spark
H2O

My Approach

Below is the approach taken and algorithms used to solve the problem at hand:

1. K-Means Clustering from Apache Spark MLlib
- To identify clusters in the given unlabeled data
- Handles Big Data and scales on a cluster of machines
2. Isolation Forest from H2O
- To detect the anomalies in each cluster identified in step 1
- Handles Big Data and works seamlessly with Apache Spark
3. Gradient Boosted Classification Trees from Spark MLlib
- To create an ensemble classification model
- Handles Big Data and scales on a cluster of machines
4. Model optimization using Apache Spark MLlib CrossValidator
5. PCA
- Dimensionality reduction to visualize the data in 3D
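
The hand-off from clustering and anomaly detection to classification can be illustrated with a small standard-library sketch. It is not the package's implementation: it assumes each record already has a cluster id and an anomaly score where a higher score means more anomalous, and it assumes a quantile setting is applied as a per-cluster cut-off. The function names `quantile_value` and `label_anomalies` are hypothetical. The resulting 0/1 labels are the kind of target a classifier could then be trained on.

```python
from collections import defaultdict


def quantile_value(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def label_anomalies(cluster_ids, scores, quantile=0.99):
    """Return 1 for records scoring above their own cluster's cut-off, else 0.

    Assumption: higher score means more anomalous.
    """
    # Group the anomaly scores by cluster.
    scores_by_cluster = defaultdict(list)
    for cluster_id, score in zip(cluster_ids, scores):
        scores_by_cluster[cluster_id].append(score)

    # One cut-off per cluster, so each cluster is judged on its own scale.
    cutoffs = {
        cluster_id: quantile_value(values, quantile)
        for cluster_id, values in scores_by_cluster.items()
    }
    return [
        1 if score > cutoffs[cluster_id] else 0
        for cluster_id, score in zip(cluster_ids, scores)
    ]


# Hypothetical data: two clusters whose scores live on different scales.
cluster_ids = ["a"] * 5 + ["b"] * 5
scores = [0.1, 0.2, 0.1, 0.2, 0.9, 5, 5, 5, 5, 50]
labels = label_anomalies(cluster_ids, scores, quantile=0.8)
```

Computing the cut-off per cluster rather than globally means an outlier in a cluster with small scores is not hidden by a cluster whose scores are larger overall.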

How to import and use the package?

Below is the sample usage:

from fraudtransactiondetector import FraudTransactionClassifier

# num_clusters: number of K-Means clusters; df: the unlabeled input data
classifier = FraudTransactionClassifier(numClusters=num_clusters,
                                        quantile=0.99)

# Train the model and print its validation metrics
classifier.fit(df)
print(classifier.modelValidationMetrics())

# Apply the fitted model to the full training data as a sanity check
results = classifier.transform(df)

# Apply PCA and visualize
classifier.visualizeByApplyingPCA()

# Select the optimal number of clusters using the Elbow Method
classifier.selectOptimalClusters(df)
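
For readers unfamiliar with the Elbow Method, the idea can be sketched in plain Python. This is not how `selectOptimalClusters` is implemented (the source does not show that); it is a minimal illustration that runs a small K-Means for several values of k, records the within-cluster sum of squares (WCSS), and picks the k where the curve bends most, using the largest second difference as one simple heuristic. The helper names `kmeans_wcss` and `elbow_k` and the sample points are hypothetical.

```python
import math
import random


def kmeans_wcss(points, k, iterations=20, seed=0):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centers[i]))
            groups[nearest].append(point)
        # Move each center to the mean of its group; empty groups keep theirs.
        centers = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centers[i]
            for i, group in enumerate(groups)
        ]
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def elbow_k(wcss_by_k):
    """Pick the k with the largest second difference of the WCSS curve."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = ks[0], float("-inf")
    for prev, cur, nxt in zip(ks, ks[1:], ks[2:]):
        bend = wcss_by_k[prev] - 2 * wcss_by_k[cur] + wcss_by_k[nxt]
        if bend > best_bend:
            best_k, best_bend = cur, bend
    return best_k


# Hypothetical 2D points forming three loose groups.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
          (10.0, 10.0), (10.4, 9.8), (9.7, 10.3),
          (20.0, 0.0), (20.3, 0.5), (19.8, -0.4)]
wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 6)}
suggested_k = elbow_k(wcss_by_k)
```

In practice the WCSS curve is usually plotted and inspected by eye as well, since a purely numeric rule can be misled by a poor random initialization.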

Software Requirements

Before installing the package, please ensure that the following software is installed:

Apache Spark 2.4.3 with PySpark

Java (JDK 8)

Along with the package, the following packages are installed when you run 'pip install FraudTransactionDetector':

h2o == 3.30.0.1

pandas == 0.25.1

numpy == 1.16.5

matplotlib == 3.1.3

scikit-learn == 0.21.3

Project details

Release history

0.1.4 (this version), Apr 27, 2020
0.1.3.dev0 (pre-release), Apr 27, 2020
0.1.1.dev0 (pre-release), Apr 26, 2020
0.1.0.dev0 (pre-release), Apr 26, 2020

Download files

Source Distribution

FraudTransactionDetector-0.1.4.tar.gz (913.6 kB)

Uploaded Apr 27, 2020 Source

File details

Details for the file FraudTransactionDetector-0.1.4.tar.gz.

File metadata

Download URL: FraudTransactionDetector-0.1.4.tar.gz
Upload date: Apr 27, 2020
Size: 913.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for FraudTransactionDetector-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`f0452f5fce961731b5ccad0612ca1f795d50a5ae188403afe2d09e956c157562`
MD5	`e14eef863f8a8da5bf46dac8bfe30e0a`
BLAKE2b-256	`80e0533deee8861c2d18bc75f40acc7e14bcc05fd982a48969f3b4c7dbbf6e89`
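
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is illustrative only: the helper names `sha256_of_file` and `matches_expected` are hypothetical, and it assumes the archive was saved locally under its published file name.

```python
import hashlib

# SHA256 digest published for FraudTransactionDetector-0.1.4.tar.gz
EXPECTED_SHA256 = "f0452f5fce961731b5ccad0612ca1f795d50a5ae188403afe2d09e956c157562"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("FraudTransactionDetector-0.1.4.tar.gz")` should return True only if the local file is byte-for-byte identical to the published archive.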
