An implementation of the MixSAD algorithm for anomaly detection in mixed-feature data.
MixSAD: High-Performance Fraud Detection
This project implements a supervised learning pipeline for fraud detection. It was originally based on the unsupervised MixSAD algorithm, but the model now takes a direct supervised approach, which lets it reach high accuracy and recall on complex fraud detection tasks.
The current implementation is optimized to run on the Kaggle Credit Card Fraud Detection dataset.
Core Approach: Supervised Prediction
The key to the model's high performance is its shift from unsupervised anomaly detection to a direct supervised classification strategy.
Supervised Feature Engineering: The pipeline trains a LogisticRegression model on the labeled data. This model's primary purpose is to generate a powerful, predictive feature: a fraud_score for each transaction, which represents the probability of that transaction being fraudulent.
Threshold-Based Prediction: Instead of using a complex secondary model, predictions are made by applying a simple probability threshold to the fraud_score. Any transaction with a score greater than or equal to the threshold is classified as fraud.
This direct approach is highly effective and transparent, allowing for precise control over the model's sensitivity to fraud.
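The two steps above can be sketched in a few lines of plain Python. This is only an illustration, not the package's code: the coefficients `WEIGHTS` and `BIAS` and the helper names `fraud_score` and `predict` are hypothetical stand-ins for a fitted logistic regression. With scikit-learn, the equivalent score is typically read from `LogisticRegression.predict_proba(X)[:, 1]`.

```python
import math

# Hypothetical coefficients standing in for a fitted logistic regression.
# In a real pipeline these would be learned from labeled transactions.
WEIGHTS = [1.8, -0.6]
BIAS = -2.0


def fraud_score(features):
    """Return the modeled fraud probability via the logistic (sigmoid) function."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, threshold=0.5):
    """Classify as fraud (1) when the score is greater than or equal to the threshold."""
    return 1 if fraud_score(features) >= threshold else 0


# Made-up feature vectors, used only to exercise the two functions.
transactions = [[0.1, 0.5], [2.5, 0.2], [1.0, 1.0]]
for tx in transactions:
    print(tx, round(fraud_score(tx), 3), predict(tx, threshold=0.5))
```

Because the decision is a single comparison against the score, changing `threshold` is enough to make the classifier more or less sensitive without retraining anything.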
Project Structure
mixsad/: The main package source code, including the pipeline, preprocessor, feature_engineer, and prediction_builder.
examples/: Contains the run_on_kaggle_data.py script demonstrating how to use the package.
pyproject.toml: The package configuration file.
README.md: This file.
Setup and Installation
Local Setup
Clone the repository and navigate into it.
Create a virtual environment with python -m venv venv, then activate it.
Install requirements: pip install -r requirements.txt
Install the package in editable mode: pip install -e .
Usage
Download the Dataset:
Download the "Credit Card Fraud Detection Dataset" from Kaggle.
Rename the file to credit_card_fraud.csv and place it in the project's root directory.
Run the Example: Execute the example script to see the model in action:
python examples/run_on_kaggle_data.py
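If the script cannot find or parse the data, a quick check of the CSV can help narrow down the problem. The sketch below uses only the standard library. The helper name `summarize_labels` is hypothetical, and the `Class` label column is an assumption based on the public Kaggle dataset's schema, so adjust it if your file differs. The demo writes a tiny made-up file so that it runs without the real dataset.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def summarize_labels(csv_path, label_column="Class"):
    """Count how many rows carry each label value in the given CSV file."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected dataset at {path.resolve()}")
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        if label_column not in (reader.fieldnames or []):
            raise KeyError(f"Column {label_column!r} not found in {path.name}")
        # Values are read as strings, so the keys are "0" and "1", not integers.
        return Counter(row[label_column] for row in reader)


# Demo on a three-row stand-in file (made-up values, not real transactions).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "credit_card_fraud.csv"
    sample.write_text("Amount,Class\n10.0,0\n250.0,1\n3.5,0\n")
    sample_counts = summarize_labels(sample)

print(sample_counts)
```

Against the real file, the call would be `summarize_labels("credit_card_fraud.csv")` from the project's root directory. A heavily skewed count toward the non-fraud label is expected for this kind of data.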
Fine-Tuning for High Performance 🎯
For fraud detection, missing a real case of fraud (low recall) is usually much worse than flagging a legitimate transaction for review (low precision). The primary way to fine-tune this model is by adjusting the probability threshold.
Adjusting the Prediction Threshold
The run method of the pipeline accepts a threshold parameter.
A higher threshold (e.g., 0.7) makes the model more conservative. It will only flag transactions it is very confident are fraudulent. This leads to high precision but lower recall.
A lower threshold (e.g., 0.3) makes the model more sensitive. It will flag transactions that have even a small chance of being fraudulent. This leads to high recall but lower precision.
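The trade-off described in the two points above can be checked on a handful of toy scores. The data below is made up for illustration, and `precision_recall` is a hypothetical helper, not part of the package.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when scores >= threshold are flagged as fraud."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy fraud scores with their true labels (1 = fraud, 0 = legitimate).
scores = [0.95, 0.80, 0.45, 0.35, 0.60, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.7, 0.3):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data the 0.7 threshold flags only the two highest-scoring frauds (precision 1.00, recall 0.50), while the 0.3 threshold catches all four frauds at the cost of one false alarm (precision 0.80, recall 1.00).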
The examples/run_on_kaggle_data.py script demonstrates this principle by running the pipeline with two different thresholds to show how it directly impacts the precision-recall trade-off.
# The example script shows how to adjust the threshold
# to meet the goal of >90% recall for fraud.
pipeline.run(df_features, true_labels, threshold=0.30)
By adjusting this single parameter, you can configure the model to meet the specific business requirements of your fraud detection system.
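When the business requirement is stated as a recall target, one way to choose the threshold is to scan candidate values from high to low and keep the first one that meets the target. The sketch below assumes you already have fraud scores and true labels for a validation set. The function name `highest_threshold_for_recall` and the toy data are hypothetical.

```python
def highest_threshold_for_recall(scores, labels, target_recall):
    """Return the largest candidate threshold whose recall meets the target, or None."""
    total_fraud = sum(labels)
    if total_fraud == 0:
        return None  # recall is undefined without any positive examples
    # Candidates are the distinct scores. Recall can only grow as the threshold
    # drops, so the first candidate that meets the target is the largest one.
    for threshold in sorted(set(scores), reverse=True):
        caught = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        if caught / total_fraud >= target_recall:
            return threshold
    return None


# Toy validation scores with their true labels (1 = fraud, 0 = legitimate).
val_scores = [0.95, 0.80, 0.45, 0.35, 0.60, 0.20, 0.10, 0.05]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0]

chosen = highest_threshold_for_recall(val_scores, val_labels, target_recall=0.90)
print("chosen threshold:", chosen)
```

Picking the highest qualifying threshold keeps precision as high as the recall target allows. A threshold tuned on the same data used to report results tends to look better than it will in production, so a held-out validation split is the safer place to run this search.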
File details
Details for the file mixsad_anomaly_detection-0.1.0.tar.gz.
File metadata
- Download URL: mixsad_anomaly_detection-0.1.0.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c68e176ea786f5daf4db839996f5923f872b0bdc27f9c9c274d32d0a1e7a376e |
| MD5 | c5b082895387638e813a5e86c9c12de4 |
| BLAKE2b-256 | b18deedd3db5b9eec373931f7bd62f16c090084043b30939508992253a7caca8 |
File details
Details for the file mixsad_anomaly_detection-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mixsad_anomaly_detection-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fc6f00177219fc000a40358905ed08df199d70e3083e14a639aa02085720c87 |
| MD5 | 5d282f77016efacbd7117815b51a1ad5 |
| BLAKE2b-256 | 4ebe2e80c9dc8c5abded627d0a2fde08bca1ca9c8d4b386877ef57ad46ff7ecc |