An implementation of the MixSAD algorithm for anomaly detection in mixed-feature data.

MixSAD: High-Performance Fraud Detection

This project implements a supervised learning pipeline for fraud detection. Originally based on the unsupervised MixSAD algorithm, the model now uses a direct supervised approach, which lets it reach high accuracy and recall on complex fraud detection tasks.

The current implementation is optimized to run on the Kaggle Credit Card Fraud Detection dataset.

Core Approach: Supervised Prediction

The key to the model's high performance is its shift from unsupervised anomaly detection to a direct supervised classification strategy.

Supervised Feature Engineering: The pipeline trains a LogisticRegression model on the labeled data. This model's primary purpose is to generate a powerful, predictive feature: a fraud_score for each transaction, which represents the probability of that transaction being fraudulent.

Threshold-Based Prediction: Instead of using a complex secondary model, predictions are made by applying a simple probability threshold to the fraud_score. Any transaction with a score greater than or equal to the threshold is classified as fraud.

This direct approach is highly effective and transparent, allowing for precise control over the model's sensitivity to fraud.
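The two steps above can be sketched in a few lines. This is a minimal illustration, not the package's code: the pipeline is described as using a LogisticRegression model, while the sketch swaps in a small hand-written logistic regression trained by gradient descent so that it runs with the standard library alone. The function names (fit_logistic, fraud_scores, predict) and the synthetic two-feature data are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fraud_scores(X, w, b):
    """Step 1: turn each row into a fraud probability (the fraud_score)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def predict(scores, threshold=0.5):
    """Step 2: flag a transaction as fraud when score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical synthetic data: 200 legitimate rows and 20 fraudulent rows.
random.seed(0)
legit = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(200)]
fraud = [[random.gauss(3.0, 1.0), random.gauss(3.0, 1.0)] for _ in range(20)]
X = legit + fraud
y = [0] * 200 + [1] * 20

w, b = fit_logistic(X, y)
scores = fraud_scores(X, w, b)
flags = predict(scores, threshold=0.5)
print(f"flagged {sum(flags)} of {len(flags)} transactions")
```

Because the prediction is a single comparison against the score, changing the threshold changes the flagged set without retraining the model.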

Project Structure

mixsad/: The main package source code, including the pipeline, preprocessor, feature_engineer, and prediction_builder.

examples/: Contains the run_on_kaggle_data.py script demonstrating how to use the package.

pyproject.toml: The package configuration file.

README.md: This file.

Setup and Installation

Local Setup

1. Clone the repository and navigate into it.
2. Create a virtual environment with python -m venv venv, then activate it.
3. Install requirements: pip install -r requirements.txt
4. Install the package in editable mode: pip install -e .

Usage

Download the Dataset:

Download the "Credit Card Fraud Detection Dataset" from Kaggle.

Rename the file to credit_card_fraud.csv and place it in the project's root directory.

Run the Example: Execute the example script to see the model in action:

python examples/run_on_kaggle_data.py

Fine-Tuning for High Performance 🎯

For fraud detection, missing a real case of fraud (low recall) is usually much worse than flagging a legitimate transaction for review (low precision). The primary way to fine-tune this model is by adjusting the probability threshold.

Adjusting the Prediction Threshold

The run method of the pipeline accepts a threshold parameter.

A higher threshold (e.g., 0.7) makes the model more conservative. It will only flag transactions it is very confident are fraudulent. This leads to high precision but lower recall.

A lower threshold (e.g., 0.3) makes the model more sensitive. It will flag transactions that have even a small chance of being fraudulent. This leads to high recall but lower precision.
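To make the trade-off concrete, here is a small sketch that computes precision and recall at the two thresholds mentioned above. The labels and scores are made-up toy values, and precision_recall is a hypothetical helper written for this example, not part of the package.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall for the fraud class at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: six legitimate transactions followed by four fraudulent ones.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.75, 0.32, 0.55, 0.80, 0.95]

for threshold in (0.7, 0.3):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.7 precision=0.67 recall=0.50
# threshold=0.3 precision=0.57 recall=1.00
```

On this toy data, lowering the threshold from 0.7 to 0.3 catches every fraudulent transaction at the cost of two extra false alarms.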

The examples/run_on_kaggle_data.py script demonstrates this principle by running the pipeline with two different thresholds to show how it directly impacts the precision-recall trade-off.

# The example script shows how to adjust the threshold
# to meet the goal of >90% recall for fraud.
pipeline.run(df_features, true_labels, threshold=0.30)

By adjusting this single parameter, you can configure the model to meet the specific business requirements of your fraud detection system.
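One way to pick that parameter for a recall goal is to scan candidate thresholds from high to low and stop at the first one that meets the target. The sketch below assumes you already have labels and fraud scores for a validation set; both helper names and the toy values are hypothetical and not part of the package.

```python
def recall_at(y_true, scores, threshold):
    """Fraction of true fraud cases with score >= threshold."""
    positives = sum(y_true)
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    return tp / positives if positives else 0.0


def highest_threshold_for_recall(y_true, scores, target_recall=0.90):
    """Return the largest observed score whose recall exceeds the target.

    Recall can only grow as the threshold drops, so the first match
    while scanning from the top is the most conservative valid choice.
    """
    for threshold in sorted(set(scores), reverse=True):
        if recall_at(y_true, scores, threshold) > target_recall:
            return threshold
    return None  # no threshold reaches the target (e.g. no fraud labels)


# Toy validation data: six legitimate rows followed by four fraudulent rows.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.75, 0.32, 0.55, 0.80, 0.95]

best = highest_threshold_for_recall(y_true, scores, target_recall=0.90)
print(best)  # 0.32
```

A threshold chosen this way should be checked on data that was not used to select it, since the recall measured on the selection set tends to be optimistic.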

Release history

0.1.0 (this version): Jul 20, 2025
0.0.1: Jan 13, 2022

Download files

Source Distribution: mixsad_anomaly_detection-0.1.0.tar.gz (5.7 kB), uploaded Jul 20, 2025
Built Distribution: mixsad_anomaly_detection-0.1.0-py3-none-any.whl (7.1 kB), uploaded Jul 20, 2025, Python 3

File details

Details for the file mixsad_anomaly_detection-0.1.0.tar.gz.

File metadata

Download URL: mixsad_anomaly_detection-0.1.0.tar.gz
Upload date: Jul 20, 2025
Size: 5.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for mixsad_anomaly_detection-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c68e176ea786f5daf4db839996f5923f872b0bdc27f9c9c274d32d0a1e7a376e`
MD5	`c5b082895387638e813a5e86c9c12de4`
BLAKE2b-256	`b18deedd3db5b9eec373931f7bd62f16c090084043b30939508992253a7caca8`


File details

Details for the file mixsad_anomaly_detection-0.1.0-py3-none-any.whl.

File metadata

Download URL: mixsad_anomaly_detection-0.1.0-py3-none-any.whl
Upload date: Jul 20, 2025
Size: 7.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for mixsad_anomaly_detection-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6fc6f00177219fc000a40358905ed08df199d70e3083e14a639aa02085720c87`
MD5	`5d282f77016efacbd7117815b51a1ad5`
BLAKE2b-256	`4ebe2e80c9dc8c5abded627d0a2fde08bca1ca9c8d4b386877ef57ad46ff7ecc`

