A customer segmentation package for preprocessing data

Customer Segmentation Package

Overview

This package provides tools for data preprocessing, feature engineering, feature selection and reduction, and clustering. It automates common data analysis tasks so that datasets are easier to prepare for further analysis and machine learning. Its preprocessing functionality covers validating data structure, handling missing values, removing outliers, and scaling data; the remaining tools are listed under Features.

Features

  • Data Preprocessing

    • Data Structure Validation: Ensures that the dataset meets expected structural requirements.
    • Null Value Removal: Identifies and removes or imputes missing values.
    • Outlier Removal: Detects and removes outliers from the dataset.
    • Data Scaling: Standardizes or normalizes data for consistent analysis.
  • Feature Engineering

    • RFM (Recency, Frequency, Monetary) Calculation: Computes RFM metrics for customer segmentation and analysis.
    • Velocity Calculation: Measures the rate of change in data over time.
    • Growth Calculation: Computes growth metrics between successive data points.
  • Feature Selection and Reduction

    • Information Gain Calculation: Evaluates the importance of features in predicting target variables.
    • WOE (Weight of Evidence) and IV (Information Value) Calculation: Assesses the predictive power of categorical features.
    • PCA (Principal Component Analysis): Reduces dimensionality of data and allows for inverse transformation to original space.
  • Advanced Clustering

    • Best Clustering Method Selection: Provides various clustering algorithms (e.g., KMeans, DBSCAN, EM, MeanShift, Agglomerative) and selects the most suitable one based on data characteristics.
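
As an illustration of how the RFM and clustering-selection features fit together, here is a minimal sketch built directly on pandas and scikit-learn. It is not the package's own API: the function names `compute_rfm` and `pick_best_clustering`, the column names (`customer_id`, `order_id`, `order_date`, `amount`), and the toy data are all hypothetical. The description above does not say which criterion the package uses to pick a clustering method, so the silhouette score is assumed here, and only KMeans and Agglomerative clustering are compared.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def compute_rfm(transactions: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate one row per customer: recency (days), frequency, monetary."""
    grouped = transactions.groupby("customer_id")
    return pd.DataFrame({
        # Days between the snapshot date and the customer's latest order.
        "recency": (snapshot_date - grouped["order_date"].max()).dt.days,
        # Number of distinct orders placed by the customer.
        "frequency": grouped["order_id"].nunique(),
        # Total amount spent by the customer.
        "monetary": grouped["amount"].sum(),
    })


def pick_best_clustering(features: np.ndarray, k_values=(2, 3, 4)):
    """Return (name, k, labels, score) with the highest silhouette score.

    Assumption: silhouette score is the selection criterion.
    """
    best = None
    for k in k_values:
        candidates = [
            ("kmeans", KMeans(n_clusters=k, n_init=10, random_state=0)),
            ("agglomerative", AgglomerativeClustering(n_clusters=k)),
        ]
        for name, model in candidates:
            labels = model.fit_predict(features)
            score = silhouette_score(features, labels)
            if best is None or score > best[3]:
                best = (name, k, labels, score)
    return best


# Hypothetical toy data: customers 0-5 order often and recently,
# customers 6-11 order rarely and long ago.
snapshot = pd.Timestamp("2024-01-01")
rows = []
order_id = 0
for customer_id in range(12):
    loyal = customer_id < 6
    for i in range(8 if loyal else 2):
        days_ago = (5 + 3 * i + customer_id) if loyal else (200 + 40 * i + customer_id)
        rows.append({
            "customer_id": customer_id,
            "order_id": order_id,
            "order_date": snapshot - pd.Timedelta(days=days_ago),
            "amount": (100.0 if loyal else 15.0) + customer_id,
        })
        order_id += 1
transactions = pd.DataFrame(rows)

rfm = compute_rfm(transactions, snapshot)
# Scale first so that no single RFM column dominates the distance metric.
scaled = StandardScaler().fit_transform(rfm)
best_name, best_k, best_labels, best_score = pick_best_clustering(scaled)
print(best_name, best_k, round(best_score, 3))
```

Scaling before clustering matters because recency, frequency, and monetary value are on very different scales; without it, the monetary column would dominate the distances.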

Requirements

To use this package, you need to have the following installed:

  • Python 3.7 or higher
  • The following Python libraries:
    • pandas
    • numpy
    • scikit-learn
    • scipy
    • matplotlib
    • seaborn
    • statsmodels

You can install these dependencies using:

pip install -r requirements.txt
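
To confirm that the environment meets the requirements listed above, a short check like the following can help. It is a sketch using only the standard library; the helper name `missing_dependencies` and the mapping `REQUIRED_MODULES` are hypothetical and not part of the package.

```python
import sys
from importlib.util import find_spec

# Maps each pip name listed above to its import name.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the pip names of libraries whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if sys.version_info < (3, 7):
    print("Python 3.7 or higher is required.")

missing = missing_dependencies()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries are importable.")
```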

To import the package, use the following format:

from customer_segmentation_clustering.main import main

Source Distribution

custmr_segmentation-0.1.0.tar.gz (10.3 kB)

Built Distribution

Custmr_segmentation-0.1.0-py3-none-any.whl (12.1 kB)
