A customer segmentation package for preprocessing data
Customer Segmentation Package
Overview
This package provides tools for preprocessing, feature engineering, feature selection and reduction, and clustering of customer data. It automates common data analysis tasks so that datasets are easier to prepare for further analysis and machine learning. Preprocessing covers validating data structure, handling missing values, removing outliers, and scaling data.
Features
Data Preprocessing
- Data Structure Validation: Ensures that the dataset meets expected structural requirements.
- Null Value Removal: Identifies and removes or imputes missing values.
- Outlier Removal: Detects and removes outliers from the dataset.
- Data Scaling: Standardizes or normalizes data for consistent analysis.
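The preprocessing steps listed above can be sketched with plain pandas. This is a minimal illustration, not the package's own implementation: the `preprocess` function, its parameters, the column names, and the choice of an IQR rule and z-score scaling are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(df, required_columns, iqr_factor=1.5):
    """Hypothetical helper: validate, drop nulls, remove IQR outliers, z-score scale."""
    # Structure validation: fail early if an expected column is absent.
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Null handling: this sketch drops rows with a missing required value
    # (imputation would be the alternative).
    clean = df.dropna(subset=required_columns)

    # Outlier removal: keep rows inside [Q1 - k*IQR, Q3 + k*IQR] on every column.
    values = clean[required_columns]
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    inside = ((values >= lower) & (values <= upper)).all(axis=1)
    clean = clean[inside]

    # Scaling: standardize each column to zero mean and unit variance.
    values = clean[required_columns]
    return (values - values.mean()) / values.std(ddof=0)


# Made-up data: one extreme spend value and one missing value.
raw = pd.DataFrame({
    "spend": [10.0, 12.0, 11.0, 13.0, 500.0, np.nan],
    "visits": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0],
})
scaled = preprocess(raw, ["spend", "visits"])
```

Dropping nulls before computing quartiles keeps the outlier bounds from being skewed by incomplete rows; a column with zero variance after filtering would need special handling that this sketch omits.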
Feature Engineering
- RFM (Recency, Frequency, Monetary) Calculation: Computes RFM metrics for customer segmentation and analysis.
- Velocity Calculation: Measures the rate of change in data over time.
- Growth Calculation: Computes growth metrics across data points.
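As an illustration of the RFM metrics mentioned above, the following sketch computes them from a transaction log with pandas. The `compute_rfm` function, the column names (`customer_id`, `order_date`, `amount`), and the snapshot date are hypothetical and are not taken from the package.

```python
import pandas as pd

# Hypothetical transaction log; column names are assumptions for this sketch.
transactions = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b", "c"],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2024-02-20", "2024-03-10", "2023-12-15",
    ]),
    "amount": [50.0, 70.0, 20.0, 30.0, 25.0, 200.0],
})


def compute_rfm(df, snapshot_date):
    """Return one row per customer with recency (days), frequency, and monetary value."""
    grouped = df.groupby("customer_id")
    return pd.DataFrame({
        # Recency: days between the snapshot date and the customer's latest order.
        "recency": (snapshot_date - grouped["order_date"].max()).dt.days,
        # Frequency: number of orders placed by the customer.
        "frequency": grouped["order_date"].count(),
        # Monetary: total amount spent by the customer.
        "monetary": grouped["amount"].sum(),
    })


rfm = compute_rfm(transactions, pd.Timestamp("2024-03-31"))
```

The resulting frame is indexed by `customer_id`, so it can be scaled and passed to a clustering step directly.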
Feature Selection and Reduction
- Information Gain Calculation: Evaluates the importance of features in predicting target variables.
- WOE (Weight of Evidence) and IV (Information Value) Calculation: Assesses the predictive power of categorical features.
- PCA (Principal Component Analysis): Reduces dimensionality of data and allows for inverse transformation to original space.
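To make the PCA item concrete, here is a short sketch of dimensionality reduction followed by the inverse transformation back to the original feature space. It calls scikit-learn's `PCA` directly on synthetic data; how the package wraps this step is not documented here, so treat the example as an assumption about typical usage.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: the third feature is almost a linear combination of the
# first two, so two principal components capture nearly all of the variance.
base = rng.normal(size=(200, 2))
third = base[:, 0] + base[:, 1] + 0.01 * rng.normal(size=200)
X = np.column_stack([base[:, 0], base[:, 1], third])

# Project from 3 features down to 2 components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(X)

# Map the reduced data back to the original 3-feature space.
restored = pca.inverse_transform(reduced)

# Mean squared reconstruction error: small when little variance was discarded.
reconstruction_error = np.mean((X - restored) ** 2)
```

The inverse transformation is lossy in general: it recovers the original data only up to the variance carried by the discarded components.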
Advanced Clustering
- Best Clustering Method Selection: Provides various clustering algorithms (e.g., KMeans, DBSCAN, EM, MeanShift, Agglomerative) and selects the most suitable one based on data characteristics.
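One common way to choose among clustering algorithms is to fit each candidate and compare an internal quality score. The sketch below does this with the silhouette score in scikit-learn. The selection criterion the package actually uses is not stated in this description, so the silhouette score, the candidate set, and the fixed cluster count of 3 are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Synthetic data: three well-separated blobs of 50 points each.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(50, 2)) for c in centers])

# Candidate models; "em" is expectation-maximization via a Gaussian mixture.
candidates = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
    "em": GaussianMixture(n_components=3, random_state=0),
}

scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X)
    # The silhouette score is only defined for two or more distinct labels.
    if len(set(labels)) > 1:
        scores[name] = silhouette_score(X, labels)

# Pick the candidate with the highest silhouette score.
best_method = max(scores, key=scores.get)
```

Density-based methods such as DBSCAN and MeanShift choose the number of clusters themselves and may label points as noise, so comparing them on the same footing needs extra care that this sketch leaves out.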
Requirements
To use this package, you need to have the following installed:
- Python 3.7 or higher
- The following Python libraries:
  - pandas
  - numpy
  - scikit-learn
  - scipy
  - matplotlib
  - seaborn
  - statsmodels
You can install these dependencies using:
pip install -r requirements.txt
To import the package, use the following format:
from customer_segmentation_clustering import main
Download files
File details
Details for the file custmr_segmentation-0.0.0.tar.gz.
File metadata
- Download URL: custmr_segmentation-0.0.0.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | a3c984ac426e152816f901c6f71e4d4f07cd88aeb101882bb5315e176aa902b6
MD5 | 260c147224dc48baec3f2c16a83415c9
BLAKE2b-256 | a066948900cee9ed9be3685c721bf809fe7a4012aacb7bdc2d6af48a3bb12d1b
File details
Details for the file Custmr_segmentation-0.0.0-py3-none-any.whl.
File metadata
- Download URL: Custmr_segmentation-0.0.0-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | aff98590fdd22e7278a5197af4601628fdf4ec59c3a319e1e1969d7a2b800821
MD5 | b1887b772b383d1c3aae9b8fbfea5b80
BLAKE2b-256 | efc1d5ee9f0bdf6fe7a00719fd9d950f26c47343e6dc2b4c6b73f271cf3aa743