High-performance audience segmentation engine with Python bindings

ClusterAudienceKit

The ONLY Python library built for customer segmentation in marketing automation

RFM analysis, clustering, segment profiling, and streaming updates — one pip install, one import, one API. Stop stitching together sklearn, pandas, and lifetimes for every marketing project.

Why Star ClusterAudienceKit?

First MarTech-focused library — Built for marketing engineers and data scientists in CDP/marketing ops
No more glue code — All-in-one: RFM analysis + KMeans clustering + segment profiling + streaming updates
Production-ready — Scales to millions of customers, handles real CDP workflows
Streaming support — Update segments as new transactions arrive, not batch-only
Drift detection — Know when segment quality degrades, automatic alerts
MIT licensed — Free for commercial use

Star if you're tired of rebuilding customer segmentation for every campaign or if ClusterAudienceKit powers your CDP.

Installation

# pip
pip install clusteraudiencekit

# uv
uv pip install clusteraudiencekit

# curl (pre-built wheel for CPython 3.13 on macOS 11.0+ ARM64; see INSTALL.md for all platforms)
curl -L -O https://github.com/Mullassery/clusteraudiencekit/releases/download/v0.1.0/clusteraudiencekit-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
pip install ./clusteraudiencekit-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
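
When installing from the downloaded wheel, the file can be checked against the SHA256 digest published for this release before running pip. The following is a minimal sketch using only the standard library; the helper name `sha256_of` is illustrative and is not part of ClusterAudienceKit.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for the 0.1.0 macOS ARM64 wheel.
EXPECTED_SHA256 = "9b9c3ce34c58feb2bb8ba5ea5c4d4f653f2d62bc11186f22cff16d85904a075e"
WHEEL = "clusteraudiencekit-0.1.0-cp313-cp313-macosx_11_0_arm64.whl"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if Path(WHEEL).exists():
        actual = sha256_of(WHEEL)
        print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
    else:
        print(f"{WHEEL} not found; download it first")
```

Run the script in the directory that holds the wheel and install only if it prints OK.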

Quick Example

from clusteraudiencekit import AudienceSegmenter
import pandas as pd

# Load transaction data from your CRM, CDP, or data warehouse
transactions = pd.read_csv('transactions.csv')
# Required columns: customer_id, transaction_date, amount

# Segment customers into marketing groups using RFM + KMeans
segmenter = AudienceSegmenter(method='rfm_kmeans', n_clusters=4)
segmenter.fit(transactions)

# Get segment assignment for each customer
segments = segmenter.predict(transactions)

# View marketing profile for each segment
profiles = segmenter.segment_profiles()
print(profiles)
#   segment | size  | avg_recency | avg_frequency | avg_monetary
#   0       | 250k  | 15.3 days   | 8.2 purchases | $450   <- high-value loyalists
#   1       | 180k  | 45.2 days   | 3.1 purchases | $120   <- regular buyers
#   2       | 320k  | 2.1 days    | 2.0 purchases | $80    <- new / recent
#   3       | 250k  | 60.5 days   | 1.0 purchases | $30    <- at-risk / dormant

# Validate segment quality before using in campaigns
print(f"Silhouette score: {segmenter.silhouette_score():.3f}")

Why ClusterAudienceKit

There is no dedicated Python library for customer segmentation in Martech today. Marketing engineers and data scientists stitch together sklearn, pandas, and lifetimes for every project — writing hundreds of lines of glue code that does not stream, does not detect drift, and fails silently at scale.

ClusterAudienceKit replaces the entire stack:

Capability	scikit-learn	pandas	lifetimes	ClusterAudienceKit
RFM calculation	No	Manual	No	Yes
Customer clustering (KMeans)	Yes	No	No	Yes
Mixed data clustering (K-Prototypes)	No	No	No	Yes
Marketing segment profiles	No	Manual	No	Yes
Segment quality metrics	Yes	No	No	Yes
Streaming / incremental updates	Partial (MiniBatchKMeans.partial_fit)	No	No	Yes
Segment drift detection	No	No	No	Yes
Save / load model state	Via pickle/joblib	No	Yes	Yes
Customer lifetime value (CLV)	No	No	Yes	Planned
Multi-core parallelisation by default	Partial	No	No	Yes

See docs/comparison.md for the full comparison including code examples and benchmarks.

Performance

Timings on an Apple M1 with sklearn 1.6.1 and pandas 3.0.3. The sklearn + pandas column is measured; the ClusterAudienceKit column lists Phase 1 targets:

Customer base	sklearn + pandas	ClusterAudienceKit (Phase 1 target)
1,000	38ms	<9ms
10,000	606ms	<37ms
100,000	>2.7 hours*	<130ms
1,000,000	Would not complete	<470ms

* The sklearn silhouette_score is O(n²) in the number of customers. At 100k customers it takes over 2.7 hours, which is unusable for any Martech team working with real audience sizes. ClusterAudienceKit's Phase 1 targets stay under one second up to 1M customers (see the table above).

See BENCHMARKS.md for full methodology and step-by-step timing breakdowns.
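
The quadratic cost mentioned in the footnote comes from the definition of the silhouette coefficient, which compares every customer with every other customer. The sketch below is a plain reference implementation of the standard definition, written to make the pairwise loop visible. It is not the code used by scikit-learn or by ClusterAudienceKit.

```python
import math
from collections import Counter


def silhouette(points, labels):
    """Mean silhouette coefficient; visits every pair of points, hence O(n^2)."""
    n = len(points)
    sizes = Counter(labels)
    clusters = sorted(sizes)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i in range(n):
        # Sum of distances from point i to the members of each cluster.
        # This inner loop over all other points is the quadratic part.
        dist_sum = {c: 0.0 for c in clusters}
        for j in range(n):
            if i != j:
                dist_sum[labels[j]] += math.dist(points[i], points[j])
        own = labels[i]
        if sizes[own] == 1:
            continue  # singleton clusters contribute 0 by convention
        a = dist_sum[own] / (sizes[own] - 1)  # mean distance within own cluster
        b = min(dist_sum[c] / sizes[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
score = silhouette(points, labels)
print(round(score, 3))  # 0.9
```

With 100,000 customers the two nested loops evaluate roughly 10^10 distances. scikit-learn's silhouette_score accepts a sample_size argument that scores a random subsample instead, trading exactness for time.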

Features

At least 4x (1,000 customers) to 16x (10,000 customers) faster than the sklearn + pandas pipeline for customer segmentation, per the Phase 1 targets in the Performance table
Streaming-first — ingest marketing events and update segments incrementally without full recomputation
Integrated pipeline — RFM, clustering, segment profiles, and quality metrics in one library
Marketing-ready output — segment profiles surface avg recency, frequency, and spend per group
K-Prototypes support — cluster on RFM plus categorical attributes (channel, region, product category)
Drift detection — segment_stability() flags when campaigns or seasonality have shifted your audience
State management — save_state() and load_state() for production Martech pipelines
sklearn-compatible — fit(), predict(), transform() interface; works in existing ML pipelines
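
The feature list does not say which measure segment_stability() computes. One simple way to quantify drift, shown here purely as an assumption, is the fraction of customers whose segment label is unchanged between two snapshots. The function name `label_agreement` is hypothetical.

```python
def label_agreement(previous, current):
    """Fraction of customers present in both snapshots whose segment is unchanged.

    `previous` and `current` map customer_id -> segment label.
    """
    shared = previous.keys() & current.keys()
    if not shared:
        raise ValueError("no customers in common between the two snapshots")
    unchanged = sum(1 for cid in shared if previous[cid] == current[cid])
    return unchanged / len(shared)


previous = {"c1": 0, "c2": 0, "c3": 1, "c4": 2}
current = {"c1": 0, "c2": 1, "c3": 1, "c4": 2, "c5": 3}
print(label_agreement(previous, current))  # 0.75
```

A measure like this is only meaningful while label ids keep their meaning, for example across incremental updates. After a full refit, KMeans may number the same clusters differently, so labels would need to be matched first.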

Customer Segmentation Methods

RFM + KMeans

The Martech industry standard. RFM (Recency, Frequency, Monetary) quantifies each customer's engagement and spend, then KMeans groups them into actionable marketing segments.

segmenter = AudienceSegmenter(method='rfm_kmeans', n_clusters=4)
segmenter.fit(df)
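
For readers new to RFM, the feature step can be sketched without any dependencies. This is an illustration of the general technique, not ClusterAudienceKit's implementation. It assumes monetary means total spend per customer (some RFM variants use average order value), and the function name `rfm_table` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Compute (recency_days, frequency, monetary) per customer.

    `transactions` is an iterable of (customer_id, transaction_date, amount).
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, tx_date, amount in transactions:
        # Recency depends only on the most recent purchase date.
        if customer_id not in last_seen or tx_date > last_seen[customer_id]:
            last_seen[customer_id] = tx_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((as_of - last_seen[cid]).days, frequency[cid], monetary[cid])
        for cid in last_seen
    }


rows = [
    ("a", date(2026, 6, 1), 40.0),
    ("a", date(2026, 6, 15), 60.0),
    ("b", date(2026, 3, 1), 25.0),
]
table = rfm_table(rows, as_of=date(2026, 6, 20))
print(table)
# {'a': (5, 2, 100.0), 'b': (111, 1, 25.0)}
```

Because KMeans is distance-based, the three columns are usually scaled to comparable ranges before clustering so that spend does not dominate recency and frequency.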

RFM + K-Prototypes

Extends RFM with categorical marketing attributes — acquisition channel, product category, geographic region — for richer, more targeted customer segmentation.

segmenter = AudienceSegmenter(method='rfm_kprototypes', n_clusters=5)
segmenter.fit(df, categorical_columns=['channel', 'region', 'product_category'])
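
K-Prototypes handles mixed data by combining two dissimilarities: squared Euclidean distance on the numeric features and a count of mismatches on the categorical ones, weighted by a factor usually called gamma. The sketch below shows that standard measure. Whether ClusterAudienceKit exposes gamma, and under what name, is not stated here, so the parameter and the function name `kprototypes_distance` are assumptions.

```python
def kprototypes_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Mixed dissimilarity: squared Euclidean on the numeric parts plus
    gamma times the number of mismatched categorical values."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * categorical


# Scaled RFM values plus (channel, region, product_category).
customer = ([0.2, 0.5, 0.1], ["email", "EU", "shoes"])
prototype = ([0.1, 0.5, 0.3], ["email", "US", "shoes"])
d = kprototypes_distance(customer[0], customer[1], prototype[0], prototype[1], gamma=0.5)
print(round(d, 4))  # 0.55
```

A larger gamma makes categorical mismatches count for more relative to differences in the RFM values.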

Streaming Segment Updates

Keep customer segments current as marketing events arrive daily, without reprocessing your full customer history:

segmenter.fit(historical_data)
previous_segments = segmenter.predict(customers)   # baseline for the first comparison

for daily_events in event_stream:
    segmenter.update(daily_events)

    stability = segmenter.segment_stability(previous_segments)
    if stability < 0.85:              # significant post-campaign drift
        segmenter.fit(all_data, refit=True)

    previous_segments = segmenter.predict(customers)
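
One way incremental updates can avoid reprocessing full history is to keep a small running aggregate per customer and fold each new batch into it. The sketch below illustrates that idea for the RFM features only. It is an assumption about the general approach, not a description of what update() does internally, and the names `RunningRFM` and `CustomerState` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerState:
    last_purchase: date
    frequency: int = 0
    monetary: float = 0.0


class RunningRFM:
    """Per-customer RFM aggregates that absorb new events in time proportional to the batch."""

    def __init__(self):
        self.state = {}

    def update(self, events):
        """Fold a batch of (customer_id, transaction_date, amount) events into the state."""
        for customer_id, tx_date, amount in events:
            entry = self.state.get(customer_id)
            if entry is None:
                entry = self.state[customer_id] = CustomerState(last_purchase=tx_date)
            entry.last_purchase = max(entry.last_purchase, tx_date)
            entry.frequency += 1
            entry.monetary += amount

    def features(self, as_of):
        """Return {customer_id: (recency_days, frequency, monetary)} as of a date."""
        return {
            cid: ((as_of - s.last_purchase).days, s.frequency, s.monetary)
            for cid, s in self.state.items()
        }


rfm = RunningRFM()
rfm.update([("a", date(2026, 6, 1), 40.0), ("b", date(2026, 6, 2), 10.0)])
rfm.update([("a", date(2026, 6, 10), 60.0)])  # a later batch
result = rfm.features(as_of=date(2026, 6, 20))
print(result)
# {'a': (10, 2, 100.0), 'b': (18, 1, 10.0)}
```

Only the customers that appear in a batch are touched, so the cost of an update depends on the batch size rather than on the size of the customer base.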

Configuration Reference

AudienceSegmenter(
    method='rfm_kmeans',        # 'rfm_kmeans' | 'rfm_kprototypes' | 'kmeans_only'
    n_clusters=4,                # number of customer segments
    recency_window_days=90,      # marketing lookback window in days
    decay_function='linear',     # 'linear' | 'exponential' | 'inverse'
    decay_half_life_days=30,     # half-life for exponential decay weighting
    frequency_threshold=1,       # minimum transactions to include a customer
    monetary_threshold=0.0,      # minimum spend to include a customer
    random_state=42,             # seed for reproducibility
    n_jobs=-1,                   # parallelisation (-1 = all cores)
)
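
The reference above names the three decay options but does not give their formulas. The sketch below shows one common reading of each, as assumptions: a linear ramp over the lookback window, an exponential that halves every half-life, and an inverse of age. The function names are hypothetical and the library's actual weighting may differ.

```python
def linear_decay(age_days, window_days=90):
    """Weight falls from 1 at age 0 to 0 at the end of the lookback window."""
    return max(0.0, 1.0 - age_days / window_days)


def exponential_decay(age_days, half_life_days=30):
    """Weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def inverse_decay(age_days):
    """Weight falls off as 1 / (1 + age in days)."""
    return 1.0 / (1.0 + age_days)


# Compare the three weightings for a purchase made 0, 30, and 90 days ago.
for age in (0, 30, 90):
    print(age, linear_decay(age), exponential_decay(age), round(inverse_decay(age), 4))
```

Under these definitions a 30-day-old purchase keeps half its weight with the exponential option and two thirds with the linear one, while the inverse option discounts it most sharply.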

Documentation

Document	Description
INSTALL.md	pip, uv, and curl installation instructions
docs/api-reference.md	Full API reference for all 13 methods
docs/getting-started-simple.md	Non-technical guide for marketing teams
docs/comparison.md	Detailed comparison vs sklearn, pandas, lifetimes
BENCHMARKS.md	Benchmark methodology and results
docs/troubleshooting.md	Common errors and solutions
docs/architecture.md	Architecture and design decisions
examples/	Runnable example scripts

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.

Bug reports and feature requests: GitHub Issues
Questions and discussion: GitHub Discussions

Authors

Georgi Mammen Mullassery — github.com/Mullassery

License

Released under the MIT License.

Project details

Version: 0.1.0
Released: Jun 20, 2026
Source distribution: none available for this release
Built distribution: clusteraudiencekit-0.1.0-cp313-cp313-macosx_11_0_arm64.whl (231.0 kB)
Tags: CPython 3.13, macOS 11.0+ ARM64
Uploaded using Trusted Publishing: No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for clusteraudiencekit-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`9b9c3ce34c58feb2bb8ba5ea5c4d4f653f2d62bc11186f22cff16d85904a075e`
MD5	`a57bea321b15961e3d7006c87676d900`
BLAKE2b-256	`100132ce1b1a1003a4d3fbc35d48bca50602b59d73b2a0dfae9e24aa41523288`
