parameter-free clustering algorithm
TX-Means
TX-Means is a parameter-free clustering algorithm that efficiently partitions transactional data in a completely automatic way. TX-Means is designed for the case where clustering must be applied to a massive number of different datasets, for instance when a large set of users needs to be analyzed individually and each of them has generated a long history of transactions.
In this repository we provide the source code of TX-Means, the competing clustering algorithms, and the dataset used in:
Riccardo Guidotti, Anna Monreale, Mirco Nanni, Fosca Giannotti, Dino Pedreschi. "Clustering Individual Transactional Data for Masses of Users", KDD 2017, Halifax, NS, Canada.
Please cite the paper above if you use our code or datasets.
Where to get it
The source code is currently hosted on GitHub at: https://github.com/riccotti/TX-Means
How to install
pip install TXMeans
How to import (some examples)
from TXMeans.txmeans import TXmeans
from TXMeans.util import count_items, remap_items, sample_size  # utility functions
from TXMeans.util import basket_list_to_bitarray, basket_bitarray_to_list  # convert baskets to and from bitarrays
from TXMeans.datamanager import read_uci_data  # convert the data into basket format
from TXMeans.validation_measures import delta_k, purity, normalized_mutual_info_score  # validation measures
from TXMeans.util import jaccard_bitarray
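The imports above suggest that baskets are encoded as bit vectors and compared with a Jaccard measure. The signatures of the TXMeans functions are not documented here, so the sketch below does not call them. It is a standalone, standard-library illustration of the general idea, using Python integers as bitmasks. The helper names (build_index, to_bitmask, from_bitmask, jaccard_similarity) are hypothetical and are not part of the package, and whether jaccard_bitarray returns a similarity or a distance is not stated in this description.

```python
def build_index(baskets):
    """Assign each distinct item a consecutive bit position (hypothetical helper)."""
    index = {}
    for basket in baskets:
        for item in basket:
            index.setdefault(item, len(index))
    return index


def to_bitmask(basket, index):
    """Encode a basket (list of items) as an integer with one bit per item."""
    mask = 0
    for item in basket:
        mask |= 1 << index[item]
    return mask


def from_bitmask(mask, index):
    """Decode a bitmask back into a sorted list of items."""
    reverse = {pos: item for item, pos in index.items()}
    return sorted(reverse[pos] for pos in range(len(index)) if (mask >> pos) & 1)


def jaccard_similarity(a, b):
    """Jaccard similarity |A and B| / |A or B| of two bitmask-encoded baskets."""
    union = bin(a | b).count("1")
    if union == 0:
        # Convention chosen for this sketch: two empty baskets are identical.
        return 1.0
    return bin(a & b).count("1") / union


# Toy transactional data, invented for illustration only.
baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["beer"]]
index = build_index(baskets)
masks = [to_bitmask(b, index) for b in baskets]

print(jaccard_similarity(masks[0], masks[1]))  # two shared items out of three
print(jaccard_similarity(masks[0], masks[2]))  # no shared items
```

Encoding baskets as bit vectors turns set intersection and union into single bitwise operations, which is likely why the package relies on the bitarray dependency listed under Requirements.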
Requirements:
- python >= 3
- numpy >= 1.10.1
- pandas >= 0.18.1
- scipy >= 0.17.1
- bitarray >= 0.8.1
- Java >= 8.1