A useful package for temporal clustering

A Python package for temporal clustering.

Introduction

Overview

Temporal clustering is a popular unsupervised machine-learning task that groups time series according to common temporal trends. It has applications to census, finance, and healthcare datasets, among others.

The open-source tscluster toolbox provides a range of temporal clustering methods, both traditional and novel, as illustrated below:

[Figure: example of Time Series Clustering (left) and Sequence Labelling Analysis (right)]

As shown above, existing temporal clustering methods in the literature fall into two categories:

  • Time Series Clustering (TSC) [Aghabozorgi et al., 2014]: The left panel above shows an example of TSC, which groups time series (treated as multidimensional vectors) based on a similarity metric (e.g., Euclidean distance). While cluster centers change over time, the cluster label of each entity remains constant. For example, one can identify groups of similar stocks by clustering them on their daily price data.
  • Sequence Labelling Analysis (SLA) [Delmelle et al., 2016]: The right panel above shows an example of SLA, which assumes constant (non-changing) cluster centers but, unlike TSC, allows the cluster label of each entity to change over time. For example, SLA could be used to identify trends in gentrification, as indicated by a census tract transitioning from a low-income, high-unemployment cluster label to a high-income, low-unemployment cluster label.
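
To make the contrast concrete, the following minimal sketch applies only the assignment step of each approach to made-up data with fixed, hand-picked centers. It is not tscluster code: the `series` values, both sets of centers, and the `nearest` helper are illustrative assumptions.

```python
# Toy data (made up): 3 entities, 3 time steps, 1 feature.
series = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.2, 2.1, 3.1],
    "C": [9.0, 9.0, 1.0],
}


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


# TSC: every center is a whole time series, so each entity gets ONE label.
tsc_centers = [[1.0, 2.0, 3.0], [9.0, 9.0, 9.0]]  # hypothetical centers
tsc_labels = {name: nearest(x, tsc_centers) for name, x in series.items()}

# SLA: centers are fixed points in feature space, so each entity gets
# one label PER time step and may move between clusters.
sla_centers = [[2.0], [9.0]]  # hypothetical centers
sla_labels = {
    name: [nearest([value], sla_centers) for value in x]
    for name, x in series.items()
}

print(tsc_labels)
print(sla_labels)
```

In this toy example, entity C keeps a single label under TSC but switches cluster at the last time step under SLA.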

While tscluster supports both TSC and SLA in a common framework, it also provides novel combinations of these methods (e.g., allowing both dynamic cluster centers and cluster labels) as we outline next.

[Table: temporal clustering methods organized by cluster centers (rows) and label changes (columns)]

In this table, we organize all clustering methods according to two choices:

  1. In the rows we can choose to have either static (unchanging) cluster centers or dynamic (changing) cluster centers over time.
  2. In the columns we can choose how labels are allowed to change over time: static (no label change), unbounded (unlimited label change), or bounded (an upper limit on the number of label changes allowed).

One of the most important novel capabilities of tscluster is Bounded Fully Dynamic clustering (middle bottom), which identifies the (anomalous) entities that diverge most from existing dynamic trends. As an example use case in census analysis, it can identify census tracts that change due to external forces (e.g., significant rezoning).
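
As a rough illustration of what a bound on label changes means, the sketch below counts label changes in a set of per-entity label sequences and checks them against a budget. It assumes the bound applies to the total number of changes across all entities; the entity names, label sequences, and `max_changes` value are hypothetical, and the code is independent of tscluster's API.

```python
def count_label_changes(label_sequences):
    """Count label changes per entity between consecutive time steps."""
    return {
        name: sum(prev != curr for prev, curr in zip(labels, labels[1:]))
        for name, labels in label_sequences.items()
    }


# Hypothetical label sequences for three entities over four time steps.
labels = {
    "tract_1": [0, 0, 0, 0],
    "tract_2": [0, 0, 1, 1],
    "tract_3": [2, 2, 2, 2],
}

changes = count_label_changes(labels)
total_changes = sum(changes.values())
max_changes = 1  # hypothetical budget on the total number of label changes

# A bounded scheme only admits labelings whose total stays within the budget.
within_budget = total_changes <= max_changes
# Entities that used part of the budget are the candidates for "anomalous".
changed_entities = [name for name, c in changes.items() if c > 0]

print(changes, within_budget, changed_entities)
```

With a small budget, only the entities that benefit most from switching clusters can change label, which is what singles them out.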

Purpose and Benefits

With tscluster, you can:

  • Use the opttscluster subpackage to cluster temporal data using any combination of static or dynamic cluster labels and centers, with optimality guarantees provided by Mixed Integer Linear Programming (MILP).

  • Use the opttscluster subpackage to find the entities that are most likely to change cluster label assignment when a total of n label changes is allowed.
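
For intuition, one plausible way to write such a problem as a mixed integer linear program is sketched below. This is an assumption for illustration, not necessarily the formulation opttscluster solves: the L1 distance, the sum objective, the big-M constant $M$, and all symbol names are choices made here. $x_{i,t}$ is the feature vector of entity $i$ at time $t$, $c_{t,k}$ the center of cluster $k$ at time $t$, $z_{i,t,k}$ a binary assignment variable, and $y_{i,t}$ a binary indicator that entity $i$ changes label between $t$ and $t+1$.

```latex
\begin{align*}
\min_{z,\,c,\,d,\,y} \quad & \sum_{i=1}^{N} \sum_{t=1}^{T} d_{i,t} \\
\text{s.t.} \quad
& \sum_{k=1}^{K} z_{i,t,k} = 1 && \forall i, t \\
& d_{i,t} \ge \lVert x_{i,t} - c_{t,k} \rVert_1 - M\,(1 - z_{i,t,k}) && \forall i, t, k \\
& y_{i,t} \ge z_{i,t,k} - z_{i,t+1,k} && \forall i, k,\; t < T \\
& \sum_{i=1}^{N} \sum_{t=1}^{T-1} y_{i,t} \le n \\
& z_{i,t,k},\, y_{i,t} \in \{0, 1\}, \quad d_{i,t} \ge 0
\end{align*}
```

In this sketch, static centers correspond to adding $c_{t,k} = c_k$ for all $t$, static labels to $n = 0$, and unbounded label changes to dropping the budget constraint.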

tscluster also covers the two existing approaches by providing the following classes in its tskmeans subpackage:

  • TSKmeans class for TSC (built on top of tslearn).
  • TSGlobalKmeans class for SLA (built on top of sklearn).

tscluster also implements utility tools in the following subpackages to help with temporal clustering tasks:

  • preprocessing: Preprocesses and loads temporal data. Data can be loaded from a directory, a file, a list of pandas DataFrames, or a NumPy array file (.npy).

  • metric: Contains temporal clustering evaluation metrics such as inertia and max_dist.
  • tsplot: Generates 2D time series plots and 3D waterfall plots of all features in the temporal data, together with the cluster centers.
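
The two metrics named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the metric subpackage itself: inertia is taken to be the sum of squared Euclidean distances to the assigned centers (the scikit-learn convention), max_dist the largest such distance, and the nested-list data layout (time step, then entity, then feature) is chosen here for readability.

```python
import math


def point_distances(X, centers, labels):
    """Euclidean distance from each observation to its assigned center.

    X[t][n]       -> feature list of entity n at time step t
    centers[t][k] -> feature list of cluster k at time step t
    labels[n][t]  -> cluster index of entity n at time step t
    """
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centers[t][labels[n][t]])))
        for t, step in enumerate(X)
        for n, x in enumerate(step)
    ]


def inertia(X, centers, labels):
    """Sum of squared distances (assumed definition, as in scikit-learn)."""
    return sum(d ** 2 for d in point_distances(X, centers, labels))


def max_dist(X, centers, labels):
    """Largest distance between any observation and its assigned center."""
    return max(point_distances(X, centers, labels))


# Made-up data: 2 time steps, 2 entities, 2 features, 2 clusters.
X = [[[0.0, 0.0], [4.0, 0.0]], [[0.0, 3.0], [4.0, 4.0]]]
centers = [[[0.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [4.0, 0.0]]]
labels = [[0, 0], [1, 1]]

print(inertia(X, centers, labels), max_dist(X, centers, labels))
```

Inertia summarizes the overall fit, while max_dist is driven entirely by the single worst-fitting observation, so the two can rank clusterings differently.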

License

This software is distributed under the MIT License.

Installation

To install, run:

pip install tscluster

Alternatively, install the pre-release version via git:

pip install git+https://github.com/tscluster-project/tscluster.git
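
After either command, one way to confirm which version ended up in the current environment is to query the installed package metadata with the standard library. The `installed_version` helper below is a hypothetical convenience written for this example and is not part of tscluster.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tscluster")
if version is None:
    print("tscluster is not installed in this environment")
else:
    print(f"tscluster {version} is installed")
```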

See the complete documentation: https://tscluster.readthedocs.io/en/latest/index.html

