Spatio-temporal cluster analysis with density-based distance augmentation
SCADDA
SCADDA is a tool for spatio-temporal clustering with density-based distance re-scaling. It is based on the ST-DBSCAN algorithm, an earlier extension of the widely used DBSCAN algorithm. SCADDA adds two extensions. First, the computed distance matrix is re-scaled using kernel density estimation over the spatial dimensions together with a modulated logistic function. Second, time series distances are measured with dynamic time warping (DTW), which uses a Sakoe-Chiba band as a global constraint, modified with the Paliwal adjustment window.
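The temporal distance can be illustrated with a minimal sketch of DTW restricted to a Sakoe-Chiba band of half-width `window`, which is the role played by window_param. This is a plain dynamic-programming version written for illustration and is not SCADDA's implementation. The function name `dtw_sakoe_chiba` is hypothetical, the local cost is assumed to be the absolute difference, and the Paliwal adjustment is not included.

```python
import math

def dtw_sakoe_chiba(a, b, window):
    """DTW distance between two sequences, restricted to a Sakoe-Chiba band.

    Only cells with |i - j| <= window are filled. The band is widened to at
    least |len(a) - len(b)| so that the final cell stays reachable.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


x = [1, 2, 3]
y = [2, 3, 4]
print(dtw_sakoe_chiba(x, y, window=1))  # 2.0: the shifted series can partly align
print(dtw_sakoe_chiba(x, y, window=0))  # 3.0: a band of width 0 is a lock-step comparison
```

A narrow band keeps DTW cheap and prevents pathological alignments. A wider band tolerates larger time shifts between otherwise similar series.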
SCADDA is motivated by real-world spatio-temporal clustering problems in which the spatial distribution of data points is highly uneven: many data points are concentrated in a few centers, while the rest are sparsely distributed across the spatial dimensions. SCADDA's approach takes this geographical imbalance into account when searching for clusters. This alleviates an issue with ST-DBSCAN, which labels most data points outside such high-density regions as outliers. SCADDA is a general-purpose tool that can cluster any spatio-temporal dataset in which each data point has a known latitude and longitude as well as a time series for a variable.
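The density-based re-scaling described above can be illustrated with the following sketch. SCADDA's exact kernel, bandwidth and form of the modulated logistic function are not given here, so every detail is an assumption: a Gaussian kernel on raw longitude/latitude values, densities divided by their maximum, a logistic weight centred at 0.5, and each pairwise distance multiplied by the mean weight of its two endpoints. All function names are hypothetical. The intended effect matches the motivation: distances between points in sparse regions shrink relative to those in dense centers, so fewer sparse points become outliers.

```python
import math

def kde_density(points, bandwidth):
    """Unnormalised Gaussian kernel density at each (lon, lat) point; only relative values matter."""
    densities = []
    for p in points:
        total = sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2)) for q in points)
        densities.append(total / len(points))
    return densities

def logistic_weights(densities, steepness, midpoint=0.5):
    """Scale densities to [0, 1] by their maximum and pass them through a logistic curve."""
    top = max(densities)
    return [1.0 / (1.0 + math.exp(-steepness * (d / top - midpoint))) for d in densities]

def rescale_distances(dist, weights):
    """Multiply each pairwise distance by the mean weight of its two endpoints."""
    n = len(dist)
    return [[dist[i][j] * 0.5 * (weights[i] + weights[j]) for j in range(n)]
            for i in range(n)]


# Three nearby points and one isolated point (longitude, latitude in degrees).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
raw = [[math.dist(p, q) for q in points] for p in points]
weights = logistic_weights(kde_density(points, bandwidth=0.5), steepness=10)
scaled = rescale_distances(raw, weights)
print([round(w, 3) for w in weights])  # the isolated point receives the smallest weight
```

A larger `steepness` makes the transition between "sparse" and "dense" weights sharper. This mirrors the role the steepness parameter plays in SCADDA, although the exact curve may differ.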
Installation
SCADDA can be installed via PyPI, with a single command in the terminal:
```shell
pip install scadda
```
Alternatively, the file scadda.py can be downloaded from the scadda folder in this repository and used locally by placing it in the project's working directory. Installing via the terminal is, however, highly recommended: pip checks the package requirements and automatically installs or updates any missing dependencies, which spares the user the effort of troubleshooting and installing them manually.
Quickstart guide
SCADDA requires the spatial data (s_data) as an Nx2 array for N data points, with longitudes in the first column and latitudes in the second. It also requires one time series per spatial data point (t_data) as an NxM array, where M is the length of the time series. The user must also provide the following values:

- the maximum spatial (s_limit) and temporal (t_limit) distances for points to be considered part of the same cluster;
- the steepness of the logistic function used for the distance re-scaling (steepness);
- the minimum number of neighbors required for a cluster (minimum_neighbors).

In addition, the user can set the window size for the Paliwal adjustment window (window_param). This parameter is optional and defaults to a data-dependent rule-of-thumb value.
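The roles of s_limit, t_limit and minimum_neighbors can be seen in a simplified ST-DBSCAN-style sketch over precomputed spatial and temporal distance matrices. A point's neighbors must lie within both limits, and a point with at least minimum_neighbors neighbors seeds or expands a cluster. This illustrates the general idea and is not SCADDA's code. The function name, the label -1 for outliers and the convention of not counting a point as its own neighbor are assumptions, and further refinements from the original ST-DBSCAN algorithm are omitted.

```python
from collections import deque

NOISE = -1  # assumed label for outliers

def st_dbscan(s_dist, t_dist, s_limit, t_limit, minimum_neighbors):
    """Simplified ST-DBSCAN-style clustering on precomputed NxN distance matrices."""
    n = len(s_dist)
    labels = [None] * n  # None marks points that have not been visited yet

    def neighbors(i):
        # A neighbor must be close in space AND in time; the point itself is not counted.
        return [j for j in range(n)
                if j != i and s_dist[i][j] <= s_limit and t_dist[i][j] <= t_limit]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minimum_neighbors:
            labels[i] = NOISE  # may later be reclaimed as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= minimum_neighbors:
                queue.extend(j_neighbors)  # j is a core point, so keep growing
        cluster += 1
    return labels


# Toy example: 1-D positions stand in for spatial coordinates.
pos = [0, 1, 2, 10, 11, 30]
times = [0, 0, 5, 0, 0, 0]
s_dist = [[abs(a - b) for b in pos] for a in pos]
t_dist = [[abs(a - b) for b in times] for a in times]
print(st_dbscan(s_dist, t_dist, s_limit=1.5, t_limit=1, minimum_neighbors=1))
# [0, 0, -1, 1, 1, -1]
```

Point 2 is spatially next to point 1 but temporally far from it, so it becomes an outlier. With identical time series it would join the first cluster.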
Lastly, two further optional parameters can be set. The spatial distance measure (distance_measure) can be either "greatcircle" or "euclidean"; by default, the great-circle distance for longitudes and latitudes is used. If a maximum percentage of outliers (outlier_perc) is provided, SCADDA runs additional iterations over the remaining outliers to assign them to pseudo-clusters. This option should be used with care. For each new iteration, the maximum spatial and temporal distances for data points to be in the same cluster are doubled to ensure convergence. All parameters are listed below.
| Parameter | Description | Default |
|---|---|---|
| s_data | The spatial data, i.e. the data point coordinates (Nx2: longitude, latitude) | |
| t_data | The temporal data, i.e. one time series per data point (NxM) | |
| s_limit | The maximum same-cluster spatial distance | |
| t_limit | The maximum same-cluster temporal distance | |
| minimum_neighbors | The minimum number of neighbors for non-outliers | |
| steepness | The steepness of the logistic curve for the density-based distance weights | |
| window_param (optional) | The window size for the Sakoe-Chiba band for DTW | 0.1 * length(t_data) |
| distance_measure (optional) | The spatial distance measure, either "greatcircle" or "euclidean" | "greatcircle" |
| outlier_perc (optional) | The maximum allowed percentage of outliers | 100 |
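The two options for distance_measure can be contrasted with a short sketch. The first is a haversine great-circle distance for (longitude, latitude) pairs in degrees, following the column order of s_data. The second is a plain Euclidean distance on the raw coordinates. The function names, the use of kilometres and the mean Earth radius of 6371.0088 km are assumptions; the units that SCADDA expects for s_limit are not stated here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; using kilometres is an assumption

def great_circle_km(p, q):
    """Haversine distance between two (longitude, latitude) points given in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def euclidean(p, q):
    """Straight-line distance that treats the raw coordinates as planar."""
    return math.dist(p, q)


print(round(great_circle_km((0, 0), (1, 0)), 1))    # 111.2 km: one degree of longitude at the equator
print(round(great_circle_km((0, 60), (1, 60)), 1))  # 55.6 km at 60 degrees latitude
print(euclidean((0, 0), (1, 0)), euclidean((0, 60), (1, 60)))  # 1.0 1.0
```

The Euclidean distance reports the same value at the equator and at 60 degrees latitude, while the true ground distance roughly halves. This distortion is why the great-circle distance is the sensible default for longitude/latitude data.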
After the installation via PyPI, or using the scadda.py file locally, the usage looks like this:
```python
from scadda import clustering

# s_data: Nx2 array (longitude, latitude); t_data: NxM array (one time series per point)
cluster_assignments = clustering(s_data=your_spatial_data,
                                 t_data=your_time_series_data,
                                 s_limit=your_spatial_limit,
                                 t_limit=your_temporal_limit,
                                 steepness=your_curve_steepness,
                                 minimum_neighbors=your_minimum)
```
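When outlier_perc is set, the additional iterations described in the quickstart can be pictured as the loop below. While the share of outliers exceeds outlier_perc, both limits are doubled and only the remaining outliers are clustered again, and any new clusters are stored as pseudo-clusters under fresh labels. This is a hypothetical sketch and not SCADDA's code. The names `assign_pseudo_clusters` and `recluster` (a callback standing in for one clustering pass), the -1 outlier label and the `max_rounds` safety cap are all assumptions.

```python
NOISE = -1  # assumed label for outliers

def assign_pseudo_clusters(labels, recluster, s_limit, t_limit, outlier_perc, max_rounds=32):
    """Hypothetical sketch of the outlier_perc loop.

    recluster(indices, s_limit, t_limit) stands in for one clustering pass over
    the given points and must return one label per index (NOISE for outliers).
    """
    labels = list(labels)
    n = len(labels)
    for _ in range(max_rounds):  # safety cap, not part of the documented behaviour
        outliers = [i for i, lab in enumerate(labels) if lab == NOISE]
        if 100.0 * len(outliers) / n <= outlier_perc:
            break
        s_limit *= 2  # both limits are doubled for each new iteration
        t_limit *= 2
        sub_labels = recluster(outliers, s_limit, t_limit)
        offset = max(labels) + 1  # keep pseudo-cluster labels distinct from existing ones
        for i, lab in zip(outliers, sub_labels):
            if lab != NOISE:
                labels[i] = lab + offset
    return labels


# Toy stand-in for a clustering pass: groups all given points once s_limit >= 4.
def toy_recluster(indices, s_limit, t_limit):
    return [0 if s_limit >= 4 else NOISE for _ in indices]

print(assign_pseudo_clusters([0, 0, NOISE, NOISE], toy_recluster, 1, 1, outlier_perc=0))
# [0, 0, 1, 1]
```

Because the limits grow geometrically, each round makes it easier for leftover points to group together. This is also why pseudo-clusters formed in later rounds can be much looser than the original clusters.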
An application example with visualizations can be found on the GitHub page for SCADDA.
File details
Details for the file scadda-1.0.0-py3-none-any.whl.
File metadata
- Download URL: scadda-1.0.0-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0d8520a661b2100bcb8d94a58bf5089bf9d687fdde918ff39a3b1f387037364 |
| MD5 | 2f071984923ffe53d3107103f11016d1 |
| BLAKE2b-256 | 8500d0284191967f52f49763145d4f3d87e9d97ce3bf3ea259e573c863c65b6b |