User-friendly thresholded subspace-constrained mean shift for geospatial data

DREDGE

DREDGE, short for Density Ridge Estimation Describing Geospatial Evidence (arguably an unnecessarily forced acronym), offers a tool for finding density ridges in latitude-longitude coordinates. It is based on the subspace-constrained mean shift (SCMS) algorithm introduced by Ozertem and Erdogmus (2011). The tool approximates principal curves for a given set of coordinates and adds several improvements over the initial algorithm, along with alterations that facilitate its application to geospatial data. Thresholding, as described in cosmological research by Chen et al. (2015) and Chen et al. (2015), avoids dominant density ridges in sparsely populated areas of the dataset. In addition, the haversine formula is used as the distance metric to calculate great-circle distances, which takes the Earth's curvature into account and makes the tool applicable not only to city-scale data but also to datasets spanning multiple countries.
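The haversine formula mentioned above can be illustrated with a short sketch. This is the standard textbook formula written with NumPy, not code taken from DREDGE itself; the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for this example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    # Convert all angles from degrees to radians.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points.
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    # Central angle times radius gives the arc length on the sphere.
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

Because the function only uses NumPy ufuncs, it also accepts arrays and broadcasts over them, which is convenient when distances from one point to many others are needed.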

In essence, DREDGE provides density-based line points that optimize the distance between those lines and a dataset of coordinates. Larger bandwidths lead to a decrease in summed line length and an increase in the average distance to the nearest line. Since DREDGE was initially developed for crime incident data, the default bandwidth calculation follows a best-practice approach that is well accepted within quantitative criminology: the mean distance to a given number of nearest neighbors (Williamson et al., 1999). Because practitioners in that field are often interested in the highest-density regions of a dataset, the tool also lets users specify a top-percentage level of a kernel density estimate within which the ridge points should fall.
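The nearest-neighbor bandwidth heuristic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DREDGE's own implementation: the function name `mean_knn_bandwidth` is hypothetical, the mean is taken over all points and their nearest neighbors, and plain Euclidean distance in coordinate units is used for brevity instead of the haversine distance.

```python
import numpy as np


def mean_knn_bandwidth(coordinates, neighbors=10):
    """Mean distance from each point to its `neighbors` nearest neighbours."""
    points = np.asarray(coordinates, dtype=float)
    # Pairwise Euclidean distances in coordinate units (an assumption of
    # this sketch; a geospatial tool would use great-circle distances).
    diffs = points[:, None, :] - points[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=-1))
    # Never ask for more neighbours than there are other points.
    k = min(neighbors, len(points) - 1)
    # After sorting each row, column 0 is the zero distance of a point to
    # itself, so the k nearest neighbours are in columns 1..k.
    nearest = np.sort(distances, axis=1)[:, 1:k + 1]
    return float(nearest.mean())
```

The pairwise distance matrix keeps the sketch short but needs memory quadratic in the number of points, so a tree-based neighbor search would be the more practical choice for large datasets.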

Installation

DREDGE can be installed via PyPI, with a single command in the terminal:

pip install dredge

Alternatively, the file dredge.py can be downloaded from the folder dredge in this repository and used locally by placing it in the working directory of a given project. Installation via the terminal is, however, highly recommended, as the installation process checks the package requirements and automatically installs or updates any missing dependencies, sparing the user the effort of troubleshooting and installing them manually.

Quickstart guide

DREDGE requires only a two-column NumPy array as its primary input (coordinates), with one data point per row and latitude and longitude values in the columns. Four optional parameters can also be set. The number of nearest neighbors (neighbors) used to automatically calculate the default bandwidth can be changed, the bandwidth (bandwidth) itself can be forced to a certain value, and the threshold used to check for convergence between iterations can be set (convergence). The fourth parameter (percentage) unlocks additional functionality, since the interest of practitioners is often confined to high-density areas: for a user-provided percentage value p, the tool's internal kernel density estimate is used to retain only ridge points above the (100 - p)th percentile of the provided dataset's density landscape. This allows, for example, route matching to be focused on these areas.
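The percentile rule for the percentage parameter can be illustrated with a small sketch. It assumes the density values have already been computed by some kernel density estimate, both at the ridge points and at the data points; the function name `keep_top_percentage` and its arguments are hypothetical and do not reflect DREDGE's internal code.

```python
import numpy as np


def keep_top_percentage(ridge_points, ridge_density, data_density, percentage):
    """Keep ridge points whose density exceeds the (100 - p)th percentile
    of the densities evaluated at the data points."""
    # Density value below which (100 - p) percent of the data densities fall.
    cutoff = np.percentile(data_density, 100.0 - percentage)
    # Boolean mask selecting ridge points in the top-p percent density region.
    mask = np.asarray(ridge_density) > cutoff
    return np.asarray(ridge_points)[mask]
```

With percentage = 5, for instance, the cutoff is the 95th percentile of the data densities, so only ridge points lying in the densest regions survive.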



| Variables | Explanations | Default |
| --- | --- | --- |
| coordinates | The spatial data as latitude-longitude coordinates | |
| neighbors (optional) | The number of nearest neighbors to get a bandwidth | 10 |
| bandwidth (optional) | The bandwidth used for kernel density estimates | None |
| convergence (optional) | The threshold used for inter-iteration convergence | 0.01 |
| percentage (optional) | The aimed-for percentage of highest-density ridges | None |



After the installation via PyPI, or using the dredge.py file locally, the usage looks like this:

from dredge import filaments

# your_coordinates: two-column NumPy array, one latitude-longitude point per row
filaments(coordinates = your_coordinates,
          percentage = 5)
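The your_coordinates placeholder stands for the two-column NumPy array described in the quickstart guide. One possible way to build such an array from separate latitude and longitude sequences is sketched below. The helper `as_coordinates` is hypothetical and not part of DREDGE, the sample values are made up for illustration, and placing latitude in the first column is an assumption based on the "latitude-longitude" wording.

```python
import numpy as np


def as_coordinates(latitudes, longitudes):
    """Stack latitude and longitude values into an (n, 2) float array."""
    # Assumption: latitude goes in the first column, longitude in the second.
    coords = np.column_stack((latitudes, longitudes)).astype(float)
    # Catch out-of-range values, which often indicate swapped columns.
    if np.any(np.abs(coords[:, 0]) > 90) or np.any(np.abs(coords[:, 1]) > 180):
        raise ValueError("latitude must lie in [-90, 90] and longitude in [-180, 180]")
    return coords


# Made-up example values, one data point per row after stacking.
your_coordinates = as_coordinates([41.88, 41.90, 41.85], [-87.63, -87.65, -87.62])
```

The range check cannot detect every swapped pair, since a longitude within [-90, 90] is also a valid latitude, but it catches the most common mistakes early.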

This version: 1.0.0
