User-friendly thresholded subspace-constrained mean shift for geospatial data

# DREDGE

### User-friendly thresholded subspace-constrained mean shift for geospatial data

DREDGE, short for *Density Ridge Estimation Describing Geospatial Evidence* (arguably an unnecessarily forced acronym), is a tool for finding density ridges in latitude-longitude coordinates. It is based on the subspace-constrained mean shift (SCMS) algorithm introduced by Ozertem and Erdogmus (2011). The tool approximates principal curves for a given set of coordinates and adds several improvements and alterations to the original algorithm that facilitate its application to geospatial data. Thresholding, as described in cosmological research by Chen et al. (2015) and Chen et al. (2015), avoids dominant density ridges in sparsely populated areas of the dataset. In addition, the haversine formula is used as a distance metric to calculate the great-circle distance, which makes the tool applicable not only to city-scale data but also to datasets spanning multiple countries, because the Earth's curvature is taken into account.
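
The great-circle distance mentioned above can be illustrated with a short sketch of the haversine formula. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions chosen for illustration; they are not taken from DREDGE's source code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not a DREDGE constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding errors cannot push asin out of its domain
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Unlike a flat Euclidean distance on raw degrees, this measure stays meaningful when the points are far apart or far from the equator.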

In essence, DREDGE provides density-based line points which optimize the distance to a dataset of coordinates along those lines, with larger bandwidths leading to a decrease in summed line length and an increase in the average distance to the nearest line. DREDGE was initially developed for crime incident data, so the default bandwidth calculation follows a best-practice approach that is well accepted within quantitative criminology, using the mean distance to a given number of nearest neighbors (Williamson et al., 1999). Because practitioners in that area of study are often interested in the highest-density regions of a dataset, the tool also lets users specify a top-percentage level of a kernel density estimate within which the ridge points should fall.
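
The nearest-neighbor bandwidth heuristic described above can be sketched as follows. This is a brute-force illustration of one reading of that heuristic, not DREDGE's implementation: the helper name `mean_knn_distance` is hypothetical, and plain Euclidean distance in coordinate units is assumed in place of whatever metric the package applies internally.

```python
import math


def mean_knn_distance(points, neighbors=10):
    """Average, over all points, of the mean distance to each point's k nearest neighbors."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # A point cannot have more neighbors than there are other points
    k = min(neighbors, len(points) - 1)
    per_point = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        per_point.append(sum(distances[:k]) / k)
    return sum(per_point) / len(per_point)
```

The sketch compares every pair of points, which is fine for small inputs; a spatial index would be the usual choice for larger datasets.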

### Installation

DREDGE can be installed via PyPI, with a single command in the terminal:

```
pip install dredge
```

Alternatively, the file `dredge.py` can be downloaded from the folder `dredge` in this repository and used locally by placing the file into the working directory for a given project. An installation via the terminal is, however, highly recommended, as the installation process will check for the package requirements and automatically update or install any missing dependencies, thus sparing the user the effort of troubleshooting and installing them themselves.

### Quickstart guide

DREDGE only requires a two-column NumPy array as its primary input (`coordinates`), with one data point per row and latitude and longitude values in the columns. Four additional optional parameters can, however, be set: the number of nearest neighbors (`neighbors`) used to automatically calculate an optimal bandwidth can be changed manually, the bandwidth (`bandwidth`) itself can be forced to a certain value, and the threshold used to check for convergence between iterations can be set (`convergence`). The fourth parameter (`percentage`) unlocks an additional functionality of DREDGE, as the interest of practitioners is often constrained to high-density areas. For a user-provided percentage value *p*, the kernel density estimate computed internally is used to retain only ridge points above the (100 - *p*)th percentile of the provided dataset's density landscape. This allows, for example, route matching to be focused on these areas.
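
The percentile rule above can be illustrated with a small sketch. It assumes that a density value has already been estimated for every ridge point and for every point of the dataset; the function name `keep_top_percentage` and the nearest-rank percentile convention are assumptions for illustration, not DREDGE's internal code.

```python
import math


def keep_top_percentage(ridge_points, ridge_densities, data_densities, percentage):
    """Keep ridge points whose density exceeds the (100 - p)th percentile of the data densities."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    ordered = sorted(data_densities)
    # Nearest-rank percentile: the value at or below which (100 - p)% of the data fall
    rank = math.ceil((100 - percentage) * len(ordered) / 100)
    threshold = ordered[rank - 1] if rank > 0 else float("-inf")
    return [
        point
        for point, density in zip(ridge_points, ridge_densities)
        if density > threshold
    ]
```

With `percentage = 5`, for instance, only ridge points lying in roughly the densest 5% of the density landscape would survive the filter.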

Variables | Explanations | Default |
---|---|---|
coordinates | The spatial data as latitude-longitude coordinates | |
neighbors (optional) | The number of nearest neighbors to get a bandwidth | 10 |
bandwidth (optional) | The bandwidth used for kernel density estimates | None |
convergence (optional) | The threshold used for inter-iteration convergence | 0.01 |
percentage (optional) | The aimed-for percentage of highest-density ridges | None |

After the installation via PyPI, or using the `dredge.py` file locally, the usage looks like this:

```
from dredge import filaments

# your_coordinates: two-column NumPy array with one (latitude, longitude) point per row
filaments(coordinates = your_coordinates,
          percentage = 5)
```
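
The `your_coordinates` placeholder above stands for a two-column NumPy array. A minimal sketch of building one from separate latitude and longitude lists follows; the sample values are made up for illustration, and the column order (latitude first, longitude second) is an assumption based on the description of the input.

```python
import numpy as np

# Made-up sample values, in decimal degrees
latitudes = [52.5200, 48.8566, 51.5074]
longitudes = [13.4050, 2.3522, -0.1278]

# One data point per row: latitude in the first column, longitude in the second
your_coordinates = np.column_stack((latitudes, longitudes))
print(your_coordinates.shape)  # (3, 2)
```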
