Spatial (geographic) data clustering: a library of algorithms, tools to create and test customized formulations using data simulations, visualization, and map data utilities.
Analytical regionalization (also known as spatially constrained clustering) is a formal method for grouping a large number of geographic areas or points into a smaller number of regions based on their similarity in one or more variables (e.g., income, ethnicity, environmental condition) that the researcher considers important for the topic at hand. Conventional ways of grouping areas into regions may either be irrelevant to the information one is trying to illustrate (e.g., using political regions to map air pollution) or may even be designed in ways that bias aggregated results. For literature reviews on spatially constrained clustering algorithms, see [Murtagh1985], [Gordon1996], and [Duque_Ramos_Surinach2007].
Working with arbitrary spatial units may lead to aggregation problems such as the modifiable areal unit problem, the small numbers problem, spurious spatial autocorrelation, aggregation bias, and aggregation error (in location-allocation problems). Analytical regions are a way to minimize these problems.
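The spatial constraint most commonly imposed in analytical regionalization is that every region forms one connected group of neighbouring areas. As a minimal sketch, assuming the neighbour structure is a plain Python dictionary (a hypothetical stand-in, not clusterPy's own contiguity data structure), contiguity of a candidate solution can be checked with a breadth-first search restricted to each region's members:

```python
from collections import deque


def is_contiguous(region_areas, neighbors):
    """Return True if the given areas form a single connected group.

    region_areas: iterable of area ids assigned to one region.
    neighbors: dict mapping each area id to the ids it shares a border with
    (hypothetical format chosen for this sketch).
    """
    members = set(region_areas)
    if not members:
        return False
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        area = queue.popleft()
        for nb in neighbors.get(area, ()):
            # Only walk through neighbours that belong to the same region.
            if nb in members and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == members


def regions_are_contiguous(assignment, neighbors):
    """assignment maps area id -> region id; every region must be connected."""
    regions = {}
    for area, region in assignment.items():
        regions.setdefault(region, []).append(area)
    return all(is_contiguous(areas, neighbors) for areas in regions.values())


# Four areas in a row: 0 - 1 - 2 - 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(regions_are_contiguous({0: "A", 1: "A", 2: "B", 3: "B"}, neighbors))  # True
print(regions_are_contiguous({0: "A", 1: "B", 2: "A", 3: "B"}, neighbors))  # False
```

In the second assignment, region A consists of areas 0 and 2, which are separated by area 1, so the solution violates the contiguity constraint.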
Developer team
Juan C. Duque (Director and Co-founder)
Clustering a regular lattice:
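    import clusterpy
    # Load the n100 example layer
    n100 = clusterpy.importArcData("clusterpy/data_examples/n100")
    # Arisel on SAR1: 6 regions, rook contiguity, 10 initial solutions, dissolved output
    n100.cluster('arisel', ['SAR1'], 6, wType='rook', inits=10, dissolve=1)
    # Export the dissolved regions as a new shapefile
    n100.results[0].exportArcData('testOutput/demo')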
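The wType='rook' argument asks for rook contiguity, under which two areas are neighbours only if they share an edge; queen contiguity would additionally count areas that touch only at a corner. The sketch below builds rook neighbours for a regular lattice. It assumes row-major cell ids and a 10 by 10 grid for n100, which is a guess from the "100" in the name rather than something this page states, and it is an illustration rather than clusterPy's internal code.

```python
def rook_lattice(n_rows, n_cols):
    """Rook contiguity for a regular lattice with row-major area ids.

    Returns a dict mapping each cell id to the ids of the cells that share
    an edge with it (up, down, left, right).
    """
    w = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbs = []
            if r > 0:
                nbs.append(i - n_cols)  # cell above
            if r < n_rows - 1:
                nbs.append(i + n_cols)  # cell below
            if c > 0:
                nbs.append(i - 1)  # cell to the left
            if c < n_cols - 1:
                nbs.append(i + 1)  # cell to the right
            w[i] = nbs
    return w


# Hypothetical 10 by 10 lattice, 100 areas in total
w = rook_lattice(10, 10)
print(len(w))      # 100
print(w[0])        # [10, 1]  (corner cell: 2 neighbours)
print(len(w[55]))  # 4        (interior cell: 4 neighbours)
```

Corner cells get two neighbours, edge cells three, and interior cells four, and the relation is symmetric, which is what a contiguity matrix requires.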
Clustering California:
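    import clusterpy
    # Load the California polygons example layer
    calif = clusterpy.importArcData("clusterpy/data_examples/CA_Polygons")
    # Arisel on PCR2002: 9 regions, rook contiguity, 10 initial solutions, dissolved output
    calif.cluster('arisel', ['PCR2002'], 9, wType='rook', inits=10, dissolve=1)
    # Export the dissolved regions as a new shapefile
    calif.results[0].exportArcData('testOutput/demo')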
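Algorithms such as Arisel and AZP search for a partition into regions whose member areas are similar on the chosen variable. One common way to score that similarity is the within-region sum of squared deviations from each region's mean, where lower is better. The function below is a minimal single-variable sketch of that measure; whether clusterPy's default objective is exactly this quantity is an assumption, and its own implementation may differ (for example, when several variables are used).

```python
def within_region_ss(values, assignment):
    """Sum over regions of squared deviations from each region's mean.

    values: dict area id -> attribute value (one variable, e.g. PCR2002).
    assignment: dict area id -> region id.
    """
    groups = {}
    for area, region in assignment.items():
        groups.setdefault(region, []).append(values[area])
    total = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total


# Hypothetical attribute values for four areas in a row: 0 - 1 - 2 - 3
values = {0: 1.0, 1: 2.0, 2: 10.0, 3: 11.0}
print(within_region_ss(values, {0: "A", 1: "A", 2: "B", 3: "B"}))  # 1.0
print(within_region_ss(values, {0: "A", 1: "B", 2: "B", 3: "B"}))  # larger: less homogeneous
```

The first partition groups the two low values and the two high values together, so it scores far better than the second one.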
Special features
- Customized analytical regionalizations based on the following user specifications/inputs:
Key areal attribute to regionalize on: the user regionalizes (or clusters) the data on the variables they consider important for the problem at hand (i.e., uses their own analytical regions instead of normative or administrative regions).
Maximum or minimum number of regions.
Threshold conditions: a maximum or minimum value of a given variable that every region must meet (e.g., a social or business project might require every region to have at least 100,000 people, and an ecological project might require every region to cover at least 100 square miles).
Spatial contiguity constraints, supplied as a W matrix or in GAL or GWT format; otherwise they are created for you from the shared geographic borders of your areal units.
Time-series signature clustering: areas can be clustered not only by a cross-sectional variable but also by the correlation of their time-series signatures of that variable.
Non-geographic clustering: more generally, the algorithms can also be applied to non-geographic units, given some a priori spatial (or topological) constraint.
- Create new ESRI shapefiles:
Create new variables, dissolve a map according to the regions of a solution, and export all of this information as a new shapefile with a single command.
- Current algorithms:
Arisel [Duque_Church2004].
AZP [Openshaw_Rao1995].
AZP-Simulated Annealing [Openshaw_Rao1995].
AZP-Tabu [Openshaw_Rao1995].
AZP-R-Tabu [Openshaw_Rao1995].
Max-p-regions (Tabu) [Duque_Anselin_Rey2010].
AMOEBA [Alstadt_Getis2006], [Duque_Alstadt_Velasquez_Franco_Betancourt2010].
SOM [Kohonen1990].
geoSOM [Bacao_Lobo_Painho2004].
Random.
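Threshold conditions such as "every region must have at least 100,000 people" are central to the Max-p-regions formulation [Duque_Anselin_Rey2010], which seeks the largest number of regions such that each region's total of a spatially extensive attribute meets the threshold. The sketch below only checks whether a given assignment is feasible with respect to such a threshold; the function names, the population figures, and the four-area layout are hypothetical and not part of clusterPy.

```python
def region_totals(assignment, extensive):
    """Sum a spatially extensive variable (e.g. population) over each region.

    assignment: dict area id -> region id.
    extensive: dict area id -> value of the extensive variable.
    """
    totals = {}
    for area, region in assignment.items():
        totals[region] = totals.get(region, 0) + extensive[area]
    return totals


def meets_threshold(assignment, extensive, threshold):
    """True if every region's total reaches the minimum threshold."""
    return all(total >= threshold for total in region_totals(assignment, extensive).values())


# Hypothetical populations for four areas in a row: 0 - 1 - 2 - 3
population = {0: 70000, 1: 40000, 2: 30000, 3: 80000}
two_regions = {0: "A", 1: "A", 2: "B", 3: "B"}
print(region_totals(two_regions, population))  # {'A': 110000, 'B': 110000}
print(meets_threshold(two_regions, population, 100000))  # True
print(meets_threshold({0: "A", 1: "B", 2: "B", 3: "B"}, population, 100000))  # False
```

In a max-p search this check would be combined with a contiguity check, and among the feasible solutions with the most regions, the one with the lowest within-region heterogeneity is preferred.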
Help and documentation
Online documentation
Python shell help system
After importing clusterPy, you can use the CPhelp command to get more information about a class or function.
To see the help for a class, in this case Layer, type:
    import clusterpy
    clusterpy.CPhelp("Layer")
For a specific function, pass the name of the function:
Example 1:
    import clusterpy
    clusterpy.CPhelp("importArcData")
Example 2:
    import clusterpy
    clusterpy.CPhelp("new")
Citing ClusterPy library
Please cite ClusterPy when using the software in your work
Duque, J.C.; Dev, Boris; Betancourt, A.; Franco, J.L. (2011). ClusterPy: Library of spatially constrained clustering algorithms, Version 0.9.9. RiSE-group (Research in Spatial Economics). EAFIT University. http://www.rise-group.org.
A BibTeX entry for LaTeX users is:
    @Manual{ClusterPy,
      title = {ClusterPy: {Library} of spatially constrained clustering algorithms, {Version} 0.9.9.},
      author = {Juan C. Duque and Boris Dev and Alejandro Betancourt and Jose L. Franco},
      organization = {RiSE-group (Research in Spatial Economics). EAFIT University.},
      address = {Colombia},
      year = {2011},
      url = {http://www.rise-group.org},
    }
Bibliography
Openshaw, S. and Rao, L. (1995). Algorithms for reengineering 1991 census geography. Environment and Planning A, 27(3):425-446.
Duque, J.C.; Anselin, L. and Rey, S. (2010). The max-p region problem. Working Paper. GeoDa Center for Geospatial Analysis and Computation.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE.
Kohonen, T. (2001). Self-Organizing Maps (3rd ed.). Berlin: Springer.
Aldstadt, J. and Getis, A. (2006). Using AMOEBA to create a spatial weights matrix and identify spatial clusters. Geographical Analysis, 38(4):327-343.
Bação, F.; Lobo, V. and Painho, M. (2004). Geo-self-organizing map (Geo-SOM) for building and exploring homogeneous regions. Geographic Information Science, 22-37.
Schmidt, C.; Rey, S. and Skupin, A. (2010). Effects of irregular topology in spherical self-organizing maps. International Regional Science Review, 34(2):215-229.
Duque, J.C. and Church, R.L. (2004). A new heuristic model for designing analytical regions. In North American Meeting of the Regional Science Association International, Seattle, WA. November.
Duque, J.C.; Ramos, R. and Surinach, J. (2007). Supervised regionalization methods: A survey. International Regional Science Review, 30:195-220.
Duque, J.C.; Aldstadt, J.; Velasquez, E.; Franco, J. and Betancourt, A. (2010). A computationally efficient method for delineating irregularly shaped spatial clusters. Journal of Geographical Systems, 1-18. Springer.
Duque, J.C.; Church, R.L. and Middleton, R.S. (2011). The p-regions problem. Geographical Analysis, 43(1):104-126.
Gordon, A.D. (1996). A survey of constrained classification. Computational Statistics & Data Analysis, 21:17-29.
Murtagh, F. (1985). A survey of algorithms for contiguity-constrained clustering and related problems. The Computer Journal, 28(1):82-88.
Ord, J. and Getis, A. (1995). Local spatial autocorrelation statistics: Distributional issues and application. Geographical Analysis, 27(4):286-306.
Getis, A. and Ord, J. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3):189-206.
Glover, F. (1977). Heuristics for integer programming using surrogate constraints. Decision Sciences, 8:156-166.
Battiti, R. and Tecchiolli, G. (1994). The reactive tabu search. ORSA Journal on Computing, 6(2):126-140.
Kirkpatrick, S.; Gelatt, C.D. and Vecchi, M.P. (1983). Optimization by simulated annealing. Science, 220:671-680.
File details
Details for the file clusterPy-0.9.9.tar.gz
File metadata
- Download URL: clusterPy-0.9.9.tar.gz
- Size: 61.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6e46749b806dbf098c78de3c991ba146820ac428ca9ad1882369672076cd4fa2
MD5 | a08ed6d3bb07f40aedb1d5161a9dc72e
BLAKE2b-256 | 8750f8d16782911034e16261920463c043eacabd7c9f1377669a68d9f8f9b965