A Python package for computing geometric/spatial entropy metrics for data in matrix format.
GeoEntropy: A Python Package for Computing Spatial/Geometric Entropy
GeoEntropy is currently in a very early version, and there is no guarantee of the accuracy or correctness of its results. The source code is available on GitHub; meaningful contributions are very welcome :-).
GeoEntropy is a Python package designed to compute various entropy measures for spatial data represented as matrices (NumPy arrays). It is inspired by the R package SpatEntropy by L. Altieri, D. Cocchi, and G. Roli, and offers tools for analyzing the entropy of spatial data.
With GeoEntropy, you can partition spatial data and compute entropy measures such as Batty's, Karlström's, Leibovici's, O'Neill's, and Shannon's entropy, as well as Shannon Z entropy for pairs of categories.
Installation
You can install GeoEntropy using pip:
pip install geoentropy
Usage
Convert CSV Files to a 2D NumPy Array
The csv_to_matrix function converts multiple CSV files, each representing a different category, into a matrix for
analysis and visualization. It reads the point coordinates, normalizes them based on the specified cell size, and fills a
matrix with values representing each category. If two points from different CSV files fall into the same cell, the cell
size is first reduced to one tenth of its current value. If overlaps remain once min_cell_size is reached, the point from
the higher-priority CSV file stays in place, while points from lower-priority CSV files are randomly moved to one of the
neighboring cells in the von Neumann neighborhood (the four cells that share an edge with it).
Parameters:
- file_paths: List or dictionary of file paths. If a list, default priorities are assigned. If a dictionary, it maps file paths to their respective priorities.
- coordinate_columns: List of two integers specifying the columns in the CSV files that contain the x and y coordinates. Default is [0, 1].
- max_cell_size: Initial size of the cells in the matrix. Default is 1.
- min_cell_size: Minimum allowable size of the cells. Default is 0.01.
- plot_output: Boolean indicating whether to plot the resulting matrix. Default is False.
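from geoentropy import csv_to_matrix
import numpy as np
# List form: default priorities are assigned to the files
file_paths = ['coordinates_category_1.csv', 'coordinates_category_2.csv']
# Dictionary form: maps each file path to an explicit priority
file_paths_with_priorities = {'coordinates_category_1.csv': 2, 'coordinates_category_2.csv': 1}
# Either form can be passed as the first argument; here the dictionary form is used
data_matrix = csv_to_matrix(file_paths_with_priorities, coordinate_columns=[0, 1], max_cell_size=1, min_cell_size=0.01,
                            plot_output=True)
print(data_matrix)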
Output:
Cell size changed to 0.1 to resolve overlapping points.
[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 1.]]
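The overlap rule described above can be sketched in plain NumPy. This is a minimal, hypothetical helper (`points_to_matrix` is not part of GeoEntropy): it takes already-loaded point arrays instead of CSV paths, assumes that dictionary order expresses priority (the first category placed in a cell keeps it), assumes rows follow x and columns follow y, and assumes the division by ten repeats until the overlap is gone or min_cell_size would be undercut. It only flags unresolved overlaps instead of moving points to a random von Neumann neighbor.

```python
import numpy as np


def points_to_matrix(points_by_category, max_cell_size=1.0, min_cell_size=0.01):
    """Rasterize (x, y) points into a category matrix (hypothetical helper).

    points_by_category maps a category value to an array-like of (x, y) pairs.
    Dictionary order stands in for priority: the first category placed in a
    cell keeps it. While points of different categories share a cell, the cell
    size is divided by ten, as long as it stays >= min_cell_size.
    """
    arrays = {cat: np.asarray(pts, dtype=float).reshape(-1, 2)
              for cat, pts in points_by_category.items()}
    origin = np.vstack(list(arrays.values())).min(axis=0)
    cell_size = max_cell_size
    while True:
        occupied = {}  # (row, col) -> category that claimed the cell first
        conflict = False
        for cat, pts in arrays.items():
            # Normalize coordinates by the cell size to get integer grid indices
            idx = np.floor((pts - origin) / cell_size).astype(int)
            for cell in map(tuple, idx):
                if occupied.setdefault(cell, cat) != cat:
                    conflict = True  # a different category already holds this cell
        next_size = cell_size / 10
        if not conflict or next_size < min_cell_size:
            break
        cell_size = next_size
    shape = np.max(np.array(list(occupied)), axis=0) + 1
    matrix = np.zeros((int(shape[0]), int(shape[1])))
    for (row, col), cat in occupied.items():
        matrix[row, col] = cat
    return matrix, cell_size, conflict


m, size, unresolved = points_to_matrix({1: [(0.0, 0.0)], 2: [(0.55, 0.25)]})
print(size, unresolved, m.shape)
```

With a cell size of 1 both points land in cell (0, 0), so the sketch drops to 0.1, which mirrors the "Cell size changed to 0.1" message in the output above.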
Spatial Partitioning
The spatial_partition function divides a given 2D data matrix into spatial partitions using Voronoi tessellation. This
helps to analyze the spatial distribution of data by assigning each grid point to a partition based on proximity to
randomly generated or specified partition centers.
Parameters:
- data_matrix: A 2D numpy array representing the grid data. The function validates that the input is a 2D matrix.
- partitions: The number of partitions to create. Can be an integer for random generation or a list of coordinates for specific partition centers. Default is 10.
- cell_size: The size of the cells in the matrix grid. Default is 1.
- window: Optional parameter to specify the observation window as a tuple (min_x, min_y, max_x, max_y). Default is None.
- plot_output: Boolean indicating whether to plot the partitioned data overlaid with the Voronoi diagram. Default is True.
The function returns a dictionary containing the partition coordinates and the data with assigned partitions, which can be further used for spatial analysis or entropy calculations.
from geoentropy import spatial_partition
import numpy as np
data_matrix = np.array([
[1, 2, 1, 3],
[2, 1, 3, 3],
[1, 1, 2, 2],
[3, 3, 1, 1]
])
result = spatial_partition(data_matrix, partitions=5, cell_size=1, window=None, plot_output=False)
print("Partition Coordinates:\n", result['partition_coordinates'])
print("Data with Partitions:\n", result['data_with_partitions'].head())
Output (the partition centers are generated randomly, so your values will differ):
Partition Coordinates:
[[3.2190276 2.7650245 ]
[0.54426262 0.16163646]
[2.66600046 2.83171161]
[3.93007506 2.1637025 ]
[0.84704577 3.96334437]]
Data with Partitions:
x y category partition
0 0.5 0.5 1 2
1 0.5 1.5 2 2
2 0.5 2.5 1 5
3 0.5 3.5 3 5
4 1.5 0.5 2 2
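The Voronoi step amounts to assigning each cell to its nearest partition center. The sketch below is not GeoEntropy's implementation: `assign_partitions` and `random_centers` are hypothetical names, and the axis convention (x along rows, y along columns, cell centers at (index + 0.5) * cell_size) and uniform sampling of centers over the grid extent are assumptions read off the printed output. Fed the partition coordinates printed above, it reproduces the partition labels of the first five rows.

```python
import numpy as np


def assign_partitions(data_matrix, centers, cell_size=1.0):
    """Label each grid cell with the 1-based index of its nearest center.

    Assigning every point to its nearest center is a Voronoi tessellation
    of the grid. Assumed convention: x runs along rows, y along columns.
    """
    data = np.asarray(data_matrix)
    n_rows, n_cols = data.shape
    x, y = np.meshgrid((np.arange(n_rows) + 0.5) * cell_size,
                       (np.arange(n_cols) + 0.5) * cell_size, indexing="ij")
    points = np.column_stack([x.ravel(), y.ravel()])  # row-major cell centers
    centers = np.asarray(centers, dtype=float)
    # Squared Euclidean distance from every cell center to every partition center
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1) + 1
    return points, data.ravel(), labels


def random_centers(n, n_rows, n_cols, cell_size=1.0, seed=None):
    """Draw n centers uniformly inside the grid extent (assumed default window)."""
    rng = np.random.default_rng(seed)
    upper = np.array([n_rows * cell_size, n_cols * cell_size])
    return rng.uniform(0.0, upper, size=(n, 2))
```

Computing all distances at once is fine for small grids; for large ones, a KD-tree query would scale better.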
Batty Entropy
The batty function calculates Batty's entropy, a measure of spatial segregation, for a given 2D data matrix. This
entropy measure describes how a particular category is distributed and organized across the matrix. The function
supports rescaling to handle small area sizes and can optionally visualize the partitioned data.
Parameters:
- data_matrix: A 2D numpy array representing the grid data. The function validates that the input is a 2D matrix.
- category: The category to analyze within the data matrix. Default is 1.
- cell_size: The size of the cells in the matrix for partitioning. Default is 1.
- partitions: The number of partitions to divide the data into. Default is 10.
- window: Optional parameter to specify a window size for partitioning. Default is None.
- rescale: Boolean indicating whether to rescale small area sizes to avoid computational issues. Default is True.
- plot_output: Boolean indicating whether to plot the resulting partitions and their distribution. Default is True.
from geoentropy import batty
import numpy as np
data_matrix = np.array([
[1, 2, 1, 1],
[1, 1, 2, 2],
[2, 2, 1, 1],
[1, 1, 2, 2]
])
result = batty(data_matrix, category=1, cell_size=1, partitions=4, window=None, rescale=True, plot_output=False)
print("Batty Entropy:", result['batty_entropy'])
print("Entropy Range:", result['entropy_range'])
print("Relative Batty Entropy:", result['relative_batty_entropy'])
Output:
Batty Entropy: 2.7656685561977836
Entropy Range: {'minimum': np.float64(0.6931471805599453), 'maximum': np.float64(2.772588722239781)}
Relative Batty Entropy: 0.9975040776922705
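Once the partitions are fixed, the entropy itself is a short formula. This sketch assumes the usual definition of Batty's entropy, H = Σ p_g ln(T_g / p_g), where p_g is the share of the category falling in area g and T_g is the size of that area, with range [ln min T_g, ln Σ T_g]. It also assumes the relative value is H divided by the maximum, which is consistent with the printed output (2.7657 / 2.7726 ≈ 0.9975). `batty_entropy` is a hypothetical name, and the rescale option is not modeled.

```python
import numpy as np


def batty_entropy(counts, areas):
    """Batty's entropy H = sum_g p_g * ln(T_g / p_g) over areas g (sketch).

    counts: occurrences of the category of interest in each area
    areas:  size T_g of each area (e.g. number of grid cells)
    """
    counts = np.asarray(counts, dtype=float)
    areas = np.asarray(areas, dtype=float)
    p = counts / counts.sum()
    mask = p > 0  # empty areas contribute nothing (0 * ln 0 is taken as 0)
    entropy = float(np.sum(p[mask] * np.log(areas[mask] / p[mask])))
    entropy_range = {"minimum": float(np.log(areas.min())),
                     "maximum": float(np.log(areas.sum()))}
    return entropy, entropy_range, entropy / entropy_range["maximum"]


# Category spread in proportion to area: H reaches ln(total area) = ln 16
print(batty_entropy([2, 2, 2, 2], [4, 4, 4, 4]))
```

Concentrating the whole category in the smallest area gives the minimum, ln of that area's size.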
Karlström Entropy
The karlstrom function calculates Karlström's entropy, a measure of spatial segregation, for a given 2D data matrix.
This entropy measure describes how a particular category is distributed and organized across the matrix. The function
allows specifying the method for determining neighbors and can optionally visualize the partitioned data.
Parameters:
- data_matrix: A 2D numpy array representing the grid data. The function validates that the input is a 2D matrix.
- category: The category to analyze within the data matrix. Default is 1.
- cell_size: The size of the cells in the matrix for partitioning. Default is 1.
- partition: The number of partitions to divide the data into. Default is 10.
- observation_window: Optional parameter to specify a window size for partitioning. Default is None.
- neighbors: The number of neighbors or the distance used to determine neighbors. Default is 4.
- method: The method for determining neighbors, either by a specific number ("number") or by a distance ("distance"). Default is "number".
- plot_output: Boolean indicating whether to plot the resulting partitions and their distribution. Default is True.
The function processes the input data matrix, partitions it using Voronoi tessellation, calculates the frequencies and areas of the partitions, and then computes Karlström's entropy based on the specified method for determining neighbors. It returns a dictionary containing Karlström's entropy, the entropy range, the relative Karlström entropy, detailed area data, and partition coordinates, which together summarize the spatial segregation and distribution of the specified category within the data matrix.
from geoentropy import karlstrom
import numpy as np
data_matrix = np.array([
[1, 2, 1, 1],
[1, 1, 2, 2],
[2, 2, 1, 1],
[1, 1, 2, 2]
])
result = karlstrom(data_matrix, category=1, cell_size=1, partition=4, observation_window=None, neighbors=4,
method="number", plot_output=False)
print("Karlström Entropy:", result['karlstrom_entropy'])
print("Entropy Range:", result['entropy_range'])
print("Relative Karlström Entropy:", result['relative_karlstrom_entropy'])
Output:
Karlström Entropy: 1.324293923495886
Entropy Range: {'minimum': 0, 'maximum': np.float64(2.772588722239781)}
Relative Karlström Entropy: 0.47763806902672573
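The neighbor step can be sketched as a smoothing of the category shares. This assumes Karlström and Ceccato's local-entropy definition as we understand it, H = Σ p_g ln(1 / p̃_g), where p̃_g is the average of p over the neighborhood of area g (binary, row-standardized weights, each area included in its own neighborhood). `karlstrom_entropy` is a hypothetical name; GeoEntropy's `neighbors` and `method` parameters presumably control how the adjacency matrix is built, whereas here it is passed in directly.

```python
import numpy as np


def karlstrom_entropy(counts, neighbors):
    """Karlström-style local spatial entropy (sketch).

    counts:    occurrences of the category of interest in each area g
    neighbors: G x G 0/1 adjacency matrix; each area is treated as its own neighbor
    H = sum_g p_g * ln(1 / p~_g), with p~_g the neighborhood average of p.
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    w = np.array(neighbors, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(w, 1.0)              # include area g in its own neighborhood
    w = w / w.sum(axis=1, keepdims=True)  # row-standardize the weights
    p_tilde = w @ p                       # neighborhood-smoothed shares
    mask = p > 0
    return float(np.sum(p[mask] * np.log(1.0 / p_tilde[mask])))


# All occurrences in area 0: isolated areas give 0, two adjacent areas give ln 2
print(karlstrom_entropy([2, 0], [[0, 0], [0, 0]]), karlstrom_entropy([2, 0], [[0, 1], [1, 0]]))
```

Without any neighbors the formula reduces to Shannon's entropy of the shares, which is a useful sanity check.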
Leibovici Entropy
The leibovici function calculates Leibovici's entropy, a measure of spatial association, for a given 2D data matrix.
This entropy measure helps to understand the spatial relationships and organization of different categories within the
matrix based on a specified critical distance.
Parameters:
- data_matrix: A 2D numpy array representing the grid data. The function validates that the input is a 2D matrix.
- cell_size: The size of the cells in the matrix. Can be a scalar or an array specifying the size for each dimension. Default is 1.
- critical_distance: The critical distance within which to count adjacent pairs. Default is 1.
- plot_output: Boolean indicating whether to plot the data matrix. Default is True.
The function processes the input data matrix, validates the cell size and critical distance, counts adjacent pairs within the specified distance, and calculates Leibovici's entropy. It returns a dictionary containing Leibovici's entropy, the entropy range, the relative Leibovici entropy, and the probability distribution of observed pairs. The function also provides an option to visualize the data matrix, offering a comprehensive view of spatial associations within the data.
from geoentropy import leibovici
import numpy as np
data_matrix = np.array([
[1, 2, 1, np.nan],
[2, 1, np.nan, 2],
[1, 1, 2, 1],
[np.nan, 2, 1, 2]
])
result = leibovici(data_matrix, cell_size=1, critical_distance=2, plot_output=False)
print("Leibovici Entropy:", result['leibovici_entropy'])
print("Entropy Range:", result['entropy_range'])
print("Relative Leibovici Entropy:", result['relative_leibovici_entropy'])
print("Probability Distribution:\n", result['probability_distribution'])
Output:
Leibovici Entropy: 1.3521103558155638
Entropy Range: {'minimum': 0, 'maximum': 1.3862943611198906}
Relative Leibovici Entropy: 0.9753414525348629
Probability Distribution:
pair absolute_frequency relative_frequency
0 1.0-2.0 13 0.333333
1 1.0-1.0 10 0.256410
2 2.0-1.0 10 0.256410
3 2.0-2.0 6 0.153846
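The printed frequencies are consistent with counting each unordered pair of non-missing cells whose centers lie within Euclidean distance critical_distance once, listing the cell that comes first in row-major order first, and using ln(I²) as the maximum for I categories. The sketch below follows that rule, which is our inference from the output rather than the package's documented algorithm; `leibovici_pairs` is a hypothetical name, and it only accepts a scalar cell_size. On the example matrix it reproduces the 13/10/10/6 counts and the entropy shown above.

```python
import numpy as np
from collections import Counter


def leibovici_pairs(data, critical_distance=1.0, cell_size=1.0):
    """Count ordered category pairs of cells within critical_distance (sketch)."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    # Non-missing cells in row-major order: (row, col, category)
    cells = [(r, c, data[r, c]) for r in range(n_rows) for c in range(n_cols)
             if not np.isnan(data[r, c])]
    counts = Counter()
    for k, (r1, c1, v1) in enumerate(cells):
        for r2, c2, v2 in cells[k + 1:]:  # each unordered pair once, earlier cell first
            distance = np.hypot((r2 - r1) * cell_size, (c2 - c1) * cell_size)
            if distance <= critical_distance:
                counts[f"{v1}-{v2}"] += 1
    freq = np.array(list(counts.values()), dtype=float)
    p = freq / freq.sum()
    entropy = float(-(p * np.log(p)).sum())
    n_categories = len(np.unique(data[~np.isnan(data)]))
    maximum = float(np.log(n_categories ** 2))  # I categories give I^2 ordered pairs
    return counts, entropy, maximum
```

Comparing all pairs is quadratic in the number of cells, which is acceptable for small matrices like the example.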
O'Neill Entropy
The oneill function calculates O'Neill's entropy, a measure of spatial association, for a given 2D data matrix. This
entropy measure helps to understand the spatial relationships and organization of different categories within the matrix
by analyzing adjacent pairs of data points.
Parameters:
- data_matrix: A 2D numpy array representing the grid data. The function validates that the input is a 2D matrix.
- plot_output: Boolean indicating whether to plot the data matrix. Default is False.
The function processes the input data matrix, collects adjacent pairs of data points, and calculates O'Neill's entropy based on the frequency of these pairs. It returns a dictionary containing O'Neill's entropy, the entropy range, the relative O'Neill entropy, and the probability distribution of observed pairs. The function also provides an option to visualize the data matrix, offering a comprehensive view of spatial associations within the data.
from geoentropy import oneill
import numpy as np
data_matrix = np.array([
[1, 2, 1, np.nan],
[2, 1, np.nan, 2],
[1, 1, 2, 1],
[np.nan, 2, 1, 2]
])
result = oneill(data_matrix, plot_output=False)
print("O'Neill Entropy:", result['oneill_entropy'])
print("Entropy Range:", result['entropy_range'])
print("Relative O'Neill Entropy:", result['relative_oneill_entropy'])
print("Probability Distribution:\n", result['probability_distribution'])
Output:
O'Neill Entropy: 0.9743147528693494
Entropy Range: {'minimum': 0, 'maximum': 1.3862943611198906}
Relative O'Neill Entropy: 0.7028195311147832
Probability Distribution:
pair absolute_frequency relative_frequency
0 2.0-1.0 8 0.500
1 1.0-2.0 6 0.375
2 1.0-1.0 2 0.125
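The printed frequencies are consistent with counting every pair of edge-adjacent (rook) non-missing cells once, ordered as (left or upper cell, right or lower cell), and with ln(I²) as the maximum for I categories. A minimal sketch under that assumption (`oneill_pairs` is a hypothetical name) reproduces the 8/6/2 counts and the entropy above.

```python
import numpy as np
from collections import Counter


def oneill_pairs(data):
    """Count ordered category pairs of rook-adjacent cells (sketch)."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    counts = Counter()
    for r in range(n_rows):
        for c in range(n_cols):
            v = data[r, c]
            if np.isnan(v):
                continue
            # Right and lower neighbors, so each adjacent pair is seen exactly once
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n_rows and cc < n_cols and not np.isnan(data[rr, cc]):
                    counts[f"{v}-{data[rr, cc]}"] += 1
    freq = np.array(list(counts.values()), dtype=float)
    p = freq / freq.sum()
    entropy = float(-(p * np.log(p)).sum())
    n_categories = len(np.unique(data[~np.isnan(data)]))
    maximum = float(np.log(n_categories ** 2))
    return counts, entropy, maximum
```

Missing cells simply break adjacency: a pair is skipped whenever either cell is NaN.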
Shannon Entropy
The shannon function calculates Shannon's entropy, a measure of information entropy, for a given data matrix. Unlike
other entropy measures in this library, Shannon's entropy does not account for spatial relationships; it simply measures
the uncertainty or diversity of categories within the dataset.
Parameters:
data_matrix: A numpy array representing the data. The function validates that the input is a non-empty numpy array.
The function processes the input data matrix, calculates the probabilities of each category, and computes Shannon's entropy based on these probabilities. It also calculates the variance of the entropy and provides a range for the entropy values. The function returns a dictionary containing Shannon's entropy, the entropy range, the relative Shannon entropy, the probability distribution of categories, and the variance of the entropy. This provides a comprehensive overview of the informational diversity within the data, without considering spatial arrangement.
from geoentropy import shannon
import numpy as np
data_matrix = np.array([
[1, 2, 1, 3],
[2, 1, 3, 3],
[1, 1, 2, 2],
[3, 3, 1, 1]
])
result = shannon(data_matrix)
print("Shannon Entropy:", result['shannon_entropy'])
print("Entropy Range:", result['shannon_entropy_range'])
print("Relative Shannon Entropy:", result['relative_shannon_entropy'])
print("Probability Distribution:\n", result['probability_distribution'])
print("Variance:", result['variance'])
Output:
Shannon Entropy: 1.0717300941124526
Entropy Range: {'minimum': 0, 'maximum': 1.0986122886681098}
Relative Shannon Entropy: 0.9755307720176264
Probability Distribution:
[{'category': np.int64(1), 'absolute_frequency': 7, 'relative_frequency': 0.4375}, {'category': np.int64(2), 'absolute_frequency': 4, 'relative_frequency': 0.25}, {'category': np.int64(3), 'absolute_frequency': 5, 'relative_frequency': 0.3125}]
Variance: 0.05362144899780308
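Because Shannon's entropy ignores position, it only needs the category frequencies. The sketch below (`shannon_summary` is a hypothetical name, and it assumes the matrix has no missing values) computes H = -Σ p ln p, the maximum ln(I) for I categories, and the variance as Σ p (ln p)² - H². That variance expression is our reading of the printed numbers, not a documented formula, but it matches the value shown above.

```python
import numpy as np


def shannon_summary(data):
    """Non-spatial Shannon entropy of the category frequencies (sketch)."""
    values = np.asarray(data).ravel()
    categories, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    log_p = np.log(p)
    entropy = float(-(p * log_p).sum())
    # Variance of -ln p(X) over the category distribution
    variance = float((p * log_p ** 2).sum() - entropy ** 2)
    maximum = float(np.log(len(categories)))
    return {
        "shannon_entropy": entropy,
        "shannon_entropy_range": {"minimum": 0.0, "maximum": maximum},
        "relative_shannon_entropy": entropy / maximum if maximum > 0 else 0.0,
        "probability_distribution": dict(zip(categories.tolist(), p.tolist())),
        "variance": variance,
    }
```

The guard on the relative value avoids dividing by ln(1) = 0 when the matrix holds a single category.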
Shannon Z Entropy
The shannon_z function calculates Shannon's entropy for pairs of categories, known as Shannon Z entropy. This measure
extends Shannon's entropy to consider the distribution of pairs of categories within the data. Similar to Shannon's
entropy, Shannon Z entropy does not account for spatial relationships.
Parameters:
data_matrix: A numpy array representing the data. The function validates that the input is a non-empty numpy array.
The function processes the input data matrix, calculates the probabilities of pairs of categories, and computes Shannon Z entropy based on these probabilities. It also calculates the variance of the entropy and provides a range for the entropy values. The function returns a dictionary containing Shannon Z entropy, the entropy range, the relative Shannon Z entropy, the probability distribution of category pairs, and the variance of the entropy. This provides a comprehensive overview of the informational diversity of category pairs within the data, without considering spatial arrangement.
from geoentropy import shannon_z
import numpy as np
data_matrix = np.array([
[1, 2, 1, 3],
[2, 1, 3, 3],
[1, 1, 2, 2],
[3, 3, 1, 1]
])
result = shannon_z(data_matrix)
print("Shannon Entropy Z:", result['shannon_entropy_z'])
print("Entropy Z Range:", result['shannon_entropy_z_range'])
print("Relative Shannon Entropy Z:", result['relative_entropy_z'])
print("Variance:", result['variance'])
print("Pair Probabilities:\n", result['pair_probabilities'])
Output:
Shannon Entropy Z: 1.6594506357352485
Entropy Z Range: {'minimum': 0, 'maximum': 1.791759469228055}
Relative Shannon Entropy Z: 0.9261570340410652
Variance: 0.21318393262341617
Pair Probabilities:
[{'pair': '1-1', 'absolute_frequency': 21, 'relative_frequency': np.float64(0.175)}, {'pair': '1-2', 'absolute_frequency': 28, 'relative_frequency': np.float64(0.23333333333333334)}, {'pair': '1-3', 'absolute_frequency': 35, 'relative_frequency': np.float64(0.2916666666666667)}, {'pair': '2-2', 'absolute_frequency': 6, 'relative_frequency': np.float64(0.05)}, {'pair': '2-3', 'absolute_frequency': 20, 'relative_frequency': np.float64(0.16666666666666666)}, {'pair': '3-3', 'absolute_frequency': 10, 'relative_frequency': np.float64(0.08333333333333333)}]
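The printed pair counts equal the number of unordered pairs of cells drawn from the whole matrix: C(n, 2) for two cells of the same category and n_a · n_b for two different categories (for example 7 · 5 = 35 for "1-3"). The maximum ln 6 corresponds to the I(I + 1) / 2 = 6 possible category pairs for I = 3. The sketch below follows that reading; `shannon_z_summary` is a hypothetical name, and the variance is left out.

```python
import numpy as np
from itertools import combinations_with_replacement


def shannon_z_summary(data):
    """Shannon entropy of unordered category pairs over all pairs of cells (sketch)."""
    values = np.asarray(data).ravel()
    categories, counts = np.unique(values, return_counts=True)
    pair_counts = {}
    for (i, a), (j, b) in combinations_with_replacement(enumerate(categories.tolist()), 2):
        n_i, n_j = int(counts[i]), int(counts[j])
        # Same category: choose 2 of its n cells; different categories: n_i * n_j
        pair_counts[f"{a}-{b}"] = n_i * (n_i - 1) // 2 if i == j else n_i * n_j
    freq = np.array(list(pair_counts.values()), dtype=float)
    p = freq[freq > 0] / freq.sum()  # a single-cell category contributes no same-pair
    entropy = float(-(p * np.log(p)).sum())
    n_pair_types = len(categories) * (len(categories) + 1) // 2
    return pair_counts, entropy, float(np.log(n_pair_types))
```

The pair counts always sum to C(N, 2) for N cells, here 120 for the 4 × 4 example.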
File details
Details for the file GeoEntropy-0.2.0.tar.gz.
File metadata
- Download URL: GeoEntropy-0.2.0.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5afa89ad164d3126380d29b0c6c8e24af43ecbbf0f51fce1ee5461ee8ab8a2c |
| MD5 | 2ca949209f3a431c0639bb0f0c06727e |
| BLAKE2b-256 | 2eb6f7e623455d2e12d7c48b6e863dc85e204ec502bfd456c14d14062a53bd1a |
File details
Details for the file GeoEntropy-0.2.0-py3-none-any.whl.
File metadata
- Download URL: GeoEntropy-0.2.0-py3-none-any.whl
- Upload date:
- Size: 17.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e6e9977936335f9a80f82f41f8c26ab830dff0ffdfcf35e4c33f2e029158ed0 |
| MD5 | 1d36b714c722c5d846fd8089380f030e |
| BLAKE2b-256 | 99486ac9bc9359aeae37bf0832e064aee8a1e0f198c8e6121ede22e456d4fd0f |