Quickly and easily aggregate point data on customisable 2D grids.
PyGridAgg
PyGridAgg is a lightweight Python package that allows you to easily aggregate point data on 2D grids. It includes efficient built-in aggregation schemes that can process large point datasets quickly. Defining grid layouts is also simple through several alternative grid constructors. While originally developed for geo-data analysis, PyGridAgg only depends on numpy and requires no GIS toolchain.
Installation
PyGridAgg is available on PyPI and can be installed using pip:
pip install pygridagg
Quickstart
import matplotlib.pyplot as plt
import pygridagg as pga
from pygridagg.examples import load_japanese_earthquake_data
# Load example data on earthquakes around Japan
quake_coords, magnitudes = load_japanese_earthquake_data()
# Define a square grid layout with 10k cells encompassing all earthquake locations
layout = pga.SquareGridLayout.from_points(quake_coords, num_cells=100**2)
# Count earthquakes across grid cells
agg_counts = pga.CountAggregator(layout, quake_coords)
# Show a heatmap
agg_counts.plot(title="Earthquakes around Japan (2010-2023)")
plt.show()
Simple and fast
For performance, all built-in point aggregators use unbuffered in-place operations via np.ufunc.at. In the timed example below, 10 million random points are aggregated on a grid with 250,000 cells. For illustration, points are aggregated using a weighted average, with point weights assigned as a function of position:
import time
import numpy as np
import matplotlib.pyplot as plt
import pygridagg as pga
# Define a grid layout on the unit square
bbox = 0, 1, 0, 1 # (x_min, x_max, y_min, y_max)
layout = pga.SquareGridLayout(*bbox, num_cells=500 ** 2)
# Generate random points, assign point weights in a smooth, periodic pattern
N, freq = 10_000_000, 50
rand_coords = np.random.randn(N, 2) * 0.1 + 0.5
rand_weights = np.sin(freq * rand_coords[:, 0]) * np.cos(freq * rand_coords[:, 1])
# Time the data aggregation
start_time = time.perf_counter()
agg = pga.WeightedAverageAggregator(
    layout, rand_coords,
    point_weights=rand_weights,
)
elapsed_time = time.perf_counter() - start_time
print(f"Execution time: {elapsed_time:.2f} seconds")
# Show a heatmap
agg.plot()
plt.show()
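To illustrate the kind of np.ufunc.at operation mentioned above, here is a minimal numpy-only sketch of a per-cell average. It is not PyGridAgg's actual implementation: the helper name `weighted_average_on_grid`, its arguments, and the choice to return NaN for empty cells are all assumptions made for illustration.

```python
import numpy as np


def weighted_average_on_grid(coords, weights, bbox, n_cols, n_rows):
    """Hypothetical helper: mean of `weights` per cell, NaN for empty cells."""
    x_min, x_max, y_min, y_max = bbox

    # Map coordinates to integer column and row indexes
    cols = np.floor((coords[:, 0] - x_min) / (x_max - x_min) * n_cols).astype(int)
    rows = np.floor((coords[:, 1] - y_min) / (y_max - y_min) * n_rows).astype(int)

    # Keep only points that fall inside the grid
    inside = (cols >= 0) & (cols < n_cols) & (rows >= 0) & (rows < n_rows)
    cols, rows, w = cols[inside], rows[inside], weights[inside]

    # `np.add.at` accumulates correctly even when indexes repeat,
    # which plain fancy-index assignment (sums[rows, cols] += w) does not
    sums = np.zeros((n_rows, n_cols))
    counts = np.zeros((n_rows, n_cols))
    np.add.at(sums, (rows, cols), w)
    np.add.at(counts, (rows, cols), 1)

    # Empty cells produce 0 / 0, which numpy evaluates to NaN
    with np.errstate(invalid="ignore"):
        return sums / counts


rng = np.random.default_rng(0)
demo_coords = rng.random((1_000, 2))
demo_weights = rng.random(1_000)
demo_grid = weighted_average_on_grid(demo_coords, demo_weights, (0, 1, 0, 1), 10, 10)
```

The returned array has shape (rows, columns), matching the convention used for custom aggregators later in this document.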
Further details
Defining grid layouts
You can choose between two different grid layouts:
- SquareGridLayout: Restricted to equal width and height, and to an equal number of columns and rows.
- FlexibleGridLayout: Allows you to independently set the grid's width and height, as well as the number of columns and rows.
When defining either grid layout, you can set the grid bounds by passing any of the following:
- a bounding box (via the default __init__ of both layout classes);
- the desired centre coordinate and side dimensions of the grid (using the from_centroid constructor);
- a collection of template points from which grid limits are inferred (using the from_points constructor).
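The three ways of specifying grid bounds all reduce to the same bounding box. The numpy-only sketch below shows one plausible way of deriving square bounds from a centre point or from template points. The helper names are hypothetical, and the exact rules PyGridAgg applies (for example any padding around template points) are not stated here, so treat this as an assumption rather than the library's behaviour.

```python
import numpy as np


def bounds_from_centroid(centre, side_length):
    """Hypothetical helper: (x_min, x_max, y_min, y_max) of a square around `centre`."""
    cx, cy = centre
    half = side_length / 2
    return cx - half, cx + half, cy - half, cy + half


def square_bounds_from_points(points):
    """Hypothetical helper: smallest centred square that contains all points."""
    points = np.asarray(points, dtype=float)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)

    # Centre the square on the points' bounding box and use its longer side
    centre = ((x_min + x_max) / 2, (y_min + y_max) / 2)
    side_length = max(x_max - x_min, y_max - y_min)
    return bounds_from_centroid(centre, side_length)


template_points = [[0.0, 0.0], [4.0, 2.0], [1.0, 1.5]]
bbox = square_bounds_from_points(template_points)
```

A tuple like `bbox` follows the (x_min, x_max, y_min, y_max) order used by the bounding-box example earlier in this document.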
Built-in point aggregators
The following aggregator classes are currently available:
- CountAggregator: Simply counts the number of points in each grid cell.
- WeightedSumAggregator and WeightedAverageAggregator: Compute a weighted sum or weighted average of points in each cell (given an array of aggregation weights).
- MinimumWeightAggregator and MaximumWeightAggregator: Compute the minimum or maximum weight of points in each grid cell (given an array of aggregation weights).
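As a rough illustration of how per-cell minima and maxima can be computed with np.ufunc.at, the sketch below uses `np.minimum.at` and `np.maximum.at` on precomputed cell indexes. This is not PyGridAgg's source code, and filling empty cells with plus or minus infinity is an assumption made for this example only.

```python
import numpy as np

# Row and column indexes of five points on a 2 x 2 grid, with their weights
rows = np.array([0, 0, 1, 1, 1])
cols = np.array([0, 0, 1, 1, 0])
weights = np.array([2.0, 5.0, -1.0, 4.0, 3.0])

# Start from the identity element of each reduction (assumed fill value)
cell_min = np.full((2, 2), np.inf)
cell_max = np.full((2, 2), -np.inf)

# Unbuffered in-place reductions: repeated indexes are all taken into account
np.minimum.at(cell_min, (rows, cols), weights)
np.maximum.at(cell_max, (rows, cols), weights)
```

The cell at row 0, column 1 receives no points, so it keeps its initial fill value in both arrays.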
Out-of-bounds points
Points outside the grid bounds do not affect the data aggregation. However, aggregator classes will issue a warning when out-of-bounds points are present. To silence this warning, set warn_out_of_bounds=False when instantiating an aggregator class.
Column and row indexes
To access the column and row indexes of points, use the grid_col_ids and grid_row_ids attributes of an aggregator instance. Points located outside the grid bounds receive a column and row index of -1.
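The behaviour described in the two sections above can be mimicked in plain numpy. The sketch below assigns column and row indexes, marks out-of-bounds points with -1, and optionally warns about them. The function `cell_indexes` is a hypothetical stand-in, and details such as whether the upper grid edge counts as inside are assumptions, not documented PyGridAgg behaviour.

```python
import warnings

import numpy as np


def cell_indexes(coords, bbox, n_cols, n_rows, warn_out_of_bounds=True):
    """Hypothetical helper: column/row index per point, -1 if out of bounds."""
    x_min, x_max, y_min, y_max = bbox
    x, y = coords[:, 0], coords[:, 1]

    # Assumption: lower edges are inclusive, upper edges exclusive
    inside = (x >= x_min) & (x < x_max) & (y >= y_min) & (y < y_max)
    if warn_out_of_bounds and not inside.all():
        warnings.warn(f"{(~inside).sum()} point(s) lie outside the grid bounds")

    col_ids = np.full(len(coords), -1)
    row_ids = np.full(len(coords), -1)
    cols = ((x[inside] - x_min) / (x_max - x_min) * n_cols).astype(int)
    rows = ((y[inside] - y_min) / (y_max - y_min) * n_rows).astype(int)

    # Guard against floating-point rounding right at the upper edge
    col_ids[inside] = np.minimum(cols, n_cols - 1)
    row_ids[inside] = np.minimum(rows, n_rows - 1)
    return col_ids, row_ids, inside


points = np.array([[0.25, 0.75], [2.0, 0.5], [0.75, 0.25]])
col_ids, row_ids, inside_mask = cell_indexes(
    points, (0, 1, 0, 1), n_cols=2, n_rows=2, warn_out_of_bounds=False
)
```

Here the second point lies outside the unit square, so it gets -1 for both indexes and is excluded by `inside_mask`.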
Coordinate Reference Systems?
PyGridAgg aims to be as lightweight as possible and does not depend on GIS libraries like pyproj or geopandas. As such, you need to handle transformations between coordinate reference systems yourself. The package performs no checks to see whether provided coordinates for points and grid layouts are valid.
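If your points are given as longitude and latitude and you only need approximate metric coordinates over a small region, a simple equirectangular approximation can be done in numpy before aggregating. This is a swapped-in illustration, not part of PyGridAgg: the function name, the spherical Earth radius, and the choice of projection are all assumptions, and a proper CRS library is the better choice when accuracy matters.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def lonlat_to_local_xy(lon_deg, lat_deg, lon0_deg, lat0_deg):
    """Hypothetical helper: approximate metres east/north of (lon0, lat0)."""
    lon = np.radians(np.atleast_1d(lon_deg))
    lat = np.radians(np.atleast_1d(lat_deg))
    lon0, lat0 = np.radians(lon0_deg), np.radians(lat0_deg)

    # Equirectangular approximation: scale longitude by cos(reference latitude)
    x = EARTH_RADIUS_M * (lon - lon0) * np.cos(lat0)
    y = EARTH_RADIUS_M * (lat - lat0)
    return np.column_stack([x, y])


# Two points near a reference location, returned as an (N, 2) array of metres
xy = lonlat_to_local_xy([139.7, 139.8], [35.6, 35.7], lon0_deg=139.7, lat0_deg=35.6)
```

The resulting array has the same (N, 2) shape as the coordinate arrays used in the other examples in this document.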
Implementing custom aggregators
You can define your own data aggregators by inheriting from BasePointAggregator and implementing the aggregate method. The example below illustrates this with a custom aggregator class that counts a point inside a grid cell only if its associated point weight is above a threshold value.
import numpy as np
import pygridagg as pga
from pygridagg.examples import load_japanese_earthquake_data
class CustomThresholdCounter(pga.BasePointAggregator):
    """Counts the number of points whose weight is above a threshold."""

    def aggregate(self, point_weights, threshold):
        # Initialise grid counts with zeroes
        counts = np.full(self.layout.shape, fill_value=0, dtype=int)
        # Select the column and row indexes of eligible points.
        # `self.inside_mask` is True for points inside the grid bounds.
        point_mask = self.inside_mask & (point_weights > threshold)
        col_ids = self.grid_col_ids[point_mask]
        row_ids = self.grid_row_ids[point_mask]
        # Use `np.add.at` for fast in-place addition
        np.add.at(counts, (row_ids, col_ids), 1)
        # Note: Returned array must always have shape (rows, columns)
        return counts
quake_coords, magnitudes = load_japanese_earthquake_data()
layout = pga.SquareGridLayout.from_points(quake_coords, num_cells=2_500)
# Only count earthquakes above magnitude 6
thresh = 6
agg = CustomThresholdCounter(layout, quake_coords, point_weights=magnitudes, threshold=thresh)
# Check that no earthquakes were 'lost'
assert agg.cell_aggregates.sum() == (magnitudes > thresh).sum()
# Show counts of major earthquakes with a heatmap
ax = agg.plot()
Requirements
- numpy
- matplotlib
License
This project is licensed under the MIT License. See LICENSE.txt for details.