
Quickly and easily aggregate point data on customisable 2D grids.


PyGridAgg

PyGridAgg is a lightweight Python package for aggregating point data on 2D grids. Its built-in aggregation schemes are efficient and can process large point datasets quickly, and grid layouts are easy to define through several alternative constructors. Although PyGridAgg was originally developed for geo-data analysis, it requires no GIS toolchain and has only minimal dependencies (see Requirements).

Installation

PyGridAgg is available on PyPI and can be installed using pip:

pip install pygridagg

Quickstart

import matplotlib.pyplot as plt

import pygridagg as pga
from pygridagg.examples import load_japanese_earthquake_data

# Load example data on earthquakes around Japan
quake_coords, magnitudes = load_japanese_earthquake_data()

# Define a square grid layout with 10k cells encompassing all earthquake locations
layout = pga.SquareGridLayout.from_points(quake_coords, num_cells=100**2)

# Count earthquakes across grid cells
agg_counts = pga.CountAggregator(layout, quake_coords)

# Show a heatmap
agg_counts.plot(title="Earthquakes around Japan (2010-2023)")
plt.show()

Simple and fast

For performance, all built-in point aggregators leverage in-place operations via np.ufunc.at. In the timed example below, 10 million random points are aggregated on a grid with 250,000 cells. For illustration, points are aggregated using a weighted average, with point weights being assigned as a function of position:

import time
import numpy as np
import matplotlib.pyplot as plt

import pygridagg as pga

# Define a grid layout on the unit square
bbox = 0, 1, 0, 1  # (x_min, x_max, y_min, y_max)
layout = pga.SquareGridLayout(*bbox, num_cells=500 ** 2)

# Generate random points, assign point weights in a smooth, periodic pattern 
N, freq = 10_000_000, 50
rand_coords = np.random.randn(N, 2) * 0.1 + 0.5
rand_weights = np.sin(freq * rand_coords[:, 0]) * np.cos(freq * rand_coords[:, 1])

# Time the data aggregation
start_time = time.perf_counter()  # perf_counter is meant for timing intervals
agg = pga.WeightedAverageAggregator(
    layout, rand_coords,
    point_weights=rand_weights,
)
elapsed_time = time.perf_counter() - start_time
print(f"Execution time: {elapsed_time:.2f} seconds")

# Show a heatmap
agg.plot()
plt.show()
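
Why rely on np.ufunc.at rather than ordinary fancy-index assignment? The minimal numpy sketch below (general numpy behaviour, not PyGridAgg source code) shows the difference when several points fall into the same cell: buffered `a[idx] += 1` applies each repeated index only once, whereas the unbuffered `np.add.at` applies every occurrence. The flattened cell indexes are made-up illustration data.

```python
import numpy as np

# Flattened cell index for each of five points; cell 2 receives three points.
cell_ids = np.array([0, 2, 2, 2, 3])

# Buffered fancy-index update: repeated indices are written only once,
# so cell 2 ends up incremented a single time.
buffered = np.zeros(4, dtype=int)
buffered[cell_ids] += 1
print(buffered)  # [1 0 1 1]

# Unbuffered in-place addition: every occurrence of an index is applied.
unbuffered = np.zeros(4, dtype=int)
np.add.at(unbuffered, cell_ids, 1)
print(unbuffered)  # [1 0 3 1]

# np.bincount yields the same per-cell counts for this simple case.
via_bincount = np.bincount(cell_ids, minlength=4)
print(via_bincount)  # [1 0 3 1]
```

For plain counts np.bincount is an alternative, but the ufunc.at pattern generalises to other operations such as np.minimum.at and np.maximum.at.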

Further details

Defining grid layouts

You can choose between two different grid layouts:

  • SquareGridLayout: Restricted to equal width and height, and to equal numbers of columns and rows.

  • FlexibleGridLayout: Allows you to independently set the grid's width and height, as well as the number of columns and rows.
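
To make the difference between the two layouts concrete, here is a minimal sketch using a hypothetical GridSpec dataclass and square_grid helper (neither is part of PyGridAgg's API). A flexible layout stores its bounds and column/row counts independently, while the square variant requires equal width and height and derives equal column and row counts from num_cells. Rejecting non-square cell counts is an assumption of this sketch; PyGridAgg may handle that case differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical stand-in for a grid layout; not a PyGridAgg class."""

    x_min: float
    x_max: float
    y_min: float
    y_max: float
    n_cols: int
    n_rows: int

    @property
    def shape(self):
        # Cell aggregates are stored as (rows, columns).
        return (self.n_rows, self.n_cols)

    @property
    def cell_width(self):
        return (self.x_max - self.x_min) / self.n_cols

    @property
    def cell_height(self):
        return (self.y_max - self.y_min) / self.n_rows


def square_grid(x_min, x_max, y_min, y_max, num_cells):
    """Square layout: width must equal height, and columns must equal rows."""
    if not math.isclose(x_max - x_min, y_max - y_min):
        raise ValueError("a square grid needs equal width and height")
    n = math.isqrt(num_cells)
    if n * n != num_cells:
        # Assumption: this sketch rejects cell counts that are not perfect squares.
        raise ValueError("num_cells must be a perfect square")
    return GridSpec(x_min, x_max, y_min, y_max, n, n)


# Flexible layout: independent width/height and column/row counts.
flex = GridSpec(0.0, 4.0, 0.0, 1.0, n_cols=8, n_rows=2)
print(flex.shape, flex.cell_width, flex.cell_height)  # (2, 8) 0.5 0.5

# Square layout on the unit square with 500 x 500 cells.
square = square_grid(0, 1, 0, 1, num_cells=500 ** 2)
print(square.shape)  # (500, 500)
```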

When defining either grid layout, you can set the grid bounds by passing any of the following:

  • a bounding box (via the default __init__ of both layout classes);
  • the desired centre coordinate and side dimensions of the grid (using the from_centroid constructor);
  • a collection of template points from which grid limits are inferred (using the from_points constructor).
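
The two alternative constructors boil down to simple bounding-box arithmetic. The sketch below uses hypothetical helpers, bbox_from_centroid and bbox_from_points, that return an (x_min, x_max, y_min, y_max) tuple, the same ordering as the bbox in the timed example. It illustrates the idea only and is not the library's implementation; the real constructors may, for example, add padding or treat points on the boundary differently.

```python
import numpy as np


def bbox_from_centroid(cx, cy, width, height):
    """Return (x_min, x_max, y_min, y_max) for a box centred on (cx, cy)."""
    return (cx - width / 2, cx + width / 2, cy - height / 2, cy + height / 2)


def bbox_from_points(coords, square=False):
    """Return the tightest (x_min, x_max, y_min, y_max) box around (N, 2) points.

    With square=True, the shorter side is widened symmetrically around the
    centre of the points so that width equals height.
    """
    coords = np.asarray(coords, dtype=float)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    if not square:
        return (x_min, x_max, y_min, y_max)
    side = max(x_max - x_min, y_max - y_min)
    return bbox_from_centroid((x_min + x_max) / 2, (y_min + y_max) / 2, side, side)


pts = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 3.0]])
print(bbox_from_centroid(0.0, 0.0, 2.0, 4.0))  # (-1.0, 1.0, -2.0, 2.0)
print(bbox_from_points(pts))                 # tight box around pts
print(bbox_from_points(pts, square=True))    # y range widened to match x
```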

Built-in point aggregators

The following aggregator classes are currently available:

  • CountAggregator: Simply counts the number of points in each grid cell.

  • WeightedSumAggregator and WeightedAverageAggregator: Compute a weighted sum or weighted average of points in each cell (given an array of aggregation weights).

  • MinimumWeightAggregator and MaximumWeightAggregator: Compute the minimum or maximum weight of points in each grid cell (given an array of aggregation weights).
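
Each of these aggregators can be expressed with numpy's unbuffered ufunc.at operations once every point has been mapped to a flattened cell index. In this minimal sketch, a hypothetical aggregate_cells function computes all five statistics at once. It assumes that the weighted average means the sum of point weights divided by the number of points in the cell (the timed example passes only point_weights), and it fills empty cells with NaN; PyGridAgg's own definitions and fill values may differ.

```python
import numpy as np


def aggregate_cells(cell_ids, weights, n_cells):
    """Per-cell statistics of point weights from flattened cell indexes.

    cell_ids holds one flattened cell index per in-bounds point. Returns
    counts, sums, averages, minima and maxima; cells without points get
    NaN for the average, minimum and maximum in this sketch.
    """
    weights = np.asarray(weights, dtype=float)

    counts = np.zeros(n_cells, dtype=int)
    np.add.at(counts, cell_ids, 1)

    sums = np.zeros(n_cells)
    np.add.at(sums, cell_ids, weights)

    mins = np.full(n_cells, np.inf)
    np.minimum.at(mins, cell_ids, weights)

    maxs = np.full(n_cells, -np.inf)
    np.maximum.at(maxs, cell_ids, weights)

    # Divide only where a cell has points; leave NaN elsewhere.
    empty = counts == 0
    avgs = np.divide(sums, counts, out=np.full(n_cells, np.nan), where=~empty)
    mins[empty] = np.nan
    maxs[empty] = np.nan
    return counts, sums, avgs, mins, maxs


counts, sums, avgs, mins, maxs = aggregate_cells(
    cell_ids=np.array([0, 0, 2]), weights=[1.0, 3.0, 5.0], n_cells=3
)
print(counts)  # [2 0 1]
```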

Out-of-bounds points

Points outside the grid bounds do not affect the data aggregation. However, aggregator classes issue a warning when out-of-bounds points are present. To silence this warning, set warn_out_of_bounds=False when instantiating an aggregator class.
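
A sketch of how such a bounds check and optional warning could work, using a hypothetical inside_mask function whose warn_out_of_bounds flag mirrors the parameter named above. Treating points that lie exactly on the grid boundary as inside (closed bounds) is an assumption of this sketch.

```python
import warnings

import numpy as np


def inside_mask(coords, bbox, warn_out_of_bounds=True):
    """Return a boolean mask of points inside bbox = (x_min, x_max, y_min, y_max).

    Assumption: bounds are closed, so points on the boundary count as inside.
    """
    x_min, x_max, y_min, y_max = bbox
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    mask = (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)

    n_outside = int(np.count_nonzero(~mask))
    if warn_out_of_bounds and n_outside:
        warnings.warn(f"{n_outside} point(s) lie outside the grid bounds and are ignored.")
    return mask


pts = [[0.5, 0.5], [1.5, 0.5], [1.0, 1.0]]
mask = inside_mask(pts, (0, 1, 0, 1))                             # warns about 1 point
quiet = inside_mask(pts, (0, 1, 0, 1), warn_out_of_bounds=False)  # no warning
print(mask.tolist())  # [True, False, True]
```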

Column and row indexes

To access the column and row indexes of points, use the grid_col_ids and grid_row_ids attributes of an aggregator instance. Points located outside the grid bounds receive a column and row index of -1.
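
The column and row indexes follow from scaling each coordinate by the cell size and truncating. The hypothetical point_to_cell_ids function below sketches this, assuming that row 0 lies at y_min and that points on the upper bound fall into the last column or row; PyGridAgg's own orientation and edge conventions may differ.

```python
import numpy as np


def point_to_cell_ids(coords, bbox, n_cols, n_rows):
    """Return (col_ids, row_ids) per point, using -1 for out-of-bounds points.

    Assumptions: row 0 lies at y_min, and points on the upper x/y bound are
    assigned to the last column/row.
    """
    x_min, x_max, y_min, y_max = bbox
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    inside = (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)

    # Scale coordinates to fractional cell units, then round down to integers.
    col_ids = np.floor((x - x_min) / (x_max - x_min) * n_cols).astype(int)
    row_ids = np.floor((y - y_min) / (y_max - y_min) * n_rows).astype(int)

    # Clamp points sitting exactly on the upper bound into the last cell.
    col_ids = np.minimum(col_ids, n_cols - 1)
    row_ids = np.minimum(row_ids, n_rows - 1)

    col_ids[~inside] = -1
    row_ids[~inside] = -1
    return col_ids, row_ids


pts = [[0.1, 0.1], [0.9, 0.6], [1.0, 1.0], [1.2, 0.5]]
cols, rows = point_to_cell_ids(pts, (0, 1, 0, 1), n_cols=4, n_rows=2)
print(cols.tolist(), rows.tolist())  # [0, 3, 3, -1] [0, 1, 1, -1]
```

For in-bounds points, row_ids * n_cols + col_ids gives a flattened cell index suitable for np.add.at-style aggregation.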

Coordinate Reference Systems?

PyGridAgg aims to be as lightweight as possible and does not depend on GIS libraries like pyproj or geopandas. As such, you need to handle transformations between coordinate reference systems yourself. The package performs no checks on whether the coordinates provided for points and grid layouts are valid.
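
As one example of handling coordinates yourself, the sketch below validates (longitude, latitude) pairs in degrees and applies a simple equirectangular approximation to convert them to metres around a reference latitude. The helpers check_lonlat and equirectangular are hypothetical, and the projection is only a rough approximation for small regions; for accurate work, use a dedicated library such as pyproj.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def check_lonlat(lonlat):
    """Raise if an (N, 2) array is not finite (longitude, latitude) degrees."""
    lonlat = np.asarray(lonlat, dtype=float)
    lon, lat = lonlat[:, 0], lonlat[:, 1]
    valid = np.isfinite(lonlat).all() and (np.abs(lon) <= 180).all() and (np.abs(lat) <= 90).all()
    if not valid:
        raise ValueError("expected finite (longitude, latitude) pairs in degrees")
    return lonlat


def equirectangular(lonlat, lat0):
    """Approximately project (lon, lat) degrees to (x, y) metres.

    Distances are only roughly preserved near the reference latitude lat0.
    """
    lonlat = check_lonlat(lonlat)
    lon_rad = np.radians(lonlat[:, 0])
    lat_rad = np.radians(lonlat[:, 1])
    x = EARTH_RADIUS_M * lon_rad * np.cos(np.radians(lat0))
    y = EARTH_RADIUS_M * lat_rad
    return np.column_stack([x, y])


xy = equirectangular([[139.7, 35.7], [135.5, 34.7]], lat0=35.0)
print(xy.shape)  # (2, 2)
```

The projected (x, y) array could then be used in place of raw degrees when defining a layout and an aggregator.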

Implementing custom aggregators

You can define your own data aggregators by inheriting from BasePointAggregator and implementing the aggregate method. The example below illustrates this with a custom aggregator class that only counts points in a grid cell if their associated point weight is above a threshold value.

import numpy as np

import pygridagg as pga
from pygridagg.examples import load_japanese_earthquake_data


class CustomThresholdCounter(pga.BasePointAggregator):
    """Counts the number of points whose weight is above a threshold."""

    def aggregate(self, point_weights, threshold):
        # Initialise grid counts with zeroes
        counts = np.full(self.layout.shape, fill_value=0, dtype=int)

        # Select the column and row indexes of eligible points.
        # `self.inside_mask` is True for points inside the grid bounds.
        point_mask = self.inside_mask & (point_weights > threshold)
        col_ids = self.grid_col_ids[point_mask]
        row_ids = self.grid_row_ids[point_mask]

        # Use `np.add.at` for fast in-place addition
        np.add.at(counts, (row_ids, col_ids), 1)

        # Note: Returned array must always have shape (rows, columns)
        return counts


quake_coords, magnitudes = load_japanese_earthquake_data()
layout = pga.SquareGridLayout.from_points(quake_coords, num_cells=2_500)

# Only count earthquakes above magnitude 6
thresh = 6
agg = CustomThresholdCounter(layout, quake_coords, point_weights=magnitudes, threshold=thresh)

# Check that no earthquakes were 'lost'
assert agg.cell_aggregates.sum() == (magnitudes > thresh).sum()

# Show counts of major earthquakes with a heatmap
ax = agg.plot()
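
One way to sanity-check a count-style aggregator independently of PyGridAgg is to recompute the counts with numpy's np.histogram2d. The hypothetical threshold_counts_hist function below does this for the threshold example, using made-up points and magnitudes. It assumes row 0 corresponds to y_min, and np.histogram2d includes the upper bound in the last bin, so edge behaviour may not match PyGridAgg exactly. Only if both conventions match would its result be directly comparable with agg.cell_aggregates.

```python
import numpy as np


def threshold_counts_hist(coords, weights, threshold, bbox, n_cols, n_rows):
    """Count points with weight above threshold per cell via np.histogram2d.

    Returns an array of shape (n_rows, n_cols); points outside bbox are
    dropped. Assumption: row 0 corresponds to y_min.
    """
    coords = np.asarray(coords, dtype=float)
    keep = np.asarray(weights) > threshold
    x_min, x_max, y_min, y_max = bbox
    hist, _, _ = np.histogram2d(
        coords[keep, 0],
        coords[keep, 1],
        bins=[n_cols, n_rows],
        range=[[x_min, x_max], [y_min, y_max]],
    )
    # histogram2d indexes its output as (x bin, y bin); transpose to (row, column).
    return hist.T.astype(int)


pts = [[0.1, 0.1], [0.9, 0.2], [0.4, 0.2], [2.0, 2.0]]
mags = [7.0, 6.5, 5.0, 9.0]
counts = threshold_counts_hist(pts, mags, threshold=6, bbox=(0, 1, 0, 1), n_cols=2, n_rows=2)
print(counts.tolist())  # [[1, 1], [0, 0]]
```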

Requirements

  • numpy
  • matplotlib

License

This project is licensed under the MIT License. See LICENSE.txt for details.


Download files


Source Distribution

PyGridAgg-0.1.1.tar.gz (172.9 kB)

Built Distribution


PyGridAgg-0.1.1-py3-none-any.whl (177.7 kB)

File details

Details for the file PyGridAgg-0.1.1.tar.gz.

File metadata

  • Download URL: PyGridAgg-0.1.1.tar.gz
  • Upload date:
  • Size: 172.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/0.0.0 pkginfo/1.9.6 readme-renderer/27.0 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/4.8.1 keyring/23.2.1 rfc3986/2.0.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for PyGridAgg-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a751f7534fc34f76eaaaf6dcd5470e6e43406c8cf4fab97935a32bf8882e2ff9
MD5 e59ae42e81d0cab0ee34ddce4932171f
BLAKE2b-256 613f6df1fc0bfc705a7b7da95bdcfb367366e19cfbc48517ed691a8aaaf689df


File details

Details for the file PyGridAgg-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: PyGridAgg-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 177.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/0.0.0 pkginfo/1.9.6 readme-renderer/27.0 requests/2.28.1 requests-toolbelt/1.0.0 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/4.8.1 keyring/23.2.1 rfc3986/2.0.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for PyGridAgg-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ca093bef3bbbff2c5f52f0ceab8ddff6c59ef69d73fb0c2911a02031ecf8f618
MD5 5e87fad1fd515fe53534d408e8e45426
BLAKE2b-256 e735f5864c226de80b4e01fbb0044289cc90abccc142272ed9bfa15dc27841db

