
Map Binning Tool


A Python package for spatial resampling and binning of geospatial data, designed with oceanographic datasets in mind. It efficiently downsamples high-resolution gridded data onto coarser grids while preserving spatial representativeness through neighborhood averaging within a configurable search radius.

Overview

The Map Binning Tool provides a robust solution for spatial data aggregation, particularly useful for:

  • Downsampling high-resolution oceanographic data (e.g., sea level anomaly, ocean currents)
  • Creating consistent multi-resolution datasets
  • Reducing computational load while maintaining spatial representativeness
  • Processing time-series of gridded data efficiently

The package uses k-d tree algorithms for fast spatial queries and supports both in-memory processing and persistent caching of spatial indices for repeated operations.
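The neighborhood lookup behind this approach can be sketched directly with scipy's cKDTree. This is an illustration of the technique, not the package's internal code; the grids and radius below are made-up values:

```python
# Sketch: map each coarse-grid cell to nearby high-resolution points
# with a k-d tree, then average them. (Hypothetical grids, not map_binning internals.)
import numpy as np
from scipy.spatial import cKDTree

# Made-up 0.25-degree fine grid and 1-degree coarse grid
fine_lon, fine_lat = np.meshgrid(np.arange(0, 5, 0.25), np.arange(0, 5, 0.25))
fine_pts = np.column_stack([fine_lon.ravel(), fine_lat.ravel()])

coarse_lon, coarse_lat = np.meshgrid(np.arange(0.5, 5, 1.0), np.arange(0.5, 5, 1.0))
coarse_pts = np.column_stack([coarse_lon.ravel(), coarse_lat.ravel()])

tree = cKDTree(fine_pts)                           # index the fine points once
groups = tree.query_ball_point(coarse_pts, r=0.5)  # fine points near each coarse cell

# Average the fine-grid values that fall within each coarse cell's radius
values = fine_lon.ravel()                          # stand-in for a real data field
binned = np.array([values[idx].mean() for idx in groups])
print(binned.shape)  # one mean value per coarse cell
```

Building the tree once and querying all target cells in a single call is what makes repeated binning cheap; the package's persistent caching extends the same idea across program runs.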

Key Features

  • Efficient Spatial Binning: Uses scipy's cKDTree for fast nearest-neighbor searches
  • Flexible Grid Support: Works with any xarray-compatible gridded dataset
  • Automatic Radius Calculation: Intelligently determines search radius based on target grid spacing
  • Persistent Caching: Save and reuse spatial indices using pickle serialization
  • Time Series Support: Handles datasets with temporal dimensions
  • Memory Efficient: Processes large datasets without excessive memory usage
  • Oceanographic Focus: Optimized for CMEMS and similar oceanographic data formats

Installation

From conda-forge (Recommended)

conda install -c conda-forge map-binning

From PyPI

pip install map-binning

With optional dependencies for development

pip install map-binning[dev]

Developer Installation

Using conda environment

# Create and activate conda environment
conda env create -f environment.yml
conda activate map-binning

# Install the package in development mode
pip install -e .

From source

git clone <repository-url>
cd map_binning
pip install -e .

Quick Start

Basic Usage

import xarray as xr
from map_binning import Binning

# Load your datasets
ds_high = xr.open_dataset('high_resolution_data.nc')
ds_low = xr.open_dataset('low_resolution_grid.nc')

# Initialize the binning tool
binning = Binning(
    ds_high=ds_high,
    ds_low=ds_low,
    var_name='sla',  # variable in the dataset to bin (e.g., sea level anomaly)
    xdim_name='longitude',  # longitude dimension name
    ydim_name='latitude',   # latitude dimension name
    search_radius=0.1  # optional: search radius in degrees
)

# Perform binning
result = binning.mean_binning()

Advanced Usage with Caching

# Create binning index and save it for reuse
result = binning.mean_binning(
    precomputed_binning_index=False,
    pickle_filename="my_binning_index.pkl",
    pickle_location="./cache"
)

# Reuse the saved index for subsequent operations
result = binning.mean_binning(
    precomputed_binning_index=True,
    pickle_filename="my_binning_index.pkl",
    pickle_location="./cache"
)

API Reference

Binning Class

Constructor Parameters

  • ds_high (xr.Dataset): High-resolution source dataset
  • ds_low (xr.Dataset): Low-resolution target grid dataset
  • var_name (str): Name of the variable to bin
  • xdim_name (str, optional): Longitude dimension name (default: 'lon')
  • ydim_name (str, optional): Latitude dimension name (default: 'lat')
  • search_radius (float, optional): Search radius in degrees (auto-calculated if None)
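The docs describe the default radius only as "auto-calculated" from the target grid spacing. One plausible rule, shown here purely as an assumption for illustration (not the package's actual formula), is half the diagonal of a target-grid cell, which ensures the search circles of neighboring cells overlap:

```python
# Hypothetical auto-radius rule: half the diagonal of a target-grid cell.
# This is an illustrative guess, not map_binning's actual calculation.
import numpy as np

def auto_radius(lon_low, lat_low):
    """Estimate a search radius (degrees) from the target grid spacing."""
    dlon = np.median(np.diff(np.unique(lon_low)))  # typical lon spacing
    dlat = np.median(np.diff(np.unique(lat_low)))  # typical lat spacing
    return 0.5 * np.hypot(dlon, dlat)              # half the cell diagonal

lon = np.arange(0, 10, 0.5)
lat = np.arange(-5, 5, 0.5)
print(auto_radius(lon, lat))  # ~0.354 for a 0.5-degree grid
```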

Methods

create_binning_index() Creates a spatial mapping between high and low resolution grids.

mean_binning(precomputed_binning_index=False, pickle_filename=None, pickle_location=None) Performs spatial binning using mean aggregation.

Parameters:

  • precomputed_binning_index (bool): Use pre-saved spatial index
  • pickle_filename (str): Filename for saving/loading spatial index
  • pickle_location (str): Directory path for pickle files

Returns: xr.DataArray with binned data on the target grid

Project Structure

map_binning/
├── map_binning/           # Main package directory
│   ├── __init__.py        # Package initialization
│   ├── binning.py         # Core binning algorithms
│   ├── index_store.py     # Pickle serialization utilities
│   └── main.py            # Command-line interface
├── notebooks/             # Jupyter notebooks for examples
│   └── cmems_nrt_coastal_bin.ipynb
├── tests/                 # Unit tests
│   ├── __init__.py
│   └── test_main.py
├── pickle_folder/         # Default location for cached indices
├── pyproject.toml         # Project configuration
├── environment.yml        # Conda environment specification
├── .env.template          # Environment variables template
└── README.md              # This file

Configuration for CMEMS data download

Environment Variables

Copy .env.template to .env and configure:

# Copernicus Marine Service credentials (if using CMEMS data)
COPERNICUSMARINE_SERVICE_USERNAME=<your_username>
COPERNICUSMARINE_SERVICE_PASSWORD=<your_password>

Dependencies

Core dependencies:

  • numpy: Numerical computing
  • scipy: Scientific computing (k-d tree algorithms)
  • xarray: Labeled multi-dimensional arrays
  • netcdf4: NetCDF file I/O
  • python-dotenv: Environment variable management

Development dependencies:

  • pytest: Unit testing framework
  • black: Code formatting
  • flake8: Code linting
  • mypy: Static type checking

Examples

Working with CMEMS Data

import xarray as xr
from map_binning import Binning

# Example with Copernicus Marine data
ds_high = xr.open_dataset('cmems_high_res_sla.nc')
ds_low = xr.open_dataset('cmems_low_res_grid.nc')

# Initialize for sea level anomaly processing
sla_binning = Binning(
    ds_high=ds_high,
    ds_low=ds_low,
    var_name='sla',
    xdim_name='longitude',
    ydim_name='latitude'
)

# Process and cache the result
binned_sla = sla_binning.mean_binning(
    pickle_filename="cmems_sla_index.pkl",
    pickle_location="./cache"
)

# Save the result
binned_sla.to_netcdf('binned_sla_data.nc')

Time Series Processing

The tool automatically handles time dimensions:

# Works seamlessly with time-varying datasets
# Input: (time, lat, lon) -> Output: (time, lat_low, lon_low)
result = binning.mean_binning()
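The reason index caching pays off for time series can be sketched in plain numpy/scipy (hypothetical 1-D grids, not the package's internals): the fine-to-coarse mapping depends only on the grids, so it is built once and reused for every time step:

```python
# Sketch: reuse one spatial index across all time steps.
# Grids and field are made up; this is not map_binning's internal code.
import numpy as np
from scipy.spatial import cKDTree

fine = np.linspace(0, 1, 50)            # 50 fine-grid points
coarse = np.linspace(0.1, 0.9, 5)       # 5 target cells
tree = cKDTree(fine[:, None])
groups = tree.query_ball_point(coarse[:, None], r=0.1)  # built once

data = np.random.rand(12, 50)           # hypothetical (time, points) field
binned = np.stack([[data[t, idx].mean() for idx in groups]
                   for t in range(data.shape[0])])
print(binned.shape)  # (12, 5): time dimension preserved, space coarsened
```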

Performance Considerations

  • Memory Usage: The tool processes data in chunks and uses efficient numpy operations
  • Spatial Index Caching: Save computed spatial indices to avoid recalculation
  • Grid Resolution: Runtime grows with the sizes of both the source and target grids
  • Search Radius: Smaller radii improve performance but may miss relevant data points
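The radius trade-off in the last bullet can be made concrete with a minimal example (made-up 1-D grids, not package code): a radius smaller than the gap to the nearest source point leaves target cells with no contributing data at all.

```python
# Sketch of the search-radius trade-off with deliberately misaligned grids.
import numpy as np
from scipy.spatial import cKDTree

fine = np.arange(0, 10, 1.0)[:, None]      # source points at integer degrees
coarse = np.arange(0.5, 10, 2.0)[:, None]  # target cells halfway between them

tree = cKDTree(fine)
small = tree.query_ball_point(coarse, r=0.25)  # radius < 0.5-degree gap
large = tree.query_ball_point(coarse, r=1.0)   # radius covers both neighbors

print(sum(len(g) == 0 for g in small))  # 5: every target cell is empty
print(sum(len(g) == 0 for g in large))  # 0: every target cell gets data
```

Empty bins typically surface as missing values in the output, so when results look unexpectedly sparse, increasing search_radius is the first thing to check.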

Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Run the test suite (pytest)
  5. Format your code (black map_binning/)
  6. Submit a pull request

Development Setup

# Clone and setup development environment
git clone <repository-url>
cd map-binning-project
conda env create -f environment.yml
conda activate map-binning
pip install -e .[dev]

# Run tests
pytest

# Format code
black map_binning/

# Type checking
mypy map_binning/

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=map_binning

# Run specific test file
pytest tests/test_main.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this tool in your research, please cite:

@software{map_binning_2024,
  author = {Chia-Wei Hsu},
  title = {Map Binning Tool: Spatial Resampling for Oceanographic Data},
  url = {https://github.com/chiaweh2/map_binning},
  version = {0.1.0},
  year = {2024}
}

Support

  • Issues: Please report bugs and feature requests via GitHub Issues
  • Documentation: Additional examples available in the notebooks/ directory
  • Contact: Chia-Wei Hsu (chiaweh2@uci.edu)

Acknowledgments

  • Built with support for Copernicus Marine Environment Monitoring Service (CMEMS) data
  • Utilizes scipy's efficient spatial algorithms
  • Designed for the oceanographic research community
