GeoPre: Geospatial Data Processing Toolkit

GeoPre is a Python library designed to streamline common geospatial data operations, offering a unified interface for handling raster and vector datasets. It simplifies preprocessing tasks essential for GIS analysis, machine learning workflows, and remote sensing applications.

Key Features

  • Data Scaling:

    • Z-score standardization and min-max scaling for raster bands.
    • Prepares data for ML models while preserving geospatial metadata.
  • CRS Management:

    • Retrieve and compare Coordinate Reference Systems (CRS) across raster (Rasterio/Xarray) and vector (GeoPandas) datasets.
    • Ensure consistency between datasets with automated CRS checks.
  • Reprojection:

    • Reproject vector data (GeoDataFrames) and raster data (Rasterio/Xarray) to any target CRS.
    • Supports EPSG codes, WKT, and Proj4 strings.
  • No-Data Masking:

    • Handle missing values in raster datasets (NumPy/Xarray) with flexible masking.
    • Works with the raster metadata (profile), returning it alongside the masked array so no-data settings stay consistent.
  • Cloud Masking:

    • Identify and mask clouds in Sentinel-2 and Landsat imagery.
    • Supports multiple methods: QA bands, scene classification layers (SCL), probability bands, and OmniCloudMask AI-based detection.
    • Optionally mask cloud shadows for improved accuracy.
  • Band Stacking:

    • Stack multiple raster bands from a folder into a single multi-band raster for analysis.
    • Supports automatic band detection and resampling for different resolutions.

Supported Data Types

  • Raster: NumPy arrays, Rasterio DatasetReader, Xarray DataArray (via rioxarray).
  • Vector: GeoPandas GeoDataFrame.

Benefits of GeoPre

  • Unified Workflow: Eliminates boilerplate code by providing consistent functions for raster and vector data.
  • Interoperability: Bridges gaps between GeoPandas, Rasterio, and Xarray, ensuring smooth data transitions.
  • Robust Error Handling: Automatically detects CRS mismatches and missing metadata to prevent silent failures.
  • Efficiency: Optimized reprojection and masking operations reduce preprocessing time for large datasets.
  • ML-Ready Outputs: Scaling functions preserve data structure, making outputs directly usable in machine learning pipelines.

Ideal for researchers and developers working with geospatial data, GeoPre enhances productivity by standardizing preprocessing steps and ensuring compatibility across diverse geospatial tools.

Installation

Ensure you have the required dependencies installed before using this library:

pip install numpy geopandas rasterio rioxarray xarray pyproj
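
As a quick sanity check after installation, the following sketch reports which of the packages listed above cannot be found on the import path. The helper name `missing_packages` is hypothetical and is not part of GeoPre; it only uses the standard library.

```python
from importlib.util import find_spec

# Package names taken from the pip command above
REQUIRED = ["numpy", "geopandas", "rasterio", "rioxarray", "xarray", "pyproj"]


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```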

Usage

1. Data Scaling

import numpy as np
from scaling_and_reproject import Z_score_scaling, Min_Max_Scaling

data = np.array([[10, 20, 30], [40, 50, 60]])
z_scaled = Z_score_scaling(data)
minmax_scaled = Min_Max_Scaling(data)
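
For reference, the two scalings can be written in plain NumPy as below. This is a minimal sketch of the underlying arithmetic, not GeoPre's implementation: the function names `z_score` and `min_max`, the NaN handling, and the constant-band behavior are all assumptions made for illustration.

```python
import numpy as np


def z_score(band: np.ndarray) -> np.ndarray:
    """Shift to zero mean and scale to unit standard deviation (NaNs ignored)."""
    band = band.astype("float64")
    std = np.nanstd(band)
    if std == 0:
        # Constant band: avoid dividing by zero (assumed behavior)
        return np.zeros_like(band)
    return (band - np.nanmean(band)) / std


def min_max(band: np.ndarray) -> np.ndarray:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    band = band.astype("float64")
    lo, hi = np.nanmin(band), np.nanmax(band)
    if hi == lo:
        # Constant band: avoid dividing by zero (assumed behavior)
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)


data = np.array([[10, 20, 30], [40, 50, 60]])
z = z_score(data)
mm = min_max(data)
```

Z-score output is unbounded and centered on zero, while min-max output is confined to [0, 1]; which one suits a model depends on how sensitive it is to outliers and to the input range.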

2. CRS Management

import geopandas as gpd
import rasterio
from scaling_and_reproject import get_crs, compare_crs

vector = gpd.read_file("data.shp")

# Open the raster in a context manager so the file handle is closed afterwards
with rasterio.open("image.tif") as raster:
    print(get_crs(vector))  # e.g. EPSG:4326
    print(compare_crs(raster, vector))  # CRS comparison results

3. Reprojection

import geopandas as gpd
import rasterio
import rioxarray
from scaling_and_reproject import reproject_data

# Vector reprojection
vector = gpd.read_file("data.shp")
reprojected_vector = reproject_data(vector, "EPSG:3857")

# Raster reprojection (Rasterio)
with rasterio.open("input.tif") as src:
    array, metadata = reproject_data(src, "EPSG:32633")

# Xarray reprojection (open the file with rioxarray)
da = rioxarray.open_rasterio("image.tif")
reprojected_da = reproject_data(da, "EPSG:4326")
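
To make concrete what a reprojection does to coordinates, here is a small sketch of the forward transform from geographic longitude/latitude (EPSG:4326) to spherical Web Mercator (EPSG:3857). It is written by hand in NumPy purely for illustration; the function name is hypothetical, and a real workflow would normally rely on a projection library rather than this formula.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres."""
    lon = np.radians(np.asarray(lon_deg, dtype="float64"))
    lat = np.radians(np.asarray(lat_deg, dtype="float64"))
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + lat / 2))
    return x, y


# Two points on the equator and one at 45 degrees north
x, y = lonlat_to_web_mercator([0.0, 180.0, 0.0], [0.0, 0.0, 45.0])
```

Raster reprojection adds a resampling step on top of this coordinate transform, because the transformed pixel centres no longer fall on a regular grid.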

4. No-Data Masking

import rasterio
import rioxarray
from scaling_and_reproject import mask_raster_data

# Rasterio workflow: read one band and pass its profile along
with rasterio.open("data.tif") as src:
    data = src.read(1)
    masked, profile = mask_raster_data(data, src.profile)

# rioxarray workflow
da = rioxarray.open_rasterio("data.tif")
masked_da = mask_raster_data(da)
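
The idea behind no-data masking can be sketched in NumPy as follows. This is an assumption about the general technique, not a description of what `mask_raster_data` does internally; the helper `mask_nodata` and the choice of NaN as the masked value are hypothetical.

```python
import numpy as np


def mask_nodata(band: np.ndarray, nodata) -> np.ndarray:
    """Return a float copy of the band with no-data pixels set to NaN."""
    out = band.astype("float32")
    if nodata is not None:
        # Compare against the original values to find the flagged pixels
        out[band == nodata] = np.nan
    return out


# -9999 is a commonly used no-data marker in integer rasters
data = np.array([[-9999, 12], [15, -9999]], dtype="int16")
masked = mask_nodata(data, -9999)

# NaN-aware statistics now skip the masked pixels
mean_valid = np.nanmean(masked)
```

Converting to a floating-point type is needed because integer arrays cannot hold NaN; the raster profile's dtype and nodata entries would have to be updated to match before writing the result back to disk.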

5. Cloud Masking

mask_clouds_S2

Description: Masks clouds and optionally shadows in a Sentinel-2 raster image using various methods.

Parameters:

  • image_path (str): Path to the input raster image.
  • output_path (str, optional): Path to save the masked output raster. Defaults to the same directory as the input with '_masked' appended to the filename.
  • method (str, optional): The method for masking ('auto', 'qa', 'probability', 'omnicloudmask', 'scl', 'standard'). Defaults to 'auto'.
  • mask_shadows (bool): Whether to mask cloud shadows. Defaults to False.
  • threshold (int): Cloud probability threshold (if using a cloud probability band), from 0 to 100. Defaults to 20.
  • nodata_value (int or float): Value for no-data regions. Defaults to np.nan.

Returns:

  • (str): The path to the saved masked output raster.

Example:

from cloud_masking import mask_clouds_S2

output_s2 = mask_clouds_S2("sentinel2_image.tif", method='auto', mask_shadows=True)
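
To illustrate two of the methods named above, the sketch below builds a cloud mask from a Sentinel-2 scene classification layer (SCL) and from a cloud probability band. The SCL class codes follow the standard Sentinel-2 Level-2A legend (3 = cloud shadow, 8 = cloud medium probability, 9 = cloud high probability, 10 = thin cirrus). The function names, and the choice to mask pixels strictly above the threshold, are assumptions and may differ from what `mask_clouds_S2` does.

```python
import numpy as np

SCL_CLOUD_CLASSES = (8, 9, 10)  # cloud medium probability, cloud high probability, thin cirrus
SCL_SHADOW_CLASS = 3            # cloud shadow


def scl_cloud_mask(scl: np.ndarray, mask_shadows: bool = False) -> np.ndarray:
    """True where the SCL marks a pixel as cloud (and optionally cloud shadow)."""
    classes = list(SCL_CLOUD_CLASSES)
    if mask_shadows:
        classes.append(SCL_SHADOW_CLASS)
    return np.isin(scl, classes)


def probability_cloud_mask(prob: np.ndarray, threshold: int = 20) -> np.ndarray:
    """True where the cloud probability (0 to 100) exceeds the threshold."""
    return prob > threshold


scl = np.array([[4, 8], [3, 10]], dtype="uint8")
band = np.array([[0.12, 0.55], [0.03, 0.40]], dtype="float32")

mask = scl_cloud_mask(scl, mask_shadows=True)
masked_band = np.where(mask, np.nan, band)  # cloudy pixels become NaN
```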

mask_clouds_landsat

Description: Masks clouds and optionally shadows in a Landsat raster image using various methods.

Parameters:

  • image_path (str): Path to the input multi-band raster image.
  • output_path (str, optional): Path to save the masked output raster. Defaults to the same directory as the input with '_masked' suffix.
  • method (str): The method for masking ('auto', 'qa', 'omnicloudmask'). Defaults to 'auto'.
  • mask_shadows (bool): Whether to mask cloud shadows. Defaults to False.
  • nodata_value (int or float): Value for no-data regions. Defaults to np.nan.

Returns:

  • (str): The path to the saved masked output raster.

Example:

from cloud_masking import mask_clouds_landsat

output_landsat = mask_clouds_landsat("landsat_image.tif", method='auto', mask_shadows=True)
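
A QA-band mask is typically built by testing individual bits. The sketch below uses the bit positions documented for the Landsat Collection 2 QA_PIXEL band (bit 1 = dilated cloud, bit 2 = cirrus, bit 3 = cloud, bit 4 = cloud shadow). Whether the 'qa' method of `mask_clouds_landsat` reads exactly these bits is an assumption, and the function name is hypothetical.

```python
import numpy as np

# Bit positions in the Landsat Collection 2 QA_PIXEL band
QA_DILATED_CLOUD_BIT = 1
QA_CIRRUS_BIT = 2
QA_CLOUD_BIT = 3
QA_SHADOW_BIT = 4


def qa_pixel_cloud_mask(qa: np.ndarray, mask_shadows: bool = False) -> np.ndarray:
    """True where any of the selected QA_PIXEL flags is set."""
    bits = (1 << QA_DILATED_CLOUD_BIT) | (1 << QA_CIRRUS_BIT) | (1 << QA_CLOUD_BIT)
    if mask_shadows:
        bits |= 1 << QA_SHADOW_BIT
    return (qa & bits) != 0


# Example QA_PIXEL values: no cloud flags set, cloud bit set, shadow bit set
qa = np.array([21824, 22280, 23888], dtype="uint16")

clouds_only = qa_pixel_cloud_mask(qa)
clouds_and_shadows = qa_pixel_cloud_mask(qa, mask_shadows=True)
```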

6. Band Stacking

stack_bands

Description: Stacks multiple raster bands from a folder into a single multi-band raster.

Parameters:

  • input_path (str or Path): Path to the folder containing band files.
  • required_bands (list of str): List of band name identifiers (e.g., ["B4", "B3", "B2"]).
  • output_path (str or Path, optional): Path to save the stacked raster. Defaults to "stacked.tif" in the input folder.
  • resolution (float, optional): Target resolution for resampling. Defaults to the highest available resolution.

Returns:

  • (str): The path to the saved stacked output raster.

Example:

from stacking import stack_bands

stacked_image = stack_bands("/path/to/folder/containing/bands", ["B4", "B3", "B2"])
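
The stacking step itself can be pictured with in-memory arrays. The following sketch upsamples coarser bands to the finest grid by nearest-neighbour repetition and stacks them along a new first axis. It assumes all bands cover the same extent and that their sizes are integer multiples of one another; the helper names are hypothetical, and `stack_bands` may resample differently.

```python
import numpy as np


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Repeat every pixel `factor` times along both axes."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def stack_to_finest(bands: dict) -> np.ndarray:
    """Stack 2D bands into (band, row, col), matching the largest grid."""
    target_rows = max(b.shape[0] for b in bands.values())
    stacked = []
    for band in bands.values():
        factor = target_rows // band.shape[0]
        stacked.append(upsample_nearest(band, factor) if factor > 1 else band)
    return np.stack(stacked, axis=0)


b4 = np.arange(16).reshape(4, 4)   # stands in for a 10 m band
b11 = np.array([[1, 2], [3, 4]])   # stands in for a 20 m band over the same extent
cube = stack_to_finest({"B4": b4, "B11": b11})
```

The band order in the output follows the order of the input mapping, which mirrors how `required_bands` fixes the order in the call above.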

Contributing

  1. Fork the repository

    Click the "Fork" button at the top-right of this repository to create your copy.

  2. Create your feature branch

    git checkout -b feature/your-feature
    
  3. Commit changes

    git commit -am 'Add some feature'
    
  4. Push to branch

    git push origin feature/your-feature
    
  5. Open a Pull Request

    Navigate to the Pull Requests tab in the original repository and click "New Pull Request" to submit your changes.

License

This project is licensed under the MIT License. See LICENSE for more information.

Author

[Your Name] – [Your Email or GitHub Profile]
