
Tool for creating patches from geo-referenced and non-geo-referenced image and label pairs


GeoTIFF Tiler

A Python package for creating training patches from geospatial imagery and label pairs for machine learning applications.

Overview

GeoTIFF Tiler streamlines the creation of training-data patches from geo-referenced and non-geo-referenced image and label pairs. It helps prepare data for machine learning models that require consistent input dimensions, particularly for geospatial applications. The library supports modern cloud-native geospatial workflows, including STAC items and cloud-optimized GeoTIFFs, and provides resumable, manifest-based data management.

Features

  • Multi-format Input Support:
    • Images: GeoTIFFs (geo-referenced and non-geo-referenced), STAC imagery, cloud-optimized GeoTIFFs
    • Labels: GeoTIFFs (geo-referenced and non-geo-referenced), vector data (.geojson, .gpkg, .shp)
  • WebDataset Output Format: Efficient sharded format for distributed training
  • Intelligent Data Splitting:
    • Spatial validation splitting with configurable grid-based selection
    • Class-balanced validation sets with customizable weights
    • Automatic handling of train/validation/test splits
  • Advanced Patch Management:
    • Intelligent patch filtering based on label content and thresholds
    • Padding for edge patches to maintain consistent dimensions
    • Automatic handling of CRS and alignment issues
    • Memory-efficient processing with resource management
  • STAC Integration: Native support for SpatioTemporal Asset Catalog items
  • Robust Processing:
    • Resumable operations with comprehensive manifest system
    • Automatic retry mechanisms for failed processing
    • Progress tracking and detailed logging
    • Memory monitoring and optimization
  • Multi-sensor Support:
    • Band selection and mapping for different sensor types
    • Automatic normalization statistics calculation
  • Quality Assessment Tools:
    • WebDataset-compatible visualization functions
    • Automatic visualization generation during processing
    • Dataset summary visualizations across splits
    • Per-image patch visualization with metadata
    • Class distribution analysis and validation reports

Installation

pip install geotiff-tiler

Quick Start

Basic Usage

from geotiff_tiler.tiler import Tiler

# Define your image-label pairs with metadata
data = [{
    "image": "./path/to/image.tif",
    "label": "./path/to/label.tif",
    "metadata": {"collection": "satellite-name", "gsd": 0.5}
}]

# Initialize the tiler with your configuration
tiler = Tiler(
    input_dict=data,
    patch_size=(256, 256),                            # Height, Width
    bands_requested=["red", "green", "blue", "nir"],  # Band selection
    stride=128,                                       # Step between patches; with 256 px patches, neighbours overlap by 128 px
    discard_empty=True,                               # Skip patches with no labels
    label_threshold=0.05,                             # Minimum non-zero label coverage
    output_dir='./output/patches',
    prefix='dataset_v1'                               # Dataset identifier
)

# Create the patches
tiler.create_tiles()

STAC Integration

The library supports STAC (SpatioTemporal Asset Catalog) items, making it compatible with cloud-native geospatial workflows:

# Using STAC items directly
data = [{
    "image": "https://stac-api.example.com/collections/sentinel-2/items/item-id",
    "label": "./path/to/label.geojson",
    "metadata": {"collection": "sentinel-2", "gsd": 10.0}
}]

tiler = Tiler(
    input_dict=data,
    patch_size=(512, 512),
    bands_requested=["red", "green", "blue", "nir"],
    attr_field=["class"],       # Vector label field
    attr_values=[1, 2, 3, 4],   # Class values to extract
    output_dir='./stac_patches'
)

Advanced Configuration

tiler = Tiler(
    input_dict=data,
    patch_size=(1024, 1024),
    bands_requested=["red", "green", "blue", "nir"],
    stride=512,
    
    # Validation splitting parameters
    grid_size=8,                    # Spatial grid for validation selection
    val_ratio=0.2,                  # 20% for validation
    class_balance_weight=0.6,       # Weight for class balance in validation
    spatial_weight=0.4,             # Weight for spatial coverage in validation
    
    # Label processing
    attr_field=["class", "category"],  # Fields in vector data
    attr_values=[1, 2, 3, 4],          # Values to extract
    class_ids={                        # Custom class mapping
        'background': 0,
        'water': 1,
        'vegetation': 2,
        'urban': 3,
        'bare_soil': 4
    },
    
    # Quality control
    discard_empty=True,
    label_threshold=0.1,            # 10% minimum label coverage
    
    # Output configuration
    prefix='landcover_v1',
    output_dir='./datasets/landcover'
)

result = tiler.create_tiles()
print(f"Processing complete: {result}")

Parameters

Core Parameters

  • input_dict: List of dictionaries with "image", "label", and "metadata" keys
  • patch_size: Tuple of (height, width) for the output patches
  • bands_requested: List of band names to extract (e.g., ["red", "green", "blue", "nir"])
  • stride: Spacing between patches (determines overlap); if None, uses max(patch_size)
  • output_dir: Directory to save the output patches
  • prefix: Dataset identifier for output files
  • create_viz: Whether to automatically create visualizations for completed images (default: False)
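
To illustrate how patch_size and stride interact, the window layout for one image can be sketched as below. This is a hypothetical helper for intuition only, not the library's actual implementation; it reflects the documented behaviour that stride defaults to max(patch_size) and that edge patches are padded back to full size.

```python
def patch_windows(height, width, patch_size, stride=None):
    """Top-left origins and clipped extents of every patch window.

    Sketch only: stride sets the step between windows and defaults to
    max(patch_size) (i.e. no overlap). Edge windows are clipped here;
    the real tiler pads such patches back to (ph, pw).
    """
    ph, pw = patch_size
    if stride is None:
        stride = max(patch_size)  # documented default
    windows = []
    for row in range(0, height, stride):
        for col in range(0, width, stride):
            # Edge windows may extend past the image boundary; those
            # patches are padded so all outputs share one shape.
            windows.append((row, col, min(row + ph, height), min(col + pw, width)))
    return windows

# A 512x512 image with 256x256 patches and stride 128 yields a 4x4 grid
# of windows, each overlapping its neighbour by 128 pixels.
```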

Label Processing

  • attr_field: Field name(s) in vector data to use for labeling (list of strings)
  • attr_values: Values to extract from the attribute field (list of strings or numbers)
  • class_ids: Dictionary mapping class names to numeric IDs
  • discard_empty: Whether to skip patches with no labels
  • label_threshold: Minimum fraction of non-zero pixels required in a label patch
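
For intuition, discard_empty and label_threshold can be read as the following filter. This is a hedged sketch of the documented behaviour; the library's internal check may differ in details such as how multi-band labels are counted.

```python
import numpy as np

def passes_filters(label_patch, discard_empty=True, label_threshold=0.05):
    """Return True if a label patch survives the quality filters.

    Illustrative only: computes the fraction of non-zero label pixels
    and compares it against label_threshold.
    """
    nonzero_fraction = np.count_nonzero(label_patch) / label_patch.size
    if discard_empty and nonzero_fraction == 0:
        return False  # patch contains no labels at all
    if label_threshold is not None and nonzero_fraction < label_threshold:
        return False  # labelled area below the minimum coverage
    return True
```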

Validation Splitting

  • grid_size: Size of spatial grid for validation selection (default: 4)
  • val_ratio: Fraction of data to use for validation (default: 0.2)
  • class_balance_weight: Weight for class balance in validation selection (default: 0.5)
  • spatial_weight: Weight for spatial coverage in validation selection (default: 0.5)
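
One way to read the two weights is as coefficients in a combined score used to rank grid cells for validation. The sketch below assumes each cell has a class-balance score and a spatial-coverage score in [0, 1]; the library's actual selection algorithm may differ.

```python
def select_validation_cells(cells, val_ratio=0.2,
                            class_balance_weight=0.5, spatial_weight=0.5):
    """Pick grid cells for validation by a weighted score.

    cells: dict mapping cell id -> (class_balance, spatial_coverage),
    both in [0, 1]. Returns the ids of the top-scoring cells, roughly
    val_ratio of the total. Hypothetical helper for illustration.
    """
    score = {
        cid: class_balance_weight * bal + spatial_weight * cov
        for cid, (bal, cov) in cells.items()
    }
    n_val = max(1, round(len(cells) * val_ratio))
    return set(sorted(score, key=score.get, reverse=True)[:n_val])
```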

Output Format

The tiler creates datasets in WebDataset format with the following structure:

output_dir/
├── prefix_manifest.json           # Processing manifest and statistics
├── normalization_stats.json       # Band statistics for normalization
├── prefix/
│   ├── trn/                       # Training shards
│   │   ├── prefix_000000.tar
│   │   ├── prefix_000001.tar
│   │   └── ...
│   ├── val/                       # Validation shards
│   │   ├── prefix_000000.tar
│   │   └── ...
│   ├── tst/                       # Test shards (if applicable)
│   │   └── ...
│   └── viz/                       # Visualization outputs (if create_viz=True)
│       ├── trn/
│       │   ├── image1_trn.png
│       │   └── ...
│       └── val/
│           ├── image1_val.png
│           └── ...

Each shard contains:

  • Image patches: .npy format
  • Label patches: .npy format
  • Metadata: .json format with spatial and processing information
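
Because WebDataset shards are plain tar archives, they can be inspected with only the standard library and NumPy. The member-name suffixes below (image.npy, label.npy, json) are assumptions about the shard layout, not a documented contract; check a shard's actual contents with tar -tf first.

```python
import io
import json
import tarfile

import numpy as np

def read_shard(shard_path):
    """Group one shard's members by sample key and decode .npy / .json entries.

    Returns {sample_key: {suffix: decoded_value}}. WebDataset names
    members "<key>.<suffix>", so everything before the first dot is
    treated as the sample key.
    """
    samples = {}
    with tarfile.open(shard_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            key, _, suffix = member.name.partition(".")
            data = tar.extractfile(member).read()
            if suffix.endswith("npy"):
                samples.setdefault(key, {})[suffix] = np.load(io.BytesIO(data))
            elif suffix.endswith("json"):
                samples.setdefault(key, {})[suffix] = json.loads(data)
    return samples
```

For training, the webdataset package can stream these shards directly without unpacking them.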

Advanced Features

Resumable Operations

The tiler automatically saves progress and can resume interrupted processing:

# If processing is interrupted, simply run again with the same configuration
tiler = Tiler(input_dict=data, output_dir='./output', prefix='dataset_v1')
tiler.create_tiles()  # Will automatically resume from where it left off

Retry Failed Images

# Retry processing of failed images
retry_results = tiler.retry_failed_images(max_retries=3)
print(f"Retry results: {retry_results}")

Export Statistics

# Export normalization statistics for model training
tiler.export_normalization_stats('./model_stats.json')
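
The exported statistics can then drive per-band standardization at training time. The schema assumed below (band name mapped to mean and std) is an illustration only; inspect the exported JSON for the real layout.

```python
import numpy as np

def normalize_patch(patch, stats, bands):
    """Standardize a (bands, H, W) patch with per-band mean/std.

    stats is assumed to map band name -> {"mean": m, "std": s};
    the actual schema of the exported stats file may differ.
    """
    out = patch.astype(np.float32, copy=True)
    for i, band in enumerate(bands):
        out[i] = (out[i] - stats[band]["mean"]) / stats[band]["std"]
    return out
```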

Visualization

from geotiff_tiler.utils.visualization import visualize_webdataset_patches, create_dataset_summary_visualization

# Visualize patches from a specific image
visualize_webdataset_patches(
    dataset_dir='./output',
    prefix='dataset_v1',
    split='trn',
    image_name='specific_image_name',  # Optional: target specific image
    n_samples=5,
    save_path='./visualization.png'
)

# Create comprehensive dataset summary across all splits
create_dataset_summary_visualization(
    output_dir='./output',
    prefix='dataset_v1',
    samples_per_split=6,
    images_per_split=3
)

# Enable automatic visualization during processing
tiler = Tiler(
    input_dict=data,
    patch_size=(256, 256),
    create_viz=True,  # Automatically create visualizations for completed images
    output_dir='./output'
)

Requirements

See requirements.txt for the complete list of dependencies.

License

MIT License

Author

Victor Alhassan (victor.alhassan@nrcan-rncan.gc.ca)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
