SatChip

A package for satellite image AI data prep. This package "chips" data labels and satellite imagery into 264x264 image arrays following the TerraMind extension of the MajorTOM specification.

Usage

SatChip relies on a two-step process: first chip your training label data, then create corresponding chips for one or more remote sensing data sources.

Step 1: Chip labels

The chiplabel CLI tool takes a GDAL-compatible image, a collection date, and an optional output directory as input using the following format:

chiplabel PATH/TO/LABELS.tif DATE(UTC FORMAT) --outdir OUTPUT_DIR

For example:

chiplabel LA_damage_20250113_v0.tif 2024-01-01T01:01:01 --outdir chips

This will produce an output zipped Zarr store label dataset with the name {LABELS}.zarr.zip in the specified output directory (--outdir). This file will be the input to the remote sensing data chipping step.
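When scripting this step, it can help to check the date string and work out the expected output path before calling chiplabel. The sketch below is illustrative only and is not part of SatChip: the helper names parse_collection_date and expected_label_store are hypothetical, and it assumes that a timestamp without an offset is already in UTC, that {LABELS} is the input file name without its extension, and that the output directory defaults to the current directory.

```python
from datetime import datetime, timezone
from pathlib import Path


def parse_collection_date(text: str) -> datetime:
    """Parse a date such as '2024-01-01T01:01:01' and mark it as UTC."""
    parsed = datetime.fromisoformat(text)
    # Assumption: a timestamp without an offset is already in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def expected_label_store(label_path: str, outdir: str = ".") -> Path:
    """Return OUTDIR/{LABELS}.zarr.zip for a label image path."""
    # Assumption: {LABELS} is the file name without its extension.
    return Path(outdir) / f"{Path(label_path).stem}.zarr.zip"


collection_date = parse_collection_date("2024-01-01T01:01:01")
store_path = expected_label_store("LA_damage_20250113_v0.tif", "chips")

print(collection_date.isoformat())  # 2024-01-01T01:01:01+00:00
print(store_path.as_posix())        # chips/LA_damage_20250113_v0.zarr.zip
```

The second value matches the label store name used as input in the Step 2 example.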

For more information on usage, see chiplabel --help.

Step 2: Chip remote sensing data

The chipdata CLI tool takes a label zipped Zarr store, a dataset name, a date range and a set of optional parameters using the following format:

chipdata PATH/TO/LABELS.zarr.zip DATASET Ymd-Ymd \
    --maxcloudpct MAX_CLOUD_PCT --strategy STRATEGY \
    --outdir OUTPUT_DIR --scratchdir SCRATCH_DIR

For example:

chipdata LA_damage_20250113_v0.zarr.zip S2L2A 20250112-20250212 --maxcloudpct 20 --outdir chips --scratchdir images

Similarly to step 1, this will produce an output zipped Zarr store that contains chipped data for your chosen dataset with the name {LABELS}_{DATASET}.zarr.zip. The arguments are as follows:

  • PATH/TO/LABELS.zarr.zip: The path to your training labels (the zipped Zarr store produced in step 1).
  • DATASET: The satellite imagery dataset you would like to create chips for. See the list below for all current options.
  • Ymd-Ymd: The date range to select imagery from. For example, 20250112-20250212 selects imagery between January 12 and February 12, 2025.
  • MAX_CLOUD_PCT: For optical data, this optional parameter sets the maximum cloud coverage allowed in a chip. Values between 0 and 100 are allowed. Cloud coverage is calculated on a per-chip basis. The default is 100, i.e., no limit.
  • STRATEGY: Lets you select which data inside your date range will be used to create chips. Specifying BEST (the default) will create a chip for the image closest to the beginning of your date range that has at least 95% spatial coverage. Specifying ALL will create chips for all images within your date range that have at least 95% spatial coverage.
  • OUTPUT_DIR: Specifies the directory where the image chips will be saved. If not specified, this defaults to your current directory.
  • SCRATCH_DIR: Specifies the directory where the full-size satellite images will be downloaded to. If this argument is not provided, the images will be stored in a scratch directory that will be deleted when the chipdata call finishes.
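
The date range format and the two strategies described above can be illustrated with a small standalone sketch. This is not SatChip's implementation: the Scene record, parse_date_range, and select_scenes are hypothetical names, the scene list is made-up example data, and the sketch assumes that both ends of the date range are inclusive.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Scene:
    """Hypothetical record for one candidate image."""
    name: str
    acquired: date
    coverage_pct: float  # spatial coverage of the chip area, 0 to 100


def parse_date_range(text: str) -> tuple[date, date]:
    """Split 'Ymd-Ymd' into start and end dates."""
    start_text, end_text = text.split("-")
    start = datetime.strptime(start_text, "%Y%m%d").date()
    end = datetime.strptime(end_text, "%Y%m%d").date()
    if end < start:
        raise ValueError(f"end date precedes start date: {text}")
    return start, end


def select_scenes(scenes, date_range, strategy="BEST", min_coverage_pct=95.0):
    """Pick scenes by strategy among those in range with enough coverage."""
    start, end = parse_date_range(date_range)
    # Assumption: both ends of the date range are inclusive.
    eligible = sorted(
        (s for s in scenes
         if start <= s.acquired <= end and s.coverage_pct >= min_coverage_pct),
        key=lambda s: s.acquired,
    )
    if strategy == "ALL":
        return eligible
    if strategy == "BEST":
        # Earliest eligible scene, i.e. closest to the start of the range.
        return eligible[:1]
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up example data.
scenes = [
    Scene("a", date(2025, 1, 10), 100.0),  # before the range
    Scene("b", date(2025, 1, 14), 80.0),   # too little coverage
    Scene("c", date(2025, 1, 19), 97.5),
    Scene("d", date(2025, 2, 3), 99.0),
]

print([s.name for s in select_scenes(scenes, "20250112-20250212", "BEST")])  # ['c']
print([s.name for s in select_scenes(scenes, "20250112-20250212", "ALL")])   # ['c', 'd']
```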

Currently supported datasets include:

Tiling Schema

This package chips images based on the TerraMesh grid system, which builds on the MajorTOM grid system.

The MajorTOM grid system provides a global set of fixed image grids that are 1068x1068 pixels in size. A MajorTOM grid can be defined for any tile size, but we fix the grid to 10x10 km tiles. Tiles are named using the format:

ROW[U|D]_COL[L|R]

Where ROW is indexed from the equator, with a suffix U (up) for tiles north of the equator and D (down) for tiles south of it, and COL is indexed from the prime meridian, with a suffix L (left) for tiles west of the prime meridian and R (right) for tiles east of it.

To support finer subdivisions, the TerraMesh grid system divides each MajorTOM grid into a 4x4 set of sub-tiles, each 264x264 pixels. The subgrid is centered within the parent tile, leaving a 6-pixel border around the subgrid (4 x 264 = 1056 pixels, and the remaining 1068 - 1056 = 12 pixels are split evenly between opposite edges). Subgrid names extend the base format with two additional indices:

ROW[U|D]_COL[L|R]_SUBCOL_SUBROW

For instance, the bottom-left subgrid of MajorTOM tile 434U_876L is named 434U_876L_0_3. See the figure below for a visual description:

TerraMesh tiling schema
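
The naming scheme and the pixel arithmetic can also be written out in a few lines. The following is a minimal sketch, not SatChip's own code: SubTile, parse_subtile_name, and pixel_window are hypothetical names. It assumes SUBCOL counts from the left edge and SUBROW from the top edge of the parent tile, which is consistent with the bottom-left sub-tile being 0_3, and that pixel row 0 is the top row of the image.

```python
import re
from typing import NamedTuple

TILE_SIZE = 1068        # MajorTOM tile edge in pixels
SUBTILE_SIZE = 264      # TerraMesh sub-tile edge in pixels
SUBTILES_PER_SIDE = 4
# Border left on each side once the 4x4 subgrid is centered: 6 pixels.
BORDER = (TILE_SIZE - SUBTILES_PER_SIDE * SUBTILE_SIZE) // 2


class SubTile(NamedTuple):
    row: int
    row_suffix: str  # 'U' or 'D'
    col: int
    col_suffix: str  # 'L' or 'R'
    subcol: int
    subrow: int


_NAME = re.compile(r"(\d+)([UD])_(\d+)([LR])_(\d+)_(\d+)")


def parse_subtile_name(name: str) -> SubTile:
    """Split a name like '434U_876L_0_3' into its components."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a sub-tile name: {name}")
    row, row_suffix, col, col_suffix, subcol, subrow = match.groups()
    tile = SubTile(int(row), row_suffix, int(col), col_suffix,
                   int(subcol), int(subrow))
    if not (0 <= tile.subcol < SUBTILES_PER_SIDE
            and 0 <= tile.subrow < SUBTILES_PER_SIDE):
        raise ValueError(f"sub-tile index out of range: {name}")
    return tile


def pixel_window(tile: SubTile) -> tuple[int, int, int, int]:
    """Return (x_start, x_stop, y_start, y_stop) inside the parent tile."""
    # Assumption: SUBCOL counts from the left, SUBROW from the top.
    x_start = BORDER + tile.subcol * SUBTILE_SIZE
    y_start = BORDER + tile.subrow * SUBTILE_SIZE
    return x_start, x_start + SUBTILE_SIZE, y_start, y_start + SUBTILE_SIZE


tile = parse_subtile_name("434U_876L_0_3")
print(BORDER)              # 6
print(pixel_window(tile))  # (6, 270, 798, 1062)
```

For the bottom-left sub-tile, the window ends at pixel row 1062, which leaves the same 6-pixel margin below it (1068 - 1062 = 6) as on the left.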

Viewing Chips

Assessing chips after their creation can be challenging due to the large number of small images created. To address this issue, SatChip includes a chipview CLI tool that uses Matplotlib to quickly visualize the data included within the created zipped Zarr stores:

chipview PATH/TO/CHIPS.zarr.zip BAND --idx IDX

Where PATH/TO/CHIPS.zarr.zip is the path to the chip file (labels or image data), BAND is the name of the band you would like to view, and IDX is an optional integer index of which dataset you would like to initially view.
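
To step through several bands of the same store, the command line can be assembled programmatically. The sketch below only builds and prints the commands using the documented argument order; chipview_command is a hypothetical helper, and the band names B04 and B03 are placeholders, since the valid band names depend on the dataset.

```python
import shlex


def chipview_command(store, band, idx=None):
    """Build the argument list for one chipview call."""
    args = ["chipview", store, band]
    if idx is not None:
        # --idx is optional, so it is only added when requested.
        args += ["--idx", str(idx)]
    return args


store = "chips/LA_damage_20250113_v0_S2L2A.zarr.zip"
for band in ("B04", "B03"):  # placeholder band names
    print(shlex.join(chipview_command(store, band, idx=0)))
```

Each argument list can be passed to subprocess.run to launch the viewer.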

License

SatChip is licensed under the BSD-3-Clause open source license. See the LICENSE file for more details.

Contributing

Contributions to SatChip are welcome! If you would like to contribute, please submit a pull request on the GitHub repository.
