A Python package for finding and getting geospatial data.

TerraKit

🚀 A comprehensive library for generating ML-ready geospatial datasets 🌍

Installation

We recommend using uv to manage your Python projects.

If you haven't created a uv-managed project yet, create one:

uv init terrakit-demo
cd terrakit-demo

Then add TerraKit to your project dependencies:

uv add terrakit

Alternatively, for projects using pip for dependencies:

pip install terrakit

Check TerraKit is working as expected by running:

python -c "import terrakit; data_source='sentinel_aws'; dc = terrakit.DataConnector(connector_type=data_source)"

NOTE: Activate the uv virtual environment with source .venv/bin/activate, or prefix any python and pip commands with uv run.

NOTE: TerraKit requires GDAL, which can be complex to install. If you don't have GDAL set up and you are on a Debian- or Ubuntu-based Linux system (where apt-get is available), we recommend installing the system libraries with apt-get and the Python packages with uv:

apt-get update
apt-get install -y gdal-bin libgdal-dev
uv pip install geospatial

Alternatively, you can use a conda environment and install GDAL with conda install -c conda-forge gdal.

Data Connectors

Data connectors are classes that let a user search for and query data from a particular data source using a common set of functions. Each data connector has the following mandatory methods:

  • list_collections()
  • find_data()
  • get_data()
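To make the shared interface concrete, here is a minimal sketch using Python's abc module. ConnectorSketch and InMemoryConnector are hypothetical names, and the parameters and return types (a collection name plus keyword filters) are assumptions for illustration only; they are not TerraKit's real signatures, and the toy connector serves items from a dict instead of a remote data source.

```python
from abc import ABC, abstractmethod
from typing import Any


class ConnectorSketch(ABC):
    """Hypothetical interface mirroring the three mandatory connector methods.

    Parameters and return types are illustrative assumptions, not TerraKit's API.
    """

    @abstractmethod
    def list_collections(self) -> list[str]:
        """Return the names of the collections this data source offers."""

    @abstractmethod
    def find_data(self, collection: str, **query: Any) -> list[dict]:
        """Search a collection and return metadata for the matching items."""

    @abstractmethod
    def get_data(self, collection: str, **query: Any) -> list[Any]:
        """Retrieve the data for the matching items."""


class InMemoryConnector(ConnectorSketch):
    """Toy connector backed by a dict, used only to exercise the interface."""

    def __init__(self, catalog: dict[str, list[dict]]) -> None:
        self._catalog = catalog

    def list_collections(self) -> list[str]:
        return sorted(self._catalog)

    def find_data(self, collection: str, **query: Any) -> list[dict]:
        items = self._catalog.get(collection, [])
        # Keep items whose metadata matches every key/value pair in the query.
        return [item for item in items if all(item.get(k) == v for k, v in query.items())]

    def get_data(self, collection: str, **query: Any) -> list[Any]:
        # Stand-in for a download step: return each matching item's payload.
        return [item["payload"] for item in self.find_data(collection, **query)]


dc = InMemoryConnector({
    "demo": [
        {"date": "2024-01-01", "payload": "scene-a"},
        {"date": "2024-01-02", "payload": "scene-b"},
    ]
})
print(dc.list_collections())                   # ['demo']
print(dc.get_data("demo", date="2024-01-02"))  # ['scene-b']
```

Because every connector exposes the same three methods, calling code can switch data sources without changing how it searches or downloads.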

Available data connectors

The following data connectors and associated collections are available:

Connector            Collections
sentinelhub          s2_l1c, dem, s1_grd, hls_l30, s2_l2a, hls_s30
nasa_earthdata       HLSL30_2.0, HLSS30_2.0
sentinel_aws         sentinel-2-l2a
IBMResearchSTAC      ukcp18-land-cpm-uk-2.2km, ch4, sentinel-5p-l3grd-ch4-wfmd
TheWeatherCompany    weathercompany-daily-forecast
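Collection names are not shared across connectors. For example, Sentinel-2 L2A appears as s2_l2a under sentinelhub but as sentinel-2-l2a under sentinel_aws. The sketch below transcribes the table into a dict and looks up which connector lists a given name. connectors_for is a hypothetical helper, and the table is only a snapshot: a connector's own list_collections() remains the authoritative source.

```python
# Snapshot of the table above; each connector's list_collections() is authoritative.
CONNECTOR_COLLECTIONS: dict[str, list[str]] = {
    "sentinelhub": ["s2_l1c", "dem", "s1_grd", "hls_l30", "s2_l2a", "hls_s30"],
    "nasa_earthdata": ["HLSL30_2.0", "HLSS30_2.0"],
    "sentinel_aws": ["sentinel-2-l2a"],
    "IBMResearchSTAC": ["ukcp18-land-cpm-uk-2.2km", "ch4", "sentinel-5p-l3grd-ch4-wfmd"],
    "TheWeatherCompany": ["weathercompany-daily-forecast"],
}


def connectors_for(collection: str) -> list[str]:
    """Return every connector whose collection list contains `collection` exactly."""
    return [name for name, names in CONNECTOR_COLLECTIONS.items() if collection in names]


print(connectors_for("s2_l2a"))          # ['sentinelhub']
print(connectors_for("sentinel-2-l2a"))  # ['sentinel_aws']
print(connectors_for("S2_L2A"))          # [] (matching is case-sensitive)
```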

Quick start

Here is an example using the SentinelHub data connector.

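from terrakit import DataConnector
dc = DataConnector(connector_type='sentinelhub')  # requires SentinelHub credentials (see Data connector access)
dc.connector.list_collections()  # list the collections this connector provides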

For more examples, take a look at terrakit_download.ipynb.

TerraKit CLI

We can also run TerraKit using the CLI. Take a look at the TerraKit CLI Notebook for some examples of how to use this.

Data connector access

Each data connector has different access requirements. To connect to SentinelHub and NASA Earthdata, you will need to obtain credentials from each provider. Once you have them, add them to a .env file in the project root directory using the following syntax:

SH_CLIENT_ID="<SentinelHub Client ID>"
SH_CLIENT_SECRET="<SentinelHub Client Secret>"
NASA_EARTH_BEARER_TOKEN="<NASA EarthData Bearer Token>"
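Before running a connector, it can help to confirm that the required variables are actually set. The sketch below uses only the standard library: it parses simple KEY="value" lines from a .env file and reports what is missing. parse_env_lines, missing_credentials and the REQUIRED_VARS mapping are hypothetical. The variable names come from this README, but the mapping is not read from TerraKit, and the sketch does not attempt full .env syntax (for example, multi-line values or variable expansion).

```python
import os
from pathlib import Path

# Variables this README names for each credentialed connector (assumed mapping).
REQUIRED_VARS: dict[str, list[str]] = {
    "sentinelhub": ["SH_CLIENT_ID", "SH_CLIENT_SECRET"],
    "nasa_earthdata": ["NASA_EARTH_BEARER_TOKEN"],
    "TheWeatherCompany": ["THE_WEATHER_COMPANY_API_KEY"],
}


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quote characters, as in SH_CLIENT_ID="...".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_credentials(connector: str, env: dict[str, str]) -> list[str]:
    """Return the variables `connector` needs that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS.get(connector, []) if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    file_values = parse_env_lines(env_file.read_text()) if env_file.exists() else {}
    # In this sketch, real environment variables take precedence over .env entries.
    merged = {**file_values, **os.environ}
    for connector in REQUIRED_VARS:
        print(connector, "missing:", missing_credentials(connector, merged))
```

Connectors that need no credentials, such as sentinel_aws, are absent from the mapping, so missing_credentials returns an empty list for them.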

NASA Earthdata

To access NASA Earthdata, register for an Earthdata Login profile and request a bearer token at https://urs.earthdata.nasa.gov/profile.

Sentinel Hub

To access Sentinel Hub, register for an account and request an OAuth client from the Sentinel Hub dashboard (https://www.planet.com).

Sentinel AWS

Sentinel AWS data is openly accessible and does not require any credentials.

The Weather Company

To access The Weather Company, register for an account and request an API key at https://www.weathercompany.com/weather-data-apis/. Once you have an API key, set the following environment variable:

THE_WEATHER_COMPANY_API_KEY="<The Weather Company API Key>"

IBM Research STAC

Access to IBM Research STAC is currently restricted to IBMers and partners. If you're eligible, register for an IBM AppID account and set the following environment variables:

APPID_ISSUER=<issuer>
APPID_USERNAME=<user-email>
APPID_PASSWORD=<user-password>
CLIENT_ID=<client-id>
CLIENT_SECRET=<client-secret>

Please reach out to the maintainers of this repo.

IBMers don't need credentials to access the internal instance of the STAC service.

This data connector can save files as NetCDF or TIFF. The get_data(..) method has a save_file parameter: if save_file is a path ending in .nc, the output is saved as NetCDF; if it ends in .tif, the output is saved as TIFF files.
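The extension rule above can be expressed as a small dispatch function. output_format is a hypothetical helper, not part of TerraKit. It only mirrors the documented mapping (.nc to NetCDF, .tif to TIFF) and rejects every other extension, because the connector's behaviour for other extensions (for example .tiff or upper-case suffixes) is not described here.

```python
from pathlib import Path


def output_format(save_file: str) -> str:
    """Return the format implied by a save_file path: 'netcdf' for .nc, 'tif' for .tif."""
    suffix = Path(save_file).suffix
    if suffix == ".nc":
        return "netcdf"
    if suffix == ".tif":
        return "tif"
    # Other extensions are rejected here; the connector's handling of them is undocumented.
    raise ValueError(f"unsupported save_file extension {suffix!r}; use .nc or .tif")


print(output_format("outputs/ch4_2023.nc"))   # netcdf
print(output_format("outputs/ch4_2023.tif"))  # tif
```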

Example data

To download a pair of example label files from Copernicus Emergency Management Service, use the rapid_mapping_geojson_downloader function as follows:

python -c "from terrakit.general_utils.labels_downloader import rapid_mapping_geojson_downloader;\
rapid_mapping_geojson_downloader(event_id='748', aoi='01', monitoring_number='05', version='v1', dest='docs/examples/test_wildfire_vector');\
rapid_mapping_geojson_downloader(event_id='801', aoi='01', monitoring_number='02', version='v1', dest='docs/examples/test_wildfire_vector');"

Development setup

Clone this repo:

git clone git@github.com:terrastackai/terrakit.git
cd terrakit

Install the uv package manager using pip install uv, then install the package dependencies:

uv sync

Test out TerraKit:

uv run python -c "from terrakit import DataConnector; dc = DataConnector(connector_type='nasa_earthdata')"

Set up dev dependencies

Install dev dependencies:

uv sync --group dev

If needed, dev dependencies can be excluded using the following:

uv sync --no-group dev

Check that the packages installed in the venv have compatible dependencies:

uv pip check

To install a new package and include it in the uv environment:

uv add <new_package>; uv sync

Install pre-commit

Install the hooks defined in .pre-commit-config.yaml:

uv run pre-commit install

NOTE: Follow the steps under Detect secrets to install the IBM Detect Secrets library used by one of the pre-commit hooks.

To run pre-commit tasks which include ruff format, pytest, pytest coverage, detect secrets and mypy:

uv run pre-commit

The pre-commit tasks also run automatically as part of git commit. If any of them fail, the commit will fail too. Please resolve any issues before re-running git commit.

Ruff usage

Run the Ruff formatter on the given files or directories:

ruff format <file or directory name>

Use the ignore list in the [tool.ruff] section of pyproject.toml to specify rules which should be ignored:

[tool.ruff]
target-version = "py310"
line-length = 120
ignore = [
    "Q000" # allow single quotes
]

Detect secrets

Install IBM detect secrets:

uv pip install --upgrade "git+https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets"

Run the following command from within the root directory to scan it for existing secrets, logging the results in .secrets.baseline.

uv run detect-secrets scan --update .secrets.baseline

Running pytest

To run all unit tests:

uv run pytest

To generate a pytest coverage report:

uv run pytest --cov=src/terrakit tests/

Running integration tests

uv run python tests/integration_tests/dev.py

Adding a new data connector

To add a new data connector, use connector_template.py as a starting point. The new connector should extend the Connector class from the terrakit.download.connector module and implement the list_collections, find_data and get_data methods. Finally, update terrakit.py so the new connector can be selected.
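How terrakit.py maps a connector_type string to a connector class is not shown in this README. One common pattern for making a new connector selectable is a registry keyed by name, sketched below. CONNECTOR_REGISTRY, register_connector, make_connector and the "my_new_source" name are all hypothetical. To keep the sketch runnable on its own, it does not import TerraKit or subclass its Connector class, which a real connector should do as described above.

```python
from typing import Callable

# Hypothetical registry mapping connector_type strings to connector classes.
CONNECTOR_REGISTRY: dict[str, type] = {}


def register_connector(name: str) -> Callable[[type], type]:
    """Class decorator that records a connector class under `name`."""
    def decorator(cls: type) -> type:
        if name in CONNECTOR_REGISTRY:
            raise ValueError(f"connector_type {name!r} is already registered")
        CONNECTOR_REGISTRY[name] = cls
        return cls
    return decorator


def make_connector(connector_type: str) -> object:
    """Instantiate the connector registered under `connector_type`."""
    try:
        cls = CONNECTOR_REGISTRY[connector_type]
    except KeyError:
        known = ", ".join(sorted(CONNECTOR_REGISTRY))
        raise ValueError(f"unknown connector_type {connector_type!r}; known: {known}") from None
    return cls()


@register_connector("my_new_source")  # hypothetical connector_type name
class MyNewSourceConnector:
    """Skeleton implementing the three mandatory methods (placeholders only)."""

    def list_collections(self) -> list[str]:
        return ["example-collection"]

    def find_data(self, *args, **kwargs):
        raise NotImplementedError("query the new data source here")

    def get_data(self, *args, **kwargs):
        raise NotImplementedError("download from the new data source here")


print(make_connector("my_new_source").list_collections())  # ['example-collection']
```

With this design, adding a connector only requires decorating its class, and unknown names fail with a message that lists the valid choices.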

To add tests for the new connector, use test_connector_template.py as a starting point.

Make sure to also update the documentation. Each data connector has a separate markdown file making it easy to add new docs.


Download files

Source Distribution

terrakit-0.1.7.tar.gz (328.1 kB)

Uploaded Source

Built Distribution

terrakit-0.1.7-py3-none-any.whl (349.0 kB)

Uploaded Python 3

File details

Details for the file terrakit-0.1.7.tar.gz.

File metadata

  • Download URL: terrakit-0.1.7.tar.gz
  • Upload date:
  • Size: 328.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.16

File hashes

Hashes for terrakit-0.1.7.tar.gz
Algorithm Hash digest
SHA256 b278d97cb59bcb4867c947bbdb693e1d7731e973b01c758ff430ce00ba05ccbd
MD5 66f0824d83bdedeb2c46c8ba94b382ba
BLAKE2b-256 db04fddfc22e58357b4ba60949d5c79a8bb92d9c351d290ae1f21fe10538545a

File details

Details for the file terrakit-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: terrakit-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 349.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.16

File hashes

Hashes for terrakit-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 55c900757bc23701f128bc419d8111e06f9b0b9a7ef4e2c4b17259a9001227b8
MD5 5d6ad151063afaa64321b519d75962c9
BLAKE2b-256 66e34c237966b6ba740f43e0b4b920c135361e0ecb312b9332f84ca88774a123
