A toolbox to download, process, store and visualise Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data

Project description

gediDB: A toolbox for Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data

gediDB is an open-source Python package designed to streamline the processing, analysis, and management of GEDI L2A-B and L4A-C data. This toolbox enables efficient and flexible data querying and management of large GEDI datasets stored with TileDB, a high-performance, multi-dimensional array database.

gediDB integrates key functionalities such as structured data querying, multi-dimensional data processing, and metadata management. With built-in support for parallel engines (e.g. Dask), the toolbox ensures scalability for large datasets, allowing efficient parallel processing on local machines or clusters.
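The bounded-concurrency idea described above can be illustrated with a minimal sketch. It uses Python's standard-library `concurrent.futures` as a stand-in; it is not Dask and not gediDB's own engine. The names `process_granule` and `process_all` are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_granule(granule_id: str) -> dict:
    """Hypothetical stand-in for downloading, parsing and writing one granule."""
    # A real pipeline would do I/O here; this placeholder only returns a summary.
    return {"granule": granule_id, "n_shots": len(granule_id)}


def process_all(granule_ids, max_workers=4):
    """Run process_granule over all ids with at most max_workers running at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the granule id it was submitted for.
        futures = {pool.submit(process_granule, g): g for g in granule_ids}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    summary = process_all(["G001", "G002", "G003"], max_workers=2)
    for granule_id in sorted(summary):
        print(granule_id, summary[granule_id])
```

Lowering `max_workers` trades throughput for a smaller memory and network footprint, which is the kind of control over concurrency that the description refers to.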

Key Features of gediDB

  • TileDB-Based Storage: GEDI data is stored and managed in TileDB arrays, providing efficient, scalable, multi-dimensional data storage, enabling fast and flexible access to large volumes of data.
  • Flexible Data Querying: Easily query GEDI data across spatial, temporal, and variable dimensions. Access data within bounding boxes, or retrieve the nearest shots to a specific location, using intuitive filtering options for precision.
  • Parallel Processing: Process large GEDI datasets in parallel, enabling concurrent downloading, processing, and TileDB insertion of GEDI products. The number of concurrent processes can be easily controlled based on available system resources.
  • Metadata-Driven: Maintain and manage metadata for each dataset, so that contextual information such as units, descriptions, and source details is stored and accessible.
  • Geospatial Data Management: Integrate seamlessly with TileDB to enable spatial queries, transformations, and geospatial analyses.
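To make the two spatial query styles listed above concrete (bounding box and nearest shots), here is a small conceptual sketch in plain Python. It is not the gediDB API: the `Shot` record, the function names, and the sample values are all assumptions made for illustration, and a real TileDB-backed query would not scan an in-memory list.

```python
import math
from typing import NamedTuple


class Shot(NamedTuple):
    """Hypothetical, simplified GEDI shot record (values below are made up)."""
    shot_id: int
    lat: float
    lon: float
    time: str    # ISO date; ISO strings compare correctly as text
    rh98: float  # relative height metric in metres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def query_bbox(shots, bbox, start, end):
    """Keep shots inside bbox = (min_lon, min_lat, max_lon, max_lat) and the time range."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        s for s in shots
        if min_lon <= s.lon <= max_lon
        and min_lat <= s.lat <= max_lat
        and start <= s.time <= end
    ]


def query_nearest(shots, lat, lon, k=1):
    """Return the k shots closest to (lat, lon), nearest first."""
    return sorted(shots, key=lambda s: haversine_km(lat, lon, s.lat, s.lon))[:k]


SHOTS = [
    Shot(1, 52.40, 13.06, "2019-06-01", 21.5),
    Shot(2, 52.52, 13.40, "2020-07-15", 18.2),
    Shot(3, 48.14, 11.58, "2021-03-10", 25.0),
    Shot(4, -3.10, -60.02, "2020-01-20", 38.7),
]

if __name__ == "__main__":
    inside = query_bbox(SHOTS, (12.0, 52.0, 14.0, 53.0), "2019-01-01", "2019-12-31")
    print([s.shot_id for s in inside])
    print([s.shot_id for s in query_nearest(SHOTS, 52.50, 13.40, k=2)])
```

The linear scan keeps the example short; a multi-dimensional array store avoids it by reading only the tiles that intersect the requested region and time range.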

Why gediDB?

gediDB simplifies and automates the workflow for GEDI data processing, making it easier to retrieve, filter, and analyze complex datasets in an efficient, scalable manner. Whether you're investigating biomass distribution, monitoring forest dynamics, or conducting large-scale ecological studies, gediDB supports users with tools to handle and analyze large GEDI datasets with ease.

Documentation

Learn more about gediDB in its official documentation at https://gedidb.readthedocs.io/en/latest/.

Contributing

You can find information about contributing to gediDB on our Contributing page.

Future development

Planned future developments for gediDB are designed to improve usability and extend the package’s scope for both researchers and operational users:

  • Compatibility with upcoming GEDI product releases: ensures long-term sustainability of the toolbox as new mission data become available, avoiding version lock-in for users building workflows on gediDB.

  • Improved performance and flexibility in querying profile variables: will make it easier for users to analyse canopy structure profiles (e.g., RH metrics) at scale, which are currently among the most data-intensive GEDI products.

  • Expanded documentation and tutorials: will benefit new users by lowering the entry barrier, providing clear end-to-end examples, and connecting scientific use cases to code snippets.

  • Strengthened testing for reliability and maintainability: supports developers and long-term users by ensuring that changes do not break existing workflows, and by increasing trust in the reproducibility of analyses built on gediDB.

Development progress and discussion of these features are tracked openly through the project’s GitHub issues.

History

The development of the gediDB package began during the PhD of Amelia Holcomb, who initially created part of this toolset to analyze and manage GEDI data for her research. Recognizing the potential of her work to benefit the broader scientific community, the Global Land Monitoring team collaborated with Amelia in March 2024 to expand and optimize her code, transforming it into a scalable and versatile Python package named gediDB. This collaboration refined the toolbox to handle large-scale datasets with TileDB, integrate parallel processing, and incorporate a robust querying and metadata management system. Today, gediDB is designed to help researchers in ecological and environmental sciences by making GEDI data processing more efficient and accessible.

About the authors

Simon Besnard, a senior researcher in the Global Land Monitoring Group at GFZ Helmholtz Centre Potsdam, studies terrestrial ecosystems' dynamics and their feedback on environmental conditions. He specializes in developing methods to analyze large EO and climate datasets to understand ecosystem functioning in a changing climate. His current research focuses on forest structure changes over the past decade and their links to the carbon cycle.

Felix Dombrowski is a Bachelor’s student in Computer Science at the University of Potsdam and a research intern in the Global Land Monitoring Group at GFZ Helmholtz Centre Potsdam. At GFZ, his work has focused on developing toolboxes to process Earth Observation data efficiently.

Amelia Holcomb is a PhD candidate in Computer Science at the University of Cambridge, researching remote sensing and machine learning to study carbon sequestration and forest regrowth. Previously, she worked as a site reliability engineer at Google on Bigtable. She holds an MMath from the University of Waterloo and a B.A. in Mathematics from Yale.

Contact

For any questions or inquiries, please contact:

Acknowledgments

The development of gediDB was supported by the European Union through the FORWARDS and NextGenCarbon projects, and by the Helmholtz Association via the Helmholtz Foundation Model Initiative (3D-ABC project). Amelia Holcomb acknowledges funding from the Harding Distinguished Postgraduate Scholarship. We would also like to thank the R2D2 Workshop (March 2024, GFZ Potsdam) for providing the opportunity to meet and discuss GEDI data processing.

License

This project is licensed under the European Union Public Licence (EUPL) v1.2. See the LICENSE file for details.

Download files

Source Distribution

gedidb-2026.4.28.tar.gz (10.3 MB view details)

Uploaded Source

Built Distribution

gedidb-2026.4.28-py3-none-any.whl (10.4 MB view details)

Uploaded Python 3

File details

Details for the file gedidb-2026.4.28.tar.gz.

File metadata

  • Download URL: gedidb-2026.4.28.tar.gz
  • Upload date:
  • Size: 10.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gedidb-2026.4.28.tar.gz
Algorithm Hash digest
SHA256 5ab240d7cdc6d81d70ceddae350f94aafc3f4ec9df520927beed4202580919d2
MD5 3bc95f1886a59a961665309ea4be18f6
BLAKE2b-256 b8c3976b856e862b8514d999eb3b8be5c5527a4ca430365e8674fd94e5015a31
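
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the local file path is an assumption, and `sha256_of` and `verify` are illustrative helper names, not part of gediDB or pip.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest equals expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location of the downloaded source distribution.
    archive = Path("gedidb-2026.4.28.tar.gz")
    expected = "5ab240d7cdc6d81d70ceddae350f94aafc3f4ec9df520927beed4202580919d2"
    if archive.exists():
        print("digest matches" if verify(archive, expected) else "DIGEST MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of archive size.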

See more details on using hashes here.

Provenance

The following attestation bundles were made for gedidb-2026.4.28.tar.gz:

Publisher: pypi-release.yaml on simonbesnard1/gedidb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gedidb-2026.4.28-py3-none-any.whl.

File metadata

  • Download URL: gedidb-2026.4.28-py3-none-any.whl
  • Upload date:
  • Size: 10.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gedidb-2026.4.28-py3-none-any.whl
Algorithm Hash digest
SHA256 d3731c341213279b1da702b021184abfa54a757bc94169862325b033be8a2cd8
MD5 838fe98e95266a3932e35c1eac2e5b49
BLAKE2b-256 6cb2b0a71ad9cd66fe95dba4a032d37f501c5b9aa5414059f88f5e983df59095

See more details on using hashes here.

Provenance

The following attestation bundles were made for gedidb-2026.4.28-py3-none-any.whl:

Publisher: pypi-release.yaml on simonbesnard1/gedidb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
