
A toolbox to download, process, store and visualise Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data

gediDB: A toolbox for Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data

gediDB is an open-source Python package designed to streamline the processing, analysis, and management of GEDI L2A-B and L4A-C data. This toolbox enables efficient and flexible data querying and management of large GEDI datasets stored with TileDB, a high-performance, multi-dimensional array database.

gediDB integrates key functionalities such as structured data querying, multi-dimensional data processing, and metadata management. With built-in support for parallel engines (e.g. Dask), the toolbox ensures scalability for large datasets, allowing efficient parallel processing on local machines or clusters.
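As a rough illustration of the bounded parallelism described above, the following minimal sketch uses Python's standard-library `concurrent.futures` as a stand-in for a parallel engine such as Dask. It does not call gediDB's API: `process_granule`, `process_all`, the granule identifiers, and the `max_workers` parameter are all hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_granule(granule_id: str) -> dict:
    """Hypothetical stand-in for downloading and parsing one GEDI granule."""
    # A real pipeline would fetch and parse a file here; this only builds a record.
    return {"granule": granule_id, "n_shots": len(granule_id)}


def process_all(granule_ids: list[str], max_workers: int = 4) -> list[dict]:
    """Process granules concurrently, capped at max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, even if tasks finish out of order.
        return list(pool.map(process_granule, granule_ids))


if __name__ == "__main__":
    for record in process_all(["G001", "G0002", "G00003"], max_workers=2):
        print(record)
```

Capping the pool size is what lets the same code run on a laptop or a larger node: only `max_workers` changes, not the processing logic.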

Key Features of gediDB

  • TileDB-Based Storage: GEDI data is stored and managed in TileDB arrays, which provide efficient, scalable, multi-dimensional storage and fast, flexible access to large volumes of data.
  • Flexible Data Querying: Easily query GEDI data across spatial, temporal, and variable dimensions. Access data within bounding boxes, or retrieve the nearest shots to a specific location, using intuitive filtering options for precision.
  • Parallel Processing: Download, process, and insert GEDI products into TileDB concurrently. The number of concurrent processes can be adjusted to match the available system resources.
  • Metadata-Driven: Maintain and manage metadata for each dataset, ensuring that important contextual information such as units, descriptions, and source details is stored and accessible.
  • Geospatial Data Management: Integrate seamlessly with TileDB to enable spatial queries, transformations, and geospatial analyses.
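
The two spatial query styles named in the feature list, bounding-box selection and nearest-shot lookup, can be sketched in plain Python. This is a conceptual illustration only, not gediDB's API or schema: the `Shot` record, its `rh98` field, the sample coordinates, and the helper functions are assumptions made for the example, and the distance uses the haversine formula on a sphere of radius 6371 km.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Hypothetical, simplified GEDI shot record (not gediDB's schema)."""
    shot_id: int
    lat: float
    lon: float
    rh98: float  # illustrative relative-height value in metres


# Made-up sample shots, for illustration only.
SHOTS = [
    Shot(1, 52.40, 13.05, 18.2),
    Shot(2, 52.52, 13.40, 24.7),
    Shot(3, 48.14, 11.58, 30.1),
    Shot(4, -3.10, -60.02, 41.5),
]


def in_bbox(shots, min_lon, min_lat, max_lon, max_lat):
    """Return shots whose coordinates fall inside the bounding box (inclusive)."""
    return [s for s in shots
            if min_lon <= s.lon <= max_lon and min_lat <= s.lat <= max_lat]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest(shots, lat, lon, k=1):
    """Return the k shots closest to (lat, lon), nearest first."""
    return sorted(shots, key=lambda s: haversine_km(lat, lon, s.lat, s.lon))[:k]


if __name__ == "__main__":
    print([s.shot_id for s in in_bbox(SHOTS, 12.5, 52.0, 14.0, 53.0)])
    print([s.shot_id for s in nearest(SHOTS, 52.5, 13.4, k=2)])
```

A linear scan like this is only practical for small collections; an array database with spatial dimensions avoids reading shots outside the region of interest in the first place.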

Why gediDB?

gediDB simplifies and automates the workflow for GEDI data processing, making it easier to retrieve, filter, and analyze complex datasets efficiently and at scale. Whether you are investigating biomass distribution, monitoring forest dynamics, or conducting large-scale ecological studies, gediDB provides the tools to handle and analyze large GEDI datasets.

Documentation

Learn more about gediDB in its official documentation at https://gedidb.readthedocs.io/en/latest/.

Contributing

You can find information about contributing to gediDB on our Contributing page.

Future development

Planned future developments for gediDB are designed to improve usability and extend the package’s scope for both researchers and operational users:

  • Compatibility with upcoming GEDI product releases: ensures long-term sustainability of the toolbox as new mission data become available, avoiding version lock-in for users building workflows on gediDB.

  • Improved performance and flexibility in querying profile variables: will make it easier for users to analyse canopy structure profiles (e.g., RH metrics) at scale, which are currently among the most data-intensive GEDI products.

  • Expanded documentation and tutorials: will benefit new users by lowering the entry barrier, providing clear end-to-end examples, and connecting scientific use cases to code snippets.

  • Strengthened testing for reliability and maintainability: supports developers and long-term users by ensuring that changes do not break existing workflows, and by increasing trust in the reproducibility of analyses built on gediDB.

Development progress and discussion of these features are tracked openly through the project’s GitHub issues.

History

The development of the gediDB package began during the PhD of Amelia Holcomb, who initially created part of this toolset to analyze and manage GEDI data for her research. Recognizing the potential of her work to benefit the broader scientific community, the Global Land Monitoring team collaborated with Amelia in March 2024 to expand and optimize her code, transforming it into a scalable and versatile Python package named gediDB. This collaboration refined the toolbox to handle large-scale datasets with TileDB, integrate parallel processing, and incorporate a robust querying and metadata management system. Today, gediDB is designed to help researchers in ecological and environmental sciences by making GEDI data processing more efficient and accessible.

About the authors

Simon Besnard, a senior researcher in the Global Land Monitoring Group at GFZ Helmholtz Centre Potsdam, studies terrestrial ecosystems' dynamics and their feedback on environmental conditions. He specializes in developing methods to analyze large EO and climate datasets to understand ecosystem functioning in a changing climate. His current research focuses on forest structure changes over the past decade and their links to the carbon cycle.

Felix Dombrowski is a Bachelor’s student in Computer Science at the University of Potsdam and a research intern in the Global Land Monitoring Group at GFZ Helmholtz Centre Potsdam. At GFZ, his work has focused on developing toolboxes to process Earth Observation data efficiently.

Amelia Holcomb is a PhD candidate in Computer Science at the University of Cambridge, researching remote sensing and machine learning to study carbon sequestration and forest regrowth. Previously, she worked as a site reliability engineer at Google on Bigtable. She holds an MMath from the University of Waterloo and a B.A. in Mathematics from Yale.

Contact

For any questions or inquiries, please contact:

Acknowledgments

The development of gediDB was supported by the European Union through the FORWARDS and NextGenCarbon projects, and by the Helmholtz Association via the Helmholtz Foundation Model Initiative (3D-ABC project). Amelia Holcomb acknowledges funding from the Harding Distinguished Postgraduate Scholarship. We would also like to thank the R2D2 Workshop (March 2024, GFZ Potsdam) for providing the opportunity to meet and discuss GEDI data processing.

License

This project is licensed under the European Union Public Licence (EUPL) v. 1.2. See the LICENSE file for details.
