GEEML: Google Earth Engine Machine Learning
A Python package to extract Google Earth Engine (GEE) data for machine learning.
About The Project
This Python package makes it easier to extract satellite data from Google Earth Engine (GEE) by using parallel processing and the GEE high-volume endpoint.
It currently supports extracting tabular data for traditional machine learning as CSV files, and GeoTIFF image patches for deep neural networks.
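The parallel pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not geeml's implementation: `extract_cell`, `extract_and_save` and `extract_parallel` are hypothetical names, `extract_cell` stands in for a request to the GEE high-volume endpoint, and each grid cell is assumed to be an independent unit of work whose rows are written to their own CSV file. Threads are used because the work is dominated by waiting on network requests.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def extract_cell(cell_id, bounds):
    """Hypothetical stand-in for a GEE request: returns one row per cell.

    A real extractor would sample covariates inside `bounds`; here we only
    return the cell centroid so the example runs offline.
    """
    xmin, ymin, xmax, ymax = bounds
    return [{"cell": cell_id, "x": (xmin + xmax) / 2, "y": (ymin + ymax) / 2}]


def extract_and_save(cell_id, bounds, out_dir):
    """Extract one cell and write its rows to <out_dir>/cell_<id>.csv."""
    rows = extract_cell(cell_id, bounds)
    path = os.path.join(out_dir, f"cell_{cell_id}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["cell", "x", "y"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def extract_parallel(cells, out_dir, max_workers=8):
    """Run extract_and_save for every (cell_id, bounds) pair concurrently.

    Results are returned in the same order as `cells`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_and_save, cid, b, out_dir) for cid, b in cells]
        return [f.result() for f in futures]


if __name__ == "__main__":
    cells = [(0, (0, 0, 10, 10)), (1, (10, 0, 20, 10))]
    with tempfile.TemporaryDirectory() as out_dir:
        for path in extract_parallel(cells, out_dir):
            print(os.path.basename(path))
```

Writing one file per cell means a failed cell only affects its own output, and finished files can be combined afterwards.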
Motivation
The machine learning capabilities of the GEE JavaScript Code Editor remain limited. For example, it does not support XGBoost, LightGBM or NGBoost. The Python ecosystem, by contrast, offers much more support for training, validation and hyperparameter tuning. To take advantage of it, however, the data must first be downloaded locally or stored in Google Drive or Google Cloud Storage. This package therefore aims to make it easier and faster to download pre-processed data in a format that is ready for machine learning.
Features
- Parallel processing
- Support for both tabular and deep neural network datasets
- ML-ready output (CSV files and GeoTIFF patches)
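Because tabular output is written as CSV, it can be loaded directly into Python for model training. The sketch below uses only the standard library and assumes a hypothetical layout in which every column except a `label` column is a numeric covariate; the real column names depend on the covariates you extract, and `load_xy` is an illustrative helper, not part of geeml. The toy rows in the usage example are made up purely to show the parsing.

```python
import csv
import io


def load_xy(csv_file, label_col="label"):
    """Split a CSV of samples into feature names, a feature matrix X and labels y.

    Assumes (hypothetically) that every column other than `label_col`
    holds a numeric covariate value.
    """
    reader = csv.DictReader(csv_file)
    feature_cols = [c for c in reader.fieldnames if c != label_col]
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in feature_cols])
        y.append(row[label_col])
    return feature_cols, X, y


if __name__ == "__main__":
    # Toy rows for illustration only.
    sample = io.StringIO("elevation,slope,label\n1200,3.5,forest\n450,0.8,grass\n")
    cols, X, y = load_xy(sample)
    print(cols, X, y)
```

The resulting `X` and `y` are plain lists, so they can be converted to whatever structure a library such as XGBoost or LightGBM expects.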
Getting Started
Installation
To install this package, use one of the following:
- pip (latest release from PyPI):
pip install geeml
- OR build from source on GitHub:
pip install git+https://github.com/Geethen/geeml.git
Basic usage
# Import packages
import ee
from geeml.prepare import getCountry, createGrid, prepareForExtraction
from geeml.extract import extractAOI
# Authenticate GEE
ee.Authenticate()
# Initialize GEE with the high-volume endpoint
ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')
# Import datasets from GEE
nasadem = ee.Image("NASA/NASADEM_HGT/001")
# A point in Kenya (longitude, latitude)
poi = ee.Geometry.Point([37.857884, -0.002197])
kenya = getCountry(poi)
# Grid whose cells serve as workers during data extraction
grid, items = createGrid(50000)
# Download directory (here a Google Drive folder mounted in Colab)
dd = '/content/drive/MyDrive/geeml_example'
# Prepare for data extraction (scale is the output resolution in metres)
# Note: `extract` and `pExtract` are not imported above; check the
# documentation for the module that provides them in your geeml version.
trial = extract(covariates=nasadem, grid=grid, aoi=kenya, scale=5000, dd=dd, workers=items)
# Extract data in parallel across the grid cells
pExtract(trial.extractAoi, trial.workers)
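The grid in the example divides the area of interest into cells that are processed independently, so the number of cells determines how many units of work can run in parallel. The sketch below shows the general idea for a rectangular bounding box. It assumes projected coordinates in metres and reads `50000` as a cell size in metres; that is an interpretation, not documented behaviour of `createGrid`, and `make_grid` is a hypothetical helper rather than part of geeml.

```python
import math


def make_grid(xmin, ymin, xmax, ymax, cell_size):
    """Tile a bounding box into square cells of side `cell_size`.

    Cells on the right and top edges are clipped to the box, so the whole
    extent is covered exactly once. Returns a list of (xmin, ymin, xmax, ymax).
    """
    n_cols = math.ceil((xmax - xmin) / cell_size)
    n_rows = math.ceil((ymax - ymin) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * cell_size
            y0 = ymin + row * cell_size
            cells.append((x0, y0, min(x0 + cell_size, xmax), min(y0 + cell_size, ymax)))
    return cells


if __name__ == "__main__":
    grid = make_grid(0, 0, 120_000, 60_000, 50_000)
    print(len(grid))  # 3 columns x 2 rows = 6 cells
```

Smaller cells give more, lighter requests; larger cells give fewer, heavier ones, which matters because each request to GEE has its own size and time limits.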
For more examples, please refer to the documentation.
Roadmap
- Support the export of other formats (TFRecord, Feather, GeoFeather, Parquet and GeoParquet)
- Collect items that failed to export
- Provide a progress bar
See the open issues for a full list of proposed features (and known issues).
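The roadmap items on collecting failed exports and reporting progress could be approached as sketched below. This is one possible design, not how geeml currently works: `run_all` and the `task` callable are hypothetical. Each item runs in a thread pool, exceptions are caught per item instead of aborting the whole run, and a counter reports progress as items complete.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(items, task, max_workers=8):
    """Run `task(item)` for every item in parallel.

    Returns (results, failed): `results` maps item -> return value for items
    that succeeded; `failed` maps item -> exception for items that raised,
    so they can be retried later.
    """
    results, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, item): item for item in items}
        for done, future in enumerate(as_completed(futures), start=1):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the failure and keep going
                failed[item] = exc
            # Simple text progress indicator, overwritten in place.
            print(f"\r{done}/{len(futures)} items processed", end="", flush=True)
    print()
    return results, failed


if __name__ == "__main__":
    def flaky(n):
        # Stand-in task that fails for multiples of 3.
        if n % 3 == 0:
            raise RuntimeError(f"export of item {n} failed")
        return n * n

    results, failed = run_all(range(7), flaky)
    print(sorted(results), sorted(failed))
```

Keeping the exception objects in `failed` makes it possible to inspect why each item failed before retrying only those items.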
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Geethen Singh - @Geethen - geethen.singh@gmail.com
Project Link: https://github.com/Geethen/geeml
Acknowledgments
This package was created with Cookiecutter and the giswqs/pypackage project template.
Hashes for geeml-0.0.3-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 50d6e767c900dd94cb1f56e72690435ead57f33d2a1e1360bc7a61dc29af4283
MD5 | b34658696898c38f4d8ee839cd2bc8f1
BLAKE2b-256 | 958eaa9c2e9a14284720af0ba2e195ca36adc700681b9a60ed594e5858b6aac3