Creating maps with machine learning models and earth observation data.

OpenMapFlow 🌍

Rapid map creation with machine learning and earth observation data.

Example projects: Cropland, Buildings, Maize

Example maps: Earth Engine script

Tutorial

Colab notebook tutorial demonstrating data exploration, model training, and inference over a small region. (video)

Prerequisites:

Creating a map from scratch

To create your own maps with OpenMapFlow, you need to:

  1. Generate your own OpenMapFlow project, which will allow you to
  2. Add your own labeled data,
  3. Train a model using that labeled data, and
  4. Create a map using the trained model.

Generating a project

A project can be generated either by following the documentation below or by running the Colab notebook above.

Prerequisites:

Once all prerequisites are satisfied, run the following inside your GitHub repository:

pip install openmapflow
openmapflow generate

The command will prompt for project configuration such as project name and Google Cloud Project ID. Several prompts will have defaults shown in square brackets. These will be used if nothing is entered.
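
The default-in-square-brackets behaviour described above can be sketched as follows. This is a minimal illustration only, not openmapflow's actual implementation: the helper name prompt_with_default and its ask parameter are hypothetical, and the example values are placeholders.

```python
from typing import Callable


def prompt_with_default(
    label: str, default: str, ask: Callable[[str], str] = input
) -> str:
    """Ask for a value, showing the default in square brackets.

    Hypothetical helper for illustration; not part of openmapflow.
    """
    answer = ask(f"{label} [{default}]: ").strip()
    # An empty answer (just pressing Enter) falls back to the default
    return answer if answer else default


# Simulate a user pressing Enter without typing anything
print(prompt_with_default("Project name", "my-project", ask=lambda _: ""))
```

Passing the ask function as a parameter keeps the sketch usable without an interactive terminal; in real use the default of input would read from the keyboard.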

After all configuration is set, the following project structure will be generated:

<YOUR PROJECT NAME>
│   README.md
│   datasets.py             # Dataset definitions (how labels should be processed)
│   evaluate.py             # Template script for evaluating a model
│   openmapflow.yaml        # Project configuration file
│   train.py                # Template script for training a model
│
├─── .dvc/                  # https://dvc.org/doc/user-guide/what-is-dvc
│
├─── .github
│   │
│   └─── workflows          # GitHub Actions
│       │   deploy.yaml     # Automated Google Cloud deployment of trained models
│       │   test.yaml       # Automated integration tests of labeled data
│
└─── data
    │   raw_labels/                     # User added labels
    │   datasets/                       # ML ready datasets (labels + earth observation data)
    │   models/                         # Models trained using datasets
    │   raw_labels.dvc                  # Reference to a version of raw_labels/
    │   datasets.dvc                    # Reference to a version of datasets/
    │   models.dvc                      # Reference to a version of models/
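
As a quick sanity check after generation, the structure above can be compared against what is actually on disk. The sketch below is an assumption-laden illustration: the function missing_paths is hypothetical and not part of openmapflow, and the data folders themselves (raw_labels/, datasets/, models/) are left out on the assumption that they may be absent until data has been pulled.

```python
from pathlib import Path
from typing import List

# Files and folders taken from the project structure listed above
EXPECTED_PATHS = [
    "README.md",
    "datasets.py",
    "evaluate.py",
    "openmapflow.yaml",
    "train.py",
    ".dvc",
    ".github/workflows/deploy.yaml",
    ".github/workflows/test.yaml",
    "data/raw_labels.dvc",
    "data/datasets.dvc",
    "data/models.dvc",
]


def missing_paths(project_root: Path) -> List[str]:
    """Return the expected paths that do not exist under project_root.

    Hypothetical helper for illustration; not part of openmapflow.
    """
    return [p for p in EXPECTED_PATHS if not (project_root / p).exists()]


# Check the current directory and report anything that is missing
for path in missing_paths(Path(".")):
    print(f"Missing: {path}")
```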
    

GitHub Actions Secrets: Pulling and deploying data inside GitHub Actions requires access to Google Cloud. To give the GitHub Action that access, add a new repository secret (instructions).

  • In step 5 of the instructions, name the secret: GCP_SA_KEY
  • In step 6, enter your Google Cloud Service Account Key

After this, the GitHub Actions should run successfully.

GCloud Bucket: A Google Cloud bucket must be created for the labeled earth observation files. Assuming gcloud is installed, run:

gcloud auth login
gsutil mb -l <YOUR_OPENMAPFLOW_YAML_GCLOUD_LOCATION> gs://<YOUR_OPENMAPFLOW_YAML_BUCKET_LABELED_EO>
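
To avoid typos when substituting the two placeholders, the gsutil command above can be assembled from the values in openmapflow.yaml. The sketch below only builds and prints the command; the function name make_bucket_command is hypothetical, and the location and bucket name shown are placeholder values rather than real project settings.

```python
import shlex
from typing import List


def make_bucket_command(location: str, bucket_name: str) -> List[str]:
    """Build the gsutil argument list for creating the labeled EO bucket.

    Hypothetical helper for illustration; not part of openmapflow.
    """
    if not location or not bucket_name:
        raise ValueError("location and bucket_name must both be non-empty")
    # Accept the bucket name with or without the gs:// prefix
    if bucket_name.startswith("gs://"):
        bucket_name = bucket_name[len("gs://"):]
    return ["gsutil", "mb", "-l", location, f"gs://{bucket_name}"]


# Example values are placeholders, not real project settings
command = make_bucket_command("us-central1", "my-project-labeled-eo")
print(shlex.join(command))
```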

Adding data

Adding already existing data

Prerequisites:

Add a reference to an already existing dataset in your datasets.py:

from openmapflow.datasets import GeowikiLandcover2017, TogoCrop2019

datasets = [GeowikiLandcover2017(), TogoCrop2019()]

Download and push datasets

openmapflow create-datasets # Download datasets
dvc commit && dvc push      # Push data to version control

git add .
git commit -m 'Created new dataset'
git push

Adding custom data

Data can be added either by following the documentation below or by running the Colab notebook above.

Prerequisites:

  1. Pull the latest data:
dvc pull
  2. Move raw label files into the project's data/raw_labels folder
  3. Write a LabeledDataset class in datasets.py with a load_labels function that converts raw labels to a standard format, for example:
# Standard-library and pandas imports used below. LabeledDataset, PROJECT_ROOT,
# DataPaths, train_val_test_split, and the column constants (LAT, LON, START,
# END, SUBSET, COUNTRY) must also be imported from openmapflow; the generated
# datasets.py is assumed to include those imports.
from datetime import date
from typing import List

import pandas as pd

label_col = "is_crop"

class TogoCrop2019(LabeledDataset):
    def load_labels(self) -> pd.DataFrame:
        # Read in raw label file
        df = pd.read_csv(PROJECT_ROOT / DataPaths.RAW_LABELS / "Togo_2019.csv")

        # Rename coordinate columns to be used for getting Earth observation data
        df.rename(columns={"latitude": LAT, "longitude": LON}, inplace=True)

        # Set start and end date for Earth observation data
        df[START], df[END] = date(2019, 1, 1), date(2020, 12, 31)

        # Set consistent label column
        df[label_col] = df["crop"].astype(float)

        # Split labels into train, validation, and test sets
        df[SUBSET] = train_val_test_split(index=df.index, val=0.2, test=0.2)

        # Set country column for later analysis
        df[COUNTRY] = "Togo"

        return df

datasets: List[LabeledDataset] = [TogoCrop2019(), ...]
  4. Check your new dataset's load_labels function:
openmapflow verify TogoCrop2019
  5. Run dataset creation (can be skipped if automated in CI, e.g. in https://github.com/nasaharvest/crop-mask):
earthengine authenticate    # For getting new earth observation data
gcloud auth login           # For getting cached earth observation data
openmapflow create-datasets # Initiates or checks progress of dataset creation
  6. Push new data to remote storage and new code to GitHub:
dvc commit && dvc push
git add .
git commit -m 'Created new dataset'
git push
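
The load_labels example above assigns each label to a subset with train_val_test_split(index=df.index, val=0.2, test=0.2). To make the idea concrete, here is a standard-library sketch of one way such a split could work. It is not openmapflow's implementation: the function name split_labels, the seed parameter, and the subset names "training", "validation", and "testing" are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Iterable, List


def split_labels(
    index: Iterable[int], val: float, test: float, seed: int = 0
) -> List[str]:
    """Randomly assign each row to a training, validation, or testing subset.

    Hypothetical stand-in for illustration; not openmapflow's function.
    """
    if val < 0 or test < 0 or val + test > 1:
        raise ValueError("val and test must be non-negative and sum to at most 1")
    rng = random.Random(seed)  # Fixed seed so the split is reproducible
    subsets = []
    for _ in index:
        r = rng.random()
        if r < val:
            subsets.append("validation")
        elif r < val + test:
            subsets.append("testing")
        else:
            subsets.append("training")
    return subsets


# Roughly 20% validation, 20% testing, and the remainder training
subsets = split_labels(range(1000), val=0.2, test=0.2)
print(Counter(subsets))
```

Because each row is assigned independently, the proportions are approximate rather than exact, which is usually acceptable for large label sets.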

Training a model

A model can be trained either by following the documentation below or by running the Colab notebook above.

Prerequisites:

# Pull in latest data
dvc pull

# Set model name, train model, record test metrics
export MODEL_NAME=<YOUR MODEL NAME>              
python train.py --model_name $MODEL_NAME    
python evaluate.py --model_name $MODEL_NAME 

# Push new models to data version control
dvc commit 
dvc push  

# Make a Pull Request to the repository
git checkout -b "$MODEL_NAME"
git add .
git commit -m "$MODEL_NAME"
git push --set-upstream origin "$MODEL_NAME"
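
The same sequence can be previewed as a dry run before anything is executed. The sketch below only prints the commands listed above for a given model name; the function training_commands is hypothetical and the model name is a placeholder.

```python
import shlex
from typing import List


def training_commands(model_name: str) -> List[List[str]]:
    """Return the commands from the steps above, in order, as argument lists.

    Hypothetical helper for illustration; not part of openmapflow.
    """
    if not model_name.strip():
        raise ValueError("model_name must not be empty")
    return [
        ["dvc", "pull"],
        ["python", "train.py", "--model_name", model_name],
        ["python", "evaluate.py", "--model_name", model_name],
        ["dvc", "commit"],
        ["dvc", "push"],
        ["git", "checkout", "-b", model_name],
        ["git", "add", "."],
        ["git", "commit", "-m", model_name],
        ["git", "push", "--set-upstream", "origin", model_name],
    ]


# Dry run: print the commands instead of executing them
for command in training_commands("my_model_v1"):
    print(shlex.join(command))
```

Keeping each command as an argument list means the model name is passed as a single argument even if it contains spaces, should the list later be handed to subprocess.run.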

After the pull request is merged, the model will be deployed to Google Cloud.

Creating a map

Prerequisites:

Map creation is only available through the Colab notebook above. The cloud architecture must first be deployed using the deploy.yaml GitHub Action.

Accessing existing datasets

from openmapflow.datasets import TogoCrop2019
df = TogoCrop2019().load_df(to_np=True)
x = df.iloc[0]["eo_data"]
y = df.iloc[0]["class_prob"]
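
The snippet above reads a single example. To collect every row into model-ready inputs and binary labels, something like the following sketch could be used. It works on plain dictionaries keyed by the "eo_data" and "class_prob" columns shown above (for a pandas DataFrame, df.to_dict("records") produces such rows). The function name and the 0.5 threshold for turning class_prob into a binary label are assumptions, not confirmed openmapflow behaviour.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def to_features_and_labels(
    rows: Iterable[Mapping[str, Any]], threshold: float = 0.5
) -> Tuple[List[Any], List[int]]:
    """Split rows into features and binary labels.

    Hypothetical helper for illustration; not part of openmapflow.
    """
    xs: List[Any] = []
    ys: List[int] = []
    for row in rows:
        xs.append(row["eo_data"])
        # Assumption: class_prob above the threshold counts as the positive class
        ys.append(int(row["class_prob"] > threshold))
    return xs, ys


# Toy rows standing in for real dataset rows
rows = [
    {"eo_data": [[0.1, 0.2], [0.3, 0.4]], "class_prob": 1.0},
    {"eo_data": [[0.5, 0.6], [0.7, 0.8]], "class_prob": 0.0},
]
xs, ys = to_features_and_labels(rows)
print(len(xs), ys)
```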

Citation

@inproceedings{OpenMapFlow2023,
  title={OpenMapFlow: A Library for Rapid Map Creation with Machine Learning and Remote Sensing Data},
  author={Zvonkov, Ivan and Tseng, Gabriel and Nakalembe, Catherine and Kerner, Hannah},
  booktitle={AAAI},
  year={2023}
}
