A Python package for the CropNet dataset

CropNet

CropNet is an open, large-scale, deep learning-ready dataset for climate change-aware crop yield prediction at the county level across the contiguous United States (U.S.). It comprises three modalities of data, i.e., Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset, aligned in both the spatial and temporal domains, for over 2200 U.S. counties spanning 6 years (2017-2022). It is intended to help researchers develop deep learning models that predict county-level crop yields in a timely and precise manner by accounting for the effects of both short-term growing season weather variations and long-term climate change. Although our initial goal in crafting the CropNet dataset was precise crop yield prediction, we believe it is broadly applicable and can benefit the deep learning, agriculture, and meteorology communities in exploring other climate change-related applications that use one or more modalities of data.

Overview

The CropNet dataset is composed of three modalities of data, i.e., Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset, spanning from 2017 to 2022 (i.e., 6 years) across 2291 U.S. counties.

  • The dataset is available on Google Drive

  • The tutorials for each modality of data are available on GitHub

Sentinel-2 Imagery

The Sentinel-2 Imagery, obtained from the Sentinel-2 mission, provides high-resolution satellite images for monitoring crop growth on the ground. It contains two types of 224x224 RGB satellite images, agriculture imagery (AG) and normalized difference vegetation index (NDVI), both with a spatial resolution of 9x9 km, and a revisit frequency of 14 days.
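
For readers unfamiliar with NDVI, the index is conventionally computed per pixel from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The sketch below illustrates that standard formula only. The function name ndvi and the sample reflectance values are hypothetical, and this is not the processing pipeline used to produce the CropNet imagery.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI definition: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # 0/0 is undefined; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (nir - red) / total


# Hypothetical reflectance values, for illustration only.
dense_vegetation = ndvi(nir=0.5, red=0.1)   # strongly positive
sparse_cover = ndvi(nir=0.3, red=0.25)      # close to zero
```

For non-negative reflectance values the result lies in [-1, 1], with higher values indicating denser green vegetation.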

WRF-HRRR Computed Dataset

The WRF-HRRR Computed Dataset, sourced from the WRF-HRRR model, contains daily and monthly meteorological parameters. The daily parameters are designed for capturing the direct effects of short-term growing season weather variations on crop growth, while the monthly parameters are designed for learning the indirect impacts of long-term climate change on crop yields. It contains 9 meteorological parameters gridded at 9 km, at one-day and one-month intervals.

USDA Crop Dataset

The USDA Crop Dataset, collected from the USDA Quick Stats website, offers crop information, such as production and yield, for crops grown in each available county. It covers four types of crops, i.e., corn, cotton, soybeans, and winter wheat, on a county-level basis, with a temporal resolution of one year.

Pipeline

This repository includes three types of APIs that help researchers download the CropNet data for a time and region of interest and flexibly build deep learning models for accurate crop yield predictions. Their details are listed below:

  • DataDownloader: This API allows users to download the CropNet data over the time/region of interest on the fly.

  • DataRetriever: With this API, users can conveniently obtain the CropNet data stored on the local machine (e.g., if you have downloaded our curated CropNet from Google Drive) over the time/region of interest.

  • DataLoader: This API is designed to facilitate researchers in developing their DNNs for accurate crop yield predictions. Specifically, the code in this API (1) combines all three modalities of data to create $(\mathbf{x}, \mathbf{y_{s}}, \mathbf{y_{l}}, \mathbf{z})$ tuples, with $\mathbf{x}$, $\mathbf{y_{s}}$, $\mathbf{y_{l}}$, and $\mathbf{z}$ respectively representing satellite images, short-term daily weather parameters, long-term monthly meteorological parameters, and ground-truth crop yield (or production) information, and then (2) exposes those tuples via a Dataset object after appropriate data pre-processing.
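
The tuple construction described for the DataLoader API can be pictured with a small, library-free sketch. Everything here is hypothetical: the class name AlignedTuples, the (FIPS code, year) key, and the placeholder values are assumptions made for illustration and do not reflect how CropNet implements its Dataset objects internally.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, int]  # (county FIPS code, year)


class AlignedTuples:
    """Hypothetical stand-in for a dataset that yields (x, ys, yl, z) tuples."""

    def __init__(
        self,
        images: Dict[Key, Any],    # x: satellite images
        daily: Dict[Key, Any],     # ys: short-term daily weather parameters
        monthly: Dict[Key, Any],   # yl: long-term monthly parameters
        yields: Dict[Key, float],  # z: ground-truth crop yield
    ) -> None:
        self.images, self.daily = images, daily
        self.monthly, self.yields = monthly, yields
        # Keep only the (county, year) keys present in every modality.
        shared = set(images) & set(daily) & set(monthly) & set(yields)
        self.keys: List[Key] = sorted(shared)

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> Tuple[Any, Any, Any, float]:
        key = self.keys[index]
        return self.images[key], self.daily[key], self.monthly[key], self.yields[key]


# Placeholder strings stand in for real arrays; the yield value is made up.
dataset = AlignedTuples(
    images={("10003", 2022): "x0", ("22007", 2022): "x1"},
    daily={("10003", 2022): "ys0"},
    monthly={("10003", 2022): "yl0"},
    yields={("10003", 2022): 50.0},
)
x, ys, yl, z = dataset[0]
```

Because the class defines __len__ and __getitem__, an object of this shape follows the map-style dataset protocol that PyTorch's DataLoader accepts. Only the county present in all four modalities survives the alignment step.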

Installation

Researchers and practitioners can install the latest version of CropNet with the following commands:

# Create and activate a conda environment
conda create -n cropnet_api python=3.10
conda activate cropnet_api

# Install the latest version of CropNet
pip install cropnet

# Solve the ecCodes library dependency issue
pip install ecmwflibs
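
After installation, a quick sanity check can confirm that the packages are visible to the active environment. The following is a minimal sketch using only the standard library. The helper name is_installed is hypothetical, and listing torch is an assumption based on the PyTorch example in this document rather than a stated requirement.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


# "torch" is included because the training example below imports it.
for name in ("cropnet", "torch"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```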

CropNet API Examples

  • Example 1: A DataDownloader Example for Downloading the Up-to-date CropNet Data

    Given the time and region (i.e., the FIPS codes for two U.S. counties) of interest, the following code presents how to utilize the DataDownloader to download the up-to-date CropNet data:

from cropnet.data_downloader import DataDownloader

# Use the "target_dir" to specify where the data should be downloaded to
downloader = DataDownloader(target_dir="./data")

# Download 2022 USDA Soybean data
# Note that most of the 2023 USDA data are not yet available
downloader.download_USDA("Soybean", fips_codes=["10003", "22007"], years=["2022"])

# Download the 2023 (the 1st and 2nd quarters) Sentinel-2 Imagery
downloader.download_Sentinel2(fips_codes=["10003", "22007"], years=["2023"], image_type="AG")
downloader.download_Sentinel2(fips_codes=["10003", "22007"], years=["2023"], image_type="NDVI")

# Download the 2023 (January to July) WRF-HRRR data
downloader.download_HRRR(fips_codes=["10003", "22007"], years=["2023"])
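
The fips_codes arguments above are 5-digit county FIPS strings, made up of a 2-digit state code followed by a 3-digit county code. Because some state codes begin with zero, codes read from spreadsheets as integers can lose a digit. The helpers below are a small sketch for guarding against that; normalize_fips and split_fips are hypothetical names and are not part of the CropNet API.

```python
from typing import Tuple, Union


def normalize_fips(code: Union[int, str]) -> str:
    """Zero-pad to five characters, so that 1001 becomes "01001"."""
    return str(code).zfill(5)


def split_fips(code: str) -> Tuple[str, str]:
    """Split a 5-digit county FIPS string into (state, county) parts."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 5-digit FIPS string, got {code!r}")
    return code[:2], code[2:]


state, county = split_fips("10003")
padded = normalize_fips(1001)
```

Here state is "10", county is "003", and padded is "01001".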

  • Example 2: A DataRetriever Example for Obtaining Our Curated CropNet Data

    Given the time and region of interest, the following code shows how to use the DataRetriever to obtain the CropNet data stored on the local machine in a user-friendly format:

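# Import mirrors the cropnet.data_downloader layout used in Example 1
from cropnet.data_retriever import DataRetriever

# Use the "base_dir" to specify where the CropNet data is stored
retriever = DataRetriever(base_dir="/mnt/data/CropNet")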
   
# Retrieve the 2022 USDA Soybean data
usda_data = retriever.retrieve_USDA(crop_type="Soybean", fips_codes=["10003", "22007"], years=["2022"])
   
# Retrieve the 2022 Sentinel-2 Imagery data (separate names so AG is not overwritten)
sentinel2_ag_data = retriever.retrieve_Sentinel2(fips_codes=["10003", "22007"], years=["2022"], image_type="AG")
sentinel2_ndvi_data = retriever.retrieve_Sentinel2(fips_codes=["10003", "22007"], years=["2022"], image_type="NDVI")
   
# Retrieve the 2022 WRF-HRRR data
hrrr_data = retriever.retrieve_HRRR(fips_codes=["10003", "22007"], years=["2022"])

  • Example 3: A PyTorch Example for Using the DataLoader API for Training DNNs

    The following code presents a PyTorch example of training a deep learning model (i.e., MMST-ViT) for climate change-aware crop yield predictions, by utilizing the DataLoader APIs:

import torch
from torch.utils.data import DataLoader
from models_mmst_vit import MMST_ViT
from cropnet.dataset.hrrr_computed_dataset import HRRRComputedDataset
from cropnet.dataset.sentinel2_imagery import Sentinel2Imagery
from cropnet.dataset.usda_crop_dataset import USDACropDataset

# The base directory for the CropNet dataset
base_dir = "/mnt/data/CropNet"
# The JSON configuration file
config_file = "data/soybeans_train.json"

# The dataloaders for each modality of data
sentinel2_loader = DataLoader(Sentinel2Imagery(base_dir, config_file), batch_size=1)
hrrr_loader = DataLoader(HRRRComputedDataset(base_dir, config_file), batch_size=1)
usda_loader = DataLoader(USDACropDataset(base_dir, config_file), batch_size=1)

# The model, the optimizer, and the loss function
model = MMST_ViT()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
criterion = torch.nn.MSELoss()

# Train the model for one epoch
for s, h, u in zip(sentinel2_loader, hrrr_loader, usda_loader):
    # x: satellite images
    # ys (or yl): short-term daily (or long-term monthly) weather parameters
    # z: ground-truth crop yield (or production) information
    x, ys, yl, z = s[0], h[0], h[1], u[0]

    optimizer.zero_grad()
    z_hat = model(x, ys, yl)
    # PyTorch loss convention: prediction first, target second
    loss = criterion(z_hat, z)

    loss.backward()
    optimizer.step()
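
The loop above pairs batches from three loaders with zip, which stops silently at the shortest iterable. The source does not state whether the three datasets always have equal lengths, so the following is a minimal sketch of a stricter pairing helper. The name zip_equal is hypothetical and not part of CropNet, and plain lists stand in for the loaders. On Python 3.10 or later, the built-in zip(..., strict=True) performs the same check.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted iterable


def zip_equal(*iterables):
    """Like zip(), but raise ValueError if the iterables differ in length."""
    for items in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in items):
            raise ValueError("iterables have different lengths")
        yield items


# Plain lists stand in for sentinel2_loader, hrrr_loader, and usda_loader.
pairs = list(zip_equal(["s0", "s1"], ["h0", "h1"], ["u0", "u1"]))
```

Swapping zip for a helper like this in the training loop would surface a misaligned configuration as an error rather than as a shortened epoch.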

License

CropNet is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
