A Python package for the Adaptive Spatio-Temporal Model (AdaSTEM)

stemflow

Installation

pip install stemflow

Brief introduction

stemflow is a toolkit for the Adaptive Spatio-Temporal Model (AdaSTEM) in Python. A typical use is daily abundance estimation from eBird citizen science data. It leverages the "adjacency" information of surrounding target values in space and time to predict the class or continuous value of a target spatio-temporal point. In the demo, we use a two-step hurdle model as the "base model", with XGBClassifier for occurrence modeling and XGBRegressor for abundance modeling.
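The two-step hurdle idea can be sketched in a few lines. This is a minimal illustration, not stemflow's Hurdle implementation: `SimpleHurdle`, `ThresholdClassifier`, and `MeanRegressor` are hypothetical names, and the two toy estimators only stand in for the XGBoost models used in the demo.

```python
import numpy as np


class ThresholdClassifier:
    """Toy classifier (hypothetical): splits feature 0 at the midpoint of the class means."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.cutoff_ = (X[y == 1, 0].mean() + X[y == 0, 0].mean()) / 2
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=float)[:, 0] > self.cutoff_).astype(int)


class MeanRegressor:
    """Toy regressor (hypothetical): always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class SimpleHurdle:
    """Two-step hurdle: model occurrence first, then abundance on occupied rows."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        present = y > 0
        self.classifier.fit(X, present.astype(int))  # step 1: presence/absence
        self.regressor.fit(X[present], y[present])   # step 2: abundance where present
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        occurrence = self.classifier.predict(X)
        abundance = self.regressor.predict(X)
        # Report abundance only where occurrence is predicted; zero elsewhere.
        return np.where(occurrence > 0, abundance, 0.0)


X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0, 0, 4, 6])
hurdle = SimpleHurdle(ThresholdClassifier(), MeanRegressor()).fit(X, y)
pred = hurdle.predict(np.array([[0.0], [1.0]]))
```

Any pair of estimators exposing `fit` and `predict` should slot into the same structure in place of the toy models.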

Users can define the size of a stixel (spatio-temporal pixel) in terms of space and time. Larger stixels favor generalizability but lose precision at fine resolution; smaller stixels may predict better within their exact area but extrapolate less well to points outside the stixel.

In the demo, we first split the training data using temporal sliding windows with a size of 50 days of year (DOY) and a step of 20 DOY (temporal_start=1, temporal_end=366, temporal_step=20, temporal_bin_interval=50). For each temporal slice, a spatial gridding is applied: a stixel is forced to split into four quarters if its edge is longer than 25 units (measured in longitude and latitude; grid_len_lon_upper_threshold=25, grid_len_lat_upper_threshold=25), and splitting stops when it would shrink the edge length below 5 units (grid_len_lon_lower_threshold=5, grid_len_lat_lower_threshold=5) or leave a stixel with fewer than 50 checklists (points_lower_threshold=50).
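The windowing and splitting rules above can be sketched as follows. This is an illustrative approximation under stated assumptions, not stemflow's gridding code: `temporal_windows` and `split_stixel` are hypothetical helpers, the last window is clipped at `end`, a single pair of thresholds is applied to both axes, and the random jitter and rotation are omitted.

```python
import random


def temporal_windows(start=1, end=366, step=20, bin_interval=50):
    """Return (window_start, window_end) pairs for a sliding window over DOY."""
    windows = []
    window_start = start
    while window_start < end:
        # Assumption: the final windows are clipped at `end`.
        windows.append((window_start, min(window_start + bin_interval, end)))
        window_start += step
    return windows


def split_stixel(points, x0, y0, width, height, upper=25.0, lower=5.0, min_points=50):
    """Recursively quarter a rectangle; return leaves as (x0, y0, width, height, n_points)."""
    leaf = [(x0, y0, width, height, len(points))]
    # Never split if a child edge would fall below the lower threshold.
    if width / 2 < lower or height / 2 < lower:
        return leaf
    half_w, half_h = width / 2, height / 2
    quadrants = []
    for qx in (x0, x0 + half_w):
        for qy in (y0, y0 + half_h):
            pts = [(x, y) for x, y in points
                   if qx <= x < qx + half_w and qy <= y < qy + half_h]
            quadrants.append((pts, qx, qy))
    forced = width > upper or height > upper
    # Unless the split is forced by the upper threshold, stop when any
    # quadrant would hold fewer than min_points points.
    if not forced and any(len(pts) < min_points for pts, _, _ in quadrants):
        return leaf
    leaves = []
    for pts, qx, qy in quadrants:
        leaves.extend(split_stixel(pts, qx, qy, half_w, half_h, upper, lower, min_points))
    return leaves


windows = temporal_windows()
rng = random.Random(0)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(2000)]
leaves = split_stixel(points, 0.0, 0.0, 100.0, 100.0)
```

Each leaf plays the role of one stixel within a temporal window; a base model would then be trained on the points falling inside it.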

This process is executed 10 times (ensemble_fold=10), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatio-temporal points with more than 7 (min_ensemble_required=7) usable ensembles are predicted (otherwise, they are set to np.nan).
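A rough sketch of how per-ensemble predictions might be combined under this rule is given below. `aggregate_ensembles` is a hypothetical helper, not part of stemflow, and the inclusive comparison (`>=`) is an assumption; the library may apply a strict comparison at the boundary.

```python
import numpy as np


def aggregate_ensembles(preds, min_ensemble_required=7):
    """Average predictions across ensembles, masking poorly covered points.

    preds has shape (n_ensembles, n_points), with NaN wherever an ensemble
    has no usable stixel for that point.
    """
    preds = np.asarray(preds, dtype=float)
    usable = np.sum(~np.isnan(preds), axis=0)   # usable ensembles per point
    totals = np.nansum(preds, axis=0)           # sum over usable ensembles
    mean = np.full(preds.shape[1], np.nan)
    # Assumption: the threshold is treated as inclusive.
    enough = usable >= min_ensemble_required
    mean[enough] = totals[enough] / usable[enough]
    return mean


preds = np.full((10, 3), np.nan)
preds[:, 0] = 2.0     # covered by all 10 ensembles
preds[:8, 1] = 1.0    # covered by 8 ensembles
preds[:6, 2] = 3.0    # covered by only 6 ensembles
result = aggregate_ensembles(preds)
```

In this toy input the first two points receive a mean prediction, while the third stays NaN because too few ensembles cover it.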

Fitting and prediction methods follow the convention of sklearn estimator class:

## fit
model.fit(X_train.reset_index(drop=True), y_train)

## predict
pred = model.predict(X_test)
pred = np.where(pred<0, 0, pred)

Here, pred is the mean of the predicted values across ensembles.

Usage

import numpy as np
from stemflow.model.AdaSTEM import AdaSTEM, AdaSTEMClassifier, AdaSTEMRegressor
from stemflow.model.Hurdle import Hurdle_for_AdaSTEM
from xgboost import XGBClassifier, XGBRegressor

SAVE_DIR = './'


model = Hurdle_for_AdaSTEM(
    classifier=AdaSTEMClassifier(base_model=XGBClassifier(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),
                                save_gridding_plot = True,
                                ensemble_fold=10, 
                                min_ensemble_required=7,
                                grid_len_lon_upper_threshold=25,
                                grid_len_lon_lower_threshold=5,
                                grid_len_lat_upper_threshold=25,
                                grid_len_lat_lower_threshold=5,
                                points_lower_threshold=50,
                                Spatio1='longitude',
                                Spatio2 = 'latitude', 
                                Temporal1 = 'DOY',
                                use_temporal_to_train=True,
                                njobs=4),
    regressor=AdaSTEMRegressor(base_model=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),
                                save_gridding_plot = True,
                                ensemble_fold=10, 
                                min_ensemble_required=7,
                                grid_len_lon_upper_threshold=25,
                                grid_len_lon_lower_threshold=5,
                                grid_len_lat_upper_threshold=25,
                                grid_len_lat_lower_threshold=5,
                                points_lower_threshold=50,
                                Spatio1='longitude',
                                Spatio2 = 'latitude', 
                                Temporal1 = 'DOY',
                                use_temporal_to_train=True,
                                njobs=4)
)

## fit
model.fit(X_train.reset_index(drop=True), y_train)

## predict
pred = model.predict(X_test)
pred = np.where(pred<0, 0, pred)
eval_metrics = AdaSTEM.eval_STEM_res('hurdle', y_test, pred)
print(eval_metrics)

Plot QuadTree ensembles

model.classifier.gridding_plot
# or model.regressor.gridding_plot

QuadTree example


Example of visualization

GIF visualization


Documentation

stemflow Documentation


References:

  1. Fink, D., Damoulas, T., & Dave, J. (2013, June). Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 27, No. 1, pp. 1284-1290).

  2. Fink, D., Auer, T., Johnston, A., Ruiz‐Gutierrez, V., Hochachka, W. M., & Kelling, S. (2020). Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications, 30(3), e02056.
