A package for Adaptive Spatio-Temporal Exploratory Model (AdaSTEM) in python
stemflow :bird:
A Python Package for Adaptive Spatio-Temporal Exploratory Model (AdaSTEM)
Documentation :book:
Installation :wrench:
pip install stemflow
Or using conda:
conda install -c conda-forge stemflow
Brief introduction :information_source:
stemflow is a toolkit for the Adaptive Spatio-Temporal Exploratory Model (AdaSTEM [1, 2]) in Python. A typical use case is daily abundance estimation from eBird citizen science data (survey data).
stemflow adopts a "split-apply-combine" philosophy. It
- Splits input data using a Quadtree algorithm.
- Trains a model on each spatiotemporal split (called a stixel) separately.
- Aggregates the ensemble to make predictions.
The framework leverages the "adjacency" information of surroundings in space and time to model and predict the values of target spatiotemporal points. This ameliorates the long-distance/long-range prediction problem [3] and has a good spatiotemporal smoothing effect.
For more information, please see an introduction to stemflow and learning curve analysis
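The "split" step can be illustrated with a minimal quadtree sketch. This is not stemflow's implementation: the function name `quadtree_split` and its arguments are hypothetical, and the sketch leaves out the rotation, jitter, and temporal windows that the package applies. It only shows how dense regions end up in finer cells while sparse regions stay coarse.

```python
import random


def quadtree_split(points, bounds, max_edge, min_edge, min_points):
    """Recursively split `bounds` = (x0, y0, x1, y1) into leaf cells.

    A cell is split when its edge exceeds `max_edge`, or when it holds more
    than `min_points` points and its children would still be at least
    `min_edge` wide. Returns a list of (bounds, points_inside) leaves.
    """
    x0, y0, x1, y1 = bounds
    edge = max(x1 - x0, y1 - y0)
    # Half-open intervals so that every point belongs to exactly one cell.
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]

    too_big = edge > max_edge
    can_split = edge / 2 >= min_edge and len(inside) > min_points
    if not (too_big or can_split):
        return [(bounds, inside)]

    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    children = (
        (x0, y0, xm, ym), (xm, y0, x1, ym),
        (x0, ym, xm, y1), (xm, ym, x1, y1),
    )
    leaves = []
    for child in children:
        leaves.extend(quadtree_split(inside, child, max_edge, min_edge, min_points))
    return leaves


random.seed(0)
# 200 sparse background points plus a dense cluster of 400 points near (20, 20).
points = [(random.random() * 100, random.random() * 100) for _ in range(200)]
points += [(15 + random.random() * 10, 15 + random.random() * 10) for _ in range(400)]

leaves = quadtree_split(points, (0, 0, 100, 100), max_edge=25, min_edge=5, min_points=50)
edges = sorted({x1 - x0 for (x0, y0, x1, y1), _ in leaves})
print(len(leaves), "leaf cells with edge lengths", edges)
```

In this toy setup the cell containing the dense cluster is subdivided further than the sparse background cells, which mirrors the adaptive gridding idea described above.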
Model and data :slot_machine:
Main functionality of stemflow
:white_check_mark: Spatiotemporal modeling & prediction
:white_check_mark: Calculate overall feature importances
:white_check_mark: Plot spatiotemporal dynamics
For details see AdaSTEM Demo
Supported data types
:white_check_mark: All spatial indexing (CRS)
:white_check_mark: All temporal indexing
:white_check_mark: Spatial-only modeling
:white_check_mark: Both continuous and categorical features (prefer one-hot encoding)
:white_check_mark: Both static (e.g., yearly mean temperature) and dynamic features (e.g., daily temperature)
For details and tips see Tips for data types
Supported tasks
:white_check_mark: Binary classification task
:white_check_mark: Regression task
:white_check_mark: Hurdle task (two-step regression: classify, then regress the non-zero part)
For details and tips see Tips for different tasks
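The hurdle idea can be sketched in a few lines of plain Python. The classes below (`TinyHurdle`, `ThresholdClassifier`, `MeanRegressor`) are hypothetical stand-ins written for this illustration and are not part of stemflow; any pair of objects with `fit` and `predict` methods could take their place.

```python
class ThresholdClassifier:
    """Toy classifier on one feature; assumes larger values mean 'non-zero'."""

    def fit(self, X, y):
        pos = [x[0] for x, label in zip(X, y) if label == 1]
        neg = [x[0] for x, label in zip(X, y) if label == 0]
        # Decision threshold: midpoint between the two class means.
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x[0] > self.threshold else 0 for x in X]


class MeanRegressor:
    """Toy regressor that always predicts the training mean."""

    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class TinyHurdle:
    """Step 1: classify zero vs. non-zero. Step 2: regress on non-zero rows only."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        occupied = [1 if value > 0 else 0 for value in y]
        self.classifier.fit(X, occupied)
        X_pos = [x for x, value in zip(X, y) if value > 0]
        y_pos = [value for value in y if value > 0]
        self.regressor.fit(X_pos, y_pos)
        return self

    def predict(self, X):
        is_pos = self.classifier.predict(X)
        amount = self.regressor.predict(X)
        # Predicted amount is kept only where the classifier says "non-zero".
        return [a if p == 1 else 0.0 for p, a in zip(is_pos, amount)]


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 4, 6, 8]  # zero-inflated target, e.g. counts
hurdle = TinyHurdle(ThresholdClassifier(), MeanRegressor()).fit(X, y)
preds = hurdle.predict([[1], [4]])
print(preds)
```

Training the regressor only on non-zero rows keeps the many zeros from dragging the predicted amounts toward zero, which is the motivation for using a hurdle model on zero-inflated data.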
Supported base models
:white_check_mark: sklearn style BaseEstimator classes (you can make your own base model), for example here
:white_check_mark: sklearn style Maxent model. Example here.
Usage :star:
Use a Hurdle model as the base model of AdaSTEMRegressor:
from stemflow.model.AdaSTEM import AdaSTEM, AdaSTEMClassifier, AdaSTEMRegressor
from stemflow.model.Hurdle import Hurdle
from xgboost import XGBClassifier, XGBRegressor
## "hurdle in Ada"
model = AdaSTEMRegressor(
    base_model=Hurdle(
        classifier=XGBClassifier(tree_method='hist', random_state=42, verbosity=0, n_jobs=1),
        regressor=XGBRegressor(tree_method='hist', random_state=42, verbosity=0, n_jobs=1)
    ),                                # hurdle model for zero-inflated problems (e.g., counts)
    save_gridding_plot=True,
    ensemble_fold=10,                 # data are modeled 10 times, each time with jitter and rotation in the Quadtree algorithm
    min_ensemble_required=7,          # Only points covered by > 7 stixels will be predicted
    grid_len_lon_upper_threshold=25,  # force splitting if the longitudinal edge of a grid cell exceeds 25
    grid_len_lon_lower_threshold=5,   # stop splitting if the longitudinal edge of a grid cell falls below 5
    grid_len_lat_upper_threshold=25,  # same as the two above, but for the latitudinal edge
    grid_len_lat_lower_threshold=5,
    temporal_start=1,                 # The next 4 params define the temporal sliding window
    temporal_end=366,
    temporal_step=20,
    temporal_bin_interval=50,
    points_lower_threshold=50,        # Only stixels with more than 50 samples are trained
    Spatio1='longitude',              # The next three params give the names of the spatial
    Spatio2='latitude',               # and temporal coordinate columns in the dataframe
    Temporal1='DOY',
    use_temporal_to_train=True,       # In each stixel, whether 'DOY' should be a predictor
    njobs=1
)
Fitting and prediction methods follow the style of the sklearn BaseEstimator class:
import numpy as np

## fit
model = model.fit(X_train.reset_index(drop=True), y_train)

## predict
pred = model.predict(X_test)
pred = np.where(pred < 0, 0, pred)  # clip negative predictions to zero
eval_metrics = AdaSTEM.eval_STEM_res('hurdle', y_test, pred)
print(eval_metrics)
Here, pred is the mean of the predicted values across ensembles.
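As a rough illustration of the "combine" step, the sketch below averages per-ensemble predictions and returns NaN for points covered by too few ensembles. The function `aggregate_ensembles` is hypothetical, and treating the threshold as "at least `min_required`" is an assumption of this sketch, not a statement about stemflow's exact rule.

```python
import math


def aggregate_ensembles(preds_per_ensemble, min_required):
    """Average predictions across ensembles, point by point.

    `preds_per_ensemble` holds one list per ensemble; `None` marks a point
    that no trained stixel covered in that ensemble. Points with fewer than
    `min_required` available predictions get NaN (assumed rule).
    """
    n_points = len(preds_per_ensemble[0])
    result = []
    for i in range(n_points):
        values = [p[i] for p in preds_per_ensemble if p[i] is not None]
        if len(values) >= min_required:
            result.append(sum(values) / len(values))
        else:
            result.append(math.nan)
    return result


# Three ensembles, three points: only the first point is covered often enough.
ensembles = [
    [1.0, None, 3.0],
    [3.0, None, None],
    [5.0, 2.0, None],
]
combined = aggregate_ensembles(ensembles, min_required=2)
print(combined)
```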
See AdaSTEM demo for further functionality.
See Optimizing Stixel Size for why and how you should tune the important gridding parameters.
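To build intuition for the four temporal parameters in the usage example (`temporal_start`, `temporal_end`, `temporal_step`, `temporal_bin_interval`), here is one plausible reading of them as a sliding window. The helper `sliding_windows`, the half-open `[left, right)` convention, and the clipping at `temporal_end` are assumptions made for illustration; stemflow's actual window construction may differ in its details.

```python
def sliding_windows(start, end, step, bin_interval):
    """Return (left, right) windows, read as half-open intervals [left, right).

    A new window starts every `step` units and spans `bin_interval` units,
    clipped at `end` (assumed behavior).
    """
    windows = []
    left = start
    while left < end:
        windows.append((left, min(left + bin_interval, end)))
        left += step
    return windows


def covering(windows, day):
    """Windows whose half-open interval contains `day`."""
    return [w for w in windows if w[0] <= day < w[1]]


windows = sliding_windows(start=1, end=366, step=20, bin_interval=50)
print(len(windows), windows[:3])
print(covering(windows, 90))
```

Because the window width (50) is larger than the step (20), neighboring windows overlap, so a given day of year typically falls into two or three temporal bins under this reading.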
Plot QuadTree ensembles :evergreen_tree:
model.gridding_plot
# Here, model is an AdaSTEM object, not a Hurdle object
Here, each color shows an ensemble generated during model fitting. In each of the 10 ensembles, regions (in terms of space and time) with more training samples were gridded at finer resolution, while sparse regions remained coarse. Prediction results were aggregated across the ensembles (that is, in this example, data were modeled 10 times).
Example of visualization :world_map:
Daily Abundance Map of Barn Swallow
See section Prediction and Visualization for how to generate this GIF.
Contribute to stemflow :purple_heart:
We welcome pull requests. Contributors should follow the contributor guidelines.
Application-level cooperation is also welcome. We recognize that stemflow may consume substantial computational resources, especially as data volumes grow. We welcome research collaboration of all kinds. Contact me at chenyangkang24@outlook.com
References: