Skip to main content

A ensemble framework for explainable geospatial machine Learning models

Project description

An Ensemble Framework for Geospatial Machine Learning Models

GitHub: https://github.com/UrbanGISer/XGeoML

PYPI Homepage: https://pypi.org/project/XGeoML/0.1.4/

Installation: pip install XGeoML

This package addresses the critical challenge of analyzing and interpreting spatially varying effects in geographic analysis, stemming from the complexity and non-linearity of geospatial data. We introduce an innovative integrated framework that combines local spatial weights, Explainable Artificial Intelligence (XAI), and advanced machine learning technologies. This approach significantly bridges the gap between traditional geographic analysis models and contemporary machine learning methodologies.

Introduction

Geospatial data is inherently complex and non-linear, presenting significant challenges in analysis and interpretation. Traditional geographic analysis models often struggle to address these challenges, leading to gaps in understanding and interpretation.

Our Approach

We propose an innovative integrated framework that leverages local spatial weights, Explainable Artificial Intelligence (XAI), and advanced machine learning technologies. Our approach aims to bridge the gap between traditional methods and modern machine learning techniques, offering a more comprehensive tool for geographic analysis.

Features

  • Local Spatial Weights: Incorporates the spatial context of data, enhancing model sensitivity to geographical nuances.
  • Explainable Artificial Intelligence (XAI): Provides clarity on the decision-making process, improving the interpretability of the model's predictions.
  • Advanced Machine Learning Technologies: Utilizes cutting-edge algorithms to manage the complexity and non-linearity of geospatial data effectively.

Key Functions

  • Use built-in Spatial Weights: Generate Gaussian, Binary and GaussianBinary weight.
weights=w_matrix.spatial_weight(df, "u", "v", fix=False, bandwidth=80, kernel_type='Binary')
  • Import libpysal Spatial Weights: Accept all spatial weight.
import libpysal.weights as lw
points = df[['u', 'v']].values
w=lw.DistanceBand(points,threshold=6,binary=False)
weightpysal=w_matrix.from_libpysal(w)
  • Predict or Search Bandwidth with fast training model: Accept all sci-learn model.
# 01 Define key variables
feature_names=['x1','x2','x3','x4']
target_name="y"
explainer_names = ["LIME","SHAP", 'Importance']
turebeta= ['b_linear','b_circular', 'b_cos_basic',  'b_poly']

# 02 import  sklearn ML model
from sklearn.ensemble import  GradientBoostingRegressor
model=GradientBoostingRegressor

# 03 import  R2
from sklearn.metrics  import r2_score

# 04 generate weights
weights=w_matrix.spatial_weight(df, "u", "v", fix=False, bandwidth=80, kernel_type='Binary')

# 05 Bandwidth Searching
eval_bandwidth = pd.DataFrame()
for i in range(10):
    k=40+i*40
    for j in range(3):
        weights=w_matrix.spatial_weight(df, "u", "v", fix=False, bandwidth=k, kernel_type='Binary')
        dfx=fast_train.predict(df, feature_names, target_name, weights, model)
        r22=r2_score(dfx.y,dfx.predy)
        eval_bandwidth.loc[i, j]=r22
  • Predict and Evaluate with updated Spatial Weights: Using new spatial weight based on bandwidth searching.
# 06 Predict
df_pred=fast_train.predict(df, feature_names, target_name, weights, model)
# 07 Evaluate
from sklearn.metrics import r2_score
r2_score(df_pred.y,df_pred.predy)
  • Explain models: explainer_names must be in a list["LIME","SHAP", 'Importance'].
# 08 Explain
df_explain=fast_train.explain(df, feature_names, target_name, weights,model, explainer_names)
#for append GWR interpolating SHAP, user must install/import  mgwr and specify the coordinatea
df_explain=fast_train.explainexplain(df, feature_names, target_name, weights, model_class, explainer_names,xcoord='u',ycoord='v',shap_gwr=True)
  • Partial Dependence Estimation: Sample bin is used here, two mode: even or original values.
# 09 Partial dependence
df_pd=fast_train.partial_dependence(df, model,  feature_names, target_name, weights,num_samples=50,even=False)
  • Use trained models: MUST BE CAREFUL, It might be time consuming while use HyperOpt.
#10 Trained models
sk_models,predictions=train_model.train_sklearn(df, feature_names, target_name, weights, model)
# 11 Explain with trained sci-learn models
df_sk=train_model.explain_models(df, feature_names, target_name, weights, sk_models, explainer_names)
#12 PDE with trained sci-learn models
df_sk_pd_even=train_model.partial_dependence_model(df, sk_models, feature_names, target_name, weights,num_samples=50)

# 13 Explain with trained HyperOpt models: SUPER TIME CONSUMING
from hpsklearn import HyperoptEstimator, xgboost_regression, mlp_regressor
from hyperopt import tpe
hymodel=xgboost_regression
#max_eval=5 for 900 points, it takes 3 hours
hy_models,predictions=train_model.train_hysklearn(df, feature_names, target_name, weights, hymodel,max_evals=1, trial_timeout=60)

# 14 Explain with trained models. MUST set  skleanrmodel=False
df_hy=train_model.explain_models(df, feature_names, target_name, weights, hy_models, explainer_names,skleanrmodel=False)
# 14 Partial dependence with trained models. Same as Previous one
df_hy_pd_even=train_model.partial_dependence_model(df, hy_models, feature_names, target_name, weights,num_samples=50)

Through rigorous testing on synthetic datasets and real-world dataset, our framework has proven to enhance the interpretability and accuracy of geospatial predictions in both regression and classification tasks. It effectively elucidates spatial variability, representing a significant advancement in the precision of predictions and offering a novel perspective for understanding spatial phenomena.

Conclusion

Our integrated framework marks a significant step forward in geographic analysis. By combining local spatial weights, XAI, and advanced machine learning, we offer a powerful tool for analyzing and interpreting complex geospatial data. This approach not only improves the accuracy and interpretability of geospatial predictions but also provides a fresh perspective on spatial phenomena.

Contact

For further information, inquiries, or collaborations, please contact us at lingboliu@fas.harvard.edu.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

XGeoML-0.1.6.tar.gz (9.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

XGeoML-0.1.6-py3-none-any.whl (11.5 kB view details)

Uploaded Python 3

File details

Details for the file XGeoML-0.1.6.tar.gz.

File metadata

  • Download URL: XGeoML-0.1.6.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

Hashes for XGeoML-0.1.6.tar.gz
Algorithm Hash digest
SHA256 601f270e748c89dddfb31d0f0f202b740e53dc230f5a028cfdc8772b5af30df1
MD5 eb3095a2fa75b1d7256a3f16b3246e32
BLAKE2b-256 d4815627314ce790e6244ae712b3ffd9540372765b15d41781161c33b351aba4

See more details on using hashes here.

File details

Details for the file XGeoML-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: XGeoML-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

Hashes for XGeoML-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 091ae82bc64b89f79e76813182c95dff6927ca3abbd3fdedece8ae4c88a044be
MD5 1bf031386a913ad31f888d5964779485
BLAKE2b-256 4881169a1ce12c76b785a9f219bf66217fbb6374d29cc0dad10b53b6bcb45634

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page