
An Ensemble Framework for Explainable Geospatial Machine Learning Models

Project description

An Ensemble Framework for Geospatial Machine Learning Models

GitHub: https://github.com/UrbanGISer/XGeoML

PyPI Homepage: https://pypi.org/project/XGeoML/0.1.4/

Installation: pip install XGeoML

This package addresses the critical challenge of analyzing and interpreting spatially varying effects in geographic analysis, stemming from the complexity and non-linearity of geospatial data. We introduce an innovative integrated framework that combines local spatial weights, Explainable Artificial Intelligence (XAI), and advanced machine learning technologies. This approach significantly bridges the gap between traditional geographic analysis models and contemporary machine learning methodologies.

Introduction

Geospatial data is inherently complex and non-linear, presenting significant challenges in analysis and interpretation. Traditional geographic analysis models often struggle to address these challenges, leading to gaps in understanding and interpretation.

Our Approach

We propose an innovative integrated framework that leverages local spatial weights, Explainable Artificial Intelligence (XAI), and advanced machine learning technologies. Our approach aims to bridge the gap between traditional methods and modern machine learning techniques, offering a more comprehensive tool for geographic analysis.

Features

  • Local Spatial Weights: Incorporates the spatial context of data, enhancing model sensitivity to geographical nuances.
  • Explainable Artificial Intelligence (XAI): Provides clarity on the decision-making process, improving the interpretability of the model's predictions.
  • Advanced Machine Learning Technologies: Utilizes cutting-edge algorithms to manage the complexity and non-linearity of geospatial data effectively.
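
As an illustration of the local-spatial-weights idea (not the package's own implementation, which is shown in Key Functions below), a Gaussian kernel weight between observation points can be sketched in plain NumPy:

```python
import numpy as np

def gaussian_weights(coords, bandwidth):
    """Illustrative Gaussian kernel weights: w_ij = exp(-0.5 * (d_ij / bandwidth)^2)."""
    # Pairwise Euclidean distances between all points
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
W = gaussian_weights(coords, bandwidth=5.0)
# Each point has weight 1 with itself; weights decay smoothly with distance
```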

Key Functions

  • Use built-in spatial weights: generates Gaussian, Binary, and GaussianBinary weights.
weights = w_matrix.spatial_weight(df, "u", "v", fix=False, bandwidth=80, kernel_type='Binary')
  • Import libpysal spatial weights: accepts any libpysal spatial weights object.
import libpysal.weights as lw
points = df[['u', 'v']].values
w = lw.DistanceBand(points, threshold=6, binary=False)
weightpysal = w_matrix.from_libpysal(w)
  • Predict or search bandwidth with a fast training model: accepts any scikit-learn model.
# 01 Define key variables
feature_names = ['x1', 'x2', 'x3', 'x4']
target_name = "y"
explainer_names = ["LIME", "SHAP", "Importance"]
truebeta = ['b_linear', 'b_circular', 'b_cos_basic', 'b_poly']  # true-coefficient columns of the synthetic data

# 02 Import a scikit-learn ML model
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor

# 03 Import the R2 metric
from sklearn.metrics import r2_score

# 04 Generate weights
weights = w_matrix.spatial_weight(df, "u", "v", fix=False, bandwidth=80, kernel_type='Binary')

# 05 Bandwidth searching
eval_bandwidth = pd.DataFrame()
for i in range(10):
    k = 40 + i * 40  # candidate bandwidths: 40, 80, ..., 400
    for j in range(3):  # repeat each candidate three times
        weights = w_matrix.spatial_weight(df, "u", "v", fix=False, bandwidth=k, kernel_type='Binary')
        dfx = fast_train.predict(df, feature_names, target_name, weights, model)
        r22 = r2_score(dfx.y, dfx.predy)
        eval_bandwidth.loc[i, j] = r22
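
Once the loop above has filled `eval_bandwidth` (rows indexed by bandwidth step, columns by repeat), the best bandwidth can be chosen by averaging the repeats and taking the argmax. A minimal sketch with made-up R² values standing in for the real search results:

```python
import pandas as pd

# Hypothetical R2 scores: 10 candidate bandwidths (40, 80, ..., 400), 3 repeats each
eval_bandwidth = pd.DataFrame(
    [[0.61, 0.60, 0.62], [0.70, 0.71, 0.69], [0.75, 0.74, 0.76],
     [0.73, 0.72, 0.74], [0.68, 0.67, 0.69], [0.66, 0.65, 0.67],
     [0.64, 0.63, 0.65], [0.62, 0.61, 0.63], [0.60, 0.59, 0.61],
     [0.58, 0.57, 0.59]]
)
mean_r2 = eval_bandwidth.mean(axis=1)   # average over the 3 repeats
best_row = mean_r2.idxmax()             # row with the highest mean R2
best_bandwidth = 40 + best_row * 40     # same formula as in the loop: k = 40 + i*40
# Here row 2 wins, so best_bandwidth == 120
```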
  • Predict and evaluate with updated spatial weights: uses the spatial weights selected by the bandwidth search.
# 06 Predict
df_pred = fast_train.predict(df, feature_names, target_name, weights, model)
# 07 Evaluate
r2_score(df_pred.y, df_pred.predy)
  • Explain models: explainer_names must be a list drawn from ["LIME", "SHAP", "Importance"].
# 08 Explain
df_explain = fast_train.explain(df, feature_names, target_name, weights, model, explainer_names)
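
The "Importance" explainer corresponds to feature-importance scores. As an illustration of that idea outside the package (XGeoML computes it per location using the spatial weights), scikit-learn's `permutation_importance` can be sketched on a toy model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Feature 0 dominates, feature 1 matters a little, feature 2 is irrelevant
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
# result.importances_mean ranks feature 0 highest, feature 2 near zero
```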
  • Partial dependence estimation: sample bins are used here, with two modes, evenly spaced values (even=True) or the original values (even=False).
# 09 Partial dependence
df_pd = fast_train.partial_dependence(df, model, feature_names, target_name, weights, num_samples=50, even=False)
  • Use trained models: be careful, this can be time-consuming when using HyperOpt.
# 10 Train models
sk_models, predictions = train_model.train_sklearn(df, feature_names, target_name, weights, model)
# 11 Explain with trained scikit-learn models
df_sk = train_model.explain_models(df, feature_names, target_name, weights, sk_models, explainer_names)
# 12 Partial dependence with trained scikit-learn models
df_sk_pd_even = train_model.partial_dependence_model(df, sk_models, feature_names, target_name, weights, num_samples=50)

# 13 Train HyperOpt models: very time-consuming
from hpsklearn import HyperoptEstimator, xgboost_regression, mlp_regressor
from hyperopt import tpe
hymodel = xgboost_regression
# max_evals=5 on 900 points takes about 3 hours
hy_models, predictions = train_model.train_hysklearn(df, feature_names, target_name, weights, hymodel, max_evals=1, trial_timeout=60)

# 14 Explain with trained HyperOpt models: skleanrmodel=False MUST be set
df_hy = train_model.explain_models(df, feature_names, target_name, weights, hy_models, explainer_names, skleanrmodel=False)
# 15 Partial dependence with trained HyperOpt models, same as step 12
df_hy_pd_even = train_model.partial_dependence_model(df, hy_models, feature_names, target_name, weights, num_samples=50)
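
The pattern behind these per-location trained models (one estimator per observation, fitted with sample weights taken from that observation's row of the spatial weight matrix) can be sketched generically. This is a simplified, hypothetical illustration, not the package's own code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_local_models(X, y, W):
    """Fit one weighted model per location, using row i of W as sample weights."""
    models = []
    for i in range(len(X)):
        m = LinearRegression()
        m.fit(X, y, sample_weight=W[i])
        models.append(m)
    return models

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.05, size=50)
W = np.exp(-0.5 * rng.uniform(0, 2, size=(50, 50)))  # stand-in for a spatial weight matrix
np.fill_diagonal(W, 1.0)

local_models = train_local_models(X, y, W)
# Each local model recovers coefficients close to [1, -2]
```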

Through rigorous testing on synthetic and real-world datasets, our framework has been shown to enhance the interpretability and accuracy of geospatial predictions in both regression and classification tasks. It effectively elucidates spatial variability, representing a significant advancement in the precision of predictions and offering a novel perspective for understanding spatial phenomena.

Conclusion

Our integrated framework marks a significant step forward in geographic analysis. By combining local spatial weights, XAI, and advanced machine learning, we offer a powerful tool for analyzing and interpreting complex geospatial data. This approach not only improves the accuracy and interpretability of geospatial predictions but also provides a fresh perspective on spatial phenomena.

Contact

For further information, inquiries, or collaborations, please contact us at lingboliu@fas.harvard.edu.

Download files

Source Distribution

XGeoML-0.1.5.tar.gz (9.3 kB)

Built Distribution

XGeoML-0.1.5-py3-none-any.whl (10.8 kB)

File details

Details for the file XGeoML-0.1.5.tar.gz.

File metadata

  • Download URL: XGeoML-0.1.5.tar.gz
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

  • SHA256: db89a7479e8e13a49dc85699f2e3365549236dce1dad444c3bd9687ef1af41b4
  • MD5: caf14fcf072fc466953a2ede469edd4a
  • BLAKE2b-256: dc0cf01963870ef47e9a4ee7f4be38457be4242c9282bb25531bb45396fb8f8e

File details

Details for the file XGeoML-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: XGeoML-0.1.5-py3-none-any.whl
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

  • SHA256: 93420e47629e2a06978cc09a0aca1e6492c259eb19392a225284d73ac7682397
  • MD5: e2ce1231b1012d4752359c3080e02cfb
  • BLAKE2b-256: 6b970e45dc91005f796281f9b3f8cd1637a2118fe9359b53c724164bbbc0d2be
