
A toolkit for detecting spatial stratified heterogeneity


GeoDetector · 地理探测器


A Python toolkit for detecting spatial stratified heterogeneity based on the Geographical Detector method proposed by Wang et al. (2010).



English

Installation

pip install geodetector

Quick Start

from geodetector import GeoDetector
from geodetector.dataset import load_disease

df = load_disease()
gd = GeoDetector(factors=["type", "region", "level"], target="incidence")
gd.fit(df)

# Factor detector
print(gd.q_values_)
#   variable  q_value   p_value  significant
# 0     type   0.3857  0.372145        False
# 1   region   0.6378  0.000129         True
# 2    level   0.6067  0.043382         True

# Summary and plots
print(gd.summary())
gd.plot()               # horizontal bar chart of q-values
gd.plot_interaction()   # interaction q-value heatmap
gd.plot_dashboard()     # all four detectors in one figure

Mathematical Foundation

The q-statistic measures the explanatory power of a stratification X on Y:

$$q = 1 - \frac{SSW}{SST} = 1 - \frac{\sum_{h=1}^L N_h \cdot \mathrm{Var}(Y_h)}{\sum_{i=1}^N (Y_i - \bar{Y})^2}$$

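where $L$ is the number of strata, $N_h$ is the number of observations in stratum $h$, $N$ is the total number of observations, and $\mathrm{Var}(Y_h)$ is the within-stratum variance of $Y$ in stratum $h$. For the identity with η² below to hold exactly, this must be the population variance (divisor $N_h$), so that $N_h \cdot \mathrm{Var}(Y_h)$ is the within-stratum sum of squares.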

The q-statistic is algebraically identical to the ANOVA effect size η² and the R² of a stratum-mean predictor: q ≡ R² ≡ η².
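
As a concrete check of this identity, the following minimal sketch computes q by hand in plain Python and compares it with η² = SSB/SST. It does not use the geodetector package; the function names `q_statistic` and `eta_squared` and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def _group(y, strata):
    """Collect the y values that fall into each stratum label."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    return groups


def q_statistic(y, strata):
    """q = 1 - SSW/SST, with SSW the within-stratum sum of squares."""
    grand_mean = sum(y) / len(y)
    sst = sum((v - grand_mean) ** 2 for v in y)
    ssw = 0.0
    for values in _group(y, strata).values():
        mean_h = sum(values) / len(values)
        # N_h * population variance == sum of squared deviations in stratum h
        ssw += sum((v - mean_h) ** 2 for v in values)
    return 1.0 - ssw / sst


def eta_squared(y, strata):
    """ANOVA effect size: between-stratum sum of squares over SST."""
    grand_mean = sum(y) / len(y)
    sst = sum((v - grand_mean) ** 2 for v in y)
    ssb = 0.0
    for values in _group(y, strata).values():
        mean_h = sum(values) / len(values)
        ssb += len(values) * (mean_h - grand_mean) ** 2
    return ssb / sst


# Toy data (made up): two well-separated strata.
y = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
strata = ["a", "a", "a", "b", "b", "b"]

q = q_statistic(y, strata)
eta2 = eta_squared(y, strata)
print(q, eta2)  # both are 54/58, about 0.931
```

Here SST = 58 and SSW = 4, so q = 1 − 4/58, and the between-stratum sum of squares is 54, which gives the same value.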

Significance Test (Non-central F-test)

$$F = \frac{N-L}{L-1} \cdot \frac{q}{1-q} \sim F(L-1,\ N-L,\ \lambda)$$

The non-centrality parameter λ follows the R GD / gdverse formula.
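
To make the test statistic concrete, here is a minimal sketch in plain Python. The F formula is the one stated above. The non-centrality expression is an assumption: it follows my reading of Wang et al. (2016), λ = (1/σ²)[Σ Ȳ_h² − (1/N)(Σ √N_h · Ȳ_h)²] with σ² the population variance of Y, and has not been checked against this package's source. The function names are hypothetical.

```python
import math
from collections import defaultdict


def f_statistic(q, n, n_strata):
    """F = (N - L) / (L - 1) * q / (1 - q), with dof (L - 1, N - L)."""
    return (n - n_strata) / (n_strata - 1) * q / (1.0 - q)


def noncentrality(y, strata):
    """Assumed lambda, following Wang et al. (2016); see the lead-in."""
    n = len(y)
    grand_mean = sum(y) / n
    pop_var = sum((v - grand_mean) ** 2 for v in y) / n

    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    means = [sum(v) / len(v) for v in groups.values()]
    sizes = [len(v) for v in groups.values()]

    sum_sq_means = sum(m ** 2 for m in means)
    weighted = sum(math.sqrt(k) * m for k, m in zip(sizes, means)) ** 2 / n
    return (sum_sq_means - weighted) / pop_var


# Toy data (made up): N = 6 observations in L = 2 strata, q = 54/58.
y = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
strata = ["a", "a", "a", "b", "b", "b"]

f_value = f_statistic(54 / 58, n=6, n_strata=2)
dof = (2 - 1, 6 - 2)
lam = noncentrality(y, strata)
print(f_value, dof, lam)
```

A p-value would then come from the upper tail of a non-central F distribution with these parameters, for example `scipy.stats.ncf.sf(f_value, dof[0], dof[1], lam)` if SciPy is available.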

Four Core Detectors

| Detector | Question | Output |
|---|---|---|
| Factor | Does stratification X explain Y? | q-value, p-value |
| Interaction | Do X₁ and X₂ have synergistic or antagonistic effects? | Interaction type (0–4) |
| Risk | Are Y means significantly different between strata? | t-test results |
| Ecological | Do X₁ and X₂ differ significantly in explanatory power? | F-test results |

Interaction Types

| Type | Condition | Label |
|---|---|---|
| 0 | q(X₁∩X₂) < min(q₁, q₂) | Weaken, nonlinear |
| 1 | min(q₁, q₂) ≤ q(X₁∩X₂) ≤ max(q₁, q₂) | Weaken, uni-variable |
| 2 | max(q₁, q₂) < q(X₁∩X₂) < q₁+q₂ | Enhance, bi-variable |
| 3 | q(X₁∩X₂) ≈ q₁+q₂ | Independent |
| 4 | q(X₁∩X₂) > q₁+q₂ | Enhance, nonlinear |
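
The table can be read as a small decision rule. The sketch below encodes it in plain Python. The function name `classify_interaction` and the tolerance `tol` used for the "≈" in type 3 are assumptions made for illustration; the package may use a different threshold.

```python
LABELS = {
    0: "Weaken, nonlinear",
    1: "Weaken, uni-variable",
    2: "Enhance, bi-variable",
    3: "Independent",
    4: "Enhance, nonlinear",
}


def classify_interaction(q1, q2, q12, tol=1e-6):
    """Map single-factor q-values and the joint q-value to a type 0-4."""
    total = q1 + q2
    # Check "approximately equal to the sum" first so that it is not
    # swallowed by the strict inequalities of types 2 and 4.
    if abs(q12 - total) <= tol:
        return 3
    if q12 < min(q1, q2):
        return 0
    if q12 <= max(q1, q2):
        return 1
    if q12 < total:
        return 2
    return 4


# Made-up q-values covering each branch.
examples = [
    (0.3, 0.2, 0.1),   # below both single-factor values
    (0.3, 0.2, 0.25),  # between min and max
    (0.3, 0.2, 0.4),   # above max, below the sum
    (0.3, 0.2, 0.5),   # equal to the sum
    (0.3, 0.2, 0.6),   # above the sum
]
types = [classify_interaction(*e) for e in examples]
for e, t in zip(examples, types):
    print(e, t, LABELS[t])
```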

Core API

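from geodetector import (
    GeoDetector,          # main controller class
    FactorDetector,       # factor detector
    InteractionDetector,  # interaction detector
    RiskDetector,         # risk detector
    EcologicalDetector,   # ecological detector
    discretize,           # discretization function
    Discretizer,          # scikit-learn-compatible transformer
    OptimalDiscretizer,   # optimal discretization
)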

# Discretize continuous variables
strata = discretize(data, method="quantile", n_strata=5)

# Individual detectors
fd = FactorDetector(discretize="quantile", n_strata=5)
fd.fit(X[["factor1"]], y)
print(fd.q_value_, fd.p_value_)

id_ = InteractionDetector()
id_.fit(X, y)
print(id_.interaction_q_)
print(id_.interaction_type_)

Advanced Extensions

from geodetector.extensions import (
    OPGDDetector,      # Optimal Parameter GD
    GOZHDetector,      # Decision-tree-based GD
    RGDDetector,       # Robust GD
    shapley_decompose, # Shapley decomposition (LESH)
    lesh,              # Full LESH analysis
)

OPGD — Optimal Parameter Geographical Detector

Searches over discretization methods and stratum counts to maximize q-value.

opgd = OPGDDetector(methods=["sd", "equal", "geometric", "quantile", "natural"],
                    k_range=(3, 8))
opgd.fit(data, factors=["xa", "xb", "xc"], target="y")
print(opgd.opt_params_)  # optimal method & k per variable
print(opgd.q_values_)
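
The search idea can be sketched without the package. The code below is a simplified stand-in, not the `OPGDDetector` implementation: it tries only two hand-written discretizers (equal-interval and a crude quantile rule) over an inclusive range of stratum counts and keeps the combination with the largest q. All function names are hypothetical, and treating `k_range` as inclusive at both ends is an assumption.

```python
from bisect import bisect_right
from collections import defaultdict


def q_statistic(y, strata):
    """q = 1 - SSW/SST for values y grouped by stratum labels."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    grand_mean = sum(y) / len(y)
    sst = sum((v - grand_mean) ** 2 for v in y)
    ssw = 0.0
    for values in groups.values():
        mean_h = sum(values) / len(values)
        ssw += sum((v - mean_h) ** 2 for v in values)
    return 1.0 - ssw / sst


def equal_breaks(x, k):
    """k - 1 interior break points that split [min, max] evenly."""
    lo, hi = min(x), max(x)
    return [lo + (hi - lo) * i / k for i in range(1, k)]


def quantile_breaks(x, k):
    """Crude quantile breaks taken from the sorted sample."""
    ordered = sorted(x)
    return [ordered[len(ordered) * i // k] for i in range(1, k)]


def assign_strata(x, breaks):
    """Stratum index = number of break points at or below the value."""
    return [bisect_right(breaks, v) for v in x]


def search_discretization(x, y, methods, k_range):
    """Return the (method, k) pair with the largest q; ties keep the first."""
    best = None
    for name, make_breaks in methods.items():
        for k in range(k_range[0], k_range[1] + 1):
            q = q_statistic(y, assign_strata(x, make_breaks(x, k)))
            if best is None or q > best["q"]:
                best = {"method": name, "k": k, "q": q}
    return best


# Made-up data: y is a three-level step function of x.
x = [float(i) for i in range(30)]
y = [0.0 if v < 10 else 5.0 if v < 20 else 10.0 for v in x]
methods = {"equal": equal_breaks, "quantile": quantile_breaks}
best = search_discretization(x, y, methods, k_range=(3, 8))
print(best)
```

On this toy data the three equal-width intervals line up with the steps in y, so the first candidate already separates the strata perfectly and later ties do not replace it.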

GOZH — Geographically Optimal Zones-based Heterogeneity

Uses decision trees to automatically find optimal strata. Interaction detection uses joint decision trees per factor pair (matching gdverse).

gozh = GOZHDetector(max_depth=3, min_samples_leaf=5)
gozh.fit(data, factors=["xa", "xb"], target="y")
print(gozh.n_zones_)          # number of zones per factor
print(gozh.interaction_pairs_) # joint-tree interaction results

LESH — Locally Explained Stratified Heterogeneity

Shapley-value decomposition of q-statistics. Supports both traditional discretization (method="quantile") and GOZH-style decision-tree discretization (method="gozh").

# Traditional discretization
result = shapley_decompose(data, ["xa", "xb", "xc"], "y", method="quantile")
print(result[["variable", "shapley_value", "shapley_pct"]])

# GOZH-style (matching gdverse LESH)
result = lesh(data, ["xa", "xb", "xc"], "y", method="gozh", max_depth=3)
print(result["shapley"])
print(result["interaction"])  # SPD-attributed interaction
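
To illustrate what a Shapley decomposition of q means, the sketch below treats the q-value of the joint stratification of a subset of factors as the value of that subset (with value 0 for the empty set) and averages each factor's marginal gain over all orderings. This is a generic Shapley computation on already-discretized factors, written as an assumption about the idea; it is not necessarily how `shapley_decompose` or `lesh` is implemented, and the names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import factorial


def q_statistic(y, strata):
    """q = 1 - SSW/SST for values y grouped by stratum labels."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    grand_mean = sum(y) / len(y)
    sst = sum((v - grand_mean) ** 2 for v in y)
    ssw = 0.0
    for values in groups.values():
        mean_h = sum(values) / len(values)
        ssw += sum((v - mean_h) ** 2 for v in values)
    return 1.0 - ssw / sst


def joint_q(y, factors, subset):
    """q of the strata formed by crossing all factors in subset."""
    if not subset:
        return 0.0
    labels = list(zip(*(factors[name] for name in subset)))
    return q_statistic(y, labels)


def shapley_q(y, factors):
    """Shapley share of the full joint q attributed to each factor."""
    names = list(factors)
    n = len(names)
    shares = {}
    for name in names:
        others = [m for m in names if m != name]
        phi = 0.0
        for r in range(len(others) + 1):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                gain = joint_q(y, factors, subset + (name,)) - joint_q(y, factors, subset)
                phi += weight * gain
        shares[name] = phi
    return shares


# Made-up data: xa splits y into low/high halves, xb alternates in pairs.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
factors = {
    "xa": [0, 0, 0, 0, 1, 1, 1, 1],
    "xb": [0, 0, 1, 1, 0, 0, 1, 1],
}
shares = shapley_q(y, factors)
full_q = joint_q(y, factors, tuple(factors))
print(shares, full_q)
```

By construction the shares add up to the q-value of the full joint stratification, which is what makes the decomposition interpretable as a percentage contribution.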

RGD — Robust Geographical Detector

Variance-based change-point detection for discretization, robust to outliers. Supports multi-k search with LOESS elbow detection for optimal stratum count selection.

# Note: Python's range(3, 8) yields 3..7, whereas R's 3:8 includes 8.
rgd = RGDDetector(discnum=range(3, 8), strategy=2, increase_rate=0.05)
rgd.fit(data, factors=["xa", "xb"], target="y")
print(rgd.opt_discnum_)   # optimal discnum per factor
print(rgd.all_q_values_)  # q-values across all discnums

Comparison with R Implementations

This package is aligned with two R reference implementations:

| Feature | GD-main (GD) | gdverse | This package |
|---|---|---|---|
| Factor detector q | ✓ same formula | ✓ same formula | ✓ same formula |
| F-test degrees of freedom | uses original N, L | uses filtered N, L | uses filtered N, L (matches gdverse) |
| Non-central F λ | ✓ same formula | ✓ same formula | ✓ same formula |
| Interaction q12 | per-pair gd() | per-pair gd() | per-pair consistent subset |
| Ecological F | q₂/q₁ (α≈0.2) | (1−q₁)/(1−q₂) 1-tailed | (1−q₁)/(1−q₂) 2-tailed |
| OPGD defaults | user-specified | 5 methods | 5 methods (matches gdverse) |
| GOZH interaction | | joint tree per pair | joint tree per pair |
| LESH discretization | | rpart_disc (GOZH) | supports both |
| RGD discnum search | | 3:8 + LOESS | 3:8 + LOESS |

Degrees of freedom note: When single-observation strata exist, GD-main computes the F-test degrees of freedom using the original (unfiltered) N and L, while gdverse recomputes N and L after filtering. This package follows gdverse's approach. The difference is negligible when all strata have ≥2 observations.
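
The effect of the two conventions can be seen with a small sketch. It only counts observations and strata; the function name `f_test_dof` and the `drop_singletons` flag are hypothetical and are not part of this package or of either R package.

```python
from collections import Counter


def f_test_dof(strata, drop_singletons=True):
    """Return (L - 1, N - L), optionally after removing strata of size 1."""
    counts = Counter(strata)
    if drop_singletons:
        counts = {label: k for label, k in counts.items() if k >= 2}
    n = sum(counts.values())
    n_strata = len(counts)
    return n_strata - 1, n - n_strata


# Made-up labels: stratum "c" holds a single observation.
strata = ["a", "a", "a", "b", "b", "c"]

dof_original = f_test_dof(strata, drop_singletons=False)  # original N and L
dof_filtered = f_test_dof(strata, drop_singletons=True)   # recomputed N and L
print(dof_original, dof_filtered)  # (2, 3) (1, 3)
```

With the singleton kept, N = 6 and L = 3; after filtering, N = 5 and L = 2, so the numerator degrees of freedom drop from 2 to 1.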

Ecological detector note: This package uses a two-tailed F-test with F = (1−q₁)/(1−q₂). GD-main uses F = q₂/q₁ and compares it against the critical value qf(0.9, n−1, n−1). gdverse uses a one-tailed test, pf(F, n−1, n−1, lower.tail=FALSE). The two-tailed test is more conservative and symmetric: swapping X₁ and X₂ does not change the result.

References

  • Wang JF, Li XH, Christakos G, Liao YL, Zhang T, Gu X, Zheng XY. 2010. Geographical detectors-based health risk assessment. International Journal of Geographical Information Science 24(1): 107-127.
  • Wang JF, Zhang TL, Fu BJ. 2016. A measure of spatial stratified heterogeneity. Ecological Indicators 67: 250-256.
  • Song Y, Wang J, Ge Y, Xu C. 2020. An optimal parameters-based geographical detector model. GIScience & Remote Sensing 57(5): 593-610.
  • Luo P, Song Y, et al. 2022. GOZH model. ISPRS Journal of Photogrammetry and Remote Sensing 185: 111-128.
  • Li Y, Luo P, Song Y, et al. 2023. LESH model. International Journal of Digital Earth 16(2): 4533-4552.
  • Zhang Z, Song Y, Wu P. 2022. Robust geographical detector. International Journal of Applied Earth Observation and Geoinformation 109: 102782.
  • Lv W, Lei Y, et al. 2025. gdverse: An R Package for Spatial Stratified Heterogeneity Family. Transactions in GIS 29.

License

MIT


