Geospatial Species Distribution Modeling with Ensemble Learning and Reinforcement Learning-based Threshold Optimization

GeoXERL

Geospatial species distribution modeling with eXtreme Ensemble methods and Reinforcement Learning-based threshold optimization.

Overview

GeoXERL is a modular Python toolkit for species distribution modeling (SDM) and geospatial prediction tasks. It combines:

- Multi-step data preprocessing: environment variable extraction, presence-point processing, background-point generation, dataset splitting, and feature-stack preparation.
- Base model training & evaluation: a unified interface for training and batch inference across multiple algorithms.
- Ensemble methods: Bagging, Boosting, Stacking, Geographically Weighted Random Forest (GWRF), and SHAP-based RL feature selection.
- Reinforcement-learning threshold optimization: Q-Learning and PPO agents that search for the optimal prediction threshold instead of using the default 0.5.

Installation

pip install geoxerl

Or install from source for the latest development version:

git clone https://github.com/wenshunzhang/GeoXERL.git
cd GeoXERL
pip install -e ".[dev]"

Requirements: Python >= 3.8, numpy, pandas, scikit-learn, rasterio, geopandas.

To install optional extras:

# Quote the extras so shells such as zsh do not expand the brackets
pip install "geoxerl[rl]"    # adds stable-baselines3 and gymnasium for PPO
pip install "geoxerl[docs]"  # adds Sphinx for building documentation

Quick start

Command line

# Run each step individually
geoxerl preprocess
geoxerl train
geoxerl ensemble --method stacking
geoxerl optimize

# Or run the full pipeline in one command
geoxerl run-all

# Check version
geoxerl --version
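The individual steps above can also be driven from a script. The following is a minimal sketch, assuming the `geoxerl` executable is on the PATH and that each step should stop the pipeline when it fails. The helper names `build_pipeline_commands` and `run_pipeline` are hypothetical and not part of GeoXERL; only the subcommands and the `--method stacking` flag come from the listing above.

```python
import subprocess
from typing import List, Sequence


def build_pipeline_commands(method: str = "stacking") -> List[List[str]]:
    """Return the argv lists for the four individual pipeline steps.

    The subcommands mirror the CLI listing; `stacking` is the only
    --method value shown in this README.
    """
    return [
        ["geoxerl", "preprocess"],
        ["geoxerl", "train"],
        ["geoxerl", "ensemble", "--method", method],
        ["geoxerl", "optimize"],
    ]


def run_pipeline(commands: Sequence[Sequence[str]], runner=subprocess.run) -> int:
    """Run each step in order and return how many steps completed.

    `runner` is injectable (hypothetical convenience) so the function can
    be exercised without geoxerl installed; it defaults to subprocess.run.
    """
    completed = 0
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed one.
        runner(list(argv), check=True)
        completed += 1
    return completed


# Typical use, once geoxerl is installed:
#   run_pipeline(build_pipeline_commands())
```

When no per-step control is needed, `geoxerl run-all` remains the simpler choice.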

Python API

from geoxerl.data_preprocessing.main import main as preprocess
from geoxerl.base_models.train import main as train_models
from geoxerl.ensemble.stacking import main as run_ensemble
from geoxerl.threshold_optimization.q_main import main as optimize_threshold

# Step 1: preprocess raw environmental rasters
preprocess()

# Step 2: train base models
train_models()

# Step 3: build the ensemble
run_ensemble()

# Step 4: find the optimal prediction threshold via Q-Learning
optimize_threshold()

See the examples/ directory for ready-to-run scripts covering each stage.

Project structure

GeoXERL/
├── geoxerl/                          # Main package
│   ├── __init__.py
│   ├── __version__.py
│   ├── __main__.py                   # Enables python -m geoxerl
│   ├── cli.py                        # Command-line interface
│   ├── data_preprocessing/           # Steps 00-05: env vars -> feature stack
│   │   ├── 00_env_variables_preprocessing.py
│   │   ├── 01_env_variables_preprocessing.py
│   │   ├── 02_presence_points_processing.py
│   │   ├── 03_background_points_generation.py
│   │   ├── 04_dataset_splitting.py
│   │   ├── 05_prepare_feature_stack.py
│   │   ├── config.py
│   │   ├── main.py
│   │   └── utils.py
│   ├── base_models/                  # Model training, evaluation, batch inference
│   │   ├── models.py
│   │   ├── train.py
│   │   ├── evaluate.py
│   │   ├── batch_models.py
│   │   └── config.json
│   ├── ensemble/                     # Bagging, Boosting, Stacking, GWRF, PPO
│   │   ├── bagging.py
│   │   ├── boosting.py
│   │   ├── stacking.py
│   │   ├── gwrf.py
│   │   ├── gwrf_shap_analysis.py
│   │   ├── gwrf_shap_tif.py
│   │   ├── feature_selector_rl2.py
│   │   ├── ppo_main.py
│   │   ├── predict_gwrf.py
│   │   └── metrics.py
│   └── threshold_optimization/       # Q-Learning / PPO threshold search
│       ├── q_learning_optimizer.py
│       ├── q_main.py
│       ├── threshold_analyzer.py
│       ├── data_processor.py
│       ├── visualizer.py
│       └── config.py
├── tests/                            # Unit tests
├── examples/                         # Ready-to-run example scripts
├── docs/                             # Documentation
├── .github/workflows/                # CI/CD (tests + PyPI publish)
├── pyproject.toml
├── README.md
├── CHANGELOG.md
├── CONTRIBUTING.md
└── LICENSE

Module descriptions

`data_preprocessing`

Processes raw environmental raster layers and species occurrence records into a clean, analysis-ready dataset. Scripts are numbered 00-05 to indicate execution order; main.py runs them all in sequence.

Script	Purpose
`00` / `01`	Clip, reproject, and derive environmental variables from raw rasters
`02`	Filter and spatially thin species occurrence records
`03`	Generate background / pseudo-absence points
`04`	Split dataset into train / validation / test sets
`05`	Stack selected features into a single analysis-ready array
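Because the script names start with a digit, they cannot appear in a regular `import` statement; a runner has to load them by string name. The sketch below shows one way to do that with `importlib`. It is not taken from `main.py`: the helper `run_numbered_steps` and the assumption that every step exposes a `main()` function are hypothetical, and the demo uses a throwaway package rather than GeoXERL itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def run_numbered_steps(package, step_modules, entry_point="main"):
    """Import each step module by string name and call its entry point in order."""
    results = []
    for name in step_modules:
        # `import pkg.00_step` is a SyntaxError because identifiers cannot
        # start with a digit; import_module accepts the name as a string.
        module = importlib.import_module(f"{package}.{name}")
        results.append(getattr(module, entry_point)())
    return results


# Demo on a temporary package; the module names below are made up.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_steps"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "00_first.py").write_text("def main():\n    return 'first'\n")
    (pkg_dir / "01_second.py").write_text("def main():\n    return 'second'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new files are seen
    try:
        demo_results = run_numbered_steps("demo_steps", ["00_first", "01_second"])
    finally:
        sys.path.remove(tmp)

print(demo_results)  # ['first', 'second']
```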

`base_models`

Provides a unified interface for fitting individual classifiers (train.py), computing the standard SDM metrics AUC, TSS (True Skill Statistic), and Cohen's Kappa (evaluate.py), and running inference over large raster stacks (batch_models.py).
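For reference, the three metrics can be computed from binary labels and predicted scores as in the standard-library sketch below. This illustrates the textbook definitions and is not the code in `evaluate.py`; the function names are hypothetical, and both classes are assumed to be present (otherwise the divisions fail).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0 / 1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def tss(y_true, y_pred):
    """True Skill Statistic: sensitivity + specificity - 1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1.0 - expected)


def auc(y_true, scores):
    """Probability that a presence outscores a background point (ties count half).

    Pairwise O(n_pos * n_neg) form, fine for a demo but slow on large data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # default 0.5 threshold

print(round(tss(y_true, y_pred), 3))          # 0.333
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.333
print(round(auc(y_true, scores), 3))          # 0.889
```

AUC is threshold-independent, while TSS and Kappa depend on the threshold used to binarize the scores.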

`ensemble`

Implements three classical ensemble strategies (Bagging, Boosting, Stacking), the geospatially aware GWRF, and an RL-based feature selector:

Method	File	Notes
Bagging	`bagging.py`	Bootstrap aggregation
Boosting	`boosting.py`	Gradient boosting
Stacking	`stacking.py`	Meta-learner on base model outputs
GWRF	`gwrf.py`	Geographically Weighted Random Forest with SHAP explainability
PPO feature selector	`feature_selector_rl2.py` / `ppo_main.py`	RL agent that learns which features to include
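The idea behind geographic weighting is that training samples near a target location count more than distant ones when a local model is fitted. The sketch below shows a Gaussian distance kernel, a common choice in geographically weighted methods. Whether `gwrf.py` uses this kernel is not stated here, so treat the function name, the kernel, and the `bandwidth` parameter as assumptions; coordinates are assumed to be planar (projected), not latitude/longitude.

```python
import math


def gaussian_kernel_weights(target, points, bandwidth):
    """Weight each (x, y) point by its Euclidean distance to `target`.

    Weights are 1.0 at the target and decay as exp(-0.5 * (d / bandwidth)^2).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    tx, ty = target
    weights = []
    for x, y in points:
        distance = math.hypot(x - tx, y - ty)
        weights.append(math.exp(-0.5 * (distance / bandwidth) ** 2))
    return weights


# Three samples at 0, 1 and 2 map units from the target location.
weights = gaussian_kernel_weights((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], bandwidth=1.0)
print([round(w, 3) for w in weights])  # [1.0, 0.607, 0.135]
```

Weights of this kind can be passed as `sample_weight` when fitting a scikit-learn random forest for each local neighbourhood.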

`threshold_optimization`

Casts threshold selection as a reinforcement learning problem. The Q-Learning optimizer discretizes the threshold space into states and learns a policy through reward signals based on TSS / F1. threshold_analyzer.py and visualizer.py provide post-hoc analysis and plotting tools.
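The formulation can be sketched with tabular Q-Learning as follows. This is a minimal illustration under stated assumptions, not the code in `q_learning_optimizer.py`: states are 19 candidate thresholds (0.05 to 0.95), actions move the threshold down, keep it, or move it up by one grid step, and the reward is the TSS at the new threshold. All function names, hyperparameters, and the toy data are hypothetical.

```python
import random


def tss_at(threshold, y_true, scores):
    """True Skill Statistic (sensitivity + specificity - 1) at one threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 0 and predicted == 1:
            fp += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def q_learning_threshold(y_true, scores, n_states=19, episodes=300, steps=25,
                         alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a threshold-adjustment policy and return the threshold it settles on."""
    rng = random.Random(seed)
    # State i is the candidate threshold (i + 1) / (n_states + 1).
    grid = [round((i + 1) / (n_states + 1), 4) for i in range(n_states)]
    rewards = [tss_at(t, y_true, scores) for t in grid]  # environment is deterministic
    moves = (-1, 0, 1)  # lower, keep, or raise the threshold by one grid step
    q = [[0.0] * len(moves) for _ in grid]

    def greedy(state):
        return max(range(len(moves)), key=lambda a: q[state][a])

    for _ in range(episodes):
        state = rng.randrange(n_states)  # random start so every state is explored
        for _ in range(steps):
            explore = rng.random() < epsilon
            action = rng.randrange(len(moves)) if explore else greedy(state)
            next_state = min(max(state + moves[action], 0), n_states - 1)
            # Standard Q-Learning update toward reward + discounted best next value.
            target = rewards[next_state] + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    # Follow the learned greedy policy, starting from the default 0.5 threshold.
    state = min(range(n_states), key=lambda i: abs(grid[i] - 0.5))
    for _ in range(n_states):
        state = min(max(state + moves[greedy(state)], 0), n_states - 1)
    return grid[state]


# Toy data where presences receive low scores, so 0.5 is a poor threshold.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.25, 0.05, 0.12]
best = q_learning_threshold(y_true, scores)
print(best, tss_at(best, y_true, scores), tss_at(0.5, y_true, scores))
```

Swapping `tss_at` for an F1-based function changes the reward without touching the learning loop.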

Configuration

Each module has its own config file. Edit these before running to set your data paths and hyperparameters:

Module	Config file
`data_preprocessing`	`geoxerl/data_preprocessing/config.py`
`base_models`	`geoxerl/base_models/config.json`
`ensemble`	`geoxerl/ensemble/config.py`
`threshold_optimization`	`geoxerl/threshold_optimization/config.py`

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for setup instructions, code style guidelines, and the pull request checklist.

# Set up development environment
git clone https://github.com/wenshunzhang/GeoXERL.git
cd GeoXERL
pip install -e ".[dev]"
pre-commit install

# Run tests
pytest tests/

Citation

If you use GeoXERL in your research, please cite:

@software{geoxerl2024,
  author  = {Zhang, Wenshun},
  title   = {GeoXERL: Geospatial Ensemble and Reinforcement Learning Toolkit for Species Distribution Modeling},
  year    = {2024},
  url     = {https://github.com/wenshunzhang/GeoXERL},
  version = {0.1.0}
}

License

MIT — see LICENSE for details.

Contact

Wenshun Zhang — zhangwenshun24@mails.ucas.ac.cn

University of Chinese Academy of Sciences

Release history

Version	Release date	Notes
0.2.1	Apr 4, 2026	
0.2.0	Apr 4, 2026	This version
0.1.0	Mar 22, 2026	Yanked (no reason given)

Download files

Source Distribution

geoxerl-0.2.0.tar.gz (154.0 kB)

Uploaded Apr 4, 2026 Source

Built Distribution

geoxerl-0.2.0-py3-none-any.whl (190.0 kB)

Uploaded Apr 4, 2026 Python 3

File details

Details for the file geoxerl-0.2.0.tar.gz.

File metadata

Download URL: geoxerl-0.2.0.tar.gz
Upload date: Apr 4, 2026
Size: 154.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for geoxerl-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8114c979bddd4a6d59c6b84a478b3e09b3b344d7472e1b325fc9037515b7f739`
MD5	`311b421baf5dbdf4f0030b5da6cb9f81`
BLAKE2b-256	`3db1f0c640fe5043bd3c9ae7c48b358ead6d47d13514d2f5d0452a29d2624ef0`

File details

Details for the file geoxerl-0.2.0-py3-none-any.whl.

File metadata

Download URL: geoxerl-0.2.0-py3-none-any.whl
Upload date: Apr 4, 2026
Size: 190.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for geoxerl-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`198db0fc4cc597328378e4604fa297d4557f7a7484de58b809df73e7faa7fd53`
MD5	`7eed4059422bd65c23e63cf564b2c5a8`
BLAKE2b-256	`f4adb6019d6a8f511ce392ebb9ccb5dcd920e3eb396dfde56b62114887e9dbc6`
