Spatial Random Forests for Daily Climate Records Reconstruction in Mexico
Project description
MEXsrfdcrPy
Spatial Random Forests for Daily Climate Records Reconstruction in Mexico
MEXsrfdcrPy is a Python package for reconstructing and interpolating daily climate station records in Mexico using spatial random forests. It trains a single global RandomForest model on all available stations using only:
- latitude,
- longitude,
- elevation, and
- calendar information (year, month, day-of-year, optional cyclic terms),
and then uses that model to:
- fill gaps in daily station series,
- evaluate spatial interpolation skill with leave-one-station-out (LOSO) experiments, and
- generate gridded daily fields at arbitrary resolutions.
The package is designed for large national datasets (e.g. 1991–2020 SMN network) and integrates naturally with Jupyter, Kaggle and other Python-based workflows.
Installation
From PyPI:
pip install MEXsrfdcrPy
or, for a local development install from GitHub:
git clone https://github.com/sasoryhaf91/MEXsrfdcrPy.git
cd MEXsrfdcrPy
python -m venv .venv
source .venv/bin/activate # on Windows: .venv\Scripts\activate
pip install -e .[dev]
Project goals
MEXsrfdcrPy focuses on a simple but powerful idea:
Learn as much as possible about daily climate patterns from space (X, Y, Z) and time (T) alone, using a single model trained on decades of national station data.
This allows you to:
- reconstruct precipitation, minimum temperature, maximum temperature and evaporation at station locations,
- quantify interpolation performance station by station,
- produce daily climate grids (e.g. 1/16°) for long periods using a reusable global model, and
- benchmark the global model against local models and external products (e.g. NASA POWER, CHIRPS).
MEXsrfdcrPy is part of a broader open-source ecosystem around Mexican climate data, together with:
SMNdataR: R tools to download and process SMN station data.MissClimatePy: Python package for local spatial–temporal imputation at station level.
Quick start
1. LOSO evaluation for a region
import pandas as pd
from MEXsrfdcrPy.loso import evaluate_all_stations_fast
url = "https://zenodo.org/records/17636066/files/smn_mx_daily_1991_2020.csv"
data = pd.read_csv(url)
res = evaluate_all_stations_fast(
data,
id_col="station",
date_col="date",
lat_col="latitude",
lon_col="longitude",
alt_col="altitude",
target_col="prec",
prefix=["15"], # e.g. stations in Estado de México
start="1991-01-01",
end="2020-12-31",
include_target_pct=0.0, # strict LOSO
min_station_rows=9125, # ~25 years of valid data
rf_params=dict(
n_estimators=20,
max_depth=30,
random_state=42,
n_jobs=-1,
),
show_progress=True,
)
print(res.head())
This returns a tidy table with MAE, RMSE and R² per station and per temporal aggregation.
2. Train a global model for reuse
from MEXsrfdcrPy.grid import train_global_rf_target
model, meta, summary = train_global_rf_target(
data,
id_col="station", date_col="date",
lat_col="latitude", lon_col="longitude", alt_col="altitude",
target_col="tmin",
start="1991-01-01", end="2020-12-31",
min_rows_per_station=1825,
rf_params=dict(
n_estimators=15,
max_depth=30,
random_state=42,
n_jobs=-1,
),
model_path="models/global_tmin_rf.joblib",
meta_path="models/global_tmin_rf.meta.json",
)
print(summary.head())
The saved model + metadata can later be reused to predict on any grid or set of points.
3. Predict on a grid
from MEXsrfdcrPy.grid import predict_grid_daily_with_global_model
preds = predict_grid_daily_with_global_model(
grid_df=grid_clean, # DataFrame with [station, latitude, longitude, altitude]
model_path="models/global_tmin_rf.joblib",
meta_path="models/global_tmin_rf.meta.json",
start="1991-01-01",
end="2020-12-31",
batch_days=365,
out_path="preds/global_tmin_grid_1_16deg.parquet",
)
When out_path is provided, predictions are streamed directly to Parquet.
4. Compare against NASA POWER and local models
from MEXsrfdcrPy.loso import loso_predict_full_series_fast, plot_compare_obs_rf_nasa
station_id = 11020
y_col = "prec"
full_df, full_metrics, _, _ = loso_predict_full_series_fast(
data,
station_id=station_id,
id_col="station",
date_col="date",
lat_col="latitude",
lon_col="longitude",
alt_col="altitude",
target_col=y_col,
start="1991-01-01",
end="2020-12-31",
rf_params=dict(n_estimators=20, max_depth=30, random_state=42, n_jobs=-1),
k_neighbors=20,
include_target_pct=0.0,
)
NASA_COL = "PRECTOTCORR" # NASA POWER precipitation column
ax = plot_compare_obs_rf_nasa(
data=data,
station_id=station_id,
id_col="station",
date_col="date",
obs_col=y_col,
nasa_col=NASA_COL,
extra=series_1001,
extra_date_col="date",
extra_value_col="y_pred_full",
extra_label="Grid Model",
rf_df=full_df,
rf_date_col="date",
rf_value_col="y_pred_full",
rf_label="SRFI (0%)",
resample="D",
agg="sum",
ylabel="Rainfall [mm/day]",
title=f"Station {station_id} — Observed vs SRFI vs {NASA_COL} vs Grid Model",
)
This produces a figure comparing observations, NASA POWER, a local model and the global grid model, with MAE, RMSE and R² in the legend.
Documentation
Full API documentation and worked examples (Jupyter notebooks, Kaggle kernels) are planned for future releases. For now, the best reference is the docstrings in:
MEXsrfdcrPy.loso– LOSO evaluation, station-level reconstruction and plotting.MEXsrfdcrPy.grid– global model training and grid/point prediction utilities.
The JOSS paper in paper/paper.md provides a short conceptual overview.
Citation
If you use MEXsrfdcrPy in your work, please cite the software paper (once accepted) and the Zenodo record for this release.
Software paper (JOSS, in review):
Antonio-Fernández, H., Vaquera-Huerta, H., Rosengaus-Moshinsky, M. M., Pérez-Rodríguez, P., & Crossa, J. (2025). MEXsrfdcrPy: Spatial Random Forests for Daily Climate Records Reconstruction in Mexico. Journal of Open Source Software.
Zenodo record: (to be added)
A ready-to-use CITATION.cff file is included in the repository.
Contributing
Contributions are very welcome! Please:
- Open an issue on GitHub describing the bug, feature request or enhancement.
- Fork the repository and create a feature branch.
- Add or update tests when introducing new functionality.
- Run the test suite (e.g.
pytest) and ensure all tests pass. - Open a pull request referencing the relevant issue.
See CONTRIBUTING.md for more detailed guidelines.
License
MEXsrfdcrPy is released under the MIT license. See the LICENSE file for details.
Maintainer and contact
The project is maintained by Hugo Antonio-Fernández (@sasoryhaf91).
Feedback, issues and pull requests are welcome via the GitHub repository.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mexsrfdcrpy-0.1.0.tar.gz.
File metadata
- Download URL: mexsrfdcrpy-0.1.0.tar.gz
- Upload date:
- Size: 39.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
374023ecac5b45afb9ba267d790d666aaf59f151a9fb2791a4df39e974384c12
|
|
| MD5 |
1cee4449aec2acd336e18d6538889d7c
|
|
| BLAKE2b-256 |
d351f25a25318c18983b4dffe6b63f65bc41e822aaf62dbf2b284e5185746e9c
|
File details
Details for the file mexsrfdcrpy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mexsrfdcrpy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 33.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8fa934901e0711984146e842514a1fab4ff580863638833fc88caf8b2c48578f
|
|
| MD5 |
d438c4f08e9ffe41909a8b026958433b
|
|
| BLAKE2b-256 |
1d62e2817ad2831785f667a2f126f660745faaca7ae652816f8dff5a69cf337f
|