A Python implementation of NN-GLS
GeospaNN
Package based on the paper: Neural networks for geospatial data
This is the package repository for the method proposed in the paper. To install locally, use the following command:
pip install git+https://github.com/WentaoZhan1998/NN-GLS.git#egg=geospaNN
A simple pipeline for a simulation experiment:
First, import the modules and set up the parameters.
import torch
import geospaNN
import numpy as np
# Define Friedman's function and specify the dimension of the input covariates.
def f5(X): return (10*np.sin(np.pi*X[:,0]*X[:,1]) + 20*(X[:,2]-0.5)**2 + 10*X[:,3] +5*X[:,4])/6
p = 5; funXY = f5
# Set the parameters for the spatial process.
sigma = 1
phi = 3/np.sqrt(2)
tau = 0.01
theta = torch.tensor([sigma, phi, tau])
n = 1000 # Size of the simulated sample.
nn = 20 # Neighbor size used for NNGP.
batch_size = 50 # Batch size for training the neural networks.
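For intuition, the spatial process governed by theta = (sigma, phi, tau) can be sketched in plain NumPy under an exponential covariance family, which the paper uses. This is a toy sketch only; the exact parameterization inside geospaNN.Simulation (e.g. whether phi enters as a range or a decay rate) may differ, so treat the kernel below as an assumption.

```python
import numpy as np

rng = np.random.default_rng(2024)
n, sigma, phi, tau = 200, 1.0, 3 / np.sqrt(2), 0.01

# Random locations on the [0, 10]^2 square domain.
coord = rng.uniform(0, 10, size=(n, 2))

# Assumed exponential covariance: C(d) = sigma * exp(-d / phi),
# with a nugget tau added on the diagonal.
d = np.linalg.norm(coord[:, None, :] - coord[None, :, :], axis=-1)
cov = sigma * np.exp(-d / phi) + tau * np.eye(n)

# Draw one realization of the correlated spatial error process.
corerr = np.linalg.cholesky(cov + 1e-8 * np.eye(n)) @ rng.standard_normal(n)
```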
Next, simulate and split the data.
torch.manual_seed(2024)
# Simulate spatially correlated data with coordinates sampled uniformly on the [0, 10]^2 square domain.
X, Y, coord, cov, corerr = geospaNN.Simulation(n, p, nn, funXY, theta, range=[0, 10])
# Build the nearest neighbor graph, 'data' is in the torch_geometric.data.Data class.
data = geospaNN.make_graph(X, Y, coord, nn)
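Conceptually, make_graph finds each location's nearest neighbors over the coordinates; a brute-force version of that neighbor search can be sketched as below. The real function additionally packs the result into a torch_geometric.data.Data object with edge indices, which is omitted here.

```python
import numpy as np

def knn_neighbors(coord, k):
    """For each point, return the indices of its k nearest other points (brute force)."""
    d = np.linalg.norm(coord[:, None, :] - coord[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbor set
    return np.argsort(d, axis=1)[:, :k]  # n x k matrix of neighbor indices

rng = np.random.default_rng(0)
coord = rng.uniform(0, 10, size=(100, 2))
nbrs = knn_neighbors(coord, 20)
```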
# Split data into training, validation, testing sets.
data_train, data_val, data_test = geospaNN.split_data(X, Y, coord, neighbor_size=nn,
                                                      test_proportion=0.2)
Compose the MLP structure and train the model.
# Define the mlp structure (torch.nn) to use.
mlp = torch.nn.Sequential(
torch.nn.Linear(p, 50),
torch.nn.ReLU(),
torch.nn.Linear(50, 20),
torch.nn.ReLU(),
torch.nn.Linear(20, 10),
torch.nn.ReLU(),
torch.nn.Linear(10, 1),
)
# Define the corresponding NN-GLS model.
model = geospaNN.nngls(p=p, neighbor_size=nn, coord_dimensions=2, mlp=mlp, theta=torch.tensor([1.5, 5, 0.1]))
# Define the NN-GLS training class with learning rate and tolerance.
nngls_model = geospaNN.nngls_train(model, lr = 0.01, min_delta = 0.001)
# Train the model.
training_log = nngls_model.train(data_train, data_val, data_test,
Update_init = 10, Update_step = 10)
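The core idea of NN-GLS is that training minimizes a generalized least squares loss, (Y - f(X))' Sigma^{-1} (Y - f(X)), instead of the ordinary squared error; the NNGP approximation makes this factorization sparse and scalable. A dense toy version of the decorrelated loss (no NNGP sparsification, random stand-ins for Y and f(X)) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
coord = rng.uniform(0, 10, size=(n, 2))
d = np.linalg.norm(coord[:, None, :] - coord[None, :, :], axis=-1)
Sigma = 1.0 * np.exp(-d / 2.0) + 0.01 * np.eye(n)  # toy exponential covariance

y = rng.standard_normal(n)           # stand-in for the observed response Y
f_hat = rng.standard_normal(n)       # stand-in for the neural-network mean f(X)

# Decorrelate the residuals with the inverse Cholesky factor, then take an OLS-style loss.
L = np.linalg.cholesky(Sigma)
r = np.linalg.solve(L, y - f_hat)    # r = L^{-1}(Y - f(X))
gls_loss = float(r @ r)              # equals (Y - f)' Sigma^{-1} (Y - f)
```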
Estimation from the model.
train_estimate = model.estimate(data_train.x)
Kriging prediction from the model.
test_predict = model.predict(data_train, data_test)
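The prediction step performs kriging: the value at a new location is the estimated mean plus the best linear predictor of the spatial residual, Y0_hat = f(X0) + c0' Sigma^{-1} (Y - f(X)). A dense toy version is sketched below; the covariance function and all variable names here are illustrative stand-ins, and the package itself uses nearest-neighbor approximations rather than a full solve.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
coord = rng.uniform(0, 10, size=(n, 2))
coord0 = np.array([[5.0, 5.0]])       # one new location to predict at

def expcov(a, b, sigma=1.0, phi=2.0):
    # Toy exponential covariance between two sets of locations.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma * np.exp(-d / phi)

Sigma = expcov(coord, coord) + 0.01 * np.eye(n)
c0 = expcov(coord0, coord)[0]          # cross-covariance with training locations

y = rng.standard_normal(n)             # stand-in for observed Y
f_train = np.zeros(n)                  # stand-in for the fitted mean f(X)
f0 = 0.0                               # stand-in for f(X0)

# Kriging predictor: mean at the new point plus covariance-weighted residuals.
y0_hat = f0 + c0 @ np.linalg.solve(Sigma, y - f_train)
```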
Running examples:
- A simulation experiment with a common spatial setting is shown here.
- A real data experiment is shown here.
- The PM2.5 data is collected from the U.S. Environmental Protection Agency: datasets for each state are downloaded and bound together to obtain 'pm25_2022.csv'. The daily PM2.5 files are subsets of 'pm25_2022.csv' produced by 'realdata_preprocess.py'. One may skip the preprocessing and use the daily files directly.
- The meteorological data is collected from the National Centers for Environmental Prediction’s (NCEP) North American Regional Reanalysis (NARR) product. The '.nc' (netCDF) files should be downloaded from the website and saved in the root directory to run 'realdata_preprocess.py'. Otherwise, one may skip the preprocessing and use covariate files directly.
Download files
Download the file for your platform.
Source Distribution: geospann-0.1.1.tar.gz (2.9 MB)
Built Distribution: geospann-0.1.1-py3-none-any.whl (11.8 kB)