
Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)


Large Soil Spectral Models (LSSM)

This Python package allows you to reproduce the research work carried out by Franck Albinet in the context of a PhD at KU Leuven titled “Multiscale Characterization of Exchangeable Potassium Content in Soil to Remediate Agricultural Land Affected by Radioactive Contamination using Machine Learning, Soil Spectroscopy and Remote Sensing”.

Our first paper, Albinet, F., Peng, Y., Eguchi, T., Smolders, E., Dercon, G., 2022. Prediction of exchangeable potassium in soil through mid-infrared spectroscopy and deep learning: From prediction to explainability. Artificial Intelligence in Agriculture 6, 230–241, investigated the possibility of predicting exchangeable potassium in soil using large mid-infrared soil spectral libraries and Deep Learning. Code available here.

We are now exploring the potential to characterize and predict exchangeable potassium using both near- and mid-infrared soil spectroscopy, with a focus on leveraging advanced Deep Learning models such as ResNet and Vision Transformers (ViT) through transfer learning.

Our Deep Learning pipeline is primarily based on the approach described by Jeremy Howard.

Install

pip install lssm

Getting started

Below, we walk through a typical workflow to showcase our method.

from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm

Loading training & validation data

  1. Load a model from the timm Python package, which provides state-of-the-art (SOTA) pre-trained Deep Learning models:
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
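
As a quick sanity check (a hypothetical snippet, not part of the original walkthrough), we can push a dummy batch through the network and peek at the normalization statistics that StatsTfm reuses further down:

import torch

# Dummy forward pass: one input channel in, a single regression output out
with torch.no_grad():
    print(model(torch.randn(2, 1, 224, 224)).shape)  # torch.Size([2, 1])

# Pre-training statistics, consumed later by StatsTfm
print(model.default_cfg['mean'], model.default_cfg['std'])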
  2. Automatically download the large spectral libraries developed by our colleagues at WCRC. We focus on exchangeable potassium in the example below:
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
Reading & selecting data ...
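
The loader returns the spectra, the target, and accompanying metadata. A quick, hypothetical inspection (assuming the returned objects are NumPy arrays):

print(X.shape, y.shape)  # e.g. (n_samples, n_bands) and (n_samples,)
print(X_names[:3])       # names of the first spectral bands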
  3. A bit of preprocessing of the data features (spectra) and the target:
X = Pipeline([('to_abs', ToAbsorbance()), 
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]
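
Note that after Log1p the target lives in log(1 + y) space; to report results in the original cmolc/kg units, predictions must be mapped back. A minimal sketch, assuming Log1p simply wraps np.log1p:

import numpy as np

y_back = np.expm1(y)  # inverse of log1p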
  4. A typical train/test split to obtain training and validation sets:
n_smp = 5000 # For demo purposes (in reality we have > 50K samples)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp], 
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp], 
                                                      random_state=41)
  5. Finally, create custom PyTorch Datasets, then DataLoaders:
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]

# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
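
Pulling a single batch is a cheap way to check that the dataloaders behave as expected (hypothetical check, assuming batches come as (inputs, targets) tuples):

xb, yb = next(iter(dls.train))
print(xb.shape, yb.shape)  # e.g. torch.Size([32, n_bands]) and torch.Size([32, 1])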

Training

epochs = 1
lr = 5e-3

# We use the `r2` score to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One-Cycle learning rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)
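
The schedule warms the learning rate up to max_lr and anneals it back down over exactly tmax optimizer steps. To visualize the curve before training, a throwaway optimizer does the trick (illustrative sketch only, not part of the pipeline):

import torch
import matplotlib.pyplot as plt

_opt = optim.AdamW([torch.zeros(1, requires_grad=True)], lr=lr)
_sched = lr_scheduler.OneCycleLR(_opt, max_lr=lr, total_steps=tmax)
lrs = []
for _ in range(tmax):
    lrs.append(_sched.get_last_lr()[0])
    _opt.step()
    _sched.step()
plt.plot(lrs); plt.xlabel('step'); plt.ylabel('learning rate')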

# A series of preprocessing steps performed on the GPU:
#    - move the batch to the GPU
#    - transform the 1D spectra into 2D images using the Gramian Angular Difference Field (GADF)
#    - resize the 2D version
#    - normalize with the pre-trained model's statistics
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))
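
For intuition: the GADF encodes a 1D spectrum as a 2D image by min-max scaling the series to [-1, 1], mapping it to polar angles phi = arccos(x), and computing GADF[i, j] = sin(phi_i - phi_j). A minimal NumPy sketch of that idea (not lssm's actual GADFTfm implementation):

import numpy as np

def gadf(x):
    # Min-max scale the spectrum to [-1, 1]
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # Polar encoding
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # Gramian Angular Difference Field: sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])

img = gadf(X_train[0])  # an (n_bands, n_bands) image for the CNN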

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(), 
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr, 
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
4.39% [55/1252 00:23<08:42 0.084]
