Skip to main content

Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)

Project description

Large Soil Spectral Models (LSSM)

This is a Python package allowing to reproduce the research work done by Franck Albinet in the context of a PhD @ KU Leuven titled “Multiscale Characterization of Exchangeable Potassium Content in Soil to Remediate Agricultural Land Affected by Radioactive Contamination using Machine Learning, Soil Spectroscopy and Remote Sensing”.

Our first paper Albinet, F., Peng, Y., Eguchi, T., Smolders, E., Dercon, G., 2022. Prediction of exchangeable potassium in soil through mid-infrared spectroscopy and deep learning: From prediction to explainability. Artificial Intelligence in Agriculture 6, 230–241. investigated the possibility to predict exchangeable potassium in soil using large Mid-infrared soil spectral libraries and Deep Learning. Code available here.

We are now exploring the potential to characterize and predict exchangeable potassium using both Near- and Mid-infrared soil spectroscopy, with a focus on leveraging advanced Deep Learning models such as ResNet and ViT transformers through transfer learning.

Our Deep Learning pipeline is primarily based on the approach described by Jeremy Howard.

Install

pip install lssm

Getting started

We demonstrate a typical workflow below to showcase our method.

from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm

Loading training & validation data

  1. Load model from timm python package, Deep Learning State-Of-The-Art (SOTA) pre-trained models:
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
  1. Automatically download large spectral libraries developed by our colleagues at WCRC. We focus on exchangeable potassium in the example below:
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
Reading & selecting data ...
  1. A bit of data features and target preprocessing:
X = Pipeline([('to_abs', ToAbsorbance()), 
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44489/44489 [00:15<00:00, 2850.84it/s]
  1. Typical train/test split to get a train and valid dataset:
n_smp = 5000 # For demo. purpose (in reality we have > 50K)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp], 
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp], 
                                                      random_state=41)
  1. Finally, creating a custom PyTorch DataLoader:
train_ds, valid_ds = [SpectralDataset(X, y, ) 
                      for X, y, in [(X_train, y_train), (X_valid, y_valid)]]

# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)

Training

epochs = 1
lr = 5e-3

# We use `r2` along to assess performance
metrics = MetricsCB(r2=R2Score())

# We use Once Cycle Learning Rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)

# A series of preprocessing performed on GPUs
#    - put to GPU
#    - transform to 1D to 2D spectra using Gramian Angular Difference Field (GADF)
#    - resize the 2D version
#    - apply pre-trained model stats
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(), 
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr, 
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
<style> /* Turns off some styling */ progress { /* gets rid of default border in Firefox and Opera. */ border: none; /* Needs to be in here for Safari polyfill so background images work as expected. */ background-size: auto; } progress:not([value]), progress:not([value])::-webkit-progress-bar { background: repeating-linear-gradient(45deg, #7e7e7e, #7e7e7e 10px, #5c5c5c 10px, #5c5c5c 20px); } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style>
<div>
  <progress value='0' class='' max='1' style='width:300px; height:20px; vertical-align: middle;'></progress>
  0.00% [0/1 00:00&lt;?]
</div>
&#10;

4.39% [55/1252 00:23<08:42 0.084]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lssm-0.1.2.tar.gz (26.2 kB view details)

Uploaded Source

Built Distribution

lssm-0.1.2-py3-none-any.whl (25.6 kB view details)

Uploaded Python 3

File details

Details for the file lssm-0.1.2.tar.gz.

File metadata

  • Download URL: lssm-0.1.2.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for lssm-0.1.2.tar.gz
Algorithm Hash digest
SHA256 9c574b047e4731915ee1b22be1ff933459a1d45862147ed8bf6ff886b0288574
MD5 e4430452df1e03ac8c6b485235c50e3e
BLAKE2b-256 4791483781e06dc1906b7eb2e5fc04da0f53a9eda5a881bf10eba1ad7d57fe70

See more details on using hashes here.

File details

Details for the file lssm-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: lssm-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 25.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for lssm-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 26a863d21874280b562962cb8944fad9860d88a45756ffef0029c7ff2fe0d945
MD5 8242f793d9b61d6f07437dd5bc48fd48
BLAKE2b-256 d503736579731d474507996605947d5d5f2f014b1ad6a85e61e54edd0c84cf52

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page