Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)
Large Soil Spectral Models (LSSM)
This Python package makes it possible to reproduce the research carried out by Franck Albinet as part of a PhD at KU Leuven titled “Multiscale Characterization of Exchangeable Potassium Content in Soil to Remediate Agricultural Land Affected by Radioactive Contamination using Machine Learning, Soil Spectroscopy and Remote Sensing”.
Our first paper (Albinet, F., Peng, Y., Eguchi, T., Smolders, E., Dercon, G., 2022. Prediction of exchangeable potassium in soil through mid-infrared spectroscopy and deep learning: From prediction to explainability. Artificial Intelligence in Agriculture 6, 230–241) investigated the possibility of predicting exchangeable potassium in soil using large mid-infrared soil spectral libraries and Deep Learning. Code available here.
We are now exploring the potential to characterize and predict exchangeable potassium using both near- and mid-infrared soil spectroscopy, with a focus on leveraging advanced Deep Learning models such as ResNet and Vision Transformers (ViT) through transfer learning.
Our Deep Learning pipeline is primarily based on the approach described by Jeremy Howard.
Install
pip install lssm
Getting started
We demonstrate a typical workflow below to showcase our method.
from pathlib import Path
from functools import partial
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from torch import optim, nn
import timm
from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
Loading training & validation data
- Load a model from the timm Python package, which provides state-of-the-art (SOTA) pre-trained Deep Learning models:
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
- Automatically download large spectral libraries developed by our colleagues at WCRC. We focus on exchangeable potassium in the example below:
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
Reading & selecting data ...
- Preprocess the features and the target:
X = Pipeline([('to_abs', ToAbsorbance()),
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)
y = Log1p().fit_transform(y)
100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]
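The two simpler transforms above can be illustrated with a minimal standard-library sketch. It assumes that ToAbsorbance applies the usual reflectance-to-absorbance conversion A = log10(1/R) and that Log1p applies log(1 + y) to the target; this README does not confirm either, and the helper names and sample values below are hypothetical, not part of lssm.

```python
import math

def to_absorbance(reflectance):
    """Convert reflectance values in (0, 1] to absorbance, A = log10(1 / R)."""
    return [math.log10(1.0 / r) for r in reflectance]

def log1p_target(y):
    """Compress a skewed, non-negative target with log(1 + y)."""
    return [math.log1p(v) for v in y]

def inverse_log1p(y_log):
    """Map log-space values back to the original unit with exp(y) - 1."""
    return [math.expm1(v) for v in y_log]

# Made-up values, for illustration only
reflectance = [0.5, 0.25, 0.1]
absorbance = to_absorbance(reflectance)

y = [0.0, 0.3, 12.5]
y_log = log1p_target(y)
y_back = inverse_log1p(y_log)
```

Under these assumptions, a model trained on the transformed target predicts in log space, so its outputs would need the inverse transform before being read in the analyte's original unit.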
- Standard train/test split to obtain the training and validation datasets:
n_smp = 5000 # For demo purposes (in reality we have > 50K)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp],
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp],
                                                      random_state=41)
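To make the effect of the stratify argument concrete, here is a minimal standard-library sketch of a stratified split. It is not scikit-learn's implementation, only an illustration of the idea that each source dataset keeps roughly the same share in both subsets; the function name and the dataset labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.1, seed=41):
    """Return (train_idx, valid_idx), sampling test_size of every label group."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, valid_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)                       # random pick within the group
        n_valid = round(len(idxs) * test_size)  # same fraction for each group
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)

# Hypothetical dataset names: 80 samples from one library, 20 from another
labels = ['lib_a'] * 80 + ['lib_b'] * 20
train_idx, valid_idx = stratified_split(labels)
valid_counts = Counter(labels[i] for i in valid_idx)
```

In this toy case the validation set keeps the 80/20 ratio of the full data, which is what stratifying on ds_name is meant to achieve for the spectral libraries.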
- Finally, create a custom PyTorch DataLoader:
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]
# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
Training
epochs = 1
lr = 5e-3
# We use `r2` to assess performance
metrics = MetricsCB(r2=R2Score())
# We use the One Cycle learning rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)
# A series of preprocessing steps performed on the GPU:
# - move the batch to the GPU
# - transform 1D spectra to 2D using Gramian Angular Difference Field (GADF)
# - resize the 2D version
# - apply the pre-trained model's stats
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))
cbs = [DeviceCB(), gadf, resize, stats, TrainCB(),
       metrics, ProgressCB(plot=False)]
learn = Learner(model, dls, nn.MSELoss(), lr=lr,
                cbs=cbs+xtra, opt_func=optim.AdamW)
learn.fit(epochs)
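The Gramian Angular Difference Field step used in the callbacks above turns each 1D spectrum into a 2D image that an image model such as ResNet can consume. The sketch below follows the standard GADF definition (rescale to [-1, 1], take the arccosine, then compute sin(phi_i - phi_j) for every pair of points). It assumes GADFTfm follows this definition, which this README does not state, and it processes a single made-up spectrum in pure Python rather than a batch on the GPU.

```python
import math

def gadf(series):
    """Gramian Angular Difference Field of a 1D, non-constant series."""
    lo, hi = min(series), max(series)
    # Rescale to [-1, 1] so that arccos is defined for every value
    scaled = [2 * (v - lo) / (hi - lo) - 1 for v in series]
    # Polar encoding; clip to guard against floating-point overshoot
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    # Pairwise angular differences give an n x n image
    return [[math.sin(a - b) for b in phi] for a in phi]

# Made-up 5-point "spectrum", for illustration only
spectrum = [0.12, 0.18, 0.35, 0.30, 0.22]
image = gadf(spectrum)
```

The result is an n x n antisymmetric matrix with a zero diagonal. A real spectrum has hundreds or thousands of points, which is presumably why a resize step follows in the pipeline.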
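The One Cycle policy passed to BatchSchedCB in the training code warms the learning rate up to max_lr and then anneals it to a very small value, updating it at every batch. The sketch below is a simplified standard-library illustration of that shape, not PyTorch's implementation. The default arguments mirror the documented defaults of lr_scheduler.OneCycleLR (pct_start=0.3, div_factor=25, final_div_factor=1e4, cosine annealing), the helper names are hypothetical, and total_steps=100 is an arbitrary value chosen for the example.

```python
import math

def cosine_interp(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given batch step under a two-phase one-cycle policy."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1   # last step of the warm-up phase
    if step <= warmup_end:
        return cosine_interp(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cosine_interp(max_lr, min_lr, pct)

total_steps = 100  # arbitrary; in the workflow it is epochs * len(dls.train)
schedule = [one_cycle_lr(s, total_steps, max_lr=5e-3) for s in range(total_steps)]
```

With these values the rate starts at 2e-4, peaks at 5e-3 at step 29, and ends at 2e-8 on the last step.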