
Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)


Large Soil Spectral Models (LSSM)

This Python package allows you to reproduce the research work carried out by Franck Albinet in the context of a PhD at KU Leuven titled “Multiscale Characterization of Exchangeable Potassium Content in Soil to Remediate Agricultural Land Affected by Radioactive Contamination using Machine Learning, Soil Spectroscopy and Remote Sensing”.

Our first paper, Albinet, F., Peng, Y., Eguchi, T., Smolders, E., Dercon, G., 2022. Prediction of exchangeable potassium in soil through mid-infrared spectroscopy and deep learning: From prediction to explainability. Artificial Intelligence in Agriculture 6, 230–241, investigated the possibility of predicting exchangeable potassium in soil using large mid-infrared soil spectral libraries and Deep Learning. Code available here.

We are now exploring the potential to characterize and predict exchangeable potassium using both near- and mid-infrared soil spectroscopy, with a focus on leveraging advanced Deep Learning models such as ResNet and Vision Transformers (ViT) through transfer learning.

Our Deep Learning pipeline is primarily based on the approach described by Jeremy Howard.

Install

pip install lssm

Getting started

Below, we walk through a typical workflow to showcase our method.

from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm

Loading training & validation data

  1. Load a model from the timm Python package, which provides state-of-the-art (SOTA) pre-trained Deep Learning models:
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
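
As a quick sanity check (a hypothetical snippet, not part of the original walkthrough), we can push a dummy batch through the network and peek at the normalization statistics that StatsTfm reuses further down:

import torch

# Dummy forward pass: one input channel in, a single regression output out
with torch.no_grad():
    print(model(torch.randn(2, 1, 224, 224)).shape)  # torch.Size([2, 1])

# Pre-training statistics, consumed later by StatsTfm
print(model.default_cfg['mean'], model.default_cfg['std'])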
  2. Automatically download the large spectral libraries developed by our colleagues at WCRC. We focus on exchangeable potassium in the example below:
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
Reading & selecting data ...
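
The loader returns the spectra, the target, and accompanying metadata. A quick, hypothetical inspection (assuming the returned objects are NumPy arrays):

print(X.shape, y.shape)  # e.g. (n_samples, n_bands) and (n_samples,)
print(X_names[:3])       # names of the first spectral bands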
  3. A bit of preprocessing of the data features (spectra) and the target:
X = Pipeline([('to_abs', ToAbsorbance()), 
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]
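
Note that after Log1p the target lives in log(1 + y) space; to report results in the original cmolc/kg units, predictions must be mapped back. A minimal sketch, assuming Log1p simply wraps np.log1p:

import numpy as np

y_back = np.expm1(y)  # inverse of log1p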
  4. A typical train/test split to obtain training and validation sets:
n_smp = 5000 # For demo purposes (in reality we have > 50K samples)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp], 
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp], 
                                                      random_state=41)
  5. Finally, create custom PyTorch Datasets, then DataLoaders:
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]

# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
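
Pulling a single batch is a cheap way to check that the dataloaders behave as expected (hypothetical check, assuming batches come as (inputs, targets) tuples):

xb, yb = next(iter(dls.train))
print(xb.shape, yb.shape)  # e.g. torch.Size([32, n_bands]) and torch.Size([32, 1])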

Training

epochs = 1
lr = 5e-3

# We use the `r2` score to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One-Cycle learning rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)
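
The schedule warms the learning rate up to max_lr and anneals it back down over exactly tmax optimizer steps. To visualize the curve before training, a throwaway optimizer does the trick (illustrative sketch only, not part of the pipeline):

import torch
import matplotlib.pyplot as plt

_opt = optim.AdamW([torch.zeros(1, requires_grad=True)], lr=lr)
_sched = lr_scheduler.OneCycleLR(_opt, max_lr=lr, total_steps=tmax)
lrs = []
for _ in range(tmax):
    lrs.append(_sched.get_last_lr()[0])
    _opt.step()
    _sched.step()
plt.plot(lrs); plt.xlabel('step'); plt.ylabel('learning rate')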

# A series of preprocessing steps performed on the GPU:
#    - move the batch to the GPU
#    - transform the 1D spectra into 2D images using the Gramian Angular Difference Field (GADF)
#    - resize the 2D version
#    - normalize with the pre-trained model's statistics
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))
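
For intuition: the GADF encodes a 1D spectrum as a 2D image by min-max scaling the series to [-1, 1], mapping it to polar angles phi = arccos(x), and computing GADF[i, j] = sin(phi_i - phi_j). A minimal NumPy sketch of that idea (not lssm's actual GADFTfm implementation):

import numpy as np

def gadf(x):
    # Min-max scale the spectrum to [-1, 1]
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # Polar encoding
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # Gramian Angular Difference Field: sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])

img = gadf(X_train[0])  # an (n_bands, n_bands) image for the CNN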

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(), 
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr, 
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
4.39% [55/1252 00:23<08:42 0.084]
