Function to evaluate impact of NIR pre-processing techniques on spectral data

The function compare_preprocessing can be used on any NIR spectral data set for which Y values are available.
Y can include one or several variables.

The function evaluates the impact of different pre-processing techniques, and of their combinations, using multiblock partial least squares (MBPLS). Each block in the MBPLS model is one pre-processed version of the spectral data.

Different pre-processing techniques evaluated:

  • baseline
  • de-trend
  • EMSC
  • MSC
  • SNV
  • Savitzky-Golay derivatives (different polynomial orders, derivative orders and moving-window sizes can be tested)
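
To make three of the listed techniques concrete, here is a minimal NumPy sketch of SNV, MSC and de-trending. It is an illustration under stated assumptions, not the code used inside compare_preprocessing: the function names snv, msc and detrend are hypothetical, spectra are assumed to be stored as rows, and MSC uses the mean spectrum as its reference.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Assumption: the mean spectrum is the reference when none is given.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        # Fit spectrum = intercept + slope * ref, then invert that relation.
        slope, intercept = np.polyfit(ref, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def detrend(X, order=2):
    """Subtract from each spectrum a polynomial fitted along the wavelength axis."""
    X = np.asarray(X, dtype=float)
    axis = np.linspace(-1.0, 1.0, X.shape[1])
    V = np.vander(axis, order + 1)                      # (k, order + 1)
    coeffs, *_ = np.linalg.lstsq(V, X.T, rcond=None)    # (order + 1, n)
    return X - (V @ coeffs).T
```

Each function takes an (n x k) array and returns an array of the same shape, so the outputs can be chained to form combinations of techniques.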

Blocks in MBPLS include:

  • pre-processing techniques and combinations (several techniques applied to the same data)
  • original spectral data (starting point)
  • 20 blocks of random noise called false signals (reference for destroyed information)
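
The block layout described above can be sketched as follows. This is an assumption-laden illustration: build_blocks is a hypothetical helper, and the false signals are drawn here as standard Gaussian noise with the same shape as the spectra, which may differ from how the package generates them.

```python
import numpy as np


def build_blocks(X0, preprocessed, n_false=20, seed=0):
    """Return a list of (name, block) pairs for a multiblock model.

    X0           : (n x k) original spectra
    preprocessed : dict mapping a label to an (n x k) pre-processed array
    n_false      : number of random-noise reference blocks
    """
    X0 = np.asarray(X0, dtype=float)
    rng = np.random.default_rng(seed)
    blocks = [("original", X0)]
    blocks += [(name, np.asarray(Xp, dtype=float)) for name, Xp in preprocessed.items()]
    for i in range(n_false):
        # Assumption: Gaussian noise of the same shape stands in for a false signal.
        blocks.append((f"false_signal_{i + 1}", rng.standard_normal(X0.shape)))
    return blocks
```

A pre-processing option whose block scores no better than the false-signal blocks can be read as having destroyed the useful information.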

The analyst can choose to compare only scatter-correction techniques, only derivatives, or both. The maximum number of pre-processing techniques applied to the same data can also be set. By default, single techniques and combinations of two techniques are tested. NB: EMSC and MSC cannot be applied together.
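
As an illustration of how such a list of options could be enumerated, the sketch below builds all single techniques and pairs while skipping any combination that contains both EMSC and MSC. The helper list_combinations is hypothetical, and it treats combinations as unordered; whether the package also tests different orders of application is not stated here.

```python
from itertools import combinations

SCATTER = ["baseline", "detrend", "EMSC", "MSC", "SNV"]


def list_combinations(techniques, nb_comb=2):
    """List every set of 1..nb_comb techniques, excluding EMSC together with MSC."""
    combos = []
    for size in range(1, nb_comb + 1):
        for combo in combinations(techniques, size):
            if "EMSC" in combo and "MSC" in combo:
                continue  # these two corrections cannot be applied together
            combos.append(combo)
    return combos


options = list_combinations(SCATTER, nb_comb=2)
```

With the five scatter corrections above and nb_comb=2, this yields 5 single techniques and 9 pairs.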

For MBPLS, the analyst can choose:

  • the number of principal components
  • whether to autoscale or center each block
  • whether to autoscale or center Y
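
The difference between the two scaling options can be shown in a few lines of NumPy. This is a generic sketch, with scale_block as a hypothetical name: centering subtracts the column mean, and autoscaling additionally divides by the column standard deviation.

```python
import numpy as np


def scale_block(X, autoscale=True):
    """Center the columns of X; if autoscale is True, also scale them to unit variance."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if not autoscale:
        return centered
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return centered / std
```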

Blocks are represented in superloading plots.
Model performance (adjusted R2, RMSECV) and variable importance in projection (VIP) are calculated for each block by cross-validation. The number of random picks for cross-validation and the number of samples predicted in each cross-validation round can be set by the analyst. The effective rank of each block is calculated as well.
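
The two performance figures can be sketched as below. This is not the package's implementation: the adjusted R2 uses the common formula 1 - (1 - R2)(n - 1)/(n - p - 1), where taking p as the number of components is an assumption, and an ordinary least-squares model stands in for PLS so that the example stays self-contained. All function names are hypothetical; nb and CVnb mirror the optional arguments of the same names.

```python
import numpy as np


def adjusted_r2(r2, n_samples, n_components):
    """Common adjusted R2 formula; using the component count as p is an assumption."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_components - 1)


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in model (ordinary least squares with intercept), not PLS."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def random_pick_rmsecv(X, y, fit_predict, nb=10, CVnb=5, seed=0):
    """RMSECV over nb random picks, each holding out CVnb samples.

    X is (n x k), y is (n x m); returns one RMSECV value per y variable.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    squared_errors = []
    for _ in range(nb):
        test = rng.choice(n, size=CVnb, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        y_hat = fit_predict(X[train], y[train], X[test])
        squared_errors.append((y[test] - y_hat) ** 2)
    return np.sqrt(np.mean(np.concatenate(squared_errors), axis=0))
```

Any model exposing the same fit-then-predict signature could be passed in place of least_squares_fit_predict.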

CALL FUNCTION

combination, datasets, datasets0, R2_all, R2adj_all, RMSECV_all, VIP_all, Ef_all, Wt = compare_preprocessing(X0, y)

INPUT ARGUMENTS

  1. X0 (n x k): data to test pre-processing techniques on
  2. y (n x m): property under study
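
Before calling the function, it can help to check that the inputs have the shapes listed above. The helper below is a hypothetical convenience, not part of the package; reshaping a one-dimensional y into an (n x 1) column is a choice made here, and whether compare_preprocessing itself accepts a one-dimensional y is not stated.

```python
import numpy as np


def prepare_inputs(X0, y):
    """Return X0 as an (n x k) array and y as an (n x m) array with matching n."""
    X0 = np.asarray(X0, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y.reshape(-1, 1)  # a single property becomes one column
    if X0.ndim != 2 or y.ndim != 2:
        raise ValueError("X0 must be (n x k) and y must be (n x m)")
    if X0.shape[0] != y.shape[0]:
        raise ValueError("X0 and y must have the same number of samples")
    return X0, y
```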

OPTIONAL INPUT ARGUMENTS

nbPC: number of principal components for PLS in the MBPLS decomposition (default value=2)
nb_comb: maximum number of pre-processing techniques applied to the same data (default value=2)
auto_x: if auto_x=1, data are autoscaled after pre-processing; otherwise they are centered (default value=1)
auto_y: if auto_y=1, the variables to predict are autoscaled; otherwise they are centered (default value=1)
nb: number of random picks for cross-validation
CVnb: number of samples predicted in each cross-validation round
only_sg: if only_sg=1, only Savitzky-Golay is tested (default value=0)
svg_order: Savitzky-Golay polynomial orders to test
svg_deriv: Savitzky-Golay derivative orders to test
svg_window: Savitzky-Golay window sizes to test
sg_op: if sg_op=0, only Savitzky-Golay pretreatments with the same polynomial order and derivative order are tested (default value=0)
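
When choosing values for svg_order, svg_deriv and svg_window, it may be useful to see which settings are valid Savitzky-Golay filters at all. The sketch below enumerates the grid and keeps only triples that satisfy the usual constraints: derivative order not above polynomial order, and an odd window longer than the polynomial order. The helper sg_settings and the example values are hypothetical, and the package may apply different or additional rules (for instance through sg_op).

```python
from itertools import product


def sg_settings(svg_order, svg_deriv, svg_window):
    """Return the (order, deriv, window) triples that form a valid Savitzky-Golay filter."""
    settings = []
    for order, deriv, window in product(svg_order, svg_deriv, svg_window):
        if deriv > order:
            continue  # a polynomial of this order has no such derivative
        if window % 2 == 0 or window <= order:
            continue  # conventional choice: odd window, longer than the polynomial order
        settings.append((order, deriv, window))
    return settings


# Illustrative values only; they are not the package defaults.
grid = sg_settings(svg_order=[2, 3], svg_deriv=[1, 2], svg_window=[5, 11])
```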

OUTPUT

  1. combination: Pre-processing options tested
  2. datasets: data X0 after each pre-processing option and autoscaled or centered
  3. datasets0: data X0 after each pre-processing option
  4. R2_all: R2 values for each y variable predicted for each pre-processing technique tested (block)
  5. R2adj_all: Adjusted R2 values for each y variable predicted for each pre-processing technique tested (block)
  6. RMSECV_all: Root mean square error by cross validation for each y variable predicted for each pre-processing technique tested (block)
  7. VIP_all: PLS variable importance in projection for each y variable predicted for each pre-processing technique tested (block)
  8. Ef_all: Effective rank for each block
  9. Wt: Superloadings from MBPLS

EXAMPLES

Two full examples, along with their datasets, are provided in the 'tests' folder under 'Download files'. Please refer to 'NIR_preprocess_example.pdf' for full details.

  • Example 1: Artificial dataset
  • Example 2: Corn dataset

COMPATIBILITY

compare_preprocessing was tested on Python 3.8 with the following modules:
- numpy 1.19.2
- matplotlib 3.3.2
- copy
- itertools
- RG 0.0.66 (available from PyPI at: https://pypi.org/project/RG/)

Download files

Source Distribution

NIR_preprocess-0.0.2.tar.gz (2.1 MB)

Uploaded Source

Built Distribution

NIR_preprocess-0.0.2-py3-none-any.whl (8.3 kB)

Uploaded Python 3
