Skip to main content

Pinard: a Pipeline for Nirs Analysis ReloadeD.

Project description

alt text

Pinard is a python package that provides functionalities dedicated to the preprocessing and processing of NIRS data and allows the fast development of prediction models thanks to the extension of scikit-learn pipelines.

NIRS measures the light reflected from a sample after irradiating it with wavelengths ranging from visible to shortwave infrared. This provides a signature of the physical and chemical characteristics of the sample. Thanks to its low cost NIRS has been widely used for determining chemical traits in various fields - pharmaceutical, agricultural, and food sectors (Shepherd and Walsh, 2007; Wójcicki, 2015; Biancolillo and Marini, 2018; Pasquini, 2018) Although NIRS data are simple to acquire, they quickly generate a very large amount of information and this information must be processed to allow quality predictions for desired traits. Pinard provides a set of python functionalities dedicated to the preprocessing and processing of NIRS data and allows the fast development of prediction models thanks to the extension of scikit-learn pipelines:

  • Collection of spectra preprocessings: Baseline, StandardNormalVariate, RobustNormalVariate, SavitzkyGolay, Normalize, Detrend, MultiplicativeScatterCorrection, Derivate, Gaussian, Haar, Wavelet...,
  • Collection of splitting methods based of spectra similarity metrics: Kennard Stone, SPXY, random sampling, stratified sampling, k-mean...,
  • An extension of sklearn pipelines to provide 2D tensors to keras regressors.

Moreover, because Pinard extends scikit-learn, all scikit-learn features are natively available (split, regressor, etc.).

Authors

Pinard is a python package developed at CIRAD (www.cirad.fr) by Grégory Beurier (beurier@cirad.fr) in collaboration with Denis Cornet (denis.cornet@cirad.fr) and Lauriane Rouan (lauriane.rouan@cirad.fr)

INSTALLATION

pip install pinard

USAGE

Basic usage

x, y = utils.load_csv(xcal_csv, ycal_csv, x_hdr=0, y_hdr=0, remove_na=True) # Load data
train_index, test_index = train_test_split_idx(x, y=y, method="kennard_stone", metric="correlation" test_size=0.25, random_state=rd_seed) # Get splitting indices
X_train, y_train, X_test, y_test = x[train_index], y[train_index], x[test_index], y[test_index]

# Declare preprocessing pipeline
preprocessing = [   ('id', pp.IdentityTransformer()),
                    ('savgol', pp.SavitzkyGolay()),
                    ('derivate', pp.Derivate()), 
                    Pipeline([('_sg1',pp.SavitzkyGolay()),('_sg2',pp.SavitzkyGolay())]))] # reification for 2nd order preprocessing

# Declare complete pipeline
pipeline = Pipeline([
    ('scaler', MinMaxScaler()), # scaling
    ('preprocessing', FeatureUnion(preprocessing)), # preprocessing
    ('PLS',  sklearn.PLS()) # regressor
])

# Estimator including y values scaling
estimator = TransformedTargetRegressor(regressor = pipeline, transformer = MinMaxScaler())

# Training
estimator.fit(X_train, y_train)

# Predictions
Y_preds = estimator.predict(X_test)

More complete examples can be found in examples folders and executed on google collab:

more examples to come soon...

ROADMAP

  • Sklearn compatibility:
    • Extend sklearn pipeline to fully integrate data augmentation (x,y along the pipeline management)
    • Extend sklearn pipeline to integrate validation data (required for Deep Learning tuning)
    • Add folds and iterable results to all splitting methods (cross validation / KFold compatibility)
  • Ease of use:
    • Extend model_selection helpers (metrics, methods, etc.)
    • Provide dedicated serialization methods to avoid compatibility problems between different frameworks (i.e. Keras + sklearn)
  • Data augmentation:
    • Auto-balance sample augmentation based on groups/classes/metric - augmentation count replaced by ratio/weight
    • Allow augmentation methods parameters

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pinard-0.9.7.tar.gz (30.8 kB view hashes)

Uploaded Source

Built Distribution

pinard-0.9.7-py3-none-any.whl (37.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page