pinard

Pinard: a Pipeline for Nirs Analysis ReloadeD.

Pinard is a Python package that provides functionality dedicated to the preprocessing and processing of NIRS data, and enables the fast development of prediction models by extending scikit-learn pipelines.

Near-infrared spectroscopy (NIRS) measures the light reflected from a sample after irradiating it with wavelengths ranging from the visible to the shortwave infrared. The resulting spectrum is a signature of the sample's physical and chemical characteristics. Thanks to its low cost, NIRS has been widely used to determine chemical traits in various fields, such as the pharmaceutical, agricultural, and food sectors (Shepherd and Walsh, 2007; Wójcicki, 2015; Biancolillo and Marini, 2018; Pasquini, 2018). Although NIRS data are simple to acquire, they quickly generate a very large amount of information, which must be processed to allow quality predictions of the desired traits. Pinard addresses this with a set of Python tools for preprocessing and processing NIRS data, built as an extension of scikit-learn pipelines:

- Collection of spectra preprocessing methods: Baseline, StandardNormalVariate, RobustNormalVariate, SavitzkyGolay, Normalize, Detrend, MultiplicativeScatterCorrection, Derivate, Gaussian, Haar, Wavelet...
- Collection of splitting methods based on spectra similarity metrics: Kennard Stone, SPXY, random sampling, stratified sampling, k-means...
- An extension of sklearn pipelines to provide 2D tensors to Keras regressors.

Moreover, because Pinard extends scikit-learn, all scikit-learn features (splitters, regressors, etc.) are natively available.

Authors

Pinard is a Python package developed at CIRAD (www.cirad.fr) by Grégory Beurier (beurier@cirad.fr) in collaboration with Denis Cornet (denis.cornet@cirad.fr) and Lauriane Rouan (lauriane.rouan@cirad.fr).

INSTALLATION

pip install pinard

USAGE

Basic usage

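# Imports (the pinard module paths are assumed; check them against your installed version)
from pinard import utils
from pinard import preprocessing as pp
from pinard.model_selection import train_test_split_idx
from sklearn.compose import TransformedTargetRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

xcal_csv, ycal_csv = "Xcal.csv", "Ycal.csv"  # placeholder paths to the spectra (X) and target (y) CSV files
rd_seed = 42  # any integer seed

x, y = utils.load_csv(xcal_csv, ycal_csv, x_hdr=0, y_hdr=0, remove_na=True)  # Load data
train_index, test_index = train_test_split_idx(
    x, y=y, method="kennard_stone", metric="correlation", test_size=0.25, random_state=rd_seed
)  # Get splitting indices
X_train, y_train, X_test, y_test = x[train_index], y[train_index], x[test_index], y[test_index]

# Declare preprocessing pipeline: FeatureUnion applies each (name, transformer) branch
# to the same input and concatenates the outputs
preprocessing = [('id', pp.IdentityTransformer()),
                 ('savgol', pp.SavitzkyGolay()),
                 ('derivate', pp.Derivate()),
                 ('sg_sg', Pipeline([('_sg1', pp.SavitzkyGolay()), ('_sg2', pp.SavitzkyGolay())]))]  # nested pipeline for 2nd-order preprocessing

# Declare complete pipeline
pipeline = Pipeline([
    ('scaler', MinMaxScaler()),  # scaling
    ('preprocessing', FeatureUnion(preprocessing)),  # preprocessing
    ('PLS', PLSRegression())  # regressor
])

# Estimator including y values scaling
estimator = TransformedTargetRegressor(regressor=pipeline, transformer=MinMaxScaler())

# Training
estimator.fit(X_train, y_train)

# Predictions
Y_preds = estimator.predict(X_test)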

More complete examples can be found in the examples folder and can be run on Google Colab.

More examples to come soon.
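The `method="kennard_stone"` option in the basic usage example refers to the Kennard Stone splitting listed among Pinard's splitting methods. As a rough illustration of the idea, not Pinard's implementation, the hypothetical helper `kennard_stone_split` below picks the two most distant spectra first, then repeatedly adds the spectrum whose distance to its nearest already-selected spectrum is largest; the unselected samples form the test set. `metric` is forwarded to SciPy's `cdist`, so `"correlation"` is accepted as well.

```python
import numpy as np
from scipy.spatial.distance import cdist


def kennard_stone_split(X, test_size=0.25, metric="euclidean"):
    """Hypothetical helper: return (train_idx, test_idx) with the training set
    chosen by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    n_test = int(np.ceil(test_size * n))
    n_train = n - n_test
    if n_train < 2:
        raise ValueError("Kennard-Stone needs at least 2 training samples")

    dist = cdist(X, X, metric=metric)  # pairwise distances between spectra

    # Start with the two spectra that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    # Distance from each remaining spectrum to its nearest selected spectrum
    min_dist = dist[np.ix_(remaining, selected)].min(axis=1)

    while len(selected) < n_train:
        # Add the spectrum that is farthest from everything selected so far
        pos = int(np.argmax(min_dist))
        new = remaining.pop(pos)
        selected.append(new)
        min_dist = np.delete(min_dist, pos)
        # Refresh nearest-selected distances with the newly added spectrum
        min_dist = np.minimum(min_dist, dist[remaining, new])

    return np.array(selected), np.array(remaining)
```

The selection is deterministic and uses only the spectra; SPXY, also listed above, additionally takes `y` into account when measuring distances.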

ROADMAP

Sklearn compatibility:
- Extend sklearn pipelines to fully integrate data augmentation (managing x and y along the pipeline)
- Extend sklearn pipelines to integrate validation data (required for deep learning tuning)
- Add folds and iterable results to all splitting methods (cross-validation / KFold compatibility)
Ease of use:
- Extend model_selection helpers (metrics, methods, etc.)
- Provide dedicated serialization methods to avoid compatibility problems between different frameworks (e.g. Keras + sklearn)
Data augmentation:
- Auto-balance sample augmentation based on groups/classes/metrics, replacing the augmentation count with a ratio/weight
- Allow parameters for augmentation methods

Release history

- 1.1.0: Feb 7, 2024
- 1.0.1: Jun 27, 2023
- 1.0.0: Jun 27, 2023
- 0.9.7: Nov 28, 2022 (this version)
- 0.9.6: Nov 22, 2022
- 0.9.5: Nov 21, 2022
- 0.7.6: Jun 14, 2022
- 0.7.5: Jun 14, 2022
- 0.7.4: Jun 9, 2022
- 0.7.2: Jun 9, 2022
- 0.7.1: Jun 9, 2022
- 0.7: Jun 9, 2022

Download files

Source distribution: pinard-0.9.7.tar.gz (30.8 kB), uploaded Nov 28, 2022

Built distribution: pinard-0.9.7-py3-none-any.whl (37.5 kB), uploaded Nov 28, 2022, Python 3

Hashes for pinard-0.9.7.tar.gz
Algorithm	Hash digest
SHA256	`a29903c2ebbbb372cf04a25b9e5ef5be41aa4a3146df4b5f083ed0dba4fc21e2`
MD5	`3b3f4d0e3ebd2874a79028d813b7c727`
BLAKE2b-256	`bc2ce7d454c916f2d72e98462e94a2f85effd3b523ca5c28ad5c88722cc9b3bc`

Hashes for pinard-0.9.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d0e832f604accbd91922e288eac8d145ca6e628c68e4a93f7e3020cff0f43459`
MD5	`3ba7e0d115671337d7af6204ecf1fd48`
BLAKE2b-256	`bfd1889777bc84fe86205d2a345a6f0800f3911cffa4deaa78ff84be8b5e486e`