Many-to-many transfer of LIBS spectra across multiple experimental conditions

Spectra Transfer ACVAE

This repository is actively being refactored to make it as easy to use as possible. Here is everything you need to know to prepare your data, train the baseline predictors, and train the ACVAE model from scratch.

Quick Start: The Full Pipeline

If your data directories are already set up, you can run the entire data extraction, CNN baseline training, and ACVAE evaluation pipeline in just a few lines of code.

from libs_transfer.training import train_concentration_predictors
from libs_transfer.prepare_data import data_to_h5
from libs_transfer.training import train_acvae_pipeline
import warnings
import torch

warnings.filterwarnings("ignore")
torch.set_float32_matmul_precision('high')
torch.backends.cudnn.benchmark = True
torch.manual_seed(123456)

# 1. Parse raw text files into an HDF5 dataset
data_to_h5('./examples/example_raw_data', './examples/processed_data')

# 2. Train CNN Concentration Predictors for all conditions
train_concentration_predictors(data_folder='./examples/processed_data')

# 3. Train and Evaluate the ACVAE
acvae = train_acvae_pipeline(test_split=0.5)

Preparing the Data

Your raw data should be in standard tab-separated text files (e.g., exported from a LIBS Discovery system). The first column holds the wavelength ([nm]), and each subsequent column holds the intensity ([a.u.]) of a separate measurement.

Example Al2O3-AM.txt format:

  [nm]    [a.u]        [a.u]
200.000 47.646484   -17.080078 ...
200.020 98.522461   -17.432617 ...
...
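To make the column layout concrete, here is a minimal standard-library sketch of reading such a file. It is only an illustration of the format, not the parser that data_to_h5 uses; the function name parse_spectra_text is hypothetical, and the sketch assumes the header is the only line whose first field starts with "[".

```python
import io


def parse_spectra_text(handle):
    """Parse a tab/whitespace-separated spectra export (hypothetical helper).

    Returns (wavelengths, measurements), where measurements[i] holds the
    intensities of the i-th measurement column.
    """
    wavelengths = []
    columns = None
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("["):
            continue  # skip blank lines and the "[nm] [a.u] ..." header row
        values = [float(field) for field in fields]
        if columns is None:
            # One list per measurement column (everything after the wavelength)
            columns = [[] for _ in values[1:]]
        wavelengths.append(values[0])
        for column, value in zip(columns, values[1:]):
            column.append(value)
    return wavelengths, columns or []


# Two rows in the same layout as the example above
sample = io.StringIO(
    "[nm]\t[a.u]\t[a.u]\n"
    "200.000\t47.646484\t-17.080078\n"
    "200.020\t98.522461\t-17.432617\n"
)
wavelengths, measurements = parse_spectra_text(sample)
```

Here wavelengths holds one value per row, while measurements holds one list per measurement column, so each entry of measurements is a full spectrum.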

Organize your raw .txt files into a nested directory structure so the script can automatically label your data by Atmosphere, Energy, and Sample Name:

raw_data_folder/
├── VACUUM/                 <-- Atmosphere
│   └── 100/                <-- Energy
│       ├── Al2O3.txt       <-- Sample Name
│       └── SiO2.txt
├── EARTH/
│   └── 50/
│       ├── Al2O3.txt
...
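The labelling implied by this layout can be sketched with pathlib: the two parent folder names give the atmosphere and energy, and the file stem gives the sample name. This is an assumption about how the labels are derived, not the package's actual code; discover_spectra is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def discover_spectra(root):
    """Return (atmosphere, energy, sample_name, path) for every file at
    root/<atmosphere>/<energy>/<sample>.txt (hypothetical helper)."""
    found = []
    for path in sorted(Path(root).glob("*/*/*.txt")):
        atmosphere = path.parent.parent.name
        energy = path.parent.name
        found.append((atmosphere, energy, path.stem, path))
    return found


# Build a small throwaway tree that mirrors the layout above
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("VACUUM/100/Al2O3.txt", "VACUUM/100/SiO2.txt", "EARTH/50/Al2O3.txt"):
        target = Path(tmp, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    labels = [(atm, energy, sample) for atm, energy, sample, _ in discover_spectra(tmp)]
```

Note that the energy is read as a string here because it comes from a folder name; convert it with int() if a numeric value is needed.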

Use the data_to_h5 function to parse this folder and automatically generate the .h5 dataset and corresponding .json dictionaries:

from libs_transfer.prepare_data import data_to_h5

# This generates spectra.h5, atm_dict.json, perc_dict.json, and label_dict.json
data_to_h5(in_path='./raw_data_folder', out_folder='./processed_data')

Preparing the Concentration Predictors (CNN)

To evaluate how well the ACVAE transfers spectra between conditions, the pipeline requires baseline CNN models to predict chemical concentrations.

Ensure you have a Concentrations.xlsx file saved in your processed data folder. The first row should contain the element names (e.g., SIO2, AL2O3), and the first column should contain the sample names matching your .txt files.

Run the CNN training script. The script will automatically discover every Atmosphere + Energy combination in your .h5 file and train a specific CNN for each one.

from libs_transfer.training.CNN_conc_baseline import train_concentration_predictors

# Trains and saves a CNN and standard scaler for EVERY condition in spectra.h5
train_concentration_predictors(
    epochs=40, 
    batch_size=128, 
    data_folder='./processed_data'
)

After running this, your data folder will be populated with files like CNN_EARTH_50_state_dict.pth and conc_std_EARTH_50.joblib.
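Before moving on to the ACVAE, it can help to check that these files exist for every condition. The sketch below infers the naming pattern from the two example file names above; the pattern and the helper names expected_artifacts and missing_artifacts are assumptions, not part of the package.

```python
from pathlib import Path


def expected_artifacts(data_folder, atmosphere, energy):
    """Build the file paths for one condition, following the pattern of the
    example names in the text (the pattern itself is an inference)."""
    folder = Path(data_folder)
    tag = f"{atmosphere}_{energy}"
    return {
        "cnn_weights": folder / f"CNN_{tag}_state_dict.pth",
        "concentration_scaler": folder / f"conc_std_{tag}.joblib",
    }


def missing_artifacts(data_folder, conditions):
    """List every expected file that is not present on disk."""
    missing = []
    for atmosphere, energy in conditions:
        for path in expected_artifacts(data_folder, atmosphere, energy).values():
            if not path.exists():
                missing.append(path)
    return missing


artifacts = expected_artifacts("./processed_data", "EARTH", 50)
```

Calling missing_artifacts('./processed_data', [("EARTH", 50), ("VACUUM", 100)]) returns an empty list once both conditions have been trained.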

Training the ACVAE Model

Once your .h5 data and baseline CNNs are prepared, training the ACVAE is entirely automated.

The refactored train_acvae_pipeline will automatically:

1. Load your .h5 file and metadata dictionaries.
2. Detect all available conditions and load the corresponding CNN predictors.
3. Dynamically generate permutations for every possible transfer direction (e.g., VACUUM 100 → EARTH 50, EARTH 100 → EARTH 50, etc.).
4. Evaluate clustering accuracy, cosine similarity, and RMSE metrics on-the-fly.
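The idea of generating every transfer direction can be illustrated with itertools.permutations, which yields all ordered pairs of distinct conditions. This is a sketch of the concept only; the condition list is made up for the example, and the pipeline's internal representation may differ.

```python
from itertools import permutations

# Hypothetical (atmosphere, energy) conditions, as they might be found in spectra.h5
conditions = [("VACUUM", 100), ("EARTH", 100), ("EARTH", 50)]

# Every ordered (source, target) pair with source != target
transfer_directions = list(permutations(conditions, 2))

for (src_atm, src_energy), (dst_atm, dst_energy) in transfer_directions:
    print(f"{src_atm} {src_energy} -> {dst_atm} {dst_energy}")
```

With n conditions there are n * (n - 1) directions, so the three conditions above give six transfers, and the count grows quadratically as conditions are added.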

from libs_transfer.training.discriminator_new_data import train_acvae_pipeline

# Start the automated training and evaluation pipeline
acvae_model = train_acvae_pipeline(
    data_path='./processed_data/spectra.h5', 
    data_dir='./processed_data/', 
    epochs=5, 
    batch_size=64,
    test_split=0.5
)

Note: If you need to drop a specific condition from training, you can modify the prepare_training_data call inside the pipeline script to pass exclude_id=X.
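For readers unfamiliar with two of the metrics reported during evaluation, the sketch below gives the standard textbook definitions of cosine similarity and RMSE between a reference spectrum and a transferred one. How the pipeline computes them internally is not shown here, so treat this as an illustration; the function names and the toy spectra are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra; 1.0 means identical shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rmse(a, b):
    """Root-mean-square error between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy spectra: same shape, but the second has twice the intensity
reference = [1.0, 2.0, 3.0]
transferred = [2.0, 4.0, 6.0]

similarity = cosine_similarity(reference, transferred)
error = rmse(reference, transferred)
```

The toy pair shows why both metrics are useful together: cosine similarity ignores overall intensity scaling, so it is close to 1 here, while RMSE still penalizes the difference in absolute intensity.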

Using the Model

To use your trained model to transfer spectra between conditions, see examples/transfer_spectra.py, which shows how to format inputs for the generator.
