Many-to-many transfer of LIBS spectra across multiple experimental conditions
Project description
Spectra Transfer ACVAE
This repository is actively being refactored to make it as easy to use as possible. Here is everything you need to know to prepare your data, train the baseline predictors, and train the ACVAE model from scratch.
Quick Start: The Full Pipeline
If your data directories are already set up, you can run the entire pipeline (data extraction, CNN baseline training, and ACVAE training and evaluation) in a few lines of code.
```python
import warnings

import torch

from libs_transfer.prepare_data import data_to_h5
from libs_transfer.training import train_acvae_pipeline, train_concentration_predictors

warnings.filterwarnings("ignore")

# Optional performance and reproducibility settings
torch.set_float32_matmul_precision('high')
torch.backends.cudnn.benchmark = True
torch.manual_seed(123456)

# 1. Parse raw text files into an HDF5 dataset
data_to_h5('./examples/example_raw_data', './examples/processed_data')

# 2. Train CNN concentration predictors for all conditions
#    (Concentrations.xlsx must already be in ./examples/processed_data)
train_concentration_predictors(data_folder='./examples/processed_data')

# 3. Train and evaluate the ACVAE on the dataset created in step 1
acvae = train_acvae_pipeline(
    data_path='./examples/processed_data/spectra.h5',
    data_dir='./examples/processed_data/',
    test_split=0.5,
)
```
Preparing the Data
Your raw data should be standard tab-separated text files (e.g., exported from a LIBS Discovery system). The first column is the wavelength ([nm]), and each subsequent column is the intensity ([a.u.]) of a separate measurement.
Example Al2O3-AM.txt format:
```text
[nm]     [a.u]      [a.u]
200.000  47.646484  -17.080078  ...
200.020  98.522461  -17.432617  ...
...
```
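To show how this layout maps onto arrays, here is a minimal sketch that reads one file with NumPy. It assumes exactly one header row and tab-separated columns. The helper read_libs_txt is hypothetical and is not the package's parser, since data_to_h5 does the actual parsing.

```python
import io

import numpy as np


def read_libs_txt(source):
    """Parse a tab-separated LIBS export into (wavelengths, intensities).

    Hypothetical helper. Assumes one header row ([nm] [a.u] [a.u] ...)
    followed by purely numeric rows. Returns wavelengths with shape
    (n_points,) and intensities with shape (n_measurements, n_points).
    """
    data = np.loadtxt(source, delimiter="\t", skiprows=1, ndmin=2)
    wavelengths = data[:, 0]
    intensities = data[:, 1:].T  # transpose so each measurement is one row
    return wavelengths, intensities


# Small in-memory example in the documented layout
example = io.StringIO(
    "[nm]\t[a.u]\t[a.u]\n"
    "200.000\t47.646484\t-17.080078\n"
    "200.020\t98.522461\t-17.432617\n"
)
wl, spectra = read_libs_txt(example)
print(wl.shape, spectra.shape)  # (2,) (2, 2)
```

After the transpose, each row holds one measurement, which is the usual layout for batching spectra. This README does not say whether data_to_h5 stores them the same way.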
Organize your raw .txt files into a nested directory structure so the script can automatically label your data by Atmosphere, Energy, and Sample Name:
```text
raw_data_folder/
├── VACUUM/          <-- Atmosphere
│   ├── 100/         <-- Energy
│   │   ├── Al2O3.txt    <-- Sample Name
│   │   └── SiO2.txt
├── EARTH/
│   ├── 50/
│   │   ├── Al2O3.txt
...
```
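You can sketch the labelling convention with pathlib. The grandparent folder gives the atmosphere, the parent folder gives the energy, and the file stem gives the sample name. discover_spectra_files is a hypothetical helper written for this illustration and is not part of libs_transfer. It keeps the energy as a string because the README does not specify its units or type.

```python
import tempfile
from pathlib import Path


def discover_spectra_files(root):
    """Yield (atmosphere, energy, sample_name, path) for root/ATM/ENERGY/SAMPLE.txt.

    Hypothetical helper illustrating the folder convention only.
    """
    for path in sorted(Path(root).glob("*/*/*.txt")):
        atmosphere = path.parent.parent.name  # e.g. VACUUM
        energy = path.parent.name             # e.g. 100 (kept as a string)
        yield atmosphere, energy, path.stem, path


# Build a tiny tree matching the example above and list its labels
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["VACUUM/100/Al2O3.txt", "VACUUM/100/SiO2.txt", "EARTH/50/Al2O3.txt"]:
        f = Path(tmp, rel)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("[nm]\t[a.u]\n")
    labels = [(atm, energy, sample) for atm, energy, sample, _ in discover_spectra_files(tmp)]

print(labels)
```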
Use the data_to_h5 function to parse this folder and automatically generate the .h5 dataset and corresponding .json dictionaries:
```python
from libs_transfer.prepare_data import data_to_h5

# This generates spectra.h5, atm_dict.json, perc_dict.json, and label_dict.json
data_to_h5(in_path='./raw_data_folder', out_folder='./processed_data')
```
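Before moving on, it can help to confirm that the expected outputs exist. Below is a minimal sketch that uses only the standard library. load_metadata is hypothetical, and it assumes nothing about what the dictionaries contain beyond the file names listed above. The demo writes empty placeholder files in place of real output.

```python
import json
import tempfile
from pathlib import Path


def load_metadata(folder):
    """Load every *_dict.json file that sits next to spectra.h5.

    Hypothetical helper. Returns a dict keyed by file stem (e.g. "atm_dict")
    with the parsed JSON as values; the JSON schema is not assumed.
    """
    folder = Path(folder)
    if not (folder / "spectra.h5").exists():
        raise FileNotFoundError(f"spectra.h5 not found in {folder}")
    return {p.stem: json.loads(p.read_text()) for p in sorted(folder.glob("*_dict.json"))}


# Demo with empty placeholder files standing in for data_to_h5 output
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "spectra.h5").touch()
    for name in ("atm_dict", "perc_dict", "label_dict"):
        (Path(tmp) / f"{name}.json").write_text("{}")
    meta = load_metadata(tmp)

print(sorted(meta))
```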
Preparing the Concentration Predictors (CNN)
To evaluate how well the ACVAE transfers spectra between conditions, the pipeline requires baseline CNN models to predict chemical concentrations.
Make sure a Concentrations.xlsx file is saved in your processed data folder. The first row should contain the component names (e.g., SIO2, AL2O3), and the first column should contain the sample names matching your .txt files.
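The sheet is a matrix with component names across the first row and sample names down the first column. The sketch below turns that layout into a per-sample lookup. It uses a CSV string as a stand-in for the spreadsheet so that it runs with the standard library alone. With pandas, pd.read_excel(path, index_col=0) should give the equivalent table, provided an Excel engine such as openpyxl is installed. parse_concentration_table is hypothetical, and the corner label "Sample" and all numbers are placeholders, not real compositions.

```python
import csv
import io


def parse_concentration_table(text):
    """Return {sample_name: {component: value}} from a header-row/index-column table.

    Hypothetical helper: the first row holds component names (its first cell
    is a corner label), and the first column holds sample names.
    """
    rows = list(csv.reader(io.StringIO(text)))
    components = rows[0][1:]
    table = {}
    for row in rows[1:]:
        sample, values = row[0], row[1:]
        table[sample] = {c: float(v) for c, v in zip(components, values)}
    return table


# Placeholder values only; each sample name should match a .txt file stem
example = "Sample,SIO2,AL2O3\nAl2O3,0.0,100.0\nSiO2,100.0,0.0\n"
conc = parse_concentration_table(example)
print(conc["Al2O3"])  # {'SIO2': 0.0, 'AL2O3': 100.0}
```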
Run the CNN training script. It automatically discovers every Atmosphere + Energy combination in your .h5 file and trains a separate CNN for each one.
```python
from libs_transfer.training.CNN_conc_baseline import train_concentration_predictors

# Trains and saves a CNN and standard scaler for EVERY condition in spectra.h5
train_concentration_predictors(
    epochs=40,
    batch_size=128,
    data_folder='./processed_data',
)
```
After running this, your data folder will be populated with files like CNN_EARTH_50_state_dict.pth and conc_std_EARTH_50.joblib.
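The ACVAE pipeline loads one predictor per condition, so checking for missing files first can save a failed run. This sketch extrapolates the naming pattern from the single example above (CNN_ATM_ENERGY_state_dict.pth and conc_std_ATM_ENERGY.joblib), so treat the pattern as an assumption. Both helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def expected_predictor_files(conditions):
    """Map each (atmosphere, energy) pair to its assumed CNN and scaler file names."""
    return {
        (atm, energy): (f"CNN_{atm}_{energy}_state_dict.pth", f"conc_std_{atm}_{energy}.joblib")
        for atm, energy in conditions
    }


def missing_predictor_files(data_folder, conditions):
    """Return expected file names that are not present in data_folder."""
    folder = Path(data_folder)
    missing = []
    for files in expected_predictor_files(conditions).values():
        missing.extend(name for name in files if not (folder / name).exists())
    return missing


# Demo: only the EARTH 50 predictor has been trained
conditions = [("EARTH", "50"), ("VACUUM", "100")]
with tempfile.TemporaryDirectory() as tmp:
    for name in expected_predictor_files(conditions)[("EARTH", "50")]:
        (Path(tmp) / name).touch()
    missing = missing_predictor_files(tmp, conditions)

print(missing)  # ['CNN_VACUUM_100_state_dict.pth', 'conc_std_VACUUM_100.joblib']
```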
Training the ACVAE Model
Once your .h5 data and baseline CNNs are prepared, training the ACVAE is entirely automated.
The refactored train_acvae_pipeline will automatically:
- Load your .h5 file and metadata dictionaries.
- Detect all available conditions and load the corresponding CNN predictors.
- Dynamically generate permutations for every possible transfer direction (e.g., VACUUM 100 → EARTH 50, EARTH 100 → EARTH 50, etc.).
- Evaluate clustering accuracy, cosine similarity, and RMSE metrics on the fly.
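The transfer directions are the ordered pairs of distinct conditions, so n conditions give n(n-1) directions. Here is a sketch with itertools.permutations that uses the three conditions named in the list above. The README does not say whether the pipeline also includes identity transfers (a condition to itself), so excluding them here is an assumption.

```python
from itertools import permutations

# Conditions named in the list above, as (atmosphere, energy) pairs
conditions = [("VACUUM", 100), ("EARTH", 100), ("EARTH", 50)]

# Every ordered pair of distinct conditions is one transfer direction
directions = list(permutations(conditions, 2))

print(len(directions))  # 6
for (src_atm, src_e), (dst_atm, dst_e) in directions:
    print(f"{src_atm} {src_e} -> {dst_atm} {dst_e}")
```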
```python
from libs_transfer.training.discriminator_new_data import train_acvae_pipeline

# Start the automated training and evaluation pipeline
acvae_model = train_acvae_pipeline(
    data_path='./processed_data/spectra.h5',
    data_dir='./processed_data/',
    epochs=5,
    batch_size=64,
    test_split=0.5,
)
```
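For reference, here are the standard definitions of two of the reported metrics, sketched with NumPy. This README does not say whether the pipeline computes them on spectra or on predicted concentrations, or how it averages them. Treat this as an illustration of the formulas rather than the pipeline's exact implementation. The arrays are toy values.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n_spectra, n_points) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den  # a zero-norm row would divide by zero


def rmse(a, b):
    """Root-mean-square error over all elements."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


transferred = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
target = np.array([[2.0, 4.0, 6.0], [1.0, 0.0, 0.0]])
cos = cosine_similarity(transferred, target)
err = rmse(transferred, target)
print(cos, err)
```

Cosine similarity ignores overall intensity scale, but RMSE does not. In the toy data, the first pair is parallel (similarity 1) even though its RMSE contribution is large, and the second pair is orthogonal (similarity 0).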
[!NOTE] If you need to drop a specific condition from training, modify the prepare_training_data call inside the pipeline script to pass exclude_id=X, where X is the ID of the condition to exclude.
Using the Model
To transfer spectra between conditions with your trained model, see examples/transfer_spectra.py, which shows how to format inputs for the generator.
Download files
Source Distribution
Built Distribution
File details
Details for the file libs_transfer-0.1.0.tar.gz.
File metadata
- Download URL: libs_transfer-0.1.0.tar.gz
- Upload date:
- Size: 25.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 (uv publish, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5649c4d3f7bfdb0cf4ffed3c59a66a5cf9b6d3cbc33a4642c9a26870f76c6ea3 |
| MD5 | acf4f5af10aa496a6e57f2141bace2f1 |
| BLAKE2b-256 | 606db83399e2ce88d7d72684e3cbb9a93107f8d2d2a964cbbc3952a634b6b3b0 |
File details
Details for the file libs_transfer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: libs_transfer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 27.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 (uv publish, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46ee539aa0a98e53da313dbb9c6aad19105ecc570507015e27faaf9dbb6bfc61 |
| MD5 | 897a06cde9fd37bb4f8fc16b2d762dd1 |
| BLAKE2b-256 | 71d9f9d5ccefc11881f15e08595549eb5973679a9b5d4234f66b6fee2c70c138 |
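To check a downloaded file against the digests listed for it, here is a minimal sketch using Python's hashlib. BLAKE2b-256 means BLAKE2b with a 32-byte digest. file_digests is a hypothetical helper name. The demo hashes a temporary file containing b"abc" so the output can be checked against the standard SHA-256 and MD5 test vectors.

```python
import hashlib
import os
import tempfile


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


# Demo on a temporary file with known contents
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo = file_digests(tmp.name)
os.remove(tmp.name)
print(demo["SHA256"])
```

For the source distribution, compare file_digests("libs_transfer-0.1.0.tar.gz")["SHA256"] with the SHA256 value in its table.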