Spectrometry and spectral analysis tools from BUAIIR datasets
BUAIIR Spectra
BUAIIR Spectra is a Python library designed to simplify data loading and batching for spectral analysis tasks using an open-source dataset collected and maintained by BUAIIR.
The library provides a clean interface for creating Dataset and DataLoader objects, with built-in preprocessing and conversion utilities compatible with common machine learning frameworks.
Dataset Overview
The spectral dataset was collected using three different devices:
| Device Name | Spectral Range (nm) |
|---|---|
| BIO_SCIENCE | 3648 |
| SCAN CODER | 12 |
| LOW COST device | 381 |
For each device, spectral data was collected across three crop types:
| Crop | Total Samples | Class Breakdown |
|---|---|---|
| Beans | 15 | 5 HLT, 5 BRD, 5 BLB |
| Maize | 15 | 5 HLT, 5 MSV, 5 MLN |
| Cassava | 15 | 5 HLT, 5 CMD, 5 CBB |
Each crop type was subjected to controlled inoculation with viral and bacterial diseases, resulting in multiple classification labels per crop.
Class Definitions
Each crop contains a healthy control class plus disease-specific classes:
| Crop | Class Code | Description |
|---|---|---|
| Beans | HLT | Healthy / Control |
| Beans | BLB | Bean Bacterial Blight |
| Beans | BRD | Bean Rust Disease |
| Maize | HLT | Healthy / Control |
| Maize | MSV | Maize Streak Virus |
| Maize | MLN | Maize Lethal Necrosis |
| Cassava | HLT | Healthy / Control |
| Cassava | CMD | Cassava Mosaic Disease |
| Cassava | CBB | Cassava Bacterial Blight |
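The class tables above can be captured as a small lookup structure, which is handy when turning class codes into readable names for plots or reports. The sketch below is plain Python built only from those tables; the names `CLASS_DESCRIPTIONS`, `describe`, and `samples_per_crop` are illustrative and are not part of the buaiir-spectra API.

```python
# Illustrative lookup built from the class tables; not part of buaiir-spectra.
CLASS_DESCRIPTIONS = {
    "Beans": {
        "HLT": "Healthy / Control",
        "BLB": "Bean Bacterial Blight",
        "BRD": "Bean Rust Disease",
    },
    "Maize": {
        "HLT": "Healthy / Control",
        "MSV": "Maize Streak Virus",
        "MLN": "Maize Lethal Necrosis",
    },
    "Cassava": {
        "HLT": "Healthy / Control",
        "CMD": "Cassava Mosaic Disease",
        "CBB": "Cassava Bacterial Blight",
    },
}

# Each class contributes 5 samples per crop, per the class breakdown column.
SAMPLES_PER_CLASS = 5


def describe(crop, class_code):
    """Return the human-readable description for a crop/class-code pair."""
    return CLASS_DESCRIPTIONS[crop][class_code]


def samples_per_crop(crop):
    """Return the total number of samples for a crop (classes x samples per class)."""
    return SAMPLES_PER_CLASS * len(CLASS_DESCRIPTIONS[crop])


for crop in CLASS_DESCRIPTIONS:
    print(crop, samples_per_crop(crop), sorted(CLASS_DESCRIPTIONS[crop]))
```

Note that `HLT` appears under every crop, so a class code on its own does not identify the crop; the lookup is keyed by crop first for that reason.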
Data Collection Period
The dataset was collected over a period of 15 weeks, with repeated sampling across all classes, crops, and devices.
Data Loading
The library provides utilities for loading spectral data into structured machine learning pipelines.
It supports:
- Dataset object creation
- DataLoader batching
- Standardized preprocessing and conversion
- Compatibility with deep learning frameworks such as PyTorch
Example usage:
Installation
You can install the buaiir-spectra library using pip:
pip install buaiir-spectra
Requirements
Make sure you have Python 3.8+ installed. The library is designed to work with common scientific Python packages such as NumPy and PyTorch.
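A quick way to confirm the requirements before running the examples is to check the interpreter version and whether the package can be found. This is a minimal sketch using only the standard library; the helper names `python_is_supported` and `package_is_installed` are illustrative, and the module name `buaiir_spectra` is taken from the import statements used in this README.

```python
import importlib.util
import sys

# Minimum Python version stated in the requirements.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def package_is_installed(module_name="buaiir_spectra"):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


print("Python version supported:", python_is_supported())
print("buaiir_spectra importable:", package_is_installed())
```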
Data loading per device
from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
# path to the data
DATA_PATH = '/home/usr/Datasets/spectra_data'
# Loading BIO_SCIENCE data
dataset_bio = SpectralDataset(DATA_PATH, device=Device.BIO_SCIENCE)
# Loading SCAN CODER data
dataset_scan_coder = SpectralDataset(DATA_PATH, device=Device.SCAN_CODER)
# Loading LOW COST data
dataset_low_cost = SpectralDataset(DATA_PATH, device=Device.LOW_COST)
# Read a single sample
x, y = dataset_scan_coder[0]
print(x.shape, y.shape)
Nature of target (y)
Both the dataset and the dataloader return the target as a tuple (titer_value, expert_score, week, disease_class), where each element describes one aspect of the spectral and laboratory observation collected from a plant sample.
| Feature | Type | Description |
|---|---|---|
| titer_value | Float | Ground truth measurement collected from each plant, aligned with the spectral reading. |
| expert_score | Integer | Visual severity score assigned by an agricultural expert based on observable symptoms. |
| week | Integer | Week of data collection during the 15-week sampling period. |
| disease_class | Float | Label representing the disease type or health status of the plant sample. |
The feature matrix (x) currently contains only the calibrated wavelength readings of each device.
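Because the target is positional, it is easy to mix up its elements. One option is to wrap it in a named tuple so each field can be accessed by name. The sketch below assumes the field order and types listed in the table above; the `Target` class and the sample values are hypothetical and are not provided by the library.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Named view over the (titer_value, expert_score, week, disease_class) tuple."""

    titer_value: float
    expert_score: int
    week: int
    disease_class: float


# Hypothetical values for illustration only; real targets come from the dataset.
raw_target = (0.42, 2, 7, 1.0)

target = Target(*raw_target)
print(target.week, target.disease_class)
```

If `y` is returned as an array or tensor rather than a plain tuple, it may need converting first (for example with `tolist()`) before unpacking it this way.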
Dynamic creation of Dataset object
from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
# Path to where the data is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'
for test_device in Device.get_devices():
    dataset = SpectralDataset(DATA_PATH, device=test_device)  # build the dataset for this device
    x, y = dataset[0]  # load one sample
    print(f'Printing shapes for device: {test_device.name}')
    print(x.shape, y.shape)
Properties of dataset
from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
# Create a Dataset object for SCAN CODER only
dataset = SpectralDataset('/home/usr/Datasets/spectra_data', Device.SCAN_CODER)
# Get wavelength range for the device
wavelength = dataset.wavelength
print(wavelength)
# Get disease class codes used in batching
disease_classes = dataset.disease_class_codes
print(f'Supported disease classes: {disease_classes}')
# Get plant_type codes used in the batching
plant_types = dataset.plant_type_codes
print(f'Supported crop types: {plant_types}')
Data batching
from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
from buaiir_spectra.data.dataloader import SpectralDataLoader
# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)
# Create the dataloader
dataloader = SpectralDataLoader(dataset, batch_size=4)
# Iterate over the batches
for batch in dataloader:
    # Extract the x_batch and y_batch
    x_batch, y_batch = batch
    # Print the shape of the batches
    print(x_batch.shape, y_batch.shape)
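To make the effect of `batch_size` concrete, the following is a minimal pure-Python sketch of sequential batching. It is not the implementation used by `SpectralDataLoader`; the function name `iter_batches` is illustrative, and whether the library keeps or drops a final partial batch is not stated here.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most `batch_size` samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# 15 placeholder samples, matching the per-crop sample count in the overview.
samples = list(range(15))

batch_sizes = [len(batch) for batch in iter_batches(samples, batch_size=4)]
print(batch_sizes)  # [4, 4, 4, 3]
```

In this sketch the last batch is smaller because 15 is not a multiple of 4, so downstream code should not assume every batch has exactly `batch_size` rows.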
Data wrangling parameters provided by the DataLoader
from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device
# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'
# Loading data for BIO SCIENCE
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.BIO_SCIENCE)
# Creating a dataloader with plant labels shuffled
dataloader_with_shuffled_plants = SpectralDataLoader(dataset, batch_size=40, permutate_plants=True)
# Creating a dataloader with weeks shuffled
dataloader_with_shuffled_weeks = SpectralDataLoader(dataset, batch_size=40, permutate_weeks=True)
# Creating a dataloader with completely shuffled data, best for regularization
dataloader_fully_shuffled = SpectralDataLoader(dataset, batch_size=40, permutate=True)
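The exact semantics of `permutate_plants`, `permutate_weeks`, and `permutate` are not spelled out here. One plausible reading, shown below as an assumption only, is that the first two shuffle the order of whole groups (plants or weeks) while keeping each group together, and that `permutate` shuffles individual samples. The helpers `shuffle_groups` and `shuffle_all` and the toy records are hypothetical and do not come from the library.

```python
import random


def shuffle_groups(records, key, seed=0):
    """Shuffle the order of groups sharing `key`, keeping each group contiguous."""
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    group_keys = list(groups)
    rng.shuffle(group_keys)
    return [record for group_key in group_keys for record in groups[group_key]]


def shuffle_all(records, seed=0):
    """Shuffle individual records, ignoring any grouping."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


# Toy records: three weeks, two plants per week (illustrative values only).
records = [{"week": w, "plant": p} for w in (1, 2, 3) for p in ("A", "B")]

by_week = shuffle_groups(records, key="week")
fully_shuffled = shuffle_all(records)
print([r["week"] for r in by_week])
print([(r["week"], r["plant"]) for r in fully_shuffled])
```

Group-wise shuffling keeps readings from the same week (or plant) adjacent, which matters if temporal structure should stay visible within a batch; a full shuffle discards that structure.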
Extracting label-specific or week-specific data
from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device
# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'
# Loading data for SCAN CODER
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)
dataloader = SpectralDataLoader(dataset, batch_size=150)
# Load data for only a single disease class, e.g. CMD
x, y = dataloader.load_data_of_disease_class('CMD')
# Load data for a specific label across all weeks
x_1, y_1 = dataloader.load_data_of('BBLB1')
# Checking all supported labels
supported_labels = dataloader.labels
print(supported_labels)
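Conceptually, class-specific or week-specific extraction is a filter over paired features and targets. The sketch below illustrates that idea in plain Python; it does not reproduce how `load_data_of_disease_class` or `load_data_of` work internally. The function `filter_samples`, the dictionary-style targets, and all values are hypothetical.

```python
def filter_samples(features, targets, **criteria):
    """Keep the (x, y) pairs whose target matches every field=value criterion."""
    kept_features, kept_targets = [], []
    for x, y in zip(features, targets):
        if all(y.get(field) == value for field, value in criteria.items()):
            kept_features.append(x)
            kept_targets.append(y)
    return kept_features, kept_targets


# Toy data for illustration only: three samples with two readings each.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
targets = [
    {"disease_class": "CMD", "week": 1},
    {"disease_class": "HLT", "week": 1},
    {"disease_class": "CMD", "week": 2},
]

x_cmd, y_cmd = filter_samples(features, targets, disease_class="CMD")
x_cmd_w2, y_cmd_w2 = filter_samples(features, targets, disease_class="CMD", week=2)
print(len(x_cmd), len(x_cmd_w2))  # 2 1
```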