Spectrometry and spectral analysis tools from BUAIIR datasets

BUAIIR Spectra

BUAIIR Spectra is a Python library designed to simplify data loading and batching for spectral analysis tasks using an open-source dataset collected and maintained by BUAIIR.

The library provides a clean interface for creating Dataset and DataLoader objects, with built-in preprocessing and conversion utilities compatible with common machine learning frameworks.


Dataset Overview

The spectral dataset was collected using three different devices:

Device Name        Spectral Range (nm)
BIO_SCIENCE        3648
SCAN CODER         12
LOW COST device    381

For each device, spectral data was collected across three crop types:

Crop       Total Samples    Class Breakdown
Beans      15               5 HLT, 5 BRD, 5 BLB
Maize      15               5 HLT, 5 MSV, 5 MLN
Cassava    15               5 HLT, 5 CMD, 5 CBB

Each crop type was subjected to controlled inoculation with viral and bacterial diseases, resulting in multiple classification labels per crop.

Class Definitions

Each crop contains a healthy control class plus disease-specific classes:

Crop       Class Code    Description
Beans      HLT           Healthy / Control
Beans      BLB           Bean Bacterial Blight
Beans      BRD           Bean Rust Disease
Maize      HLT           Healthy / Control
Maize      MSV           Maize Streak Virus
Maize      MLN           Maize Lethal Necrosis
Cassava    HLT           Healthy / Control
Cassava    CMD           Cassava Mosaic Disease
Cassava    CBB           Cassava Bacterial Blight
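
The class codes can be kept in a small lookup table when decoding labels. The sketch below only transcribes the table above; the names CLASS_DESCRIPTIONS and describe are hypothetical helpers for illustration and are not part of buaiir-spectra.

```python
# Class codes per crop, transcribed from the class definitions table.
# CLASS_DESCRIPTIONS and describe() are illustrative names, not library API.
CLASS_DESCRIPTIONS = {
    "Beans": {
        "HLT": "Healthy / Control",
        "BLB": "Bean Bacterial Blight",
        "BRD": "Bean Rust Disease",
    },
    "Maize": {
        "HLT": "Healthy / Control",
        "MSV": "Maize Streak Virus",
        "MLN": "Maize Lethal Necrosis",
    },
    "Cassava": {
        "HLT": "Healthy / Control",
        "CMD": "Cassava Mosaic Disease",
        "CBB": "Cassava Bacterial Blight",
    },
}


def describe(crop: str, code: str) -> str:
    """Return the human-readable description for a crop/class-code pair."""
    return CLASS_DESCRIPTIONS[crop][code]


print(describe("Cassava", "CMD"))  # Cassava Mosaic Disease
```

The lookup is keyed by crop first because the HLT code is shared by all three crops, so a code alone does not identify the crop.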

Data Collection Period

The dataset was collected over a period of 15 weeks, with repeated sampling across all classes, crops, and devices.


Data Loading

The library provides utilities for loading spectral data into structured machine learning pipelines.

It supports:

  • Dataset object creation
  • DataLoader batching
  • Standardized preprocessing and conversion
  • Compatibility with deep learning frameworks such as PyTorch

Example usage is shown in the sections that follow, starting with installation.

Installation

You can install the buaiir-spectra library using pip:

pip install buaiir-spectra

Requirements

Make sure you have Python 3.8+ installed. The library is designed to work with common scientific Python packages such as NumPy and PyTorch.
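
To confirm the interpreter meets the stated minimum before installing, a quick check such as the following can help. This is a minimal sketch using only the standard library; MIN_PYTHON and python_is_supported are illustrative names, not part of buaiir-spectra.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print("Python 3.8 or newer is required")
```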

Data loading per device

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# path to the data
DATA_PATH = '/home/usr/Datasets/spectra_data'

# Loading BIO_SCIENCE data
dataset_bio = SpectralDataset(DATA_PATH, device=Device.BIO_SCIENCE)

# Loading SCAN CODER data
dataset_scan_coder = SpectralDataset(DATA_PATH, device=Device.SCAN_CODER)

# Loading LOW COST data
dataset_low_cost = SpectralDataset(DATA_PATH, device=Device.LOW_COST)

# Reading a single sample
x, y = dataset_scan_coder[0]
print(x.shape, y.shape)

Nature of Target (y)

The target returned by both the dataset and the dataloader is a tuple (titer_value, expert_score, week, disease_class), where each element describes a specific aspect of the spectral and laboratory observation collected from each plant sample.

Feature          Type       Description
titer_value      Float      Ground truth measurement collected from each plant, aligned with the spectral reading.
expert_score     Integer    Visual severity score assigned by an agricultural expert based on observable symptoms.
week             Integer    Week of data collection during the 15-week sampling period.
disease_class    Float      Label representing the disease type or health status of the plant sample.

The feature matrix (x) currently contains only the calibrated wavelength readings of each device.
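
Since the target is described as a four-element tuple, giving its fields names can make downstream code easier to read. The sketch below assumes the field order listed above; the Target class is a hypothetical helper, not something the library provides, and the values are made up for illustration. If y is array-like rather than a plain tuple, it may need converting to a plain sequence first.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Named view of the target, in the order given in the table above."""
    titer_value: float
    expert_score: int
    week: int
    disease_class: float


# Made-up values for illustration only; not taken from the dataset.
raw_y = (0.42, 2, 7, 1.0)

target = Target(*raw_y)
print(target.week, target.expert_score)
```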

Dynamic creation of Dataset object

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# Path to where the data is stored
DATA_PATH = '/home/wilfred/Datasets/spectra_data'

for test_device in Device.get_devices():
    dataset = SpectralDataset(DATA_PATH, device=test_device) # dataset
    x, y = dataset[0] # load sample data
    print(f'Printing shapes for device: {test_device.name}')
    print(x.shape, y.shape)

Properties of dataset

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# Create a Dataset object for SCAN CODER only
dataset = SpectralDataset('/home/usr/Datasets/spectra_data', Device.SCAN_CODER)

# Get wavelength range for the device
wavelength = dataset.wavelength
print(wavelength)

# Get disease class codes used in batching
disease_classes = dataset.disease_class_codes
print(f'Supported disease classes: {disease_classes}')

# Get plant_type codes used in the batching
plant_types = dataset.plant_type_codes
print(f'Supported crop types: {plant_types}')

Data batching

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
from buaiir_spectra.data.dataloader import SpectralDataLoader

# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)

# Create the dataloader
dataloader = SpectralDataLoader(dataset, batch_size=4)

# Iterate over the batches
for batch in dataloader:
    # extract the x_batch and y_batch
    x_batch, y_batch = batch

    # print the shape of the batches
    print(x_batch.shape, y_batch.shape)
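
To make the effect of batch_size concrete, here is a plain-Python sketch of fixed-size batching. It is not the SpectralDataLoader implementation; iter_batches is a hypothetical helper, and whether the library keeps or drops a final partial batch is not stated here, so this sketch simply keeps it.

```python
from math import ceil
from typing import Iterator, List, Sequence


def iter_batches(samples: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


samples = list(range(10))  # stand-in for 10 spectral readings
batches = list(iter_batches(samples, batch_size=4))

print([len(batch) for batch in batches])  # [4, 4, 2]
print(len(batches) == ceil(len(samples) / 4))  # True
```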

Parameters for data wrangling provided by Dataloader

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device


# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

# Loading data for BIO_SCIENCE
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.BIO_SCIENCE)

# Creating a dataloader with plant labels shuffled
dataloader_with_shuffled_plants = SpectralDataLoader(dataset, batch_size=40, permutate_plants=True)

# Creating a dataloader with weeks shuffled
dataloader_with_shuffled_weeks = SpectralDataLoader(dataset, batch_size=40, permutate_weeks=True)

# Creating a dataloader with completely shuffled data, best for regularization
dataloader_fully_shuffled = SpectralDataLoader(dataset, batch_size=40, permutate=True)

Extracting label-specific or week-specific data

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device

DATA_PATH = '/home/wilfred/Datasets/spectra_data'

# Loading data for SCAN CODER
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)
dataloader = SpectralDataLoader(dataset, batch_size=150)


# Load data for only a single disease class, e.g. CMD
x, y = dataloader.load_data_of_disease_class('CMD')

# Load data for a specific label across all weeks
x_1, y_1 = dataloader.load_data_of('BBLB1')

# Checking all supported labels
supported_labels = dataloader.labels
print(supported_labels)
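
For week-specific selection done by hand, the week field of the target tuple can be used as a filter. The sketch below is a plain-Python illustration under the assumption that targets follow the (titer_value, expert_score, week, disease_class) order described earlier; indices_for_week is a hypothetical helper, not a library method, and the target values are made up.

```python
from typing import List, Sequence, Tuple

# (titer_value, expert_score, week, disease_class), as described in the target table.
Target = Tuple[float, int, int, float]

WEEK_INDEX = 2  # position of the week field inside the target tuple


def indices_for_week(targets: Sequence[Target], week: int) -> List[int]:
    """Return the positions of all samples recorded in the given week."""
    return [i for i, target in enumerate(targets) if target[WEEK_INDEX] == week]


# Made-up targets for illustration only; not taken from the dataset.
targets = [
    (0.10, 1, 1, 0.0),
    (0.35, 2, 2, 1.0),
    (0.50, 3, 2, 1.0),
    (0.05, 0, 3, 0.0),
]

week_two = indices_for_week(targets, week=2)
print(week_two)  # [1, 2]
```

The returned indices can then be used to pick the matching rows from the feature matrix.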
