Spectrometry and spectral analysis tools from BUAIIR datasets

BUAIIR Spectra

BUAIIR Spectra is a Python library designed to simplify data loading and batching for spectral analysis tasks using an open-source dataset collected and maintained by BUAIIR.

The library provides a clean interface for creating Dataset and DataLoader objects, with built-in preprocessing and conversion utilities compatible with common machine learning frameworks.


Dataset Overview

The spectral dataset was collected using three different devices:

Device Name        Spectral Range (nm)
BIO_SCIENCE        3648
SCAN CODER         12
LOW COST device    381

For each device, spectral data was collected across three crop types:

Crop       Total Samples    Class Breakdown
Beans      15               5 HLT, 5 BRD, 5 BLB
Maize      15               5 HLT, 5 MSV, 5 MLN
Cassava    15               5 HLT, 5 CMD, 5 CBB

Each crop type was subjected to controlled inoculation with viral and bacterial diseases, resulting in multiple classification labels per crop.

Class Definitions

Each crop contains a healthy control class plus disease-specific classes:

Crop       Class Code    Description
Beans      HLT           Healthy / Control
Beans      BLB           Bean Bacterial Blight
Beans      BRD           Bean Rust Disease
Maize      HLT           Healthy / Control
Maize      MSV           Maize Streak Virus
Maize      MLN           Maize Lethal Necrosis
Cassava    HLT           Healthy / Control
Cassava    CMD           Cassava Mosaic Disease
Cassava    CBB           Cassava Bacterial Blight
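
When decoding labels, the class codes above can be kept in a small lookup table. The following is a minimal sketch in plain Python. The `CLASS_DESCRIPTIONS` dictionary and the `describe` helper are illustrative names chosen here and are not part of the buaiir-spectra API.

```python
# Illustrative lookup table built from the class definitions above.
# Neither CLASS_DESCRIPTIONS nor describe() is provided by buaiir-spectra.
CLASS_DESCRIPTIONS = {
    "Beans": {
        "HLT": "Healthy / Control",
        "BLB": "Bean Bacterial Blight",
        "BRD": "Bean Rust Disease",
    },
    "Maize": {
        "HLT": "Healthy / Control",
        "MSV": "Maize Streak Virus",
        "MLN": "Maize Lethal Necrosis",
    },
    "Cassava": {
        "HLT": "Healthy / Control",
        "CMD": "Cassava Mosaic Disease",
        "CBB": "Cassava Bacterial Blight",
    },
}


def describe(crop: str, code: str) -> str:
    """Return the human-readable description for a crop/class-code pair."""
    try:
        return CLASS_DESCRIPTIONS[crop][code]
    except KeyError as exc:
        raise ValueError(f"Unknown crop/class combination: {crop}/{code}") from exc


print(describe("Maize", "MSV"))  # Maize Streak Virus
```

Nesting the codes under each crop keeps the shared HLT code unambiguous, since it appears once per crop.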

Data Collection Period

The dataset was collected over a period of 15 weeks, with repeated sampling across all classes, crops, and devices.


Data Loading

The library provides utilities for loading spectral data into structured machine learning pipelines.

It supports:

  • Dataset object creation
  • DataLoader batching
  • Standardized preprocessing and conversion
  • Compatibility with deep learning frameworks such as PyTorch

Installation steps and usage examples follow.

Installation

You can install the buaiir-spectra library using pip:

pip install buaiir-spectra

Requirements

Make sure you have Python 3.8+ installed. The library is designed to work with common scientific Python packages such as NumPy and PyTorch.

Dataloading per device

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# path to the data
DATA_PATH = '/home/usr/Datasets/spectra_data'

# Loading BIO_SCIENCE data
dataset_bio = SpectralDataset(DATA_PATH, device=Device.BIO_SCIENCE)

# Loading SCAN CODER data
dataset_scan_coder = SpectralDataset(DATA_PATH, device=Device.SCAN_CODER)

# Loading LOW COST data
dataset_low_cost = SpectralDataset(DATA_PATH, device=Device.LOW_COST)

# Read a single sample
x, y = dataset_scan_coder[0]
print(x.shape, y.shape)

Nature of Target (y)

The target returned by both the dataset and the dataloader is a tuple (titer_value, expert_score, week, disease_class), where each element describes a specific aspect of the spectral and laboratory observation collected from each plant sample.

Feature          Type       Description
titer_value      Float      Ground truth measurement collected from each plant, aligned with the spectral reading.
expert_score     Integer    Visual severity score assigned by an agricultural expert based on observable symptoms.
week             Integer    Week of data collection during the 15-week sampling period.
disease_class    Float      Label representing the disease type or health status of the plant sample.

The feature matrix (x) currently contains only the calibrated wavelength readings of each device.
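
To make the four target fields easier to work with, they can be unpacked into a named structure. The sketch below uses only the standard library. The `Target` class is a hypothetical helper defined here, not something buaiir-spectra exports, and the values are stand-ins for a real `y` returned by the dataset. It assumes `y` is a length-4 sequence in the documented order.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Hypothetical container mirroring the documented target fields."""

    titer_value: float
    expert_score: int
    week: int
    disease_class: float


# Stand-in values; a real target would come from `x, y = dataset[i]`.
y = (0.82, 3, 7, 1.0)

# Unpack positionally, relying on the documented field order.
target = Target(*y)
print(target.week, target.expert_score)  # 7 3
```

Named access such as `target.week` avoids relying on positional indices like `y[2]` throughout downstream code.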

Dynamic creation of Dataset object

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# Path to where the data is stored
DATA_PATH = '/home/wilfred/Datasets/spectra_data'

for test_device in Device.get_devices():
    dataset = SpectralDataset(DATA_PATH, device=test_device) # dataset
    x, y = dataset[0] # load sample data
    print(f'Printing shapes for device: {test_device.name}')
    print(x.shape, y.shape)

Properties of dataset

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# Create a Dataset object for SCAN CODER only
dataset = SpectralDataset('/home/usr/Datasets/spectra_data', Device.SCAN_CODER)

# Get wavelength range for the device
wavelength = dataset.wavelength
print(wavelength)

# Get disease class codes used in batching
disease_classes = dataset.disease_class_codes
print(f'Supported disease classes: {disease_classes}')

# Get plant_type codes used in the batching
plant_types = dataset.plant_type_codes
print(f'Supported crop types: {plant_types}')

Data batching

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
from buaiir_spectra.data.dataloader import SpectralDataLoader

# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)

# Create the dataloader
dataloader = SpectralDataLoader(dataset, batch_size=4)

# Iterate over the batches
for batch in dataloader:
    # extract the x_batch and y_batch
    x_batch, y_batch = batch

    # print the shape of the batches
    print(x_batch.shape, y_batch.shape)
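
For readers new to batching, the sketch below shows the general idea of splitting sample indices into chunks of `batch_size`. It is a plain-Python illustration, not the implementation of `SpectralDataLoader`. The `iter_batches` helper and the dataset size of 10 are assumptions made for the example, and whether the library keeps or drops a final partial batch is not stated in this documentation.

```python
from typing import Iterator, List, Sequence


def iter_batches(indices: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive index chunks of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(indices), batch_size):
        yield list(indices[start:start + batch_size])


# Hypothetical dataset of 10 samples, with batch_size=4 as in the example above.
batches = list(iter_batches(range(10), batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

In this sketch the last batch is smaller than the others because 10 is not a multiple of 4.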

Parameters for data wrangling provided by Dataloader

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device


# Path to where your dataset is stored: adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

# Loading data for BIO SCIENCE
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.BIO_SCIENCE)

# Creating a dataloader with plant labels shuffled
dataloader_with_shuffled_plants = SpectralDataLoader(dataset, batch_size=40, permutate_plants=True)

# Creating a dataloader with weeks shuffled
dataloader_with_shuffled_weeks = SpectralDataLoader(dataset, batch_size=40, permutate_weeks=True)

# Creating a dataloader with completely shuffled data, best for regularization
dataloader_fully_shuffled = SpectralDataLoader(dataset, batch_size=40, permutate=True)
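
The exact behaviour of the permutation flags is not spelled out here. One plausible reading is that week-level or plant-level shuffling reorders whole groups while keeping each group's samples together, whereas full shuffling reorders individual samples. The sketch below illustrates that distinction with the standard library only. `shuffle_groups` and the `(week, plant)` records are hypothetical and do not reflect the library's internals.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffle_groups(
    records: Sequence[T],
    key: Callable[[T], Hashable],
    seed: Optional[int] = None,
) -> List[T]:
    """Permute the order of groups while keeping each group's records together."""
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    group_keys = list(groups)
    random.Random(seed).shuffle(group_keys)
    return [record for k in group_keys for record in groups[k]]


# Hypothetical (week, plant) records standing in for dataset samples.
records = [(week, plant) for week in (1, 2, 3) for plant in ("A", "B")]

# Group-wise shuffle: weeks are reordered, samples within a week stay adjacent.
by_week = shuffle_groups(records, key=lambda r: r[0], seed=0)

# Full shuffle: every sample is reordered independently of its group.
fully = random.Random(0).sample(records, k=len(records))

print(by_week)
print(fully)
```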

Extracting label specific or week specific data

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device

DATA_PATH = '/home/wilfred/Datasets/spectra_data'

# Loading data for SCAN CODER
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)
dataloader = SpectralDataLoader(dataset, batch_size=150)


# Load data for only a single disease class, e.g. CMD
x, y = dataloader.load_data_of_disease_class('CMD')

# Load data for a specific label across all weeks
x_1, y_1 = dataloader.load_data_of('BBLB1')

# Checking all supported labels
supported_labels = dataloader.labels
print(supported_labels)
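
Conceptually, extracting one disease class amounts to filtering samples by their class code. The sketch below shows that idea on a few made-up samples in plain Python. `select_class` and the sample values are illustrative assumptions, not the implementation behind `load_data_of_disease_class`.

```python
from typing import List, Sequence, Tuple

# (spectrum, disease class code); the spectra here are made-up stand-ins.
Sample = Tuple[List[float], str]


def select_class(samples: Sequence[Sample], code: str) -> List[Sample]:
    """Return only the samples whose class code equals `code`."""
    return [sample for sample in samples if sample[1] == code]


samples = [
    ([0.1, 0.2], "HLT"),
    ([0.3, 0.4], "CMD"),
    ([0.5, 0.6], "CMD"),
]

cmd_only = select_class(samples, "CMD")
print(len(cmd_only))  # 2
```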

