buaiir-spectra

Spectrometry and spectral analysis tools from BUAIIR datasets

BUAIIR Spectra

BUAIIR Spectra is a Python library designed to simplify data loading and batching for spectral analysis tasks using an open-source dataset collected and maintained by BUAIIR.

The library provides a clean interface for creating Dataset and DataLoader objects, with built-in preprocessing and conversion utilities compatible with common machine learning frameworks.

Dataset Overview

The spectral dataset was collected using three different devices:

Device Name	Spectral Range (nm)
BIO_SCIENCE	3648
SCAN_CODER	12
LOW_COST	381

For each device, spectral data was collected across three crop types:

Crop	Total Samples	Class Breakdown
Beans	15	5 HLT, 5 BRD, 5 BLB
Maize	15	5 HLT, 5 MSV, 5 MLN
Cassava	15	5 HLT, 5 CMD, 5 CBB

Each crop type was subjected to controlled inoculation with viral and bacterial diseases, resulting in multiple classification labels per crop.

Class Definitions

Each crop contains a healthy control class plus disease-specific classes:

Crop	Class Code	Description
Beans	HLT	Healthy / Control
Beans	BLB	Bean Bacterial Blight
Beans	BRD	Bean Rust Disease
Maize	HLT	Healthy / Control
Maize	MSV	Maize Streak Virus
Maize	MLN	Maize Lethal Necrosis
Cassava	HLT	Healthy / Control
Cassava	CMD	Cassava Mosaic Disease
Cassava	CBB	Cassava Bacterial Blight
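
The class codes above can be kept in a small lookup table when decoding labels. The sketch below is illustrative only: `CLASS_DEFINITIONS` and `describe` are hypothetical names and are not part of buaiir-spectra, which exposes its own codes through `dataset.disease_class_codes`.

```python
# Illustrative lookup table built from the class-definition table above.
# CLASS_DEFINITIONS and describe() are hypothetical helpers, not library API.
CLASS_DEFINITIONS = {
    "Beans": {
        "HLT": "Healthy / Control",
        "BLB": "Bean Bacterial Blight",
        "BRD": "Bean Rust Disease",
    },
    "Maize": {
        "HLT": "Healthy / Control",
        "MSV": "Maize Streak Virus",
        "MLN": "Maize Lethal Necrosis",
    },
    "Cassava": {
        "HLT": "Healthy / Control",
        "CMD": "Cassava Mosaic Disease",
        "CBB": "Cassava Bacterial Blight",
    },
}


def describe(crop: str, class_code: str) -> str:
    """Return the human-readable description for a crop/class-code pair."""
    try:
        return CLASS_DEFINITIONS[crop][class_code]
    except KeyError:
        raise ValueError(f"Unknown crop/class combination: {crop}/{class_code}") from None


print(describe("Cassava", "CMD"))  # Cassava Mosaic Disease
```

Keying by crop first matters because HLT appears under every crop, so a class code alone does not identify the crop.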

Data Collection Period

The dataset was collected over a period of 15 weeks, with repeated sampling across all classes, crops, and devices.
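
As a rough sanity check on dataset size, the figures above can be multiplied out. This is only a sketch: the source does not say how many readings were taken per sample per week, so `READINGS_PER_SAMPLE_PER_WEEK = 1` is an assumption and the result should be treated as an estimate, not a documented count.

```python
# Figures taken from the tables above.
CROPS = 3
SAMPLES_PER_CROP = 15
WEEKS = 15

# ASSUMPTION (not stated in the source): one reading per sample per week.
READINGS_PER_SAMPLE_PER_WEEK = 1

samples_per_device = CROPS * SAMPLES_PER_CROP
estimated_readings_per_device = samples_per_device * WEEKS * READINGS_PER_SAMPLE_PER_WEEK

print(samples_per_device)             # 45
print(estimated_readings_per_device)  # 675
```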

Data Loading

The library provides utilities for loading spectral data into structured machine learning pipelines.

It supports:

Dataset object creation
DataLoader batching
Standardized preprocessing and conversion
Compatibility with deep learning frameworks such as PyTorch

Usage examples follow the installation notes.

Installation

You can install the buaiir-spectra library using pip:

pip install buaiir-spectra

Requirements

Make sure you have Python 3.8+ installed. The library is designed to work with common scientific Python packages such as NumPy and PyTorch.
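
The requirements can be checked before installing. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and the package names `numpy` and `torch` are taken from the sentence above rather than from the library's declared dependencies.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)


def check_environment(packages=("numpy", "torch")):
    """Report whether the Python version is recent enough and which packages are importable."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```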

Data loading per device using the load_spectra function

from buaiir_spectra.data.load_fn import load_spectra
from buaiir_spectra.utils.device import Device

# Load data for the BIO_SCIENCE device
lf = load_spectra(device=Device.BIO_SCIENCE, shuffle=False, no_files_per_load=4, load_with_images=True)

# Each iteration yields one batch of spectra, targets, and images
for x, y, images in lf:
    print(f'x_shape: {x.shape}', f'y_shape: {y.shape}', f'image_shape: {images.shape}')

# Load data for the SCAN_CODER device
lf = load_spectra(device=Device.SCAN_CODER, shuffle=False, no_files_per_load=4, load_with_images=True)

# Load data for the LOW_COST device
lf = load_spectra(device=Device.LOW_COST, shuffle=False, no_files_per_load=4, load_with_images=True)

# Load only the spectra, without images
lf = load_spectra(device=Device.LOW_COST, shuffle=False, no_files_per_load=4, load_with_images=False)

for x, y in lf:
    print(f'x_shape: {x.shape}', f'y_shape: {y.shape}')

# Load all the data into memory at once with no_files_per_load=-1.
# The full dataset is large; use a smaller no_files_per_load for partial loading.
lf = load_spectra(device=Device.SCAN_CODER, shuffle=False, no_files_per_load=-1, load_with_images=True)
x, y, images = next(iter(lf))
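
The effect of no_files_per_load can be pictured as grouping files into chunks, with -1 meaning a single chunk that holds everything. The sketch below illustrates that idea only; `chunk_files` and the file names are hypothetical, and how load_spectra actually partitions its files is not documented here.

```python
from typing import Iterator, List, Sequence


def chunk_files(files: Sequence[str], no_files_per_load: int) -> Iterator[List[str]]:
    """Yield groups of file names; -1 yields every file in a single group."""
    if no_files_per_load == -1:
        yield list(files)
        return
    if no_files_per_load <= 0:
        raise ValueError("no_files_per_load must be positive or -1")
    for start in range(0, len(files), no_files_per_load):
        yield list(files[start:start + no_files_per_load])


# Hypothetical file names, used only to show the chunk sizes.
files = [f"week_{i:02d}.csv" for i in range(1, 11)]
print([len(chunk) for chunk in chunk_files(files, 4)])   # [4, 4, 2]
print([len(chunk) for chunk in chunk_files(files, -1)])  # [10]
```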

Nature of the target (y)

The target is a tuple (titer_value, expert_score, week, disease_class), where each element describes a specific aspect of the spectral and laboratory observation collected from each plant sample.

Feature	Type	Description
titer_value	Float	Ground truth measurement collected from each plant, aligned with the spectral reading.
expert_score	Integer	Visual severity score assigned by an agricultural expert based on observable symptoms.
week	Integer	Week of data collection during the 15-week sampling period.
disease_class	Float	Label representing the disease type or health status of the plant sample.
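
Naming the tuple fields can make downstream code easier to read. The sketch below assumes a single target arrives in the documented order; `Target` is a hypothetical wrapper that the library does not provide, and the values are made up for illustration.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Hypothetical wrapper mirroring the documented tuple order."""
    titer_value: float
    expert_score: int
    week: int
    disease_class: float


# Made-up values for illustration only; they are not taken from the dataset.
raw_target = (0.42, 3, 7, 1.0)
target = Target(*raw_target)

print(target.week, target.disease_class)  # 7 1.0
```

Because a NamedTuple is still a tuple, code that unpacks it positionally keeps working.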

The feature matrix (x) currently contains only the calibrated wavelength readings from each device.

Dynamic data loading

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# Path to where the data is stored; adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

for test_device in Device.get_devices():
    dataset = SpectralDataset(DATA_PATH, device=test_device) # dataset
    x, y = dataset[0] # load sample data
    print(f'Printing shapes for device: {test_device.name}')
    print(x.shape, y.shape)
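
Indexing with `dataset[0]` suggests a map-style dataset, that is, an object that supports `len()` and integer indexing. The sketch below shows that protocol with a stand-in class; `ToySpectralDataset` is hypothetical and says nothing about how SpectralDataset is implemented internally.

```python
from typing import List, Sequence, Tuple


class ToySpectralDataset:
    """Stand-in map-style dataset; NOT the library's SpectralDataset."""

    def __init__(self, spectra: Sequence[Sequence[float]], targets: Sequence[int]):
        if len(spectra) != len(targets):
            raise ValueError("spectra and targets must have the same length")
        self._spectra = [list(s) for s in spectra]
        self._targets = list(targets)

    def __len__(self) -> int:
        # Number of (spectrum, target) pairs available
        return len(self._spectra)

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        # Return one (x, y) pair, mirroring the `x, y = dataset[0]` usage above
        return self._spectra[index], self._targets[index]


toy = ToySpectralDataset([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0, 1])
x, y = toy[0]
print(len(toy), len(x), y)  # 2 3 0
```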

Properties of dataset

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device

# Create a Dataset object for the SCAN_CODER device only
dataset = SpectralDataset('/home/usr/Datasets/spectra_data', Device.SCAN_CODER)

# Get wavelength range for the device
wavelength = dataset.wavelength
print(wavelength)

# Get disease class codes used in batching
disease_classes = dataset.disease_class_codes
print(f'Supported disease classes {disease_classes}')

# Get plant_type codes used in the batching
plant_types = dataset.plant_type_codes
print(f'Supported crop types: {plant_types}')

Data batching

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.utils.device import Device
from buaiir_spectra.data.dataloader import SpectralDataLoader

# Path to where your dataset is stored; adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)

# Create the dataloader
dataloader = SpectralDataLoader(dataset, batch_size=4)

# Iterate over the batches
for batch in dataloader:
    # extract the x_batch and y_batch
    x_batch, y_batch = batch

    # print the shape of the batches
    print(x_batch.shape, y_batch.shape)

Parameters for data wrangling provided by Dataloader

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device


# Path to where your dataset is stored; adjust accordingly
DATA_PATH = '/home/usr/Datasets/spectra_data'

# Load data for the BIO_SCIENCE device
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.BIO_SCIENCE)

# Create a dataloader with plant labels shuffled
dataloader_with_shuffled_plants = SpectralDataLoader(dataset, batch_size=40, permutate_plants=True)

# Create a dataloader with weeks shuffled
dataloader_with_shuffled_weeks = SpectralDataLoader(dataset, batch_size=40, permutate_weeks=True)

# Create a dataloader with completely shuffled data, best for regularization
dataloader_fully_shuffled = SpectralDataLoader(dataset, batch_size=40, permutate=True)
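
One plausible reading of the three options is that permutate_plants and permutate_weeks reorder whole groups of records while permutate shuffles every record independently. The sketch below illustrates that interpretation only; `shuffle_groups`, the `(plant_label, week)` record layout, and the grouping semantics are assumptions and may not match what SpectralDataLoader does.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[str, int]  # (plant_label, week): hypothetical record layout


def shuffle_groups(
    records: List[Record],
    key: Callable[[Record], Hashable],
    rng: random.Random,
) -> List[Record]:
    """Shuffle the order of groups defined by `key`, keeping order inside each group."""
    groups: Dict[Hashable, List[Record]] = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    group_keys = list(groups)
    rng.shuffle(group_keys)
    return [record for group_key in group_keys for record in groups[group_key]]


records = [(plant, week) for plant in ("P1", "P2", "P3") for week in (1, 2, 3)]
rng = random.Random(0)  # fixed seed so repeated runs give the same order

by_plant = shuffle_groups(records, key=lambda r: r[0], rng=rng)  # plant order shuffled
by_week = shuffle_groups(records, key=lambda r: r[1], rng=rng)   # week order shuffled
fully = records[:]
rng.shuffle(fully)                                               # every record shuffled

print(by_plant)
print(by_week)
print(fully)
```

Group-level shuffling keeps each plant's weekly sequence (or each week's set of plants) intact, which a full shuffle does not.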

Extracting label-specific or week-specific data

from buaiir_spectra.data.dataset import SpectralDataset
from buaiir_spectra.data.dataloader import SpectralDataLoader
from buaiir_spectra.utils.device import Device

DATA_PATH = '/home/usr/Datasets/spectra_data'

# Load data for the SCAN_CODER device
dataset = SpectralDataset(data_path=DATA_PATH, device=Device.SCAN_CODER)
dataloader = SpectralDataLoader(dataset, batch_size=150)


# Load data for a single disease class only, e.g. CMD
x, y = dataloader.load_data_of_disease_class('CMD')

# Load data for a specific label across all weeks
x_1, y_1 = dataloader.load_data_of('BBLB1')

# Checking all supported labels
supported_labels = dataloader.labels
print(supported_labels)
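
Conceptually, both calls select the subset of records that match a field value. The sketch below shows that filtering idea on plain dictionaries; `select`, the field names, and every record except the label 'BBLB1' (which appears in the example above) are hypothetical and not taken from the library or the dataset.

```python
from typing import Any, Dict, List

# Hypothetical records; field names and values are illustrative only.
records: List[Dict[str, Any]] = [
    {"label": "BBLB1", "disease_class": "BLB", "week": 1},
    {"label": "BBLB1", "disease_class": "BLB", "week": 2},
    {"label": "CCMD2", "disease_class": "CMD", "week": 1},
    {"label": "MHLT3", "disease_class": "HLT", "week": 1},
]


def select(rows: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Keep rows whose fields equal every given criterion."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


cmd_only = select(records, disease_class="CMD")  # one disease class, all weeks
one_label = select(records, label="BBLB1")       # one label, all weeks
week_one = select(records, week=1)               # one week, all labels

print(len(cmd_only), len(one_label), len(week_one))  # 1 2 3
```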

Release history

Version	Release date
1.3.7 (this version)	Jun 26, 2026
1.3.6	Jun 26, 2026
1.3.5	Jun 26, 2026
1.3.4	Jun 25, 2026
1.3.3	Jun 17, 2026
1.3.2	Jun 5, 2026
1.3.1	Jun 4, 2026
1.3.0	Jun 4, 2026
1.2.9	Jun 4, 2026
1.2.8	Jun 2, 2026
1.2.7	Jun 2, 2026
1.2.6	Jun 2, 2026
1.2.5	Jun 2, 2026
1.2.4	May 24, 2026
1.2.3	May 24, 2026
1.2.2	May 24, 2026
1.2.1	May 23, 2026
1.2.0	May 21, 2026
0.1.0	May 21, 2026

Download files

Source Distribution

buaiir_spectra-1.3.7.tar.gz (45.5 kB)

Uploaded Jun 26, 2026 Source

Built Distribution

buaiir_spectra-1.3.7-py3-none-any.whl (47.6 kB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file buaiir_spectra-1.3.7.tar.gz.

File metadata

Download URL: buaiir_spectra-1.3.7.tar.gz
Upload date: Jun 26, 2026
Size: 45.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for buaiir_spectra-1.3.7.tar.gz
Algorithm	Hash digest
SHA256	`df5197c4a4dd625bde1e906b901b62b9f625e7d092d2eed856cb1127c1a09cda`
MD5	`b683f772035de7144200abe5e009f58d`
BLAKE2b-256	`77444401fe7e15885641f33cd7f218821139ce1a349ad7725507d299f26862df`
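
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` and `verify` are hypothetical helper names, and the archive is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the table above for buaiir_spectra-1.3.7.tar.gz
EXPECTED_SDIST_SHA256 = "df5197c4a4dd625bde1e906b901b62b9f625e7d092d2eed856cb1127c1a09cda"

archive = Path("buaiir_spectra-1.3.7.tar.gz")  # assumed download location
if archive.exists():
    print("hash matches:", verify(archive, EXPECTED_SDIST_SHA256))
else:
    print(f"{archive} not found; download it first")
```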

File details

Details for the file buaiir_spectra-1.3.7-py3-none-any.whl.

File metadata

Download URL: buaiir_spectra-1.3.7-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 47.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for buaiir_spectra-1.3.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`37e673824521fe6609f452406cc1526ce8ecbe1cd56b8543a7ddec5ef2afcc48`
MD5	`df491f2c1c59e345ef756248431d51da`
BLAKE2b-256	`09a6ff08d497fe6f78ca3ee9756a85f560e12ad55ed1e812885672a7fbc535df`
