A Python Pipeline for Machine Learning with Oghma

PyOghma_ML

PyOghma_ML is a Python-based machine learning pipeline designed to integrate seamlessly with OghmaNano, a powerful platform for simulating and analyzing optoelectronic devices. This library provides tools for training machine learning models, generating predictions, and producing detailed reports based on experimental and simulated data.

The primary goal of PyOghma_ML is to streamline the process of leveraging machine learning for device characterization and prediction. By automating tasks such as data preprocessing, model training, and result generation, PyOghma_ML enables researchers to focus on interpreting results and advancing their research.

Key features of PyOghma_ML include:

  • Training Machine Learning Models: Train deep learning models using simulated data to predict device characteristics.
  • Prediction and Analysis: Use trained models to predict device performance based on experimental data.
  • Customizable Model Settings: Fine-tune model architectures, learning rates, and other hyperparameters to suit specific datasets.
  • Report Generation: Automatically generate LaTeX-based reports and CSV files summarizing predictions and model performance.
  • Integration with OghmaNano: Designed to work directly with OghmaNano's data formats, ensuring compatibility and ease of use.

Whether you are a researcher working on photovoltaics, LEDs, or other optoelectronic devices, PyOghma_ML provides a robust and flexible framework for incorporating machine learning into your workflow.
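
Taken together, a typical session trains networks on simulated data and then applies them to experimental measurements. The condensed sketch below strings together the calls documented in the script sections that follow; paths are placeholders, the partially filled Model_Settings (the full hyperparameter set appears under SubTraining.py) and the abs_dir-free Output call are assumptions for brevity:

# Bird's-eye sketch of the workflow; paths are placeholders
import PyOghma_ML_Private as OML
from PyOghma_ML_Private.Training import Model_Settings

# 1. Train a network on simulated data (full hyperparameters as in SubTraining.py)
m = Model_Settings(permutations_limit=int(10e6), inputs=4)  # remaining fields assumed to have defaults
net = OML.Networks.initialise('/path/to/sim_dir', network_type='Point', model_settings=m)
net.train(0)

# 2. Standardize an experimental measurement and predict device parameters
exp = OML.Input.experiment('/path/to/exp.dat', 'JV', 'Deibel')
exp.standardise_inputs()
net.predict(exp)  # Difference/Residual networks also take an abs_dir argument

# 3. Build and save a report of the predictions
out = OML.Output(net, exp)
out.build_report()
out.save_report('/path/to/results/prediction')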


Installation

Before installing PyOghma_ML, ensure that OghmaNano is installed. You can download OghmaNano from oghma-nano.com.

To install PyOghma_ML, use pip:

# For production use
python -m pip install PyOghma-ML

# For development (from source)
git clone https://github.com/CaiWilliams/PyOghma_ML.git
cd PyOghma_ML
pip install -e .

# With optional dependencies
pip install PyOghma-ML[dev,docs,gpu]

Modern Installation: PyOghma_ML is packaged with pyproject.toml; with pip >= 21.0, all dependencies are resolved automatically.
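
A quick way to confirm the install (the import name is assumed to match the repository name):

# Verify the installation
python -m pip show PyOghma-ML
python -c "import PyOghma_ML"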


Usage and Examples

This guide assumes familiarity with OghmaNano and its machine learning features. If you are new to OghmaNano, refer to the OghmaNano User Manual before proceeding.

For detailed API documentation, visit the PyOghma_ML Documentation.


Machine Learning Inputs

Machine learning networks for training or prediction are defined through the machine learning window in OghmaNano. Once the dataset is generated and vectors are built and normalized, the required files for the pipeline are created.

Required Files:

  • nets.json: Defines the networks, their inputs, and outputs.
  • vectors.csv: Contains normalized parameter values for each device.
  • min_max.csv: Specifies the minimum, maximum, and range for each parameter.

Note: Only nets.json can be modified after dataset and vector generation. Exercise caution when editing it outside OghmaNano.
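
To illustrate how min_max.csv ties normalized predictions back to physical parameter values, here is a small sketch; the column layout (parameter, min, max, range) and the absence of a header row are assumptions for illustration, not the guaranteed file format:

# Hypothetical sketch: undo normalization using min_max.csv
# Assumed columns per row: parameter name, min, max, range (no header)
import pandas as pd

min_max = pd.read_csv('min_max.csv', header=None,
                      names=['parameter', 'min', 'max', 'range'],
                      index_col='parameter')

def denormalize(parameter: str, value: float) -> float:
    """Map a normalized value in [0, 1] back onto the parameter's physical range."""
    row = min_max.loc[parameter]
    return row['min'] + value * row['range']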


Training with Training.py and SubTraining.py

Training.py

This script orchestrates the training of multiple neural network models across different device configurations and network architectures. It features improved error handling, progress reporting, and organized configuration management.

Key Features:

  • Automatic model existence checking to avoid retraining
  • Enhanced error handling with detailed status reporting
  • Progress tracking with clear status indicators (✓/✗)
  • Support for multiple material systems and network types
  • Configurable training parameters for different scenarios
# Example: Basic usage of the improved Training.py
import os
from Training import process_dirs

# Configuration
sim_dir_og = os.path.join('/', 'media', 'cai', 'Big', 'Simulated_Data', 'opkm', 'Networks')

# Train different network types with enhanced error handling
print("Starting training for Point networks...")
process_dirs(['Point_pm6y12'], sim_dir_og, 'Point', 10e6)

print("Starting training for Difference networks...")
process_dirs(['Difference_pm6y12'], sim_dir_og, 'Difference', 10e6)

print("Starting training for Residual networks...")
process_dirs(['Residual_pm6y12'], sim_dir_og, 'Residual', 10e6)

Improved Features in the Updated Script:

  • Error Handling: Comprehensive try/except blocks with detailed error reporting
  • Progress Tracking: Clear status messages with ✓/✗ indicators for success/failure
  • File Validation: Checks for directory existence and valid JSON configurations
  • Subprocess Management: Proper error handling for SubTraining.py execution
  • Organized Configuration: Well-commented sections for different training scenarios
# Training.py - Enhanced version with comprehensive error handling
"""
Training Script for PyOghma ML Models

This script orchestrates the training of multiple neural network models across
different device configurations and network architectures.
"""

import os

def process_dirs(dirs, sim_dir_og, network_type, train_size, extra_arg="1"):
    """
    Process training directories and train missing models.
    
    Args:
        dirs (list[str]): List of directory names to process
        sim_dir_og (str): Base simulation directory path  
        network_type (str): Network architecture type
        train_size (int or float): Number of training samples (e.g. 10e6)
        extra_arg (str): Additional argument for SubTraining.py
    """
    for d in dirs:
        sim_dir = os.path.join(sim_dir_og, d, 'training')
        print(f'Processing simulation directory: {sim_dir}')
        
        # Enhanced error handling and validation
        if not os.path.exists(sim_dir):
            print(f"Warning: Directory {sim_dir} does not exist. Skipping...")
            continue
            
        # Process networks with detailed progress reporting
        # (Full implementation available in the actual script)

# Usage with improved configuration management
if __name__ == "__main__":
    sim_dir_og = os.path.join('/', 'media', 'cai', 'Big', 'Simulated_Data', 'opkm', 'Networks')
    
    print("Starting training for Point networks...")
    process_dirs(['Point_pm6y12'], sim_dir_og, 'Point', 10e6)
    
    print("Training completed!")

SubTraining.py

This script handles individual network training with optimized configurations for different network types. The updated version includes enhanced documentation, improved argument parsing, and better model configuration management.

Key Features:

  • Network-Specific Optimization: Tailored hyperparameters for each network type
  • Flexible Configuration: Command-line arguments for all training parameters
  • Model Settings: Optimized layer architectures and learning rates
  • Error Handling: Comprehensive error reporting and validation
# Example usage (called automatically by Training.py)
python SubTraining.py /path/to/sim_dir 0 Residual 10000000 4

# Arguments:
# 1. sim_dir: Simulation directory path
# 2. idx: Network index to train
# 3. type: Network type (Residual/Difference/Point)
# 4. limit: Training data limit
# 5. inputs: Number of input features

Network Type Configurations:

Network Type  Layers  Nodes  Learning Rate  Batch Size  Epochs
Residual      4       128    8e-5           16384       1024
Difference    4       256    1e-5           16384       1024
Point         4       256    1e-4           1024        4096
# SubTraining.py - Network-specific training with optimized configurations
"""
Individual Network Training Script

Handles training of specific networks with optimized hyperparameters
for different network types (Residual, Difference, Point).
"""

from argparse import ArgumentParser
import PyOghma_ML_Private as OML
from PyOghma_ML_Private.Training import Model_Settings

def configure_model_settings(network_type, limit, inputs):
    """Configure model settings based on network type."""
    if network_type == 'Residual':
        return Model_Settings(
            initializer='he_normal',
            activation='silu',
            layer_nodes=[128, 128, 128, 128],
            dropout=[0.05, 0.05, 0.05, 0.05],
            inital_learning_rate=8e-5,
            batch_size=16384,
            epochs=1024,
            patience=16,
            decay_rate=6e-1,
            permutations_limit=int(limit),
            inputs=int(inputs)
        )
    # Additional configurations for Difference and Point networks...
    
def main():
    parser = ArgumentParser(description="Train individual neural networks")
    parser.add_argument("sim_dir", help="Simulation directory path")
    parser.add_argument("idx", type=int, help="Network index")
    parser.add_argument("type", help="Network type")
    parser.add_argument("limit", help="Training data limit")
    parser.add_argument("inputs", help="Number of inputs")
    
    args = parser.parse_args()
    
    # Configure and train with enhanced error handling
    m = configure_model_settings(args.type, args.limit, args.inputs)
    A = OML.Networks.initialise(args.sim_dir, network_type=args.type, model_settings=m)
    A.train(args.idx)

if __name__ == "__main__":
    main()
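
The elided Difference and Point branches of configure_model_settings can be filled in from the configuration table above. In the sketch below, only the table's columns (layers, nodes, learning rate, batch size, epochs) are confirmed; initializer, activation, dropout, patience, and decay_rate are assumed to match the Residual branch:

# Sketch of the elided branches; see the assumptions noted above
from PyOghma_ML_Private.Training import Model_Settings

_TABLE = {  # values from the Network Type Configurations table
    'Difference': dict(layer_nodes=[256, 256, 256, 256],
                       inital_learning_rate=1e-5, batch_size=16384, epochs=1024),
    'Point':      dict(layer_nodes=[256, 256, 256, 256],
                       inital_learning_rate=1e-4, batch_size=1024, epochs=4096),
}

def configure_other_model_settings(network_type, limit, inputs):
    """Model settings for Difference/Point networks (non-table values assumed)."""
    return Model_Settings(
        initializer='he_normal', activation='silu',      # assumed, as for Residual
        dropout=[0.05, 0.05, 0.05, 0.05],                # assumed, as for Residual
        patience=16, decay_rate=6e-1,                    # assumed, as for Residual
        permutations_limit=int(limit), inputs=int(inputs),
        **_TABLE[network_type])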

Predictions with Predict.py and SubPredict.py

Predict.py

This script orchestrates the prediction pipeline for experimental data using trained neural network models. The updated version features improved error handling, better device mapping, and enhanced progress reporting.

Key Features:

  • Multi-Device Support: Handles various device types with automatic material mapping
  • Experiment Flexibility: Supports multiple experiment types (J-V, Suns-Voc, etc.)
  • Enhanced Error Handling: Comprehensive error checking and detailed status reporting
  • Progress Tracking: Clear progress indicators and completion status
  • Modular Design: Well-organized helper functions for maintainability
# Example: Using the improved Predict.py
from Predict import map_device_to_material, get_experimental, get_illumination_files

# Configuration (base_exp_dir is a placeholder for your experimental data root)
base_exp_dir = '/path/to/experimental/data'
device_types = ["PM6Y12_ZnO", "PM6BTPeC9"]  # Multiple device types supported
experiment_type = "light_dark_jv"  # or "sunsvoc"

# Automatic device-to-material mapping
for device_type in device_types:
    net_mat = map_device_to_material(device_type)
    print(f"Device {device_type} mapped to material: {net_mat}")
    
    # Get experimental files with error handling
    try:
        exp_files = get_experimental(device_type, base_exp_dir, experiment_type)
        print(f"Found {len(exp_files)} experimental files")
    except FileNotFoundError as e:
        print(f"Error: {e}")

Improved Features:

  • Helper Functions: Modular design with get_experimental(), map_device_to_material(), and get_illumination_files()
  • Error Handling: File existence checking and detailed error reporting
  • Status Reporting: Progress indicators (✓/✗) for each processing step
  • Flexible Configuration: Easy-to-modify device types and experiment settings
# Predict.py - Prediction pipeline orchestration
import os
import subprocess
from natsort import natsorted
import json

def get_experimental(device_type, directory, exp):
    """
    Get sorted list of experimental data files for a given device type and experiment.
    """
    exp_dir = os.path.join(directory, device_type, exp)
    files = os.listdir(exp_dir)
    if exp == "light_dark_jv":
        ending = 'am15.dat'
    elif exp == "sunsvoc":
        ending = '0000000uIllu_IV.dat'
    else:
        ending = ''  # no suffix filter: endswith('') matches every file
    files = [os.path.join(exp_dir, f) for f in files if f.endswith(ending)]
    files = natsorted(files)
    return files

if __name__ == "__main__":
    # Define simulation and experimental data directories
    sim_dir_og = os.path.join('/', 'media', 'cai', 'Big', 'Simulated_Data', 'opkm', 'Networks_Extended')
    base_exp_dir = os.path.join('/', 'media', 'cai', 'Big', 'Experimental_Data', 'Data_From_Chen')
    device_type = ["PM6Y12_ZnO"]  # Can include "PM6BTPeC9", "W108-2", etc.
    exp = "light_dark_jv"  # or "sunsvoc"
    res_dir = os.path.join(os.getcwd(), 'Results')

    for idx in device_type:
        # Map device type to network material
        match idx:
            case 'PM6BTPeC9' | 'W108-2':
                net_mat = 'pm6ec9'
            case 'PM6Y12_ZnO':
                net_mat = 'pm6y12'
            case 'PM6Y12_PEIZn':
                net_mat = 'pm6y12_peizn'
            case _:
                raise ValueError(f"Unknown device type: {idx}")

        files = get_experimental(idx, base_exp_dir, exp)

        for jdx in files:
            name = os.path.basename(jdx).split('.')[0]
            # List and filter simulation directories for the current device type
            sim_dirs = [kdx for kdx in os.listdir(sim_dir_og) if kdx.startswith('Residual_') and net_mat in kdx]
            sim_dirs = natsorted(sim_dirs)
            sim_dirs = [os.path.join(sim_dir_og, kdx) for kdx in sim_dirs]
            
            for kdx in sim_dirs:
                training = os.path.join(kdx, 'training')
                conversion = os.path.join(kdx, 'conversion')
                network_type = os.path.basename(kdx).split('_')[0]
                extra_info = os.path.basename(kdx).split('_')[-1]
                
                # Check if result already exists before processing
                out_file = os.path.join(res_dir, network_type, name + '.csv')
                if os.path.isfile(out_file):
                    print(f"File {name}.csv already exists. Skipping processing.")
                    continue

                # Run SubPredict with appropriate arguments
                subprocess.run(["python", "SubPredict.py", training, conversion, jdx, res_dir, network_type, "JV", "Deibel", extra_info])
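
The earlier example imports map_device_to_material() from this script; a minimal sketch of that helper, mirroring the match statement in the listing above (the exact signature in the shipped script is assumed):

# Minimal sketch of map_device_to_material(), mirroring the match statement above
def map_device_to_material(device_type: str) -> str:
    """Return the network material identifier for a known device type."""
    match device_type:
        case 'PM6BTPeC9' | 'W108-2':
            return 'pm6ec9'
        case 'PM6Y12_ZnO':
            return 'pm6y12'
        case 'PM6Y12_PEIZn':
            return 'pm6y12_peizn'
        case _:
            raise ValueError(f"Unknown device type: {device_type}")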

SubPredict.py

This script handles individual prediction tasks using trained neural network models. The updated version features enhanced documentation, improved error handling, and better argument parsing for more reliable predictions.

Key Features:

  • Enhanced Documentation: Comprehensive argument descriptions and usage examples
  • Robust Error Handling: Try/except blocks with detailed error reporting
  • Progress Tracking: Status messages for each prediction step
  • Flexible Output: Configurable output directory structure
  • Network Type Support: Handles different prediction methods for various network types
# Example usage (automatically called by Predict.py)
python SubPredict.py /path/to/sim_dir /path/to/abs_dir /path/to/exp.dat /path/to/results Residual JV Deibel

# Enhanced argument parsing with detailed help
python SubPredict.py --help

Improved Features:

  • Comprehensive Error Handling: Detailed error messages for debugging
  • Progress Reporting: Step-by-step status updates with ✓/✗ indicators
  • Enhanced Documentation: Built-in help text and usage examples
  • Output Organization: Automatic directory creation and file management
  • Network-Specific Logic: Optimized prediction methods for different network types

Prediction Workflow:

  1. Data Loading: Load and standardize experimental data with validation
  2. Network Initialization: Initialize appropriate network type with error checking
  3. Prediction Generation: Execute predictions based on network architecture
  4. Report Building: Generate comprehensive output reports
  5. File Management: Save results with organized directory structure
# SubPredict.py - Enhanced individual prediction script
"""
Individual Prediction Script

Performs predictions on single experimental files with comprehensive
error handling and progress reporting.
"""

import os
from argparse import ArgumentParser

import PyOghma_ML_Private as OML

def main():
    parser = ArgumentParser(description="Individual prediction script")
    # Enhanced argument parsing with detailed help
    parser.add_argument("sim_dir", help="Simulation directory path")
    parser.add_argument("abs_dir", help="Absolute data directory")
    parser.add_argument("exp_dir", help="Experimental data file")
    # ... additional arguments
    
    args = parser.parse_args()
    
    try:
        print(f"Starting prediction for: {os.path.basename(args.exp_dir)}")
        
        # Load and standardize experimental data
        Exp = OML.Input.experiment(args.exp_dir, args.exp_typ, args.Source_Lab)
        Exp.standardise_inputs()
        print(f"✓ Loaded experimental data: {len(Exp.x)} data points")
        
        # Initialize network and generate predictions
        N = OML.Networks.initialise(args.sim_dir, network_type=args.net_typ)
        if args.net_typ in ['Difference', 'Residual']:
            N.predict(args.abs_dir, Exp)
        else:
            N.predict(Exp)
        print("✓ Predictions generated successfully")
        
        # Build and save report (output_path is derived from the results-directory
        # argument, elided above among the additional arguments)
        O = OML.Output(N, Exp, abs_dir=args.abs_dir)
        O.build_report()
        O.save_report(output_path)
        print(f"✓ Results saved to: {output_path}.csv")
        
    except Exception as e:
        print(f"✗ Error during prediction: {e}")
        raise

if __name__ == "__main__":
    main()

Supported Device Types and Materials

The current implementation supports the following device types and their corresponding network materials:

Device Type   Network Material  Description
PM6BTPeC9     pm6ec9            PM6:BTP-eC9 based devices
W108-2        pm6ec9            W108-2 devices
PM6Y12_ZnO    pm6y12            PM6:Y12 with ZnO devices
PM6Y12_PEIZn  pm6y12_peizn      PM6:Y12 with PEI-Zn devices

Supported Experiments

Experiment Type  File Ending          Description
light_dark_jv    am15.dat             Light and dark JV measurements
sunsvoc          0000000uIllu_IV.dat  Suns-Voc measurements

Supported Laboratories

Laboratory    Label
OPKM          Deibel
HSP           Shoaee
Herzig Group  Herzig
OghmaNano     Oghma

Note: For new laboratories or characteristics, custom input functions may need to be written. Contact support for assistance.
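
As a starting point for adding a new laboratory, the loader's job is to read a raw measurement into standardized arrays. A purely hypothetical sketch for a two-column J-V text file (the actual interface expected by Input.experiment may differ):

# Hypothetical loader for a new laboratory's two-column J-V .dat files
import numpy as np

def load_custom_jv(path: str):
    """Read voltage and current-density columns from a plain-text file."""
    data = np.loadtxt(path)
    voltage, current_density = data[:, 0], data[:, 1]
    return voltage, current_density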


Recent Improvements (v0.2.0)

PyOghma_ML has been significantly enhanced with improved documentation, error handling, and code organization:

Enhanced Documentation

  • Comprehensive Module Docstrings: All scripts now include detailed module-level documentation
  • Function Documentation: Improved docstrings with parameter types, return values, and examples
  • Usage Examples: Clear examples and configuration guidance throughout

Improved Error Handling

  • Robust Validation: File existence checking and directory validation
  • Detailed Error Messages: Clear error reporting with troubleshooting information
  • Graceful Failure: Scripts continue processing when individual tasks fail
  • Status Indicators: Visual progress tracking with ✓/✗ status markers

Code Structure Enhancements

  • Modular Design: Refactored scripts into well-organized functions
  • Helper Functions: Separated concerns for better maintainability
  • Configuration Management: Organized and documented configuration sections
  • Modern Python: Updated to use modern Python packaging with pyproject.toml

User Experience Improvements

  • Progress Tracking: Real-time status updates during training and prediction
  • Better Logging: Informative console output with clear progress indicators
  • Organized Output: Structured result directories and file naming
  • Enhanced CLI: Improved command-line argument parsing with detailed help

Example Workflow

1. Training Networks (Enhanced with Error Handling)

# Use the improved Training.py with automatic error handling
from Training import process_dirs

# The script now provides detailed status reporting
print("=== Starting Neural Network Training ===")

# Train different network types with enhanced monitoring
configurations = [
    (['Point_pm6y12'], 'Point', 10e6),
    (['Difference_pm6y12'], 'Difference', 10e6),
    (['Residual_pm6y12'], 'Residual', 10e6)
]

sim_dir_og = '/path/to/simulation/data'

for dirs, network_type, train_size in configurations:
    print(f"\n--- Training {network_type} Networks ---")
    try:
        process_dirs(dirs, sim_dir_og, network_type, train_size)
        print(f"✓ {network_type} training completed successfully")
    except Exception as e:
        print(f"✗ {network_type} training failed: {e}")

2. Generate Predictions (Improved Pipeline)

# Use the enhanced Predict.py with better device mapping
from Predict import map_device_to_material, get_experimental

# Configuration with multiple device types (base_exp_dir is a placeholder)
base_exp_dir = '/path/to/experimental/data'
device_types = ["PM6Y12_ZnO", "PM6BTPeC9", "W108-2"]
experiment_type = "light_dark_jv"

print("=== Starting Prediction Pipeline ===")

for device_type in device_types:
    print(f"\n--- Processing {device_type} ---")
    
    try:
        # Automatic device-to-material mapping
        net_mat = map_device_to_material(device_type)
        print(f"✓ Mapped {device_type} to material: {net_mat}")
        
        # Load experimental files with validation
        exp_files = get_experimental(device_type, base_exp_dir, experiment_type)
        print(f"✓ Found {len(exp_files)} experimental files")
        
        # Process predictions with progress tracking
        # (Detailed processing handled automatically)
        
    except Exception as e:
        print(f"✗ Error processing {device_type}: {e}")
        continue

print("✓ Prediction pipeline completed")

3. Network Type Optimization

Network Type  Architecture  Use Case               Key Features
Residual      4×128 nodes   Residual prediction    Fast training, efficient memory
Difference    4×256 nodes   Differential analysis  High accuracy, robust to noise
Point         4×256 nodes   Direct prediction      Minimal data requirements

4. Testing Data Loading

# Use the enhanced LoadingFileTest.py for validation
from LoadingFileTest import main

# Test data loading with multiple optical densities
print("=== Testing Data Loading Functionality ===")

# The script provides comprehensive testing:
# - Multiple OD values (0.0, 1.0, 2.0, 3.0)
# - Error handling for missing files
# - Visual validation with enhanced plots
# - Detailed progress reporting

main()  # Automatically runs all tests with status indicators
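
The pattern the test script follows can be sketched as below; load_measurement and its file-naming scheme are hypothetical stand-ins for the actual routine under test:

# Hypothetical sketch of the OD sweep performed by LoadingFileTest.py
import numpy as np
import matplotlib.pyplot as plt

def load_measurement(od):
    """Hypothetical stand-in for the loading routine under test."""
    return np.loadtxt(f"measurement_od_{od:.1f}.dat")  # hypothetical naming scheme

for od in [0.0, 1.0, 2.0, 3.0]:
    try:
        data = load_measurement(od)
        plt.plot(data[:, 0], data[:, 1], label=f"OD {od}")
        print(f"✓ OD {od}: loaded {len(data)} rows")
    except OSError as e:  # np.loadtxt raises a FileNotFoundError (an OSError) for missing files
        print(f"✗ OD {od}: {e}")

plt.legend()
plt.show()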

Requirements and Dependencies

PyOghma_ML requires Python 3.9+ and includes the following key dependencies:

Core Dependencies

  • Scientific Computing: NumPy ≥1.24.0, SciPy ≥1.11.0, Pandas ≥2.1.0
  • Machine Learning: TensorFlow ≥2.15.0, Keras ≥3.0.0, Keras-Tuner ≥1.4.0
  • Visualization: Matplotlib ≥3.7.0
  • Data Processing: h5py ≥3.9.0, ujson ≥5.8.0, natsort ≥8.4.0
  • Templating: Jinja2 ≥3.1.0

Optional Dependencies

# Development tools
pip install PyOghma-ML[dev]  # pytest, black, flake8, mypy

# Documentation generation
pip install PyOghma-ML[docs]  # sphinx, themes, parsers

# GPU acceleration
pip install PyOghma-ML[gpu]  # tensorflow with CUDA support

System Requirements

  • Python: 3.9, 3.10, 3.11, or 3.12
  • Operating System: Cross-platform (Windows, macOS, Linux)
  • Memory: Minimum 8GB RAM (16GB recommended for large datasets)
  • Storage: 1GB+ for models and results
