Optixcel: Fast & Lightweight ML for Optical Property Prediction

🧠 OPTICAL MIND - Complete Research System Documentation

Overview

OpticalMind is a self-diagnosing machine learning system for predicting the optical properties of perovskites. It combines an XGBoost + Random Forest ensemble with automated data diagnostics, overfitting checks, and feature-importance-based explanations.


Core Components

1. Data Preprocessor

  • ✅ Automatic data structure analysis
  • ✅ Missing value handling (statistical vs domain-aware)
  • ✅ Robust normalization (RobustScaler)
  • ✅ Constant feature detection

Usage:

preprocessor = DataPreprocessor(verbose=True)
analysis = preprocessor.analyze_data(X)
X_clean, y_clean = preprocessor.handle_missing_values(X, y)
X_norm = preprocessor.normalize(X_clean, fit=True)
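
To illustrate the robust normalization and constant feature detection listed above, here is a minimal NumPy sketch. It is not the package's code: `robust_scale` is a hypothetical helper that centers each column on its median and divides by its interquartile range, which is what scikit-learn's RobustScaler computes with default settings.

```python
import numpy as np

def robust_scale(X):
    """Center each column on its median and divide by its interquartile range."""
    X = np.asarray(X, dtype=float)
    median = np.nanmedian(X, axis=0)
    q1, q3 = np.nanpercentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    # A column is constant when its minimum equals its maximum.
    constant = np.nanmax(X, axis=0) == np.nanmin(X, axis=0)
    # Avoid dividing by zero for columns with no spread.
    scale = np.where(iqr == 0, 1.0, iqr)
    return (X - median) / scale, constant

X_demo = np.array([
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 20.0],
    [3.0, 5.0, 30.0],
    [4.0, 5.0, 40.0],
    [100.0, 5.0, 50.0],   # extreme value in column 0
])
X_scaled, constant_mask = robust_scale(X_demo)
print(constant_mask)   # column 1 is flagged as constant
print(X_scaled[:, 0])  # the extreme value stays large but does not distort the rest
```

Because the median and IQR ignore the tails, a single extreme value does not compress the other rows the way mean/standard-deviation scaling would.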

2. Diagnostics Engine

  • ✅ Statistical outlier detection (Z-score + IQR)
  • ✅ Feature inconsistency analysis
  • ✅ Overfitting detection (train vs validation gap)
  • ✅ Feature correlation analysis

Usage:

diagnostics = DiagnosticsEngine(verbose=True)
outliers = diagnostics.detect_statistical_outliers(X, y, threshold=3.0)
feature_issues = diagnostics.detect_feature_inconsistencies(X)
overfit_analysis = diagnostics.detect_overfitting(model, X_train, y_train, X_val, y_val)
correlations = diagnostics.analyze_correlations(X)
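
The Z-score + IQR outlier check can be sketched as follows. This is an illustration under assumptions, not the engine's implementation: the helper names, the 1.5 IQR multiplier, and the choice to take the union of both flags are all hypothetical. Only `threshold=3.0` comes from the usage above.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros(values.shape, dtype=bool)
    return np.abs((values - values.mean()) / std) > threshold

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Twenty ordinary targets and one extreme value at index 20.
y_demo = np.array([10.0] * 10 + [11.0] * 10 + [50.0])
flag_z = zscore_outliers(y_demo, threshold=3.0)
flag_iqr = iqr_outliers(y_demo)

# Assumption: a row counts as an outlier if either test flags it.
outlier_indices = np.flatnonzero(flag_z | flag_iqr)
print(outlier_indices)
```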

3. Feature Engineer

  • ✅ Intelligent feature selection (SelectKBest)
  • ✅ Polynomial feature generation
  • ✅ Feature interaction creation

Usage:

engineer = FeatureEngineer(verbose=True)
X_selected = engineer.select_features(X, y, n_features=50)
X_engineered = engineer.create_polynomial_features(X, degree=2)
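
SelectKBest ranks features by a univariate score and keeps the top k. The scoring function used by `select_features` is not stated here, so the sketch below assumes absolute Pearson correlation with the target, which gives the same ranking as scikit-learn's `f_regression`. `select_top_k` is a hypothetical helper written for illustration.

```python
import numpy as np

def select_top_k(X, y, k):
    """Return the indices of the k features most correlated with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have a zero denominator; give them a score of 0.
    safe = np.where(denom > 0, denom, 1.0)
    scores = np.where(denom > 0, np.abs(Xc.T @ yc) / safe, 0.0)
    top = np.sort(np.argsort(scores)[::-1][:k])
    return top, scores

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
# Only features 1 and 3 carry signal in this synthetic target.
y_demo = 3.0 * X_demo[:, 1] - 2.0 * X_demo[:, 3] + rng.normal(scale=0.1, size=200)

selected, scores = select_top_k(X_demo, y_demo, k=2)
print(selected)
```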

4. Comprehensive Evaluator

Calculates 8+ evaluation metrics:

  • R²: Coefficient of determination
  • NSE: Nash-Sutcliffe Efficiency
  • RMSE: Root Mean Squared Error
  • MAE: Mean Absolute Error
  • MAPE: Mean Absolute Percentage Error
  • VAR%: Variance Explained (%)
  • PI: Performance Index
  • a10/a20: Percentage of predictions within 10%/20% relative error

Usage:

evaluator = ComprehensiveEvaluator()
metrics = evaluator.calculate_metrics(y_true, y_pred)
formatted = evaluator.format_metrics(metrics, phase="Test")
print(formatted)
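
For reference, the standard definitions of several of these metrics can be written in a few lines of NumPy. This is a sketch, not the evaluator's code: `basic_metrics` and its dictionary keys are hypothetical, and VAR% and PI are left out because their formulas are not given in this document. NSE uses the same formula as R² in this form, which is consistent with the identical R² and NSE values reported further down.

```python
import numpy as np

def basic_metrics(y_true, y_pred):
    """Compute R², NSE, RMSE, MAE, MAPE, a10 and a20 from their usual definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # Relative error is undefined where y_true == 0.
    rel_err = np.abs(residual) / np.abs(y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "r2": r2,
        "nse": r2,  # Nash-Sutcliffe Efficiency: same expression as R² here
        "rmse": np.sqrt(np.mean(residual ** 2)),
        "mae": np.mean(np.abs(residual)),
        "mape": 100.0 * np.mean(rel_err),
        "a10": 100.0 * np.mean(rel_err <= 0.10),
        "a20": 100.0 * np.mean(rel_err <= 0.20),
    }

y_true = np.array([1.0, 2.0, 4.0, 5.0])
y_pred = np.array([1.05, 2.5, 4.0, 4.0])
m = basic_metrics(y_true, y_pred)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```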

5. Explainability Engine

  • ✅ Feature importance calculation
  • ✅ SHAP-ready architecture
  • ✅ Prediction explanation generation

Usage:

explainer = ExplainabilityEngine(verbose=True)
importance = explainer.calculate_feature_importance(model, X)
explanation = explainer.explain_prediction(features, feature_names)

Main OpticalMind Class

Initialization

from optical_mind import OpticalMind

mind = OpticalMind(
    verbose=True,      # Print all diagnostics
    n_features=50,     # Select top 50 features
    random_state=42    # For reproducibility
)

Training with Full Diagnostics

report = mind.fit(
    X,                      # Input features
    y,                      # Target values
    test_size=0.2,         # 20% test split
    validation_size=0.1    # 10% of training for validation
)
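
Since `validation_size` is a fraction of the training portion rather than of the full dataset, the resulting split sizes are worth working out once. The helper below is a hypothetical sketch of that arithmetic; the exact rounding used inside `fit` is an assumption.

```python
def split_sizes(n_samples, test_size=0.2, validation_size=0.1):
    """Return (n_train, n_val, n_test) for a test split followed by a validation split."""
    n_test = round(n_samples * test_size)
    n_trainval = n_samples - n_test
    # The validation fraction applies to what remains after the test split.
    n_val = round(n_trainval * validation_size)
    n_train = n_trainval - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(10_000)
print(n_train, n_val, n_test)  # 7200 800 2000
```

With 10,000 samples the model therefore trains on 72% of the data, validates on 8%, and tests on 20%.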

Training Phases (Automatic):

  1. Data analysis and characterization
  2. Preprocessing and normalization
  3. Comprehensive diagnostics
  4. Feature engineering and selection
  5. Ensemble model training (XGBoost + Random Forest)
  6. Overfitting detection
  7. Evaluation with 8+ metrics
  8. Feature importance calculation
  9. Complete diagnostics summary

Prediction

# Basic prediction
predictions = mind.predict(X_test)

# Prediction with uncertainty
predictions, uncertainties = mind.predict(
    X_test,
    return_uncertainty=True
)
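
This document does not say how the uncertainty values are computed. One common approach for an ensemble, shown here purely as an assumption, is to average the member predictions and report their spread as the uncertainty. `combine_ensemble` and the sample arrays are hypothetical.

```python
import numpy as np

def combine_ensemble(member_predictions):
    """Average member predictions; use their standard deviation as a spread estimate."""
    stacked = np.vstack(member_predictions)   # shape: (n_models, n_samples)
    return stacked.mean(axis=0), stacked.std(axis=0)

# Stand-ins for the outputs of the two ensemble members.
xgb_pred = np.array([1.00, 2.10, 3.00])
rf_pred = np.array([1.10, 1.90, 3.00])

mean_pred, spread = combine_ensemble([xgb_pred, rf_pred])
print(mean_pred)  # combined prediction per sample
print(spread)     # zero where the members agree exactly
```

With only two members the spread is a coarse signal: it shows where the models disagree, not a calibrated confidence interval.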

Getting Results

# Diagnostics summary
summary = mind.get_diagnostics_summary()
print(summary)

# Full training report
report = mind.training_report
print(f"Test R²: {report['test_metrics']['r2']:.4f}")

# Save model
mind.save('optical_mind_model.pkl')

# Load model
loaded_mind = OpticalMind.load('optical_mind_model.pkl')

Complete Example

import pandas as pd
from optical_mind import OpticalMind

# Load data
df = pd.read_csv('final_170K_complete_optical.csv')

# Prepare
X = df.drop(columns=['target']).values
y = df['target'].values

# Create intelligent system
mind = OpticalMind(verbose=True, n_features=50)

# Train with complete diagnostics
report = mind.fit(X, y)

# Make predictions
predictions = mind.predict(X[:100])
predictions_unc, uncertainties = mind.predict(X[:100], return_uncertainty=True)

# Get summary
print(mind.get_diagnostics_summary())

# Save for later use
mind.save('perovskite_predictor.pkl')

Diagnostics Output Explanation

Data Analysis Phase

Shows:

  • Number of samples and features
  • Memory usage
  • Constant and near-constant features

Diagnostics Phase

Shows:

  • Outliers: Number detected (% of total)
  • Feature Issues:
    • Zero variance features
    • Highly skewed features
    • High kurtosis features
  • Correlation Analysis: Redundant feature pairs
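
Redundant feature pairs are typically found by thresholding the absolute pairwise correlation. The sketch below assumes a cutoff of 0.95, which is a hypothetical value; the threshold used by the Diagnostics Engine is not stated here.

```python
import numpy as np

def redundant_pairs(X, threshold=0.95):
    """Return (i, j, correlation) for feature pairs with |correlation| above the threshold."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    # Look only at the upper triangle so each pair is reported once.
    i_idx, j_idx = np.triu_indices(corr.shape[0], k=1)
    mask = np.abs(corr[i_idx, j_idx]) > threshold
    return [(int(i), int(j), float(corr[i, j]))
            for i, j in zip(i_idx[mask], j_idx[mask])]

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
# Column 1 is almost a scaled copy of column 0; column 2 is independent.
X_demo = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=500), b])

pairs = redundant_pairs(X_demo)
print(pairs)
```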

Overfitting Analysis

  • Train R² vs Validation R²
  • Gap between them
  • Severity assessment: none / mild / moderate / severe
  • Recommendations if detected
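
The severity label can be thought of as a bucketed train/validation gap. The cutoffs below (0.02, 0.05, 0.10) are hypothetical and chosen only for illustration; the thresholds the system actually uses are not documented here.

```python
def overfitting_severity(train_r2, val_r2, thresholds=(0.02, 0.05, 0.10)):
    """Map the train-minus-validation R² gap to a severity label."""
    gap = train_r2 - val_r2
    mild, moderate, severe = thresholds
    if gap < mild:
        label = "none"
    elif gap < moderate:
        label = "mild"
    elif gap < severe:
        label = "moderate"
    else:
        label = "severe"
    return gap, label

gap, label = overfitting_severity(0.9970, 0.9971)
print(f"gap = {gap:+.4f}, severity = {label}")
```

A negative gap means validation scored at least as well as training, so it always falls in the "none" bucket.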

Performance Metrics

All 8+ metrics for Train/Validation/Test:

R² Score ................... 0.997019 (best possible: 1.0)
NSE ........................ 0.997019 (1.0 = perfect)
RMSE ...................... 0.000012 (lower is better)
MAE ........................ 0.000009 (lower is better)
Variance Explained (%) ... 92.41%   (higher is better)
PI ......................... 0.935133 (0-1, higher is better)
a10 (err ≤ 10%) ........... 60.25%   (% within 10% error)
a20 (err ≤ 20%) ........... 64.30%   (% within 20% error)

Architecture Diagram

INPUT DATA
    ↓
[Data Preprocessor]
    ├─ Analyze structure
    ├─ Handle missing values
    ├─ Normalize
    └─ Remove problematic rows
    ↓
[Diagnostics Engine]
    ├─ Detect outliers
    ├─ Find feature inconsistencies
    ├─ Analyze correlations
    └─ Generate diagnostics report
    ↓
[Feature Engineering]
    ├─ Select top K features
    ├─ Create interactions (optional)
    └─ Generate polynomial features (optional)
    ↓
[Model Training - ENSEMBLE]
    ├─ XGBoost (gradient boosting)
    ├─ Random Forest (tree-based)
    └─ Combined predictions
    ↓
[Evaluation]
    ├─ Calculate 8+ metrics
    ├─ Detect overfitting
    └─ Generate performance report
    ↓
[Explainability]
    ├─ Feature importance
    ├─ SHAP analysis (ready)
    └─ Prediction explanations
    ↓
OUTPUT: Predictions + Diagnostics + Explanations

File Structure

optical_mind_core.py          # Core modules (preprocessing, diagnostics, etc)
optical_mind.py              # Main OpticalMind class
optical_mind_demo.py         # Demo script with full example
OPTICAL_MIND_README.md       # This documentation

Key Results from Test Run (10K samples)

Metric   Train      Validation   Test
R²       0.9970     0.9971       0.9970
NSE      0.9970     0.9971       0.9970
RMSE     0.000012   0.000012     0.000012
MAE      0.000009   0.000009     0.000009
a10      62.26%     58.50%       60.25%
a20      66.24%     62.75%       64.30%

Diagnostics Summary:

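  • Outliers detected: 1,145 (15.9%; this is consistent with 1,145 of the 7,200 training rows left from 10K samples after the 0.2 test and 0.1 validation splits)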
  • Feature issues: 36 total
  • Redundant pairs: 232
  • Overfitting: NONE (gap = -0.0006)
  • Top feature importance: 0.248

Advanced Usage

Custom Hyperparameters

mind = OpticalMind(
    verbose=True,
    n_features=75,      # Use more features
    random_state=123
)

Accessing Raw Results

# Training metrics
train_metrics = mind.training_report['train_metrics']
val_metrics = mind.training_report['val_metrics']
test_metrics = mind.training_report['test_metrics']

# Overfitting analysis
overfit = mind.training_report['overfit_analysis']
print(f"Overfitting severity: {overfit['severity']}")

# Feature importance
feature_imp = mind.training_report['feature_importance']

Diagnostics Details

# Raw diagnostics
outliers = mind.diagnostics_report['outliers']
feature_issues = mind.diagnostics_report['feature_issues']
correlations = mind.diagnostics_report['correlations']

# Process as needed
outlier_indices = outliers['indices']
redundant_pairs = correlations['redundant_pairs']
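
One way to use the outlier indices is to drop the flagged rows before refitting. The sketch below uses small stand-in arrays and assumes `outliers['indices']` holds integer row positions into the same array that was analyzed; check which split the indices refer to before applying this to your own data.

```python
import numpy as np

X_demo = np.arange(12, dtype=float).reshape(6, 2)
y_demo = np.arange(6, dtype=float)
outlier_indices = np.array([1, 4])   # stand-in for outliers['indices']

# Build a boolean mask that keeps every row except the flagged ones.
keep = np.ones(len(X_demo), dtype=bool)
keep[outlier_indices] = False

X_filtered, y_filtered = X_demo[keep], y_demo[keep]
print(X_filtered.shape, y_filtered)
```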

Requirements

numpy >= 1.20
pandas >= 1.2
scikit-learn >= 0.24
xgboost >= 1.5
scipy >= 1.6
joblib >= 1.0

Install with:

pip install numpy pandas scikit-learn xgboost scipy joblib

Performance Tips

  1. Large Datasets: Use sampling for faster iteration

    sample_size = 50000
    X_sample = X[:sample_size]
    y_sample = y[:sample_size]
    mind.fit(X_sample, y_sample)
    
  2. Fewer Features: Reduces training time

    mind = OpticalMind(n_features=30)  # vs default 50
    
  3. Parallel Processing: Automatically uses all CPUs

    • No configuration needed!
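
Regarding tip 1: slicing with `X[:sample_size]` takes the first rows of the file, which can bias the sample if the data is ordered. A random subsample avoids that. `random_subsample` below is a hypothetical helper, not part of the package.

```python
import numpy as np

def random_subsample(X, y, sample_size, seed=42):
    """Draw sample_size rows without replacement, keeping X and y aligned."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    return X[idx], y[idx]

# Stand-in data: row i of X is [2*i, 2*i + 1] and y[i] == i.
X_all = np.arange(2000).reshape(1000, 2)
y_all = np.arange(1000)

X_sample, y_sample = random_subsample(X_all, y_all, sample_size=100)
print(X_sample.shape, y_sample.shape)
```

The resulting arrays can be passed to `mind.fit(X_sample, y_sample)` in place of the head slice.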

Troubleshooting

Issue: Memory error with full dataset

  • Solution: Fit on a subsample of the rows or reduce n_features

Issue: Very high MAPE but low RMSE

  • Solution: Data has values close to zero; MAPE is less reliable
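
A small worked example shows why MAPE becomes unreliable near zero: the same absolute error produces a very large percentage error when the true value is tiny. The helper functions and numbers below are illustrative only.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Every prediction is off by 0.01, but one true value is close to zero.
y_true = np.array([1.0, 2.0, 0.001])
y_pred = np.array([1.01, 2.01, 0.011])

print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"MAPE: {mape(y_true, y_pred):.1f}%")
```

Here the RMSE stays at 0.01 while the MAPE is above 300%, driven almost entirely by the single near-zero target. In that situation RMSE and MAE are the more informative metrics.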

Issue: Overfitting detected

  • Solution: System will recommend:
    • Use less complex models
    • Increase regularization
    • Add more training data

Issue: Poor predictions

  • Solution: Check diagnostics:
    print(mind.get_diagnostics_summary())
    
    • High outliers? → Review data quality
    • Many feature issues? → Domain knowledge needed
    • High redundancy? → Many features are strongly correlated; consider dropping one from each redundant pair

License

MIT License - Free for research and commercial use


Citation

If you use OpticalMind in your research, please cite:

OpticalMind: Intelligent ML System for Optical Property Prediction
Author: Muhammad Wajdan Jamal
Version: 1.0.0
Year: 2026

Support & Feedback

For issues, suggestions, or improvements:

  1. Check this documentation
  2. Review the demo script
  3. Check diagnostic output
  4. Iterate based on recommendations

Last Updated: April 5, 2026
Status: ✅ Production Ready
Quality: Research-Grade
