Optixcel: Fast & Lightweight ML for Optical Property Prediction
🧠 OPTICAL MIND - Complete Research System Documentation
Overview
OpticalMind is a self-analyzing machine learning system for predicting the optical properties of perovskites. It combines an XGBoost + Random Forest ensemble with automated data diagnostics and feature-importance-based explainability.
Core Components
1. Data Preprocessor
- ✅ Automatic data structure analysis
- ✅ Missing value handling (statistical vs domain-aware)
- ✅ Robust normalization (RobustScaler)
- ✅ Constant feature detection
Usage:
preprocessor = DataPreprocessor(verbose=True)
analysis = preprocessor.analyze_data(X)
X_clean, y_clean = preprocessor.handle_missing_values(X, y)
X_norm = preprocessor.normalize(X_clean, fit=True)
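For intuition about the robust normalization step, the following is a minimal sketch of what scikit-learn's RobustScaler does with its default settings: center each feature on its median and divide by its interquartile range (IQR). The helper name `robust_scale` is hypothetical and not part of the OpticalMind API, and the sketch uses only the standard library rather than whatever `DataPreprocessor.normalize` does internally.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    # "inclusive" quartiles use linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # A constant feature has IQR == 0; fall back to a scale of 1.0.
    scale = iqr if iqr > 0 else 1.0
    center = median(values)
    return [(v - center) / scale for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme value
scaled = robust_scale(feature)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Because the median and IQR ignore the extreme value, the four typical points keep a usable spread, whereas mean/standard-deviation scaling would squeeze them together.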
2. Diagnostics Engine
- ✅ Statistical outlier detection (Z-score + IQR)
- ✅ Feature inconsistency analysis
- ✅ Overfitting detection (train vs validation gap)
- ✅ Feature correlation analysis
Usage:
diagnostics = DiagnosticsEngine(verbose=True)
outliers = diagnostics.detect_statistical_outliers(X, y, threshold=3.0)
feature_issues = diagnostics.detect_feature_inconsistencies(X)
overfit_analysis = diagnostics.detect_overfitting(model, X_train, y_train, X_val, y_val)
correlations = diagnostics.analyze_correlations(X)
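To illustrate the two outlier tests named above, here is a small standard-library sketch of Z-score and IQR detection on a single feature. The function names, the 1.5 × IQR fence, and the choice to take the union of both tests are assumptions made for illustration; the documentation does not say how `detect_statistical_outliers` combines them.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute Z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()  # constant feature: nothing can be flagged
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}


def iqr_outliers(values, k=1.5):
    """Indices outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, v in enumerate(values) if v < low or v > high}


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 10,
          11, 9, 10, 12, 10, 11, 9, 10, 10, 50]
# Assumption: a point is an outlier if either test flags it.
flagged = zscore_outliers(values) | iqr_outliers(values)
print(sorted(flagged))  # [19]
```

Note that the Z-score test needs enough samples to work: with population standard deviation, no point in a sample of size n can exceed a Z-score of sqrt(n - 1), so a threshold of 3.0 can never fire on 10 or fewer points.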
3. Feature Engineer
- ✅ Intelligent feature selection (SelectKBest)
- ✅ Polynomial feature generation
- ✅ Feature interaction creation
Usage:
engineer = FeatureEngineer(verbose=True)
X_selected = engineer.select_features(X, y, n_features=50)
X_engineered = engineer.create_polynomial_features(X, degree=2)
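As a rough picture of what degree-2 polynomial feature generation produces, the sketch below expands one row into its original values, squares, and pairwise products. It mirrors the column order of scikit-learn's PolynomialFeatures with `include_bias=False`, but whether `create_polynomial_features` uses that class is an assumption, and `polynomial_features` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2):
    """Return all monomials of the inputs from degree 1 up to `degree`."""
    expanded = []
    for d in range(1, degree + 1):
        # Each combination is one monomial, e.g. (a, b) -> a * b.
        for combo in combinations_with_replacement(row, d):
            expanded.append(prod(combo))
    return expanded


print(polynomial_features([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

The expansion grows quickly: 50 selected features become 50 + 1,275 = 1,325 columns at degree 2, so it is usually worth selecting features before expanding them.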
4. Comprehensive Evaluator
Calculates 8+ evaluation metrics:
- R²: Coefficient of determination
- NSE: Nash-Sutcliffe Efficiency
- RMSE: Root Mean Squared Error
- MAE: Mean Absolute Error
- MAPE: Mean Absolute Percentage Error
- VAR%: Variance Explained (%)
- PI: Performance Index
- a10/a20: % predictions ≤ 10%/20% error
Usage:
evaluator = ComprehensiveEvaluator()
metrics = evaluator.calculate_metrics(y_true, y_pred)
formatted = evaluator.format_metrics(metrics, phase="Test")
print(formatted)
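The sketch below shows the standard definitions behind several of the listed metrics, using only the standard library. It is not the `ComprehensiveEvaluator` implementation: the function name and all dictionary keys other than `r2` are assumptions, a10/a20 are taken to mean the share of predictions whose relative error is at most 10%/20%, and VAR% and PI are left out because the documentation does not define them. In this form the Nash-Sutcliffe Efficiency has the same expression as R², which is consistent with the identical R² and NSE values reported in this document.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, MAPE and a10/a20 for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    sse = sum(e * e for e in errors)                 # residual sum of squares
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # Relative errors; this requires every true value to be non-zero.
    rel = [abs(e) / abs(t) for e, t in zip(errors, y_true)]
    return {
        "r2": 1.0 - sse / sst,
        "rmse": sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(rel) / n,
        "a10": 100.0 * sum(r <= 0.10 for r in rel) / n,
        "a20": 100.0 * sum(r <= 0.20 for r in rel) / n,
    }


example = regression_metrics([1.0, 2.0, 4.0, 5.0], [1.05, 2.3, 4.2, 6.5])
print({name: round(value, 4) for name, value in example.items()})
```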
5. Explainability Engine
- ✅ Feature importance calculation
- ✅ SHAP-ready architecture
- ✅ Prediction explanation generation
Usage:
explainer = ExplainabilityEngine(verbose=True)
importance = explainer.calculate_feature_importance(model, X)
explanation = explainer.explain_prediction(features, feature_names)
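The documentation does not say how `calculate_feature_importance` derives its scores. One model-agnostic option is permutation importance, sketched below as a swapped-in technique rather than the package's method: shuffle one feature column at a time and record how much the mean squared error increases. The toy model, data, and function names are all hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model over the rows of X."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed=42):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for col in range(len(X[0])):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this one column permuted.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances


def toy_model(row):
    """Toy model that depends only on the first feature."""
    return 2.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [toy_model(row) for row in X]
importance = permutation_importance(toy_model, X, y)
print(importance)  # large value for feature 0, exactly 0.0 for feature 1
```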
Main OpticalMind Class
Initialization
from optical_mind import OpticalMind
mind = OpticalMind(
verbose=True, # Print all diagnostics
n_features=50, # Select top 50 features
random_state=42 # For reproducibility
)
Training with Full Diagnostics
report = mind.fit(
X, # Input features
y, # Target values
test_size=0.2, # 20% test split
validation_size=0.1 # 10% of training for validation
)
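Because `validation_size` is a fraction of the training portion rather than of the full dataset, the effective split is roughly 72% train, 8% validation, and 20% test. A short sketch of that arithmetic follows; `split_sizes` is a hypothetical helper, and the rounding rule is an assumption that may differ from the package's by a sample or two.

```python
def split_sizes(n_samples, test_size=0.2, validation_size=0.1):
    """Return (n_train, n_val, n_test) for a two-stage split."""
    n_test = round(n_samples * test_size)
    # The validation fraction applies to what is left after the test split.
    n_val = round((n_samples - n_test) * validation_size)
    n_train = n_samples - n_test - n_val
    return n_train, n_val, n_test


print(split_sizes(10_000))  # (7200, 800, 2000)
```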
Training Phases (Automatic):
- Data analysis and characterization
- Preprocessing and normalization
- Comprehensive diagnostics
- Feature engineering and selection
- Ensemble model training (XGBoost + Random Forest)
- Overfitting detection
- Evaluation with 8+ metrics
- Feature importance calculation
- Complete diagnostics summary
Prediction
# Basic prediction
predictions = mind.predict(X_test)
# Prediction with uncertainty
predictions, uncertainties = mind.predict(
X_test,
return_uncertainty=True
)
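How the uncertainty values are computed is not documented here. One common approach for an ensemble, shown below purely as an assumption, is to report the mean of the member predictions as the prediction and their standard deviation as the uncertainty. The function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_predict(member_predictions):
    """Combine per-model predictions into a mean and a spread per sample."""
    # Transpose so that each tuple holds every model's prediction for one sample.
    per_sample = list(zip(*member_predictions))
    predictions = [mean(p) for p in per_sample]
    uncertainties = [pstdev(p) for p in per_sample]
    return predictions, uncertainties


xgb_preds = [1.0, 2.0, 3.0]  # stand-in for XGBoost outputs
rf_preds = [1.0, 2.5, 5.0]   # stand-in for Random Forest outputs
preds, unc = ensemble_predict([xgb_preds, rf_preds])
print(preds)  # [1.0, 2.25, 4.0]
print(unc)    # [0.0, 0.25, 1.0]
```

With only two members the spread is simply half the absolute difference between the models, so it indicates disagreement rather than a calibrated confidence interval.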
Getting Results
# Diagnostics summary
summary = mind.get_diagnostics_summary()
print(summary)
# Full training report
report = mind.training_report
print(f"Test R²: {report['test_metrics']['r2']:.4f}")
# Save model
mind.save('optical_mind_model.pkl')
# Load model
loaded_mind = OpticalMind.load('optical_mind_model.pkl')
Complete Example
import pandas as pd
from optical_mind import OpticalMind
# Load data
df = pd.read_csv('final_170K_complete_optical.csv')
# Prepare
X = df.drop(columns=['target']).values
y = df['target'].values
# Create intelligent system
mind = OpticalMind(verbose=True, n_features=50)
# Train with complete diagnostics
report = mind.fit(X, y)
# Make predictions
predictions = mind.predict(X[:100])
predictions_unc, uncertainties = mind.predict(X[:100], return_uncertainty=True)
# Get summary
print(mind.get_diagnostics_summary())
# Save for later use
mind.save('perovskite_predictor.pkl')
Diagnostics Output Explanation
Data Analysis Phase
Shows:
- Number of samples and features
- Memory usage
- Constant and near-constant features
Diagnostics Phase
Shows:
- Outliers: Number detected (% of total)
- Feature Issues:
- Zero variance features
- Highly skewed features
- High kurtosis features
- Correlation Analysis: Redundant feature pairs
Overfitting Analysis
- Train R² vs Validation R²
- Gap between them
- Severity assessment: none / mild / moderate / severe
- Recommendations if detected
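The gap-based check described above can be sketched as follows. The `severity` key matches the one used elsewhere in this documentation, but the function name and the cut-off values (0.02, 0.05, 0.10) are placeholders chosen for illustration; the thresholds OpticalMind actually applies are not documented here.

```python
def classify_overfitting(train_r2, val_r2, thresholds=(0.02, 0.05, 0.10)):
    """Map the train-minus-validation R² gap to a severity label."""
    gap = train_r2 - val_r2
    mild, moderate, severe = thresholds  # hypothetical cut-offs
    if gap >= severe:
        severity = "severe"
    elif gap >= moderate:
        severity = "moderate"
    elif gap >= mild:
        severity = "mild"
    else:
        severity = "none"  # includes negative gaps (validation beats train)
    return {"gap": gap, "severity": severity}


print(classify_overfitting(0.9970, 0.9971)["severity"])  # none
print(classify_overfitting(0.99, 0.80)["severity"])      # severe
```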
Performance Metrics
All 8+ metrics for Train/Validation/Test:
R² Score ................... 0.997019 (best possible: 1.0)
NSE ........................ 0.997019 (1.0 = perfect)
RMSE ...................... 0.000012 (lower is better)
MAE ........................ 0.000009 (lower is better)
Variance Explained (%) ... 92.41% (higher is better)
PI ......................... 0.935133 (0-1, higher is better)
a10 (err ≤ 10%) ........... 60.25% (% within 10% error)
a20 (err ≤ 20%) ........... 64.30% (% within 20% error)
Architecture Diagram
INPUT DATA
↓
[Data Preprocessor]
├─ Analyze structure
├─ Handle missing values
├─ Normalize
└─ Remove problematic rows
↓
[Diagnostics Engine]
├─ Detect outliers
├─ Find feature inconsistencies
├─ Analyze correlations
└─ Generate diagnostics report
↓
[Feature Engineering]
├─ Select top K features
├─ Create interactions (optional)
└─ Generate polynomial features (optional)
↓
[Model Training - ENSEMBLE]
├─ XGBoost (gradient boosting)
├─ Random Forest (tree-based)
└─ Combined predictions
↓
[Evaluation]
├─ Calculate 8+ metrics
├─ Detect overfitting
└─ Generate performance report
↓
[Explainability]
├─ Feature importance
├─ SHAP analysis (ready)
└─ Prediction explanations
↓
OUTPUT: Predictions + Diagnostics + Explanations
File Structure
optical_mind_core.py # Core modules (preprocessing, diagnostics, etc)
optical_mind.py # Main OpticalMind class
optical_mind_demo.py # Demo script with full example
OPTICAL_MIND_README.md # This documentation
Key Results from Test Run (10K samples)
| Metric | Train | Validation | Test |
|---|---|---|---|
| R² | 0.9970 | 0.9971 | 0.9970 |
| NSE | 0.9970 | 0.9971 | 0.9970 |
| RMSE | 0.000012 | 0.000012 | 0.000012 |
| MAE | 0.000009 | 0.000009 | 0.000009 |
| a10 | 62.26% | 58.50% | 60.25% |
| a20 | 66.24% | 62.75% | 64.30% |
Diagnostics Summary:
- Outliers detected: 1,145 (15.9%)
- Feature issues: 36 total
- Redundant pairs: 232
- Overfitting: NONE (gap = -0.0006)
- Top feature importance: 0.248
Advanced Usage
Custom Hyperparameters
mind = OpticalMind(
verbose=True,
n_features=75, # Use more features
random_state=123
)
Accessing Raw Results
# Training metrics
train_metrics = mind.training_report['train_metrics']
val_metrics = mind.training_report['val_metrics']
test_metrics = mind.training_report['test_metrics']
# Overfitting analysis
overfit = mind.training_report['overfit_analysis']
print(f"Overfitting severity: {overfit['severity']}")
# Feature importance
feature_imp = mind.training_report['feature_importance']
Diagnostics Details
# Raw diagnostics
outliers = mind.diagnostics_report['outliers']
feature_issues = mind.diagnostics_report['feature_issues']
correlations = mind.diagnostics_report['correlations']
# Process as needed
outlier_indices = outliers['indices']
redundant_pairs = correlations['redundant_pairs']
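One way to act on the redundant pairs is to drop a single feature from each pair. The sketch below assumes `redundant_pairs` is a list of `(i, j)` column-index tuples, which this documentation does not confirm, and `features_to_drop` is a hypothetical helper.

```python
def features_to_drop(redundant_pairs):
    """Pick one column index to drop from each redundant pair."""
    dropped = set()
    for i, j in redundant_pairs:
        if i in dropped or j in dropped:
            continue  # this pair is already broken by an earlier drop
        dropped.add(max(i, j))  # keep the lower index, drop the higher
    return sorted(dropped)


pairs = [(0, 1), (1, 2), (3, 4)]  # assumed format: (feature_i, feature_j)
print(features_to_drop(pairs))  # [1, 4]
```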
Requirements
numpy >= 1.20
pandas >= 1.2
scikit-learn >= 0.24
xgboost >= 1.5
scipy >= 1.6
joblib >= 1.0
Install with:
pip install numpy pandas scikit-learn xgboost scipy joblib
Performance Tips
- Large Datasets: Use sampling for faster iteration
sample_size = 50000
X_sample = X[:sample_size]
y_sample = y[:sample_size]
mind.fit(X_sample, y_sample)
- Fewer Features: Reduces training time
mind = OpticalMind(n_features=30) # vs default 50
- Parallel Processing: Automatically uses all CPUs
- No configuration needed!
Troubleshooting
Issue: Memory error with full dataset
- Solution: Train on a sample of the data or reduce n_features
Issue: Very high MAPE but low RMSE
- Solution: The targets include values close to zero. MAPE divides each error by the true value, so it is unreliable there; rely on RMSE and MAE instead
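A two-sample toy example, with made-up numbers, shows why MAPE and RMSE can disagree when a target is close to zero: a tiny absolute error on a tiny target still counts as a 100% relative error.

```python
from math import sqrt

y_true = [1e-5, 1.0]  # one target very close to zero
y_pred = [2e-5, 1.0]  # absolute error of only 1e-5 on that sample
n = len(y_true)

rmse = sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)
mape = 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / n

print(f"RMSE = {rmse:.2e}, MAPE = {mape:.1f}%")  # RMSE = 7.07e-06, MAPE = 50.0%
```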
Issue: Overfitting detected
- Solution: System will recommend:
- Use less complex models
- Increase regularization
- Add more training data
Issue: Poor predictions
- Solution: Check diagnostics:
print(mind.get_diagnostics_summary())
- Many outliers? → Review data quality
- Many feature issues? → Use domain knowledge to repair or drop the affected features
- High redundancy? → Features are strongly correlated; drop one from each redundant pair
License
MIT License - Free for research and commercial use
Citation
If you use OpticalMind in your research, please cite:
OpticalMind: Intelligent ML System for Optical Property Prediction
Author: Muhammad Wajdan Jamal
Version: 1.0.0
Year: 2026
Support & Feedback
For issues, suggestions, or improvements:
- Check this documentation
- Review the demo script
- Check diagnostic output
- Iterate based on recommendations
Last Updated: April 5, 2026
Status: ✅ Production Ready
Quality: Research-Grade