A comprehensive utility package for machine learning development
Project description
MLON (Machine Learning Operations Network)
A comprehensive utility package for machine learning development that works seamlessly with popular ML libraries like TensorFlow, scikit-learn, Keras, and PyTorch. MLON provides an interconnected network of operations for streamlined machine learning workflows, with built-in safety checks for data leakage and bias.
Features Overview
1. Data Preprocessing (DataPreprocessor)
from mlon import DataPreprocessor
preprocessor = DataPreprocessor()
- Missing Value Handling
# Handle missing values with different strategies data = preprocessor.handle_missing_values(data, strategy='mean') # Options: 'mean', 'median', 'mode', 'zero', 'drop'
- Feature Scaling
# Scale features using StandardScaler or MinMaxScaler scaled_data = preprocessor.scale_features(data, method='standard') # Options: 'standard', 'minmax'
- Categorical Encoding
# Encode categorical variables encoded_data = preprocessor.encode_categorical(data, method='onehot') # Options: 'onehot', 'label'
2. Model Evaluation (ModelEvaluator)
from mlon import ModelEvaluator
evaluator = ModelEvaluator()
- Classification Metrics
# Get comprehensive classification metrics metrics = evaluator.classification_metrics(y_true, y_pred) # Returns accuracy, precision, recall, F1
- Regression Metrics
# Get regression performance metrics metrics = evaluator.regression_metrics(y_true, y_pred) # Returns MSE, RMSE, MAE, R²
- Confusion Matrix
conf_matrix = evaluator.get_confusion_matrix(y_true, y_pred, normalize='true') report = evaluator.get_classification_report(y_true, y_pred)
3. Visualization (Visualizer)
from mlon import Visualizer
viz = Visualizer()
- Model Performance Visualization
# Plot confusion matrix viz.plot_confusion_matrix(conf_matrix, class_names=classes) # Plot learning curves viz.plot_learning_curve(train_scores, val_scores) # Plot feature importance viz.plot_feature_importance(importance_scores, feature_names)
- Data Analysis Visualization
# Plot distribution of features viz.plot_distribution(data['feature']) # Plot correlation matrix viz.plot_correlation_matrix(data)
4. Model Utilities (ModelUtils)
from mlon import ModelUtils
model_utils = ModelUtils()
- Model Persistence
# Save and load models model_utils.save_model(model, 'model.pkl', method='pickle') # Options: 'pickle', 'joblib' model = model_utils.load_model('model.pkl', method='pickle')
- Hyperparameter Tuning
# Perform grid search best_model = model_utils.grid_search(model, param_grid, X, y) # Perform random search best_model = model_utils.random_search(model, param_dist, X, y)
5. Cross Validation (CrossValidator)
from mlon import CrossValidator
cv = CrossValidator(n_splits=5)
- Cross-Validation Operations
# Perform cross-validation with custom scoring scores = cv.cross_validate(model, X, y) # Get fold indices for custom cross-validation train_idx, val_idx = cv.get_fold_indices(X, y)
6. Time Series Utilities (TimeSeriesUtils) - NEW!
from mlon import TimeSeriesUtils
ts_utils = TimeSeriesUtils()
- Sequence Creation
# Create sequences for time series prediction X_seq, y_seq = ts_utils.create_sequences(data, seq_length=30, target_horizon=7)
- Time Feature Engineering
# Add time-based features df_with_features = ts_utils.add_time_features(df, 'date_column') # Calculate rolling statistics rolling_features = ts_utils.calculate_rolling_features(data, windows=[7, 30, 90]) # Detect seasonality seasonality_period = ts_utils.detect_seasonality(data)
7. Automatic Guardrails (LeakageDetector, BiasDetector) - NEW in v1.1.0! 🛡️
from mlon.guardrails import LeakageDetector, BiasDetector
# Initialize detectors
leakage_detector = LeakageDetector()
bias_detector = BiasDetector()
- Data Leakage Detection
# Check for train-test overlap overlap_warnings = leakage_detector.check_train_test_overlap(X_train, X_test) # Detect target leakage in features leakage_warnings = leakage_detector.detect_target_leakage(X, y) # Check for future information leakage in time series future_warnings = leakage_detector.detect_future_leakage(timestamps, features)
- Bias & Fairness Checks
# Check for dataset bias bias_warnings = bias_detector.check_dataset_bias(data, protected_features=['gender', 'race']) # Calculate disparate impact impact_metrics = bias_detector.calculate_disparate_impact(predictions, protected_feature) # Get group fairness metrics fairness_metrics = bias_detector.calculate_group_fairness_metrics(y_true, y_pred, protected_feature)
8. Feature Selection (FeatureSelector) - NEW!
from mlon import FeatureSelector
selector = FeatureSelector()
- Statistical Feature Selection
# Select top k features X_selected, scores = selector.select_k_best(X, y, k=10, method='f_classif')
- Dimensionality Reduction
# Apply PCA X_pca, pca = selector.apply_pca(X, n_components=0.95) # Apply ICA X_ica, ica = selector.apply_ica(X, n_components=5) # Apply LDA X_lda, lda = selector.apply_lda(X, y, n_components=2)
Installation
pip install mlon
Quick Start
from mlon import DataPreprocessor, ModelEvaluator, Visualizer, ModelUtils, CrossValidator
# Initialize safety checks
leakage_detector = LeakageDetector()
bias_detector = BiasDetector()
# Check for data leakage and bias
overlap_warnings = leakage_detector.check_train_test_overlap(X_train, X_test)
leakage_warnings = leakage_detector.detect_target_leakage(X, y)
bias_warnings = bias_detector.check_dataset_bias(data, protected_features=['gender'])
# Data Preprocessing
preprocessor = DataPreprocessor()
scaled_data = preprocessor.scale_features(data, method='standard')
encoded_data = preprocessor.encode_categorical(data, method='onehot')
# Model Evaluation
evaluator = ModelEvaluator()
metrics = evaluator.classification_metrics(y_true, y_pred)
conf_matrix = evaluator.get_confusion_matrix(y_true, y_pred)
# Check model fairness
fairness_metrics = bias_detector.calculate_group_fairness_metrics(y_true, y_pred, protected_feature)
# Visualization
viz = Visualizer()
viz.plot_confusion_matrix(conf_matrix)
viz.plot_learning_curve(train_scores, val_scores)
# Model Management
model_utils = ModelUtils()
model_utils.save_model(model, 'model.pkl')
best_model = model_utils.grid_search(model, param_grid, X, y)
# Cross Validation
cv = CrossValidator(n_splits=5)
scores = cv.cross_validate(model, X, y)
Requirements
- Python 3.7+
- NumPy >= 1.19.0
- Pandas >= 1.1.0
- scikit-learn >= 0.24.0
- Matplotlib >= 3.3.0
- Seaborn >= 0.11.0
- Joblib >= 1.0.0
- SciPy >= 1.6.0 # Required for advanced statistical tests in guardrails
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Issue Tracker: GitHub Issues
- Documentation: GitHub README
- Source Code: GitHub Repository
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mlon-1.1.1.tar.gz.
File metadata
- Download URL: mlon-1.1.1.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fa7efa825e1ca1fec8a0e21322e95460ed8aec885d915a859f5fc603d5bfbd22
|
|
| MD5 |
44dfe40cf4fef1d81132f4bb08f5cca1
|
|
| BLAKE2b-256 |
0a06fa8e2e5754cd1a9f0660c110412f6ffb27ae60f85bb89b750aa1ecb928c2
|
File details
Details for the file mlon-1.1.1-py3-none-any.whl.
File metadata
- Download URL: mlon-1.1.1-py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
89c50a33ae800a0ab8bd9d90396d4bd7d01909040db271f88b5674064477d31a
|
|
| MD5 |
f9cb5dce96dc125c87b5e0b93e81e873
|
|
| BLAKE2b-256 |
f3b4cfd1e986965d7dfa2b77b6ceec70ca577c956015b5fdd9ee6dd1dbb517dd
|