MKYZ is a Python library for ML and data science tasks.
Project description
MKYZ - Machine Learning Library
MKYZ is a comprehensive Python machine learning library designed to simplify data processing, model training, evaluation, and visualization tasks. Built on top of scikit-learn, it provides a unified API for common ML workflows.
📚 Examples
You can find comprehensive Jupyter notebooks in the examples/ directory:
- 01_quickstart.ipynb: End-to-end basic ML workflow.
- 02_data_profiling_and_eda.ipynb: Data profiling and HTML reports.
- 03_feature_engineering.ipynb: Advanced feature creation.
- 04_automl_and_optimization.ipynb: AutoML and hyperparameter tuning.
- 05_model_persistence_and_reports.ipynb: Saving models and rich reports.
🏗️ Architecture
Core Capabilities
- 🔄 Data Preparation - Automatic handling of missing values, outliers, and categorical encoding
- 🎯 Model Training - Support for 20+ classification, regression, and clustering algorithms
- 📊 AutoML - Automatic model selection and hyperparameter optimization
- 📈 Evaluation - Comprehensive metrics with 10 cross-validation strategies
- 🎨 Visualization - 40+ built-in plotting functions for EDA and model results
New in v0.2.0
- 💾 Model Persistence - Save and load models with metadata
- 🔧 Feature Engineering - Polynomial, datetime, lag, and rolling features
- 📝 Auto Reports - Generate HTML reports with one line of code
- ⚡ Parallel Processing - Built-in utilities for faster training
- 🛡️ Robust Error Handling - Custom exceptions for better debugging
📦 Installation
pip install mkyz
From Source
git clone https://github.com/mmustafakapici/mkyz.git
cd mkyz
pip install -e .
Dependencies
pandas, scikit-learn, numpy, matplotlib, seaborn,
plotly, xgboost, lightgbm, catboost, rich, mlxtend
🚀 Quick Start
Basic Usage (Original API)
import mkyz
# 1. Prepare data
data = mkyz.prepare_data('dataset.csv', target_column='price')
# 2. Train model
model = mkyz.train(data, task='classification', model='rf')
# 3. Make predictions
predictions = mkyz.predict(data, model)
# 4. Evaluate
scores = mkyz.evaluate(data, predictions)
print(scores)
# 5. Visualize
mkyz.visualize(data)
AutoML - Find the Best Model
import mkyz
data = mkyz.prepare_data('dataset.csv', target_column='target')
# Automatically train and compare all models
best_model = mkyz.auto_train(
data,
task='classification',
optimize_models=True,
optimization_method='bayesian'
)
New Modular API (v0.2.0)
import mkyz
# Configure globally
mkyz.set_config(random_state=42, n_jobs=-1, verbose=1)
# Load data flexibly
df = mkyz.load_data('data.csv') # Also supports Excel, JSON, Parquet
# Validate dataset
validation = mkyz.validate_dataset(df, target_column='target')
if not validation['is_valid']:
print(validation['issues'])
# Feature Engineering
fe = mkyz.FeatureEngineer()
df = fe.create_datetime_features(df, 'date_column')
df = fe.create_polynomial_features(df, ['age', 'income'], degree=2)
# Select best features
selected = mkyz.select_features(X, y, k=10, method='mutual_info')
# Advanced Cross-Validation
results = mkyz.cross_validate(
model, X, y,
cv=mkyz.CVStrategy.STRATIFIED,
n_splits=5,
return_train_score=True
)
print(f"Mean accuracy: {results['mean_test_score']:.4f}")
# Save trained model
mkyz.save_model(model, 'models/my_model', metadata={'version': '1.0'})
# Load model later
model = mkyz.load_model('models/my_model.joblib')
# Generate comprehensive report
report = mkyz.ModelReport(model, X_test, y_test, task='classification')
report.generate()
report.export_html('reports/model_report.html')
print(report.summary())
📚 Documentation
Modules Overview
| Module | Description |
|---|---|
mkyz.core |
Configuration, exceptions, base classes |
mkyz.data |
Data loading, preprocessing, feature engineering |
mkyz.evaluation |
Metrics, cross-validation, reporting |
mkyz.persistence |
Model saving and loading |
mkyz.utils |
Logging and parallel processing utilities |
Detailed Guides
- Installation Guide
- Quick Start Tutorial
- Data Preparation Guide
- Feature Engineering Guide
- Model Training Guide
- API Reference
🔧 Supported Models
Classification
| Model | Key | Description |
|---|---|---|
| Random Forest | rf |
Ensemble of decision trees |
| Logistic Regression | lr |
Linear classification |
| SVM | svm |
Support Vector Machine |
| KNN | knn |
K-Nearest Neighbors |
| Decision Tree | dt |
Single decision tree |
| Naive Bayes | nb |
Probabilistic classifier |
| Gradient Boosting | gb |
Boosted trees |
| XGBoost | xgb |
Extreme Gradient Boosting |
| LightGBM | lgbm |
Light Gradient Boosting |
| CatBoost | catboost |
Categorical Boosting |
Regression
| Model | Key | Description |
|---|---|---|
| Random Forest | rf |
Ensemble regressor |
| Linear Regression | lr |
OLS regression |
| SVR | svm |
Support Vector Regression |
| KNN | knn |
K-Nearest Neighbors |
| Decision Tree | dt |
Single decision tree |
Clustering
| Model | Key | Description |
|---|---|---|
| K-Means | kmeans |
Centroid-based |
| DBSCAN | dbscan |
Density-based |
| Agglomerative | agglomerative |
Hierarchical |
| GMM | gmm |
Gaussian Mixture |
| Mean Shift | mean_shift |
Mode-seeking |
Dimensionality Reduction
| Model | Key | Description |
|---|---|---|
| PCA | pca |
Principal Component Analysis |
| SVD | svd |
Truncated SVD |
| NMF | nmf |
Non-negative Matrix Factorization |
📊 Cross-Validation Strategies
from mkyz import cross_validate, CVStrategy
# Available strategies
strategies = [
CVStrategy.KFOLD, # Standard K-Fold
CVStrategy.STRATIFIED, # Stratified K-Fold (default)
CVStrategy.TIME_SERIES, # Time Series Split
CVStrategy.GROUP, # Group K-Fold
CVStrategy.REPEATED, # Repeated K-Fold
CVStrategy.REPEATED_STRATIFIED,# Repeated Stratified
CVStrategy.LEAVE_ONE_OUT, # Leave-One-Out
CVStrategy.SHUFFLE, # Shuffle Split
CVStrategy.STRATIFIED_SHUFFLE # Stratified Shuffle
]
# Usage
results = cross_validate(model, X, y, cv=CVStrategy.TIME_SERIES, n_splits=5)
🔧 Configuration
import mkyz
# View current config
print(mkyz.get_config().to_dict())
# Update config
mkyz.set_config(
random_state=42,
n_jobs=-1,
cv_folds=5,
verbose=1,
dark_mode=True
)
Available Settings
| Setting | Default | Description |
|---|---|---|
random_state |
42 | Random seed for reproducibility |
n_jobs |
-1 | Parallel jobs (-1 = all CPUs) |
cv_folds |
5 | Default CV folds |
test_size |
0.2 | Train/test split ratio |
verbose |
1 | Verbosity level |
optimization_method |
'grid_search' | 'grid_search' or 'bayesian' |
missing_value_strategy |
'mean' | 'mean', 'median', 'mode', 'drop' |
outlier_strategy |
'remove' | 'remove', 'cap', 'keep' |
🛡️ Error Handling
from mkyz import (
MKYZError, # Base exception
DataValidationError, # Data issues
ModelNotTrainedError,# Model not fitted
UnsupportedTaskError,# Invalid task type
PersistenceError # Save/load failures
)
try:
model = mkyz.load_model('nonexistent.joblib')
except PersistenceError as e:
print(f"Failed to load model: {e}")
📈 Visualization
import mkyz
# EDA visualizations
mkyz.visualize(data, plot_type='histogram')
mkyz.visualize(data, plot_type='correlation')
mkyz.visualize(data, plot_type='boxplot')
# Available plot types:
# histogram, bar, box, violin, pie, scatter, line,
# heatmap, pair, swarm, strip, kde, ridge, density,
# joint, regression, residual, qq, ecdf, dendrogram...
🤝 Contributing
Contributions are welcome! Please read our Contributing Guide.
# Clone the repository
git clone https://github.com/mmustafakapici/mkyz.git
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
👤 Author
Mustafa Kapıcı
- Email: m.mustafakapici@gmail.com
- GitHub: @mmustafakapici
🙏 Acknowledgments
- Built on top of scikit-learn
- Boosting models from XGBoost, LightGBM, CatBoost
- Visualization powered by Matplotlib, Seaborn, Plotly
Made with ❤️ in Turkey
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mkyz-0.2.3.tar.gz.
File metadata
- Download URL: mkyz-0.2.3.tar.gz
- Upload date:
- Size: 59.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
51e3ef68c4510492585e7aa38e7119390e71f9eaa994049b6792e94f64456fb9
|
|
| MD5 |
def77a9616c5b9b75de25aebe17fc730
|
|
| BLAKE2b-256 |
a9993f5527d9cb6f5981a83324ec6478707ad7e11a0786a6eea59948b7737d7b
|
File details
Details for the file mkyz-0.2.3-py3-none-any.whl.
File metadata
- Download URL: mkyz-0.2.3-py3-none-any.whl
- Upload date:
- Size: 64.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
efaf2c8b1a6b89d99246b90d1d9fd159f09e85fc4c5404f38212e654fde19511
|
|
| MD5 |
c0c4f5fd912ca1569f34b02dcaaa47f4
|
|
| BLAKE2b-256 |
22f77df48600735068b8df808ac12e955865dbd58db19a287e6d211ec6d976c9
|