MKYZ is a Python library for ML and data science tasks.

MKYZ - Machine Learning Library

MKYZ is a comprehensive Python machine learning library designed to simplify data processing, model training, evaluation, and visualization tasks. Built on top of scikit-learn, it provides a unified API for common ML workflows.

📚 Examples

Jupyter notebooks with worked examples are available in the examples/ directory.


Core Capabilities

  • 🔄 Data Preparation - Automatic handling of missing values, outliers, and categorical encoding
  • 🎯 Model Training - Support for 20+ classification, regression, and clustering algorithms
  • 📊 AutoML - Automatic model selection and hyperparameter optimization
  • 📈 Evaluation - Comprehensive metrics with 10 cross-validation strategies
  • 🎨 Visualization - 40+ built-in plotting functions for EDA and model results
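To make the data preparation bullet concrete, here is a minimal, dependency-free sketch of the three steps it names: mean imputation, outlier capping with the interquartile range, and one-hot encoding. The helper names (`impute_mean`, `cap_outliers_iqr`, `one_hot`) are hypothetical and are not part of the MKYZ API; how MKYZ implements these steps internally is not shown in this README.

```python
from statistics import mean, quantiles


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def cap_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def one_hot(categories):
    """Encode labels as one 0/1 column per distinct label."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]


# Illustrative data: one missing value and one obvious outlier (95).
ages = [22, 25, None, 27, 24, 26, 23, 95]
filled = impute_mean(ages)
capped = cap_outliers_iqr(filled)
encoded = one_hot(["red", "blue", "red"])
```

Capping keeps every row and only pulls extreme values back to the fence, whereas a 'remove' strategy would drop the row entirely.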

New in v0.2.0

  • 💾 Model Persistence - Save and load models with metadata
  • 🔧 Feature Engineering - Polynomial, datetime, lag, and rolling features
  • 📝 Auto Reports - Generate HTML reports with one line of code
  • ⚡ Parallel Processing - Built-in utilities for faster training
  • 🛡️ Robust Error Handling - Custom exceptions for better debugging
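The lag and rolling features mentioned above can be illustrated without any dependencies. The sketch below shows the idea only; the function names `lag_feature` and `rolling_mean` are hypothetical and are not taken from MKYZ, whose FeatureEngineer methods for these features are not listed in this README.

```python
def lag_feature(series, lag=1):
    """Shift a series forward by `lag` steps (assumes 0 <= lag <= len(series)).

    The first `lag` positions have no past value and are filled with None.
    """
    return [None] * lag + series[:len(series) - lag]


def rolling_mean(series, window=3):
    """Mean of the trailing `window` values; None until the window is full."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


# Illustrative daily sales figures.
sales = [10, 12, 11, 15, 14]
sales_lag1 = lag_feature(sales, lag=1)
sales_roll3 = rolling_mean(sales, window=3)
```

Both features use only past and current values, so they do not leak future information into a time-series model.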

📦 Installation

pip install mkyz

From Source

git clone https://github.com/mmustafakapici/mkyz.git
cd mkyz
pip install -e .

Dependencies

pandas, scikit-learn, numpy, matplotlib, seaborn, 
plotly, xgboost, lightgbm, catboost, rich, mlxtend

🚀 Quick Start

Basic Usage (Original API)

import mkyz

# 1. Prepare data
data = mkyz.prepare_data('dataset.csv', target_column='target')

# 2. Train model
model = mkyz.train(data, task='classification', model='rf')

# 3. Make predictions
predictions = mkyz.predict(data, model)

# 4. Evaluate
scores = mkyz.evaluate(data, predictions)
print(scores)

# 5. Visualize
mkyz.visualize(data)

AutoML - Find the Best Model

import mkyz

data = mkyz.prepare_data('dataset.csv', target_column='target')

# Automatically train and compare all models
best_model = mkyz.auto_train(
    data, 
    task='classification',
    optimize_models=True,
    optimization_method='bayesian'
)
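For readers new to hyperparameter optimization, the following sketch shows exhaustive grid search, the simpler of the two methods MKYZ names ('grid_search' and 'bayesian'). It is a toy illustration under stated assumptions: the 1-D nearest-neighbour classifier, the data, and the function names are all made up for this example and do not reflect how `mkyz.auto_train` works internally.

```python
from itertools import product


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, grid):
    """Score every parameter combination and keep the first best one."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = accuracy(train, valid, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: two well-separated classes on a line.
train = [(1.0, "a"), (1.2, "a"), (1.4, "a"), (3.0, "b"), (3.2, "b"), (3.4, "b")]
valid = [(1.1, "a"), (3.1, "b"), (1.3, "a")]
best_params, best_score = grid_search(train, valid, {"k": [1, 3, 5]})
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added parameter. Bayesian optimization instead uses earlier scores to choose which combination to try next.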

New Modular API (v0.2.0)

import mkyz

# Configure globally
mkyz.set_config(random_state=42, n_jobs=-1, verbose=1)

# Load data flexibly
df = mkyz.load_data('data.csv')  # Also supports Excel, JSON, Parquet

# Validate dataset
validation = mkyz.validate_dataset(df, target_column='target')
if not validation['is_valid']:
    print(validation['issues'])

# Feature Engineering
fe = mkyz.FeatureEngineer()
df = fe.create_datetime_features(df, 'date_column')
df = fe.create_polynomial_features(df, ['age', 'income'], degree=2)

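# Split features and target (assumes df is a pandas DataFrame
# whose target column is named 'target')
X = df.drop(columns=['target'])
y = df['target']

# Select best features
selected = mkyz.select_features(X, y, k=10, method='mutual_info')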

# Advanced cross-validation (`model` is an estimator created earlier)
results = mkyz.cross_validate(
    model, X, y,
    cv=mkyz.CVStrategy.STRATIFIED,
    n_splits=5,
    return_train_score=True
)
print(f"Mean accuracy: {results['mean_test_score']:.4f}")

# Save trained model
mkyz.save_model(model, 'models/my_model', metadata={'version': '1.0'})

# Load model later
model = mkyz.load_model('models/my_model.joblib')

# Generate comprehensive report (X_test, y_test: a held-out test split)
report = mkyz.ModelReport(model, X_test, y_test, task='classification')
report.generate()
report.export_html('reports/model_report.html')
print(report.summary())
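The idea behind saving a model together with metadata can be sketched with the standard library alone. This is an illustration, not MKYZ's implementation: the `.joblib` extension above suggests MKYZ uses joblib, while this sketch substitutes `pickle` plus a JSON sidecar file, and the helper names `save_with_metadata` and `load_with_metadata` are hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_with_metadata(obj, path, metadata):
    """Pickle `obj` to `path` and write `metadata` to a JSON file beside it."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    path.with_suffix(".json").write_text(json.dumps(metadata))


def load_with_metadata(path):
    """Return the unpickled object and its metadata dict."""
    path = Path(path)
    with path.open("rb") as fh:
        obj = pickle.load(fh)
    metadata = json.loads(path.with_suffix(".json").read_text())
    return obj, metadata


# Stand-in for a trained model: any picklable object works.
toy_model = {"weights": [0.5, -1.2], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "my_model.pkl"
    save_with_metadata(toy_model, target, {"version": "1.0"})
    restored, meta = load_with_metadata(target)
```

Keeping metadata in a separate JSON file means it can be inspected without deserializing the model. As with any pickle-based format, only load files from sources you trust.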

📚 Documentation

Modules Overview

Module            Description
mkyz.core         Configuration, exceptions, base classes
mkyz.data         Data loading, preprocessing, feature engineering
mkyz.evaluation   Metrics, cross-validation, reporting
mkyz.persistence  Model saving and loading
mkyz.utils        Logging and parallel processing utilities

🔧 Supported Models

Classification

Model                Key       Description
Random Forest        rf        Ensemble of decision trees
Logistic Regression  lr        Linear classification
SVM                  svm       Support Vector Machine
KNN                  knn       K-Nearest Neighbors
Decision Tree        dt        Single decision tree
Naive Bayes          nb        Probabilistic classifier
Gradient Boosting    gb        Boosted trees
XGBoost              xgb       Extreme Gradient Boosting
LightGBM             lgbm      Light Gradient Boosting
CatBoost             catboost  Categorical Boosting

Regression

Model              Key  Description
Random Forest      rf   Ensemble regressor
Linear Regression  lr   OLS regression
SVR                svm  Support Vector Regression
KNN                knn  K-Nearest Neighbors
Decision Tree      dt   Single decision tree

Clustering

Model          Key            Description
K-Means        kmeans         Centroid-based
DBSCAN         dbscan         Density-based
Agglomerative  agglomerative  Hierarchical
GMM            gmm            Gaussian Mixture
Mean Shift     mean_shift     Mode-seeking

Dimensionality Reduction

Model  Key  Description
PCA    pca  Principal Component Analysis
SVD    svd  Truncated SVD
NMF    nmf  Non-negative Matrix Factorization

📊 Cross-Validation Strategies

from mkyz import cross_validate, CVStrategy

# Available strategies
strategies = [
    CVStrategy.KFOLD,              # Standard K-Fold
    CVStrategy.STRATIFIED,         # Stratified K-Fold (default)
    CVStrategy.TIME_SERIES,        # Time Series Split
    CVStrategy.GROUP,              # Group K-Fold
    CVStrategy.REPEATED,           # Repeated K-Fold
    CVStrategy.REPEATED_STRATIFIED,# Repeated Stratified
    CVStrategy.LEAVE_ONE_OUT,      # Leave-One-Out
    CVStrategy.SHUFFLE,            # Shuffle Split
    CVStrategy.STRATIFIED_SHUFFLE  # Stratified Shuffle
]

# Usage
results = cross_validate(model, X, y, cv=CVStrategy.TIME_SERIES, n_splits=5)
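The difference between the KFOLD and TIME_SERIES strategies is easiest to see from the index splits they produce. The sketch below generates both in plain Python. The function names are hypothetical, and the time-series variant follows the expanding-window scheme used by scikit-learn's `TimeSeriesSplit`; that MKYZ delegates to scikit-learn for these splits is an assumption.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_splits + (i < n_samples % n_splits)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def time_series_indices(n_samples, n_splits):
    """Yield expanding-window splits where train always precedes test."""
    test_size = n_samples // (n_splits + 1)
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


kfold = list(kfold_indices(6, 3))
tss = list(time_series_indices(6, 2))
```

With standard K-Fold, a test fold can sit in the middle of the data, so the model trains on later samples. The time-series split never does this, which avoids look-ahead leakage on ordered data.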

🔧 Configuration

import mkyz

# View current config
print(mkyz.get_config().to_dict())

# Update config
mkyz.set_config(
    random_state=42,
    n_jobs=-1,
    cv_folds=5,
    verbose=1,
    dark_mode=True
)

Available Settings

Setting                 Default        Description
random_state            42             Random seed for reproducibility
n_jobs                  -1             Parallel jobs (-1 = all CPUs)
cv_folds                5              Default CV folds
test_size               0.2            Train/test split ratio
verbose                 1              Verbosity level
optimization_method     'grid_search'  'grid_search' or 'bayesian'
missing_value_strategy  'mean'         'mean', 'median', 'mode', 'drop'
outlier_strategy        'remove'       'remove', 'cap', 'keep'
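A global configuration object with `get_config`, `set_config`, and `to_dict` can be sketched with a dataclass. This is a hypothetical stand-in rather than MKYZ's own implementation. It covers only the settings listed in the table above (so it would reject `dark_mode`, which the real `set_config` accepts), and the choice to raise `KeyError` for unknown names is an assumption.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class Config:
    # Defaults copied from the settings table above.
    random_state: int = 42
    n_jobs: int = -1
    cv_folds: int = 5
    test_size: float = 0.2
    verbose: int = 1
    optimization_method: str = "grid_search"
    missing_value_strategy: str = "mean"
    outlier_strategy: str = "remove"

    def to_dict(self):
        """Return the settings as a plain dict."""
        return asdict(self)


# Single module-level instance shared by all callers.
_config = Config()


def get_config():
    """Return the shared configuration object."""
    return _config


def set_config(**updates):
    """Update known settings in place; reject unknown names."""
    known = {f.name for f in fields(Config)}
    unknown = set(updates) - known
    if unknown:
        raise KeyError(f"Unknown setting(s): {sorted(unknown)}")
    for name, value in updates.items():
        setattr(_config, name, value)


set_config(cv_folds=10, outlier_strategy="cap")
```

Rejecting unknown names up front catches typos such as `cv_fold=10` that would otherwise be silently ignored.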

🛡️ Error Handling

import mkyz
from mkyz import (
    MKYZError,           # Base exception
    DataValidationError, # Data issues
    ModelNotTrainedError,# Model not fitted
    UnsupportedTaskError,# Invalid task type
    PersistenceError     # Save/load failures
)

try:
    model = mkyz.load_model('nonexistent.joblib')
except PersistenceError as e:
    print(f"Failed to load model: {e}")
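The benefit of a shared base exception is that callers can handle specific failures first and fall back to the base class for everything else. The sketch below redefines local stand-ins that mirror the names imported above. Whether the real classes form exactly this hierarchy is an assumption based on the "Base exception" comment, and `predict` and `safe_predict` are hypothetical helpers.

```python
class MKYZError(Exception):
    """Base class: catching this catches every library-specific error."""


class DataValidationError(MKYZError):
    """Raised for problems with input data."""


class ModelNotTrainedError(MKYZError):
    """Raised when a model is used before it has been fitted."""


class PersistenceError(MKYZError):
    """Raised when saving or loading fails."""


def predict(model):
    # Hypothetical guard: refuse to predict with an unfitted model.
    if not model.get("fitted"):
        raise ModelNotTrainedError("call train() before predict()")
    return "ok"


def safe_predict(model):
    try:
        return predict(model)
    except PersistenceError:
        # Specific handler: checked first, does not match here.
        return "persistence problem"
    except MKYZError as exc:
        # Base-class handler: catches any other library error.
        return f"{type(exc).__name__}: {exc}"


message = safe_predict({"fitted": False})
```

Ordering matters: the more specific `except` clauses must come before the base-class clause, or they will never run.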

📈 Visualization

import mkyz

# EDA visualizations
mkyz.visualize(data, plot_type='histogram')
mkyz.visualize(data, plot_type='correlation')
mkyz.visualize(data, plot_type='boxplot')

# Available plot types:
# histogram, bar, box, violin, pie, scatter, line,
# heatmap, pair, swarm, strip, kde, ridge, density,
# joint, regression, residual, qq, ecdf, dendrogram...

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide.

# Clone the repository
git clone https://github.com/mmustafakapici/mkyz.git
cd mkyz

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Mustafa Kapıcı

Made with ❤️ in Turkey
