ML Pipeline Automation Framework - Chain together data processing, model training, and deployment with minimal code

MLFCrafter


If you find MLFCrafter useful, please consider starring this repository!

Your support helps us continue developing and improving MLFCrafter for the ML community.


What is MLFCrafter?

MLFCrafter is a Python framework that simplifies machine learning pipeline creation through chainable "crafter" components. Build, train, and deploy ML models with minimal code and maximum flexibility.

Key Features

  • 🔗 Chainable Architecture - Connect multiple processing steps seamlessly
  • 📊 Smart Data Handling - Automatic data ingestion from CSV, Excel, JSON
  • 🧹 Intelligent Cleaning - Multiple strategies for missing value handling
  • 📏 Flexible Scaling - MinMax, Standard, and Robust scaling options
  • 🤖 Multiple Models - Random Forest, XGBoost, Logistic Regression support
  • 📈 Comprehensive Metrics - Accuracy, Precision, Recall, F1-Score
  • 💾 Easy Deployment - One-step model saving with metadata
  • 🔄 Context-Based - Seamless data flow between pipeline steps
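
To make the "chainable" and "context-based" ideas above concrete, here is a minimal sketch of the general pattern using only the standard library. It is not MLFCrafter's implementation: `run_chain`, the `Context` alias, and the three step functions are hypothetical names chosen for illustration. Each step receives a dictionary, adds or replaces keys, and hands the result to the next step.

```python
from typing import Any, Callable, Dict, List

# Assumption: the shared context is a plain dictionary passed between steps.
Context = Dict[str, Any]
Step = Callable[[Context], Context]


def run_chain(steps: List[Step], context: Context) -> Context:
    """Run each step in order, feeding it the context returned by the previous one."""
    for step in steps:
        context = step(context)
    return context


# Hypothetical steps: each reads what it needs and returns an updated copy.
def ingest(context: Context) -> Context:
    return {**context, "data": [1.0, None, 3.0]}


def clean(context: Context) -> Context:
    cleaned = [x for x in context["data"] if x is not None]
    return {**context, "data": cleaned}


def summarize(context: Context) -> Context:
    data = context["data"]
    return {**context, "mean": sum(data) / len(data)}


result = run_chain([ingest, clean, summarize], {"target_column": "y"})
print(result)
```

Returning a new dictionary from each step, rather than mutating a shared one, keeps every step easy to test in isolation.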

Quick Start

Installation

pip install mlfcrafter

Basic Usage

from mlfcrafter import MLFChain, DataIngestCrafter, CleanerCrafter, ScalerCrafter, ModelCrafter, ScorerCrafter, DeployCrafter

# Define the whole pipeline in a single MLFChain call
chain = MLFChain(
    DataIngestCrafter(data_path="data/iris.csv"),
    CleanerCrafter(strategy="auto"),
    ScalerCrafter(scaler_type="standard"),
    ModelCrafter(model_name="random_forest"),
    ScorerCrafter(),
    DeployCrafter()
)

# Run entire pipeline
results = chain.run(target_column="species")
print(f"Test Score: {results['test_score']:.4f}")

Advanced Configuration

chain = MLFChain(
    DataIngestCrafter(data_path="data/titanic.csv", source_type="csv"),
    CleanerCrafter(strategy="mean", str_fill="Unknown"),
    ScalerCrafter(scaler_type="minmax", columns=["age", "fare"]),
    ModelCrafter(
        model_name="xgboost",
        model_params={"n_estimators": 200, "max_depth": 6},
        test_size=0.25
    ),
    ScorerCrafter(),
    DeployCrafter(model_path="models/titanic_model.joblib")
)

results = chain.run(target_column="survived")

Components (Crafters)

DataIngestCrafter

Loads data from various file formats:

DataIngestCrafter(
    data_path="path/to/data.csv",
    source_type="auto"  # auto, csv, excel, json
)
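
One plausible way `source_type="auto"` could work is to infer the format from the file extension. The sketch below illustrates that idea with the standard library only. The `resolve_source_type` helper and the extension table are assumptions made for illustration, not the library's actual logic.

```python
from pathlib import Path

# Hypothetical extension-to-format table; the real mapping may differ.
EXTENSION_TO_SOURCE = {
    ".csv": "csv",
    ".xls": "excel",
    ".xlsx": "excel",
    ".json": "json",
}


def resolve_source_type(data_path: str, source_type: str = "auto") -> str:
    """Return an explicit source type, inferring it from the extension for 'auto'."""
    if source_type != "auto":
        return source_type
    suffix = Path(data_path).suffix.lower()
    try:
        return EXTENSION_TO_SOURCE[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer source type from {data_path!r}") from None


print(resolve_source_type("path/to/data.csv"))
print(resolve_source_type("report.XLSX"))
```

Passing an explicit `source_type` skips the inference, which helps when a file has a nonstandard extension.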

CleanerCrafter

Handles missing values intelligently:

CleanerCrafter(
    strategy="auto",    # auto, mean, median, mode, drop, constant
    str_fill="missing", # Fill value for strings
    int_fill=0.0        # Fill value for numbers
)
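
For readers unfamiliar with the strategy names, the following standard-library sketch shows what mean, median, mode, drop, and constant filling typically mean for a single numeric column. The `fill_missing` function is a hypothetical helper, not part of MLFCrafter, and the `auto` strategy is left out because its rules are not described here.

```python
from statistics import mean, median, mode
from typing import List, Optional


def fill_missing(
    values: List[Optional[float]],
    strategy: str = "mean",
    fill_value: float = 0.0,
) -> List[float]:
    """Fill or drop None entries in one numeric column."""
    observed = [v for v in values if v is not None]
    if strategy == "drop":
        return observed  # rows with a missing value are removed
    if strategy == "constant":
        replacement = fill_value
    elif strategy == "mean":
        replacement = mean(observed)
    elif strategy == "median":
        replacement = median(observed)
    elif strategy == "mode":
        replacement = mode(observed)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [replacement if v is None else v for v in values]


print(fill_missing([1.0, None, 3.0], strategy="mean"))
print(fill_missing([1.0, None, 3.0], strategy="drop"))
```

The statistic is computed from the observed values only, so missing entries never influence their own replacement.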

ScalerCrafter

Scales numerical features:

ScalerCrafter(
    scaler_type="standard",  # standard, minmax, robust
    columns=["age", "income"]  # Specific columns or None for all numeric
)
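
The three scaler types correspond to well-known formulas. As a rough illustration, here they are written out for a single column with the standard library. The function names are hypothetical, and the sketch assumes the column is not constant, since otherwise the denominators would be zero.

```python
from statistics import mean, median, pstdev, quantiles
from typing import List


def minmax_scale(xs: List[float]) -> List[float]:
    """Map values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standard_scale(xs: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def robust_scale(xs: List[float]) -> List[float]:
    """Subtract the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    med = median(xs)
    return [(x - med) / (q3 - q1) for x in xs]


ages = [20.0, 30.0, 40.0, 50.0, 60.0]
print(minmax_scale(ages))
print(standard_scale(ages))
print(robust_scale(ages))
```

Robust scaling relies on the median and quartiles, so a few extreme values shift it far less than they shift min-max or standard scaling.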

ModelCrafter

Trains ML models:

ModelCrafter(
    model_name="random_forest",  # random_forest, xgboost, logistic_regression
    model_params={"n_estimators": 100},
    test_size=0.2,
    stratify=True
)
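
The `test_size` and `stratify` options describe a train/test split that preserves class proportions. The sketch below shows one simple way to do that with index lists and the standard library. `stratified_split` is a hypothetical helper with per-class rounding, and production libraries such as scikit-learn allocate samples somewhat differently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], test_size: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), sampling each class separately."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(labels, test_size=0.2)
print(len(train_idx), len(test_idx))
```

Without stratification, a small or imbalanced dataset can end up with a test set that contains few or no examples of a minority class.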

ScorerCrafter

Calculates performance metrics:

ScorerCrafter(
    metrics=["accuracy", "precision", "recall", "f1"]  # Default: all metrics
)
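
As a reminder of what the four metric names mean, here is a small standard-library sketch for the binary case. `binary_metrics` is a hypothetical helper. How MLFCrafter averages these metrics across more than two classes is not stated here, so the sketch does not attempt it.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```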

DeployCrafter

Saves trained models:

DeployCrafter(
    model_path="model.joblib",
    save_format="joblib",  # joblib or pickle
    include_scaler=True,
    include_metadata=True
)
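
Saving a model together with its scaler and metadata usually means serializing one bundle object. The following is a minimal sketch of that idea using `pickle` from the standard library. The `save_artifacts` and `load_artifacts` helpers, the `scaler` key, and the `saved_at` field are assumptions for illustration, and the "model" is a plain dictionary standing in for a trained estimator.

```python
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def save_artifacts(
    path: Path,
    model: Any,
    scaler: Any = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> None:
    """Write the model, optional scaler, and metadata as a single pickled dict."""
    bundle = {
        "model": model,
        "scaler": scaler,
        "metadata": {
            **(metadata or {}),
            "saved_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    with open(path, "wb") as handle:
        pickle.dump(bundle, handle)


def load_artifacts(path: Path) -> Dict[str, Any]:
    """Read a bundle back. Only unpickle files from sources you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory with a stand-in "model".
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"
    save_artifacts(
        model_path,
        model={"weights": [0.1, 0.2]},
        metadata={"target_column": "species"},
    )
    restored = load_artifacts(model_path)

print(restored["model"])
print(restored["metadata"]["target_column"])
```

Keeping the scaler in the same file as the model helps ensure that new data is transformed exactly as the training data was before prediction.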

Alternative Usage Patterns

Step-by-Step Building

chain = MLFChain()
chain.add_crafter(DataIngestCrafter(data_path="data.csv"))
chain.add_crafter(CleanerCrafter(strategy="median"))
chain.add_crafter(ModelCrafter(model_name="xgboost"))
results = chain.run(target_column="target")

Loading Saved Models

artifacts = DeployCrafter.load_model("model.joblib")
model = artifacts["model"]
metadata = artifacts["metadata"]

Requirements

  • Python: 3.8 or higher
  • Core Dependencies: pandas, scikit-learn, numpy, xgboost, joblib

Development

Setup Development Environment

git clone https://github.com/brkcvlk/mlfcrafter.git
cd mlfcrafter
pip install -r requirements-dev.txt
pip install -e .

Run Tests

# Run all tests
python -m pytest tests/ -v

# Run tests with coverage  
python -m pytest tests/ -v --cov=mlfcrafter --cov-report=html

# Check code quality
ruff check .

# Auto-fix code issues
ruff check --fix .

# Format code
ruff format .

Run Examples

python example.py

Documentation

Complete documentation is available at MLFCrafter Docs

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Made for the ML Community

