ML Pipeline Automation Framework - Chain together data processing, model training, and deployment with minimal code

MLFCrafter


If you find MLFCrafter useful, please consider starring this repository!

Your support helps us continue developing and improving MLFCrafter for the ML community.


What is MLFCrafter?

MLFCrafter is a Python framework that simplifies machine learning pipeline creation through chainable "crafter" components. Build, train, and deploy ML models with minimal code and maximum flexibility.

Key Features

  • 🔗 Chainable Architecture - Connect multiple processing steps seamlessly
  • 📊 Smart Data Handling - Automatic data ingestion from CSV, Excel, JSON
  • 🧹 Intelligent Cleaning - Multiple strategies for missing value handling
  • 📏 Flexible Scaling - MinMax, Standard, and Robust scaling options
  • 🤖 Multiple Models - Random Forest, XGBoost, Logistic Regression support
  • 📈 Comprehensive Metrics - Accuracy, Precision, Recall, F1-Score
  • 💾 Easy Deployment - Save trained models together with their metadata in a single step
  • 🔄 Context-Based - Seamless data flow between pipeline steps
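
To make the "chainable" and "context-based" ideas concrete, here is a minimal sketch of the general pattern in plain Python. It is not MLFCrafter's implementation: `run_chain`, the three step functions, and the context keys are hypothetical names chosen for illustration. The only assumption is that each step receives a shared context dictionary and returns an updated one.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def run_chain(steps: List[Step], context: Context) -> Context:
    """Pass one shared context dict through each step in order."""
    for step in steps:
        context = step(context)
    return context


# Hypothetical steps: each reads keys written earlier and adds or replaces keys.
def ingest(context: Context) -> Context:
    return {**context, "data": [1.0, None, 3.0]}


def clean(context: Context) -> Context:
    filled = [0.0 if value is None else value for value in context["data"]]
    return {**context, "data": filled}


def summarize(context: Context) -> Context:
    return {**context, "total": sum(context["data"])}


result = run_chain([ingest, clean, summarize], {"target_column": "y"})
print(result["total"])
```

Because every step only depends on the keys it reads, steps can be reordered, removed, or replaced without touching the others, which is what makes this style of pipeline easy to compose.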

Quick Start

Installation

pip install mlfcrafter

Basic Usage

from mlfcrafter import MLFChain, DataIngestCrafter, CleanerCrafter, ScalerCrafter, ModelCrafter, ScorerCrafter, DeployCrafter

# Define the whole ML pipeline in a single expression
chain = MLFChain(
    DataIngestCrafter(data_path="data/iris.csv"),
    CleanerCrafter(strategy="auto"),
    ScalerCrafter(scaler_type="standard"),
    ModelCrafter(model_name="random_forest"),
    ScorerCrafter(),
    DeployCrafter()
)

# Run entire pipeline
results = chain.run(target_column="species")
print(f"Test Score: {results['test_score']:.4f}")

Advanced Configuration

chain = MLFChain(
    DataIngestCrafter(data_path="data/titanic.csv", source_type="csv"),
    CleanerCrafter(strategy="mean", str_fill="Unknown"),
    ScalerCrafter(scaler_type="minmax", columns=["age", "fare"]),
    ModelCrafter(
        model_name="xgboost",
        model_params={"n_estimators": 200, "max_depth": 6},
        test_size=0.25
    ),
    ScorerCrafter(),
    DeployCrafter(model_path="models/titanic_model.joblib")
)

results = chain.run(target_column="survived")

Components (Crafters)

DataIngestCrafter

Loads data from various file formats:

DataIngestCrafter(
    data_path="path/to/data.csv",
    source_type="auto"  # auto, csv, excel, json
)
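
The `source_type="auto"` option suggests the format is inferred from the file itself. How MLFCrafter does this is not documented here; one common approach is to look at the file extension, sketched below. The function `infer_source_type` and the extension table are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Hypothetical mapping from file extension to source type.
EXTENSION_TO_SOURCE_TYPE = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
}


def infer_source_type(data_path: str) -> str:
    """Guess the source type from the file extension (case-insensitive)."""
    suffix = Path(data_path).suffix.lower()
    try:
        return EXTENSION_TO_SOURCE_TYPE[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer source type from {data_path!r}") from None


print(infer_source_type("data/iris.csv"))
print(infer_source_type("reports/Sales.XLSX"))
```

Raising an error for unknown extensions, rather than silently defaulting to CSV, keeps a mistyped path from producing a confusing parse failure later in the pipeline.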

CleanerCrafter

Fills or drops missing values according to the chosen strategy:

CleanerCrafter(
    strategy="auto",     # auto, mean, median, mode, drop, constant
    str_fill="missing",  # Fill value for strings
    int_fill=0.0         # Fill value for numbers
)
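
To show what the named strategies typically mean, here is a small standard-library sketch for a single numeric column. It is an illustration under assumptions, not CleanerCrafter's code: `fill_missing` is a hypothetical helper, missing values are represented as `None`, and the `auto` strategy is left out because its behavior is not described here.

```python
import statistics
from typing import List, Optional


def fill_missing(values: List[Optional[float]], strategy: str,
                 fill_value: float = 0.0) -> List[float]:
    """Fill or drop None entries in one numeric column."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present
    if strategy == "mean":
        replacement = statistics.mean(present)
    elif strategy == "median":
        replacement = statistics.median(present)
    elif strategy == "mode":
        replacement = statistics.mode(present)
    elif strategy == "constant":
        replacement = fill_value
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [replacement if v is None else v for v in values]


ages = [20.0, None, 40.0, 40.0]
print(fill_missing(ages, "median"))
print(fill_missing(ages, "constant", fill_value=0.0))
```

On a real table, `drop` would normally remove whole rows so that columns stay aligned; the single-column version above only shows the idea.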

ScalerCrafter

Scales numerical features:

ScalerCrafter(
    scaler_type="standard",  # standard, minmax, robust
    columns=["age", "income"]  # Specific columns or None for all numeric
)
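
The three scaler names correspond to well-known transformations. The sketch below writes them out with the standard library so the differences are visible. The function names are hypothetical, and whether ScalerCrafter delegates to scikit-learn's scalers is an assumption not confirmed here.

```python
import statistics
from typing import List

# Note: a constant column would divide by zero; handling that is omitted.


def standard_scale(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def minmax_scale(values: List[float]) -> List[float]:
    """Map the minimum to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def robust_scale(values: List[float]) -> List[float]:
    """Subtract the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


incomes = [1.0, 2.0, 3.0, 4.0, 5.0]
print(minmax_scale(incomes))
print(robust_scale(incomes))
```

Robust scaling uses the median and quartiles, so a single extreme value shifts the result far less than it would under standard or min-max scaling.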

ModelCrafter

Trains ML models:

ModelCrafter(
    model_name="random_forest",  # random_forest, xgboost, logistic_regression
    model_params={"n_estimators": 100},
    test_size=0.2,
    stratify=True
)
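
The `test_size` and `stratify` parameters describe a stratified train/test split: each class keeps roughly the same share in both parts. Below is a minimal standard-library sketch of that idea. `stratified_split` is a hypothetical helper, and it is not claimed to match how ModelCrafter performs the split internally.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_size: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 8 + ["b"] * 2
train_idx, test_idx = stratified_split(labels, test_size=0.5)
print(len(train_idx), len(test_idx))
```

Without stratification, a rare class such as `"b"` above could end up entirely in one part by chance, which makes the test score unreliable.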

ScorerCrafter

Calculates performance metrics:

ScorerCrafter(
    metrics=["accuracy", "precision", "recall", "f1"]  # Default: all metrics
)
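
For reference, the four metrics can be computed by hand from true and predicted labels. The sketch below covers only the binary case with labels 0 and 1. `binary_metrics` is a hypothetical helper, and how ScorerCrafter averages these metrics for multi-class targets is not specified here.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


scores = binary_metrics(y_true=[1, 0, 1, 1, 0, 0], y_pred=[1, 0, 0, 1, 1, 0])
print(scores)
```

Accuracy alone can be misleading on imbalanced data, which is why precision, recall, and F1 are usually reported alongside it.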

DeployCrafter

Saves trained models:

DeployCrafter(
    model_path="model.joblib",
    save_format="joblib",  # joblib or pickle
    include_scaler=True,
    include_metadata=True
)

Alternative Usage Patterns

Step-by-Step Building

chain = MLFChain()
chain.add_crafter(DataIngestCrafter(data_path="data.csv"))
chain.add_crafter(CleanerCrafter(strategy="median"))
chain.add_crafter(ModelCrafter(model_name="xgboost"))
results = chain.run(target_column="target")

Loading Saved Models

artifacts = DeployCrafter.load_model("model.joblib")
model = artifacts["model"]
metadata = artifacts["metadata"]
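
The loaded object above behaves like a dictionary of artifacts. As a rough sketch of how such a bundle could be written and read back, the example below uses the standard `pickle` module. `save_artifacts` and `load_artifacts` are hypothetical helpers, the `"scaler"` key is an assumption, and the real file layout produced by DeployCrafter may differ.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def save_artifacts(path: Path, model: Any, scaler: Any = None,
                   metadata: Optional[Dict[str, Any]] = None) -> None:
    """Store the model, optional scaler, and metadata in one file."""
    bundle = {"model": model, "scaler": scaler, "metadata": metadata or {}}
    with open(path, "wb") as handle:
        pickle.dump(bundle, handle)


def load_artifacts(path: Path) -> Dict[str, Any]:
    """Read the bundle back. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    # A plain dict stands in for a trained model in this sketch.
    save_artifacts(model_path, model={"weights": [0.1, 0.2]},
                   metadata={"target_column": "species"})
    artifacts = load_artifacts(model_path)

print(artifacts["metadata"])
```

Keeping the scaler and metadata next to the model means the same preprocessing can be applied at prediction time without rerunning the training pipeline.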

Requirements

  • Python: 3.8 or higher
  • Core Dependencies: pandas, scikit-learn, numpy, xgboost, joblib

Development

Setup Development Environment

git clone https://github.com/brkcvlk/mlfcrafter.git
cd mlfcrafter
pip install -r requirements-dev.txt
pip install -e .

Run Tests

# Run all tests
python -m pytest tests/ -v

# Run tests with coverage  
python -m pytest tests/ -v --cov=mlfcrafter --cov-report=html

# Check code quality
ruff check .

# Auto-fix code issues
ruff check --fix .

# Format code
ruff format .

Run Examples

python example.py

Documentation

Complete documentation is available in the MLFCrafter Docs.

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Made for the ML Community
