A Lightweight MLOps Framework for Machine Learning Workflows

Collie

Overview

Collie is an MLOps framework that streamlines machine learning workflows through a component-based architecture integrated with MLflow. Data scientists and ML engineers build, deploy, and manage ML pipelines from modular components, each of which handles one stage of the ML lifecycle.

Features

  • Component-Based Architecture: Modular design with specialized components for each ML workflow stage
  • MLflow Integration: Built-in experiment tracking, model registration, and deployment capabilities
  • Pipeline Orchestration: Workflow management built on an event-driven architecture
  • Model Management: Automated model versioning, staging, and promotion
  • Framework Agnostic: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)

Architecture

Collie follows an event-driven architecture with the following core components:

  • Transformer: Data preprocessing and feature engineering
  • Tuner: Hyperparameter optimization
  • Trainer: Model training and validation
  • Evaluator: Model evaluation and comparison
  • Pusher: Model deployment and registration
  • Orchestrator: Workflow coordination and execution
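
To illustrate the event-driven flow, the sketch below chains two stand-in stages that each receive an event and return a new one. It is not Collie's implementation: ToyEvent, ToyComponent, and run_pipeline are hypothetical names, and the only assumption is that each stage hands its output event to the next stage in order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyEvent:
    """Hypothetical stand-in for an event: a payload passed between stages."""
    payload: dict[str, Any] = field(default_factory=dict)


# A toy component is any callable that maps one event to the next.
ToyComponent = Callable[[ToyEvent], ToyEvent]


def transform(event: ToyEvent) -> ToyEvent:
    # Preprocess: divide every raw value by the largest one.
    raw = event.payload["raw"]
    scaled = [x / max(raw) for x in raw]
    return ToyEvent({**event.payload, "train_data": scaled})


def train(event: ToyEvent) -> ToyEvent:
    # "Train" a trivial model: the mean of the training data.
    data = event.payload["train_data"]
    return ToyEvent({**event.payload, "model": sum(data) / len(data)})


def run_pipeline(components: list[ToyComponent], first: ToyEvent) -> ToyEvent:
    # Feed each component's output event into the next component.
    event = first
    for component in components:
        event = component(event)
    return event


result = run_pipeline([transform, train], ToyEvent({"raw": [2.0, 4.0, 8.0]}))
print(result.payload["model"])
```

Because every stage has the same event-in, event-out shape, stages can be added, removed, or swapped without changing the loop that drives them.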

Quick Start

Installation

Basic Installation (Core Framework Only)

pip install collie-mlops

This installs the core MLOps orchestration framework with MLflow integration (~100MB).

Install with ML Frameworks

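Choose the installation that fits your needs. In shells that treat square brackets as glob patterns (zsh, for example), quote the package spec: pip install "collie-mlops[sklearn]".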

For Traditional ML (Tabular Data)

# Individual frameworks
pip install collie-mlops[sklearn]      # scikit-learn support
pip install collie-mlops[xgboost]      # XGBoost support
pip install collie-mlops[lightgbm]     # LightGBM support

# Or install all tabular ML frameworks (~250MB)
pip install collie-mlops[tabular]

For Deep Learning

# PyTorch ecosystem (includes Transformers for NLP/Vision) (~3GB)
pip install collie-mlops[pytorch]

# Or use the alias
pip install collie-mlops[deep-learning]

For Complete Installation

# All frameworks (~3.5GB)
pip install collie-mlops[all]

Prerequisites

  • Python >= 3.10
  • MLflow tracking server (can be local or remote)

Components

Transformer

Handles data preprocessing, feature engineering, and data validation.

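# Assumption: these names are exported from the top-level package, as Orchestrator is.
from collie import Event, Transformer, TransformerPayload

class CustomTransformer(Transformer):
    def handle(self, event: Event) -> Event:
        # Preprocess the incoming data (placeholder: replace with your own logic)
        processed_data = ...
        return Event(payload=TransformerPayload(train_data=processed_data))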
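
As one possible body for the preprocessing step, the sketch below validates a small numeric table and standardizes each column. The standardize helper is hypothetical and uses only the standard library; it is an illustration of validation plus feature scaling, not something Collie provides.

```python
from statistics import mean, pstdev


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Validate a numeric table, then scale each column to zero mean and unit variance."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be non-empty and all the same length")
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


processed_data = standardize([[1.0, 10.0], [3.0, 10.0]])
print(processed_data)
```

The result could then be passed as train_data in the payload returned by handle.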

Tuner

Performs hyperparameter optimization using various strategies.

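# Assumption: these names are exported from the top-level package, as Orchestrator is.
from collie import Event, Tuner, TunerPayload

class CustomTuner(Tuner):
    def handle(self, event: Event) -> Event:
        # Search for the best hyperparameters (placeholder: replace with your own logic)
        best_params = ...
        return Event(payload=TunerPayload(hyperparameters=best_params))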
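
Grid search is one such strategy. The sketch below is a minimal, framework-free version: grid_search and toy_score are hypothetical names, and the scoring function stands in for whatever validation metric a real tuner would compute.

```python
from itertools import product
from typing import Any, Callable, Optional


def grid_search(
    param_grid: dict[str, list[Any]],
    score: Callable[[dict[str, Any]], float],
) -> tuple[Optional[dict[str, Any]], float]:
    """Try every combination in param_grid and keep the highest-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: dict[str, Any]) -> float:
    # Toy objective with a known optimum at learning_rate=0.1, max_depth=4.
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 4)


best_params, best_score = grid_search(
    {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]},
    toy_score,
)
print(best_params)
```

The winning dictionary has the same shape as the hyperparameters value returned in the payload above.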

Trainer

Trains machine learning models with automatic experiment tracking.

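# Assumption: these names are exported from the top-level package, as Orchestrator is.
from collie import Event, Trainer, TrainerPayload

class CustomTrainer(Trainer):
    def handle(self, event: Event) -> Event:
        # Fit the model (placeholder: replace with your own training code)
        model = ...
        return Event(payload=TrainerPayload(model=model))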

Evaluator

Evaluates model performance and reports whether the new model beats the current production model, which informs the deployment decision.

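# Assumption: these names are exported from the top-level package, as Orchestrator is.
from collie import Evaluator, EvaluatorPayload, Event

class CustomEvaluator(Evaluator):
    def handle(self, event: Event) -> Event:
        # Compute evaluation metrics (placeholder: replace with your own logic)
        metrics = ...
        # Decide whether the new model outperforms the production model
        is_better: bool = ...
        return Event(payload=EvaluatorPayload(
            metrics=metrics,
            is_better_than_production=is_better,
        ))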
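
How the is_better flag is computed is left to the user. The sketch below shows one plausible rule, assuming metrics are plain dictionaries of floats; the function name, the min_delta threshold, and the "no production model yet" behavior are illustrative choices, not Collie's documented logic.

```python
from typing import Optional


def is_better_than_production(
    candidate: dict[str, float],
    production: Optional[dict[str, float]],
    metric: str = "accuracy",
    higher_is_better: bool = True,
    min_delta: float = 0.0,
) -> bool:
    """Return True if the candidate beats production on `metric` by more than min_delta."""
    if production is None:
        # Nothing is deployed yet, so any candidate counts as an improvement.
        return True
    delta = candidate[metric] - production[metric]
    if not higher_is_better:
        # For error-style metrics (RMSE, loss), a decrease is an improvement.
        delta = -delta
    return delta > min_delta


print(is_better_than_production({"accuracy": 0.91}, {"accuracy": 0.89}))
```

A positive min_delta guards against promoting a model on a difference that is within noise.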

Pusher

Handles model deployment and registration.

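# Assumption: these names are exported from the top-level package, as Orchestrator is.
from collie import Event, Pusher, PusherPayload

class CustomPusher(Pusher):
    def handle(self, event: Event) -> Event:
        # Register or deploy the model (placeholder: replace with your own logic)
        model_uri = ...
        return Event(payload=PusherPayload(model_uri=model_uri))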
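
If the model is registered in the MLflow Model Registry, model_uri would typically follow MLflow's models:/<name>/<version> scheme. The helper below is a hypothetical sketch that builds such a URI; whether Collie expects this exact form is an assumption, and the input checks are sanity checks added for the example.

```python
def registered_model_uri(name: str, version: int) -> str:
    """Build a Model Registry URI of the form models:/<name>/<version>."""
    if not name or "/" in name:
        # A slash in the name would make the URI ambiguous.
        raise ValueError("model name must be non-empty and contain no '/'")
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"models:/{name}/{version}"


model_uri = registered_model_uri("my_model", 3)
print(model_uri)
```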

Orchestrator

Coordinates the execution of all components in the pipeline.

from collie import Orchestrator

# Create orchestrator with all components
orchestrator = Orchestrator(
    components=[
        CustomTransformer(),
        CustomTuner(),
        CustomTrainer(),
        CustomEvaluator(),
        CustomPusher()
    ],
    tracking_uri="http://localhost:5000",
    experiment_name="my_experiment",
    registered_model_name="my_model",
    mlflow_tags={"project": "my_project"},
    description="My ML Pipeline"
)

# Run the entire pipeline
orchestrator.run()

Configuration

MLflow Setup

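Start an MLflow tracking server. The host and port must match the tracking_uri passed to the Orchestrator (http://localhost:5000 in the example above):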

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
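
Before running a pipeline, it can help to confirm that the server is reachable. The sketch below assumes the server exposes a /health endpoint that answers with HTTP 200, as recent MLflow servers do; health_url and tracking_server_is_up are hypothetical helper names.

```python
from urllib.request import urlopen


def health_url(tracking_uri: str) -> str:
    # Drop any trailing slash before appending the health path.
    return tracking_uri.rstrip("/") + "/health"


def tracking_server_is_up(tracking_uri: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urlopen(health_url(tracking_uri), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


print(health_url("http://localhost:5000"))
```

Calling tracking_server_is_up("http://localhost:5000") after starting the server with the command above should return True once the server has finished booting.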

Supported Frameworks

Collie supports multiple ML frameworks through its flexible optional dependency system:

Available Frameworks

  • scikit-learn - Traditional ML algorithms
  • XGBoost - Gradient boosting for tabular data
  • LightGBM - Fast gradient boosting framework
  • PyTorch - Deep learning framework
  • PyTorch Lightning - High-level PyTorch wrapper
  • Transformers - Hugging Face transformers for NLP
  • Sentence Transformers - Sentence embeddings

Installation Options

Use Case        Command                              Size     Frameworks Included
Core Only       pip install collie-mlops             ~100MB   MLflow orchestration only
Tabular ML      pip install collie-mlops[tabular]    ~250MB   sklearn, XGBoost, LightGBM
Deep Learning   pip install collie-mlops[pytorch]    ~3GB     PyTorch, Lightning, Transformers
Complete        pip install collie-mlops[all]        ~3.5GB   All frameworks

Note: Install only what you need to keep your environment lightweight!
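
To see which optional frameworks are already present in an environment, a quick check like the following can help. The FRAMEWORK_MODULES mapping is illustrative: the keys mirror the extras named above and the values are the frameworks' usual import names, but the exact contents of each extra are not confirmed here.

```python
from importlib.util import find_spec

# Usual import names for some of the frameworks listed above (illustrative grouping).
FRAMEWORK_MODULES = {
    "tabular": ["sklearn", "xgboost", "lightgbm"],
    "pytorch": ["torch", "transformers"],
}


def installed_frameworks(modules: dict[str, list[str]] = FRAMEWORK_MODULES) -> dict[str, list[str]]:
    # find_spec returns None when a top-level module cannot be located.
    return {
        group: [name for name in names if find_spec(name) is not None]
        for group, names in modules.items()
    }


print(installed_frameworks())
```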

Documentation

Here you are

Roadmap

Core Features

  • Pipeline Checkpoint & Resume - Save intermediate results and resume from failure points

Framework Support

  • TensorFlow/Keras support
  • Model monitoring and drift detection

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Collie in your research, please cite:

@software{collie2025,
  author = {ChingHuanChiu},
  title = {Collie: A Lightweight MLOps Framework},
  year = {2025},
  url = {https://github.com/ChingHuanChiu/collie}
}
