A Lightweight MLOps Framework for Machine Learning Workflows

Collie

Overview

Collie is an MLOps framework that streamlines machine learning workflows through a component-based architecture integrated with MLflow. Data scientists and ML engineers build, deploy, and manage ML pipelines from modular components, each of which handles one stage of the ML lifecycle.

Features

  • Component-Based Architecture: Modular design with specialized components for each ML workflow stage
  • MLflow Integration: Built-in experiment tracking, model registration, and deployment capabilities
  • Pipeline Orchestration: Seamless workflow management with event-driven architecture
  • Model Management: Automated model versioning, staging, and promotion
  • Framework Agnostic: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)

Architecture

Collie follows an event-driven architecture with the following core components:

  • Transformer: Data preprocessing and feature engineering
  • Tuner: Hyperparameter optimization
  • Trainer: Model training and validation
  • Evaluator: Model evaluation and comparison
  • Pusher: Model deployment and registration
  • Orchestrator: Workflow coordination and execution
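
The event-driven flow described above can be sketched in plain Python. This is only an illustration of the pattern, not Collie's implementation: the `Event` dataclass, the `transform` and `train` functions, and `run_pipeline` below are hypothetical stand-ins defined here so the example runs without Collie installed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical stand-in for illustration only; this is NOT Collie's Event class.
@dataclass
class Event:
    payload: dict[str, Any] = field(default_factory=dict)


Handler = Callable[[Event], Event]


def transform(event: Event) -> Event:
    # Scale the raw values by their maximum so the largest becomes 1.0.
    raw = event.payload["raw"]
    top = max(raw)
    scaled = [x / top for x in raw]
    return Event(payload={**event.payload, "train_data": scaled})


def train(event: Event) -> Event:
    # "Train" a trivial model: the mean of the training data.
    data = event.payload["train_data"]
    return Event(payload={**event.payload, "model": sum(data) / len(data)})


def run_pipeline(handlers: list[Handler], event: Event) -> Event:
    # Pass the event through each stage in order; each stage returns a new event.
    for handler in handlers:
        event = handler(event)
    return event


result = run_pipeline([transform, train], Event(payload={"raw": [2.0, 4.0, 8.0]}))
print(result.payload["model"])
```

Each stage only reads from the incoming event and returns a new one, which is what lets stages be swapped or reordered without touching their neighbours.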

Quick Start

Installation

pip install collie-mlops

This installs Collie together with all supported ML frameworks:

  • scikit-learn
  • PyTorch
  • XGBoost
  • LightGBM
  • Transformers (with Sentence Transformers)

Prerequisites

  • Python >= 3.10
  • MLflow tracking server (can be local or remote)

Components

Transformer

Handles data preprocessing, feature engineering, and data validation.

class CustomTransformer(Transformer):
    def handle(self, event) -> Event:
        # Process your data
        processed_data = ... 
        return Event(payload=TransformerPayload(train_data=processed_data))

Tuner

Performs hyperparameter optimization using various strategies.

class CustomTuner(Tuner):
    def handle(self, event) -> Event:
        # Optimize hyperparameters
        best_params = ...
        return Event(payload=TunerPayload(hyperparameters=best_params))
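
As one possible way to produce `best_params`, the sketch below runs an exhaustive grid search using only the standard library. It is an assumption about what a tuner body could look like, not a strategy Collie ships: `grid_search`, `toy_loss`, and the search space are hypothetical names chosen for this example.

```python
from itertools import product
from typing import Any, Callable


def grid_search(
    objective: Callable[[dict[str, Any]], float],
    search_space: dict[str, list[Any]],
) -> tuple[dict[str, Any], float]:
    """Return the parameter combination with the lowest objective value."""
    names = list(search_space)
    best_params: dict[str, Any] = {}
    best_score = float("inf")
    # Try every combination of the candidate values.
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params: dict[str, Any]) -> float:
    # Toy objective standing in for a real validation loss (hypothetical).
    return (params["learning_rate"] - 0.1) ** 2 + abs(params["max_depth"] - 4)


best_params, best_score = grid_search(
    toy_loss,
    {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]},
)
print(best_params, best_score)
```

In a real tuner, `toy_loss` would be replaced by a function that trains on the data carried by the incoming event and returns a validation metric.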

Trainer

Trains machine learning models with automatic experiment tracking.

class CustomTrainer(Trainer):
    def handle(self, event) -> Event:
        # Train your model
        model = ...
        return Event(payload=TrainerPayload(model=model))

Evaluator

Evaluates model performance and decides on deployment.

class CustomEvaluator(Evaluator):
    def handle(self, event) -> Event:
        # Evaluate model performance
        metrics = ...
        is_better: bool = ...
        return Event(payload=EvaluatorPayload(
            metrics=metrics, 
            is_better_than_production=is_better
        ))
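
The `is_better` flag above has to come from some comparison rule. A minimal sketch of one such rule follows, assuming that metrics are plain dictionaries of floats and that a single named metric drives the decision; the function name, its parameters, and the threshold are illustrative and not part of Collie.

```python
from typing import Optional


def is_better_than_production(
    candidate: dict[str, float],
    production: Optional[dict[str, float]],
    metric: str = "accuracy",
    min_improvement: float = 0.0,
    higher_is_better: bool = True,
) -> bool:
    """Decide whether the candidate model should replace the production model."""
    # With no production model (or no comparable metric) there is nothing to beat.
    if production is None or metric not in production:
        return True
    delta = candidate[metric] - production[metric]
    if not higher_is_better:
        # For error-style metrics (e.g. RMSE), a decrease is an improvement.
        delta = -delta
    return delta > min_improvement


print(is_better_than_production({"accuracy": 0.92}, {"accuracy": 0.90}))
print(is_better_than_production({"rmse": 1.0}, {"rmse": 2.0}, metric="rmse", higher_is_better=False))
```

Requiring a minimum improvement rather than any improvement at all helps avoid promoting a model on differences that are within run-to-run noise.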

Pusher

Handles model deployment and registration.

class CustomPusher(Pusher):
    def handle(self, event) -> Event:
        # Deploy model to production
        model_uri = ...
        return Event(payload=PusherPayload(model_uri=model_uri))
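
If the pushed model lives in the MLflow Model Registry, the `model_uri` would typically follow MLflow's `models:/<name>/<version>` scheme. The helper below only builds that string; the function name and the example model name are hypothetical, and whether Collie expects this exact form is an assumption.

```python
def registry_model_uri(name: str, version: int) -> str:
    """Build an MLflow Model Registry URI of the form models:/<name>/<version>."""
    if not name:
        raise ValueError("model name must not be empty")
    if version < 1:
        raise ValueError("model version must be a positive integer")
    return f"models:/{name}/{version}"


# "churn-classifier" is a made-up model name used only for this example.
model_uri = registry_model_uri("churn-classifier", 3)
print(model_uri)
```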

Configuration

MLflow Setup

Start an MLflow tracking server:

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
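
Before running a pipeline it can help to confirm that the server started above is reachable. The sketch below uses only the standard library and assumes the server listens on `http://localhost:5000` and exposes MLflow's `/health` endpoint; `tracking_server_is_up` is a hypothetical helper, and how Collie itself picks up the tracking URI is not shown here.

```python
import os
import urllib.error
import urllib.request

# Assumed address, matching the --port flag in the command above.
TRACKING_URI = "http://localhost:5000"


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


# MLflow clients read this environment variable when no URI is set in code.
os.environ.setdefault("MLFLOW_TRACKING_URI", TRACKING_URI)

print("tracking server reachable:", tracking_server_is_up(TRACKING_URI))
```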

Supported Frameworks

Collie currently supports the following ML frameworks through its model flavor system:

  • PyTorch
  • scikit-learn
  • XGBoost
  • LightGBM
  • Transformers

Documentation

Here you are

Roadmap

  • TensorFlow/Keras support
  • Model monitoring and drift detection
  • Integrate an LLM training/fine-tuning framework

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Collie in your research, please cite:

@software{collie2025,
  author = {ChingHuanChiu},
  title = {Collie: A Lightweight MLOps Framework},
  year = {2025},
  url = {https://github.com/ChingHuanChiu/collie}
}
