A Lightweight MLOps Framework for Machine Learning Workflows
Collie
Overview
Collie is an MLOps framework that structures machine learning workflows as a pipeline of modular components integrated with MLflow. Each component handles one stage of the ML lifecycle (preprocessing, tuning, training, evaluation, deployment), so data scientists and ML engineers can build, deploy, and manage pipelines one stage at a time.
Features
- Component-Based Architecture: Modular design with specialized components for each ML workflow stage
- MLflow Integration: Built-in experiment tracking, model registration, and deployment capabilities
- Pipeline Orchestration: Seamless workflow management with event-driven architecture
- Model Management: Automated model versioning, staging, and promotion
- Framework Agnostic: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)
Architecture
Collie follows an event-driven architecture with the following core components:
- Transformer: Data preprocessing and feature engineering
- Tuner: Hyperparameter optimization
- Trainer: Model training and validation
- Evaluator: Model evaluation and comparison
- Pusher: Model deployment and registration
- Orchestrator: Workflow coordination and execution
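To make the event-driven flow concrete, here is a minimal, self-contained sketch of components handing events down a chain. It does not use Collie's classes: `ToyEvent`, `ToyComponent`, `Scale`, `FitMean`, and `run_chain` are hypothetical stand-ins written for illustration, and Collie's own `Event` and `Orchestrator` may behave differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyEvent:
    """Hypothetical stand-in for an event: a payload handed between components."""
    payload: dict[str, Any] = field(default_factory=dict)


class ToyComponent:
    """Hypothetical base class: each stage consumes an event and returns a new one."""

    def handle(self, event: ToyEvent) -> ToyEvent:
        raise NotImplementedError


class Scale(ToyComponent):
    def handle(self, event: ToyEvent) -> ToyEvent:
        # Transformer-like stage: divide every raw value by the largest one.
        raw = event.payload["raw"]
        top = max(raw)
        return ToyEvent(payload={**event.payload, "train_data": [x / top for x in raw]})


class FitMean(ToyComponent):
    def handle(self, event: ToyEvent) -> ToyEvent:
        # Trainer-like stage: the "model" is just the mean of the training data.
        data = event.payload["train_data"]
        return ToyEvent(payload={**event.payload, "model": sum(data) / len(data)})


def run_chain(components: list[ToyComponent], event: ToyEvent) -> ToyEvent:
    """Feed the output event of each component into the next one, in order."""
    for component in components:
        event = component.handle(event)
    return event


result = run_chain([Scale(), FitMean()], ToyEvent(payload={"raw": [2, 4, 8]}))
print(result.payload["train_data"], result.payload["model"])
```

The point of the pattern is that each stage only needs to know the shape of the payload it receives, so stages can be swapped or reordered without touching the others.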
Quick Start
Installation
Basic Installation (Core Framework Only)
pip install collie-mlops
This installs the core MLOps orchestration framework with MLflow integration (~100MB).
Install with ML Frameworks
Choose the installation that fits your needs:
For Traditional ML (Tabular Data)
# Individual frameworks
pip install collie-mlops[sklearn] # scikit-learn support
pip install collie-mlops[xgboost] # XGBoost support
pip install collie-mlops[lightgbm] # LightGBM support
# Or install all tabular ML frameworks (~250MB)
pip install collie-mlops[tabular]
For Deep Learning
# PyTorch ecosystem (includes Transformers for NLP/Vision) (~3GB)
pip install collie-mlops[pytorch]
# Or use the alias
pip install collie-mlops[deep-learning]
For Complete Installation
# All frameworks (~3.5GB)
pip install collie-mlops[all]
Prerequisites
- Python >= 3.10
- MLflow tracking server (can be local or remote)
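After installing, a quick way to see which optional frameworks are importable is to probe for their modules without importing them. The sketch below uses only the standard library; `available_frameworks` and `python_is_supported` are hypothetical helper names, not part of Collie, and the module names are the frameworks' own top-level import names.

```python
import importlib.util
import sys

# Top-level import names of the optional frameworks. These belong to the
# frameworks themselves; they are not Collie identifiers.
OPTIONAL_MODULES = {
    "scikit-learn": "sklearn",
    "XGBoost": "xgboost",
    "LightGBM": "lightgbm",
    "PyTorch": "torch",
    "Transformers": "transformers",
}


def available_frameworks(modules: dict[str, str] = OPTIONAL_MODULES) -> dict[str, bool]:
    """Map each framework name to True if its module can be located, without importing it."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


def python_is_supported(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Check the running interpreter against the Python >= 3.10 prerequisite."""
    return sys.version_info >= minimum


print("Python OK:", python_is_supported())
for name, found in available_frameworks().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```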
Components
Transformer
Handles data preprocessing, feature engineering, and data validation.
class CustomTransformer(Transformer):
    def handle(self, event) -> Event:
        # Process your data
        processed_data = ...
        return Event(payload=TransformerPayload(train_data=processed_data))
Tuner
Performs hyperparameter optimization using various strategies.
class CustomTuner(Tuner):
    def handle(self, event) -> Event:
        # Optimize hyperparameters
        best_params = ...
        return Event(payload=TunerPayload(hyperparameters=best_params))
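As one illustration of what could fill in `best_params`, the sketch below runs an exhaustive grid search with the standard library. Grid search is only one possible strategy and is not necessarily what Collie uses; `grid_search` and `toy_loss` are hypothetical names, and the objective is a made-up stand-in for a real validation loss.

```python
from itertools import product
from typing import Any, Callable


def grid_search(
    objective: Callable[[dict[str, Any]], float],
    search_space: dict[str, list[Any]],
) -> tuple[dict[str, Any], float]:
    """Evaluate every combination in search_space and return the lowest-scoring one."""
    names = list(search_space)
    best_params: dict[str, Any] = {}
    best_score = float("inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params: dict[str, Any]) -> float:
    # Hypothetical objective standing in for a validation loss; its minimum
    # is at learning_rate=0.1, max_depth=4.
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 4) ** 2


best_params, best_score = grid_search(
    toy_loss,
    {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [2, 4, 6]},
)
print(best_params)
```

In a real tuner the objective would train and validate a model for each combination, which is why the grid should stay small or be replaced by a sampling-based strategy.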
Trainer
Trains machine learning models with automatic experiment tracking.
class CustomTrainer(Trainer):
    def handle(self, event) -> Event:
        # Train your model
        model = ...
        return Event(payload=TrainerPayload(model=model))
Evaluator
Evaluates model performance and decides on deployment.
class CustomEvaluator(Evaluator):
    def handle(self, event) -> Event:
        # Evaluate model performance
        metrics = ...
        is_better: bool = ...
        return Event(payload=EvaluatorPayload(
            metrics=metrics,
            is_better_than_production=is_better
        ))
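The `is_better` flag has to come from some comparison rule. The following sketch shows one plausible rule: compare a single metric between the candidate and the current production model, with an optional minimum margin. The function name, its parameters, and the metric dictionaries are assumptions for illustration; Collie does not prescribe this logic.

```python
from typing import Optional


def is_better_than_production(
    candidate: dict[str, float],
    production: Optional[dict[str, float]],
    metric: str = "accuracy",
    min_improvement: float = 0.0,
    higher_is_better: bool = True,
) -> bool:
    """Return True if the candidate beats production on `metric` by more than min_improvement."""
    if production is None or metric not in production:
        # Nothing to compare against, so the candidate wins by default.
        return True
    delta = candidate[metric] - production[metric]
    if not higher_is_better:
        # For losses and error rates, a decrease counts as an improvement.
        delta = -delta
    return delta > min_improvement


print(is_better_than_production({"accuracy": 0.91}, {"accuracy": 0.89}))
```

A non-zero `min_improvement` keeps the pipeline from promoting a model whose gain is within noise.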
Pusher
Handles model deployment and registration.
class CustomPusher(Pusher):
    def handle(self, event) -> Event:
        # Deploy model to production
        model_uri = ...
        return Event(payload=PusherPayload(model_uri=model_uri))
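MLflow addresses registered models with URIs of the form `models:/<name>/<version>` or `models:/<name>@<alias>`. The helper below builds such a string; `registered_model_uri` is a hypothetical name, and whether Collie's `PusherPayload` expects exactly this form is an assumption.

```python
from typing import Optional


def registered_model_uri(
    name: str,
    version: Optional[int] = None,
    alias: Optional[str] = None,
) -> str:
    """Build an MLflow Model Registry URI from a model name plus a version or an alias."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if version is not None:
        return f"models:/{name}/{version}"
    return f"models:/{name}@{alias}"


print(registered_model_uri("my_model", version=3))  # models:/my_model/3
```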
Orchestrator
Coordinates the execution of all components in the pipeline.
from collie import Orchestrator

# Create orchestrator with all components
orchestrator = Orchestrator(
    components=[
        CustomTransformer(),
        CustomTuner(),
        CustomTrainer(),
        CustomEvaluator(),
        CustomPusher()
    ],
    tracking_uri="http://localhost:5000",
    experiment_name="my_experiment",
    registered_model_name="my_model",
    mlflow_tags={"project": "my_project"},
    description="My ML Pipeline"
)

# Run the entire pipeline
orchestrator.run()
Configuration
MLflow Setup
Start the MLflow tracking server:
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
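Before running a pipeline it can help to confirm that the tracking server is reachable. The sketch below polls the server's `/health` endpoint using only the standard library; `tracking_server_is_up` is a hypothetical helper, and the URL assumes the host and port from the command above.

```python
import http.client
import urllib.error
import urllib.request


def tracking_server_is_up(tracking_uri: str, timeout: float = 2.0) -> bool:
    """Return True if GET <tracking_uri>/health answers with HTTP 200."""
    url = tracking_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, DNS failure, bad response, or malformed URL.
        return False


print(tracking_server_is_up("http://localhost:5000"))
```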
Supported Frameworks
Collie supports multiple ML frameworks through its flexible optional dependency system:
Available Frameworks
- scikit-learn - Traditional ML algorithms
- XGBoost - Gradient boosting for tabular data
- LightGBM - Fast gradient boosting framework
- PyTorch - Deep learning framework
- PyTorch Lightning - High-level PyTorch wrapper
- Transformers - Hugging Face transformers for NLP
- Sentence Transformers - Sentence embeddings
Installation Options
| Use Case | Command | Size | Frameworks Included |
|---|---|---|---|
| Core Only | pip install collie-mlops | ~100MB | MLflow orchestration only |
| Tabular ML | pip install collie-mlops[tabular] | ~250MB | sklearn, XGBoost, LightGBM |
| Deep Learning | pip install collie-mlops[pytorch] | ~3GB | PyTorch, Lightning, Transformers |
| Complete | pip install collie-mlops[all] | ~3.5GB | All frameworks |
Note: Install only what you need to keep your environment lightweight!
Documentation
Roadmap
Core Features
- Pipeline Checkpoint & Resume - Save intermediate results and resume from failure points
Framework Support
- TensorFlow/Keras support
- Model monitoring and drift detection
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use Collie in your research, please cite:
@software{collie2025,
  author = {ChingHuanChiu},
  title = {Collie: A Lightweight MLOps Framework},
  year = {2025},
  url = {https://github.com/ChingHuanChiu/collie}
}