
Squid ML

A no-boilerplate, easy-to-use AI/ML experiment tracker.

Do you find yourself spending more time setting up MLflow and its supporting infrastructure than actually building models and data pipelines? Squid ML is here to help - log model training runs, artifacts, metrics, and more, using just two lines of code!

Features

  1. Quickly set up the tracking infrastructure: Squid ML uses MLflow for logging experiments, MinIO as the artifact store, and PostgreSQL as MLflow's backend store.
  2. Easily log experiments, runs, and artifacts: Use decorators to wrap your pipeline and log the model training and evaluation metrics. Scikit-learn, PyTorch, and TensorFlow are supported as of Feb 4, 2025.
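The decorator-based logging described above can be illustrated with a stripped-down, stdlib-only sketch. This is an illustration of the general pattern, not Squid ML's actual implementation; the `log` decorator and `run_pipeline` body here are hypothetical:

```python
import functools


def log(func):
    """Illustrative stand-in for a Squid ML logging decorator."""
    @functools.wraps(func)
    def wrapper(*args, experiment_name=None, **kwargs):
        # Run the wrapped pipeline to get the model and its metrics
        model, metrics = func(*args, **kwargs)
        # A real logger would record these to MLflow under experiment_name;
        # this sketch just prints them.
        print(f"[{experiment_name}] logged metrics: {metrics}")
        return model, metrics
    return wrapper


@log
def run_pipeline(*args, **kwargs):
    # Hypothetical pipeline: trains a model and returns it with its metrics
    return "model", {"accuracy": 0.95}


model, metrics = run_pipeline(experiment_name="demo")
```

Because the wrapper intercepts `experiment_name` as a keyword argument before calling the pipeline, the pipeline itself stays free of any tracking code.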

Installation

  1. Ensure that Docker and Docker Compose V2 are installed and working on your machine.
  2. Use pip to install the package.
pip install squid-ml
  3. Alternatively, build the package from source (optional):
git clone https://github.com/ar-bansal/squid-ml.git

cd squid-ml
python -m build 
pip install dist/squid_ml-0.1.2-py3-none-any.whl

Usage

  1. Start the tracking server: If you already have a tracking server set up, just call mlflow.set_tracking_uri(...) with your tracking URI.
from squid import Server

# Default project_name is the current working directory's basename
tracking_server = Server(project_name="my-project")     

tracking_server.start(
    quiet=False,             # False prints the server logs to the terminal/cell
    use_current_env=True     # Use the Python and MLflow versions from the currently active environment for the best compatibility
)

(OR)

# The Python and MLflow versions need to be specified the first time.
tracking_server.start(
    quiet=False,                # False prints the server logs to the terminal/cell
    python_version="3.10",      # Match your environment's Python version for the best compatibility
    mlflow_version="2.18.0"     # Match your environment's MLflow version for the best compatibility
)
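When passing `python_version` explicitly, it should match the interpreter actually running your code. The "major.minor" string can be derived from the standard library (and `mlflow.__version__` gives the installed MLflow version analogously):

```python
import sys

# Build a "major.minor" version string matching the current interpreter,
# e.g. "3.10", suitable for passing as python_version
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(python_version)
```

Deriving the string this way avoids the version drifting out of sync with the environment when the interpreter is upgraded.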
  2. Use the logging decorators: When wrapping your pipeline, add *args and **kwargs as parameters. When calling the function, pass experiment_name as a keyword argument.
import mlflow
from squid import SklearnLogger
from sklearn.linear_model import LinearRegression


def train_model(model, X_train, y_train):
    ...

def evaluate_model(model, X_test, y_test):
    ...


# Default logging_kwargs={}
# Refer to mlflow.sklearn.autolog's documentation for more logging_kwargs
sklearn_logger = SklearnLogger(
    logging_kwargs={
        "serialization_format": mlflow.sklearn.SERIALIZATION_FORMAT_PICKLE
    }
)

@sklearn_logger.log
def run_pipeline(X_train, X_test, y_train, y_test, model, *args, **kwargs):
    # Train the model and collect the training metrics
    model, train_metrics = train_model(model, X_train, y_train)

    # Evaluate the model and collect the validation/test metrics
    test_metrics = evaluate_model(model, X_test, y_test)

    # Combine everything into a single dictionary of metrics to log
    metrics = {**train_metrics, **test_metrics}

    # Return the model and metrics so that the decorator can log them
    return model, metrics


def main():
    model = LinearRegression()
    X_train, X_test, y_train, y_test = ...

    model, metrics = run_pipeline(
        X_train, 
        X_test, 
        y_train, 
        y_test, 
        model, 
        experiment_name="my-experiment"
    )
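The metrics returned by the pipeline form a single flat dictionary mapping metric names to values. Merging the training and evaluation results can be sketched with hypothetical metric names (these keys are illustrative, not required by Squid ML):

```python
# Hypothetical metrics produced by the training and evaluation steps
train_metrics = {"train_r2": 0.91, "train_mse": 0.42}
test_metrics = {"test_r2": 0.87, "test_mse": 0.55}

# Merge into the single dictionary the decorator receives and logs
metrics = {**train_metrics, **test_metrics}
print(metrics)
```

Prefixing the keys (e.g. `train_` / `test_`) keeps same-named metrics from the two stages from colliding in the merged dictionary.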

Notes

By default, the following usernames and passwords are used for the PostgreSQL and MinIO containers, respectively:

DB_USERNAME=dbuser
DB_PASSWORD=dbpassword

ARTIFACT_STORE_ACCESS_KEY=storeuser
ARTIFACT_STORE_SECRET_KEY=storepassword

If you'd like to use different credentials, set the corresponding environment variables, for example via os.environ:

import os 

os.environ["DB_USERNAME"] = "mynewuser"
os.environ["DB_PASSWORD"] = "mynewpassword"

os.environ["ARTIFACT_STORE_ACCESS_KEY"] = "mynewuser"
os.environ["ARTIFACT_STORE_SECRET_KEY"] = "mynewpassword"

