A Python library that integrates IBM COS with the MLflow artifact registry

MLflow IBM COS Registry

A Python library that integrates IBM Cloud Object Storage (COS) with MLflow for model registry capabilities. This package provides an extended MLflow artifact repository implementation that leverages IBM COS for storing, versioning, and retrieving machine learning models.

Features

  • Store and manage ML models in IBM Cloud Object Storage
  • Versioning with model fingerprinting
  • Specialized support for "latest" model version
  • Efficient caching to avoid redundant downloads
  • Integration with MLflow's PyFunc model flavor

Installation

Install the package using pip or any other package manager:

pip install mlflow-ibmcos

Or install from source:

git clone https://github.com/donielix/mlflow-ibm-cos-registry.git
cd mlflow-ibm-cos-registry
pip install -e .

Requirements

  • Python 3.8 or later
  • IBM Cloud Object Storage account
  • MLflow 2.15.0 or later

Quick Start

from mlflow_ibmcos import COSModelRegistry

# Initialize registry
registry = COSModelRegistry(
    bucket="my-model-bucket",
    model_name="text-classifier",
    model_version="latest",
    endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key"
)

# Log a model
registry.log_pyfunc_model_as_code(
    model_code_path="path/to/model_code.py",
    artifacts={"model": "path/to/model.pkl"}
)

# Download a model
local_path = registry.download_artifacts(dst_path="models")

# Load a model
model = registry.load_model(local_path)

# Make predictions (`data` is a placeholder for your model input)
predictions = model.predict(data)

Authentication

The registry requires IBM COS credentials, which can be provided in either of two ways:

  1. Direct parameters:

    registry = COSModelRegistry(
        # Required parameters
        model_name="my-model",
        model_version="1.0.0",
        # Authentication parameters
        bucket="my-bucket",
        endpoint_url="https://s3.example.com",
        aws_access_key_id="your-access-key",
        aws_secret_access_key="your-secret-key"
    )
    
  2. Environment variables:

    # Shell: export the connection settings
    export AWS_ENDPOINT_URL="https://s3.example.com"
    export AWS_ACCESS_KEY_ID="your-access-key"
    export AWS_SECRET_ACCESS_KEY="your-secret-key"
    export COS_BUCKET_NAME="my-bucket"
    
    # Python: the settings above are read from the environment
    registry = COSModelRegistry(
        model_name="my-model",
        model_version="1.0.0",
    )
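
How the two options interact is not spelled out above. As a minimal sketch, assuming an explicit argument takes precedence over the matching environment variable, the lookup could behave like the helper below. `resolve_cos_settings` is a hypothetical name used only for illustration and is not part of `mlflow_ibmcos`; only the environment variable names come from this section.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names taken from the Authentication section.
ENV_NAMES = {
    "endpoint_url": "AWS_ENDPOINT_URL",
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "bucket": "COS_BUCKET_NAME",
}


def resolve_cos_settings(
    explicit: Optional[Mapping[str, Optional[str]]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: merge explicit values with environment variables.

    Assumption: an explicit, non-empty value wins over the environment.
    Raises ValueError naming every setting that could not be resolved.
    """
    explicit = explicit or {}
    environ = os.environ if environ is None else environ
    resolved: Dict[str, str] = {}
    missing = []
    for key, env_name in ENV_NAMES.items():
        value = explicit.get(key) or environ.get(env_name)
        if value:
            resolved[key] = value
        else:
            missing.append(f"{key} (or ${env_name})")
    if missing:
        raise ValueError("Missing IBM COS settings: " + ", ".join(missing))
    return resolved
```

Running a check like this before constructing the registry surfaces a missing variable early, with a message that names it.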
    

Usage Examples

Uploading Models

Log a PyFunc Model as Code

# Upload a model defined in a Python file
registry.log_pyfunc_model_as_code(
    model_code_path="path/to/model_code.py",
    artifacts={
        "model": "path/to/model.pkl",
        "encoder": "path/to/encoder.pkl"
    }
)

Log Model Artifacts Directly

# Upload model artifacts from a directory
registry.log_artifacts(local_dir="path/to/model_directory")

Downloading Models

# Download model artifacts to a specified directory
model_path = registry.download_artifacts(dst_path="models")

# Download and delete other versions
model_path = registry.download_artifacts(
    dst_path="models",
    delete_other_versions=True,
)

Working with Model Versions

Using the "latest" Tag

The "latest" tag is special and allows you to continually update a model:

registry = COSModelRegistry(
    model_name="my-model",
    model_version="latest",
    # authentication parameters...
)

# Each call to log_artifacts updates the "latest" version
registry.log_artifacts("path/to/model_dir")

When downloading a model with the "latest" tag, the registry will automatically fetch updates if the remote fingerprint differs from the local one.
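
That check amounts to a simple comparison. The sketch below is illustrative only: `needs_refresh` is a hypothetical name, and how the registry stores and reads fingerprints internally is not documented here.

```python
from typing import Optional


def needs_refresh(local_fingerprint: Optional[str], remote_fingerprint: str) -> bool:
    """Return True when a local copy of a "latest" model looks stale.

    Hypothetical helper: a missing local fingerprint (nothing downloaded yet)
    or one that differs from the remote fingerprint both count as stale.
    """
    return local_fingerprint is None or local_fingerprint != remote_fingerprint
```

When the two fingerprints match, the download can be skipped and the local copy reused.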

Using Version Numbers

For stable versioning:

registry = COSModelRegistry(
    model_name="my-model",
    model_version="1.0.0",  # Semantic versioning recommended
    # authentication parameters...
)

Version-tagged models are not overwritten when uploaded again. To publish a change, use a different version or the "latest" tag.
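
If you bump versions by hand, a small helper can compute the next one. This is a sketch under the assumption that versions are plain `MAJOR.MINOR.PATCH` strings; `bump_version` is a hypothetical helper and not part of this package.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return the next semantic version for a plain MAJOR.MINOR.PATCH string."""
    pieces = version.split(".")
    # Reject anything that is not exactly three ASCII-digit components.
    if len(pieces) != 3 or not all(p.isascii() and p.isdigit() for p in pieces):
        raise ValueError(f"Expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in pieces)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r}")


print(bump_version("1.0.0"))           # 1.0.1
print(bump_version("1.4.2", "minor"))  # 1.5.0
```

The result can then be passed as `model_version` when creating a new registry instance.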

Deleting Models

# Initialize registry pointing to the model version to delete
registry = COSModelRegistry(
    model_name="my-model",
    model_version="1.0.0",
    # authentication parameters...
)

# Delete this model version (the deletion must be confirmed explicitly)
registry.delete_model_version(confirm=True)

API Reference

COSModelRegistry

The main class for interacting with the IBM COS model registry.

COSModelRegistry(
    model_name: str,
    model_version: str,
    bucket: Optional[str] = None,
    prefix: Optional[str] = None,
    **kwargs
)

Parameters:

  • model_name: Name of the model
  • model_version: Version of the model (can be a semantic version or "latest")
  • bucket: IBM COS bucket name. If not provided, it is read from the COS_BUCKET_NAME environment variable
  • prefix: Custom prefix for storage path (defaults to "traductor/registry")
  • **kwargs: Additional parameters including:
    • endpoint_url: IBM COS endpoint URL
    • aws_access_key_id: Access key for IBM COS
    • aws_secret_access_key: Secret key for IBM COS
    • config: Additional configuration for the S3 client

Main Methods:

  • log_pyfunc_model_as_code(model_code_path, artifacts=None, **kwargs): Log a PyFunc model
  • log_artifacts(local_dir, artifact_path=None): Log model artifacts
  • download_artifacts(artifact_path=None, dst_path=None, delete_other_versions=False): Download model artifacts
  • load_model(model_local_path, **kwargs): Load a downloaded model
  • delete_model_version(confirm=False): Delete a model version

Fingerprinting

The registry uses fingerprinting to track model changes and optimize downloads:

  • A SHA-512 hash of the model directory is created when logging a model
  • When downloading, the fingerprints are compared to avoid redundant downloads
  • For "latest" models, differences in fingerprints trigger automatic updates
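
To make the idea concrete, here is one way a directory fingerprint could be computed with the standard library. It is a sketch, not the package's implementation: which files are included, the order they are visited in, and whether file paths feed into the hash are all assumptions, and `fingerprint_directory` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def fingerprint_directory(directory: str, chunk_size: int = 1 << 20) -> str:
    """Return a SHA-512 hex digest over every file under `directory`.

    Assumptions: files are visited in sorted path order so the result is
    deterministic, and each file's relative path is hashed along with its
    bytes so that renaming a file changes the fingerprint.
    """
    root = Path(directory)
    digest = hashlib.sha512()
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between the path and the content
        with path.open("rb") as handle:
            # Read in chunks so large model files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Two directories with identical contents yield the same digest, so comparing a local digest with a remote one is enough to decide whether a download can be skipped.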

Development

Setting Up Development Environment

  1. Clone the repository
  2. Install development dependencies:
    pip install -e ".[dev]"
    
  3. Install pre-commit hooks:
    pre-commit install
    

Running Tests

pytest tests/

For coverage report:

pytest --cov=mlflow_ibmcos tests/

Contact

For issues, questions, or contributions, please contact:
