
Python MLOps quality control tooling for your production ML workflows


Universal Artifact Registration System for Machine Learning

OpsML Documentation

What is it?

OpsML provides tooling that enables data science and engineering teams to better govern and manage their machine learning projects and artifacts by providing a standardized and universal registration system and repeatable patterns for tracking, versioning and storing ML artifacts.

Features:

  • Simple Design: Standardized design that can easily be incorporated into existing projects.

  • Cards: Track, version and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.

  • Type Checking: Strong typing and type checking for data and model artifacts.

  • Support: Robust support for a variety of ML and data libraries.

  • Automation: Automated processes, including ONNX model conversion, metadata creation, and production packaging.

Incorporate into Existing Workflows

Add quality control to your ML projects with little effort! With opsml, data and models are added to interfaces and cards, which are then registered via card registries.

Given its simple and modular design, opsml can be easily incorporated into existing workflows.
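To make the card-and-registry pattern more concrete, here is a toy, in-memory sketch of the idea: a card wraps an interface with identifying metadata, and registering it assigns a unique id and a version. `ToyCard`, `ToyRegistry`, `latest`, and the integer versioning scheme are all hypothetical and are not opsml's API; opsml's real registries are SQL-based and its versioning rules may differ.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyCard:
    """Hypothetical card: identifying metadata wrapped around an interface."""
    name: str
    repository: str
    interface: Any
    uid: str = ""
    version: int = 0


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry keyed by repository and name."""
    cards: Dict[str, List[ToyCard]] = field(default_factory=dict)

    def register_card(self, card: ToyCard) -> ToyCard:
        history = self.cards.setdefault(f"{card.repository}/{card.name}", [])
        card.uid = uuid.uuid4().hex        # unique id for each registration
        card.version = len(history) + 1    # toy scheme: 1, 2, 3, ...
        history.append(card)
        return card

    def latest(self, repository: str, name: str) -> ToyCard:
        """Return the most recently registered card for repository/name."""
        return self.cards[f"{repository}/{name}"][-1]
```

Registering the same name twice yields versions 1 and 2 with distinct uids, and `latest` returns the second card.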


Installation:

Poetry

poetry add opsml

Pip

pip install opsml

Set up your local environment:

By default, opsml will log artifacts and experiments locally. To change this behavior and log to a remote server, you'll need to set the following environment variable:

export OPSML_TRACKING_URI=${YOUR_TRACKING_URI}
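A minimal sketch of the local-versus-remote decision described above, assuming that a remote opsml server is addressed by an HTTP(S) URL. `tracking_mode` is a hypothetical helper that only inspects the environment; it does not call opsml.

```python
import os
from typing import Mapping, Optional


def tracking_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "remote" when OPSML_TRACKING_URI looks like a server URL, else "local".

    Hypothetical helper: assumes remote opsml servers are reached over HTTP(S).
    """
    env = os.environ if env is None else env
    uri = env.get("OPSML_TRACKING_URI", "").strip()
    if uri.startswith(("http://", "https://")):
        return "remote"
    # Unset or empty means opsml's default (local logging);
    # this sketch also treats non-HTTP values as local.
    return "local"
```

You can also set the variable from Python, for example `os.environ["OPSML_TRACKING_URI"] = "https://opsml.example.com"` (a placeholder URL). Doing so before importing opsml is the safer choice, on the assumption that opsml reads its settings when the package loads.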

Quickstart

If running the example below locally without a server, make sure to install the server extra:

poetry add "opsml[server]"

# imports
from sklearn.linear_model import LinearRegression
from opsml import (
    CardInfo,
    CardRegistries,
    DataCard,
    DataSplit,
    ModelCard,
    PandasData,
    SklearnModel,
)
from opsml.helpers.data import create_fake_data


info = CardInfo(name="linear-regression", repository="opsml", user_email="user@email.com")
registries = CardRegistries()


#--------- Create DataCard ---------#

# create fake data
X, y = create_fake_data(n_samples=1000, task_type="regression")
X["target"] = y

# Create data interface
data_interface = PandasData(
    data=X,
    data_splits=[
        DataSplit(label="train", column_name="col_1", column_value=0.5, inequality=">="),
        DataSplit(label="test", column_name="col_1", column_value=0.5, inequality="<"),
    ],
    dependent_vars=["target"],
)

# Create and register datacard
datacard = DataCard(interface=data_interface, info=info)
registries.data.register_card(card=datacard)

#--------- Create ModelCard ---------#

# split data
data = datacard.split_data()

# fit model
reg = LinearRegression()
reg.fit(data["train"].X.to_numpy(), data["train"].y.to_numpy())

# create model interface
interface = SklearnModel(
    model=reg,
    sample_data=data["train"].X.to_numpy(),
    task_type="regression",  # optional
)

# create modelcard
modelcard = ModelCard(
    interface=interface,
    info=info,
    to_onnx=True,  # convert the model to ONNX
    datacard_uid=datacard.uid,  # modelcards must be associated with a datacard
)
registries.model.register_card(card=modelcard)
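The two `DataSplit` entries above partition rows by comparing `col_1` against 0.5. The sketch below imitates that behavior on plain Python dicts so the semantics are easy to check; `column_split` and its operator table are hypothetical and are not opsml's implementation.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical mapping from DataSplit-style `inequality` strings to comparisons.
COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def column_split(
    rows: List[Dict[str, Any]],
    column_name: str,
    column_value: Any,
    inequality: str = "==",
) -> List[Dict[str, Any]]:
    """Return the rows for which `row[column_name] <inequality> column_value` holds."""
    compare = COMPARISONS[inequality]
    return [row for row in rows if compare(row[column_name], column_value)]


rows = [
    {"col_1": 0.2, "target": 1.0},
    {"col_1": 0.5, "target": 2.0},
    {"col_1": 0.9, "target": 3.0},
]
train_rows = column_split(rows, "col_1", 0.5, inequality=">=")
test_rows = column_split(rows, "col_1", 0.5, inequality="<")
```

Because `>=` and `<` are complements, every row with a numeric `col_1` lands in exactly one of the two splits; a NaN value would fall into neither, since comparisons with NaN are always false.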

Usage

Now that opsml is installed, you're ready to start using it!

For more information on how to use opsml, see the official Documentation Website.

Advanced Installation Scenarios

Opsml is designed to work with a variety of third-party integrations, depending on your use case.

Types of extras that can be installed:

  • Postgres: Installs the psycopg2 dependency for using Postgres with Opsml

    poetry add "opsml[postgres]"

  • Server: Installs the packages needed to set up a FastAPI-based Opsml server

    poetry add "opsml[server]"

  • GCP with MySQL: Installs MySQL and gcsfs dependencies for use with Opsml

    poetry add "opsml[gcs,mysql]"

  • GCP with MySQL (Cloud SQL): Installs MySQL and Cloud SQL GCP dependencies for use with Opsml

    poetry add "opsml[gcp_mysql]"

  • GCP with Postgres: Installs Postgres and gcsfs dependencies for use with Opsml

    poetry add "opsml[gcs,postgres]"

  • GCP with Postgres (Cloud SQL): Installs Postgres and Cloud SQL GCP dependencies for use with Opsml

    poetry add "opsml[gcp_postgres]"

  • AWS with Postgres: Installs Postgres and s3fs dependencies for use with Opsml

    poetry add "opsml[s3,postgres]"

  • AWS with MySQL: Installs MySQL and s3fs dependencies for use with Opsml

    poetry add "opsml[s3,mysql]"
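If you script environment setup, the extras above can be assembled programmatically. This is a hypothetical helper (`opsml_install_command` is not part of opsml) built on the extra names in the list: a database extra on its own, gcs or s3 storage paired with mysql or postgres, the GCP-only Cloud SQL extras, and the optional server extra.

```python
from typing import List, Optional

# Extra names copied from the list above; the helper itself is hypothetical.
STORAGE_EXTRAS = {"gcp": "gcs", "aws": "s3"}
CLOUD_SQL_EXTRAS = {"mysql": "gcp_mysql", "postgres": "gcp_postgres"}


def opsml_install_command(
    database: str,
    cloud: Optional[str] = None,
    cloud_sql: bool = False,
    server: bool = False,
) -> str:
    """Build a `poetry add` command for the given backend combination."""
    if database not in ("mysql", "postgres"):
        raise ValueError(f"unsupported database: {database!r}")
    extras: List[str]
    if cloud_sql:
        # Cloud SQL extras bundle the database driver with GCP dependencies.
        if cloud != "gcp":
            raise ValueError("Cloud SQL extras are only listed for GCP")
        extras = [CLOUD_SQL_EXTRAS[database]]
    elif cloud is None:
        extras = [database]
    elif cloud in STORAGE_EXTRAS:
        # Object-storage extra (gcsfs or s3fs) plus the database extra.
        extras = [STORAGE_EXTRAS[cloud], database]
    else:
        raise ValueError(f"unsupported cloud: {cloud!r}")
    if server:
        extras.append("server")
    joined = ",".join(extras)
    return f'poetry add "opsml[{joined}]"'
```

For example, `opsml_install_command("mysql", cloud="aws")` produces the same command as the "AWS with MySQL" entry.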
    

Environment Variables

The following environment variables are used to configure opsml. When using opsml as a client (i.e., not running a server), the only variable that must be set is OPSML_TRACKING_URI.

  • APP_ENV: The environment to use. Supports development, staging, and production.
  • GOOGLE_ACCOUNT_JSON_BASE64: The base64-encoded string of the GCP service account to use.
  • OPSML_MAX_OVERFLOW: The SQL "max_overflow" size. Defaults to 5.
  • OPSML_POOL_SIZE: The SQL connection pool size. Defaults to 10.
  • OPSML_STORAGE_URI: The storage location to use. Supports a local file system, AWS, and GCS. Example: gs://some-bucket
  • OPSML_TRACKING_URI: Used when logging artifacts to an opsml server (a.k.a. the server that "tracks" artifacts).
  • OPSML_USERNAME: An optional server username. If the server is set up with login enabled, all clients must use HTTP basic auth with this username.
  • OPSML_PASSWORD: An optional server password. If the server is set up with login enabled, all clients must use HTTP basic auth with this password.
  • OPSML_RUN_ID: If set, the run will be automatically loaded when creating new cards.
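A minimal sketch of how a client script might read the variables above with their documented defaults, build a standard HTTP basic auth header (RFC 7617 encoding) for a login-enabled server, and produce a value for GOOGLE_ACCOUNT_JSON_BASE64 from a service-account key file. `OpsmlEnv`, `basic_auth_header`, and `encode_service_account` are hypothetical names; opsml loads its own settings internally, so this is for illustration and scripting only.

```python
import base64
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpsmlEnv:
    """Hypothetical client-side view of the opsml environment variables."""
    tracking_uri: Optional[str]
    storage_uri: Optional[str]
    username: Optional[str]
    password: Optional[str]
    pool_size: int = 10     # documented default for OPSML_POOL_SIZE
    max_overflow: int = 5   # documented default for OPSML_MAX_OVERFLOW

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "OpsmlEnv":
        env = os.environ if env is None else env
        return cls(
            tracking_uri=env.get("OPSML_TRACKING_URI"),
            storage_uri=env.get("OPSML_STORAGE_URI"),
            username=env.get("OPSML_USERNAME"),
            password=env.get("OPSML_PASSWORD"),
            pool_size=int(env.get("OPSML_POOL_SIZE", 10)),
            max_overflow=int(env.get("OPSML_MAX_OVERFLOW", 5)),
        )

    def basic_auth_header(self) -> Optional[str]:
        """Return an Authorization header value, or None if credentials are unset."""
        if not (self.username and self.password):
            return None
        raw = f"{self.username}:{self.password}".encode("utf-8")
        return "Basic " + base64.b64encode(raw).decode("ascii")


def encode_service_account(path: str) -> str:
    """Base64-encode a GCP service-account key file (e.g. for GOOGLE_ACCOUNT_JSON_BASE64)."""
    with open(path, "rb") as fh:
        return base64.b64encode(fh.read()).decode("ascii")
```

Keeping the reader as a frozen dataclass means the values are captured once and cannot drift if the environment changes mid-script.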

Supported Libraries

Opsml is designed to work with a variety of ML and data libraries. The following libraries are currently supported:

Data Libraries

  • Pandas: PandasData
  • Polars: PolarsData
  • Torch: TorchData
  • Arrow: ArrowData
  • Numpy: NumpyData
  • Sql: SqlData
  • Text: TextDataset
  • Image: ImageDataset

Model Libraries

  • Sklearn: SklearnModel
  • LightGBM: LightGBMModel
  • XGBoost: XGBoostModel
  • CatBoost: CatBoostModel
  • Torch: TorchModel
  • Torch Lightning: LightningModel
  • TensorFlow: TensorFlowModel
  • HuggingFace: HuggingFaceModel
  • Vowpal Wabbit: VowpalWabbitModel
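To pick an interface programmatically, one option is to look up the top-level package of the object's type. The mapping below is transcribed from the two lists above; `suggest_interface` and the package-name keys (for example `transformers` for HuggingFace and `lightning` or `pytorch_lightning` for Torch Lightning) are assumptions of this sketch, not part of opsml. SqlData, TextDataset, and ImageDataset are not tied to a single package and are left out.

```python
from typing import Any, Dict

# Transcribed from the lists above; the package-name keys are assumptions.
DATA_INTERFACES: Dict[str, str] = {
    "pandas": "PandasData",
    "polars": "PolarsData",
    "torch": "TorchData",
    "pyarrow": "ArrowData",
    "numpy": "NumpyData",
}

MODEL_INTERFACES: Dict[str, str] = {
    "sklearn": "SklearnModel",
    "lightgbm": "LightGBMModel",
    "xgboost": "XGBoostModel",
    "catboost": "CatBoostModel",
    "torch": "TorchModel",
    "lightning": "LightningModel",
    "pytorch_lightning": "LightningModel",
    "tensorflow": "TensorFlowModel",
    "transformers": "HuggingFaceModel",
    "vowpalwabbit": "VowpalWabbitModel",
}


def suggest_interface(obj: Any, kind: str) -> str:
    """Suggest an opsml interface name for `obj` from its type's top-level package."""
    tables = {"data": DATA_INTERFACES, "model": MODEL_INTERFACES}
    if kind not in tables:
        raise ValueError("kind must be 'data' or 'model'")
    package = type(obj).__module__.split(".")[0]
    try:
        return tables[kind][package]
    except KeyError:
        raise ValueError(f"no {kind} interface listed for package {package!r}") from None
```

The `kind` argument matters for Torch, which appears in both lists. With scikit-learn installed, `suggest_interface(LinearRegression(), "model")` would return "SklearnModel", because scikit-learn estimator classes live under the `sklearn` package.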

Contributing

If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started!

Thanks go to these phenomenal projects and people for creating a great foundation to build from!

Download files

Source distribution: opsml-2.1.8.tar.gz (2.6 MB)

Built distribution (Python 3): opsml-2.1.8-py3-none-any.whl (2.7 MB)
