
Python MLOps quality-control tooling for your production ML workflows


Universal Artifact Registration System for Machine Learning

OpsML Documentation


What is it?

OpsML provides tooling that helps data science and engineering teams govern and manage their machine learning projects and artifacts. It does this through a standardized, universal registration system and repeatable patterns for tracking, versioning, and storing ML artifacts.

Features:

  • Simple Design: Standardized design that can easily be incorporated into existing projects.

  • Cards: Track, version and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.

  • Type Checking: Strong typing and type checking for data and model artifacts.

  • Support: Robust support for a variety of ML and data libraries.

  • Automation: Automated processes, including ONNX model conversion, metadata creation, and production packaging.

Incorporate into Existing Workflows

Add quality control to your ML projects with little effort! With opsml, data and models are added to interfaces and cards, which are then registered via card registries. Given its simple and modular design, opsml can be easily incorporated into existing workflows.



Installation:

Poetry

poetry add opsml

Pip

pip install opsml

Set up your local environment:

By default, opsml logs artifacts and experiments locally. To log to a remote server instead, set the following environment variable:

export OPSML_TRACKING_URI=${YOUR_TRACKING_URI}

Quickstart

If running the example below locally without a server, make sure to install the server extra:

poetry add "opsml[server]"
# imports
from sklearn.linear_model import LinearRegression
from opsml import (
    CardInfo,
    CardRegistries,
    DataCard,
    DataSplit,
    ModelCard,
    PandasData,
    SklearnModel,
)
from opsml.helpers.data import create_fake_data


info = CardInfo(name="linear-regression", repository="opsml", user_email="user@email.com")
registries = CardRegistries()


#--------- Create DataCard ---------#

# create fake data
X, y = create_fake_data(n_samples=1000, task_type="regression")
X["target"] = y

# Create data interface
data_interface = PandasData(
    data=X,
    data_splits=[
        DataSplit(label="train", column_name="col_1", column_value=0.5, inequality=">="),
        DataSplit(label="test", column_name="col_1", column_value=0.5, inequality="<"),
    ],
    dependent_vars=["target"],
)

# Create and register datacard
datacard = DataCard(interface=data_interface, info=info)
registries.data.register_card(card=datacard)

#--------- Create ModelCard ---------#

# split data
data = datacard.split_data()

# fit model
reg = LinearRegression()
reg.fit(data["train"].X.to_numpy(), data["train"].y.to_numpy())

# create model interface
interface = SklearnModel(
    model=reg,
    sample_data=data["train"].X.to_numpy(),
    task_type="regression",  # optional
)

# create modelcard
modelcard = ModelCard(
    interface=interface,
    info=info,
    to_onnx=True,  # convert the model to ONNX
    datacard_uid=datacard.uid,  # modelcards must be associated with a datacard
)
registries.model.register_card(card=modelcard)
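The two DataSplit definitions above partition rows on col_1: rows with col_1 >= 0.5 land in the train split, the rest in the test split. As a rough illustration of that semantics in plain pandas (independent of opsml, using a tiny made-up frame):

```python
import pandas as pd

# tiny made-up frame standing in for the fake data
df = pd.DataFrame({"col_1": [0.1, 0.4, 0.6, 0.9], "target": [1.0, 2.0, 3.0, 4.0]})

# mirrors DataSplit(label="train", column_name="col_1", column_value=0.5, inequality=">=")
train = df[df["col_1"] >= 0.5]

# mirrors DataSplit(label="test", column_name="col_1", column_value=0.5, inequality="<")
test = df[df["col_1"] < 0.5]

print(len(train), len(test))  # 2 2
```

Because the two inequalities are complementary, every row lands in exactly one split.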


Usage

Now that opsml is installed, you're ready to start using it!

See the official Documentation Website for more information on how to use opsml.

Advanced Installation Scenarios

Opsml is designed to work with a variety of third-party integrations, depending on your use case.

Types of extras that can be installed:

  • Postgres: Installs the psycopg2 postgres dependency for use with Opsml

    poetry add "opsml[postgres]"
    
  • Server: Installs the packages needed to run a FastAPI-based Opsml server

    poetry add "opsml[server]"
    
  • GCP with mysql: Installs mysql and gcsfs to be used with Opsml

    poetry add "opsml[gcs,mysql]"
    
  • GCP with mysql (cloud-sql): Installs mysql and cloud-sql GCP dependencies to be used with Opsml

    poetry add "opsml[gcp_mysql]"
    
  • GCP with postgres: Installs postgres and gcsfs to be used with Opsml

    poetry add "opsml[gcs,postgres]"
    
  • GCP with postgres (cloud-sql): Installs postgres and cloud-sql GCP dependencies to be used with Opsml

    poetry add "opsml[gcp_postgres]"
    
  • AWS with postgres: Installs postgres and s3fs dependencies to be used with Opsml

    poetry add "opsml[s3,postgres]"
    
  • AWS with mysql: Installs mysql and s3fs dependencies to be used with Opsml

    poetry add "opsml[s3,mysql]"
    

Environment Variables

The following environment variables are used to configure opsml. When using opsml as a client (i.e., not running a server), the only variable that must be set is OPSML_TRACKING_URI.

Name Description
APP_ENV The environment to use. Supports development, staging, and production.
GOOGLE_ACCOUNT_JSON_BASE64 The base64-encoded string of the GCP service account to use.
OPSML_MAX_OVERFLOW The SQL "max_overflow" size. Defaults to 5.
OPSML_POOL_SIZE The SQL connection pool size. Defaults to 10.
OPSML_STORAGE_URI The storage location to use. Supports a local file system, AWS, and GCS. Example: gs://some-bucket
OPSML_TRACKING_URI Used when logging artifacts to an opsml server (i.e., the server that "tracks" artifacts).
OPSML_USERNAME An optional server username. If the server is set up with login enabled, all clients must use HTTP basic auth with this username.
OPSML_PASSWORD An optional server password. If the server is set up with login enabled, all clients must use HTTP basic auth with this password.
OPSML_RUN_ID If set, the run will be automatically loaded when creating new cards.
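Putting a few of these together, a server deployment backed by GCS with login enabled might be configured as follows (the bucket, URI, and credentials are hypothetical placeholders; adjust them to your own deployment):

```shell
# hypothetical opsml server environment -- all values are placeholders
export APP_ENV=production
export OPSML_STORAGE_URI=gs://some-bucket
export OPSML_USERNAME=opsml-user
export OPSML_PASSWORD=change-me

# clients pointing at this server would only need the tracking URI
# (plus the same username/password when login is enabled)
export OPSML_TRACKING_URI=http://opsml.example.com
```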

Supported Libraries

Opsml is designed to work with a variety of ML and data libraries. The following libraries are currently supported:

Data Libraries

Name Opsml Implementation
Pandas PandasData
Polars PolarsData
Torch TorchData
Arrow ArrowData
Numpy NumpyData
Sql SqlData
Text TextDataset
Image ImageDataset

Model Libraries

Name Opsml Implementation Example
Sklearn SklearnModel link
LightGBM LightGBMModel link
XGBoost XGBoostModel link
CatBoost CatBoostModel link
Torch TorchModel link
Torch Lightning LightningModel link
TensorFlow TensorFlowModel link
HuggingFace HuggingFaceModel link
Vowpal Wabbit VowpalWabbitModel link

Contributing

If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started :smiley:

Thanks go to these phenomenal projects and people for creating a great foundation to build from!
