Python MLOps quality control tooling for your production ML workflows

Universal Artifact Registration System for Machine Learning

OpsML Documentation

What is it?

OpsML helps data science and engineering teams govern and manage their machine learning projects and artifacts. It provides a standardized, universal registration system and repeatable patterns for tracking, versioning and storing ML artifacts.

Features:

  • Simple Design: Standardized design that can easily be incorporated into existing projects.

  • Cards: Track, version and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.

  • Type Checking: Strong typing and type checking for data and model artifacts.

  • Support: Robust support for a variety of ML and data libraries.

  • Automation: Automated processes including ONNX model conversion, metadata creation and production packaging.
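
To make the card and registry idea concrete, here is a toy sketch of a SQL-backed registry that assigns each registered card a unique id and an incrementing version. `ToyCard` and `ToyRegistry` are hypothetical names invented for this illustration, the integer version is a simplification, and nothing here reflects opsml's actual schema or API.

```python
import sqlite3
import uuid
from dataclasses import dataclass


@dataclass
class ToyCard:
    """Hypothetical stand-in for a card: a named, versioned artifact record."""

    name: str
    repository: str
    uid: str = ""
    version: int = 0


class ToyRegistry:
    """Hypothetical SQL-backed registry; not opsml's schema or API."""

    def __init__(self) -> None:
        # In-memory SQLite database, so the sketch leaves no files behind.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE cards "
            "(uid TEXT PRIMARY KEY, name TEXT, repository TEXT, version INTEGER)"
        )

    def register(self, card: ToyCard) -> ToyCard:
        # Next version = highest existing version for this name/repository + 1.
        row = self.conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM cards "
            "WHERE name = ? AND repository = ?",
            (card.name, card.repository),
        ).fetchone()
        card.version = row[0] + 1
        card.uid = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO cards VALUES (?, ?, ?, ?)",
            (card.uid, card.name, card.repository, card.version),
        )
        return card


registry = ToyRegistry()
first = registry.register(ToyCard(name="linear-regression", repository="opsml"))
second = registry.register(ToyCard(name="linear-regression", repository="opsml"))
print(first.version, second.version)
```

Registering the same name twice yields two records with distinct ids and consecutive versions, which is the core of what "track, version and store" means for a card.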

Incorporate into Existing Workflows

Add quality control to your ML projects with little effort! With opsml, data and models are added to interfaces and cards, which are then registered via card registries.

Given its simple and modular design, opsml can be easily incorporated into existing workflows.


Installation:

Poetry

poetry add opsml

Pip

pip install opsml

Set up your local environment:

By default, opsml logs artifacts and experiments locally. To log to a remote server instead, set the following environment variable:

export OPSML_TRACKING_URI=${YOUR_TRACKING_URI}
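
The same variable can also be set from Python, for example in a notebook. The sketch below is a minimal illustration: `tracking_mode` is a hypothetical helper (not part of opsml) that only mirrors the rule described above, and the URI is a placeholder. Setting the variable before importing opsml is the safer ordering, on the assumption that configuration is read early.

```python
import os


def tracking_mode() -> str:
    """Return "remote" if OPSML_TRACKING_URI is set and non-empty, else "local"."""
    return "remote" if os.environ.get("OPSML_TRACKING_URI") else "local"


# Placeholder address; substitute the URI of your own opsml server.
os.environ["OPSML_TRACKING_URI"] = "http://localhost:8888"
mode = tracking_mode()
print(mode)
```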

Quickstart

If running the example below locally without a server, make sure to install the server extra:

poetry add "opsml[server]"

# imports
from sklearn.linear_model import LinearRegression
from opsml import (
    CardInfo,
    CardRegistries,
    DataCard,
    DataSplit,
    ModelCard,
    PandasData,
    SklearnModel,
)
from opsml.helpers.data import create_fake_data


info = CardInfo(name="linear-regression", repository="opsml", user_email="user@email.com")
registries = CardRegistries()


#--------- Create DataCard ---------#

# create fake data
X, y = create_fake_data(n_samples=1000, task_type="regression")
X["target"] = y

# Create data interface
data_interface = PandasData(
    data=X,
    data_splits=[
        DataSplit(label="train", column_name="col_1", column_value=0.5, inequality=">="),
        DataSplit(label="test", column_name="col_1", column_value=0.5, inequality="<"),
    ],
    dependent_vars=["target"],
)

# Create and register datacard
datacard = DataCard(interface=data_interface, info=info)
registries.data.register_card(card=datacard)

#--------- Create ModelCard ---------#

# split data
data = datacard.split_data()

# fit model
reg = LinearRegression()
reg.fit(data["train"].X.to_numpy(), data["train"].y.to_numpy())

# create model interface
interface = SklearnModel(
    model=reg,
    sample_data=data["train"].X.to_numpy(),
    task_type="regression",  # optional
)

# create modelcard
modelcard = ModelCard(
    interface=interface,
    info=info,
    to_onnx=True,  # convert the model to ONNX
    datacard_uid=datacard.uid,  # modelcards must be associated with a datacard
)
registries.model.register_card(card=modelcard)
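
The two `DataSplit` entries in the quickstart describe row filters on `col_1`. As a rough illustration of that idea, assuming each split simply keeps the rows where the comparison holds, the plain-Python sketch below applies the same pair of conditions to a few hand-written rows. `select_rows` and `COMPARATORS` are hypothetical names for this example and are not how opsml implements splitting.

```python
import operator
from typing import Any, Callable, Dict, List

# Map the inequality strings used in the quickstart to comparison functions.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def select_rows(
    rows: List[Dict[str, float]],
    column_name: str,
    column_value: float,
    inequality: str,
) -> List[Dict[str, float]]:
    """Keep the rows whose `column_name` value satisfies the inequality."""
    compare = COMPARATORS[inequality]
    return [row for row in rows if compare(row[column_name], column_value)]


rows = [
    {"col_1": 0.2, "target": 1.0},
    {"col_1": 0.5, "target": 2.0},
    {"col_1": 0.9, "target": 3.0},
]
train_rows = select_rows(rows, "col_1", 0.5, ">=")
test_rows = select_rows(rows, "col_1", 0.5, "<")
print(len(train_rows), len(test_rows))
```

Because the two conditions (`>=` and `<` on the same threshold) are complementary, every row lands in exactly one split.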

Usage

Now that opsml is installed, you're ready to start using it!

For more information on how to use opsml, see the official Documentation Website.

Advanced Installation Scenarios

Opsml is designed to work with a variety of 3rd-party integrations depending on your use-case.

Types of extras that can be installed:

  • Postgres: Installs the postgres psycopg2 dependency to be used with Opsml

    poetry add "opsml[postgres]"
    
  • Server: Installs the packages needed to set up a FastAPI-based Opsml server

    poetry add "opsml[server]"
    
  • GCP with mysql: Installs mysql and gcsfs to be used with Opsml

    poetry add "opsml[gcs,mysql]"
    
  • GCP with mysql(cloud-sql): Installs mysql and cloud-sql gcp dependencies to be used with Opsml

    poetry add "opsml[gcp_mysql]"
    
  • GCP with postgres: Installs postgres and gcsfs to be used with Opsml

    poetry add "opsml[gcs,postgres]"
    
  • GCP with postgres(cloud-sql): Installs postgres and cloud-sql gcp dependencies to be used with Opsml

    poetry add "opsml[gcp_postgres]"
    
  • AWS with postgres: Installs postgres and s3fs dependencies to be used with Opsml

    poetry add "opsml[s3,postgres]"
    
  • AWS with mysql: Installs mysql and s3fs dependencies to be used with Opsml

    poetry add "opsml[s3,mysql]"
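
After installing extras, it can be useful to confirm that the optional packages are importable in the current environment. The sketch below checks the packages named in the list above using only the standard library. The mapping from extra name to import name is inferred from those descriptions and is an assumption, and `available_extras` is a hypothetical helper, not an opsml function.

```python
from importlib.util import find_spec
from typing import Dict

# Extra name -> import name of one package the list above says it installs.
# This mapping is an assumption based on those descriptions.
EXTRA_MODULES: Dict[str, str] = {
    "postgres": "psycopg2",
    "server": "fastapi",
    "gcs": "gcsfs",
    "s3": "s3fs",
}


def available_extras() -> Dict[str, bool]:
    """Report, per extra, whether its representative package can be found."""
    return {
        extra: find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


status = available_extras()
for extra, installed in status.items():
    print(f"{extra}: {'installed' if installed else 'missing'}")
```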
    

Environment Variables

The following environment variables are used to configure opsml. When using opsml as a client (i.e., not running a server), the only variable that must be set is OPSML_TRACKING_URI.

Name                        Description
APP_ENV                     The environment to use. Supports development, staging, and production.
GOOGLE_ACCOUNT_JSON_BASE64  The base64 string of the GCP service account to use.
OPSML_MAX_OVERFLOW          The SQL "max_overflow" size. Defaults to 5.
OPSML_POOL_SIZE             The SQL connection pool size. Defaults to 10.
OPSML_STORAGE_URI           The location of storage to use. Supports a local file system, AWS, and GCS. Example: gs://some-bucket
OPSML_TRACKING_URI          Used when logging artifacts to an opsml server (i.e., the server that "tracks" artifacts).
OPSML_USERNAME              An optional server username. If the server is set up with login enabled, all clients must use HTTP basic auth with this username.
OPSML_PASSWORD              An optional server password. If the server is set up with login enabled, all clients must use HTTP basic auth with this password.
OPSML_RUN_ID                If set, the run will be automatically loaded when creating new cards.
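
As a rough sketch of how an application might collect these variables in one place, the example below reads a subset of them into a small settings object and applies the defaults stated in the table (pool size 10, max overflow 5). `ClientSettings` and `load_settings` are hypothetical names for this illustration; opsml's own configuration handling may differ. The URI is a placeholder.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical container for a subset of the variables in the table."""

    tracking_uri: Optional[str]
    storage_uri: Optional[str]
    pool_size: int
    max_overflow: int
    username: Optional[str]
    password: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> ClientSettings:
    """Read settings from a mapping, falling back to the documented defaults."""
    return ClientSettings(
        tracking_uri=env.get("OPSML_TRACKING_URI"),
        storage_uri=env.get("OPSML_STORAGE_URI"),
        pool_size=int(env.get("OPSML_POOL_SIZE", "10")),
        max_overflow=int(env.get("OPSML_MAX_OVERFLOW", "5")),
        username=env.get("OPSML_USERNAME"),
        password=env.get("OPSML_PASSWORD"),
    )


# Pass an explicit mapping here so the example does not depend on the shell.
settings = load_settings({"OPSML_TRACKING_URI": "http://localhost:8888"})
print(settings.pool_size, settings.max_overflow)
```

Accepting the environment as a mapping argument keeps the helper easy to test without touching the real process environment.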

Supported Libraries

Opsml is designed to work with a variety of ML and data libraries. The following libraries are currently supported:

Data Libraries

Name    Opsml Implementation
Pandas  PandasData
Polars  PolarsData
Torch   TorchData
Arrow   ArrowData
Numpy   NumpyData
Sql     SqlData
Text    TextDataset
Image   ImageDataset

Model Libraries

Name             Opsml Implementation  Example
Sklearn          SklearnModel          link
LightGBM         LightGBMModel         link
XGBoost          XGBoostModel          link
CatBoost         CatBoostModel         link
Torch            TorchModel            link
Torch Lightning  LightningModel        link
TensorFlow       TensorFlowModel       link
HuggingFace      HuggingFaceModel      link
Vowpal Wabbit    VowpalWabbitModel     link

Contributing

If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started.

Thanks go to these phenomenal projects and people for creating a great foundation to build from!
