Python MLOps quality control tooling for your production ML workflows
Universal Artifact Registration System for Machine Learning
OpsML Documentation
What is it?
OpsML provides tooling that enables data science and engineering teams to better govern and manage their machine learning projects and artifacts. It does so through a standardized, universal registration system and repeatable patterns for tracking, versioning, and storing ML artifacts.
Features:
- Simple Design: Standardized design that can easily be incorporated into existing projects.
- Cards: Track, version, and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.
- Type Checking: Strongly typed interfaces with type checking for data and model artifacts.
- Support: Robust support for a variety of ML and data libraries.
- Automation: Automated processes including ONNX model conversion, metadata creation, and production packaging.
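To make the "trading cards" idea concrete, here is a minimal, hypothetical sketch of the card-and-registry pattern in plain Python. The names `Card` and `Registry` and the naive versioning scheme are illustrative only; they are not opsml's actual classes or semver logic.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """Illustrative card: an artifact plus identifying metadata."""
    name: str
    repository: str
    artifact: object
    version: str = ""


class Registry:
    """Illustrative registry: stores cards by name and assigns versions."""

    def __init__(self):
        self._cards: dict[str, list[Card]] = {}

    def register_card(self, card: Card) -> Card:
        history = self._cards.setdefault(card.name, [])
        card.version = f"{len(history) + 1}.0.0"  # naive major-version bump
        history.append(card)
        return card

    def load_card(self, name: str) -> Card:
        return self._cards[name][-1]  # latest registered version


registry = Registry()
registry.register_card(Card(name="linear-regression", repository="opsml", artifact="model-v1"))
card = registry.register_card(Card(name="linear-regression", repository="opsml", artifact="model-v2"))
print(card.version)  # → 2.0.0
```

In opsml itself, the registry is SQL-backed and versioning follows semver rules, but the registration/load flow follows this same shape.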
Incorporate into Existing Workflows
Add quality control to your ML projects with little effort! With opsml, data and models are added to interfaces and cards, which are then registered via card registries. Given its simple and modular design, opsml can be easily incorporated into existing workflows.
Installation:
Poetry
poetry add opsml
Pip
pip install opsml
Setup your local environment:
By default, opsml will log artifacts and experiments locally. To change this behavior and log to a remote server, set the following environment variable:
export OPSML_TRACKING_URI=${YOUR_TRACKING_URI}
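As a rough illustration of what this variable controls (this is a sketch, not opsml's internal code), a client can resolve its tracking target from the environment and fall back to local storage when the variable is unset. The server URL and local path below are placeholders:

```python
import os


def resolve_tracking_uri(default: str = "./opsml_artifacts") -> str:
    """Return the remote tracking URI if set, else a local default."""
    return os.environ.get("OPSML_TRACKING_URI", default)


# Hypothetical remote server URL for demonstration:
os.environ["OPSML_TRACKING_URI"] = "http://opsml.my-company.com"
print(resolve_tracking_uri())  # → http://opsml.my-company.com
```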
Quickstart
If running the example below locally without a server, make sure to install the server extra:
poetry add "opsml[server]"
# imports
from sklearn.linear_model import LinearRegression

from opsml import (
    CardInfo,
    CardRegistries,
    DataCard,
    DataSplit,
    ModelCard,
    PandasData,
    SklearnModel,
)
from opsml.helpers.data import create_fake_data

info = CardInfo(name="linear-regression", repository="opsml", user_email="user@email.com")
registries = CardRegistries()

# --------- Create DataCard ---------#

# create fake data
X, y = create_fake_data(n_samples=1000, task_type="regression")
X["target"] = y

# create data interface
data_interface = PandasData(
    data=X,
    data_splits=[
        DataSplit(label="train", column_name="col_1", column_value=0.5, inequality=">="),
        DataSplit(label="test", column_name="col_1", column_value=0.5, inequality="<"),
    ],
    dependent_vars=["target"],
)

# create and register datacard
datacard = DataCard(interface=data_interface, info=info)
registries.data.register_card(card=datacard)

# --------- Create ModelCard ---------#

# split data
data = datacard.split_data()

# fit model
reg = LinearRegression()
reg.fit(data["train"].X.to_numpy(), data["train"].y.to_numpy())

# create model interface
interface = SklearnModel(
    model=reg,
    sample_data=data["train"].X.to_numpy(),
    task_type="regression",  # optional
)

# create modelcard
modelcard = ModelCard(
    interface=interface,
    info=info,
    to_onnx=True,  # convert the model to onnx
    datacard_uid=datacard.uid,  # modelcards must be associated with a datacard
)
registries.model.register_card(card=modelcard)
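The two DataSplit entries above partition rows of the dataframe on col_1 against the threshold 0.5. Conceptually, independent of opsml, the split logic amounts to this (plain Python, with toy rows):

```python
# Rows with col_1 >= 0.5 go to "train"; rows with col_1 < 0.5 go to "test".
rows = [{"col_1": 0.1}, {"col_1": 0.5}, {"col_1": 0.9}]

splits = {
    "train": [r for r in rows if r["col_1"] >= 0.5],
    "test": [r for r in rows if r["col_1"] < 0.5],
}

print(len(splits["train"]), len(splits["test"]))  # → 2 1
```

Because the split labels and rules are stored on the DataCard, `datacard.split_data()` reproduces the same train/test partition for anyone who later loads the card.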
Usage
Now that opsml is installed, you're ready to start using it! Head to the official Documentation Website for more information on how to use opsml.
Advanced Installation Scenarios
Opsml is designed to work with a variety of 3rd-party integrations depending on your use case.
Types of extras that can be installed:
- Postgres: Installs the psycopg2 dependency to be used with Opsml: poetry add "opsml[postgres]"
- Server: Installs the necessary packages for setting up a FastAPI-based Opsml server: poetry add "opsml[server]"
- GCP with mysql: Installs mysql and gcsfs to be used with Opsml: poetry add "opsml[gcs,mysql]"
- GCP with mysql (cloud-sql): Installs mysql and cloud-sql GCP dependencies to be used with Opsml: poetry add "opsml[gcp_mysql]"
- GCP with postgres: Installs postgres and gcsfs to be used with Opsml: poetry add "opsml[gcs,postgres]"
- GCP with postgres (cloud-sql): Installs postgres and cloud-sql GCP dependencies to be used with Opsml: poetry add "opsml[gcp_postgres]"
- AWS with postgres: Installs postgres and s3fs dependencies to be used with Opsml: poetry add "opsml[s3,postgres]"
- AWS with mysql: Installs mysql and s3fs dependencies to be used with Opsml: poetry add "opsml[s3,mysql]"
Environment Variables
The following environment variables are used to configure opsml. When using opsml as a client (i.e., not running a server), the only variable that must be set is OPSML_TRACKING_URI.
Name | Description
---|---
APP_ENV | The environment to use. Supports development, staging, and production.
GOOGLE_ACCOUNT_JSON_BASE64 | The base64 string of the GCP service account to use.
OPSML_MAX_OVERFLOW | The SQL "max_overflow" size. Defaults to 5.
OPSML_POOL_SIZE | The SQL connection pool size. Defaults to 10.
OPSML_STORAGE_URI | The storage location to use. Supports a local file system, AWS, and GCS. Example: gs://some-bucket
OPSML_TRACKING_URI | Used when logging artifacts to an opsml server (i.e., the server that "tracks" artifacts).
OPSML_USERNAME | An optional server username. If the server is set up with login enabled, all clients must use HTTP basic auth with this username.
OPSML_PASSWORD | An optional server password. If the server is set up with login enabled, all clients must use HTTP basic auth with this password.
OPSML_RUN_ID | If set, the run will be automatically loaded when creating new cards.
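As an example of populating GOOGLE_ACCOUNT_JSON_BASE64, the service-account JSON can be base64-encoded with the standard library. The payload below is a placeholder, not a real credential:

```python
import base64
import json
import os

# Placeholder service-account payload; a real one is downloaded from GCP.
service_account = {"type": "service_account", "project_id": "my-project"}

encoded = base64.b64encode(json.dumps(service_account).encode("utf-8")).decode("ascii")

# Equivalent to: export GOOGLE_ACCOUNT_JSON_BASE64=<encoded>
os.environ["GOOGLE_ACCOUNT_JSON_BASE64"] = encoded

# Round-trip check: decoding recovers the original JSON.
decoded = json.loads(base64.b64decode(encoded))
print(decoded["project_id"])  # → my-project
```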
Supported Libraries
Opsml is designed to work with a variety of ML and data libraries. The following libraries are currently supported:
Data Libraries
Name | Opsml Implementation |
---|---|
Pandas | PandasData |
Polars | PolarsData |
Torch | TorchData |
Arrow | ArrowData |
Numpy | NumpyData |
Sql | SqlData |
Text | TextDataset |
Image | ImageDataset |
Model Libraries
Name | Opsml Implementation | Example
---|---|---
Sklearn | SklearnModel | link
LightGBM | LightGBMModel | link
XGBoost | XGBoostModel | link
CatBoost | CatBoostModel | link
Torch | TorchModel | link
Torch Lightning | LightningModel | link
TensorFlow | TensorFlowModel | link
HuggingFace | HuggingFaceModel | link
Vowpal Wabbit | VowpalWabbitModel | link
Contributing
If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started :smiley:
Thanks go to these phenomenal projects and people for creating a great foundation to build from!