A Python package for registering machine learning models directly to the Snowflake Model Registry, leveraging Snowflake ML capabilities.

These details have been verified by PyPI

Maintainers

aavesh_verma aditya1_singh2 maheshgadipea RakeshGadiparthi

Project description

Fosforml

Overview

The fosforml package is designed to facilitate the registration, management, and deployment of machine learning models with a focus on integration with Snowflake. It provides tools for managing datasets, model metadata, and the lifecycle of models within a Snowflake environment.

Features

Model Registration: Register models to the Snowflake Model Registry with detailed metadata, including descriptions, types, and dependencies.
Dataset Management: Handle datasets within Snowflake, including creation, versioning, and deletion of dataset objects.
Metadata Management: Update the model registry with descriptions and tags for better organization and retrieval.
Snowflake Session Management: Manage Snowflake sessions for executing operations within the Snowflake environment.

Installation

To install the fosforml package, ensure you have Python installed on your system and run the following command:

pip install fosforml

Usage

Register a model with the Snowflake Model Registry using the register_model function. The function supports both Snowflake and Pandas dataframes, catering to different data handling preferences.

Requirements

Snowflake DataFrame: If your data lives in Snowflake, provide a Snowflake DataFrame (snowflake.snowpark.dataframe.DataFrame) that includes the model's feature, label, and output (prediction) columns.
- snowflake_df: Training Snowflake DataFrame with the required feature, label, and prediction columns.
Pandas DataFrame: For local or in-memory data processing, provide the following as Pandas DataFrames (pandas.DataFrame):
- x_train: Training data with feature columns.
- y_train: Training data labels.
- x_test: Test data with feature columns.
- y_test: Test data labels.
- y_pred: Predicted labels for the test data.
- y_prob: Predicted class probabilities for the test data (classification problems only).
NumPy arrays are not accepted as input datasets when registering a model.
The following arguments apply to both input types:
- dataset_name: Name of the dataset on which the model is trained.
- dataset_source: Name of the source from which the dataset is pulled or created.
- source: Name of the environment in which the model is being developed (e.g., Notebook/Experiment).

Supported Model Flavors

Currently, the framework supports the following model flavors:

Snowflake Models (snowflake): Models that are directly integrated with Snowflake, leveraging Snowflake's data processing capabilities.
Scikit-Learn Models (sklearn): Models built using the Scikit-Learn library, a widely used library for machine learning in Python.
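
The data requirements above differ by flavor, so a small pre-flight check can catch a missing argument before register_model is called. The helper below is a hypothetical sketch and is not part of fosforml; the name missing_data_args and the flavor-to-argument mapping are assumptions derived from the Requirements section, including the assumption that y_prob is only needed for classification.

```python
# Hypothetical pre-flight check; NOT part of the fosforml API.
# The mapping mirrors the Requirements section of this README.
REQUIRED_DATA_ARGS = {
    "snowflake": {"snowflake_df"},
    "sklearn": {"x_train", "y_train", "x_test", "y_test", "y_pred"},
}


def missing_data_args(flavour, model_type, provided):
    """Return the sorted names of required data arguments that are absent or None."""
    if flavour not in REQUIRED_DATA_ARGS:
        raise ValueError(f"unsupported flavour: {flavour!r}")
    required = set(REQUIRED_DATA_ARGS[flavour])
    # Assumption: y_prob is only required for classification problems.
    if flavour == "sklearn" and model_type == "classification":
        required.add("y_prob")
    return sorted(name for name in required if provided.get(name) is None)


# Placeholder objects stand in for real DataFrames in this illustration.
kwargs = {"x_train": object(), "y_train": object()}
print(missing_data_args("sklearn", "classification", kwargs))
```

An empty list means every data argument expected for that flavor is present; the check says nothing about whether the objects are of the right DataFrame type.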

Registering a Model

To register a model with the fosforml package, you need to provide the model object, session, and other relevant details such as the model name, description, and type.

For Snowflake Models:

from fosforml import register_model

register_model(
  model_obj=pipeline,
  session=session,
  name="MyModel",
  snowflake_df=pred_df,
  dataset_name="HR_CHURN",
  dataset_source="Dataset",
  source="Notebook",
  description="This is a Snowflake model",
  flavour="snowflake",
  model_type="classification",
  conda_dependencies=["scikit-learn==1.3.2"]
)

For Scikit-Learn Models:

from fosforml import register_model

register_model(
  model_obj=model,
  session=session,
  x_train=x_train,
  y_train=y_train,
  x_test=x_test,
  y_test=y_test,
  y_pred=y_pred,
  y_prob=y_prob,
  source="Notebook",
  dataset_name="HR_CHURN",
  dataset_source="InMemory",
  name="MyModel",
  description="This is a sklearn model",
  flavour="sklearn",
  model_type="classification",
  conda_dependencies=["scikit-learn==1.3.2"]
)

Snowflake Session Management

The SnowflakeSession class manages connections to Snowflake so that operations can run within the Snowflake environment. The snowflakesession module exposes two helpers:

get_session: Returns the Snowflake session object.
get_connection_params: Returns the Snowflake connection parameters.

from fosforml.model_manager.snowflakesession import get_session, get_connection_params

session = get_session()
connection_params = get_connection_params()

Retrieving Model History

The ModelRegistry class lets you inspect the history of the machine learning models stored in your environment. With it, you can retrieve a list of all models and their respective versions, which is useful for tracking how a model evolves and for managing its versions.

Initializing ModelRegistry

To begin, you need to initialize the ModelRegistry class with an active session and connection parameters. These parameters are essential for establishing a connection to your data storage environment, where your models and their metadata are stored.

from fosforml.model_manager import ModelRegistry

registry = ModelRegistry(
    session=session,
    connection_params=connection_params
)

Listing All Models

To obtain a list of all models stored in your environment, use the list_models method. This method returns a list of model names, providing a quick overview of the models you have.

model_list = registry.list_models()
print("Models:", model_list)

Listing Model Versions

For more detail on a specific model's evolution, use the list_model_versions method. Given a model's name, it returns a list of all versions associated with that model, which makes it easy to track updates and iterations.

versions_list = registry.list_model_versions(model_name='YourModelName')
print("Versions_list:",versions_list)

Managing Datasets with DatasetManager

The DatasetManager class is designed to facilitate the management of datasets associated with machine learning models in Snowflake. It allows for the creation, uploading, listing, deletion, and retrieval of datasets in a structured manner.

Initializing DatasetManager

To use DatasetManager, you need to initialize it with the model name, version, session, and connection parameters. The session and connection parameters ensure that DatasetManager can interact with the Snowflake environment where your datasets and models are stored.

from fosforml.model_manager import DatasetManager

dataset_manager = DatasetManager(
    model_name="YourModelName",
    version_name="v1",
    session=session,
    connection_params=connection_params
)

Upload datasets to a specific model version

To upload datasets to a specific model version, use the following code:

dataset_manager.upload_datasets(
    session=session,
    datasets={
        "x_train": snowflake_train_dataframe_,
        "x_test": snowflake_test_dataframe_,
    },
    # ...
)

Listing Datasets

To list all datasets associated with a specific model and version, use the list_datasets method. This method returns a list of dataset names that have been uploaded and registered under the specified model and version.

datasets = dataset_manager.list_datasets()
print("Available datasets:", datasets)
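
Because list_datasets is described as returning the names of the uploaded datasets, it can be used to confirm that an upload is complete. The helper below is a hypothetical sketch, not part of fosforml; missing_datasets and _FakeDatasetManager are illustrative names, and the fake manager only stands in for a real DatasetManager so the example runs locally.

```python
def missing_datasets(dataset_manager, expected):
    """Return the expected dataset names that list_datasets does not report."""
    available = set(dataset_manager.list_datasets())
    return [name for name in expected if name not in available]


class _FakeDatasetManager:
    """Stand-in for DatasetManager, used only to make this sketch runnable."""

    def __init__(self, names):
        self._names = names

    def list_datasets(self):
        return list(self._names)


manager = _FakeDatasetManager(["x_train", "x_test"])
print(missing_datasets(manager, ["x_train", "y_train", "x_test", "y_test"]))
```

The result preserves the order of the expected names, so it can be reported directly to whoever needs to upload the remaining datasets.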

Reading Datasets

The DatasetManager provides a method to read datasets: read_dataset. This method allows you to retrieve datasets either as Pandas DataFrames or as native Snowflake query results, depending on the to_pandas parameter.

To read as a Pandas DataFrame

To read a dataset as a Pandas DataFrame, set to_pandas=True as shown below:

dataset_df = dataset_manager.read_dataset(dataset_name="x_train", to_pandas=True)
print(dataset_df.head())

To read as a Snowflake DataFrame

To read a dataset as a Snowflake DataFrame, set to_pandas=False as shown below:

dataset_result = dataset_manager.read_dataset(dataset_name="x_train", to_pandas=False)
dataset_result.show()  # show() prints the rows itself and returns None

Delete datasets

To delete datasets associated with a specific model version, use the following code:

dataset_manager.remove_datasets()

Dependencies

pandas
snowflake-ml-python
requests

Ensure these dependencies are installed in your environment to use the fosforml package effectively.

For issues and contributions, please refer to the project's GitHub repository.

Additional Resources

For further assistance and examples on how to register models using fosforml, please refer to the example folder in the project repository. This folder contains Jupyter notebooks that provide step-by-step guidance on model registration and other operations.

Visit www.fosfor.com for more information.

Project details

Release history

Version	Release date
1.1.8 (this version)	Sep 25, 2024
1.1.8a0 (pre-release)	Oct 14, 2024
1.1.7	Sep 2, 2024
1.1.7a0 (pre-release)	Sep 18, 2024
1.1.6	Jul 26, 2024
1.1.5	Jul 23, 2024
1.1.4	Jul 19, 2024
1.1.3	Jul 18, 2024
1.1.1	Jul 17, 2024
1.1.0	Jul 15, 2024
1.0.9	Jul 12, 2024
1.0.8	Jul 12, 2024
1.0.7	Jul 12, 2024
1.0.6	Jul 11, 2024
1.0.5	Jul 11, 2024
1.0.4	Jul 11, 2024
1.0.3	Jul 10, 2024
1.0.2	Jul 8, 2024
1.0.1	Jun 19, 2024
1.0.1b1 (pre-release)	Jun 19, 2024
1.0.1b0 (pre-release)	Jun 14, 2024
1.0.0	May 7, 2024

Download files

Source Distribution

fosforml-1.1.8.tar.gz (39.4 kB)

Uploaded Sep 25, 2024 Source

Built Distribution

fosforml-1.1.8-py3-none-any.whl (42.4 kB)

Uploaded Sep 25, 2024 Python 3

File details

Details for the file fosforml-1.1.8.tar.gz.

File metadata

Download URL: fosforml-1.1.8.tar.gz
Upload date: Sep 25, 2024
Size: 39.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for fosforml-1.1.8.tar.gz
Algorithm	Hash digest
SHA256	`9f300261ef207bd13dd42f00c51caa42540e8b5ebaaa9a60abd88591931772bb`
MD5	`c99cd1cf5ce30efa57e324be7273da79`
BLAKE2b-256	`4cbcf2da08c3d172f36cd058b49d2515f062cfaf4eb4e7d97091b7e178044bed`
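
A downloaded file can be checked against the SHA256 digest published above using only the Python standard library. This is a minimal sketch; the function names sha256_of and matches_published_hash are illustrative, and the commented-out call assumes the source distribution was saved to the current directory under its original file name.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes fosforml-1.1.8.tar.gz is in the current directory):
# matches_published_hash(
#     "fosforml-1.1.8.tar.gz",
#     "9f300261ef207bd13dd42f00c51caa42540e8b5ebaaa9a60abd88591931772bb",
# )
```

The same functions apply to the wheel by passing its file name and the SHA256 digest listed for it.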

File details

Details for the file fosforml-1.1.8-py3-none-any.whl.

File metadata

Download URL: fosforml-1.1.8-py3-none-any.whl
Upload date: Sep 25, 2024
Size: 42.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for fosforml-1.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e203d55c8313ebc78e21deee2c92cc2ff7a313b0138467fdbd42bb3cec13362`
MD5	`1c625df98aff1f5f56f569d8081fffea`
BLAKE2b-256	`942e3613fd0ccdbf3709dec86f87fe7624737a6f08bd1a813c88e65e7352dfde`
