
Python SDK for MLOps

Project description



Katonic Python SDK for Complete ML Model Life Cycle.

The Katonic Python SDK is a comprehensive package for performing machine learning and data science operations.

For a complete list of APIs and examples, see the Python API Reference in Examples.

Minimum Requirements

  • Katonic Platform 3.2 or higher.
  • Python 3.7 or higher.

Install the base package with pip.

pip install katonic

Topics on this page:

  • Connectors
  • Filemanager
  • Feature Store
  • Experiment Operations
  • Registry Operations
  • Pipeline Operations
  • Drift

Connectors

A typical AI model life cycle starts with loading data into your workspace and analyzing it to discover useful insights. The Katonic SDK provides several connectors that load data from sources such as Azure Blob, MySQL, and Postgres and place it wherever you want to work with it.

Install Connectors.

pip install katonic[connectors]

Connector example that gets data from Snowflake.

snowflake-credentials.json

{
"USER": "username",
"PASSWORD": "password",
"ACCOUNT": "id.uae-north.azure",
"DATABASE": "SNOWFLAKE_SAMPLE_DATA",
"TABLE_NAME": "CUSTOMER",
"SCHEMA": "TPCH_SF1",
"WAREHOUSE": "COMPUTE_WH"
}
# Define all your configurations inside a JSON file.
import json

with open('snowflake-credentials.json') as f:
    config = json.load(f)

Initialize the SnowFlakeConnector with the provided credentials and configuration.

from katonic.connectors.python.snowflake import SnowFlakeConnector

df = SnowFlakeConnector(
    user=config["USER"],
    password=config["PASSWORD"],
    account=config["ACCOUNT"],
    database=config["DATABASE"],
    table_name=config["TABLE_NAME"],
    schema=config["SCHEMA"],
    warehouse=config["WAREHOUSE"],
    query="SELECT * FROM TPCH_SF1.CUSTOMER",
    output="local",
    file_name="driver_data",
)
df.get_data()
======== OUTPUT ========
Connection to snowflake stablished Successfully.
File saved to your 'local' file system with name 'snowflake_TPCH_SF1_SNOWFLAKE_SAMPLE_DATA_driver_data_2022_04_20_08_46_38.csv' Successfully.
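
A minimal sketch of how the credentials file could be checked before it is handed to a connector, so that a missing key fails early with a clear message. The helper name `load_connector_config` and the `REQUIRED_KEYS` tuple are illustrative assumptions based on the keys in snowflake-credentials.json above; they are not part of the Katonic SDK.

```python
import json
from pathlib import Path

# Keys taken from the snowflake-credentials.json example above.
REQUIRED_KEYS = (
    "USER", "PASSWORD", "ACCOUNT", "DATABASE",
    "TABLE_NAME", "SCHEMA", "WAREHOUSE",
)


def load_connector_config(path, required=REQUIRED_KEYS):
    """Load a JSON config file and raise if a required key is missing.

    Hypothetical helper for illustration; not a Katonic SDK function.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return config
```

With this in place, `config = load_connector_config("snowflake-credentials.json")` could stand in for the plain `json.load` call. Invalid JSON, such as a trailing comma, still raises `json.JSONDecodeError`.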

Filemanager

Once you have the data, you can use the Katonic Filemanager to get, store, update, and manipulate objects within the file manager through the Katonic SDK.

Install Filemanager.

pip install katonic[filemanager]

Filemanager example that puts or moves an object from the Filemanager's public bucket to the private bucket.

filemanager-credentials.json

{
"ACCESS_KEY": "TV6WFGHTR3TFBIBAO0R",
"SECRET_KEY": "BoW+p+iLAMNS4cbUNsSLVEmscITdTDMLXC8Emfz",
"PRIVATE_BUCKET": "private-storage-6583",
"PUBLIC_BUCKET": "shared-storage"
}
# Define all your configurations inside a JSON file.
import json

with open('filemanager-credentials.json') as f:
    config = json.load(f)

Initialize the Filemanager with the provided credentials and configuration.

from katonic.filemanager.session import Filemanager

fm = Filemanager(
    access_key=config["ACCESS_KEY"],
    secret_key=config["SECRET_KEY"],
)

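client = fm.client()
# The credentials file defines PRIVATE_BUCKET and PUBLIC_BUCKET (there is no
# "BUCKET" key); the private bucket is used as the destination here.
client.fput_object(
    config["PRIVATE_BUCKET"],
    "/home/data/sample-file.txt",
    "/data/sample-file.txt"
)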

Feature Store

Once you have loaded all the data you want to work with, you preprocess it: handle missing values, remove outliers, scale the data, encode the features, and so on. When preprocessing is finished, ingest the data into a feature store.

By uploading the clean data to a feature store, you can share it across the organization so that other teams and data scientists working on the same problem can use it. This is how you achieve feature reusability.

Training models and making predictions from the same feature store data improves consistency between training data and serving data; without it, you risk training-serving skew.
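
To make the preprocessing steps and the skew problem concrete, here is a small standard-library sketch. It does not use the Katonic feature store API, and it leaves out outlier removal. The function names `fit_preprocessor`, `transform`, and `one_hot` and the sample values are hypothetical. The point it illustrates is that imputation and scaling statistics are learned once from training data and then reused unchanged at serving time.

```python
from statistics import median


def fit_preprocessor(values):
    """Learn imputation and scaling statistics from training data only."""
    present = [v for v in values if v is not None]
    return {"median": median(present), "min": min(present), "max": max(present)}


def transform(values, stats):
    """Impute missing values with the training median, then min-max scale."""
    span = (stats["max"] - stats["min"]) or 1.0  # guard against a zero range
    filled = [stats["median"] if v is None else v for v in values]
    return [(v - stats["min"]) / span for v in filled]


def one_hot(values, categories):
    """Encode each value as a 0/1 vector over a fixed category list."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Made-up sample column with one missing value.
train_age = [20, None, 40, 60]
stats = fit_preprocessor(train_age)          # fitted once, on training data
train_scaled = transform(train_age, stats)

# At serving time the same statistics are reused, not recomputed.
serving_scaled = transform([None, 30], stats)
```

Recomputing the median or the min/max from serving data would scale the same raw value differently in training and in production, which is one way training-serving skew arises.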

Install Feature Store.

pip install katonic[fs]

You can find the feature store examples here.

Experiment Operations

The Auto ML component of the Katonic SDK lets you train machine learning models with one or two lines of code.

The SDK also catalogs the metrics for classification and regression. Available metrics are accuracy score, F1 score, precision, recall, log loss, mean squared error, mean absolute error, and root mean squared error.
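
For reference, the metrics listed above can be written out in a few lines of plain Python. This is a sketch of the standard textbook definitions for binary classification and regression, not the SDK's own implementation; the function names are illustrative.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels given P(label = 1)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}
```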

Install Auto ML.

pip install katonic[ml]

Auto ML Examples.

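from katonic.ml.client import set_exp
from katonic.ml.classification import Classifier

# Create a new experiment with the set_exp function from the ML client.
exp_name = "customer_churn_prediction"
set_exp(exp_name)

# X_train, X_test, y_train and y_test are assumed to be your
# already-split training and test data.
clf = Classifier(X_train, X_test, y_train, y_test, exp_name)

# Train two models.
clf.GradientBoostingClassifier()
clf.DecisionTreeClassifier(max_depth=8, criterion="gini")

# Get the registered models and metrics from MLflow.
# Note: exp_id (the ID of the experiment created above) is not defined
# in this snippet and must be obtained before this call.
df_runs = clf.search_runs(exp_id)
print("Number of runs done : ", len(df_runs))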

Registry Operations

Once you have finished training models on your data, the Katonic SDK keeps track of all the models and stores the model metadata and metrics in the Experiment Registry. From there you can choose the best model and send it to the Model Registry.

In the Model Registry you can store the best models according to your performance metrics and tag them as staging or production. Models tagged production can be deployed to production; models tagged staging can be reviewed by the QA team before moving to further stages.
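
The selection and tagging flow described above can be sketched with plain Python data structures. Everything here is hypothetical: the run records, the metric values (made-up numbers), and the helpers `pick_best_run`, `tag_model`, and `deployable` only illustrate the idea and are not the Katonic registry API.

```python
ALLOWED_STAGES = ("staging", "production")


def pick_best_run(runs, metric, higher_is_better=True):
    """Return the run record with the best value for the given metric."""
    chooser = max if higher_is_better else min
    return chooser(runs, key=lambda run: run["metrics"][metric])


def tag_model(registry, model_name, stage):
    """Record a stage tag for a model in a dict acting as the registry."""
    if stage not in ALLOWED_STAGES:
        raise ValueError(f"Unknown stage: {stage!r}")
    registry[model_name] = stage
    return registry


def deployable(registry):
    """Only models tagged 'production' are eligible for deployment."""
    return [name for name, stage in registry.items() if stage == "production"]


# Made-up run records standing in for experiment results.
runs = [
    {"model": "GradientBoostingClassifier", "metrics": {"f1": 0.81, "log_loss": 0.42}},
    {"model": "DecisionTreeClassifier", "metrics": {"f1": 0.77, "log_loss": 0.55}},
]

registry = {}
best = pick_best_run(runs, "f1")
tag_model(registry, best["model"], "staging")     # waits for QA review
tag_model(registry, best["model"], "production")  # promoted after review
```

For metrics where lower is better, such as log loss, pass `higher_is_better=False`.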

Pipeline Operations

No data scientist wants to do the same thing again and again; they would rather reuse previous work. The same applies inside an AI model life cycle.

You can convert all the work done so far into a scalable pipeline with the Pipelines component of the Katonic SDK. To perform the same operations on different data, you only need to change the data source and run the pipeline. Everything else runs automatically in a scalable manner.

Install Pipelines.

pip install katonic[PIPELINE]

How to create and execute a pipeline using the Katonic SDK:

  • Create a pipeline function.

  • Create a pipeline by defining tasks inside the pipeline function.

from katonic.pipeline import kfp, dsl, component


def print_something(data: str):
    print(data)


@dsl.pipeline(
    name='Print Something',
    description='A pipeline that prints some data'
)
def pipeline():
    # Convert the Python function into a pipeline component.
    print_something_op = kfp.components.create_component_from_func(func=print_something)

    # Calling the component inside the pipeline function defines a task.
    data = "Hello World!!"
    print_something_op(data)

create_component_from_func converts the print_something function into a component, which is stored in print_something_op. The data is then passed to print_something_op to print it.

Compiling and running: first define the pipeline experiment name and the pipeline function.

from datetime import datetime
import uuid
EXPERIMENT_NAME = "Print_Something"
pipeline_func = pipeline

The pipeline is compiled from the pipeline function and a YAML filename, which generates the .yaml file.

kfp.compiler.Compiler().compile() compiles your Python DSL code into a single static configuration (in YAML format) that the Kubeflow Pipelines service can process.

pipeline_filename = pipeline_func.__name__ + f'{uuid.uuid1()}.pipeline.yaml'
kfp.compiler.Compiler().compile(pipeline_func, pipeline_filename)

The pipeline file, which contains all the pipeline details, is uploaded and run using kfp.Client().

client = kfp.Client() 
experiment = client.create_experiment(EXPERIMENT_NAME)
run_name = pipeline_func.__name__ + str(datetime.now().strftime("%d-%m-%Y-%H-%M-%S"))
client.upload_pipeline(pipeline_filename)
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename)

Drift

An AI model life cycle does not end with model deployment. You need to monitor the model's performance continuously to detect model deterioration or degradation. The Drift component of the Katonic SDK helps you find drift in your data. It performs statistical analysis on incoming data to check whether it contains outliers or is otherwise abnormal, and reports the result through a visual representation.
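
As one example of the kind of statistical check involved, the sketch below computes the two-sample Kolmogorov-Smirnov statistic between a reference sample and incoming data. This is a generic technique chosen for illustration; the source does not say which tests the Drift component runs, and the function name `ks_statistic` is an assumption.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 for identical samples, 1.0 for fully separated ones.
    """
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The gap can only change at observed values, so check each one.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap
```

A feature could then be flagged as drifted when the statistic exceeds a threshold you choose for your data; the threshold is a tuning decision, not a value given by the SDK.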

Install Drift.

pip install katonic[DRIFT]

You can find the drift examples here.
