mlplatformutils


mlplatformutils package for observability and ML Pipeline Processing


This framework supports Azure Machine Learning training pipelines across computes such as Azure Synapse Spark, virtual machine clusters, Azure Kubernetes clusters, and Azure Databricks. It supports reading/writing data from Azure Data Lake Gen2 (in Parquet and Delta formats), Azure Data Explorer (Kusto), and Azure SQL DB instances. The framework supports both Python and Spark at scale. Spark writes with capabilities such as dynamic partition overwrites and repartitioning are fully supported. While reading from and writing to these sources, the framework integrates a built-in lineage framework that provides column-level lineage across the systems on a scalable graph backed by the Azure Cosmos DB Gremlin graph service. This enables robust upstream dependency tracking and proactive alerting and eventing. All operations are supported over a Service Principal (client ID, client secret) for applications and processing. The package also supports creating and managing computes and pip dependencies for the Azure Machine Learning workspace and the training definitions.

Description


app_insights_logger - Contains telemetrylogger Class with Functions to Manage and Log Telemetry into Azure Application Insights


  • trackEvent
  • trackTrace
  • trackException
  • logEvent
  • gather_event_details

lineagegraph - Contains LineageGraph Class with functions to manage Graph on Azure Cosmos DB enabled with Gremlin


  • add_vertex
  • get_vertices
  • is_vertex
  • update_vertex
  • insert_edges
  • drop_vertex
  • drop_edge
  • query_graph
  • update_lineage_graph
  • connect_lineage_graph
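
To make the idea of column-level lineage and upstream dependency tracking concrete, here is a minimal in-memory sketch. It is not the package's LineageGraph class and does not talk to Cosmos DB or Gremlin; the class name InMemoryLineage, its methods, and the column names are all hypothetical, chosen only to illustrate how edges between columns let you walk back to every upstream source.

```python
from collections import defaultdict


class InMemoryLineage:
    """Toy stand-in for a graph store (hypothetical, not the package API)."""

    def __init__(self):
        # Maps each target column to the set of columns it was derived from.
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        """Record that `target` is derived from `source`."""
        self._upstream[target].add(source)

    def upstream_of(self, column):
        """Return every direct and transitive upstream column."""
        seen = set()
        stack = [column]
        while stack:
            node = stack.pop()
            for parent in self._upstream.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = InMemoryLineage()
# Hypothetical column identifiers in "layer.table.column" form.
graph.add_edge("raw.orders.amount", "curated.sales.revenue")
graph.add_edge("raw.orders.currency", "curated.sales.revenue")
graph.add_edge("curated.sales.revenue", "features.customer.ltv")

print(sorted(graph.upstream_of("features.customer.ltv")))
```

In a graph database the same question would be answered by a traversal query rather than a Python loop; the sketch only shows why storing one edge per column dependency is enough to answer "what feeds this column?".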

platformutils - Contains platform utility functions to check and install dependencies and to check Azure ML Compute

  • is_package_installed
  • install_pip
  • get_environment
  • set_environment
  • assert_amlcompute
  • read_setup_ini
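
As an illustration of what a dependency check of this kind can look like, the sketch below uses only the standard library. It is an assumption about the general technique, not the source of is_package_installed; the function name package_is_importable is hypothetical.

```python
import importlib.util


def package_is_importable(name: str) -> bool:
    """Return True if `name` can be located by the import system.

    Hypothetical helper; the package's own check may work differently.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or an empty module name.
        return False


print(package_is_importable("json"))
print(package_is_importable("surely_not_a_real_package_xyz"))
```

A check like this looks up the module without importing it, which keeps the test cheap even for heavy packages.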

sparkutils - Contains functions to read data from sources such as Azure Data Lake Gen2, Azure Data Explorer (Kusto), and Azure SQL Server, and to write to Azure Data Lake Gen2, while ensuring integrated Lineage Graph Logging.

  • read_from_adls_gen2
  • write_to_adls_gen2
  • read_from_kusto
  • read_from_azsql

sparkcoreutils - Contains functions to read data from sources such as (Azure Data Lake Gen2, Azure Data Explorer (Kusto), Azure Sql Server) and write (Azure Data Lake Gen2) without integrated Lineage Graph Logging.

  • read_from_adls_gen2
  • write_to_adls_gen2
  • read_from_kusto
  • read_from_azsql

pandasutils - Contains functions to read data from Azure Data Lake Gen2 (from Delta Format or Parquet Format) into Pandas Dataframe without Spark while ensuring integrated Lineage Graph Logging.

  • read_from_delta_as_pandas
  • read_parquet_file_from_adlsgen2_as_pandas
  • read_parquet_directory_from_adlsgen2_as_pandas
  • write_pandas_as_parquet_file_to_adlsgen2

pandascoreutils - Contains functions to read data from Azure Data Lake Gen2 (from Delta Format or Parquet Format) into Pandas Dataframe without Spark and without integrated Lineage Graph Logging.

  • read_from_delta_as_pandas
  • read_parquet_file_from_adlsgen2_as_pandas
  • read_parquet_directory_from_adlsgen2_as_pandas
  • write_pandas_as_parquet_file_to_adlsgen2

freshnessutils - Contains functions to add freshness details into an Azure Cosmos DB (NoSQL) document database. These details supply the freshness metrics used to evaluate SLAs and to drive downstream processing. The module captures and reports model and training dataset freshness for both the most recent and historical processing.

  • add_freshness
  • upsert_freshness
  • query_freshness
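
The following sketch shows one way a freshness record could be evaluated against an SLA. The record layout, the field names, and the within_sla helper are assumptions made for illustration; the source does not describe the document schema used by add_freshness or query_freshness.

```python
from datetime import datetime, timedelta, timezone


def within_sla(record, sla, now=None):
    """Return True if the record was refreshed within the SLA window.

    Hypothetical helper operating on a hypothetical record layout.
    """
    now = now or datetime.now(timezone.utc)
    age = now - record["last_refreshed_utc"]
    return age <= sla


# Hypothetical freshness record for a training dataset.
record = {
    "entity": "training_dataset",
    "last_refreshed_utc": datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
}

# Fixed evaluation time so the result is reproducible: the data is 12 hours old.
check_time = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)

print(within_sla(record, timedelta(hours=24), now=check_time))
print(within_sla(record, timedelta(hours=6), now=check_time))
```

Keeping timestamps timezone-aware and in UTC avoids off-by-hours errors when the writer and the evaluator run in different regions.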

Examples


# Check whether a dependency is available in the current environment
from mlplatformutils.core.platformutils import is_package_installed
print(is_package_installed("pandas"))
# Telemetry logger and lineage graph classes
from mlplatformutils.core.app_insights_logger import telemetrylogger
from mlplatformutils.core.lineagegraph import LineageGraph
# Readers and writers with integrated lineage logging
from mlplatformutils.core.sparkutils import write_to_adls_gen2, read_from_adls_gen2
from mlplatformutils.core.pandasutils import write_pandas_as_parquet_file_to_adlsgen2, read_parquet_directory_from_adlsgen2_as_pandas
# Core variants without lineage logging; importing them here rebinds the names imported above
from mlplatformutils.core.sparkcoreutils import write_to_adls_gen2, read_from_adls_gen2
from mlplatformutils.core.pandascoreutils import write_pandas_as_parquet_file_to_adlsgen2, read_parquet_directory_from_adlsgen2_as_pandas
# Freshness tracking helpers
from mlplatformutils.core.freshnessutils import add_freshness, upsert_freshness, query_freshness
# Print the installed package version
import mlplatformutils.core.version as vr
print(vr.__version__)

Notes


When running this lineage package from a Jupyter notebook, the three lines below help overcome the Jupyter error "RuntimeError: Cannot run the event loop while another loop is running":
import asyncio
import nest_asyncio
nest_asyncio.apply()
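
For context, the error arises whenever synchronous code tries to drive a second event loop while one is already running, which is the situation inside a Jupyter kernel. The sketch below reproduces that condition with the standard library only; the function names are hypothetical, and it does not use nest_asyncio or the lineage package itself.

```python
import asyncio


async def inner():
    return 42


def blocking_call():
    """Mimics a library that drives its own event loop synchronously."""
    loop = asyncio.new_event_loop()
    coro = inner()
    try:
        return loop.run_until_complete(coro)
    finally:
        coro.close()  # suppresses the "never awaited" warning on failure
        loop.close()


async def notebook_cell():
    """Stands in for code executed while a loop is already running."""
    try:
        blocking_call()
    except RuntimeError as exc:
        return str(exc)
    return None


# Outside any running loop the blocking call works.
print(blocking_call())

# Inside a running loop the same call fails with the RuntimeError.
message = asyncio.run(notebook_cell())
print(message)
```

nest_asyncio.apply() patches asyncio so that such nested loop runs are permitted, which is why applying it before using the package in a notebook avoids the error.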

Structure


.
|-- LICENSE.txt
|-- README.rst
|-- setup.cfg
|-- setup.py
|-- src
| |-- mlplatformutils
| | |-- __init__.py
| | |-- core
| | |-- |-- __init__.py
| | |-- |-- sparkcoreutils.py
| | |-- |-- sparkutils.py
| | |-- |-- platformutils.py
| | |-- |-- pandascoreutils.py
| | |-- |-- pandasutils.py
| | |-- |-- lineagegraph.py
| | |-- |-- freshnessutils.py
| | |-- |-- app_insights_logger.py
|-- tests
| |-- __init__.py
| |-- core
| |-- |-- __init__.py
| |-- |-- test_sparkcoreutils.py
| |-- |-- test_sparkutils.py
| |-- |-- test_platformutils.py
| |-- |-- test_pandascoreutils.py
| |-- |-- test_pandasutils.py
| |-- |-- test_lineagegraph.py
| |-- |-- test_freshnessutils.py
| |-- |-- test_app_insights_logger.py

Instructions


Install twine - twine is a utility package used for publishing Python packages on PyPI

python -m pip install twine

Build Package - create the source distribution of the package

python setup.py sdist

Upload Package to PyPI

python -m twine upload dist/*

Source Distribution

mlplatformutils-0.9.5.5.tar.gz (13.6 kB)
