GDMO native classes for standardized interaction with data objects within Azure Databricks. Contains TimeSeriesForecasting, APIRequest, Landing, and Delta functions.
gdmo
This custom library gives our engineering team standardized classes that take care of the administrative and repetitive tasks in their everyday interactions with data objects. The classes currently supported (V0.1.0) are TimeSeriesForecast, APIRequest, Landing, and Delta.
Installation
Install this library using pip:
pip install gdmo
Usage
Forecast - TimeSeriesForecast
A standardized way to forecast a dataset. Pass in a DataFrame with a time column, a series column, and a value column, and the class automatically selects a suitable forecasting model and generates the forecast.
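The input is expected in long format: one row per series per time period. The sketch below uses plain Python tuples in place of a Spark DataFrame and illustrates one plausible reading of `set_modelselection_breakpoints(12, 24)`: that the two breakpoints are history lengths (in periods) deciding which kind of model a series gets. That reading, the `<` boundaries, and the tier names `short`, `medium` and `long` are assumptions for illustration, not gdmo's confirmed selection logic.

```python
from collections import Counter
from datetime import date

# Long-format input: (time, series, value), mirroring the
# InvoiceDate / ProductCategory / RevenueUSD columns used in the example.
rows = [
    (date(2023, m, 1), "Hardware", 100.0 + m) for m in range(1, 13)
] + [
    (date(2023, m, 1), "Services", 50.0) for m in range(7, 13)
]


def history_lengths(data):
    """Count how many data points each series has."""
    return Counter(series for _, series, _ in data)


def select_model_tier(n_points, low=12, high=24):
    """Hypothetical breakpoint rule: choose a model tier by history length."""
    if n_points < low:
        return "short"   # little history: e.g. a simple average-based model
    if n_points < high:
        return "medium"  # moderate history: e.g. a trend model
    return "long"        # long history: e.g. a model with seasonality


tiers = {s: select_model_tier(n) for s, n in history_lengths(rows).items()}
print(tiers)  # {'Hardware': 'medium', 'Services': 'short'}
```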
Example usage:
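from gdmo import TimeSeriesForecast

# spark is the SparkSession that Databricks provides in every notebook.
# df, forecast_length and lastdatamonth must be defined before this call:
#   df              - input DataFrame with the date, series and value columns named below
#   forecast_length - how far ahead to forecast
#   lastdatamonth   - the last month that contains actual data
forecaster = (
    TimeSeriesForecast(spark, 'Invoiced Revenue')
    .set_columns('InvoiceDate', 'ProductCategory', 'RevenueUSD')
    .set_forecast_length(forecast_length)
    .set_last_data_point(lastdatamonth)
    .set_input(df)
    .set_growth_cap(0.02)
    .set_use_cap_growth(True)
    .set_modelselection_breakpoints(12, 24)
    .set_track_outcome(False)
    .build_forecast()
)
forecaster.inspect_forecast()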
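`set_growth_cap(0.02)` combined with `set_use_cap_growth(True)` suggests that forecast growth is limited to 2%. Assuming this means each forecast period may exceed the previous value by at most 2% (the per-period interpretation and the `cap_growth` helper are assumptions, not gdmo's documented behaviour), the cap can be sketched as:

```python
def cap_growth(history_last, forecast, cap=0.02):
    """Clamp each forecast value to at most (1 + cap) times the previous value.

    history_last is the last actual data point; the first forecast period
    is capped relative to it. Declines are passed through unchanged.
    """
    capped = []
    previous = history_last
    for value in forecast:
        ceiling = previous * (1 + cap)
        # Only upward moves are limited; values under the ceiling are kept.
        limited = min(value, ceiling)
        capped.append(limited)
        previous = limited  # the next period is capped against the capped value
    return capped


print(cap_growth(100.0, [110.0, 101.0, 95.0]))
```

Capping against the already-capped previous value (rather than the raw forecast) keeps the compounded growth over the whole horizon within the limit.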
API - APIRequest
A class for making standard API requests with the requests library. You supply the endpoint, authentication, and method, and get the data back without writing your own error handling or working out how to build the request correctly.
Example usage:
from gdmo import APIRequest

uri = 'xxxxx'  # API endpoint

request = (
    APIRequest(uri)
    .set_content_type('application/json')
    .set_header('bearer xxxxx')
    .set_method('GET')
    .set_parameters({"Month": "2024-01-01"})
    .make_request()
)
response = request.get_json_response()
display(response)  # display() is built into Databricks notebooks
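For comparison, the sketch below shows roughly what a single call involves without APIRequest: URL-encoding the query parameters, setting the content-type and authorization headers, and turning HTTP failures into readable errors. It uses the standard library's `urllib` instead of `requests` so that it has no dependencies; the helpers `build_request` and `fetch_json` are hypothetical and are not gdmo's implementation.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_request(uri: str, method: str = "GET",
                  content_type: str = "application/json",
                  authorization: Optional[str] = None,
                  parameters: Optional[dict] = None) -> urllib.request.Request:
    """Assemble an HTTP request by hand (hypothetical helper, not part of gdmo)."""
    if parameters:
        # URL-encode the query parameters and append them to the endpoint.
        separator = "&" if "?" in uri else "?"
        uri = f"{uri}{separator}{urllib.parse.urlencode(parameters)}"
    headers = {"Content-Type": content_type}
    if authorization:
        headers["Authorization"] = authorization
    return urllib.request.Request(uri, headers=headers, method=method)


def fetch_json(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON body, turning failures into clear errors."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        raise RuntimeError(
            f"{req.get_method()} {req.full_url} failed with HTTP {exc.code}"
        ) from exc
    except urllib.error.URLError as exc:
        # The server could not be reached (e.g. DNS or connection errors).
        raise RuntimeError(f"Could not reach {req.full_url}: {exc.reason}") from exc


req = build_request("https://example.com/api",
                    authorization="Bearer xxxxx",
                    parameters={"Month": "2024-01-01"})
print(req.get_method(), req.full_url)
```

Against a real endpoint, `fetch_json(req)` would return the decoded JSON body, comparable to what `get_json_response()` returns in the gdmo example.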
Tables - Landing
A class for landing API ingests and other data in Azure Data Lake Storage (ADLS). It can currently ingest SharePoint (Excel) files and JSON (API-sourced) data.
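Landing JSON usually means writing the raw API payload to a folder in the lake before it is loaded into a table. A minimal sketch, assuming a `<root>/<source>/<UTC timestamp>.json` layout (both the layout and the `land_json` helper are hypothetical, not gdmo's convention), using a local directory in place of ADLS; in a Databricks notebook the write would go through `dbutils.fs.put(path, contents, True)` instead.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def land_json(payload, landing_root, source_name, now=None):
    """Write an API payload to <landing_root>/<source_name>/<UTC timestamp>.json."""
    now = now or datetime.now(timezone.utc)
    target_dir = Path(landing_root) / source_name
    target_dir.mkdir(parents=True, exist_ok=True)
    # A timestamped filename keeps every ingest instead of overwriting the last one.
    target = target_dir / f"{now:%Y%m%d_%H%M%S}.json"
    target.write_text(json.dumps(payload), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = land_json({"Month": "2024-01-01"}, tmp, "invoices",
                     now=datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc))
    landed_name = path.name
    landed_payload = json.loads(path.read_text(encoding="utf-8"))

print(landed_name)  # 20240131_120000.json
```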
Example usage to ingest files from a SharePoint folder:
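from gdmo import Landing

# spark and dbutils are provided by Databricks notebooks.
environment = 'xxxxx'     # Databricks catalog
Sharepointsite = 'xxxxx'  # SharePoint site to ingest from
UserName = 'xxxxx'
Password = 'xxxxx'        # better: load from a Databricks secret scope with dbutils.secrets.get
Client_ID = 'xxxxx'
adls_temp = 'xxxxx'       # temporary file location in ADLS

sharepoint = (
    Landing(spark, dbutils, database="xxx", bronze_table="xxx", catalog=environment, container='xxx')
    .set_tmp_file_location(adls_temp)
    .set_sharepoint_location(Sharepointsite)
    .set_sharepoint_auth(UserName, Password, Client_ID)
    .set_auto_archive(False)
    .get_all_sharepoint_files()
)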
Example usage to ingest JSON content from an API:
from gdmo import APIRequest, Landing

# Sample API request using the APIRequest class
uri = 'xxxxx'
request = APIRequest(uri).make_request()
response = request.get_json_response()

# Initiate the class and tell it where the bronze table is located, load the
# table's configuration (required for the Delta merge), add the JSON to the
# landing area in ADLS, then load the landed data into a bronze Delta table
# in the Databricks catalog.
# location, filename, environment, bronze and config must be defined beforehand.
landing = (
    Landing(spark, dbutils, database="xxx", bronze_table="xxx", target_folder=location,
            filename=filename, catalog=environment, container='xxx')
    .set_bronze(bronze)
    .set_config(config)
    .put_json_content(response)
    .put_bronze()
)
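`put_bronze()` loads the landed data into the bronze Delta table, and the table's configuration is required for the Delta merge. A merge (upsert) needs at least the target table and the key columns that identify a row. The sketch below builds a Delta Lake `MERGE INTO` statement from such a configuration; the `build_merge_sql` helper and the `example_config` shape with its `keys` entry are hypothetical, since gdmo's configuration format is not shown here.

```python
def build_merge_sql(target, source, keys):
    """Build a Delta Lake MERGE (upsert) from a target table, a source view and key columns."""
    if not keys:
        raise ValueError("A merge needs at least one key column")
    # Rows match when every key column is equal in target (t) and source (s).
    condition = " AND ".join(f"t.`{k}` = s.`{k}`" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {condition}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"   # existing rows are overwritten
        "WHEN NOT MATCHED THEN INSERT *"     # new rows are appended
    )


# Hypothetical configuration for a bronze table.
example_config = {"table": "my_catalog.bronze.invoices", "keys": ["InvoiceID", "LineNumber"]}
sql = build_merge_sql(example_config["table"], "landed_invoices", example_config["keys"])
print(sql)
# In a Databricks notebook the statement could be executed with spark.sql(sql).
```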
File details
Details for the file gdmo-0.0.38.tar.gz.
File metadata
- Download URL: gdmo-0.0.38.tar.gz
- Upload date:
- Size: 36.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | d9ef703fcfe220ca44b2509fad51460acb478e0fb6ea15ef973291f7c98b29d3
MD5 | 569a799e3a7297deea4d3a7f6c21b137
BLAKE2b-256 | dd6a2ddfa5148efbb60380f31c7082c186288c35de3d71fd34616b0da4eaccb2
File details
Details for the file gdmo-0.0.38-py3-none-any.whl.
File metadata
- Download URL: gdmo-0.0.38-py3-none-any.whl
- Upload date:
- Size: 35.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7b48e74ce9658ec40dab9817b73d61fbcd7d1d9009e2ea6dfe36257eab2b4983
MD5 | d0be0af6906c024da1425232a656b13b
BLAKE2b-256 | c2f2fc80c25a512d8ca3b93270c51254c3fe01c4f08920f2d97a59c2a4456ca3