# Zipher SDK

The Zipher SDK is a Python library for interacting with Zipher's APIs.
## Package Installation

You can install the Zipher SDK using pip:

```shell
pip install zipher-sdk
```
## Providing Zipher with access to a Databricks workspace

After installing the `zipher-sdk` package, a CLI tool is available that automatically creates all the resources Zipher needs.
### Setting up credentials

You need to provide credentials that will be used to create all the necessary resources and permissions for Zipher.

Credentials can be supplied in any of the following ways:

- A `.databrickscfg` config file (the CLI tool supports choosing a profile via an argument)
- The `ZIPHER_DATABRICKS_HOST` and `ZIPHER_DATABRICKS_TOKEN` environment variables, or the `ZIPHER_DATABRICKS_CLIENT_ID` and `ZIPHER_DATABRICKS_CLIENT_SECRET` environment variables
- Passing credentials as arguments to the CLI tool
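For example, OAuth credentials can be supplied through environment variables before running the CLI tool. The host, client id, and secret values below are placeholders:

```shell
export ZIPHER_DATABRICKS_HOST="https://my-workspace.cloud.databricks.com"
export ZIPHER_DATABRICKS_CLIENT_ID="my-oauth-client-id"
export ZIPHER_DATABRICKS_CLIENT_SECRET="my-oauth-client-secret"
```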
### CLI tool usage examples

Providing Zipher with access to a list of jobs:

```shell
zipher setup --jobs-list 12345678,87654321,12344321,21436587
```

Providing Zipher with access to at most n jobs from the workspace:

```shell
zipher setup --max-jobs 50
```

Providing Zipher with read-only access to a list of jobs:

```shell
zipher setup --readonly --jobs-list 12345678,87654321,12344321,21436587
```
### Full CLI tool specification

```text
usage: zipher setup [-h] [--workspace-host WORKSPACE_HOST] [--access-token ACCESS_TOKEN] [--client-id CLIENT_ID] [--client-secret CLIENT_SECRET] [--profile PROFILE] [--verbose] [--jobs-list JOBS_LIST] [--max-jobs MAX_JOBS]
                    [--max-runs MAX_RUNS] [--days-back DAYS_BACK] [--readonly] [--pat] [--skip-approval]

options:
  -h, --help            show this help message and exit
  --workspace-host WORKSPACE_HOST
                        Databricks workspace host URL.
  --access-token ACCESS_TOKEN
                        Databricks workspace access token.
  --client-id CLIENT_ID
                        Databricks workspace OAuth client id.
  --client-secret CLIENT_SECRET
                        Databricks workspace OAuth client secret.
  --profile PROFILE     Profile name from .databrickscfg.
  --verbose             Print the full error message on failure.
  --jobs-list JOBS_LIST
                        Comma-separated list of job ids to provide access to.
  --max-jobs MAX_JOBS   Maximum number of jobs to consider when iterating over jobs to grant permissions (default: 2000).
  --max-runs MAX_RUNS   Maximum number of runs to consider when iterating over runs to grant permissions to the related jobs (default: 2000).
  --days-back DAYS_BACK
                        How many days back to fetch relevant job runs for permission updates (default: 7).
  --readonly            Give Zipher only CAN_VIEW permissions on the listed jobs. When not provided, defaults to CAN_MANAGE permissions.
  --pat                 Generate a Personal Access Token for Zipher instead of the default OAuth client credentials.
  --skip-approval       Skip the user-input approval prompt.
```
## SDK Usage

Here are some basic examples of how you can use the Zipher SDK to optimize your Databricks clusters using Zipher's ML-powered optimization engine:

### Update Existing Configuration

You can update an existing configuration by initializing a Zipher `Client` and sending a JSON payload to the `update_existing_conf` function. Here's how you can do it:
```python
from zipher import Client

# Assumes the Zipher API key is stored in the ZIPHER_API_KEY environment variable
client = Client(customer_id="my_customer_id")

# Your existing cluster config:
config_payload = {
    "new_cluster": {
        "autoscale": {
            "min_workers": 1,
            "max_workers": 30
        },
        "cluster_name": "my-cluster",
        "spark_version": "10.4.x-scala2.12",
        "spark_conf": {
            "spark.driver.maxResultSize": "4g"
        },
        "aws_attributes": {
            "first_on_demand": 0,
            "availability": "SPOT",
            "zone_id": "auto",
            "spot_bid_price_percent": 100,
            "ebs_volume_count": 0
        },
        "node_type_id": "rd-fleet.2xlarge",
        "driver_node_type_id": "rd-fleet.xlarge",
        "spark_env_vars": {},
        "enable_elastic_disk": "false"
    }
}

# Update the configuration
optimized_cluster = client.update_existing_conf(job_id="my-job-id", existing_conf=config_payload)

# Continue by sending the optimized configuration to Databricks via the Databricks Python SDK, an Airflow operator, etc.
```
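Before forwarding the result to Databricks, it can be handy to inspect the cluster settings in a payload shaped like the one above. The snippet below is a minimal sketch; `summarize_cluster` is a hypothetical helper, not part of the SDK:

```python
# Hypothetical helper: produce a human-readable summary of a cluster
# configuration payload shaped like the example above.
def summarize_cluster(conf: dict) -> str:
    cluster = conf["new_cluster"]
    autoscale = cluster.get("autoscale", {})
    return (f"{cluster['node_type_id']}: "
            f"{autoscale.get('min_workers')}-{autoscale.get('max_workers')} workers")

payload = {
    "new_cluster": {
        "node_type_id": "rd-fleet.2xlarge",
        "autoscale": {"min_workers": 1, "max_workers": 30},
    }
}
print(summarize_cluster(payload))  # rd-fleet.2xlarge: 1-30 workers
```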
### Update Existing Multi-Task Configurations

You can update multiple Databricks tasks by initializing a Zipher `Client` and sending a JSON list of Databricks `SubmitTask` objects to the `get_optimized_tasks` function.
```python
from zipher import Client

# Assumes the Zipher API key is stored in the ZIPHER_API_KEY environment variable
client = Client(customer_id="my_customer_id")

tasks_to_optimize = [
    {
        "task_key": "task_1",
        "description": "Test notebook task",
        "notebook_task": {
            "notebook_path": "/path/to/your/notebook",
            "base_parameters": {
                "param1": "value1"
            }
        },
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "m6id.large",
            "driver_node_type_id": "m6id.large",
            "num_workers": 2,
            "aws_attributes": {
                "first_on_demand": 0,
                "availability": "SPOT",
                "zone_id": "auto",
                "spot_bid_price_percent": 100,
                "ebs_volume_count": 0
            },
            "spark_conf": {
                "spark.driver.maxResultSize": "4g"
            }
        }
    },
    {
        "task_key": "task_2",
        "description": "Test Python task",
        "spark_python_task": {
            "python_file": "/path/to/your/python_file.py"
        },
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "m6id.large",
            "driver_node_type_id": "m6id.large",
            "num_workers": 2,
            "spark_conf": {
                "spark.driver.maxResultSize": "4g"
            }
        },
        "timeout_seconds": 3600,
        "depends_on": [
            {
                "task_key": "task_1"
            }
        ]
    }
]

# Optimize the tasks
optimized_tasks = client.get_optimized_tasks(job_id="my-job-id", tasks=tasks_to_optimize)

# Continue by sending the optimized tasks to Databricks via the Databricks Python SDK, an Airflow operator, etc.
```
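Databricks rejects a task list whose `depends_on` entries reference a `task_key` that does not exist, so it can be useful to sanity-check the list before submission. The helper below is a hypothetical illustration, not part of the SDK:

```python
# Hypothetical helper: check that every task_key referenced in depends_on
# actually appears in the task list, mirroring what Databricks enforces.
def validate_dependencies(tasks: list[dict]) -> bool:
    known_keys = {task["task_key"] for task in tasks}
    for task in tasks:
        for dep in task.get("depends_on", []):
            if dep["task_key"] not in known_keys:
                raise ValueError(f"unknown dependency: {dep['task_key']!r}")
    return True

tasks = [
    {"task_key": "task_1"},
    {"task_key": "task_2", "depends_on": [{"task_key": "task_1"}]},
]
print(validate_dependencies(tasks))  # True
```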