databricks-pybuilder-plugin
A PyBuilder plugin providing tasks for asset deployment.
The plugin is used to deploy assets to a Databricks environment.
The plugin is activated with the following line in your build.py:
use_plugin('pypi:databricks_pybuilder_plugin')
It provides a set of tasks for uploading resources and workspaces, deploying jobs, and installing locally built egg dependencies on a Databricks cluster.
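A minimal build.py might look like the sketch below; the project name and property values are illustrative, and the full list of supported properties is given at the end of this page:

    from pybuilder.core import use_plugin, init

    use_plugin('pypi:databricks_pybuilder_plugin')

    name = 'my_databricks_app'            # illustrative project name
    default_task = ['analyze', 'publish']

    @init
    def set_databricks_properties(project):
        # Property names come from the settings tables below; the values are placeholders.
        project.set_property('default_environment', 'dev')
        project.set_property('remote_cluster_name', 'Test_cluster_name')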
Deployment to Databricks
Automated deployment is implemented as a set of PyBuilder tasks.
The list of available tasks:
- export_workspace - exports a workspace. A workspace is a folder in Databricks holding a notebook or a set of notebooks.
The task uploads the content of src/main/scripts/ into a Databricks workspace.
It overrides files with the same names and leaves other files as is.
By default, the current git branch name, if available, is used as a nested folder inside the Databricks workspace; the default folder is used otherwise.
Use the branch parameter to import the workspace into your own folder:
pyb export_workspace -P branch={custom_directory}
In this case the final output path would be /team_folder/application_name/{custom_directory}.
Executing the command from the master branch
pyb export_workspace
would upload the workspace files into /team_folder/application_name/master.
Here is the list of related deployment settings:
Property | Value | Description |
---|---|---|
project_workspace_path | src/main/scripts/ | The path to a folder in the project tree holding notebooks. |
remote_workspace_path | /team_folder/application_name/ | The Databricks folder the notebooks would be uploaded into from project_workspace_path. |
All of the properties can be overridden with a -P parameter.
Usage example:
pyb export_workspace [-P env={env}] [-P branch={branch}]
Environment-specific properties are disabled by default (see enable_env_sensitive_workspace_properties in the properties list below).
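The workspace-related properties can also be set in build.py. A minimal sketch, assuming the default project layout; the remote path is illustrative, and include_git_branch_into_output_workspace_path (described in the properties list below) disables the per-branch subfolder:

    from pybuilder.core import init

    @init
    def set_workspace_properties(project):
        # Where notebooks live in the project and where they are uploaded to in Databricks.
        project.set_property('project_workspace_path', 'src/main/scripts/')
        project.set_property('remote_workspace_path', '/team_folder/application_name/')
        # Upload straight into remote_workspace_path instead of a branch-named subfolder.
        project.set_property('include_git_branch_into_output_workspace_path', False)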
- export_resources - exports resources to dbfs. Uploads resource files to dbfs, if any. Existing files are overridden.
Here is the list of related deployment settings:
Property | Value | Description |
---|---|---|
project_resources_path | src/main/resources/ | The path to the project resources. |
All of the properties can be overridden with a -P parameter.
Usage example:
pyb export_resources [-P env={env}] [-P branch={branch}]
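Resource upload can be configured in build.py as well. A minimal sketch; the dbfs target path is an illustrative value, and with_dbfs_resources and dbfs_resources_path are described in the properties list below:

    from pybuilder.core import init

    @init
    def set_resource_properties(project):
        # Enable resource upload and point it at an illustrative dbfs folder.
        project.set_property('with_dbfs_resources', True)
        project.set_property('project_resources_path', 'src/main/resources/')
        project.set_property('dbfs_resources_path', 'dbfs:/FileStore/my_app/resources')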
- install_library - deploys an egg archive to a Databricks cluster. Uploads the egg archive to dbfs and re-attaches the library to the cluster by name. Installing a new library version starts the cluster in order to uninstall the old library versions and install the new one. Repeated installations of the same library version don't start the cluster; the library is just re-attached to the cluster. Other installed libraries are not affected.
Here is the list of related deployment settings:
Property | Value | Description |
---|---|---|
remote_cluster_name | Test_cluster_name | The name of the remote Databricks cluster the library is installed to. |
dbfs_library_path | dbfs:/FileStore/jars | The dbfs path to a folder holding the egg archives. |
All of the properties can be overridden with a -P parameter.
Usage example:
pyb install_library
- deploy_to_cluster - a full deployment to a cluster.
Runs export_resources, export_workspace, and install_library in a row.
Usage example:
pyb deploy_to_cluster
- deploy_job - deploys a job to Databricks by name. Please make sure that the job already exists on the Databricks side.
Executes the export_resources and export_workspace tasks first.
Updates the existing job using a job definition file.
The definition file supports the Jinja2 template syntax; see the documentation for details: https://jinja.palletsprojects.com/en/2.11.x/
Here is the list of related deployment settings:
Property | Value | Description |
---|---|---|
job_definition_path | src/main/databricks/databricks_job_settings.json | The project path to a job definition file. |
All of the properties can be overridden with a -P parameter.
Usage example:
pyb deploy_job [-P env={env}] [-P branch={branch}]
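The job definition file follows the Databricks Jobs API 2.0 JSON settings format and may contain Jinja2 placeholders. The sketch below is illustrative only: the job name, cluster id and notebook path are assumptions, and app_name would be supplied through extra_rendering_args (see the properties list below):

    {
        "name": "{{ app_name }}-job",
        "existing_cluster_id": "1234-567890-abcde123",
        "notebook_task": {
            "notebook_path": "/team_folder/application_name/master/main_notebook"
        }
    }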
To run a notebook with a custom dependency
- Build the egg archive with the pyb command.
- Deploy all the assets using the pyb deploy_to_cluster command.
- Get to the target folder in the Databricks workspace.
- Attach the notebook to a cluster and run the script.
Full list of properties
Property | Default Value | Description |
---|---|---|
databricks_credentials | {'dev': {'host': '', 'token': ''}, 'qa': {'host': '', 'token': ''}, 'prod': {'host': '', 'token': ''}} | Please specify credentials in dictionary format: host and token per environment. |
default_environment | dev | There are 3 supported environments: dev, qa and prod. |
project_workspace_path | src/main/scripts/ | The directory content is uploaded into a Databricks workspace. These are considered to be notebook scripts. |
remote_workspace_path | | The Databricks workspace that files in project_workspace_path are copied to. |
include_git_branch_into_output_workspace_path | True | The flag enables adding an extra directory with the branch name to the remote_workspace_path. Requires git to be installed. |
enable_env_sensitive_workspace_properties | False | The flag enables environment properties chosen by the env_config_workspace_path. |
env_config_workspace_path | environment-settings/{env}.py | The path to a property file to be chosen as the env properties. By default, the env included in the file name is used to pick properties. |
env_config_name | env | The expected environment properties file name. The env_config_workspace_path file will be copied to the Databricks workspace with this name. |
with_dbfs_resources | False | The flag enables uploading resource files from the project_resources_path directory to the Databricks dbfs path dbfs_resources_path. |
project_resources_path | src/main/resources/ | The local directory path holding resource files to be copied (txt, csv etc). |
dbfs_resources_path | | The output dbfs directory on the Databricks environment holding resources. |
dbfs_library_path | dbfs:/FileStore/jars | The output dbfs directory on the Databricks environment holding a built dependency (egg archive). |
attachable_lib_envs | ['dev'] | The list of environments that require the dependency to be attached to a Databricks cluster. The dependency must be uploaded to the dbfs_library_path beforehand. |
cluster_init_timeout | 5 * 60 | The timeout for waiting on a Databricks cluster while it changes its state (initiating, restarting etc). |
remote_cluster_name | | The name of the Databricks cluster that the dependency is attached to. |
job_definition_path | src/main/databricks/job_settings.json | The path to a Databricks job configuration in JSON format - https://docs.databricks.com/dev-tools/api/2.0/jobs.html. It supports Jinja template syntax in order to set up env-sensitive properties. It also supports multiple job definitions - use a JSON array for that. |
deploy_single_job | | The name of a job to be deployed. If your Databricks job config contains multiple definitions, you can deploy just one of them by specifying the name of the particular job. |
extra_rendering_args | | Custom properties to be populated in the job definition file. Use a dictionary as the argument. For example: {'app_name': name}. |
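Credentials and job-rendering properties might be set in build.py as in the sketch below; the host is a placeholder, the tokens are read from assumed environment variables rather than hard-coded, and the extra rendering argument matches the {'app_name': name} example above:

    import os
    from pybuilder.core import init

    @init
    def set_databricks_credentials(project):
        # Placeholder host; tokens are taken from assumed environment variables.
        project.set_property('databricks_credentials', {
            'dev': {'host': 'https://dev-workspace.cloud.databricks.com',
                    'token': os.environ.get('DATABRICKS_DEV_TOKEN', '')},
            'qa': {'host': '', 'token': os.environ.get('DATABRICKS_QA_TOKEN', '')},
            'prod': {'host': '', 'token': os.environ.get('DATABRICKS_PROD_TOKEN', '')},
        })
        # Custom value rendered into the Jinja2 job definition, e.g. {{ app_name }}.
        project.set_property('extra_rendering_args', {'app_name': project.name})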