
A PyBuilder plugin providing tasks for asset deployment.


databricks-pybuilder-plugin

This plugin deploys assets to a Databricks environment.

The plugin is activated with the following line in your build.py:

use_plugin('pypi:databricks_pybuilder_plugin')

It provides a set of tasks for uploading resources and workspaces, deploying jobs, and installing locally built egg dependencies onto a Databricks cluster.
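As a sketch, a minimal build.py using the plugin could look like this; the project name, workspace path, and cluster name are placeholders, not values the plugin requires:

```python
# build.py -- minimal sketch of a project using the plugin (values are placeholders).
from pybuilder.core import use_plugin, init

use_plugin('python.core')
use_plugin('pypi:databricks_pybuilder_plugin')

name = 'my_databricks_app'
default_task = 'publish'


@init
def initialize(project):
    # Property names come from the plugin's property list below.
    project.set_property('remote_workspace_path', '/team_folder/my_databricks_app/')
    project.set_property('remote_cluster_name', 'Test_cluster_name')
```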

Deployment to Databricks

Automated deployment is implemented as a set of PyBuilder tasks.

The list of available tasks:

  1. export_workspace - exports a workspace. A workspace here is a folder in Databricks holding a notebook or a set of notebooks.

The task uploads the content of src/main/scripts/ into a Databricks workspace. It overwrites files with the same names and leaves other files as they are.

By default, the current git branch name (if available) is used as a nested folder inside the Databricks workspace when uploading the content; otherwise a default folder is used. Use the branch parameter to import the workspace into a folder of your choice:

pyb export_workspace -P branch={custom_directory}

The final output path would then be /team_folder/application_name/{custom_directory}.

Executing the command from the master branch

pyb export_workspace

would upload the workspace files into /team_folder/application_name/master.
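The path composition described above can be sketched as follows; output_workspace_path is an illustrative helper, not part of the plugin's API:

```python
def output_workspace_path(remote_workspace_path, branch):
    # Join the configured workspace root with the branch (or custom) directory.
    return remote_workspace_path.rstrip('/') + '/' + branch

print(output_workspace_path('/team_folder/application_name/', 'master'))
# /team_folder/application_name/master
```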

Here is the list of related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| project_workspace_path | src/main/scripts/ | The path to the folder in the project tree holding notebooks. |
| remote_workspace_path | /team_folder/application_name/ | The Databricks folder that notebooks from project_workspace_path are uploaded into. |

Any of these properties can be overridden with the -P parameter.

Usage example:

pyb export_workspace [-P env={env}] [-P branch={branch}]

Environment-specific properties are disabled by default (see enable_env_sensitive_workspace_properties in the property list below).

  2. export_resources - exports resources into DBFS. Uploads resource files to DBFS, if there are any; existing files are overwritten.

Here is the list of related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| project_resources_path | src/main/resources/ | The path to the project resources. |

Any of these properties can be overridden with the -P parameter.

Usage example:

pyb export_resources [-P env={env}] [-P branch={branch}]

  3. install_library - deploys an egg archive to a Databricks cluster. Uploads the egg archive to DBFS and re-attaches the library to a cluster by name. Installing a new library version starts the cluster in order to uninstall old versions and install the new one. Re-installing the same version does not start the cluster; the library is simply re-attached. Other installed libraries are not affected.
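The restart behaviour described above can be summarised with a small sketch; needs_cluster_restart is a hypothetical helper, not the plugin's code:

```python
def needs_cluster_restart(installed_versions, new_version):
    # A genuinely new version means old versions are uninstalled and the new
    # one installed, which requires starting the cluster; re-installing an
    # already-present version only re-attaches the library.
    return new_version not in installed_versions

print(needs_cluster_restart({'0.0.2'}, '0.0.3'))  # True: new version, cluster starts
print(needs_cluster_restart({'0.0.3'}, '0.0.3'))  # False: same version, re-attach only
```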

Here is the list of related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| remote_cluster_name | Test_cluster_name | The name of the remote Databricks cluster the library is installed to. |
| dbfs_library_path | dbfs:/FileStore/jars | The DBFS path to the folder holding the egg archives. |

Any of these properties can be overridden with the -P parameter.

Usage example:

pyb install_library

  4. deploy_to_cluster - a full deployment to a cluster. Runs export_resources, export_workspace, and install_library in a row.

Usage example:

pyb deploy_to_cluster

  5. deploy_job - deploys a job to Databricks by name. Make sure the job already exists on the Databricks side.

Executes the export_resources and export_workspace tasks first, then updates the existing job using a job definition file. The definition file supports Jinja2 template syntax; see the documentation for details: https://jinja.palletsprojects.com/en/2.11.x/
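For illustration, a job definition file with Jinja2 placeholders might look like the following; the field names follow the Databricks Jobs API 2.0, while the concrete values and variable names (env, cluster_id, branch) are assumptions:

```json
{
  "name": "my_app_job_{{ env }}",
  "existing_cluster_id": "{{ cluster_id }}",
  "notebook_task": {
    "notebook_path": "/team_folder/application_name/{{ branch }}/main_notebook"
  },
  "max_retries": 1
}
```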

Here is the list of related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| job_definition_path | src/main/databricks/databricks_job_settings.json | The project path to a job definition file. |

Any of these properties can be overridden with the -P parameter.

Usage example:

pyb deploy_job [-P env={env}] [-P branch={branch}]

Running a notebook with a custom dependency

  1. Build the egg archive with the pyb command.

  2. Deploy all the assets using the command pyb deploy_to_cluster.

  3. Get to the target folder in the Databricks workspace.

  4. Attach the notebook to a cluster and run the script.

Full list of properties

| Property | Default Value | Description |
| --- | --- | --- |
| databricks_credentials | {'dev': {'host': '', 'token': ''}, 'qa': {'host': '', 'token': ''}, 'prod': {'host': '', 'token': ''}} | Databricks credentials per environment, specified as a dictionary with host and token fields. |
| default_environment | dev | Three environments are supported: dev, qa, and prod. |
| project_workspace_path | src/main/scripts/ | The directory whose content is uploaded into a Databricks workspace; its files are considered notebook scripts. |
| remote_workspace_path | | The Databricks workspace that files from project_workspace_path are copied to. |
| include_git_branch_into_output_workspace_path | True | Adds an extra directory named after the current git branch to remote_workspace_path. Requires git to be installed. |
| enable_env_sensitive_workspace_properties | False | Enables environment-specific properties selected via env_config_workspace_path. |
| env_config_workspace_path | environment-settings/{env}.py | The path to the property file used as environment properties. By default, the env placeholder in the file name selects the properties. |
| env_config_name | env | The name under which the file from env_config_workspace_path is copied into the Databricks workspace. |
| with_dbfs_resources | False | Enables uploading resource files from the project_resources_path directory to dbfs_resources_path on DBFS. |
| project_resources_path | src/main/resources/ | The local directory holding resource files to be copied (txt, csv, etc.). |
| dbfs_resources_path | | The output DBFS directory on the Databricks environment holding resources. |
| dbfs_library_path | dbfs:/FileStore/jars | The output DBFS directory on the Databricks environment holding the built dependency (egg archive). |
| attachable_lib_envs | ['dev'] | The list of environments that require the dependency to be attached to a Databricks cluster. The dependency must first be uploaded to dbfs_library_path. |
| cluster_init_timeout | 5 * 60 | How long to wait for a Databricks cluster while it changes state (initializing, restarting, etc.). |
| remote_cluster_name | | The name of the Databricks cluster the dependency is attached to. |
| job_definition_path | src/main/databricks/job_settings.json | The path to a Databricks job configuration in JSON format (https://docs.databricks.com/dev-tools/api/2.0/jobs.html). Supports Jinja template syntax for environment-sensitive properties, and multiple job definitions via a JSON array. |
| extra_rendering_args | | Custom properties rendered into the job definition file, passed as a dictionary, e.g. {'app_name': name}. |
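As a sketch, credentials and rendering arguments would typically be set in build.py's init stage; the host URL and environment-variable name are placeholders, and reading the token from an environment variable is an assumption, not a plugin requirement:

```python
import os

from pybuilder.core import init


@init
def set_databricks_properties(project):
    project.set_property('databricks_credentials', {
        'dev': {'host': 'https://dev-workspace.cloud.databricks.com',
                'token': os.environ.get('DATABRICKS_DEV_TOKEN', '')},
        'qa': {'host': '', 'token': ''},
        'prod': {'host': '', 'token': ''},
    })
    # Rendered into the Jinja2 job definition template, e.g. as {{ app_name }}.
    project.set_property('extra_rendering_args', {'app_name': project.name})
```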

