databricks-pybuilder-plugin

A PyBuilder plugin providing tasks for deploying assets to a Databricks environment.

The plugin is activated by adding the following line to your build.py:

use_plugin('pypi:databricks_pybuilder_plugin')

It provides a set of tasks for uploading resources and workspaces, deploying jobs, and installing locally built egg dependencies on a Databricks cluster.
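Activating the plugin only registers its tasks; the deployment properties are set in build.py as usual for PyBuilder. A minimal sketch might look like this (the project name and property values are placeholders; the property names come from the tables in this document):

```python
# build.py - minimal sketch of a project using the plugin (placeholder values)
from pybuilder.core import use_plugin, init

use_plugin('python.core')
use_plugin('python.distutils')  # builds the egg archive that install_library uploads
use_plugin('pypi:databricks_pybuilder_plugin')

name = 'application_name'
default_task = 'publish'

@init
def set_properties(project):
    # Properties consumed by the plugin's deployment tasks
    project.set_property('remote_workspace_path', '/team_folder/application_name/')
    project.set_property('remote_cluster_name', 'Test_cluster_name')
```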

Deployment to Databricks

Automated deployment is implemented as a set of PyBuilder tasks.

The list of available tasks:

  1. export_workspace - exports a workspace. A workspace is a folder in Databricks holding a notebook or a set of notebooks.

The task uploads the content of src/main/scripts/ into a Databricks workspace. It overwrites files with the same names and leaves other files as is.

By default, the current git branch name (if available) is used as a nested folder inside the Databricks workspace when uploading the content; otherwise the default folder is used. Use the branch parameter to import the workspace into a folder of your own:

pyb export_workspace -P branch={custom_directory}

The final output path would then be /team_folder/application_name/{custom_directory}.

Executing the command from the master branch

pyb export_workspace

would upload the workspace files into /team_folder/application_name/master.
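The path composition described above can be sketched in a few lines (a hypothetical helper for illustration, not the plugin's actual code):

```python
from posixpath import join

def output_workspace_path(remote_workspace_path, branch=None, default_folder='master'):
    """Compose the Databricks workspace folder the notebooks are uploaded into.

    The branch name (or a default folder when no branch is given) becomes a
    nested directory under remote_workspace_path.
    """
    return join(remote_workspace_path, branch or default_folder)

print(output_workspace_path('/team_folder/application_name/'))
# -> /team_folder/application_name/master
print(output_workspace_path('/team_folder/application_name/', branch='feature-x'))
# -> /team_folder/application_name/feature-x
```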

Related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| project_workspace_path | src/main/scripts/ | The path to a folder in the project tree holding notebooks. |
| remote_workspace_path | /team_folder/application_name/ | The Databricks folder the notebooks from project_workspace_path are uploaded into. |

All of the properties can be overridden with a -P parameter.

Usage example:

pyb export_workspace [-P env={env}] [-P branch={branch}]

Environment-specific properties are disabled by default (see enable_env_sensitive_workspace_properties in the full properties list).
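When enable_env_sensitive_workspace_properties is turned on, the file referenced by env_config_workspace_path (environment-settings/{env}.py by default) is copied into the workspace under the name given by env_config_name. Its contents are up to you; a hypothetical environment-settings/dev.py might hold simple constants for the notebooks to import:

```python
# environment-settings/dev.py - hypothetical per-environment constants
DATABASE_NAME = 'dev_db'
INPUT_PATH = 'dbfs:/mnt/dev/input/'
```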

  2. export_resources - exports resources into DBFS. Uploads resource files into DBFS, if any. Existing files are overwritten.

Related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| project_resources_path | src/main/resources/ | The path to the project resources. |

All of the properties can be overridden with a -P parameter.

Usage example:

pyb export_resources [-P env={env}] [-P branch={branch}]

  3. install_library - deploys an egg archive to a Databricks cluster. Uploads the egg archive to DBFS and re-attaches the library to a cluster by name. Installing a new library version starts the cluster so that old library versions are uninstalled and the new one is installed. Repeated installations of the same library version don't start the cluster; the library is simply re-attached. Other installed libraries are not affected.

Related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| remote_cluster_name | Test_cluster_name | The name of the remote Databricks cluster the library is installed to. |
| dbfs_library_path | dbfs:/FileStore/jars | The DBFS path to a folder holding the egg archives. |

All of the properties can be overridden with a -P parameter.

Usage example:

pyb install_library

  4. deploy_to_cluster - a full deployment to a cluster. Runs export_resources, export_workspace, and install_library in a row.

Usage example:

pyb deploy_to_cluster

  5. deploy_job - deploys a job to Databricks by name. Please make sure the job already exists on the Databricks side.

Executes the export_resources and export_workspace tasks first, then updates the existing job using a job definition file. The definition file supports the Jinja2 template syntax; see the documentation for details: https://jinja.palletsprojects.com/en/2.11.x/

Related deployment settings:

| Property | Value | Description |
| --- | --- | --- |
| job_definition_path | src/main/databricks/databricks_job_settings.json | The project path to a job definition file. |

All of the properties can be overridden with a -P parameter.

Usage example:

pyb deploy_job [-P env={env}] [-P branch={branch}]
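A job definition file with Jinja2 placeholders might look like the sketch below. The field names follow the Databricks Jobs API (https://docs.databricks.com/dev-tools/api/2.0/jobs.html); the {{ env }} and {{ branch }} variables here are assumptions about the rendering context, and arbitrary values can be supplied through the extra_rendering_args property:

```json
{
  "name": "application_name_{{ env }}",
  "notebook_task": {
    "notebook_path": "/team_folder/application_name/{{ branch }}/main"
  },
  "max_retries": 1
}
```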

Running a notebook with a custom dependency

  1. Build the egg archive with the pyb command.

  2. Deploy all the assets using the command pyb deploy_to_cluster.

  3. Navigate to the target folder in the Databricks workspace.

  4. Attach the notebook to a cluster and run the script.

Full list of properties

| Property | Default Value | Description |
| --- | --- | --- |
| databricks_credentials | {'dev': {'host': '', 'token': ''}, 'qa': {'host': '', 'token': ''}, 'prod': {'host': '', 'token': ''}} | Please specify credentials in dictionary format: a host and a token per environment. |
| default_environment | dev | There are 3 supported environments: dev, qa and prod. |
| project_workspace_path | src/main/scripts/ | The directory whose content is uploaded into a Databricks workspace. The files are considered to be notebook scripts. |
| remote_workspace_path | | The Databricks workspace that files from project_workspace_path are copied to. |
| include_git_branch_into_output_workspace_path | True | Enables adding an extra directory with the branch name to the remote_workspace_path. Requires git to be installed. |
| enable_env_sensitive_workspace_properties | False | Enables environment properties chosen by env_config_workspace_path. |
| env_config_workspace_path | environment-settings/{env}.py | The path to a property file to be used as the environment properties. By default, the env included in the file name is used to pick the properties. |
| env_config_name | env | The expected environment properties file name. The file at env_config_workspace_path is copied to the Databricks workspace under this name. |
| with_dbfs_resources | False | Enables uploading resource files from the project_resources_path directory to dbfs_resources_path on DBFS. |
| project_resources_path | src/main/resources/ | The local directory holding resource files to be copied (txt, csv, etc.). |
| dbfs_resources_path | | The output DBFS directory on the Databricks environment holding resources. |
| dbfs_library_path | dbfs:/FileStore/jars | The output DBFS directory on the Databricks environment holding the built dependency (egg archive). |
| attachable_lib_envs | ['dev'] | The list of environments that require the dependency attached to a Databricks cluster. The dependency must first be uploaded to dbfs_library_path. |
| cluster_init_timeout | 5 * 60 | The timeout for waiting on a Databricks cluster while it changes state (initializing, restarting, etc.). |
| remote_cluster_name | | The name of the Databricks cluster the dependency is attached to. |
| job_definition_path | src/main/databricks/job_settings.json | The path to a Databricks job configuration in JSON format (https://docs.databricks.com/dev-tools/api/2.0/jobs.html). Supports Jinja template syntax for environment-sensitive properties. Multiple job definitions are supported: use a JSON array for that. |
| deploy_single_job | | The name of a job to deploy. If your job config contains multiple definitions, you can deploy just one of them by specifying its name. |
| extra_rendering_args | | Custom properties to be populated in the job definition file. Use a dictionary as the argument, for example: {'app_name': name}. |
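The databricks_credentials property, like the others, can be set in a build.py initializer. A hedged sketch (the hosts are placeholders; reading tokens from environment variables instead of hard-coding them is one sensible choice):

```python
import os
from pybuilder.core import init

@init
def set_databricks_credentials(project):
    project.set_property('databricks_credentials', {
        'dev':  {'host': 'https://dev-workspace.cloud.databricks.com',
                 'token': os.environ.get('DATABRICKS_DEV_TOKEN', '')},
        'qa':   {'host': 'https://qa-workspace.cloud.databricks.com',
                 'token': os.environ.get('DATABRICKS_QA_TOKEN', '')},
        'prod': {'host': 'https://prod-workspace.cloud.databricks.com',
                 'token': os.environ.get('DATABRICKS_PROD_TOKEN', '')},
    })
    project.set_property('default_environment', 'dev')
```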
