DataBricks CLI eXtensions aka dbx

DataBricks CLI eXtensions, aka dbx, is a CLI tool for advanced Databricks job management.

Concept

dbx simplifies the job launch and deployment process across multiple environments. It also helps you package your project and deliver it to your Databricks environment in a versioned fashion. Designed in a CLI-first manner, it is built to be used both inside CI/CD pipelines and as part of local tooling for fast prototyping.

Requirements

  • Python Version > 3.6

  • pip or conda

Installation

  • with pip:

pip install dbx

Quickstart

Please refer to the Quickstart section.

Documentation

Please refer to the docs page.

Differences from other tools

Tool: Comment

  • databricks-cli: dbx is NOT a replacement for databricks-cli. Quite the opposite: dbx depends heavily on databricks-cli and uses most of its APIs directly from the databricks-cli SDK.

  • mlflow cli: dbx is NOT a replacement for mlflow cli. dbx uses some of the MLflow APIs under the hood to store serialized job objects, but doesn’t use the mlflow CLI directly.

  • Databricks Terraform Provider: While dbx is primarily oriented toward versioned job management, Databricks Terraform Provider offers a much wider set of infrastructure settings. In comparison, dbx doesn’t provide infrastructure management capabilities, but brings more flexible deployment and launch options.

  • Databricks Stack CLI: Databricks Stack CLI is a great component for managing a stack of objects. dbx concentrates on versioning and packaging jobs together, rather than treating files and notebooks as separate components.

Limitations

  • Development:

    • dbx currently doesn’t provide interactive debugging capabilities.
      If you want to use interactive debugging, you can use Databricks Connect + dbx for deployment operations.
    • dbx execute only supports Python-based projects which use local files (Notebooks or Repos are not supported in dbx execute).

    • dbx execute can only be used on clusters with Databricks ML Runtime 7.X or higher.

  • General:

    • dbx doesn’t support Delta Live Tables at the moment.

    • host in your profile configuration in ~/.databrickscfg must consist of only two parts: {scheme}://netloc, e.g. https://some-host.cloud.databricks.com.
      Strings like https://some-host.cloud.databricks.com/?o=XXXX# are not supported. As a symptom of this, you might see the error below:
raise MlflowException("%s. Response body: '%s'" % (base_msg, response.text))
mlflow.exceptions.MlflowException: API request to endpoint was successful but the response body was not in a valid JSON format.
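
The host rule above can be checked before running any dbx command. The following is a minimal sketch using only the Python standard library; it is not part of dbx, and the function names is_supported_host and strip_to_scheme_and_netloc are hypothetical helpers introduced for illustration. It assumes that "two parts" means a non-empty scheme and netloc with no path, query, or fragment, so a trailing slash is also reported as unsupported here.

```python
from urllib.parse import urlparse


def is_supported_host(host: str) -> bool:
    """Return True when host has a scheme and a netloc and nothing else.

    Hypothetical helper, not a dbx API. Assumption: any path (even "/"),
    params, query, or fragment makes the value unsupported.
    """
    parts = urlparse(host.strip())
    has_required = bool(parts.scheme) and bool(parts.netloc)
    has_extras = any([parts.path, parts.params, parts.query, parts.fragment])
    return has_required and not has_extras


def strip_to_scheme_and_netloc(host: str) -> str:
    """Reduce a URL to the {scheme}://netloc form described above."""
    parts = urlparse(host.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Cannot extract scheme and netloc from {host!r}")
    return f"{parts.scheme}://{parts.netloc}"
```

For example, is_supported_host accepts https://some-host.cloud.databricks.com and rejects https://some-host.cloud.databricks.com/?o=XXXX#, while strip_to_scheme_and_netloc turns the second form into the first so it can be written back to the profile by hand.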

Versioning

For CLI interfaces, we follow the SemVer approach. However, API components do not follow SemVer as of now. This may lead to instability when using dbx API methods directly.
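
Because the API components are not covered by SemVer, code that calls dbx API methods directly may want to verify that the installed version is exactly the one it was tested against. The sketch below is one possible way to do that with the standard library (importlib.metadata, available in Python 3.8 and later); the helper names are hypothetical and not part of dbx, and the version string in the usage note is only an example value.

```python
from importlib import metadata  # standard library in Python 3.8+


def is_exact_pin_satisfied(installed: str, pinned: str) -> bool:
    """Return True only when the installed version string equals the pin."""
    return installed.strip() == pinned.strip()


def check_package_pin(package: str, pinned: str) -> bool:
    """Compare the installed version of `package` with an exact pin.

    Hypothetical helper, not a dbx API. Returns False when the package
    is not installed at all.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return is_exact_pin_satisfied(installed, pinned)
```

A caller could run check_package_pin("dbx", "0.6.4") at start-up and refuse to continue when it returns False. An exact comparison is used instead of a range on the assumption that any release may change API behaviour.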

Feedback

Issues with dbx? Found a bug? Have a great idea for an addition? Feel free to file an issue.

Contributing

Please find more details about contributing to dbx in the contributing doc.


Source Distributions

No source distribution files are available for this release.

Built Distribution

dbx-0.6.4-py3-none-any.whl (80.9 kB)

Uploaded Python 3

File details

Details for the file dbx-0.6.4-py3-none-any.whl.

File metadata

  • Download URL: dbx-0.6.4-py3-none-any.whl
  • Upload date:
  • Size: 80.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dbx-0.6.4-py3-none-any.whl
Algorithm Hash digest
SHA256 eb0c2c796e8ed636b255e13855dcf39d7097a9b9a180e9f01ac2538e15caa099
MD5 2473fd2f2594502c8387887188f5eac0
BLAKE2b-256 48346e9f19668c9ac56d5cdfa826bfdf5c4399247e8012989c399d087b027a78
