Prefect integrations with Databricks
prefect-databricks
Visit the full docs here to see additional examples and the API reference.
Welcome!
Prefect integrations for interacting with Databricks
The tasks within this collection were created by a code generator using the service's OpenAPI spec.
The service's REST API documentation can be found here.
Getting Started
Python setup
Requires an installation of Python 3.7+.
We recommend using a Python virtual environment manager such as pipenv, conda or virtualenv.
These tasks are designed to work with Prefect 2. For more information about how to use Prefect, please refer to the Prefect documentation.
Installation
Install prefect-databricks with pip:
pip install prefect-databricks
A list of available blocks in prefect-databricks and their setup instructions can be found here.
Lists jobs on the Databricks instance
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_list

@flow
def example_execute_endpoint_flow():
    databricks_credentials = DatabricksCredentials.load("my-block")
    jobs = jobs_list(
        databricks_credentials,
        limit=5
    )
    return jobs

example_execute_endpoint_flow()
Use with_options to customize options on any existing task or flow:
custom_example_execute_endpoint_flow = example_execute_endpoint_flow.with_options(
    name="My custom flow name",
    retries=2,
    retry_delay_seconds=10,
)
Launch a new cluster and run a Databricks notebook
Notebook named example.ipynb on Databricks which accepts a name parameter:
name = dbutils.widgets.get("name")
message = f"Don't worry {name}, I got your request! Welcome to prefect-databricks!"
print(message)
Prefect flow that launches a new cluster to run example.ipynb:
from prefect import flow
from prefect_databricks import DatabricksCredentials
from prefect_databricks.jobs import jobs_runs_submit
from prefect_databricks.models.jobs import (
    AutoScale,
    AwsAttributes,
    JobTaskSettings,
    NotebookTask,
    NewCluster,
)

@flow
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    databricks_credentials = DatabricksCredentials.load("my-block")

    # specify new cluster settings
    aws_attributes = AwsAttributes(
        availability="SPOT",
        zone_id="us-west-2a",
        ebs_volume_type="GENERAL_PURPOSE_SSD",
        ebs_volume_count=3,
        ebs_volume_size=100,
    )
    auto_scale = AutoScale(min_workers=1, max_workers=2)
    new_cluster = NewCluster(
        aws_attributes=aws_attributes,
        autoscale=auto_scale,
        node_type_id="m4.large",
        spark_version="10.4.x-scala2.12",
        spark_conf={"spark.speculation": True},
    )

    # specify notebook to use and parameters to pass
    notebook_task = NotebookTask(
        notebook_path=notebook_path,
        base_parameters=base_parameters,
    )

    # compile job task settings
    job_task_settings = JobTaskSettings(
        new_cluster=new_cluster,
        notebook_task=notebook_task,
        task_key="prefect-task"
    )

    run = jobs_runs_submit(
        databricks_credentials=databricks_credentials,
        run_name="prefect-job",
        tasks=[job_task_settings]
    )
    return run

jobs_runs_submit_flow("/Users/username@gmail.com/example.ipynb", name="Marvin")
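The flow signature collects any extra keyword arguments into base_parameters, and the notebook reads them back by key with dbutils.widgets.get. The plain-Python sketch below mimics only that collection step. collect_base_parameters is a hypothetical helper written for illustration; it is not part of prefect-databricks and does not call Databricks.

```python
def collect_base_parameters(notebook_path, **base_parameters):
    # Hypothetical stand-in for the flow signature: every extra keyword
    # argument lands in the base_parameters dict, keyed by argument name.
    return {"notebook_path": notebook_path, "base_parameters": base_parameters}


task_settings = collect_base_parameters(
    "/Users/username@gmail.com/example.ipynb", name="Marvin"
)
print(task_settings["base_parameters"])  # prints {'name': 'Marvin'}
```

Here name="Marvin" becomes the "name" entry that the notebook retrieves with dbutils.widgets.get("name").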
Note that instead of using the built-in models, you may also input valid JSON. For example, AutoScale(min_workers=1, max_workers=2) is equivalent to {"min_workers": 1, "max_workers": 2}.
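As a rough sketch of that JSON-style alternative, the task settings from the flow above could be written as plain dictionaries. The source only confirms the equivalence for AutoScale, so the other keys here are an assumption that they mirror the model attribute names; check them against the API reference before relying on them. build_task_payload is a hypothetical helper, not part of prefect-databricks.

```python
import json


def build_task_payload(notebook_path, **base_parameters):
    # Assumption: dictionary keys mirror the model attributes used in the flow.
    new_cluster = {
        "autoscale": {"min_workers": 1, "max_workers": 2},
        "node_type_id": "m4.large",
        "spark_version": "10.4.x-scala2.12",
    }
    notebook_task = {
        "notebook_path": notebook_path,
        "base_parameters": base_parameters,
    }
    return {
        "task_key": "prefect-task",
        "new_cluster": new_cluster,
        "notebook_task": notebook_task,
    }


payload = build_task_payload(
    "/Users/username@gmail.com/example.ipynb", name="Marvin"
)
# Round-trip through json to confirm the payload serializes as valid JSON.
assert json.loads(json.dumps(payload)) == payload
print(json.dumps(payload["new_cluster"]["autoscale"]))
```

If the task accepts dictionaries as the note suggests, a payload like this would take the place of the JobTaskSettings object in the tasks list.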
For more tips on how to use tasks and flows in a Collection, check out Using Collections!
Resources
If you encounter any bugs while using prefect-databricks, feel free to open an issue in the prefect-databricks repository.
If you have any questions or issues while using prefect-databricks, you can find help in either the Prefect Discourse forum or the Prefect Slack community.
Feel free to star or watch prefect-databricks for updates too!
Contributing
If you'd like to fix an issue or add a feature to prefect-databricks, please propose changes through a pull request from a fork of the repository.
Here are the steps:
- Fork the repository
- Clone the forked repository
- Install the repository and its dependencies:
pip install -e ".[dev]"
- Make desired changes
- Add tests
- Add an entry to CHANGELOG.md
- Install pre-commit to perform quality checks prior to commit:
pre-commit install
- git commit, git push, and create a pull request
Hashes for prefect_databricks-0.2.11.tar.gz
Algorithm | Hash digest
---|---
SHA256 | d75322222e3b8e5fb170b701509fd014a8dc5132ec7e2705b4f25c66d043b739
MD5 | d17e9394a6a429676770ee99d17245a3
BLAKE2b-256 | 182fc7db6e5cc7ee3a252e402317e2e26f51d23698108c7cc86141015b077dfa
Hashes for prefect_databricks-0.2.11-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a7325ad8eb5ca0e3ae58f128313f7ebfb8163c709e373cc41018aee3412b08df
MD5 | 4ef8423d36e663555a7f1116576ad54d
BLAKE2b-256 | f55c17836f1b595ae673c8f93bd4b0f69bbf08eb59b90692b5fcac063b195382
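The published digests can be used to check a downloaded file before installing it. The sketch below uses only the standard library hashlib module. sha256_of_file and matches_published_hash are hypothetical helper names, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    # Stream the file in chunks so large archives need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    # Normalize the expected digest, since hexdigest() returns lowercase hex.
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the source distribution was downloaded locally):
# matches_published_hash(
#     "prefect_databricks-0.2.11.tar.gz",
#     "d75322222e3b8e5fb170b701509fd014a8dc5132ec7e2705b4f25c66d043b739",
# )
```

A result of False means the local file differs from the one the digest was computed for, so it should not be installed.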