Skip to main content

Ease-of-use utility tools for databricks notebooks.

Project description

databricks-utils

Python version Pyspark version Build Status

databricks-utils is a python package that provide several utility classes/func that improve ease-of-use in databricks notebook.

Installation

pip install databricks-utils

Features

  • S3Bucket class to easily interact with a S3 bucket via dbfs and databricks spark.

  • vega_embed to render charts from Vega and Vega-Lite specifications.

Documentation

API documentation can be found at https://e2fyi.github.io/databricks-utils/.

Quick start

S3Bucket

import json
from databricks_utils.aws import S3Bucket

# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)

# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark(sc) # local spark context
          .mount("somebucketname")) # mount location name (resolves as `/mnt/somebucketname`)

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
data = json.load(open(bucket.local("resource/somefile.json", "r")))

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# umount
bucket.umount()

Vega
Vega and Vega-Lite are high-level grammars of interactive graphics. They provide concise JSON syntax for rapidly generating visualizations to support analysis.

from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))

Developer

# add a version to git tag and publish to pypi
. add_tag.sh <VERSION>

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

databricks-utils-0.0.7.tar.gz (4.5 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page