Ease-of-use utility tools for databricks notebooks.

databricks-utils

databricks-utils is a Python package that provides utility classes and functions to make Databricks notebooks easier to use.

Installation

pip install databricks-utils

Features

  • S3Bucket class to easily interact with an S3 bucket via DBFS and Databricks Spark.

  • vega_embed to render charts from Vega and Vega-Lite specifications.

Documentation

API documentation can be found at https://e2fyi.github.io/databricks-utils/.

Quick start

S3Bucket

import json
from databricks_utils.aws import S3Bucket

bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark()
          .mount("s3/somebucketname"))

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
# (the file mode belongs to open(), not to bucket.local())
with open(bucket.local("resource/somefile.json"), "r") as f:
    data = json.load(f)

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# unmount the bucket
bucket.umount()
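
The quick start above uses two kinds of locations: a local filesystem path for plain Python I/O, and an S3 URI for Spark. The sketch below shows one plausible way such paths could be built. The helper names `local_path` and `s3_uri`, the `/dbfs/mnt` root, and the `s3a://` scheme are assumptions made for illustration; they are not taken from the package and may differ from what `bucket.local` and `bucket.s3` actually return.

```python
import posixpath

# Hypothetical helpers for illustration only; not part of databricks-utils.

def local_path(mount_point, key, root="/dbfs/mnt"):
    """Build a driver-local path for a file under a DBFS mount.

    Assumes the mount lives under `root`; DBFS is exposed to local
    Python code on the driver under the /dbfs prefix.
    """
    return posixpath.join(root, mount_point.strip("/"), key.lstrip("/"))


def s3_uri(bucket_name, key, scheme="s3a"):
    """Build an S3 URI that Spark readers can consume directly."""
    return "{}://{}/{}".format(scheme, bucket_name, key.lstrip("/"))


print(local_path("s3/somebucketname", "resource/somefile.json"))
print(s3_uri("somebucketname", "resource/somedf.parquet"))
```

Keeping the two forms separate matters because `open()` only understands local paths, while `spark.read` expects a URI that the cluster's executors can resolve.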

Vega

Vega and Vega-Lite are high-level grammars of interactive graphics. They provide concise JSON syntax for rapidly generating visualizations to support analysis.

from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

# plot out the vega chart in databricks notebook
vega_embed(spec=spec, plot=True)
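
To give a rough idea of what rendering a spec in a notebook involves, here is a minimal sketch that wraps a Vega-Lite spec in a standalone HTML page using the public vega-embed JavaScript library. The function name `vega_html`, the `div_id` parameter, and the pinned CDN major versions are assumptions for illustration; this is not the implementation of `vega_embed` in this package.

```python
import json

# Public CDN that hosts the vega, vega-lite and vega-embed bundles.
CDN = "https://cdn.jsdelivr.net/npm"


def vega_html(spec, div_id="vis"):
    """Return an HTML page that renders `spec` with vega-embed.

    Hypothetical helper for illustration; not part of databricks-utils.
    """
    # Escape "</" so a string value in the spec cannot close the <script> tag.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '  <script src="{}/vega@5"></script>'.format(CDN),
        '  <script src="{}/vega-lite@5"></script>'.format(CDN),
        '  <script src="{}/vega-embed@6"></script>'.format(CDN),
        "</head>",
        "<body>",
        '  <div id="{}"></div>'.format(div_id),
        "  <script>vegaEmbed('#{}', {});</script>".format(div_id, spec_json),
        "</body>",
        "</html>",
    ]
    return "\n".join(lines)


example_spec = {
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

html = vega_html(example_spec)
```

In a Databricks notebook, the resulting string could be passed to the built-in `displayHTML(html)` to show the chart in the cell output.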

Developer

# add a version to git tag and publish to pypi
. add_tag.sh <VERSION>
