Ease-of-use utility tools for databricks notebooks.

databricks-utils

databricks-utils is a Python package that provides several utility classes and functions to improve ease of use in Databricks notebooks.

Installation

pip install databricks-utils

Features

  • S3Bucket class to easily interact with an S3 bucket via DBFS and Databricks Spark.

  • vega_embed to render charts from Vega and Vega-Lite specifications.

Documentation

API documentation can be found at https://e2fyi.github.io/databricks-utils/.

Quick start

S3Bucket

import json
from databricks_utils.aws import S3Bucket

# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)

# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark(sc) # local spark context
          .mount("somebucketname")) # mount location name (resolves as `/mnt/somebucketname`)

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
# (the "r" mode belongs to open(), not to bucket.local())
with open(bucket.local("resource/somefile.json"), "r") as f:
    data = json.load(f)

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# unmount the bucket
bucket.umount()
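
The quick start above relies on two kinds of paths: a driver-local path for plain Python file I/O and an S3 URI for Spark. The sketch below illustrates how such paths are commonly laid out on Databricks, where a DBFS mount at `/mnt/<name>` is visible to the driver under `/dbfs/mnt/<name>`. The helper names (`mount_point`, `local_path`, `s3_uri`) and the `s3a` scheme are assumptions made for illustration; they are not part of databricks-utils, and the values actually returned by `bucket.local` and `bucket.s3` may differ.

```python
from posixpath import join


def mount_point(mount_name: str) -> str:
    """DBFS mount location for a mount name (hypothetical helper)."""
    return join("/mnt", mount_name.strip("/"))


def local_path(mount_name: str, key: str) -> str:
    """Driver-local path to an object, assuming the /dbfs FUSE mount."""
    return join("/dbfs", mount_point(mount_name).lstrip("/"), key.lstrip("/"))


def s3_uri(bucket_name: str, key: str, scheme: str = "s3a") -> str:
    """S3 URI for Spark readers; the s3a scheme is an assumption."""
    return f"{scheme}://{bucket_name}/{key.lstrip('/')}"


print(mount_point("somebucketname"))
print(local_path("somebucketname", "resource/somefile.json"))
print(s3_uri("somebucketname", "resource/somedf.parquet"))
```

Keeping the two forms separate makes it clear why `open()` needs the local path while `spark.read.parquet` takes the URI.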

Vega

Vega and Vega-Lite are high-level grammars of interactive graphics. They provide concise JSON syntax for rapidly generating visualizations to support analysis.

from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))
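
To give a rough idea of what a string passed to `displayHTML` could look like, here is a minimal sketch that wraps a spec in an HTML page and calls the `vegaEmbed` JavaScript function from the vega-embed library. The function `render_vega_html` and its `div_id` parameter are hypothetical and only illustrate the general technique; this is not the implementation of `vega_embed` in databricks-utils, and the CDN version pins are assumptions.

```python
import json


def render_vega_html(spec: dict, div_id: str = "vis") -> str:
    """Return an HTML page that renders a Vega/Vega-Lite spec (illustrative)."""
    # Serialize the spec; escape "</" so a string value cannot close the script tag.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="{div_id}"></div>
  <script>vegaEmbed("#{div_id}", {spec_json});</script>
</body>
</html>"""


html = render_vega_html({"mark": "bar"})
print(html)
```

In a notebook, the resulting string could be handed to `displayHTML` in the same way as the output of `vega_embed`, provided the cluster can reach the CDN.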

Developer

# tag the release in git with a version and publish to PyPI
. add_tag.sh <VERSION>
