Ease-of-use utility tools for databricks notebooks.
Project description
databricks-utils
databricks-utils is a Python package that provides several utility classes and functions that improve ease of use in Databricks notebooks.
Installation
pip install databricks-utils
Features
- `S3Bucket` class to easily interact with an S3 bucket via `dbfs` and Databricks Spark.
- `vega_embed` to render charts from Vega and Vega-Lite specifications.
Documentation
API documentation can be found at https://e2fyi.github.io/databricks-utils/.
Quick start
S3Bucket
import json
from databricks_utils.aws import S3Bucket
# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)
# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
.allow_spark(sc) # local spark context
.mount("somebucketname")) # mount location name (resolves as `/mnt/somebucketname`)
# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")
# read in a json file from the bucket
data = json.load(open(bucket.local("resource/somefile.json"), "r"))
# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))
# unmount the bucket
bucket.umount()
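The quick start above uses two kinds of paths: a local file path for plain Python I/O and an S3 URL for Spark. The sketch below illustrates how such paths are commonly formed on Databricks, where a mount at `/mnt/<name>` is visible to local file APIs under `/dbfs/mnt/<name>` and Spark reads S3 through the `s3a://` scheme. The helper names `dbfs_local_path` and `s3a_url` are hypothetical, and it is an assumption, not something stated in this README, that `bucket.local` and `bucket.s3` return strings of exactly this form.

```python
import posixpath


def dbfs_local_path(mount_name, key):
    """Local (driver filesystem) path for a key under a DBFS mount.

    Hypothetical helper: assumes the bucket is mounted at /mnt/<mount_name>.
    """
    return posixpath.join("/dbfs/mnt", mount_name, key.lstrip("/"))


def s3a_url(bucket_name, key):
    """s3a:// URL for a key in a bucket, as used by Spark readers.

    Hypothetical helper: not part of databricks-utils.
    """
    return "s3a://{}/{}".format(bucket_name, key.lstrip("/"))


print(dbfs_local_path("somebucketname", "resource/somefile.json"))
print(s3a_url("somebucketname", "resource/somedf.parquet"))
```

Keeping the two forms separate makes it explicit which API each path is meant for: `open()` needs the `/dbfs/...` form, while `spark.read` expects a URL.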
Vega
Vega and Vega-Lite are high-level grammars of interactive graphics. They provide concise JSON syntax for rapidly generating visualizations to support analysis.
from databricks_utils.vega import vega_embed
# vega-lite spec for a bar chart
spec = {
"data": {
"values": [
{"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
{"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
{"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
]
},
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "ordinal"},
"y": {"field": "b", "type": "quantitative"}
}
}
# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))
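When the data already lives in Python rather than in a hand-written literal, the spec can be assembled programmatically. The following is a minimal sketch that builds the same kind of Vega-Lite bar chart spec from a list of records; `bar_chart_spec` is a hypothetical helper, not part of databricks-utils, and the resulting dictionary is assumed to be passed to `vega_embed(spec=...)` in the same way as above.

```python
import json


def bar_chart_spec(records, x_field, y_field):
    """Build a Vega-Lite bar chart spec from a list of dict records.

    Hypothetical helper for illustration only.
    """
    if not records:
        raise ValueError("records must not be empty")
    for record in records:
        # Fail early if a record lacks one of the encoded fields.
        if x_field not in record or y_field not in record:
            raise KeyError("record is missing %r or %r" % (x_field, y_field))
    return {
        "data": {"values": list(records)},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


rows = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]
chart = bar_chart_spec(rows, "a", "b")

# The spec must be JSON-serializable to be embedded in a page.
print(json.dumps(chart, indent=2))
```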
Developer
# add a version to git tag and publish to pypi
. add_tag.sh <VERSION>
Download files
Source Distribution
File details
Details for the file databricks-utils-0.0.7.tar.gz.
File metadata
- Download URL: databricks-utils-0.0.7.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0dfe371cbdc65f29cebdb1ff99905b28addea579500ab3cf29a278c11f66b4ca
MD5 | fe61aea95875a9ae324e75ecf832c792
BLAKE2b-256 | 89054e40e0546bd2415b3fb38eab0d7fd48bead8877cf6121b5e64dc5401c69b
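The digests above can be used to check a downloaded archive before installing it. A minimal sketch using only the standard library follows; the function names `sha256_of_file` and `matches_sha256` are illustrative, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the archive is in the current directory):
# matches_sha256(
#     "databricks-utils-0.0.7.tar.gz",
#     "0dfe371cbdc65f29cebdb1ff99905b28addea579500ab3cf29a278c11f66b4ca",
# )
```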