pyspark-me
PySpark and Databricks tools for everyday use
Synopsis
Create Databricks connection
# Get Databricks workspace connection
dbc = pysparkme.databricks.connect(
bearer_token='dapixyzabcd09rasdf',
url='https://westeurope.azuredatabricks.net')
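Under the hood the connection presumably wraps the Databricks REST API, which authenticates with a bearer token. A minimal sketch of the request headers such a client would send (`make_auth_headers` is a hypothetical helper for illustration, not part of pyspark-me):

```python
def make_auth_headers(bearer_token):
    """Build the HTTP headers a Databricks REST API client attaches to each call."""
    return {
        'Authorization': f'Bearer {bearer_token}',
        'Content-Type': 'application/json',
    }

headers = make_auth_headers('dapixyzabcd09rasdf')
```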
DBFS
# Get list of items at path /FileStore
dbc.dbfs.ls('/FileStore')
# Check if file or directory exists
dbc.dbfs.exists('/path/to/heaven')
# Make a directory and its parents
dbc.dbfs.mkdirs('/path/to/heaven')
# Delete a directory recursively
dbc.dbfs.rm('/path', recursive=True)
# Read a 2048-byte block starting at offset 1024
dbc.dbfs.read('/data/movies.csv', 1024, 2048)
# Download entire file
dbc.dbfs.read_all('/data/movies.csv')
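Conceptually, `read_all` amounts to repeated block reads until the file is exhausted. A hedged sketch of that loop, with `read_block` standing in for `dbc.dbfs.read` (the stub and its data are invented for illustration and are not the library's API):

```python
DATA = b'title,year\nAlien,1979\n'

def read_block(path, offset, length):
    # Stand-in for dbc.dbfs.read: return up to `length` bytes at `offset`.
    return DATA[offset:offset + length]

def read_all(path, block_size=8):
    # Request consecutive blocks; a short (or empty) block signals EOF.
    chunks, offset = [], 0
    while True:
        block = read_block(path, offset, block_size)
        chunks.append(block)
        if len(block) < block_size:
            break
        offset += block_size
    return b''.join(chunks)

content = read_all('/data/movies.csv')
```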
Databricks workspace
# List root workspace directory
dbc.workspace.ls('/')
# Check if workspace item exists
dbc.workspace.exists('/explore')
# Check if workspace item is a directory
dbc.workspace.is_directory('/')
# Export notebook in default (SOURCE) format
dbc.workspace.export('/my_notebook')
# Export notebook in HTML format
dbc.workspace.export('/my_notebook', 'HTML')
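The Databricks workspace export API accepts the formats SOURCE, HTML, JUPYTER, and DBC. A small sketch of normalizing and validating the format argument before issuing the call (`check_export_format` is a hypothetical helper, not pyspark-me's API):

```python
# Export formats accepted by the Databricks workspace export API.
EXPORT_FORMATS = {'SOURCE', 'HTML', 'JUPYTER', 'DBC'}

def check_export_format(fmt='SOURCE'):
    # Normalize case and reject unknown formats early, before any REST call.
    fmt = fmt.upper()
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f'Unsupported export format: {fmt}')
    return fmt
```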
Databricks CLI
Get CLI help
python -m pysparkme.databricks.cli --help
Export the whole Databricks workspace into the directory explore/export. The Databricks token is taken from the DATABRICKS_BEARER_TOKEN environment variable.
python -m pysparkme.databricks.cli workspace export -o explore/export ''
DBFS
# List items on DBFS
python -m pysparkme.databricks.cli dbfs ls --json-indent 2 ''
# Download a file and print to STDOUT
python -m pysparkme.databricks.cli dbfs get ml-latest-small/movies.csv
# Download recursively entire directory and store locally
python -m pysparkme.databricks.cli dbfs get -o ml-local ml-latest-small
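A recursive `dbfs get` boils down to a depth-first walk of the remote tree, downloading each file it encounters. A sketch over a toy in-memory tree (the tree and `walk` helper are invented for illustration; the real CLI would call the DBFS list and read endpoints instead):

```python
# A toy DBFS tree: directories map to dicts, files to bytes.
TREE = {
    'ml-latest-small': {
        'movies.csv': b'movieId,title\n',
        'ratings': {'ratings.csv': b'userId,movieId,rating\n'},
    }
}

def walk(node, prefix=''):
    # Depth-first walk yielding (path, content) for every file in the tree.
    for name, child in node.items():
        path = f'{prefix}/{name}' if prefix else name
        if isinstance(child, dict):
            yield from walk(child, path)
        else:
            yield path, child

files = dict(walk(TREE))
```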
Build and publish
python setup.py sdist bdist_wheel
python -m twine upload dist/*